API: add top-level functions as method #12640

Open
jreback opened this Issue Mar 16, 2016 · 7 comments

Comments

Projects
None yet
3 participants
Contributor

jreback commented Mar 16, 2016 edited

xref pydata#12578

so its pretty easy to add (and eventually deprecate) certain top-level functions that could/should be methods. Promotes a cleaner syntax and method chaining.

In 0.18.1 would propose adding these, deprecating in 0.19.0.

This would eliminate the ability to use directly with np.ndarrays, but I view this is as a positive (note we effectively did this for the .rolling/expanding/ewm with np.ndarrays as well, though deprecated for a while).

  • pd.crosstab (DataFrame)
  • pd.melt (DataFrame) (PR #15521); added to DataFrame
  • pd.get_dummies (both)
  • pd.cut/qcut (only on Series)

jreback added this to the 0.18.1 milestone Mar 16, 2016

DaveBackus referenced this issue in DaveBackus/Data_Bootcamp_Book Apr 13, 2016

Open

cleaning #25

@jreback jreback modified the milestone: Next Major Release, 0.18.1 Apr 26, 2016

From the above list, pivot_table is already a method, so I think that one can be removed.

Further, I don't think I find it necessary to deprecate those. Those are frequently used functions, so not sure it is worth deprecating them (which does not mean some of them would be valuable as a method as well).

Contributor

ResidentMario commented Feb 25, 2017

@jreback I'm interested in taking a stab at this, but since I've never toyed with pandas before I wanted to first ask what your opinion is as to implementation?

The easiest way to port these to methods would be to simply thread a call to the globals from a method (akin to what is done in Series.from_csv/Series.to_csv). The advantage of this approach is that it's easy to do and avoids code refactoring. The disadvantage is that you would have to add a handful of additional dependencies (e.g. from pandas.tools.tile import (cut, qcut)) to series.py and dataframe.py, which I assume would slow down module initialization. Perhaps these imports could be made in the method body? Seems suboptimal...

As I see it, the best option would be to instead move the core of each of the method bodies into a core module (common.py?), then simplify the old global and new class methods into wrappers on those. This might allow object method implementations to use a (marginally?) faster code path, since less type-checking would be required. However, it would require a refactor.

Contributor

jreback commented Feb 25, 2017

@ResidentMario conceptually this is very easy, (look at something like .unstack()) or this:

def melt(self, ....):
    from pandas.core.reshape import melt
    return melt(self, ....)

BUT I didn't put a doc-string. And you cannot import things directly at the top-level in core/frame.py. It wouldn't slow down initialization much, the issue is dependencies (e.g. these modules import DataFrame, so they can't be imported directly.

so here's a way to do this.

create DataFrame.melt (like above), but use a _shared_docs doc-string (and define it in core/frame.py). Then you can set the doc-string directly in pandas.core.reshape.melt directly.

I think this is a nice pattern that will work.

Contributor

ResidentMario commented Feb 26, 2017

Ok, great, glad I asked!

Contributor

ResidentMario commented Feb 26, 2017 edited

Line no. 186 in pivot.py in the current source:

DataFrame.pivot_table = pivot_table

git blame pandas/tools/pivot.py -L 186 shows that this is a very old Wes McKinney commit from circa 2011. Technically, this approach is correct, but it doesn't generate a reference in the docs. Do we want to refactor this into the new pattern?

Edit: actually it does generate a doc entry. Sphinx is more clever than I thought! So I think that pd.pivot_table can be marked "done" as-is.

Contributor

ResidentMario commented Feb 26, 2017

Technically speaking, if reusing the original docstring is OK, this pattern could be used for all of the other conversions as well, since the files they are defined in already import that Series and DataFrame names where needed.

However I think that the old functions ought to get links to the new methods in their docstrings, and vice versa, and that the new methods ought to additionally get a versionadded docstring. To do that, Jeff's pattern is required.

In the case of pivot_table, the alias is already ancient. Nevertheless, refactoring it using Jeff's pattern is an arguable documentation improvement, since DataFrame.pivot_table would now link to pd.pivot_table in the docs (and vice versa).

@jreback what do you think?

Contributor

jreback commented Feb 26, 2017

@ResidentMario yeah ok with having a standard way of doing this for doc-strings and such (so should fix pivot_table).

So for how to do this, Ideally establish the pattern doing 1 or 2 methods, then in additional PR's can do the rest. IOW, can tweek to get the pattern that we want (and maybe make helpers / decorators to assist first).

FYI, I have made a decorator e75eacb (see pandas.utils.decorator for doing something almost like this.

I am going to split this off and push to master shortly. (the decorator)

@jreback jreback added a commit that referenced this issue Apr 4, 2017

@ResidentMario @jreback ResidentMario + jreback API: add top-level melt function as method to DataFrame
xref #12640
xref #14876

Author: Aleksey Bilogur <aleksey.bilogur@gmail.com>

Closes #15521 from ResidentMario/12640 and squashes the following commits:

1657246 [Aleksey Bilogur] two doc changes
28a38f2 [Aleksey Bilogur] tweak whatsnew entry.
5f306a9 [Aleksey Bilogur] +whatsnew
ff895fe [Aleksey Bilogur] Add tests, update docs.
11f3fe4 [Aleksey Bilogur] rm stray debug.
3cbbed5 [Aleksey Bilogur] Melt docstring.
d54dc2f [Aleksey Bilogur] +pd.DataFrame.melt.
e50d397

@linebp linebp added a commit to linebp/pandas that referenced this issue Apr 17, 2017

@ResidentMario @linebp ResidentMario + linebp API: add top-level melt function as method to DataFrame
xref #12640
xref #14876

Author: Aleksey Bilogur <aleksey.bilogur@gmail.com>

Closes #15521 from ResidentMario/12640 and squashes the following commits:

1657246 [Aleksey Bilogur] two doc changes
28a38f2 [Aleksey Bilogur] tweak whatsnew entry.
5f306a9 [Aleksey Bilogur] +whatsnew
ff895fe [Aleksey Bilogur] Add tests, update docs.
11f3fe4 [Aleksey Bilogur] rm stray debug.
3cbbed5 [Aleksey Bilogur] Melt docstring.
d54dc2f [Aleksey Bilogur] +pd.DataFrame.melt.
6ea1216
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment