Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add the moment function as DataFrame and Series method WITH namespacing #10702

Closed
wants to merge 1 commit into from

Conversation

Konubinix
Copy link

This makes the usage of those methods more object oriented

Also, delete some trailing whitespaces.

@jreback
Copy link
Contributor

jreback commented Jul 30, 2015

ok, this looks reasonable.

  • I would like to have some smoke tests along the lines of this.
  • pls add to the API.rst as well (maybe have to do that somewhat manually).
  • Further I think that computation.rst could be modified to use this syntax.
  • finally would need a release note showing these changes

@jreback jreback added Docs API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels Jul 30, 2015
@jreback jreback added this to the 0.17.0 milestone Jul 30, 2015
@jorisvandenbossche
Copy link
Member

While in principle I agree that it would be consistent to have these on the objects, I am just not sure if I find it a good idea to add yet another 37 methods at once to DataFrame/Series.

@TomAugspurger
Copy link
Contributor

I share @jorisvandenbossche's concern. We could do it in a .rolling and .expanding namespace. df.rolling.mean(periods=2), but I don't have a strong preference for that over where they are now.

@jreback
Copy link
Contributor

jreback commented Jul 30, 2015

ok, these are good points. Not real strong on this, so tabling for now, @TomAugspurger of namespacing makes more sense.

@jreback jreback modified the milestones: Next Major Release, 0.17.0 Jul 30, 2015
@jreback jreback changed the title ENH: Add the moment function as DataFrame and Series method ENH: Add the moment function as DataFrame and Series method WITH namespacing Jul 30, 2015
@jreback jreback added the Needs Discussion Requires discussion from core team before further action label Jul 30, 2015
@Konubinix
Copy link
Author

I like the idea of adding a namespace to avoid adding 37 names into DataFrame.

IMHO, there are two clean ways of doing this:

  1. create on the fly the rolling and expanding namespaces in DataFrame, something close to what is done here https://github.com/dalejung/trtools/blob/master/trtools/core/rolling.py (though I must admit I did not take the time to understand in detail this solution)
  2. change the moments library to gather all the rolling methods in a rolling namespace and all the expanding methods in an expanding namespace

I prefer the solution 2 since it would also make the moments library cleaner.

I am not very comfortable with pandas' code yet. Could you tell me whether one of those ideas could be acceptable or if another solution could be preferable?

@Konubinix
Copy link
Author

I gathered the rolling and expanding methods into separate namespaces and added them to DataFrame and Series

This way, the dataframes methods are available via: DataFrame.rolling.,
DataFrame.expanding.
, DataFrame.ewma. Same for Series.

Considering the ewm* function, only 8 objects are added to the DataFrames. I did
not not whether it was a good idea to put the ewm* method in a separate
namespace.

@Konubinix
Copy link
Author

Hmm, the last implementation just does not work because df.rolling.mean provides the rolling object to the rolling_mean method as self argument instead of the dataframe.

In order to use the rolling_ and expanding_ methods as is but still aggregate them into separate namespaces, I don't see an easy solution.

https://github.com/dalejung/trtools/blob/master/trtools/monkey/__init__.py provides a way to do so, but it looks a bit complicated to do a so simple stuff.

@jorisvandenbossche
Copy link
Member

@Konubinix We have some machinery for this (which is also used by dt/cat/str). Look for AccessorProperty in core/base.py.

But maybe wait a moment to first let some other people react and settle the discussion on which we actually want (as methods, as namespace methods or leave as is) before you put too much work in it.

@jreback
Copy link
Contributor

jreback commented Jul 31, 2015

.rolling, .expaning, .ewma make name spaces for Series make sense, though these only should show up for int/float dtypes (similar to how .str/.cat/.dt work).

For DataFrame this is much tricker. You only really would want them to be visible for a single dtype frame (of int/float), otherwise they don't make sense.

@shoyer
Copy link
Member

shoyer commented Aug 2, 2015

I like the idea of namespace accessors for rolling and expanding aggregations. It has always seemed a little strange to me how these are functions instead of methods (like most things in pandas).

@Konubinix
Copy link
Author

I took into account your remarks and comments.
Now the methods are gathered into separate namespaces (ewm, rolling and expanding) and a check is made whether a Serie is of dtype int of float and a DataFrame contains at least one dtype int or float.

# Add all the methods to DataFrame and Series
import sys
thismodule = sys.modules[__name__]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are creating lots of machinery that can simply use PandasDelegate, see pandas/tseries.common.py for an example

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Konubinix
Copy link
Author

@jreback, thank you for the PandasDelegate pointer.

I checked into that possibility.

If I understood correctly, it consists in subclassing PandasDelegate with a
class that performs the computation for each delegated name. Then we have to
call the method class _add_delegate_accessors to indicate from which class to
copy the methods signatures and docstring.

I my situation, since I don't propose to copy some methods from one delegate
class, it looks not really well suited.

For instance, to allow to use df.rolling.mean, I would need to create an
intermediate class RollingMethods with the method mean (avg, count, ...) that
would wrap the rolling_mean (rolling_avg, rolling_count, ...) methods, then
create a delegator class RollingDelegator that would implement the abstract
method _delegate_method inherited from PandasDelegate.

Then, I would have to call RollingDelegator._add_delegate_accessors(delegate=RollingMethods, accessors=[...the list of rolling methods...], typ="method").

Finally, I would have to add to do the same thing as in the previous patch:
create an accessor for the rolling namespace that would check the dtypes and
return an instanciated RollingDelegator.

I would also need to do that for expanding and ewm.

I think I can do this, but it results in much more code that is not more
readable.

I may have understood something wrong. If so, could you please give me some hint on how to use PandasDelegate?

Thanks again for your help.

@Konubinix
Copy link
Author

I pushed a version making use of the PandasDelegate mechanism. I am not sure I use it correctly.

This makes the usage of those methods more object oriented.

In order not to pollute the DataFrame and Series namespaces, the rolling and
expanding methods were gathered into separate namespaces, using as far as
possible the dedicated PandasDelegate mechanism.

Also, delete some trailing whitespaces.
@shoyer
Copy link
Member

shoyer commented Aug 17, 2015

Just to point out another possible API here --

Instead of writing df.rolling.mean(periods=2), we might move the periods argument earlier: df.rolling(periods=2).mean(). df.rolling(periods=2) could return a Rolling object with a groupby like API (e.g., supporting iteration and automatic method dispatch).

Conceptually, this makes some sense -- rolling window operations are very similar to grouped operations with overlapping groups. One possible downside is that methods off an accessor property are more discoverable -- IPython can do direct auto-complete.

@max-sixty
Copy link
Contributor

@shoyer +1

@Konubinix
Copy link
Author

@shoyer, I like the idea. Nevertheless, it looks way more ambitious than this
pull request.

The purpose of this pull request is to make the functions in the moments
namespace available in the dataframe and series namespaces. It was based on the
observation that all those functions take a dataframe or serie as first argument
and hence are particularly designed to work like instance methods. It then does
no more that inspecting the moments namespace and make its methods accessible
from dataframe and series instances.

You are now suggesting to change the API. I agree with this idea, but I think it
should be the purpose of another issue that could be entitled "Refactor the
moments namespace".

"Can only use .{} accessor with floats or int values".format(name)
)
if isinstance(self, DataFrame) \
and not True in [ # check whether there is one dtype float or integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you not any(.....) more idiomatic

@jreback
Copy link
Contributor

jreback commented Aug 17, 2015

this need a test along the lines of this: https://github.com/pydata/pandas/blob/master/pandas/tests/test_series.py#L79

IOW you are testing that the __dir__ is correct (and that they execute the same as the pd.* functions.

I think your impl is fine. We can discuss doing a Rolling in another issue. Its a nice idea but is a pretty big expansion.

@shoyer
Copy link
Member

shoyer commented Aug 17, 2015

@Konubinix @jreback I think this change itself is a great idea, but I'm raising the issue of the alternate API for a few reasons --

  1. it's generally better to hold off API changes if they would cause awkward compatibility issues with a different, superior API that we might want to use in the future (that isn't really the case here since .rolling as an accessor and as a method can co-exist).
  2. from a marketing perspective, it's awkward to make APIs that may only live for a few versions
  3. it might not actually be that hard to make a basic Rolling object. I would certainly rather have a Rolling implementation missing some features we'd like for the final version (e.g., with only support for aggregation for now) than a new API that we might immediately deprecate.

@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 14, 2015
@jreback jreback modified the milestones: 0.18.0, 0.17.1 Nov 15, 2015
@jreback
Copy link
Contributor

jreback commented Nov 15, 2015

closing in favor of #11603

@jreback jreback closed this Nov 15, 2015
jreback added a commit to jreback/pandas that referenced this pull request Dec 19, 2015
jreback added a commit that referenced this pull request Dec 19, 2015
API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Docs Needs Discussion Requires discussion from core team before further action Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants