Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Consider porting extension accessor from xarray #14781

Closed
chris-b1 opened this Issue Dec 1, 2016 · 7 comments

Comments

Projects
None yet
4 participants
@chris-b1
Copy link
Contributor

commented Dec 1, 2016

xarray provides a decorator to register a custom accessor on a Dataset - see docs here.

It's a bit of a blunt tool, in that it applies to all instances, but outside of that, I found it to be a nice alternative to subclassing or a composition class.

I haven't looked at the implementation, so not sure how hard it would be retrofit onto DataFrame

@shoyer - any thoughts or feedback from people using it?

@shoyer

This comment has been minimized.

Copy link
Member

commented Dec 1, 2016

This is pretty lightweight and self-contained -- would be easy to copy into pandas: https://github.com/pydata/xarray/blob/master/xarray/core/extensions.py

The only change in the works is switching to warning instead of erroring when overriding a pre-existing attribute: pydata/xarray#1149

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jul 3, 2017

Following up on this instead of opening a new issue: are accessors considered user-facing? If so, is there a One True Way to put one together? The str accessor comes from mixing in strings.StringAccessorMixin, while the dt and cat accessors are constructed inside the Series class definition.

Related 1: are _make_dt_accessor and _make_cat_accessor needed in the Series namespace? Would those make sense to put into _dir_deletions?

Related 2: Right now ser.dt == ser.dt returns False. If an enterprising young soul were to make a PR implementing an __eq__ method, what class would that belong in?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 3, 2017

@jbrockmendel not sure what u are asking here

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jul 3, 2017

a) Are users supposed to be able to implement custom accessors?
b) Is there a recipe for doing so?

The docstring example in @shoyer's suggestion is pretty similar to what I have in mind.

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

commented Jul 3, 2017

No, the existing accessor machinery is not user facing, this issue was about potentially adding some that is.

I would be supportive of adding something like the xarray extension decorators, so PR would be welcome if you're interested in implementing something like that. I think you could basically copy that implementation, most of the work would be docs and tests.

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jul 12, 2017

Back-migrating question #16890

Are [make_dt_accessor, _make_cat_accessor, _make_str_accesor] needed in the user-facing namespace for some reason? If not, can we wedge them into _dir_deletions? I know the leading underscore makes it private, but dir(pd.DataFrame) and dir(pd.Series) have 444 and 441 members respectively. This gets pretty imposing.

not sure what you are mean here. accessor are by-definition namespaces that make the discovery easier and much more geared to the dtype.

I communicated poorly by using _add_series_or_dataframe_operations as an example instead of the more-pertinent _make_foo_accessor. My claim is that we can have e.g. the dt accessor without _make_dt_accessor being in the Series namespace. Is there a reason it is needed in the namespace other than creating the accessor?

The "wedge into _dir_deletions" comment referred more generally to items that aren't so easy to get rid of entirely, but may be namespace-clutter. This is not all that pertinent to the accessor topic.

@shoyer

This comment has been minimized.

Copy link
Member

commented Jul 12, 2017

Is there a reason it is needed in the namespace other than creating the accessor?

I think these are basically one-time use methods. We're been somewhat sloppy about adding lots of private methods in the past, but since IPython exposes them in tab-completition there is indeed some virtue to cleaning things up later. Better still might be making these functions instead, so they never pollute the namespace in the first place.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017

ENH: Added public accessor registrar
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017

ENH: Added public accessor registrar
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017

ENH: Added public accessor registrar
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017

ENH: Added public accessor registrar
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017

ENH: Added public accessor registrar
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 8, 2018

ENH: Added public accessor registrar
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 9, 2018

ENH: Added public accessor registrar
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781

@jreback jreback added this to the 0.23.0 milestone Jan 10, 2018

TomAugspurger added a commit that referenced this issue Jan 16, 2018

ENH: Added public accessor registrar (#18827)
* ENH: Added public accessor registrar

Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing #18767
outside of pandas.

Closes #14781

* PEP8

* Moved to extensions

* More docs

* Fix see also

* DOC: Added whatsnew

* Move to api

* Update post review

* flake8

* Raise the underlying error instead of a RuntimeError

* str validate

* DOC: Moved to developer

* REF: Use public registrars for accessors

* cleanup

* Implemented optional caching

* Document cache

* Tests passing

* Use for plot

* Fix autodoc

* Fix the class instantiation

* Refactor again.

1. Removed optional caching
2. Refactored `Properties` to create the indexes it uses on demand
3. Moved accessor definitions to classes for clarity

* Fix API files

* Remove stale comment

* Tests pass

* DOC: some cleanup

* No need to assign doc

* Rename, shared docs

* Doc __new__

* Use UserWarning

* Update test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.