New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent adding new attributes to the accessors .str, .dt and .cat #11575

Merged
merged 1 commit into from Nov 14, 2015

Conversation

Projects
None yet
4 participants
@jankatins
Contributor

jankatins commented Nov 11, 2015

assigning to Series.str, Series.dt, or Series.cat was not failing
although you couldn't get the value back:

In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
In[12]: a.cat.labels
Traceback (most recent call last):
  File "C:\portabel\miniconda\envs\pandas_dev\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-a68ee763e4e8>", line 1, in <module>
    a.cat.labels
AttributeError: 'CategoricalAccessor' object has no attribute 'labels'

Now we fail early:

In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
Traceback (most recent call last):
  File "C:\data\external\pandas\pandas\tests\test_categorical.py", line 3633, in test_cat_accessor_no_new_attributes
    c.cat.labels = [1,2]
  File "C:\data\external\pandas\pandas\core\base.py", line 121, in __setattr__
    raise AttributeError( "You cannot add any new attribute '{key}'".format(key=key))
AttributeError: You cannot add any new attribute 'labels'

Closes: #10673

@jankatins jankatins changed the title from Prevent adding new attributes to the accessors .str and .cat to Prevent adding new attributes to the accessors .str, .dt and .cat Nov 11, 2015

@max-sixty

This comment has been minimized.

Contributor

max-sixty commented Nov 11, 2015

I've had some pickling issues with doing this in the past, as __setattr__ gets called during unpickling, without __init__ being called.
Another way - slightly less elegant but lower blast radius - is to set a private variable _str and then have a property def str(self): ; return self._str. That'll also prevent setting of str.

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 12, 2015

I agree here with @MaximilianR
this is already done in NDFrame using _accessors as a white-list with __getattr__, so should be straightforward, so I think would be ok to emulate in PandasDelegate (not that these allow sub-classes to override, e.g. Series does for .dt,.str.,.cat themselves.

@jankatins

This comment has been minimized.

Contributor

jankatins commented Nov 12, 2015

I'm not sure I can follow: as far as I can see, the accessors .str, .dt and .cat are not pickled (and they should not!). This is not trying to prevent s.str = something but s.str.someattr = something. The main benefit is that a user, who misspells an attribute, is not surprised. E.g. when using s.cat.labels instead of s.cat.categories.

Currently assigning (s.cat.labels = "aaa") works, but you can't get the value from this attribute back (print(s.cat.labels) raises an AttributeError). I've actually no idea where this behaviour comes from, the last should work, as the first (without this PR) uses the normal item assignment and the latter normal get (that's at least what the pycharm debugger lets me see).

@jankatins

This comment has been minimized.

Contributor

jankatins commented Nov 12, 2015

This needs a rebase if pulled after #11582...

@jreback

View changes

pandas/core/index.py Outdated
@@ -3106,6 +3106,11 @@ def _simple_new(cls, values, name=None, categories=None, ordered=None, **kwargs)
result._reset_identity()
return result
# PandasDelegate prevents adding any new attributes as that's what reasonable if used as
# accessor. But we want it again.

This comment has been minimized.

@jreback

jreback Nov 12, 2015

Contributor

this inherits from PandasObject so not necessary

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

At least during my tests it was? It inherits from PandasDelegate:

class CategoricalIndex(Index, PandasDelegate):
@jreback

View changes

pandas/core/strings.py Outdated
@@ -1058,7 +1058,8 @@ class StringMethods(object):
"""
def __init__(self, series):
self.series = series
# can't use direct assignment as we prevent that in __set_item__
object.__setattr__(self, "series", series)

This comment has been minimized.

@jreback

jreback Nov 12, 2015

Contributor

try making this inherit from PandasDelegate rather than have the changes twice.

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

StringMethods does not inherit from PandasDelegate, just from object:

class StringMethods(object):

Edit: I don't think letting it inherit from PandasDelegate will help here: the only needed part from PandasDelegate is the definition of __setattr__(self, key, value) , but the rest (the abstract getter/setter/forwarder would pollute the object, as we don't delegate anything to the underlying object but do all here in the StringMethod class/instance.

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

so instead of all of this complication then
make a new mixim in base and just include it where needed

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

ok, will do tomorrow

@jreback

View changes

pandas/core/strings.py Outdated
@@ -1067,6 +1068,13 @@ def __getitem__(self, key):
else:
return self.get(key)
# prevent adding any attribute via s.str.new_attribute = ...
def __setattr__(self, key, value):
if not (hasattr(self, key)):

This comment has been minimized.

@jreback

jreback Nov 12, 2015

Contributor

see above

@jankatins

View changes

pandas/tseries/common.py Outdated
@@ -62,9 +62,10 @@ def maybe_to_datetimelike(data, copy=False):
class Properties(PandasDelegate):

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

This should actually be renamed to class AbstractTimerelatedDelegate(PandasDelegate): or BaseTimerelatedDelegate (and the rest too: DatetimeProperties -> DatetimeDelegate), as there is a AccessorProperty which only constructs the delegates... The same names (Property vs AcessorProperty made understanding what is happening here harder :-( )

@jankatins

This comment has been minimized.

Contributor

jankatins commented Nov 13, 2015

Ok, took me a while to find a place for the mixin: both core.common and core.base resulted in circular depencies due to the way the StringMethods are defined (str = ...(StringMethods,...) -> can't get the import to a function :-():

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 143.165\helpers\pycharm\utrunner.py", line 121, in <module>
    modules = [loadSource(a[0])]
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 143.165\helpers\pycharm\utrunner.py", line 41, in loadSource
    module = imp.load_source(moduleName, fileName)
  File "C:\data\external\pandas\pandas\tests\test_base.py", line 6, in <module>
    import pandas.compat as compat
  File "C:\data\external\pandas\pandas\__init__.py", line 42, in <module>
    import pandas.core.config_init
  File "C:\data\external\pandas\pandas\core\config_init.py", line 17, in <module>
    from pandas.core.format import detect_console_encoding
  File "C:\data\external\pandas\pandas\core\format.py", line 8, in <module>
    from pandas.core.base import PandasObject
  File "C:\data\external\pandas\pandas\core\base.py", line 10, in <module>
    from pandas.core.strings import StringMethods
  File "C:\data\external\pandas\pandas\core\strings.py", line 4, in <module>
    from pandas.core.base import NoNewAttributesMixin
ImportError: cannot import name NoNewAttributesMixin
@jreback

View changes

doc/source/whatsnew/v0.17.1.txt Outdated
@@ -25,6 +25,7 @@ Enhancements
objects for the ``filepath_or_buffer`` argument. (:issue:`11033`)
- ``DataFrame`` now uses the fields of a ``namedtuple`` as columns, if columns are not supplied (:issue:`11181`)
- Improve the error message displayed in :func:`pandas.io.gbq.to_gbq` when the DataFrame does not match the schema of the destination table (:issue:`11359`)
- Prevent adding new attributes to the accessors .str, .dt and .cat (:issue:`10673`)

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

API change section

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

I was actually thinking it should go to bugfix: currently you can't get the value back, so the concept is broken and we only make the error more explicit.

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

ok (put the .str etal with backticks)

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

done (to bug and added backticks)

@jreback

View changes

pandas/util/mixins.py Outdated
@@ -0,0 +1,28 @@
"""
Mixins which can't go to core.common or core.base due to circular imports
"""

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

this should simply be right before PandasDelegate in core/base.py ; that is setup to be the base module for these types of things.

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

It can't (Erorr: #11575 (comment)), it gets a circular import when it is imported in strings.py:

  • core/base.py imports StringMethods, which can't go to a function as it uses str = ...(StringMethods,...) directly in the class declaration.
  • core/strings.py would then need to import from core/base.py

Same with core/common.py

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

the import for StringMethods needs to be inside the _make_str_accessor

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

It can't because that would not work with str = AccessorProperty(StringMethods, _make_str_accessor) -> The AccessorProperty needs a handle to the real class to do this:

    def __get__(self, instance, owner=None):
        if instance is None:
            # this ensures that Series.str.<method> is well defined
            return self.accessor_cls
        return self.construct_accessor(instance)

This comment has been minimized.

@kawochen

kawochen Nov 13, 2015

Contributor
import pandas.core.strings as strings

use strings.StringMethods or, to avoid dict look-up, StringMethods = strings.StringMethods

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

the point here is that the entire StringMethods infrastructure is wrong. It should use PandasDelegate and such; this was never fixed. This will also fix all of the import issues.

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

But letting StringMethods use PandasDelegate will not fix this, as PandasDelegate is in the same dir as IndexOpsMixins.

I could try moving IndexOpsMixin into core/indexing.py -> Series/index -> loads indexing for IndexOpsMixin -> loads string for StringMethods -> loads base for NoNewAttributesMixin. But somehow, despite the name, it has little to do with indexing :-(

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

no, don't move to indexing

This comment has been minimized.

@jankatins

jankatins Nov 13, 2015

Contributor

One could also split the str generation to it's own mixin in string.py and let both Series and index take that...

This comment has been minimized.

@jreback

jreback Nov 13, 2015

Contributor

that would be the best, the actuall accessors are added where they are defined.

@jankatins

This comment has been minimized.

Contributor

jankatins commented Nov 13, 2015

BTW: The Series and Indexes are kind of inconsistent:

  • Series only has accessors, so no datetime/string/categorical related functions directly exposed under Series but .dt, .cat, which are implemented as Delegate and .str, which is no Delegate
  • CategoricalIndex is a Delegate directly (-> CatIndex.codes works directly without CatIndex.cat.codes)
  • DatetimeIndex is not a Delegate, but exposes it's methods directly, too, via mixins
  • There is no StringIndex but Index.str.... exists (via IndexOpsMixin) for all Indexes?
@jreback

This comment has been minimized.

Contributor

jreback commented Nov 13, 2015

@JanSchulz

those are not inconsistent, merely implementations. DatetimeIndex, actually defines things (IOW it is the 'target' of the delegation). CategoricalIndex uses the Delegate (so its like Series in this respect). Index/Series are also users of .str.

@jankatins

This comment has been minimized.

Contributor

jankatins commented Nov 13, 2015

I disagree: Consistent would be either:

  • Series and Index would use accessors, which are delegates to the real data below (like Series.cat or to a wrapper like in Series.str / Series.dt (ok, preoblematic, because we currently don't have real methods on a string/datetime objects, only numpy/pandas functions. Or the DateTimeIndex, which is used like one...)
  • Series and Index would both have subclasses like DateTimeSeries (does not exist) or DateTimeIndex (does exist), which expose the API directly.
Prevent adding new attributes to the accessors .str, .dt and .cat
This commit is mostly for the benefit of users who misspell things
on the accessors.

Assigning to `Series.str`, `Series.dt`, or `Series.cat` was not failing
although you couldn't get the value back:

```
In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
In[12]: a.cat.labels
AttributeError: 'CategoricalAccessor' object has no attribute 'labels'
```

Now we fail early:

```
In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
AttributeError: You cannot add any new attribute 'labels'
```

Refactor/add a StringAccessorMixin to break a import cycle.
@jankatins

This comment has been minimized.

Contributor

jankatins commented Nov 13, 2015

So, move the string part to a new mixin in string.py.. If this is green, I think this should be ready to merge...

@jreback jreback added this to the 0.17.1 milestone Nov 13, 2015

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 13, 2015

@JanSchulz looks good! ping on green.

@jankatins

This comment has been minimized.

Contributor

jankatins commented Nov 13, 2015

Seems travis is dead?!

Ok, I will rebase #11582 on top of this PR, to shortcut that process a bit...

jreback added a commit that referenced this pull request Nov 14, 2015

Merge pull request #11575 from JanSchulz/prevent_new_accessor_attributes
Prevent adding new attributes to the accessors .str, .dt and .cat

@jreback jreback merged commit 9cbc179 into pandas-dev:master Nov 14, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Contributor

jreback commented Nov 14, 2015

thanks @JanSchulz

go ahead and rebase #11582

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment