BUG: astype('category', categories=...) fails on a series of categorical type #10696

Closed
jorisvandenbossche opened this Issue Jul 29, 2015 · 8 comments

@jorisvandenbossche
Member

jorisvandenbossche commented Jul 29, 2015

s.astype('category', categories=['a', 'b', 'c']) fails when the series is already of Categorical dtype:

TypeError: _astype() got an unexpected keyword argument 'categories'

I am not sure if this should work (it would then be equivalent to set_categories?), but in any case the current error message is not informative:

In [49]: s = pd.Series(['a', 'b', 'a'])

In [50]: s
Out[50]:
0    a
1    b
2    a
dtype: object

In [51]: s.astype('category')
Out[51]:
0    a
1    b
2    a
dtype: category
Categories (2, object): [a, b]

In [52]: s.astype('category', categories=['a', 'b', 'c'])
Out[52]:
0    a
1    b
2    a
dtype: category
Categories (3, object): [a, b, c]

In [53]: scat = s.astype('category')

In [54]: scat.astype('category', categories=['a', 'b', 'c'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-54-f955e6286a85> in <module>()
----> 1 scat.astype('category', categories=['a', 'b', 'c'])

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\generic.pyc in astype(self, dtype, copy, raise_on_error, **kwargs)
   2415
   2416         mgr = self._data.astype(
-> 2417             dtype=dtype, copy=copy, raise_on_error=raise_on_error, **kwargs)
   2418         return self._constructor(mgr).__finalize__(self)
   2419

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in astype(self, dtype, **kwargs)
   2516
   2517     def astype(self, dtype, **kwargs):
-> 2518         return self.apply('astype', dtype=dtype, **kwargs)
   2519
   2520     def convert(self, **kwargs):

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in apply(self, f, axes, filter, do_integrity_check, **kwargs)
   2471                                                  copy=align_copy)
   2472
-> 2473             applied = getattr(b, f)(**kwargs)
   2474
   2475             if isinstance(applied, list):

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in astype(self, dtype, copy, raise_on_error, values, **kwargs)
    371     def astype(self, dtype, copy=False, raise_on_error=True, values=None, **kwargs):
    372         return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
--> 373                             values=values, **kwargs)
    374
    375     def _astype(self, dtype, copy=False, raise_on_error=True, values=None,

TypeError: _astype() got an unexpected keyword argument 'categories'
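
For comparison, the set_categories route mentioned above can be sketched as follows. This is only an illustration of the equivalent the report asks about, not a statement of what astype should do; the variable name `widened` is made up for the example.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a'])
scat = s.astype('category')

# set_categories swaps in the new category list; the values keep their
# labels, and 'c' becomes an unused category.
widened = scat.cat.set_categories(['a', 'b', 'c'])
print(widened.cat.categories.tolist())
```

Because the series is already categorical, this goes through the `.cat` accessor and never reaches the `_astype` call that raises in the traceback.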

@jreback

Contributor

jreback commented Jul 29, 2015

Actually, I think we can just blow away this entire function, _astype, here: https://github.com/pydata/pandas/blob/master/pandas/core/internals.py#L1768

The top-level astype is ok for this.

I think an astype to a different categorical dtype (with different categories) is ok, though not efficient.

@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 20, 2015

@jreback jreback added the Prio-medium label Aug 20, 2015

@jreback jreback modified the milestones: 0.18.1, Next Major Release Mar 12, 2016

@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016

@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 21, 2016

has2k1 added a commit to has2k1/plotnine that referenced this issue Jun 20, 2017

Remove workaround for categoricals in stat_summary
The workaround was due to a bug in pandas,
pandas-dev/pandas#10409 that has been fixed.
When that was fixed upstream, the local fix led to another
bug, pandas-dev/pandas#10696!!
@Aylr

Aylr commented Oct 31, 2017

For users waiting for a fix, I'm using this inefficient hack of changing a category to an object, then immediately back to a category with new levels:

 X[col] = X[col].astype(object).astype('category', categories=self.categorical_levels[col])
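
The same round trip can also be spelled with a CategoricalDtype instead of the `categories=` keyword. A minimal sketch, assuming pandas 0.21.0 or later (where `pd.api.types.CategoricalDtype` is available); `column` and `levels` are hypothetical stand-ins for `X[col]` and `self.categorical_levels[col]`.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-ins for X[col] and self.categorical_levels[col].
column = pd.Series(['a', 'b', 'a']).astype('category')
levels = ['a', 'b', 'c']

# Round-trip through object so the second cast starts from a
# non-categorical series, then cast to a dtype carrying the new levels.
recast = column.astype(object).astype(CategoricalDtype(categories=levels))
```

Like the original hack, this copies the data twice, so it is a stopgap rather than an efficient solution. Any value not listed in `levels` would become NaN.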
@jreback

Contributor

jreback commented Oct 31, 2017

this is fixed in 0.21.0; @Aylr would you like to put up a validation test?

In [21]: scat.astype(pd.api.types.CategoricalDtype(categories=['a', 'b', 'c']))
Out[21]: 
0    a
1    b
2    a
dtype: category
Categories (2, object): [a, b]

However, the original usage of passing categories should actually show the deprecation warning.

@jreback jreback modified the milestones: Next Major Release, 0.21.1 Oct 31, 2017

@Aylr

Aylr commented Nov 4, 2017

@jreback happy to write some tests. Should I create a new issue number? What branch should I submit a PR to?

@jreback

Contributor

jreback commented Nov 4, 2017

no need to create a new issue. submit to master (we will then back port to 0.21.1)

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.21.1, 0.22.0 Nov 30, 2017

@jorisvandenbossche

Member

jorisvandenbossche commented Nov 30, 2017

This is actually not really fixed, as the resulting categories differ from the ones specified in the dtype passed to astype, see #10696 (comment) (the output has categories [a, b], but ['a', 'b', 'c'] was specified).

I am not fully sure what the best behaviour is. But either the output categories should be conformed to the passed categories (like set_categories, which I think is a sensible thing to do), or a more helpful error message should be raised.
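
The two options can be sketched in user code as below. This is only an illustration under stated assumptions: `astype_categories` and its `conform` flag are hypothetical names, not a pandas API, and the sketch says nothing about how pandas itself resolves the question.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def astype_categories(series, categories, conform=True):
    """Hypothetical helper, not part of pandas.

    conform=True  -> behave like set_categories on categorical input.
    conform=False -> raise an informative error on categorical input.
    """
    already_categorical = isinstance(series.dtype, CategoricalDtype)
    if already_categorical and not conform:
        raise TypeError(
            "series is already categorical; use .cat.set_categories(...) "
            "to change its categories"
        )
    if already_categorical:
        return series.cat.set_categories(categories)
    # Non-categorical input: a plain cast to the requested dtype.
    return series.astype(CategoricalDtype(categories=categories))


scat = pd.Series(['a', 'b', 'a']).astype('category')
conformed = astype_categories(scat, ['a', 'b', 'c'])
```

With `conform=True`, the result carries exactly the requested categories; with `conform=False`, the caller gets a message that points at set_categories instead of the opaque `_astype()` keyword error.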

(anyhow, since it is not a regression and already present for a longer time, removed from 0.21.1 milestone)

@jreback

Contributor

jreback commented Nov 30, 2017

@jorisvandenbossche this is not fixed at all

@jorisvandenbossche

Member

jorisvandenbossche commented Nov 30, 2017

> @jorisvandenbossche this is not fixed at all

That's what I am saying?
