BUG: Categorical.take with fill_value #23296

Closed
TomAugspurger opened this issue Oct 23, 2018 · 2 comments

@TomAugspurger
Contributor

commented Oct 23, 2018

We need to translate the user-provided fill_value to the code for that category before taking.

In [1]: import pandas as pd

In [2]: c = pd.Categorical(['a', 'b', 'c'])

In [3]: c.take([0, 1, -1], fill_value='a', allow_fill=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-97f966c41cb2> in <module>
----> 1 c.take([0, 1, -1], fill_value='a', allow_fill=True)

~/sandbox/pandas/pandas/core/arrays/categorical.py in take_nd(self, indexer, allow_fill, fill_value)
   1806         codes = take(self._codes, indexer, allow_fill=allow_fill,
   1807                      fill_value=fill_value)
-> 1808         result = self._constructor(codes, dtype=self.dtype, fastpath=True)
   1809         return result
   1810

~/sandbox/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    371
    372         if fastpath:
--> 373             self._codes = coerce_indexer_dtype(values, categories)
    374             self._dtype = self._dtype.update_dtype(dtype)
    375             return

~/sandbox/pandas/pandas/core/dtypes/cast.py in coerce_indexer_dtype(indexer, categories)
    603     length = len(categories)
    604     if length < _int8_max:
--> 605         return ensure_int8(indexer)
    606     elif length < _int16_max:
    607         return ensure_int16(indexer)

~/sandbox/pandas/pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.ensure_int8()
    413             return arr
    414         else:
--> 415             return arr.astype(np.int8, copy=copy)
    416     else:
    417         return np.array(arr, dtype=np.int8)

ValueError: invalid literal for int() with base 10: 'a'
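
The translation step described at the top of this report can be illustrated without pandas internals. This is only a minimal sketch using plain lists; `take_with_fill` is a hypothetical helper, not the pandas implementation, and it assumes `-1` in the indexer marks a position to fill, as in the example above.

```python
def take_with_fill(categories, codes, indexer, fill_value=None):
    """Take from integer codes, filling positions marked -1 in the indexer.

    Hypothetical helper for illustration; not pandas API.
    """
    if fill_value is None:
        # No label given: fill with -1, the code used here for "missing".
        fill_code = -1
    else:
        # Translate the user-facing label into its integer code first,
        # so the label never gets mixed into the integer codes.
        fill_code = list(categories).index(fill_value)
    return [fill_code if i == -1 else codes[i] for i in indexer]


categories = ['a', 'b', 'c']
codes = [0, 1, 2]  # codes for the values ['a', 'b', 'c']

taken = take_with_fill(categories, codes, [0, 1, -1], fill_value='a')
labels = [categories[c] for c in taken]
```

Here the fill label `'a'` becomes code `0` before the take, so nothing ever tries to cast the string `'a'` to an integer, which is what the traceback above shows happening.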
@TomAugspurger

Contributor Author

commented Oct 23, 2018

API discussion, which we've maybe had before: should we allow a fill_value that's outside of the original categories? I.e., should this be

In [2]: cat = pd.Categorical(['a', 'a', 'b'])

In [3]: cat.take([0, -1, -1], fill_value='d', allow_fill=True)
Out[3]:
[a, d, d]
Categories (3, object): [a, b, d]

or should it raise an error?

Right now, I think we should allow it, but I could see it going either way.

cc @jorisvandenbossche @jankatins.

@jorisvandenbossche

Member

commented Oct 23, 2018

I would think that take should not alter the dtype, so I would not allow it (meaning it should raise a TypeError).
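
A rough sketch of that stricter behaviour, again with plain Python rather than pandas internals: the fill value is checked against the existing categories and rejected if absent, so the set of categories never changes. The helper name `fill_code_or_raise` and the error wording are assumptions made for illustration.

```python
def fill_code_or_raise(categories, fill_value):
    """Return the integer code for fill_value, or raise if it is not a category.

    Hypothetical helper for illustration; not pandas API.
    """
    if fill_value is None:
        # No label given: use -1, the code used here for "missing".
        return -1
    if fill_value not in categories:
        # Refuse to add a new category, so take() cannot alter the dtype.
        raise TypeError(
            f"fill_value {fill_value!r} is not one of the categories "
            f"{list(categories)!r}"
        )
    return list(categories).index(fill_value)


code_b = fill_code_or_raise(['a', 'b'], 'b')

try:
    fill_code_or_raise(['a', 'b'], 'd')
    raised = False
except TypeError:
    raised = True
```

With this check in place, the `fill_value='d'` example from the previous comment would fail up front instead of producing a result with three categories.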

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Oct 23, 2018

TomAugspurger added a commit that referenced this issue Oct 23, 2018

Categorical take fill value (#23297)
* BUG: Handle fill_value in Categorical.take

Closes #23296

* no new categories

* revert add_categories
