
Regression in CategoricalIndex in 0.20rc1 #16115

Closed
bashtage opened this issue Apr 24, 2017 · 4 comments · Fixed by #16123
Labels
Bug · Categorical (Categorical Data Type) · Regression (Functionality that used to work in a prior pandas version)
Milestone
0.20.0

Comments

@bashtage
Contributor

bashtage commented Apr 24, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

cats = pd.Categorical([pd.Timestamp('12-31-1999'), pd.Timestamp('12-31-2000')])
dummies = pd.get_dummies(cats)
dummies[[c for c in dummies.columns]]  # raises KeyError in 0.20.0rc1

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-4bc701ecdf50> in <module>()
----> 1 dummies[dummies.columns]

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2052         if isinstance(key, (Series, np.ndarray, Index, list)):
   2053             # either boolean or fancy integer index
-> 2054             return self._getitem_array(key)
   2055         elif isinstance(key, DataFrame):
   2056             return self._getitem_frame(key)

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
   2096             return self.take(indexer, axis=0, convert=False)
   2097         else:
-> 2098             indexer = self.loc._convert_to_indexer(key, axis=1)
   2099             return self.take(indexer, axis=1, convert=True)
   2100

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   1211                 # if it cannot handle
   1212                 indexer, objarr = labels._convert_listlike_indexer(
-> 1213                     obj, kind=self.name)
   1214                 if indexer is not None:
   1215                     return indexer

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\indexes\base.py in _convert_listlike_indexer(self, keyarr, kind)
   1384             keyarr = self._convert_arr_indexer(keyarr)
   1385
-> 1386         indexer = self._convert_list_indexer(keyarr, kind=kind)
   1387         return indexer, keyarr
   1388

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\indexes\category.py in _convert_list_indexer(self, keyarr, kind)
    508         if (indexer == -1).any():
    509             raise KeyError(
--> 510                 "a list-indexer must only "
    511                 "include values that are "
    512                 "in the categories")

KeyError: 'a list-indexer must only include values that are in the categories'

Problem description

There is no obvious change in get_dummies, so the problem must be deeper, in the indexing of a CategoricalIndex.
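The traceback ends in a check in `CategoricalIndex._convert_list_indexer` that raises as soon as any looked-up position is -1. As a rough illustration of that kind of check (using the public `Index.get_indexer` on the categories; this is an assumption for illustration, not the internal 0.20rc1 code path), a label that really is absent from the categories produces -1:

```python
import pandas as pd

cats = pd.Categorical([pd.Timestamp('12-31-1999'), pd.Timestamp('12-31-2000')])

# Position of each requested label within the categories; -1 marks an absent label
indexer = cats.categories.get_indexer(
    [pd.Timestamp('1999-12-31'), pd.Timestamp('2001-12-31')]
)

# Same style of check as at the bottom of the traceback: any -1 means
# "a list-indexer must only include values that are in the categories"
has_missing = bool((indexer == -1).any())
```

In the report every key is an existing column label, so a -1 there would mean the keys no longer match the datetime categories by the time they reach the lookup. That is a hypothesis, not something the traceback shows directly.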

Expected Output

The original dummy frame; selecting all of its columns by label is a trivial, no-op selection.
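A sketch of a regression check for the expected output, assuming a pandas release that includes the fix from #16123; `pandas.testing.assert_frame_equal` does the comparison. Both spellings of the selection are checked: the list comprehension from the code sample and `dummies[dummies.columns]` from the traceback.

```python
import pandas as pd
import pandas.testing as tm

cats = pd.Categorical([pd.Timestamp('12-31-1999'), pd.Timestamp('12-31-2000')])
dummies = pd.get_dummies(cats)

# Selecting every column by label should give back the same frame
by_list = dummies[[c for c in dummies.columns]]
by_index = dummies[dummies.columns]

tm.assert_frame_equal(by_list, dummies)
tm.assert_frame_equal(by_index, dummies)
```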

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.0rc1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.3
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
@bashtage bashtage changed the title from "Regression in 0.20rc1" to "Regression in CategoricalIndex in 0.20rc1" Apr 24, 2017
@TomAugspurger TomAugspurger added the Bug, Categorical (Categorical Data Type), and Regression (Functionality that used to work in a prior pandas version) labels Apr 24, 2017
@TomAugspurger TomAugspurger added this to the 0.20.0 milestone Apr 24, 2017
jreback added a commit to jreback/pandas that referenced this issue Apr 25, 2017
@jorisvandenbossche
Member

There are other cases that error as well:

In [16]: cats = pd.Categorical([pd.Timestamp('12-31-1999'),pd.Timestamp('12-31-2000')])

In [17]: s = pd.Series([1, 2], index=cats)

In [19]: s[cats[0]]
...
KeyError: Timestamp('1999-12-31 00:00:00')

In [20]: s.loc[cats[0]]
Out[20]: 1

In [21]: s.loc[[cats[0]]]
...
KeyError: 'a list-indexer must only include values that are in the categories'

The __getitem__ one is not a regression, I think (it was already a bug in 0.19.2).

And with frames:

In [23]: df = pd.DataFrame([[1, 2], [3, 4]], columns=cats)

In [24]: df
Out[24]: 
   1999-12-31  2000-12-31
0           1           2
1           3           4

In [25]: df[cats[0]]
Out[25]: 
0    1
1    3
Name: 1999-12-31 00:00:00, dtype: int64

In [26]: df[[cats[0]]]
...
KeyError: 'a list-indexer must only include values that are in the categories'

In [27]: df.loc[:, cats[0]]
Out[27]: 
0    1
1    3
Name: 1999-12-31 00:00:00, dtype: int64

In [28]: df.loc[:, [cats[0]]]
...
KeyError: 'a list-indexer must only include values that are in the categories'

These are all related to the same issue.
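The Series and DataFrame cases above can be collected into one sketch of the lookups that should all succeed once the fixes for #16115 and #16131 are in (this assumes such a pandas release; the name `key` is introduced here only for brevity):

```python
import pandas as pd

cats = pd.Categorical([pd.Timestamp('12-31-1999'), pd.Timestamp('12-31-2000')])
key = cats[0]  # hypothetical helper name: the scalar Timestamp('1999-12-31')

# Series with a CategoricalIndex built from the Categorical
s = pd.Series([1, 2], index=cats)
s_getitem = s[key]         # scalar __getitem__ (the 0.19.2-era bug, #16131)
s_loc = s.loc[key]         # scalar .loc, which already worked
s_loc_list = s.loc[[key]]  # list-like .loc (the 0.20rc1 regression)

# DataFrame whose columns are a CategoricalIndex
df = pd.DataFrame([[1, 2], [3, 4]], columns=cats)
col = df[key]                # Series
cols = df[[key]]             # one-column DataFrame
col_loc = df.loc[:, key]     # Series
cols_loc = df.loc[:, [key]]  # one-column DataFrame
```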

@jreback
Contributor

jreback commented Apr 25, 2017

@jorisvandenbossche I don't think we have an issue yet for the bug in scalar getitem indexing with a CategoricalIndex (CI) from your example?

@jorisvandenbossche
Member

@jreback indeed: #16131

@jorisvandenbossche
Member

Ah, I see you already added a fix to the PR :-)
So that PR can then also close that new issue.

jreback added a commit to jreback/pandas that referenced this issue Apr 26, 2017
jreback added a commit that referenced this issue Apr 26, 2017
* REGR: Bug in indexing with a CategoricalIndex

closes #16115

* some cleaning

* BUG: scalar getitem with a CI

closes #16131
pcluo pushed a commit to pcluo/pandas that referenced this issue May 22, 2017
* REGR: Bug in indexing with a CategoricalIndex

closes pandas-dev#16115

* some cleaning

* BUG: scalar getitem with a CI

closes pandas-dev#16131
4 participants