Indexing into a series with multi-index containing periods raises an exception #22803

Closed
PatrickDRusk opened this issue Sep 21, 2018 · 2 comments · Fixed by #23540
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type
Milestone

Comments

@PatrickDRusk

Code Sample, a copy-pastable example if possible

import pandas
tuples = [(pandas.Period('2018-01-01', freq='D'), 1)]
index = pandas.MultiIndex.from_tuples(tuples)
s = pandas.Series([1.0], index=index)
print(s[~s.isnull()])

Problem description

Running the code above causes an exception with a long traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    881         try:
--> 882             return self._engine.get_loc(key)
    883         except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '__next__'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.dateutil_parse()

ValueError: Unknown datetime string format, unable to parse: __next__

During handling of the above exception, another exception occurred:

DateParseError                            Traceback (most recent call last)
<ipython-input-9-81179fad4d42> in <module>()
      3 index = pandas.MultiIndex.from_tuples(tuples)
      4 s = pandas.Series([1.0], index=index)
----> 5 print(s[~s.isnull()])

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    802             raise
    803 
--> 804         if is_iterator(key):
    805             key = list(key)
    806 

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/dtypes/inference.py in is_iterator(obj)
    153         # Python 3 generators have
    154         # __next__ instead of next
--> 155         return hasattr(obj, '__next__')
    156 
    157 

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   4372             return object.__getattribute__(self, name)
   4373         else:
-> 4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4375                 return self[name]
   4376             return object.__getattribute__(self, name)

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/indexes/base.py in _can_hold_identifiers_and_holds_name(self, name)
   2109         """
   2110         if self.is_object() or self.is_categorical():
-> 2111             return name in self
   2112         return False
   2113 

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in __contains__(self, key)
    547         hash(key)
    548         try:
--> 549             self.get_loc(key)
    550             return True
    551         except (LookupError, TypeError):

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in get_loc(self, key, method)
   2235 
   2236         if not isinstance(key, tuple):
-> 2237             loc = self._get_level_indexer(key, level=0)
   2238 
   2239             # _get_level_indexer returns an empty slice if the key has

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in _get_level_indexer(self, key, level, indexer)
   2494         else:
   2495 
-> 2496             loc = level_index.get_loc(key)
   2497             if isinstance(loc, slice):
   2498                 return loc

~/.virtualenvs/shackleton3/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    886 
    887             try:
--> 888                 asdt, parsed, reso = parse_time_string(key, self.freq)
    889                 key = asdt
    890             except TypeError:

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

DateParseError: Unknown datetime string format, unable to parse: __next__

This appears to happen only if the series has a multi-index that contains periods as one of its levels. Datetimes won't cause it, for instance.

s[~s.isnull()] should be equivalent to s.dropna(), which does work.

Neither ~s.isnull() nor s[[True]] causes the exception by itself. It appears to take the combination.
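For comparison, here is a minimal sketch of the pieces described above that do work: building the mask on its own, calling dropna(), and boolean indexing when the level holds a Timestamp instead of a Period. The names period_index, ts_index, t, via_dropna and via_mask_ts are only illustrative and are not part of the original sample.

```python
import pandas as pd

# Same one-row series as in the report: a Period in level 0 of the MultiIndex.
period_index = pd.MultiIndex.from_tuples([(pd.Period("2018-01-01", freq="D"), 1)])
s = pd.Series([1.0], index=period_index)

# Building the mask by itself is fine; it is an ordinary boolean Series.
mask = ~s.isnull()

# dropna() is the form that is reported to work on the affected version.
via_dropna = s.dropna()

# Control case: a Timestamp in place of the Period, indexed with a boolean mask.
ts_index = pd.MultiIndex.from_tuples([(pd.Timestamp("2018-01-01"), 1)])
t = pd.Series([1.0], index=ts_index)
via_mask_ts = t[~t.isnull()]

print(mask.tolist(), via_dropna.tolist(), via_mask_ts.tolist())
```

Only the step that passes a mask built on the Period-indexed series back into s[...] is left out, since that is the call that raises on 0.23.4.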

Expected Output

2018-01-01  1    1.0
dtype: float64

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 18.0
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type MultiIndex labels Sep 23, 2018
@gfyoung
Member

gfyoung commented Sep 23, 2018

@PatrickDRusk: That does look a little odd indeed!

cc @toobaz @jreback

@gfyoung gfyoung added the Bug label Sep 23, 2018
toobaz added a commit to toobaz/pandas that referenced this issue Nov 7, 2018
@toobaz
Member

toobaz commented Nov 7, 2018

This is unrelated to MultiIndex; the problem is more general:

In [2]: hasattr(s, '__next__')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    129         try:
--> 130             return self.mapping.get_item(val)
    131         except (TypeError, ValueError):

/home/nobackup/repo/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
    921 
--> 922     cpdef get_item(self, int64_t val):
    923         cdef khiter_t k

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    703         try:
--> 704             return self._engine.get_loc(key)
    705         except KeyError:

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    107 
--> 108     cpdef get_loc(self, object val):
    109         if is_definitely_invalid_key(val):

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    131         except (TypeError, ValueError):
--> 132             raise KeyError(val)
    133 

KeyError: '__next__'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
    163     try:
--> 164         parsed, reso = dateutil_parse(date_string, _DEFAULT_DATETIME,
    165                                       dayfirst=dayfirst, yearfirst=yearfirst,

/home/nobackup/repo/pandas/pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.dateutil_parse()
    316         msg = "Unknown datetime string format, unable to parse: {timestr}"
--> 317         raise ValueError(msg.format(timestr=timestr))
    318 

ValueError: Unknown datetime string format, unable to parse: __next__

During handling of the above exception, another exception occurred:

DateParseError                            Traceback (most recent call last)
<ipython-input-2-c6c5c35d0556> in <module>()
----> 1 hasattr(s, '__next__')

/home/nobackup/repo/pandas/pandas/core/generic.py in __getattr__(self, name)
   4726             return object.__getattribute__(self, name)
   4727         else:
-> 4728             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4729                 return self[name]
   4730             return object.__getattribute__(self, name)

/home/nobackup/repo/pandas/pandas/core/indexes/base.py in _can_hold_identifiers_and_holds_name(self, name)
   2098         """
   2099         if self.is_object() or self.is_categorical():
-> 2100             return name in self
   2101         return False
   2102 

/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in __contains__(self, key)
    572         hash(key)
    573         try:
--> 574             self.get_loc(key)
    575             return True
    576         except (LookupError, TypeError):

/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in get_loc(self, key, method)
   2232 
   2233         if not isinstance(key, tuple):
-> 2234             loc = self._get_level_indexer(key, level=0)
   2235             return _maybe_to_slice(loc)
   2236 

/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in _get_level_indexer(self, key, level, indexer)
   2486         else:
   2487 
-> 2488             code = level_index.get_loc(key)
   2489 
   2490             if level > 0 or self.lexsort_depth == 0:

/home/nobackup/repo/pandas/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    708 
    709             try:
--> 710                 asdt, parsed, reso = parse_time_string(key, self.freq)
    711                 key = asdt
    712             except TypeError:

/home/nobackup/repo/pandas/pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()
    126         yearfirst = get_option("display.date_yearfirst")
    127 
--> 128     res = parse_datetime_string_with_reso(arg, freq=freq,
    129                                           dayfirst=dayfirst,
    130                                           yearfirst=yearfirst)

/home/nobackup/repo/pandas/pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
    167     except Exception as e:
    168         # TODO: allow raise of errors within instead
--> 169         raise DateParseError(e)
    170     if parsed is None:
    171         raise DateParseError("Could not parse {dstr}".format(dstr=date_string))

DateParseError: Unknown datetime string format, unable to parse: __next__

Preparing a PR.
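A pandas-free sketch of why hasattr() can raise here at all. In Python 3, hasattr() only turns AttributeError into False; any other exception raised inside __getattr__ reaches the caller. In the traceback above, __getattr__ ends up in a lookup that raises DateParseError, and the __contains__ frame catches only (LookupError, TypeError), so nothing stops it. The class and function names below (LookupBackedAttrs, ParseError, and the two lookup helpers) are hypothetical stand-ins, not pandas code.

```python
class LookupBackedAttrs:
    """Hypothetical stand-in for an object whose __getattr__ consults a lookup."""

    def __init__(self, lookup):
        self._lookup = lookup

    def __getattr__(self, name):
        # Called only when normal attribute lookup has already failed.
        return self._lookup(name)


class ParseError(ValueError):
    """Hypothetical analogue of a parse failure; not an AttributeError."""


def raises_attribute_error(name):
    raise AttributeError(name)


def raises_parse_error(name):
    raise ParseError("unable to parse: " + name)


# hasattr() converts AttributeError into False ...
well_behaved = hasattr(LookupBackedAttrs(raises_attribute_error), "__next__")

# ... but lets every other exception type escape to the caller.
try:
    hasattr(LookupBackedAttrs(raises_parse_error), "__next__")
    leaked = None
except ParseError as exc:
    leaked = str(exc)

print(well_behaved, leaked)
```

This is also why the boolean-mask case hits it: per the first traceback, Series.__getitem__ calls is_iterator(key), which is hasattr(key, '__next__') on the mask series.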

@toobaz toobaz removed the MultiIndex label Nov 7, 2018
toobaz added a commit to toobaz/pandas that referenced this issue Nov 7, 2018
@jreback jreback added this to the 0.24.0 milestone Nov 7, 2018
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this issue Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this issue Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019