BUG: Index.str.partition not nan-safe #23558

Closed
h-vetinari opened this issue Nov 8, 2018 · 2 comments · Fixed by #23618
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Strings String extension data type and string data

@h-vetinari
Contributor

h-vetinari commented Nov 8, 2018

While working on #23167, I found a corner case where Index.str.partition and Index.str.rpartition break in the presence of NaNs. I do not believe this is intentional (and it's not mentioned in the docs):

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).str.partition(' ')  # works
MultiIndex(levels=[['a', 'b', 'c'], [''], ['']],
           labels=[[0, 1, 2], [0, 0, 0], [0, 0, 0]])
>>>
>>> pd.Index(['a', np.nan, 'c']).str.partition(' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 2391, in partition
    return self._wrap_result(result, expand=expand)
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 2014, in _wrap_result
    out = MultiIndex.from_tuples(result, names=name)
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\indexes\multi.py", line 1326, in from_tuples
    arrays = list(lib.to_object_array_tuples(tuples).T)
  File "pandas/_libs/src\inference.pyx", line 1559, in pandas._libs.lib.to_object_array_tuples
TypeError: object of type 'float' has no len()
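For contrast, here is a minimal sketch of the same call on a Series, where the missing value is expected to propagate as an all-NaN row rather than raise. The variable names are illustrative only, and the exact dtype and repr of the result depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Same data as the failing Index example, but held in a Series.
ser = pd.Series(['a', np.nan, 'c'])

# With the default expand=True, partition returns a 3-column DataFrame
# (left part, separator, right part); the NaN entry becomes a NaN row.
parts = ser.str.partition(' ')
print(parts)
```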
@TomAugspurger TomAugspurger added Bug Strings String extension data type and string data labels Nov 8, 2018
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Nov 8, 2018
@h-vetinari h-vetinari changed the title BUG: .str.partition not nan-safe BUG: Index.str.partition not nan-safe Nov 8, 2018
@h-vetinari
Contributor Author

First off, I forgot to mention in the OP (now edited) that the problem only appears for Index.

The solution is also to be found there, because the failure stems from trying to create a MultiIndex from a list of tuples containing NaNs:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), np.nan, ('d', '', '')])
[...]
TypeError: object of type 'float' has no len()

However, it works easily when passing a tuple of NaNs:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,) * 3, ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
           labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])

Opened #23578 for that.
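The idea above can be sketched as a small pre-processing step: before calling `MultiIndex.from_tuples`, swap every bare NaN for a tuple of NaNs of the right width. The helper `pad_missing_tuples` is hypothetical and only illustrates the approach; it is not the code that pandas itself uses.

```python
import numpy as np
import pandas as pd

def pad_missing_tuples(rows, width):
    """Hypothetical helper: replace each non-tuple entry (e.g. a bare NaN)
    with a tuple of `width` NaNs so every row has the same length."""
    nan_row = (np.nan,) * width
    return [row if isinstance(row, tuple) else nan_row for row in rows]

# The list from the failing example: the middle entry is a bare float NaN.
rows = [('a', 'b', 'c'), np.nan, ('d', '', '')]

# After padding, every entry is a 3-tuple and from_tuples can build the index.
mi = pd.MultiIndex.from_tuples(pad_missing_tuples(rows, width=3))
print(mi)
```

The width has to be supplied by the caller here; for `partition` it is always 3 (left part, separator, right part).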

meiermark added a commit to meiermark/pandas that referenced this issue Nov 10, 2018
@gfyoung gfyoung added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Indexing Related to indexing on series/frames, not to indexes themselves labels Nov 11, 2018
meiermark added a commit to meiermark/pandas that referenced this issue Nov 11, 2018
h-vetinari added a commit to h-vetinari/pandas that referenced this issue Nov 11, 2018
@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Nov 11, 2018
@toobaz
Member

toobaz commented Nov 13, 2018

However, it works easily when passing a tuple of NaNs

I already commented in #23578, but I think this bug should indeed be solved by just passing a tuple of NaNs.
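Assuming a pandas release that includes the fix from #23618 (milestone 0.24.0 according to this thread), the original reproducer can double as a quick regression check. The repr of the result differs between versions, so this sketch only inspects the shape and the row that corresponds to the missing value.

```python
import numpy as np
import pandas as pd

# The reproducer from the original report.
idx = pd.Index(['a', np.nan, 'c'])

# On a fixed version this should return a 3-level MultiIndex
# instead of raising TypeError.
result = idx.str.partition(' ')

print(result.nlevels, len(result))
print(result[1])  # the entry that came from the NaN
```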

thoo added a commit to thoo/pandas that referenced this issue Nov 19, 2018
…fixed

* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...