BUG: Index.str.partition not nan-safe #23558

Closed
h-vetinari opened this issue Nov 8, 2018 · 2 comments

@h-vetinari
Contributor

commented Nov 8, 2018

While working on #23167, I found a corner case where Index.str.partition and Index.str.rpartition break in the presence of NaNs. I do not believe this is intentional (and it's not mentioned in the docs):

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).str.partition(' ')  # works
MultiIndex(levels=[['a', 'b', 'c'], [''], ['']],
           labels=[[0, 1, 2], [0, 0, 0], [0, 0, 0]])
>>>
>>> pd.Index(['a', np.nan, 'c']).str.partition(' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 2391, in partition
    return self._wrap_result(result, expand=expand)
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 2014, in _wrap_result
    out = MultiIndex.from_tuples(result, names=name)
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\indexes\multi.py", line 1326, in from_tuples
    arrays = list(lib.to_object_array_tuples(tuples).T)
  File "pandas/_libs/src\inference.pyx", line 1559, in pandas._libs.lib.to_object_array_tuples
TypeError: object of type 'float' has no len()
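The last frame of the traceback suggests that each partitioned element is expected to be a tuple, and that a bare NaN slips through as a float. The following is a pure-Python imitation of that situation, not pandas' actual implementation; the helper names `partition_elementwise` and `to_columns` are hypothetical and exist only for illustration.

```python
def partition_elementwise(values, sep=" "):
    """Apply str.partition to each string; pass anything else through unchanged."""
    return [v.partition(sep) if isinstance(v, str) else v for v in values]


def to_columns(rows):
    """Transpose equal-length tuples into columns; len() is called on every row."""
    width = len(rows[0])
    columns = [[] for _ in range(width)]
    for row in rows:
        if len(row) != width:  # a float row raises TypeError on len() here
            raise ValueError("rows must all have the same length")
        for column, item in zip(columns, row):
            column.append(item)
    return columns


# The middle element is a float NaN, so it is never turned into a 3-tuple.
rows = partition_elementwise(["a", float("nan"), "c"])

try:
    to_columns(rows)
except TypeError as exc:
    error_message = str(exc)
```

Under these assumptions, the string elements become 3-tuples while the NaN stays a scalar, so the first call to `len()` on it fails with the same message as in the traceback.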

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Nov 8, 2018

@h-vetinari h-vetinari changed the title BUG: .str.partition not nan-safe BUG: Index.str.partition not nan-safe Nov 8, 2018

@h-vetinari

Contributor Author

commented Nov 8, 2018

First off, I forgot to mention in the OP (now edited) that the problem only appears for Index.

The solution is also to be found there, because the failure stems from trying to create a MultiIndex from a list of tuples containing NaNs:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), np.nan, ('d', '', '')])
[...]
TypeError: object of type 'float' has no len()

However, it works easily when passing a tuple of NaNs:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,) * 3, ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
           labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])

Opened #23578 for that.
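The observation above can be sketched as a small normalisation step: replace every bare NaN with a tuple of NaNs of the right width before building the MultiIndex. This is only an illustration, not the fix that was eventually merged; `is_missing` and `pad_missing` are hypothetical names, and the sketch assumes missing values show up as float NaN (as `np.nan` does).

```python
import math


def is_missing(value):
    """Return True for a float NaN (np.nan is a plain Python float)."""
    return isinstance(value, float) and math.isnan(value)


def pad_missing(rows, width=3):
    """Replace each bare NaN with a tuple of `width` NaNs; keep tuples as-is."""
    nan = float("nan")
    return [(nan,) * width if is_missing(row) else tuple(row) for row in rows]


rows = [("a", "b", "c"), float("nan"), ("d", "", "")]
padded = pad_missing(rows)  # every entry is now a 3-tuple
```

A list like `padded` could then be handed to `pd.MultiIndex.from_tuples`, which, as shown above, accepts a tuple of NaNs.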

meiermark added a commit to meiermark/pandas that referenced this issue Nov 10, 2018

meiermark added a commit to meiermark/pandas that referenced this issue Nov 11, 2018

h-vetinari added a commit to h-vetinari/pandas that referenced this issue Nov 11, 2018

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Nov 11, 2018

@toobaz

Member

commented Nov 13, 2018

> However, it works easily when passing a tuple of NaNs

I already commented in #23578, but I do think this bug should be solved by simply passing a tuple of NaNs.

jreback added a commit that referenced this issue Nov 18, 2018

thoo added a commit to thoo/pandas that referenced this issue Nov 19, 2018

Merge remote-tracking branch 'upstream/master' into io_csv_docstring_fixed

* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...