BUG: Index.str.partition not nan-safe #23558

h-vetinari · 2018-11-08T00:06:37Z

While working on #23167, I found a corner case where Index.str.partition and Index.str.rpartition break in the presence of NaNs. I do not believe this is intentional (and it's not mentioned in the docs):

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).str.partition(' ')  # works
MultiIndex(levels=[['a', 'b', 'c'], [''], ['']],
           labels=[[0, 1, 2], [0, 0, 0], [0, 0, 0]])
>>>
>>> pd.Index(['a', np.nan, 'c']).str.partition(' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 2391, in partition
    return self._wrap_result(result, expand=expand)
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\strings.py", line 2014, in _wrap_result
    out = MultiIndex.from_tuples(result, names=name)
  File "C:\ProgramData\Miniconda3\envs\pandas-dev\lib\site-packages\pandas\core\indexes\multi.py", line 1326, in from_tuples
    arrays = list(lib.to_object_array_tuples(tuples).T)
  File "pandas/_libs/src\inference.pyx", line 1559, in pandas._libs.lib.to_object_array_tuples
TypeError: object of type 'float' has no len()
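
For contrast, the sketch below runs the same kind of call through the Series accessor, which the thread reports as unaffected. The input values are illustrative assumptions, not taken from the report; the point is only that the missing entry propagates as a row of NaNs instead of raising.

```python
import numpy as np
import pandas as pd

# Illustrative input (an assumption, not from the original report).
s = pd.Series(['a b', np.nan, 'c'])

# With the default expand=True, Series.str.partition returns a DataFrame
# with three columns: the part before the separator, the separator itself,
# and the part after it.
parts = s.str.partition(' ')

# The missing entry becomes a row of missing values rather than an error.
print(parts)
```

Because the Series path builds a DataFrame rather than a MultiIndex, it never has to convert a bare NaN into a tuple, which is the step that fails for Index.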


h-vetinari · 2018-11-08T19:23:37Z

First off, forgot to mention in the OP (now edited) that the problem appears only for Index.

The solution is also to be found there, because the failure stems from trying to create a MultiIndex from a list of tuples containing NaNs:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), np.nan, ('d', '', '')])
[...]
TypeError: object of type 'float' has no len()

However, it works easily when passing a tuple of NaNs:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,) * 3, ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
           labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])

Opened #23578 for that.
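
The idea of passing a tuple of NaNs can be sketched as a small preprocessing step. The helper name `pad_missing` and its `width` parameter are hypothetical, introduced only for illustration; this is not the code merged in #23618, just a minimal sketch of the workaround described above.

```python
import numpy as np
import pandas as pd


def pad_missing(rows, width):
    """Hypothetical helper: replace every non-tuple entry (e.g. a bare NaN)
    with a tuple of `width` NaNs so that all rows have the same length."""
    return [row if isinstance(row, tuple) else (np.nan,) * width
            for row in rows]


# Same shape of input as in the failing example: one entry is a bare NaN.
rows = [('a', 'b', 'c'), np.nan, ('d', '', '')]

padded = pad_missing(rows, width=3)

# Every element is now a 3-tuple, so from_tuples can build three levels.
mi = pd.MultiIndex.from_tuples(padded)
print(mi)
```

The helper assumes that any entry which is not a tuple represents a missing value; real input might need a stricter check.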

toobaz · 2018-11-13T00:13:10Z

> However, it works easily when passing a tuple of NaNs

I already commented in #23578, but I think this bug should indeed be solved by just passing a tuple of NaNs.


TomAugspurger added the Bug and Strings labels Nov 8, 2018

TomAugspurger added this to the Contributions Welcome milestone Nov 8, 2018

TomAugspurger added the Effort Medium label Nov 8, 2018

h-vetinari changed the title ~~BUG: .str.partition not nan-safe~~ BUG: Index.str.partition not nan-safe Nov 8, 2018

h-vetinari mentioned this issue Nov 8, 2018

API/ERR/ENH: Allow MultiIndex.from_tuples to handle NaNs #23578

Closed

meiermark added a commit to meiermark/pandas that referenced this issue Nov 10, 2018

BUG: Index.str.partition not nan-safe (pandas-dev#23558)

deab820

meiermark mentioned this issue Nov 10, 2018

BUG: Index.str.partition not nan-safe (#23558) #23618

Merged

3 tasks

gfyoung added the Missing-data and Indexing labels Nov 11, 2018

meiermark added a commit to meiermark/pandas that referenced this issue Nov 11, 2018

BUG: Index.str.partition not nan-safe (pandas-dev#23558)

10552b5

h-vetinari added a commit to h-vetinari/pandas that referenced this issue Nov 11, 2018

Partly solve pandas-dev#23558

10d4da0

jreback modified the milestones: Contributions Welcome, 0.24.0 Nov 11, 2018

h-vetinari mentioned this issue Nov 13, 2018

API/BUG: Index.str.split(expand=True) not nan-safe #23677

Closed

jreback closed this as completed in #23618 Nov 18, 2018

jreback pushed a commit that referenced this issue Nov 18, 2018

BUG: Index.str.partition not nan-safe (#23558) (#23618)

91d1c50

tm9k1 pushed a commit to tm9k1/pandas that referenced this issue Nov 19, 2018

BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#…

d0151db

…23618)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#…

24cd841

…23618)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#…

c72f05a

…23618)
