
BUG: ValueError: cannot convert float NaN to integer - on dataframe.reset_index() #35657

Closed
2 of 3 tasks
capelastegui opened this issue Aug 10, 2020 · 4 comments · Fixed by #35673
Labels: Duplicate Report (Duplicate issue or pull request)

Comments

@capelastegui

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

import pandas as pd
df = pd.DataFrame(
    dict(c1=[10.], c2=['a'], c3=pd.to_datetime('2020-01-01')))
# Triggering conditions: multiindex with date, empty dataframe

# Multiindex without date works
df.set_index(['c1', 'c2']).head(0).reset_index()

# Regular index with date also works
df.set_index(['c3']).head(0).reset_index()

# Multiindex with date crashes...
df.set_index(['c2', 'c3']).head(0).reset_index()
# >> ValueError: cannot convert float NaN to integer
# This used to work on pandas 1.0.3, but breaks on pandas 1.1.0

# Though the error doesn't trigger if the dataframe is empty before
# calling set_index()
df.head(0).set_index(['c2', 'c3']).reset_index()

# I originally observed the bug in a groupby call
df.head(0).groupby(['c2', 'c3'])[['c1']].sum().reset_index()
# >> ValueError: cannot convert float NaN to integer
# This used to work on pandas 1.0.3, but breaks on pandas 1.1.0

Problem description

On pandas 1.1.0, calling DataFrame.reset_index() raises a ValueError under the following conditions:

  • The input DataFrame is empty
  • The index is a MultiIndex built from multiple columns, at least one of which is a datetime

The exception message is ValueError: cannot convert float NaN to integer.
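The two triggering conditions above can be written as a quick predicate. This is a minimal sketch: `may_trigger_reset_index_error` is a hypothetical helper name, and it assumes (as reported here) that a datetime64 level is the relevant dtype. It only flags frames that match the described pattern; it does not check the installed pandas version.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def may_trigger_reset_index_error(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if df matches the pattern in this report,
    i.e. an empty frame whose MultiIndex has at least one datetime64 level."""
    if not df.empty:
        return False
    if not isinstance(df.index, pd.MultiIndex):
        return False
    # Inspect the dtype of each level, since the index itself has no rows
    return any(is_datetime64_any_dtype(level.dtype) for level in df.index.levels)


df = pd.DataFrame(
    dict(c1=[10.], c2=['a'], c3=pd.to_datetime('2020-01-01')))
print(may_trigger_reset_index_error(df.set_index(['c2', 'c3']).head(0)))  # True
print(may_trigger_reset_index_error(df.set_index(['c1', 'c2']).head(0)))  # False
print(may_trigger_reset_index_error(df.set_index(['c3']).head(0)))        # False
```

The single-column case returns False because `set_index(['c3'])` produces a plain DatetimeIndex rather than a MultiIndex, which matches the "regular index with date also works" line in the code sample.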

Error trace:

Error
Traceback (most recent call last):
    df_out.reset_index()
  File "/Users/pec21/PycharmProjects/anp_voice_report/virtual/lib/python3.6/site-packages/pandas/core/frame.py", line 4848, in reset_index
    level_values = _maybe_casted_values(lev, lab)
  File "/Users/pec21/PycharmProjects/anp_voice_report/virtual/lib/python3.6/site-packages/pandas/core/frame.py", line 4782, in _maybe_casted_values
    fill_value, len(mask), dtype
  File "/Users/pec21/PycharmProjects/anp_voice_report/virtual/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 1554, in construct_1d_arraylike_from_scalar
    subarr.fill(value)
ValueError: cannot convert float NaN to integer
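The message itself is Python's standard error for coercing a float NaN to an integer. Read together with the last frame (`subarr.fill(value)` in `construct_1d_arraylike_from_scalar`), it suggests, as an inference rather than a confirmed diagnosis, that a NaN-like fill value was being written into an array whose dtype could not hold it. A minimal, pandas-independent illustration of where the message comes from (`conversion_error` is a hypothetical helper):

```python
def conversion_error(value) -> str:
    """Return the ValueError message raised when coercing value to int,
    or an empty string if the conversion succeeds."""
    try:
        int(value)
    except ValueError as exc:
        return str(exc)
    return ""


print(conversion_error(float("nan")))  # cannot convert float NaN to integer
print(repr(conversion_error(1.5)))     # '' (1.5 truncates to 1 without error)
```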

This error didn't happen on pandas 1.0.3 and earlier. I haven't tested any intermediate releases, nor the master branch.

Expected Output

No exception should be raised; reset_index() should return an empty DataFrame.
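The expected behaviour can be expressed as a small regression check. This is a sketch that assumes a pandas release containing the fix (the issue is marked as fixed by #35673 and was added to the 1.1.1 milestone); on 1.1.0 the `reset_index()` call is where the reported ValueError is raised. The function name is hypothetical.

```python
import pandas as pd


def check_empty_multiindex_reset() -> pd.DataFrame:
    """Hypothetical regression check for the case reported in this issue."""
    df = pd.DataFrame(
        dict(c1=[10.], c2=['a'], c3=pd.to_datetime('2020-01-01')))
    # MultiIndex with a datetime level, emptied after set_index()
    result = df.set_index(['c2', 'c3']).head(0).reset_index()
    # Expected: no exception, and an empty frame with the levels back as columns
    assert result.empty
    assert list(result.columns) == ['c2', 'c3', 'c1']
    return result


print(check_empty_multiindex_reset().shape)  # (0, 3)
```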

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.6.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
Version : Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_GB.UTF-8
pandas : 1.1.0
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 49.1.0
Cython : None
pytest : 5.3.4
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : None
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : 1.3.11
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

@capelastegui capelastegui added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 10, 2020
@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version Needs Triage Issue that has not been reviewed by a pandas team member and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 11, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1.1 milestone Aug 11, 2020
@simonjayhawkins
Member

Thanks @capelastegui for the report. This is a duplicate of #35606

@simonjayhawkins simonjayhawkins added Duplicate Report Duplicate issue or pull request and removed Needs Triage Issue that has not been reviewed by a pandas team member Regression Functionality that used to work in a prior pandas version labels Aug 11, 2020
@capelastegui
Author

Sorry about the duplicate! I had looked at that issue, and thought that it was a similar error, but not quite the same.

@simonjayhawkins
Member

> Sorry about the duplicate! I had looked at that issue, and thought that it was a similar error, but not quite the same.

The second failing case may be different; it does not involve a datetime, so I will reopen this for now.

>>> # Multiindex with date crashes...
>>> res = df.set_index(['c2', 'c3']).head(0)
>>> res
Empty DataFrame
Columns: [c1]
Index: []
>>>
>>> res.index
MultiIndex([], names=['c2', 'c3'])
>>>
>>> res.index.levels, res.index.codes
(FrozenList([['a'], [2020-01-01 00:00:00]]), FrozenList([[], []]))
>>>
>>> res.reset_index()
Traceback (most recent call last):
...
ValueError: cannot convert float NaN to integer
>>>
>>> # I originally observed the bug in a groupby call
>>> res = df.head(0).groupby(['c2', 'c3'])[['c1']].sum()
>>> res
Empty DataFrame
Columns: [c1]
Index: []
>>> # >> ValueError: cannot convert float NaN to integer
>>> # This used to work on pandas 1.0.3, but breaks on pandas 1.1.0
>>>
>>> res.index
MultiIndex([], names=['c2', 'c3'])
>>>
>>> res.index.levels, res.index.codes
(FrozenList([[], []]), FrozenList([[], []]))
>>>
>>> res.reset_index()
Traceback (most recent call last):
...
ValueError: cannot convert float NaN to integer
>>>
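The session above shows that the two empty frames differ in their levels: `head(0)` removes every row but keeps the original level values (only the codes become empty), whereas the empty `groupby` result has empty levels. A small sketch for inspecting this on the `head(0)` case, reusing the frame from the report:

```python
import pandas as pd

df = pd.DataFrame(
    dict(c1=[10.], c2=['a'], c3=pd.to_datetime('2020-01-01')))

sliced = df.set_index(['c2', 'c3']).head(0)
# Slicing a MultiIndex keeps its levels; only the codes become empty
print(len(sliced.index))                     # 0
print(sliced.index.levels[0].tolist())       # ['a']
print(sliced.index.levels[1][0])             # 2020-01-01 00:00:00
print([len(c) for c in sliced.index.codes])  # [0, 0]
```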

@simonjayhawkins
Member

> the second failing case maybe different, does not involve datetime so will reopen for now.

It does depend on the dtype, which is datetime64, so constructing a MultiIndex with empty levels and codes does not reproduce the issue.
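The dtype point can be made concrete by comparing level dtypes. This is a sketch assuming, as the comment above states, that the datetime64 dtype of a level is what matters: a MultiIndex built from plain empty lists gets object-dtype levels, while one built from typed empty arrays keeps a datetime64 level even though it holds no values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Empty levels and codes from plain lists: both levels default to object dtype
plain = pd.MultiIndex(levels=[[], []], codes=[[], []], names=['c2', 'c3'])

# Typed empty arrays: the 'c3' level keeps a datetime64 dtype
typed = pd.MultiIndex.from_arrays(
    [pd.Index([], dtype=object), pd.DatetimeIndex([])], names=['c2', 'c3'])

print(is_datetime64_any_dtype(plain.levels[1].dtype))  # False
print(is_datetime64_any_dtype(typed.levels[1].dtype))  # True

for name, index in [('plain', plain), ('typed', typed)]:
    frame = pd.DataFrame({'c1': np.array([], dtype='float64')}, index=index)
    # Per the thread, on 1.1.0 only the datetime64 ('typed') case is expected
    # to raise here; on releases with the fix both return empty frames
    print(name, list(frame.reset_index().columns))
```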

A PR with a fix will follow shortly.
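One generic way an empty index can end up on a "fill everything with NA" path is vacuous truth: `.all()` over an empty boolean mask returns True. The sketch below illustrates that pitfall and a guard on the mask size; it is hypothetical and is not the code from the fix PR, and `take_or_fill` is an illustrative name. Here `-1` marks a missing code, as in MultiIndex codes.

```python
import numpy as np


def take_or_fill(values: np.ndarray, codes: np.ndarray, fill_value) -> np.ndarray:
    """Hypothetical helper: take values by codes, where -1 marks a missing entry."""
    mask = codes == -1
    # Without the size check, an empty mask would count as "all missing",
    # because np.all over zero elements is True (vacuous truth)
    if mask.size > 0 and mask.all():
        return np.full(len(codes), fill_value, dtype=object)
    out = values.take(codes).astype(object)
    out[mask] = fill_value  # overwrite positions whose code was -1
    return out


print(np.array([], dtype=bool).all())                                         # True
print(take_or_fill(np.array(['a']), np.array([0, -1]), None))                 # ['a' None]
print(take_or_fill(np.array(['a']), np.array([], dtype=np.intp), None).size)  # 0
```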
