BUG: ValueError: cannot convert float NaN to integer when resetting MultiIndex with NaT values #36541

Closed
3 tasks done
ssche opened this issue Sep 22, 2020 · 1 comment · Fixed by #36563
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex
@ssche
Contributor

ssche commented Sep 22, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

In [18]: ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=['a', 'b'])

In [19]: ix
Out[19]: 
MultiIndex([('NaT', 1),
            ('NaT', 2)],
           names=['a', 'b'])

In [20]: d = pd.DataFrame({'x': [11, 12]}, index=ix)

In [21]: d
Out[21]: 
        x
a   b    
NaT 1  11
    2  12

In [22]: d.reset_index()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-4653618060e8> in <module>
----> 1 d.reset_index()

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   4851                     name = tuple(name_lst)
   4852                 # to ndarray and maybe infer different dtype
-> 4853                 level_values = _maybe_casted_values(lev, lab)
   4854                 new_obj.insert(0, name, level_values)
   4855 

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in _maybe_casted_values(index, labels)
   4784                     dtype = index.dtype
   4785                     fill_value = na_value_for_dtype(dtype)
-> 4786                     values = construct_1d_arraylike_from_scalar(
   4787                         fill_value, len(mask), dtype
   4788                     )

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in construct_1d_arraylike_from_scalar(value, length, dtype)
   1556 
   1557         subarr = np.empty(length, dtype=dtype)
-> 1558         subarr.fill(value)
   1559 
   1560     return subarr

ValueError: cannot convert float NaN to integer

Problem description

With the introduction of groupby(..., dropna=False), MultiIndexes containing NaT values are more likely to occur, which exposes a few issues that previously went undetected. This issue was discovered while looking for a workaround for another dropna=False-related issue (#36060 (comment)).

Further investigation shows that this may be an issue with numpy not accepting pd.NaT. The following code reproduces the issue in construct_1d_arraylike_from_scalar:

In [33]: a = np.empty(2, dtype='datetime64[ns]')

In [34]: a.fill(pd.NaT)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-9fa9ff66da99> in <module>
----> 1 a.fill(pd.NaT)

ValueError: cannot convert float NaN to integer

This led me to propose the following fix:

--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -1559,6 +1559,12 @@ def construct_1d_arraylike_from_scalar(
             dtype = np.dtype("object")
             if not isna(value):
                 value = ensure_str(value)
+        elif isinstance(dtype, np.dtype) and dtype.kind == "M" and value is NaT:
+            # can't fill sub array directly with pandas' NaT:
+            #
+            # > a.fill(pd.NaT)
+            # ValueError: cannot convert float NaN to integer
+            value = np.datetime64("NaT")
 
         subarr = np.empty(length, dtype=dtype)
         subarr.fill(value)

The "value is NaT" check could possibly be extended to isna(value).
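
The substitution made by the proposed patch can be tried in isolation. The sketch below is only an illustration: fill_with_scalar is a hypothetical stand-in for the patched branch, not the actual pandas function, and it covers just the datetime64 case discussed here.

```python
import numpy as np
import pandas as pd


def fill_with_scalar(value, length, dtype):
    """Hypothetical helper mirroring the patched branch (not pandas API)."""
    dtype = np.dtype(dtype)
    if dtype.kind == "M" and value is pd.NaT:
        # Swap pandas' NaT for NumPy's own missing-datetime scalar
        # before filling, as the proposed diff does.
        value = np.datetime64("NaT")
    subarr = np.empty(length, dtype=dtype)
    subarr.fill(value)
    return subarr


filled = fill_with_scalar(pd.NaT, 2, "datetime64[ns]")
```

np.isnat(filled) should then report every element as missing, and the array keeps the requested datetime64[ns] dtype.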

Expected Output

No ValueError being raised
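
The expected behaviour can be written as a small check. This is a sketch built from the reproducer above; it assumes a pandas version that includes the fix, and it deliberately does not assert the dtype of column a.

```python
import pandas as pd

ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=["a", "b"])
d = pd.DataFrame({"x": [11, 12]}, index=ix)

# On a fixed pandas this returns a flat frame instead of raising ValueError.
result = d.reset_index()

assert list(result.columns) == ["a", "b", "x"]
assert result["a"].isna().all()
assert result["b"].tolist() == [1, 2]
assert result["x"].tolist() == [11, 12]
```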

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 15539fa
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_AU.UTF-8

pandas : 1.2.0.dev0+453.g15539fa62.dirty
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : 0.29.21
pytest : 6.0.2
hypothesis : 5.35.3
sphinx : 3.2.1
blosc : 1.9.2
feather : None
xlsxwriter : 1.3.4
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.2
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : 0.5.1
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

@ssche ssche added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 22, 2020
@dsaxton dsaxton added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 23, 2020
@gerritholl

I just ran into this. It works fine with pandas 1.0, so this bug was introduced somewhere between 1.0 and 1.1.

gerritholl added a commit to gerritholl/fogtools that referenced this issue Sep 25, 2020
Due to pandas-dev/pandas#36541 mark the
test_extend test as expected failure on pandas before 1.1.3, assuming
the PR fixing 36541 gets merged before 1.1.3 or 1.2.
@jreback jreback added this to the 1.2 milestone Oct 2, 2020
jreback pushed a commit that referenced this issue Oct 7, 2020
…hen resetting MultiIndex with NaT values) (#36563)
kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020
… integer when resetting MultiIndex with NaT values) (pandas-dev#36563)