BUG: Replacing NaN with None in Pandas 1.3 does not work #42423

pvieito · 2021-07-07T12:40:11Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

Pandas 1.2

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  None

Pandas 1.3

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  NaN

Problem description

Replacing NaN values with None (or any other Python object) should work as in previous Pandas versions.

Expected Output

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  None

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : f00ed8f
python : 3.8.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.0.0
Version : Darwin Kernel Version 21.0.0: Sun Jun 20 18:43:49 PDT 2021; root:xnu-8011.0.0.121.4~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : es_ES.UTF-8
LOCALE : es_ES.UTF-8

pandas : 1.3.0
numpy : 1.18.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 57.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : 1.4.20
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None


lerome · 2021-07-07T14:16:21Z

That was apparently changed on purpose (#39761), but it's indeed not very convenient. For example, we often want to replace NaN with None just before converting the DataFrame back to a dict (df.to_dict) or another data structure. I'm pretty sure that this change alone breaks many existing code bases.
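For the `to_dict` use case described here, the object-dtype conversion that rhshadrach suggests further down can be wrapped in a small helper before exporting. A minimal sketch, assuming pandas 1.3 or later; `nan_to_none` is a hypothetical helper name, not part of the pandas API:

```python
import numpy as np
import pandas as pd


def nan_to_none(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return an object-dtype copy with None wherever df is missing."""
    # Cast first so the columns can hold Python None; on a float64 column,
    # pandas 1.3+ would store a None replacement as NaN again.
    as_object = df.astype(object)
    return as_object.where(pd.notnull(as_object), None)


df = pd.DataFrame({"a": [0.5, np.nan]})
records = nan_to_none(df).to_dict(orient="records")
print(records)  # [{'a': 0.5}, {'a': None}]
```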

rhshadrach · 2021-07-09T03:43:18Z

One can still convert to None; however, you need to convert to object dtype first:

import numpy as np
import pandas as pd

df = pd.DataFrame([0.5, np.nan])
df = df.astype(object)  # object columns can hold Python None
print(df.where(pd.notnull(df), None))

gives

      0
0   0.5
1  None

This behavior is similar to the result of assigning with .loc/.iloc:

import numpy as np
import pandas as pd

df = pd.DataFrame([0.5, np.nan])
df.iloc[0, 0] = None  # None assigned into a float64 column is stored as NaN
print(df)

gives (on both 1.3.x and 1.2.x):

    0
0 NaN
1 NaN
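To see the consistency argument side by side, the sketch below applies both `where(..., None)` and `.iloc` assignment of None to the same float64 frame. It assumes pandas 1.3 or later, where both paths coerce None to NaN and keep the float64 dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([0.5, np.nan])

# Path 1: where() with None as the replacement value.
via_where = df.where(pd.notnull(df), None)

# Path 2: positional assignment of None into the same float64 column.
via_iloc = df.copy()
via_iloc.iloc[0, 0] = None

# Both paths treat None as a missing-value marker: the column stays float64
# and the "None" ends up stored as NaN.
print(via_where[0].dtype, via_where.iloc[1, 0])  # float64 nan
print(via_iloc[0].dtype, via_iloc.iloc[0, 0])    # float64 nan
```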

I can understand this may be somewhat surprising, but it is certainly more consistent. It also allows assignment without conversion to object, and conversion to object can involve significant performance degradation.
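One way to make the cost of object dtype concrete is memory usage. This rough sketch, assuming a 64-bit NumPy build, compares a float64 Series with its None-filled object-dtype counterpart using `Series.memory_usage(deep=True)`. The synthetic half-missing data is purely illustrative, and the exact object-dtype byte count varies by Python build:

```python
import numpy as np
import pandas as pd

n = 1_000_000
# Synthetic column: even positions hold 0.5, odd positions are missing.
floats = pd.Series(np.where(np.arange(n) % 2 == 0, 0.5, np.nan))

# Object-dtype copy with None in place of NaN, as in the workaround above.
objs = floats.astype(object).where(floats.notna(), None)

# deep=True also counts the Python objects referenced by an object column.
float_bytes = floats.memory_usage(index=False, deep=True)
obj_bytes = objs.memory_usage(index=False, deep=True)

print(floats.dtype, float_bytes)  # float64 8000000
print(objs.dtype, obj_bytes)
```

On CPython each object-dtype element costs an 8-byte pointer plus the Python object it points to, versus a flat 8 bytes per float64 element, so the object copy is several times larger.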

cc @jbrockmendel for any other thoughts.

shahyash10 · 2021-07-09T17:57:48Z

When you're working with JSON, NaN is not a valid JSON value, so conversion to object dtype would be mandatory in this scenario.
Can you please suggest some other workaround? @rhshadrach
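For JSON specifically, an object-dtype copy may not be needed at all. The sketch below, assuming pandas 1.3 or later, contrasts the standard library's `json.dumps`, which writes float NaN as the non-standard token `NaN`, with `DataFrame.to_json`, whose documentation states that NaN and None are written as `null`:

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0.5, np.nan]})

# The standard library writes float NaN as the bare token NaN, which strict JSON parsers reject.
naive = json.dumps(df.to_dict(orient="records"))
print(naive)  # [{"a": 0.5}, {"a": NaN}]

# DataFrame.to_json writes missing values as null directly from the float64 column.
via_pandas = df.to_json(orient="records")

# Parsing the result back gives None for the missing entry.
print(json.loads(via_pandas))  # [{'a': 0.5}, {'a': None}]
```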

rhshadrach · 2021-07-09T19:36:05Z

@yashs97 It would be helpful to understand how the workaround I posted is not sufficient.

shahyash10 · 2021-07-09T19:52:02Z

I was asking for a workaround without converting to object dtype since, as you said, it involves performance degradation.

jbrockmendel · 2021-07-09T21:23:24Z

> One can still convert to None; however, you need to convert to object dtype first

This is the right way to do this.

> I was asking for a workaround without converting to object dtype since, as you said, it involves performance degradation

The 1.2 behavior referenced by the OP also converts to object, so if you want None in your Series, there is no way to do this without object dtype.
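jbrockmendel's point that None can only be stored as such in an object-dtype column can be checked directly with the Series constructor. A minimal sketch, assuming pandas 1.3 or later:

```python
import numpy as np
import pandas as pd

# Default inference: None is treated as missing, so the Series is float64 and holds NaN.
inferred = pd.Series([0.5, None])
print(inferred.dtype, inferred.iloc[1])  # float64 nan

# Only an object-dtype Series keeps the Python None object itself.
boxed = pd.Series([0.5, None], dtype=object)
print(boxed.dtype, boxed.iloc[1])  # object None
```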

rhshadrach · 2021-07-10T14:26:58Z

It seems to me this issue can be closed. If anyone (cc @pvieito, @lerome) would like to discuss further, please comment here and we can reopen it.

dylancaponi · 2021-08-04T22:21:41Z

Is there a heads up for this in the migration guide? Would be helpful.

rhshadrach · 2021-08-07T00:02:51Z

@dylancaponi - By migration guide, do you mean the whatsnew (release notes)?

davmlaw · 2022-12-12T11:29:17Z

This silently broke code that was the top hit for "pandas replace NaN with None" on Stack Overflow for many years, and was shared on this Git repo:

#17494 (comment)

If at all possible, please produce a warning if you see `df.where(pd.notnull(df), None)`.

pvieito added the Bug and Needs Triage labels Jul 7, 2021

rhshadrach added the Dtype Conversions and replace labels and removed the Needs Triage label Jul 10, 2021

rhshadrach added this to the No action milestone Jul 10, 2021

rhshadrach closed this as completed Jul 10, 2021

mzeitlin11 mentioned this issue Aug 4, 2021: BUG: Cannot replace NaNs with None #42888 (Closed)

MarcoGorelli mentioned this issue Aug 13, 2021: pandas version #43012 (Closed)

pramodg mentioned this issue Dec 22, 2021: Handle NaNs in data google/weather-tools#33 (Merged)

ingted mentioned this issue Mar 15, 2022: BUG: Series.astype is unable to handle NaN #46377 (Open)

davmlaw mentioned this issue Dec 12, 2022: Impact node not showing grid SACGF/variantgrid#727 (Closed)
