BUG: Replacing NaN with None in Pandas 1.3 does not work #42423

Closed
2 of 3 tasks
pvieito opened this issue Jul 7, 2021 · 10 comments
Labels: Bug, Dtype Conversions, replace

Comments

pvieito commented Jul 7, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

Pandas 1.2

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  None

Pandas 1.3

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  NaN

Problem description

Replacing NaN values with None (or any other Python object) should work as in previous Pandas versions.

Expected Output

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([0.5, np.nan])
>>> df.where(pd.notnull(df), None)
     0
0  0.5
1  None

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f00ed8f
python : 3.8.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.0.0
Version : Darwin Kernel Version 21.0.0: Sun Jun 20 18:43:49 PDT 2021; root:xnu-8011.0.0.121.4~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : es_ES.UTF-8
LOCALE : es_ES.UTF-8

pandas : 1.3.0
numpy : 1.18.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 57.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : 1.4.20
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

pvieito added the Bug and Needs Triage labels on Jul 7, 2021

lerome commented Jul 7, 2021

That was apparently changed on purpose (#39761), but it is indeed not very convenient: for example, we often want to replace NaN with None just before converting the DataFrame back to a dict (df.to_dict) or another data structure. I'm pretty sure this change alone breaks many existing code bases.
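A minimal sketch of that to_dict use case, assuming a conversion to object dtype is acceptable. `nan_to_none` is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd


def nan_to_none(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return an object-dtype copy of df with NaN replaced by None.

    On pandas >= 1.3, df.where(pd.notnull(df), None) on a float64 frame keeps
    float64 and stores the None as NaN, so we convert to object dtype first;
    object columns hold None as-is. The input frame is left unchanged.
    """
    return df.astype(object).where(pd.notnull(df), None)


df = pd.DataFrame({"a": [0.5, np.nan]})
records = nan_to_none(df).to_dict("records")
print(records)
```

The conversion to object is deliberate here: as noted later in this thread, a float64 column cannot hold None.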

rhshadrach (Member) commented

One can still convert to None; however, you need to convert to object dtype first:

import pandas as pd
import numpy as np

df = pd.DataFrame([0.5, np.nan])
df = df.astype(object)
print(df.where(pd.notnull(df), None))

gives

      0
0   0.5
1  None

This behavior is similar to the result of assigning with .loc/.iloc:

df = pd.DataFrame([0.5, np.nan])
df.iloc[0, 0] = None
print(df)

gives (on both 1.3.x and 1.2.x):

    0
0 NaN
1 NaN
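A quick check that this assignment does not upcast the column. This is a minimal sketch assuming a recent pandas; the thread reports the same result on 1.2.x and 1.3.x:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([0.5, np.nan])
df.iloc[0, 0] = None  # None is treated as a missing value for a float64 column

# The column keeps its float64 dtype, so the None is stored as NaN.
print(df[0].dtype)         # float64
print(df[0].isna().all())  # True
```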

I can understand that this may be somewhat surprising, but it is certainly more consistent. It also allows assignment without conversion to object, and conversion to object can involve significant performance degradation.
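To put a rough number on the object-dtype cost, here is a minimal sketch that compares memory only, not speed, for an assumed 1,000-element column. The exact byte count for the object case depends on the Python build:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1_000, dtype="float64"))

# float64 stores raw 8-byte values in one contiguous NumPy array.
float_bytes = s.memory_usage(index=False, deep=True)

# object dtype stores an 8-byte pointer per element plus a separate
# Python float object for every value; deep=True counts both.
object_bytes = s.astype(object).memory_usage(index=False, deep=True)

print(float_bytes, object_bytes)
```

Operations on object columns also fall back to per-element Python calls instead of vectorized NumPy loops, which is where most of the speed difference usually comes from.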

cc @jbrockmendel for any other thoughts.

shahyash10 commented Jul 9, 2021

When you're working with JSON, which doesn't support NaN, conversion to object dtype would be mandatory in this scenario.
Can you please suggest some other workaround? @rhshadrach
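For context on the JSON case: Python's standard json module writes NaN as the bare token NaN, which is not valid JSON, while None becomes null. The sketch below also shows DataFrame.to_json as one possible route that writes missing floats as null without a NaN-to-None step; whether it fits depends on whether you need a dict or JSON text:

```python
import json

import numpy as np
import pandas as pd

# The stdlib encoder emits NaN as a non-standard token by default...
print(json.dumps([0.5, float("nan")]))  # [0.5, NaN]

# ...and refuses it outright when strict output is requested.
try:
    json.dumps([0.5, float("nan")], allow_nan=False)
except ValueError as exc:
    print("strict encoding failed:", exc)

# None, by contrast, maps to JSON null.
print(json.dumps([0.5, None]))  # [0.5, null]

# pandas' own JSON writer encodes missing floats as null,
# and the column stays float64 throughout.
df = pd.DataFrame([0.5, np.nan])
out = df.to_json()
print(out)
```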

rhshadrach (Member) commented

@yashs97 It would be helpful to understand how the workaround I posted is not sufficient.

shahyash10 commented

I was asking for a workaround that avoids converting to object dtype since, as you said, it involves performance degradation.

jbrockmendel (Member) commented

> One can still convert to None, however you need to convert to object dtype first

This is the right way to do this.

> i was asking for a workaround without converting to object dtype since like you said it involves performance degradation

The 1.2 behavior referenced by the OP also converts to object, so if you want None in your Series, there is no way around converting to object dtype.
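That None cannot live in a float64 column is also visible at construction time. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Without an explicit dtype, pandas infers float64 and stores None as NaN.
s = pd.Series([0.5, None])
print(s.dtype)        # float64
print(s[1] is None)   # False

# Keeping the actual None object requires object dtype.
obj = pd.Series([0.5, None], dtype=object)
print(obj.dtype)        # object
print(obj[1] is None)   # True
```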

rhshadrach added the Dtype Conversions and replace labels and removed the Needs Triage label on Jul 10, 2021
rhshadrach added this to the No action milestone on Jul 10, 2021
rhshadrach (Member) commented

It seems to me this issue can be closed. If anyone (cc @pvieito, @lerome) would like to discuss further, please comment here and we can reopen.

dylancaponi commented

Is there a heads-up about this in the migration guide? It would be helpful.

rhshadrach (Member) commented

@dylancaponi - By migration guide, do you mean the whatsnew (release notes)?

davmlaw commented Dec 12, 2022

This silently broke code that was the top hit for "pandas replace NaN with None" on Stack Overflow for many years and that was also shared in this repository:

#17494 (comment)

If at all possible, please produce a warning when you see df.where(pd.notnull(df), None).
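Until something like that exists, a caller-side guard is one option. A minimal sketch; where_none_checked is a hypothetical wrapper, not a pandas API, and it assumes pandas 1.3 or later, where the float64 column keeps NaN:

```python
import warnings

import numpy as np
import pandas as pd


def where_none_checked(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around df.where(pd.notnull(df), None).

    Emits a UserWarning when a non-object column still contains NaN after
    the call, i.e. when the None was silently stored as NaN.
    """
    result = df.where(pd.notnull(df), None)
    leftover = [
        col
        for col in result.columns
        if result[col].dtype != object and result[col].isna().any()
    ]
    if leftover:
        warnings.warn(
            f"NaN was not replaced by None in columns {leftover}; "
            "convert to object dtype first (df.astype(object))",
            UserWarning,
            stacklevel=2,
        )
    return result


df = pd.DataFrame([0.5, np.nan])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    where_none_checked(df)                 # float64 column: warns
    where_none_checked(df.astype(object))  # object column keeps None: no warning

messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
print(messages)
```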


7 participants