I'm wondering why passing a function to read_csv through converters returns NaN values as '', while running the same function through apply after reading leaves the NaN values as NaN.
My function:
def a_cleaning(value: object) -> object:
    if isinstance(value, str):
        return value.replace(',', '')
    else:
        return value
My Dataframe:
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'A': ['1,200', np.nan, '400', '200', np.nan]
})
data.to_csv('data.csv', index=False)
This looks interesting to me. I'd like to have a go at this.
This will be my first time contributing, please let me know if something should be done a different way.
The function receives different inputs in the two cases: with converters it gets the raw string data from the CSV, while with apply it gets elements of an already-parsed DataFrame. The conversion in read_csv happens before the DataFrame is created.
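To make that difference visible, here is a minimal sketch that records what the function actually receives. The in-memory CSV text, the `id` column, and the `recording_cleaner` wrapper are assumptions added for illustration; they are not part of the original report.

```python
import io

import pandas as pd


def a_cleaning(value: object) -> object:
    """Strip thousands separators from strings; pass anything else through."""
    if isinstance(value, str):
        return value.replace(",", "")
    return value


# Hypothetical CSV text: the second data row has an empty field in column A.
csv_text = 'id,A\n1,"1,200"\n2,\n3,400\n'

# Record every value the converter is called with.
seen_by_converter = []


def recording_cleaner(value: object) -> object:
    seen_by_converter.append(value)
    return a_cleaning(value)


# Path 1: the converter runs on the raw CSV fields, before missing-value
# detection, so the empty field arrives as the string ''.
via_converters = pd.read_csv(
    io.StringIO(csv_text), converters={"A": recording_cleaner}
)
print(seen_by_converter)

# Path 2: read_csv first turns the empty field into NaN, then apply runs on
# the parsed column, so the function sees NaN and returns it unchanged.
parsed = pd.read_csv(io.StringIO(csv_text))
via_apply = parsed["A"].apply(a_cleaning)
print(via_apply.isna().tolist())
```

In the converters path every recorded input is a string, including `''` for the missing cell, so the `isinstance(value, str)` branch always runs and the function never gets a chance to pass a NaN through.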
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
My code when using converters:
Result:
My code when using apply:
Result:
Why are the results different?
I'm not sure if this issue will occur with other read functions. I've only tested it with read_csv so far.
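One possible workaround, sketched here under the assumption that only a completely empty field should count as missing, is to have the converter map `''` back to NaN itself. The function name `a_cleaning_keep_nan` and the in-memory CSV text are hypothetical and only serve the illustration; other default NA markers such as 'NA' or 'NaN' would still need to be handled separately.

```python
import io

import numpy as np
import pandas as pd


def a_cleaning_keep_nan(value: object) -> object:
    """Like a_cleaning, but treat an empty raw field as missing."""
    if isinstance(value, str):
        if value == "":
            # Assumption: an empty field is the only missing-value marker here.
            return np.nan
        return value.replace(",", "")
    return value


# Hypothetical CSV text: the second data row has an empty field in column A.
csv_text = 'id,A\n1,"1,200"\n2,\n3,400\n'

data = pd.read_csv(io.StringIO(csv_text), converters={"A": a_cleaning_keep_nan})
print(data["A"].isna().tolist())
```

This keeps the cleaning inside read_csv while giving the same missing-value result as the apply version for empty fields.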
Expected Behavior
It should produce the same result as using apply.
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.133+
Version : #1 SMP Tue Dec 19 13:14:11 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : POSIX
LANG : C.UTF-8
LOCALE : None.None
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.0.3
pip : 23.3.2
Cython : 3.0.8
pytest : 8.2.1
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 5.2.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.20.0
pandas_datareader : 0.10.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : 2024.3.1
matplotlib : 3.7.5
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.3
pandas_gbq : None
pyarrow : 14.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.3.1
scipy : 1.11.4
sqlalchemy : 2.0.25
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.5.0
xlrd : None
zstandard : 0.19.0
tzdata : 2024.1
qtpy : None
pyqt5 : None