pd.read_excel() ignores converter functions over na_filter argument #31908

theSuiGenerisAakash · 2020-02-12T03:02:22Z

Please consider the code below.

import pandas as pd
# Return '' for missing values, otherwise the value as a string
__to_str = lambda x: '' if pd.isnull(x) else str(x)

# Suppose the Excel file has a 'Name' column containing some empty or null cells.
# By default, na_filter is True.
df = pd.read_excel('./file1.xlsx', converters={'Name': __to_str})  # -> empty strings are read as NaN

# With na_filter=False
df = pd.read_excel('./file1.xlsx', converters={'Name': __to_str}, na_filter=False)  # -> empty strings are read as empty strings (as returned by __to_str)

Why this behaviour?

I believe that an explicitly provided argument (converters) should take precedence over any argument left at its default value (na_filter) whenever their effects conflict or overlap.

Expected output

When one argument is passed explicitly and operates at a more specific level, the arguments left at their default values should yield to it.
For instance, if converters is given a function to handle the cell values of a certain column, then na_filter or keep_default_na, which apply by default and were not passed explicitly, should not override the converter's result for that column.

Please correct me if my understanding is incorrect.
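
Besides passing na_filter=False, one possible workaround is to re-apply the conversion after reading. The snippet below is only a sketch: the DataFrame is a hand-built stand-in for what read_excel returns with the default na_filter=True (the column contents and the 'Age' column are made up for illustration), and to_str is a renamed copy of the converter above.

```python
import numpy as np
import pandas as pd


def to_str(value):
    """Return '' for missing values, otherwise the value as a string."""
    return "" if pd.isnull(value) else str(value)


# Hypothetical stand-in for the frame read_excel returns with the default
# na_filter=True: the empty cell in 'Name' has come back as NaN.
df = pd.DataFrame({"Name": ["Alice", np.nan, "Bob"], "Age": [30, 25, 41]})

# Re-apply the conversion after reading, so NaN becomes '' again.
df["Name"] = df["Name"].map(to_str)

print(df["Name"].tolist())  # ['Alice', '', 'Bob']
```

This does not address the precedence question raised above; it only restores the values the converter was meant to produce.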

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------

commit : None

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : 2.7.7 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.7
numba : None


jbrockmendel added the IO Excel read_excel, to_excel label Feb 25, 2020

mroeschke added the Bug label May 8, 2020
