Default na values list doc is missing the empty string #10700

Closed
frlnx opened this issue Jul 30, 2015 · 2 comments
Labels
Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
@frlnx

frlnx commented Jul 30, 2015

Recreating my original post from here: http://stackoverflow.com/questions/26659941/what-are-the-default-na-values-when-pandas-loads-data/31705571#31705571

This documentation http://pandas.pydata.org/pandas-docs/stable/io.html#na-values states:

The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan'].

However, this list is not complete.

If it were, these two pieces of code would produce the same result.

The actual default values:

import pandas as pd
from StringIO import StringIO

sio = StringIO()
sio.write('"foo","bar"\n"1",""\n"NA","4"')
sio.seek(0)
pd.read_csv(sio, sep=",", quotechar='"')
   foo  bar
0    1  NaN
1  NaN    4

The documented default values, copied and passed explicitly:

sio = StringIO()
sio.write('"foo","bar"\n"1",""\n"NA","4"')
sio.seek(0)
pd.read_csv(sio, sep=",", quotechar='"',
            keep_default_na=False,
            na_values=['-1.#IND', '1.#QNAN', '1.#IND',
                       '-1.#QNAN', '#N/A', 'N/A', '#NA', 'NA',
                       'NULL', 'NaN', '-NaN', 'nan', '-nan'])

  foo bar
0   1    
1 NaN   4

Pandas version:

pd.__version__
'0.15.2'
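
For readers on Python 3, the comparison above can be sketched as follows. This is a minimal port, assuming a recent pandas release rather than the 0.15.2 used in the report; the names `CSV_TEXT` and `DOCUMENTED_NA_VALUES` are introduced here for illustration only. It prints the missing-value mask for each call instead of the frames themselves.

```python
from io import StringIO

import pandas as pd

# Same CSV payload as in the report: one empty quoted field, one "NA" field.
CSV_TEXT = '"foo","bar"\n"1",""\n"NA","4"'

# The list exactly as the documentation quoted above gives it
# (illustrative name; note that it does not contain the empty string).
DOCUMENTED_NA_VALUES = ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
                        '#N/A', 'N/A', 'NA', '#NA', 'NULL', 'NaN',
                        '-NaN', 'nan', '-nan']

# Default behaviour: both the empty field and "NA" come back as NaN.
df_default = pd.read_csv(StringIO(CSV_TEXT), sep=",", quotechar='"')

# Documented list passed explicitly: the empty field stays an empty string.
df_explicit = pd.read_csv(StringIO(CSV_TEXT), sep=",", quotechar='"',
                          keep_default_na=False,
                          na_values=DOCUMENTED_NA_VALUES)

print(df_default.isna())
print(df_explicit.isna())
```

If the documented list were complete, the two masks would be identical; they differ only in the `bar` cell of the first row.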
@jreback
Contributor

jreback commented Jul 30, 2015

I would just update the docs slightly (and not put it in the list, but just say that it treats 0-length strings as missing as well). These are not tokens that are NaN-converted, but a missing token (between the delimiters), and that's why it's not directly in the list.
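
As a rough illustration of the point about 0-length strings, the sketch below passes the empty string explicitly alongside `'NA'` while defaults are disabled. It assumes a recent pandas release on Python 3, where an empty string in `na_values` is honored; whether 0.15.2 behaves the same way was not checked here. The names `CSV_TEXT` and `df_with_empty` are illustrative.

```python
from io import StringIO

import pandas as pd

# Same CSV payload as in the report.
CSV_TEXT = '"foo","bar"\n"1",""\n"NA","4"'

# Defaults switched off, but the empty string is listed explicitly,
# so the empty field is treated as missing again.
df_with_empty = pd.read_csv(StringIO(CSV_TEXT), sep=",", quotechar='"',
                            keep_default_na=False,
                            na_values=['NA', ''])

print(df_with_empty.isna())
```

Under these assumptions the empty `bar` field is reported as missing, matching what the default settings produce for this input.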

jreback added the Docs, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Difficulty Novice labels Jul 30, 2015
jreback added this to the Next Major Release milestone Jul 30, 2015
@Winterflower
Contributor

Picking this up.
