na_filter=False ignored when index_col set #5239

Closed
cancan101 opened this Issue Oct 16, 2013 · 3 comments

@cancan101
Contributor

cancan101 commented Oct 16, 2013

Given the following CSV file:

u1,u2,u3,d1,d2,d3,d4
Good Things,C,,1,1,1,1
Good Things,R,,1,1,1,1
Bad Things,C,,1,1,1,1
Bad Things,T,,1,1,1,1
Okay Things,N,B,1,1,1,1
Okay Things,N,D,1,1,1,1
Okay Things,B,,1,1,1,1
Okay Things,D,,1,1,1,1

First I parse with na_filter=True:

In [13]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=True)
Out[13]: 
            u1 u2   u3  d1  d2  d3  d4
0  Good Things  C  NaN   1   1   1   1
1  Good Things  R  NaN   1   1   1   1
2   Bad Things  C  NaN   1   1   1   1
3   Bad Things  T  NaN   1   1   1   1
4  Okay Things  N    B   1   1   1   1
5  Okay Things  N    D   1   1   1   1
6  Okay Things  B  NaN   1   1   1   1
7  Okay Things  D  NaN   1   1   1   1

Then I parse with na_filter=False:

In [12]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=False)
Out[12]: 
            u1 u2 u3  d1  d2  d3  d4
0  Good Things  C      1   1   1   1
1  Good Things  R      1   1   1   1
2   Bad Things  C      1   1   1   1
3   Bad Things  T      1   1   1   1
4  Okay Things  N  B   1   1   1   1
5  Okay Things  N  D   1   1   1   1
6  Okay Things  B      1   1   1   1
7  Okay Things  D      1   1   1   1

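Then I parse with na_filter=False and index_col set. The empty u3 values come back as NaN even though NA filtering is off: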

In [11]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=False,index_col=[0,1,2],)
Out[11]: 
                    d1  d2  d3  d4
u1          u2 u3                 
Good Things C  NaN   1   1   1   1
            R  NaN   1   1   1   1
Bad Things  C  NaN   1   1   1   1
            T  NaN   1   1   1   1
Okay Things N  B     1   1   1   1
               D     1   1   1   1
            B  NaN   1   1   1   1
            D  NaN   1   1   1   1

Finally, additionally setting na_values=[] and keep_default_na=False seems to work around the issue:

In [14]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=False,index_col=[0,1,2],na_values=[], keep_default_na=False)
Out[14]: 
                   d1  d2  d3  d4
u1          u2 u3                
Good Things C       1   1   1   1
            R       1   1   1   1
Bad Things  C       1   1   1   1
            T       1   1   1   1
Okay Things N  B    1   1   1   1
               D    1   1   1   1
            B       1   1   1   1
            D       1   1   1   1
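
For anyone who wants to reproduce this without the file on disk, here is a minimal sketch that feeds the same CSV text through io.StringIO. The helper names (read, count_missing_u3) are made up for illustration and are not part of pandas. The result of the third call depends on the installed pandas version, so the sketch prints it rather than assuming a value.

```python
import io

import pandas as pd

# Same content as the CSV file in the report, inlined so no file is needed.
CSV_TEXT = """\
u1,u2,u3,d1,d2,d3,d4
Good Things,C,,1,1,1,1
Good Things,R,,1,1,1,1
Bad Things,C,,1,1,1,1
Bad Things,T,,1,1,1,1
Okay Things,N,B,1,1,1,1
Okay Things,N,D,1,1,1,1
Okay Things,B,,1,1,1,1
Okay Things,D,,1,1,1,1
"""


def read(**kwargs):
    """Hypothetical helper: parse CSV_TEXT with the given read_csv options."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


def count_missing_u3(index):
    """Hypothetical helper: count NaN entries in the 'u3' index level."""
    return int(index.get_level_values("u3").isna().sum())


filtered = read(na_filter=True)
unfiltered = read(na_filter=False)
indexed = read(na_filter=False, index_col=[0, 1, 2])
workaround = read(
    na_filter=False,
    index_col=[0, 1, 2],
    na_values=[],
    keep_default_na=False,
)

print("NaN in u3 column, na_filter=True: ", int(filtered["u3"].isna().sum()))
print("NaN in u3 column, na_filter=False:", int(unfiltered["u3"].isna().sum()))
# Version dependent: a non-zero count here means the behaviour reported
# in this issue is present in the installed pandas.
print("NaN in u3 level, index_col only:  ", count_missing_u3(indexed.index))
print("NaN in u3 level, with workaround: ", count_missing_u3(workaround.index))
```

If the third count is non-zero while the fourth is zero, the installed version shows the behaviour described above.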
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Oct 16, 2013

Contributor

There are very limited tests with na_filter=False and its a pretty silly parameter, so moving to low-priority. You are welcome to do a PR if you'd like.

Contributor

jreback commented Oct 16, 2013

There are very limited tests with na_filter=False and it's a pretty silly parameter, so I'm moving this to low-priority. You are welcome to do a PR if you'd like.

@cancan101

This comment has been minimized.

Show comment
Hide comment
@cancan101

cancan101 Oct 16, 2013

Contributor

TBI I can work around for now. My observation is that there probably too many parameters on the method having to do with handling of nans.It would be great to clean this up.
keep_default_na, na_filter, na_values

There is also interaction between these parameters.

Contributor

cancan101 commented Oct 16, 2013

TBH I can work around this for now. My observation is that there are probably too many parameters on the method having to do with the handling of NaNs. It would be great to clean this up:
keep_default_na, na_filter, na_values

There is also interaction between these parameters.
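
To make the interaction concrete, here is a small sketch based on how the read_csv documentation describes these three parameters. The sample data and the na_mask helper are invented for illustration. It records which values of column v are parsed as NaN under each combination.

```python
import io

import pandas as pd

# Column v holds a default NA string ("NA"), a custom marker ("missing"),
# and an empty field.
CSV_TEXT = "k,v\na,NA\nb,missing\nc,\n"


def na_mask(**kwargs):
    """Hypothetical helper: which entries of column v are parsed as NaN."""
    frame = pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)
    return frame["v"].isna().tolist()


masks = {
    # Defaults only: "NA" and the empty field become NaN.
    "defaults": na_mask(),
    # na_values is added on top of the defaults.
    "extra na_values": na_mask(na_values=["missing"]),
    # Only the custom marker counts once the defaults are switched off.
    "custom only": na_mask(keep_default_na=False, na_values=["missing"]),
    # No defaults and no custom markers: nothing is NaN.
    "no markers": na_mask(keep_default_na=False),
    # na_filter=False turns detection off, so na_values has no effect.
    "na_filter off": na_mask(na_filter=False, na_values=["missing"]),
}

for label, mask in masks.items():
    print(f"{label:16} {mask}")
```

Three options producing five distinct behaviours for one column is the kind of overlap I mean.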

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Oct 16, 2013

Contributor

sure.....though to be honest, easiest just to drop na_filter...but if you come up with a better API, gr8

Contributor

jreback commented Oct 16, 2013

Sure... though to be honest, it's easiest just to drop na_filter. But if you come up with a better API, great.

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 5, 2017

gfyoung added the Bug and Missing-data labels and removed the Prio-low label Nov 5, 2017

gfyoung modified the milestones: Someday, Next Major Release Nov 5, 2017

jreback modified the milestones: Next Major Release, 0.21.1 Nov 6, 2017

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 6, 2017

gfyoung added a commit that referenced this issue Nov 6, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017

TomAugspurger added a commit that referenced this issue Dec 11, 2017
