read_csv, integer dtype and empty cells #2631

Closed
jankatins opened this issue Jan 3, 2013 · 4 comments
Assignees
Labels
Enhancement IO Data IO issues that don't fit into a more specific label
Milestone

Comments

@jankatins
Contributor

Reading in a csv file with an integer column that has empty cells casts that column to float (which in the end causes problems when merging this dataframe on that column with a dataframe where the corresponding column is int).

It would be nice if a warning could be printed when such a conversion takes place (maybe only when an explicit dtype={"col":np.int64} setting is passed to read_csv), and if I could optionally specify that such rows should be dropped (isn't there an NA value for int columns...?)

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""
import numpy as np
import pandas
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
f = pandas.read_csv(StringIO(data), sep=",", dtype={'DOY': np.int64})
f.dtypes
YEAR      int64
 DOY    float64
 a        int64
@wesm
Member

wesm commented Jan 3, 2013

There are no integer NA values, unfortunately. I plan to fix this one of these days (a big project; it probably requires circumventing NumPy).

@jankatins
Contributor Author

I don't mind that it is not possible (yet). What bothers me is that read_csv changed the datatype even though I specified it, and did so silently (no exception, no warning).

pandas/src/parser.pyx has commented-out exception throwing in line 900, which seems to do what I expected...?

Would it be possible to add a param to specify a strategy (drop row, throw exception, cast to float) for what should happen in such cases? I tried to understand the code and it seems that it operates on columns, so dropping rows if an int is NA does not seem to be an easy option :-(
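
A rough sketch of the three strategies, applied after parsing rather than inside the parser. `coerce_int_column` and its `on_na` argument are made-up names for illustration and are not part of pandas; the sketch also assumes a pandas version that runs on Python 3.

```python
from io import StringIO

import numpy as np
import pandas as pd


def coerce_int_column(df, col, on_na="raise"):
    """Hypothetical helper (not a pandas API): apply an NA strategy to `col`."""
    missing = df[col].isna()
    if on_na == "float":
        # Keep every row; the column stays (or becomes) float64.
        return df.astype({col: np.float64})
    if on_na == "raise":
        if missing.any():
            raise ValueError(f"column {col!r} has {int(missing.sum())} empty cell(s)")
    elif on_na == "drop":
        # Drop the rows whose value in `col` is missing.
        df = df[~missing]
    else:
        raise ValueError(f"unknown strategy: {on_na!r}")
    return df.astype({col: np.int64})


data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""

# No dtype is requested here, so the parser infers float64 for DOY.
raw = pd.read_csv(StringIO(data), skipinitialspace=True)
dropped = coerce_int_column(raw, "DOY", on_na="drop")
as_float = coerce_int_column(raw, "DOY", on_na="float")
```

Because the column passes through float64 first, integers too large to be represented exactly as a float could lose precision, so this is only a workaround and not a substitute for handling the case in the parser.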

@ghost ghost assigned wesm Jan 20, 2013
@wesm wesm closed this as completed in 5da8df7 Jan 20, 2013
@wesm
Member

wesm commented Jan 20, 2013

Done. Thanks for the suggestion; I agree raising the exception is the right move. Note that in your example you need to pass skipinitialspace=True.
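
To illustrate why skipinitialspace=True matters for the example in this thread, here is a minimal sketch assuming a recent pandas on Python 3. The header `YEAR, DOY, a` has a blank after each comma, so without the option the column is named `' DOY'` and the `dtype` key `'DOY'` matches nothing. The names `loose` and `raised` are illustrative only.

```python
from io import StringIO

import numpy as np
import pandas as pd

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""

# Without skipinitialspace the header names keep their leading blank,
# so the requested int64 dtype is never applied to ' DOY'.
loose = pd.read_csv(StringIO(data), dtype={"DOY": np.int64})
print(list(loose.columns))
print(loose[" DOY"].dtype)

# With skipinitialspace=True the key matches the column, and the empty
# cell makes the int64 request fail instead of silently becoming float.
try:
    pd.read_csv(StringIO(data), skipinitialspace=True, dtype={"DOY": np.int64})
    raised = False
except ValueError as exc:
    raised = True
    print("read_csv raised:", exc)
```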

@StefRe
Contributor

StefRe commented Oct 5, 2019

Now that we have nullable integers since 0.24.0, wouldn't it be a good idea to add a parameter to read_csv like 'use_nullable_ints' to enable inference of Int64 columns?
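
For reference, a short sketch of what is already possible without a new parameter, assuming pandas 0.24.0 or later: the nullable dtype can be requested explicitly per column with the string `"Int64"`. This is opt-in per column, not the automatic inference suggested above.

```python
from io import StringIO

import pandas as pd

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""

# Request the nullable integer extension dtype for DOY explicitly.
df = pd.read_csv(StringIO(data), skipinitialspace=True, dtype={"DOY": "Int64"})
print(df.dtypes)
print(df["DOY"].isna().sum())  # number of missing DOY values
```

The empty cell is kept as a missing value and the other entries stay integers, so the column can still be used as an integer merge key.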
