read_csv, integer dtype and empty cells #2631
Comments
There are no integer NA values, unfortunately. I plan to fix this one of these days (a big project; it probably requires circumventing NumPy).
I don't mind that it isn't possible (yet), but read_csv changed the datatype even though I had specified it, and it didn't say anything (no exception, no warning). pandas/src/parser.pyx has the exception throwing commented out at line 900, which seems to do what I expected...? Would it be possible to add a parameter to specify a strategy (drop the row, throw an exception, cast to float) for such cases? I tried to understand the code and it seems to operate on columns, so dropping rows when an int is NA doesn't look like an easy option :-(
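The three strategies listed above can be sketched as a post-processing step on an already-loaded frame. This is only an illustration: `coerce_int_column` and its `on_na` parameter are hypothetical names, not part of pandas, and the sample data is made up.

```python
import io

import numpy as np
import pandas as pd


def coerce_int_column(df, col, on_na="raise"):
    """Hypothetical helper (not a pandas API): force `col` to int64,
    handling empty cells according to `on_na`."""
    missing = df[col].isna()
    if not missing.any():
        return df.astype({col: np.int64})
    if on_na == "raise":
        raise ValueError(f"column {col!r} has {int(missing.sum())} empty cell(s)")
    if on_na == "drop":
        # Drop the rows with empty cells, then the cast to int is safe.
        return df.loc[~missing].astype({col: np.int64})
    if on_na == "float":
        # Keep every row; empty cells stay NaN in a float column.
        return df.astype({col: np.float64})
    raise ValueError(f"unknown strategy: {on_na!r}")


# Made-up sample: the second row has an empty cell in "value".
sample = pd.read_csv(io.StringIO("id,value\n1,10\n2,\n3,30\n"))

dropped = coerce_int_column(sample, "value", on_na="drop")
as_float = coerce_int_column(sample, "value", on_na="float")
```

Working on the whole frame after loading sidesteps the column-wise parsing mentioned above, at the cost of reading the column as float first.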
wesm was assigned Jan 20, 2013
wesm closed this in 5da8df7 Jan 20, 2013
Done. Thanks for the suggestion; I agree raising the exception is the right move. In your example, note you need to pass
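A minimal sketch of the behaviour after this change, assuming a reasonably recent pandas release: requesting an integer dtype for a column that contains an empty cell raises instead of silently casting to float. The sample data and column names are made up, and the exact error message may differ between versions, so only the exception type is checked.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample: the second row has an empty cell in "value".
csv_text = "id,value\n1,10\n2,\n3,30\n"

raised = False
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"value": np.int64})
except ValueError as exc:
    # An explicit integer dtype cannot hold the missing value.
    raised = True
    message = str(exc)
```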
janschulz commented Jan 3, 2013
Reading in a csv file with an integer column that has empty cells will cast that column to float (which in the end results in problems when merging this dataframe on that column with a dataframe where the corresponding column is int).
It would be nice if a warning could be printed when such a conversion takes place (maybe only when an explicit dtype={"col":np.int64} setting is passed to read_csv), and optionally let me specify that such rows should be dropped (isn't there an NA value for int columns...?)
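A small reproduction of the cast described above, with made-up data and no dtype argument, so pandas infers the types itself:

```python
import io

import numpy as np
import pandas as pd

# Made-up sample: "value" holds integers, but the second row is empty.
csv_text = "id,value\n1,10\n2,\n3,30\n"

df = pd.read_csv(io.StringIO(csv_text))

id_dtype = df["id"].dtype        # no empty cells, stays an integer column
value_dtype = df["value"].dtype  # the empty cell becomes NaN, so the column is float
n_missing = int(df["value"].isna().sum())
```

The empty cell is stored as NaN, which only a float column can represent, so the whole column is read as float.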