read_csv treats zeroes as nan if column contains any nan #2599
Comments
That's upsetting. Not represented in the test suite obviously. Marked as a bug |
I'm not able to reproduce on pandas 0.10:
|
I'm using a Windows 7 machine with NumPy 1.6.1 and pandas 0.10. Perhaps a NumPy version issue?
|
That's super strange. I'll add a unit test and investigate on my Windows 7 box |
I was able to reproduce this with pandas master on 64-bit Windows and 32-bit Linux. I also found that this only happens if the values in the B column are actually integers. If I change the one zero to a float, then things work fine on both 64-bit Windows and 32-bit Linux. |
One additional comment: things work fine when using the python parsing engine instead of the C one. |
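A minimal sketch of how the two engines could be compared side by side. The CSV text here is a hypothetical stand-in built from the description in this issue (an integer zero in column B of the first row, an empty B field in the second), not the original data.csv; the helper name read_with is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data matching the description in this issue:
# column B is an integer zero in row one and empty in row two.
CSV_TEXT = "A,B\n1,0\n2,\n"


def read_with(engine):
    """Parse the sample text with the requested read_csv engine."""
    return pd.read_csv(io.StringIO(CSV_TEXT), engine=engine)


c_frame = read_with("c")
py_frame = read_with("python")

print("C engine:")
print(c_frame)
print("Python engine:")
print(py_frame)
```

On an affected build, the C engine output would show NaN in both rows of B, while the Python engine keeps the zero. |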
dieter, if you're up to help me debug this, the relevant code is the |
if not I'll have to try to reproduce it on my windows VM at home this weekend |
Thanks for the tip, Wes. The problem (on 32-bit Linux anyway) is that, for some reason, INT64_MIN == np.int64(0), so any values of 0 also get masked as NaNs during _maybe_upcast. The array that _try_int64 returns contains two zeros, and na_count is 1. |
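To illustrate the failure mode described above, here is a rough NumPy sketch. It is not the actual _maybe_upcast code; mask_sentinel is a hypothetical helper that only mimics the idea of replacing a sentinel value with NaN while upcasting to float.

```python
import numpy as np


def mask_sentinel(values, sentinel):
    """Upcast an int64 array to float64, replacing sentinel entries with NaN.

    Hypothetical stand-in for the masking step; not pandas' implementation.
    """
    result = values.astype(np.float64)
    result[values == sentinel] = np.nan
    return result


# Intended behaviour: the missing field is stored as the int64 minimum.
int64_min = np.iinfo(np.int64).min
parsed_ok = np.array([0, int64_min], dtype=np.int64)
good = mask_sentinel(parsed_ok, int64_min)

# Reported behaviour: the sentinel constant evaluates to 0, so the missing
# field is stored as 0 and is indistinguishable from the real zero.
parsed_bad = np.array([0, 0], dtype=np.int64)
bad = mask_sentinel(parsed_bad, 0)

print(good)  # the real zero survives, only the missing field becomes NaN
print(bad)   # both entries become NaN
```

With a sentinel of 0 there is no way to tell the real zero from the missing field, which matches the two zeros and na_count of 1 seen in the result array. |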
I got a bit further with this. Consider the following Cython file: cdef extern from "stdint.h": cpdef f(): If you run this function on 32-bit Linux, you will get: |
Would it make sense to use np.iinfo and np.finfo to figure out the dtype dependent na_values instead? |
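A sketch of what that suggestion might look like in plain NumPy. The function name na_sentinel_for is hypothetical, and the choice of NaN for floating-point dtypes is an assumption for illustration (np.finfo exposes the float limits, but NaN is the usual missing marker); this is not necessarily what the pull request does.

```python
import numpy as np


def na_sentinel_for(dtype):
    """Return an illustrative missing-value sentinel for a NumPy dtype.

    Hypothetical helper: signed integers use the dtype minimum from
    np.iinfo; floats use NaN (an assumption made for this sketch).
    """
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.signedinteger):
        return np.iinfo(dtype).min
    if np.issubdtype(dtype, np.floating):
        return np.nan
    raise TypeError("no sentinel defined for dtype %s" % dtype)


print(na_sentinel_for(np.int64))
print(na_sentinel_for(np.int32))
print(np.finfo(np.float64).min)  # float limits are available via np.finfo
```

Looking the limit up from the dtype at runtime avoids relying on a C header macro whose value may differ across platforms. |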
I added a pull request for this: #2635 |
Merged dieter's fix and this appears to be working now (I was able to reproduce the failure on windows 64-bit and it's fixed now) |
If data.csv contains the following (column B has a zero in the first row and is empty in the second)... pandas 0.10.0 incorrectly reads it as:
... whereas pandas 0.9.0 reads it right: