PandasCursor doesn't automatically convert int columns with NA's to floats #60
Comments
I do not really know what kind of data this is. Is it possible to present sample data? |
Sure. The column in particular that's giving me issues is a column capturing the 4 digits after a zip/postal code. The snippet below shows that sometimes, this data is missing from that column. Pandas must use
|
I'm experiencing this same issue. Is it possible to explicitly tell the PandasCursor to cast the column to floats? |
It seems good to convert with cast. |
Thanks @laughingman7743 , I used ValueError: Integer column has NA values in column 18.
Consider replacing `column18` with `cast(column18 AS double)` in your sql statement |
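For anyone landing here, the error can be reproduced with pandas alone, with no Athena connection. The sketch below is only an illustration: the CSV text and the column names `id` and `zip4` are made up, and it assumes the integer dtype is being passed to `pd.read_csv`. Reading the column as `float64` is roughly what the `cast(... AS double)` workaround achieves on the SQL side.

```python
import io

import pandas as pd

# Hypothetical stand-in for a query result: the second row has no zip4 value.
csv_text = "id,zip4\n1,1234\n2,\n3,5678\n"

# Forcing a plain NumPy integer dtype fails because of the missing value.
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"zip4": "int64"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Reading the same column as a double works; the gap becomes NaN.
as_double = pd.read_csv(io.StringIO(csv_text), dtype={"zip4": "float64"})
print(as_double["zip4"].dtype)  # float64
```

The cost of this workaround is that the values come back as floats (1234.0 rather than 1234).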
Thanks @mckeown12. |
Add about ValueError of integer column in Dataframe. (close #60)
Isn't it possible that PyAthena handles the cast for us users? |
Pull requests welcome! |
Pandas 0.24+ has support for nullable ints, so I was able to keep my int columns as ints (rather than converting to double) by changing converter.py like so:
If you're willing to set the minimum requirements to pandas >=0.24, I think this fix would be cleaner than converting to double. |
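A rough illustration of the nullable-integer idea, using pandas only. This is not the actual converter.py change (which isn't reproduced here); the CSV text and column names are hypothetical, and it assumes a pandas version where `read_csv` accepts the `"Int64"` extension dtype.

```python
import io

import pandas as pd

# Hypothetical stand-in for a query result: the second row has no zip4 value.
csv_text = "id,zip4\n1,1234\n2,\n3,5678\n"

# "Int64" (capital I) is the nullable integer extension dtype,
# as opposed to NumPy's "int64", which cannot hold missing values.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"zip4": "Int64"})

print(nullable["zip4"].dtype)  # Int64
print(nullable["zip4"].isna().sum())  # one missing value, kept as missing
print(nullable["zip4"].iloc[0])  # still an integer, not 1234.0
```

With this dtype the present values stay integers and the missing one stays missing, so no cast to double is needed.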
@xinluo-gogovan Thank you for your information! |
https://travis-ci.org/laughingman7743/PyAthena/jobs/516226474 error details
|
Not sure what those errors are about as it seems that branch has a bunch of refactoring going on, but I had run the tests on master with just my aforementioned change plus this following one and all the tests were passing:
|
@xinluo-gogovan Thanks! I will investigate. |
When I run the test in the local environment, it passes.
|
All tests passed. 🎉 |
…in_pandas_cursor Support integer NA values in PandasCursor (fix #60)
I'm querying a large Athena table and can successfully run a query using the below code; however, it's really slow (for reasons covered in #46).
I would really like to take advantage of the performance boost that PandasCursor offers; however, when I run the code below, I get a value error.
Now I understand why I'm getting this value error. I have an int column in my Athena table which has NA values in it, which pandas notoriously doesn't handle well (NaNs are floats in pandas' eyes, not ints). `pd.read_sql()` seems to handle this gracefully: it recognizes there is an int column with NaNs and converts it to a float column. It would be great if PyAthena did the same thing.
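To make the float-vs-int point concrete, here is a small pandas-only sketch. The sample values are invented for illustration and nothing here touches Athena or PyAthena.

```python
import pandas as pd

# Hypothetical sample: integer values with one missing entry.
values = [1234, None, 5678]

# By default pandas stores the missing entry as NaN, which forces float64.
inferred = pd.Series(values)
print(inferred.dtype)  # float64

# Converting to a NumPy integer dtype fails, since NaN has no int representation.
try:
    inferred.astype("int64")
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True
print(cast_failed)  # True
```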