Support missing values in Boolean columns #102

Closed
tanguycdls opened this issue Nov 21, 2019 · 5 comments

@tanguycdls

#80
Similar to the issue above, I ran into a problem with missing values in Boolean columns. When parsing the result into pandas, the conversion fails because we set the column dtype to boolean, and the boolean dtype does not support NA values.

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
ValueError: Bool column has NA values in column 3

Could we add a try/except ValueError around

df = pd.read_csv(io.BytesIO(response['Body'].read()),

If it fails, we could emit a warning and retry the DataFrame creation with relaxed constraints:

  • Either we follow pandas and convert the offending column to object (I'm not sure we can detect the faulty column, so we could convert every column whose dtype is not NA-compatible).
  • Or we convert booleans to int (I understand from the PR above that int is already NA-safe) and warn the user.
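The first option could be sketched roughly as follows. This is only an illustration of the retry idea, not the code that ended up in PyAthena: `read_csv_with_fallback` is a hypothetical helper, the in-memory CSV stands in for the S3 response body, and for simplicity it relaxes every bool column instead of detecting the faulty one.

```python
import io
import warnings

import pandas as pd


def read_csv_with_fallback(data: bytes, dtypes: dict) -> pd.DataFrame:
    """Hypothetical helper: try strict dtypes, then retry with bool -> object."""
    try:
        return pd.read_csv(io.BytesIO(data), dtype=dtypes)
    except ValueError as exc:
        # pandas raises ValueError when a bool column contains NA values.
        warnings.warn(f"Strict dtypes failed ({exc}); retrying with bool columns as object.")
        relaxed = {col: (object if dt is bool else dt) for col, dt in dtypes.items()}
        return pd.read_csv(io.BytesIO(data), dtype=relaxed)


# Stand-in for the query result: the second row has no value for "flag".
csv_bytes = b"id,flag\n1,true\n2,\n3,false\n"
df = read_csv_with_fallback(csv_bytes, {"id": int, "flag": bool})
```

After the fallback, the `flag` column has object dtype and holds the raw strings plus one missing value, so the caller still has to decide how to interpret it.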

If needed, I can write a PR once we agree on a solution!

Thanks

@laughingman7743
Owner

I don't usually use Pandas. I don't know what solution is best.

[1] Either we follow pandas and convert the offending column to object (I'm not sure we can detect the faulty column, so we could convert every column whose dtype is not NA-compatible).

[2] Or we convert booleans to int (I understand from the PR above that int is already NA-safe) and warn the user.

Which method do you think is better?

@laughingman7743
Owner

Related: #100

@tanguycdls
Author

Hi, thanks for the prompt answer! I opened a PR that follows pandas type promotion, so bool columns are cast to object dtype and users can decide what they want to do with them.
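For readers unfamiliar with the promotion rule mentioned here, a minimal illustration using plain pandas (nothing PyAthena-specific; the variable names are made up for the example):

```python
import pandas as pd

# A bool Series without missing values keeps the compact bool dtype.
complete = pd.Series([True, False])

# Introducing a missing value forces promotion to object, because the
# NumPy bool dtype has no way to represent NA.
with_missing = pd.Series([True, None, False])
reindexed = complete.reindex([0, 1, 2])  # label 2 does not exist, so it becomes NA
```

Both `with_missing` and `reindexed` end up with object dtype, which is the behavior the PR mirrors for Boolean columns.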

@laughingman7743
Owner

laughingman7743 commented Nov 22, 2019

Thank you for your PR. I now understand what kind of implementation is needed.
I'm thinking of refactoring the Converter class so that it can be used with PandasCursor.
https://github.com/laughingman7743/PyAthena/blob/master/pyathena/converter.py
I plan to try it over the weekend.

@laughingman7743
Owner

laughingman7743 commented Nov 23, 2019

I implemented it in the following branch so that dtypes and converters can be customized freely.
Please take a look. 🙏
#104
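As a rough idea of what a per-column converter can do for this case, here is a sketch that uses pandas' own `converters` argument to `read_csv`. It does not show the API introduced in #104; `to_nullable_bool` and the sample data are assumptions made for illustration.

```python
import io

import pandas as pd


def to_nullable_bool(value: str):
    """Hypothetical converter: map 'true'/'false' to bool and empty cells to None."""
    if value == "":
        return None
    return value.lower() == "true"


csv_text = "id,flag\n1,true\n2,\n3,false\n"
# read_csv passes each raw cell of "flag" (as a string) to the converter.
df = pd.read_csv(io.StringIO(csv_text), converters={"flag": to_nullable_bool})
```

The resulting `flag` column is object dtype holding real `True`/`False` values with a missing entry in the second row, instead of raising `ValueError`.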

laughingman7743 added a commit that referenced this issue Dec 6, 2019
…oolean_column

Support NA values with boolean column (fix #100, fix #102, fix #103)