iterrows in catalog ingestion casts all dtypes to float #209

Closed
bfhealy opened this issue May 24, 2023 · 3 comments

@bfhealy
Contributor

bfhealy commented May 24, 2023

When ingesting a parquet file using ingest_catalog.py, this line casts all columns' dtypes to float. Thus the downstream checks for different dtypes will not work as intended.

Perhaps using itertuples instead would help by preserving the dtypes (at the expense of some re-working of the loop)? Note this could introduce a new issue where columns beginning with an underscore are renamed (e.g. _id, which gets mapped to _1). This might be avoided by setting name=None within itertuples.
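A minimal sketch of the behavior described above, using a small made-up DataFrame (the column names `ra` and `n_obs` are illustrative assumptions, not taken from ingest_catalog.py). It shows iterrows upcasting integers when a row mixes int and float columns, itertuples keeping each column's own type, and the `_id` renaming that `name=None` sidesteps:

```python
import pandas as pd

# Hypothetical catalog rows: two integer columns and one float column.
df = pd.DataFrame({"_id": [1, 2], "ra": [10.5, 20.25], "n_obs": [3, 4]})

# iterrows packs each row into a single Series, so mixed int/float
# columns are upcast to one common dtype (float64 here).
_, row = next(df.iterrows())
print(row.dtype)         # float64
print(type(row["_id"]))  # a float type, no longer an integer

# itertuples reads each column separately, so integers stay integers.
# Field names that are not valid identifiers are renamed positionally:
# "_id" becomes "_1" because the index occupies position 0.
named = next(df.itertuples())
print(named._fields)     # ('Index', '_1', 'ra', 'n_obs')

# name=None yields plain tuples, so nothing is renamed; the column
# names can be re-attached by zipping with df.columns.
plain = next(df.itertuples(index=False, name=None))
record = dict(zip(df.columns, plain))
print(record)
```

With plain tuples the loop has to look values up through `record` (or by position) rather than by attribute, which is the re-working of the loop mentioned above.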

@bfhealy
Contributor Author

bfhealy commented May 24, 2023

For now I'm able to get around this issue by casting integers to the pandas Int64 type using column.astype("Int64"), then saving in parquet format.
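A rough sketch of that workaround, again on a made-up DataFrame (column names are assumptions). It only demonstrates the cast to the nullable Int64 type; whether iterrows then leaves the integers alone should be checked on the pandas version in use:

```python
import pandas as pd

# Hypothetical catalog rows: two integer columns and one float column.
df = pd.DataFrame({"_id": [1, 2], "ra": [10.5, 20.25], "n_obs": [3, 4]})

# Cast the integer columns to the pandas nullable integer type.
for col in ["_id", "n_obs"]:
    df[col] = df[col].astype("Int64")

print(df.dtypes)

# Print the row dtype and a value to check whether the integers
# are still upcast to float on your pandas version.
_, row = next(df.iterrows())
print(row.dtype, repr(row["_id"]))
```

The frame can then be written back out with `df.to_parquet(...)`, which needs a parquet engine such as pyarrow or fastparquet installed.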

@Theodlz
Collaborator

Theodlz commented Jun 10, 2023

Hi @bfhealy. Do you maybe have a sample that I can use to try this out?

@Theodlz
Collaborator

Theodlz commented Dec 11, 2023

A PR addressing this issue has been merged.

@Theodlz Theodlz closed this as completed Dec 11, 2023