[query] Add to_pandas(types={}) argument to specify user-supplied pandas dtypes #12735

Merged
merged 5 commits into hail-is:main on Mar 5, 2023

Conversation

mkanai
Contributor

@mkanai mkanai commented Feb 27, 2023

This PR fixes #11738. Users can now specify arbitrary type conversions between Hail types and pandas dtypes via:

ht.to_pandas(types={"col1": "int32", "col2": np.float64, hl.tstring: "object"})

This maps col1 and col2 to int32 and np.float64, respectively, and all hl.tstring fields to object.
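
To make the lookup concrete, here is a minimal, dependency-free sketch of how a mixed `types` mapping like the one above could be resolved into one dtype per column. It is not the code in this PR: `FakeHailType`, `resolve_dtypes`, and the sample schema are hypothetical stand-ins, and the rule that a column-name entry takes precedence over a Hail-type entry is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeHailType:
    """Hypothetical stand-in for a Hail type object such as hl.tstring."""
    name: str


tstring = FakeHailType("str")
tint32 = FakeHailType("int32")
tfloat64 = FakeHailType("float64")


def resolve_dtypes(schema, types):
    """Return {column: dtype spec} for every column that `types` covers.

    schema: {column name: FakeHailType}
    types:  {column name or FakeHailType: dtype spec}
    """
    resolved = {}
    for column, hail_type in schema.items():
        if column in types:
            # Entry keyed by column name (assumed here to take precedence).
            resolved[column] = types[column]
        elif hail_type in types:
            # Entry keyed by Hail type applies to all fields of that type.
            resolved[column] = types[hail_type]
    return resolved


schema = {"col1": tint32, "col2": tfloat64, "name": tstring, "other": tint32}
types = {"col1": "int32", "col2": "float64", tstring: "object"}
resolved = resolve_dtypes(schema, types)
print(resolved)
```

In this sketch, columns that match neither a name nor a type entry (here `other`) are left out of the result, so they would keep whatever default conversion applies.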

One design question is whether column-name and Hail-type specifications should go in separate arguments. Any thoughts? cc: @danking

Also, I don't think the current type check would work for np.float64-like numpy dtype specifications...

@danking
Contributor

danking commented Feb 27, 2023

Amazing! I'll look into this this week. Thank you Masa!

@danking danking assigned patrick-schultz and unassigned danking Mar 1, 2023
@danking
Contributor

danking commented Mar 1, 2023

@patrick-schultz I added some tests and fixed the behavior as a result. I think this is a valuable change, but since I've edited it directly, it seems like someone else should review it as well.

Collaborator

@patrick-schultz patrick-schultz left a comment

Looks good to me!

@danking danking merged commit eb60cdd into hail-is:main Mar 5, 2023
Development

Successfully merging this pull request may close these issues.

hl.Table.to_pandas() generates a dtype=string dataframe which is still experimental
3 participants