Add support for StringDtype (fixes #1237) #2319

Merged
merged 2 commits into from
Nov 9, 2023

Conversation

MaDufie
Contributor

@MaDufie MaDufie commented Jul 30, 2023

Add support for StringDtype (available in Pandas >=1.0)
Fixes #1237

AFTER
image

@timkpaine
Member

timkpaine commented Jul 31, 2023

You should try to implement this the same way as done here:

if isinstance(data, pd.DataFrame):

And please use the same PR as you make changes, rather than opening a new one.

@MaDufie
Contributor Author

MaDufie commented Aug 3, 2023

> You should try to implement this the same way as done here:
>
> if isinstance(data, pd.DataFrame):
>
> And please use the same PR as you make changes, rather than opening a new one.

@timkpaine I have implemented the changes. Kindly let me know if any further changes are needed after you review them. Thank you.

@MaDufie MaDufie requested a review from timkpaine August 7, 2023 15:58
@timkpaine
Member

It should be in the file linked, near the line linked.

@MaDufie MaDufie marked this pull request as ready for review August 8, 2023 05:33
python/perspective/perspective/table/table.py (outdated, resolved)
@MaDufie MaDufie marked this pull request as draft August 9, 2023 05:31
@MaDufie MaDufie requested a review from timkpaine August 9, 2023 05:39
@MaDufie MaDufie marked this pull request as ready for review August 9, 2023 11:13
@MaDufie MaDufie marked this pull request as draft August 10, 2023 19:20
@MaDufie MaDufie marked this pull request as ready for review August 10, 2023 19:21
@timkpaine timkpaine dismissed their stale review August 13, 2023 20:09

out of date

@timkpaine timkpaine changed the title 1237: Add support for StringDtype Add support for StringDtype (fixes #1237) Aug 13, 2023
@timkpaine
Member

@texodus this lgtm. I know we're looking to replace all of pandas / numpy with native arrow in/out, so I'll leave it up to you whether or not you want to merge.

@@ -78,6 +78,12 @@ def deconstruct_pandas(data, kwargs=None):
        if isinstance(v, pd.CategoricalDtype):
            data[k] = data[k].astype(str)

    # convert StringDtype to str
    if isinstance(data, pd.DataFrame) and hasattr(pd, "CategoricalDtype"):
Member

is this hasattr necessary?

    # convert StringDtype to str
    if isinstance(data, pd.DataFrame) and hasattr(pd, "CategoricalDtype"):
        for k, v in data.dtypes.items():
            if isinstance(v, pd.StringDtype):
Member

I think maybe this was meant to be used in the hasattr, i.e. hasattr(pd, "StringDtype"), since StringDtype is the new type.
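
For reference, the suggestion above could look roughly like the sketch below. It is not the merged code: `stringify_extension_columns` is a hypothetical standalone helper (the real change lives inside `deconstruct_pandas`), and copying the frame is an assumption made here so the example does not mutate its input. The guard checks for `StringDtype`, the attribute that only exists on pandas >= 1.0.

```python
import pandas as pd


def stringify_extension_columns(data):
    """Hypothetical helper: cast categorical and StringDtype columns to str."""
    if not isinstance(data, pd.DataFrame):
        return data

    data = data.copy()  # assumption: leave the caller's frame untouched

    # Guard on the attribute actually being used; absent before pandas 1.0.
    has_string_dtype = hasattr(pd, "StringDtype")

    for k, v in data.dtypes.items():
        if isinstance(v, pd.CategoricalDtype):
            data[k] = data[k].astype(str)
        elif has_string_dtype and isinstance(v, pd.StringDtype):
            data[k] = data[k].astype(str)
    return data


df = pd.DataFrame(
    {
        "name": pd.array(["a", "b", "c"], dtype="string"),
        "kind": pd.Categorical(["x", "y", "x"]),
        "n": [1, 2, 3],
    }
)
out = stringify_extension_columns(df)
```

Whether the `hasattr` guard is needed at all depends on the oldest pandas version the package supports; if that is already 1.0 or later, the plain `isinstance` check would be enough.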

@MaDufie MaDufie requested a review from timkpaine August 14, 2023 13:01
Member

@texodus texodus left a comment

Thanks for the PR @MaDufie! Looks good!

We should really be taking advantage of the internal dictionary for Categories, but we don't really provide any fast path loading for DataFrame yet and likely won't as we move to Arrow.
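
As a rough illustration of the "internal dictionary" mentioned above (pandas only, not something this PR or Perspective implements): a categorical column already stores one integer code per row plus a small array of unique values, which is the same shape as a dictionary-encoded column. The variable names below are made up for the example.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["x", "y", "x", "z"]))

# One small integer per row, indexing into the categories.
codes = s.cat.codes.tolist()

# The unique values, stored once.
dictionary = s.cat.categories.tolist()

# Decoding by hand gives back the same strings that astype(str) produces.
decoded = [dictionary[i] for i in codes]
```

Casting with `astype(str)`, as the merged change does, discards this encoding and materializes every row as a string, which is the trade-off the comment is pointing at.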

@texodus texodus merged commit 7f122b7 into finos:master Nov 9, 2023
13 checks passed
@texodus texodus added the bug Concrete, reproducible bugs label Nov 9, 2023
Labels
bug Concrete, reproducible bugs Python
Development

Successfully merging this pull request may close these issues.

Add support for StringDtype (available in Pandas >=1.0)
3 participants