ENH: Support large string types in the interchange protocol #56702

Closed
1 of 3 tasks
stinodego opened this issue Jan 2, 2024 · 1 comment · Fixed by #56772
Labels
Enhancement, Interchange (Dataframe Interchange Protocol)

Comments

@stinodego
Contributor

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The dataframe interchange protocol supports the large string type, but pandas' implementation of the protocol currently does not: it only handles 'small' (regular) strings. Pandas could support large strings as well.
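For readers unfamiliar with the distinction: in the Arrow memory layout, a regular string column indexes its UTF-8 data buffer with 32-bit offsets, while a large string column uses 64-bit offsets. The following is a minimal stdlib-only sketch of that difference, not pandas or pyarrow code. The helper names `encode_utf8_column` and `decode_utf8_column` are hypothetical, and the validity bitmap is left out for brevity.

```python
import struct


def encode_utf8_column(values, large=False):
    """Pack strings into an Arrow-style (offsets, data) buffer pair.

    Hypothetical helper for illustration only; no validity bitmap is built.
    """
    fmt = "q" if large else "i"  # int64 offsets (large string) vs int32 (string)
    data = bytearray()
    offsets = [0]
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))  # end position of this value in the data buffer
    offsets_buf = struct.pack(f"<{len(offsets)}{fmt}", *offsets)
    return offsets_buf, bytes(data)


def decode_utf8_column(offsets_buf, data, large=False):
    """Rebuild the list of strings from an (offsets, data) buffer pair."""
    fmt = "q" if large else "i"
    width = struct.calcsize("<" + fmt)  # 8 bytes or 4 bytes per offset
    count = len(offsets_buf) // width
    offsets = struct.unpack(f"<{count}{fmt}", offsets_buf)
    # Consecutive offset pairs delimit each value in the data buffer.
    return [data[start:end].decode("utf-8") for start, end in zip(offsets, offsets[1:])]


values = ["a", "bc", ""]
small_offsets, small_data = encode_utf8_column(values)
large_offsets, large_data = encode_utf8_column(values, large=True)
```

Both variants share an identical data buffer; only the width of the offsets buffer differs. A consumer therefore has to know which offset width to read, which is why the two types are reported as distinct dtypes.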

Feature Description

This currently works:

import pandas as pd

df = pd.Series([], name="a", dtype="string[pyarrow]").to_frame()
dfi = df.__dataframe__()
result = pd.api.interchange.from_dataframe(dfi)
# result is equal to df

This does not:

import pandas as pd

df = pd.Series([], name="a", dtype="large_string[pyarrow]").to_frame()
dfi = df.__dataframe__()
result = pd.api.interchange.from_dataframe(dfi)
# ValueError: Data type large_string[pyarrow] not supported by interchange protocol

Alternative Solutions

n/a

Additional Context

@MarcoGorelli FYI. Supporting large string would mean better interop with Polars and pyarrow.

@stinodego added the Enhancement and Needs Triage labels Jan 2, 2024
@MarcoGorelli added the Interchange label and removed the Needs Triage label Jan 2, 2024
@phofl
Member

phofl commented Jan 7, 2024

Yeah this got more urgent since we switched to large strings by default
