Add support for FixedSizeList from Arrow instead of crashing on from_arrow #8023

Closed
2 tasks done
marsupialtail opened this issue Apr 5, 2023 · 8 comments
Labels
bug Something isn't working python Related to Python Polars

Comments

@marsupialtail

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Arrow has a data type called FixedSizeList. Currently, creating a Polars DataFrame from an Arrow table that contains this type doesn't work: from_arrow crashes.

This data type is important for embeddings, among other uses.

Reproducible example

First, download the Parquet dataset here: https://drive.google.com/file/d/1sk9SM589YTjPFYI1RsxcdJf5WmmZJTzr/view?usp=sharing

Then try to read the Parquet file with polars.read_parquet.

Interestingly, converting to pandas first and then loading the pandas DataFrame into Polars works.

Expected behavior

Works.

Installed versions

0.16.15
@marsupialtail marsupialtail added bug Something isn't working python Related to Python Polars labels Apr 5, 2023
@marsupialtail marsupialtail changed the title Add support for FixedSizeList from Arrow? Add support for FixedSizeList from Arrow instead of crashing on from_arrow Apr 5, 2023
@marsupialtail
Author

@ritchie46 I would like to contribute a fix to this. Can you point me to some files I should look at?

@kylebarron

I think this is a duplicate of #4014.

@kylebarron

I've been meaning to add FixedSizeList support for a while but haven't found the time yet. For reference, Ritchie's earlier suggestion is here: https://discord.com/channels/908022250106667068/908022461751234570/1074262804183404574

Yes, add a fixed list type (feature gated) and then implement the Series trait. Should be pretty doable.

See these diffs: #6374 and #6679.

@kylebarron

This issue can probably be closed given #8943

@stinodego
Member

Yep, this now works fine:

```python
import pyarrow as pa

import polars as pl

values = pa.array([1, 2, 3, 4])
arr = pa.FixedSizeListArray.from_arrays(values, 2)
s = pl.from_arrow(arr)
print(s)
```

```
shape: (2,)
Series: '' [array[i64, 2]]
[
        [1, 2]
        [3, 4]
]
```

@claysmyth

I got this error after calling df.write_parquet():

PanicException: not yet implemented: Writing FixedSizeList to parquet not yet implemented

Is there a workaround?

@stinodego
Member

Could you please open a new issue for this?

@franz101

franz101 commented Mar 9, 2024

#14876

Development

No branches or pull requests

5 participants