This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Parquet regression: exceptions.ArrowErrorException: NotYetImplemented("Can't read Dictionary(UInt32, LargeUtf8, false) from parquet") #955

Closed
ritchie46 opened this issue Apr 19, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@ritchie46
Collaborator

CI round trips fail when writing and then reading dictionary arrays with UInt32 keys and LargeUtf8 values.

@jorgecarleitao jorgecarleitao self-assigned this Apr 23, 2022
@jorgecarleitao jorgecarleitao added the bug Something isn't working label Apr 23, 2022
@jorgecarleitao
Owner

The issue is that we do not have statistics structs for dictionary arrays - the dictionary is read, but the statistics are not.

@ritchie46
Collaborator Author

So I trigger that error when reading statistics? Should I check the error type and ignore the statistics when I hit this error?
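A minimal sketch of the "check the error type" option, for illustration only. `ReadError`, `ColumnStats`, `read_stats`, and `read_stats_lenient` are hypothetical stand-ins and not arrow2's actual API; the sketch only assumes that the statistics reader returns a distinguishable "not yet implemented" error for dictionary columns.

```rust
// Hypothetical stand-ins for illustration; these are NOT arrow2's real types.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ReadError {
    NotYetImplemented(String),
    Other(String),
}

#[derive(Debug, PartialEq)]
struct ColumnStats {
    null_count: Option<u64>,
}

/// Pretend statistics reader that fails for dictionary-encoded columns.
fn read_stats(is_dictionary: bool) -> Result<ColumnStats, ReadError> {
    if is_dictionary {
        Err(ReadError::NotYetImplemented(
            "Can't read Dictionary(UInt32, LargeUtf8, false) from parquet".to_string(),
        ))
    } else {
        Ok(ColumnStats { null_count: Some(0) })
    }
}

/// Treat "not yet implemented" as "no statistics available";
/// every other error is still propagated to the caller.
fn read_stats_lenient(is_dictionary: bool) -> Result<Option<ColumnStats>, ReadError> {
    match read_stats(is_dictionary) {
        Ok(stats) => Ok(Some(stats)),
        Err(ReadError::NotYetImplemented(_)) => Ok(None),
        Err(other) => Err(other),
    }
}
```

With this shape, a caller would get `Ok(None)` for a dictionary column and could fall back to reading without statistics, while genuine I/O or decoding errors still surface.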

Or should I check the schema first and, if it contains any dictionary column, skip parsing statistics altogether?
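The schema-first option could look roughly like the sketch below. The `DataType` enum here is a simplified, hypothetical model (not arrow2's real `DataType`), and `contains_dictionary` and `should_read_statistics` are illustrative names; the idea is just to walk the schema, including nested types, before attempting to parse statistics.

```rust
// Hypothetical, simplified model of a schema; NOT arrow2's real `DataType`.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum DataType {
    UInt32,
    LargeUtf8,
    // (key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
    List(Box<DataType>),
    Struct(Vec<DataType>),
}

/// Returns true if the type is a dictionary or contains one in a nested position.
fn contains_dictionary(data_type: &DataType) -> bool {
    match data_type {
        DataType::Dictionary(_, _) => true,
        DataType::List(inner) => contains_dictionary(inner),
        DataType::Struct(fields) => fields.iter().any(contains_dictionary),
        _ => false,
    }
}

/// Only attempt to parse statistics when no column involves a dictionary.
fn should_read_statistics(schema: &[DataType]) -> bool {
    !schema.iter().any(contains_dictionary)
}
```

Compared with matching on the error type, this avoids triggering the error at all, at the cost of skipping statistics for every column as soon as one dictionary column is present.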

@ritchie46
Collaborator Author

What did we do before this? Did we just not include the dictionary column in the statistics?

@jorgecarleitao
Owner

I am working on this. The statistics API is clumsy (we should just have stats as arrays, not individual values), so I will fix this together with the API.
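To make the "stats as arrays, not individual values" idea concrete, here is a rough sketch contrasting the two layouts. All names (`RowGroupStats`, `StatsArrays`, `to_arrays`, `candidate_row_groups`) are hypothetical and do not reflect the API that was eventually implemented; the row-group filtering at the end is only an assumed example of what a columnar layout makes convenient.

```rust
// Hypothetical layouts for illustration only; neither is arrow2's actual API.

/// "Individual values": one struct per row group.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupStats {
    null_count: Option<u64>,
    min: Option<i64>,
    max: Option<i64>,
}

/// "Stats as arrays": one vector per statistic, with one entry per row group.
#[derive(Debug, Default, PartialEq)]
struct StatsArrays {
    null_count: Vec<Option<u64>>,
    min: Vec<Option<i64>>,
    max: Vec<Option<i64>>,
}

/// Transpose per-row-group structs into per-statistic arrays.
fn to_arrays(groups: &[RowGroupStats]) -> StatsArrays {
    let mut out = StatsArrays::default();
    for g in groups {
        out.null_count.push(g.null_count);
        out.min.push(g.min);
        out.max.push(g.max);
    }
    out
}

/// Indices of row groups that may contain `value`, judged by min/max.
/// A row group with missing statistics is kept, since it cannot be ruled out.
fn candidate_row_groups(stats: &StatsArrays, value: i64) -> Vec<usize> {
    let mut keep = Vec::new();
    for (i, (min, max)) in stats.min.iter().zip(stats.max.iter()).enumerate() {
        let above_min = min.map_or(true, |m| m <= value);
        let below_max = max.map_or(true, |m| value <= m);
        if above_min && below_max {
            keep.push(i);
        }
    }
    keep
}
```

In the array layout, a missing statistic is simply a `None` entry, so a column type without statistics support would not need to abort the whole read.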

@ritchie46
Collaborator Author

All right. I'll wait a bit. :)

@ritchie46
Collaborator Author

Fixed. Confirmed by the CI run in pola-rs/polars#3181


2 participants