parquet reader in Kùzu 0.0.9 only supports snappy compression (Polars uses zstd) #2190
Comments
Hi Prashanth,

Okay, will plan on documenting that I should only use snappy compression upstream! Having less dependency on third parties is great! Polars also doesn't depend on
I think this is a bug/regression in version 0.0.9 with the parquet reader that depends on pyarrow.

Python version: 3.11.2
OS: macOS Ventura 13.5.2
Platform: M2 (arm64)
Polars version: 0.19.8
Pyarrow version: 13.0.0
Kùzu versions: 0.0.8 and 0.0.9
Issue

The code snippet below works for Kùzu 0.0.8, while it fails for version 0.0.9. The issue seems to be related to how the pyarrow reader decompresses and parses columns in the new version.

MRE

In the following code, I'm exporting a Polars DataFrame to parquet format. Note that I specify compression="zstd" (best compression performance), and also use_pyarrow=True to ensure that it uses the C++ implementation of pyarrow under the hood -- this is what I'll be using downstream to ingest the parquet into Kùzu.
Workaround

I can use compression="snappy" to get the code above to work with Kùzu 0.0.9.

However, the default write_parquet method in Polars with use_pyarrow=True prefers zstd compression over snappy, which the Polars docs say is older and less efficient at compression.

Why is it that code that used to work before doesn't work now in newer versions of kuzu and pyarrow?