Polars parquet writer much slower than pyarrow parquet writer #15455
Labels: A-io (Area: reading and writing data), A-io-parquet (Area: reading/writing Parquet files), bug (Something isn't working), P-low (Priority: low), python (Related to Python Polars)
Reproducible example
df.write_parquet("test.parquet", compression='snappy')
takes 92 seconds.
df.write_parquet("test2.parquet", compression='snappy', use_pyarrow=True)
takes 55 seconds.
Log output
No response
Issue description
At work we saw one of our pipelines take around 50 minutes to write a Parquet file. The difference compared to pyarrow, which took only about a minute and a half, was huge; see the logs below:
With Polars (50 minutes):
With pyarrow (1.5 minutes):
Expected behavior
Writing should be about as fast as it is with pyarrow.
Installed versions