LazyFrame OOM on CSVs with rows that contain large amounts of data #17354
Labels
A-io-csv (Area: reading/writing CSV files)
A-streaming (Related to the streaming engine)
accepted (Ready for implementation)
enhancement (New feature or an improvement of an existing feature)
P-goal (Priority: aligns with long-term Polars goals)
performance (Performance issues or improvements)
python (Related to Python Polars)
Reproducible example
Log output
No response
Issue description
Hello, I am opening this issue after running into OOM errors when applying filters/SQL context queries to CSV datasets in which each individual row contains a large amount of data.
I have included a reproducible example that triggers the problem on my 6 GB RAM machine.
When each row holds less than about 1 KB of data, I can process datasets larger than 100 GB without issue. As soon as rows grow larger (I can't pinpoint the exact threshold, but it happened at around 2 KB per row), an OOM occurs.
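For reference, a minimal sketch of the kind of pipeline described (the original snippet is not reproduced above; the file path, column names, row count, and payload size here are illustrative assumptions, not the author's values):

```python
# Minimal sketch, not the original repro: build a CSV whose rows are
# ~2 KB each, then run a lazy filter over it. PATH, column names, and
# sizes are assumptions for illustration.
import polars as pl

PATH = "wide_rows.csv"  # hypothetical path
N_ROWS = 5_000_000      # ~10 GB of data, well beyond a 6 GB RAM machine

# Each row carries one small key column and one ~2 KB payload column.
payload = "x" * 2048
with open(PATH, "w") as f:
    f.write("id,payload\n")
    for i in range(N_ROWS):
        f.write(f"{i},{payload}\n")

# A simple lazy filter: with ~1 KB rows this kind of query completes
# even on files far larger than RAM, but with ~2 KB rows it OOMs.
result = (
    pl.scan_csv(PATH)
    .filter(pl.col("id") % 1000 == 0)
    .collect()
)
print(result.shape)
```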
Expected behavior
LazyFrame computation should work even when each row contains a large amount of data.
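Concretely, a query like the one below should complete within memory limits by processing the file in chunks. This reuses the assumed names from the sketch above; `collect(streaming=True)` is the streaming-engine opt-in available in Polars versions current when this issue was filed:

```python
# Expected: the streaming engine spills/streams instead of exhausting RAM.
# PATH is the assumed file from the sketch above.
result = (
    pl.scan_csv(PATH)
    .filter(pl.col("id") % 1000 == 0)
    .collect(streaming=True)
)
print(result.shape)
```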
Installed versions