read_csv_batched not working when separator is included in the field #16953
Labels
- bug: Something isn't working
- needs triage: Awaiting prioritization by a maintainer
- python: Related to Python Polars
Checks
Reproducible example
Log output
Issue description
I am trying to read a relatively large CSV (30M+ rows) that cannot fit into memory, so I am using `read_csv_batched`. However, I noticed that `reader.next_batches(5)`, instead of returning the requested number of batch DataFrames (in our case 5), always returned a single DataFrame with all the rows inside (larger than the given batch size).

The issue seems to be caused by the `,` character appearing inside a field, but since the field is quoted with `"` it should be escaped and not affect the batch reader.

Note that this is a minimal example. In the real scenario we had `batch_size = 100,000` and still the whole CSV was read into a single DataFrame of 30M rows.

(Posted on SO first: https://stackoverflow.com/questions/78616907/polars-issue-with-read-csv-batched-when-separator-is-included-in-the-field)
Expected behavior
The expected behavior is the one shown in the `correct.csv` example, where 2 batches of size 1 are created:
Installed versions