pandas.read_csv leaks memory while opening massive files with chunksize & iterator=True #21516

Closed
Shirui816 opened this Issue Jun 18, 2018 · 6 comments

@Shirui816

Shirui816 commented Jun 18, 2018

I am using anaconda and my pandas version is 0.23.1. When dealing with a single large file, setting chunksize or iterator=True works fine and memory usage stays low. The problem arises when I try to deal with 5000+ files (file names are in filelist):

trajectory = [pd.read_csv(f, delim_whitespace=True, header=None, chunksize=10000) for f in filelist]

Memory usage rises very quickly and soon exceeds 20 GB. However, trajectory = [open(f, 'r')....] and reading 10000 lines from each file works fine.

I also tried the low_memory=True option, but it does not help. Both the engine='python' and memory_map=<some file> options solve the memory problem, but when I use the data with

X = np.asarray([f.get_chunk().values for f in trajectory])
FX = np.fft.fft(X, axis=0)

the multi-threading of MKL-FFT no longer works.
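For context, the pattern can be reduced to a minimal, self-contained sketch (using io.StringIO stand-ins for the real files, and sep=r'\s+' in place of delim_whitespace=True):

```python
import io

import numpy as np
import pandas as pd

# Three small in-memory stand-ins for the real trajectory files.
files = [io.StringIO("1 2\n3 4\n5 6\n") for _ in range(3)]
trajectory = [pd.read_csv(f, sep=r'\s+', header=None, chunksize=2) for f in files]

# One chunk per reader, stacked along a new leading axis, then FFT over that axis.
X = np.asarray([r.get_chunk().values for r in trajectory])
FX = np.fft.fft(X, axis=0)
print(FX.shape)  # (3, 2, 2)
```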

@gfyoung

Member

gfyoung commented Jun 18, 2018

  • This might be related to #21353
  • When you say you tried low_memory=True and it's not working, what do you mean?
  • You might need to check your concatenation when using engine='python' and memory_map=...
@Shirui816


Shirui816 commented Jun 19, 2018

Thanks for replying :) @gfyoung

I mean that after adding the low_memory=True option, as in

trajectory = [pd.read_csv(f, delim_whitespace=True, header=None, chunksize=10000, low_memory=True) for f in filelist]

the memory usage does not change compared to the case without this option.

@Shirui816


Shirui816 commented Jun 19, 2018

The environment is:
CentOS Linux release 7.4.1708 (Core)
Python 3.6.5 :: Anaconda custom (64-bit)
with pandas version 0.23.1

From #21353, I tracked the memory usage:

from sys import argv

import psutil
import pandas as pd

traj = []
i = 0
for f in argv[1:]:
    a = pd.read_csv(f, squeeze=False, header=None, delim_whitespace=True,
                    chunksize=10000, comment='#')
    traj.append(a)
    if not i % 100:
        print('%s th file, memory: ' % i, psutil.Process().memory_info().rss / 1024**2)
    i += 1

and the output:

0 th file, memory:  61.96484375
100 th file, memory:  214.66015625
200 th file, memory:  367.32421875
300 th file, memory:  520.046875
400 th file, memory:  674.76953125
500 th file, memory:  829.5
600 th file, memory:  982.22265625
700 th file, memory:  1134.9453125
800 th file, memory:  1287.66796875
900 th file, memory:  1442.3828125
1000 th file, memory:  1597.109375
1100 th file, memory:  1749.84765625
1200 th file, memory:  1932.57421875
1300 th file, memory:  2122.796875
1400 th file, memory:  2313.01953125
1500 th file, memory:  2503.2421875
...
4600 th file, memory:  8414.0234375
4700 th file, memory:  8604.24609375
4800 th file, memory:  8794.4765625
4900 th file, memory:  8984.6953125
5000 th file, memory:  9174.921875
5100 th file, memory:  9367.14453125
5200 th file, memory:  9557.37109375
5300 th file, memory:  9747.59375
5400 th file, memory:  9937.81640625
5500 th file, memory:  10128.04296875
5600 th file, memory:  10320.26953125

It turns out that memory increases by ~1.9 MB per file. The files used in this test are about 800 KB each.

Also tried malloc_trim(0) from #2659:

from sys import argv
from ctypes import CDLL, cdll

import psutil
import pandas as pd

cdll.LoadLibrary("libc.so.6")
libc = CDLL("libc.so.6")

traj = []
i = 0
for f in argv[1:]:
    libc.malloc_trim(0)
    a = pd.read_csv(f, squeeze=False, header=None, delim_whitespace=True,
                    chunksize=10000, comment='#')
    traj.append(a)
    if not i % 100:
        print('%s th file, memory: ' % i, psutil.Process().memory_info().rss / 1024**2)
    i += 1

The results are the same as above; the memory usage still increases quickly.

@gfyoung

Member

gfyoung commented Jun 19, 2018

Hmm...admittedly, this is the first time I've seen so many of these issues regarding memory leakage in read_csv, and I'm still uncertain whether it has to do with DataFrame or with read_csv.

cc @jreback

@Liam3851

Contributor

Liam3851 commented Jul 3, 2018

@Shirui816 You're appending the result of pd.read_csv to a list:

traj = []
for f in argv[1:]:
    a = pd.read_csv(f, squeeze=0, header=None, delim_whitespace=1, chunksize=10000, comment='#')
    traj.append(a)

Adding objects to a list means they can't be garbage collected. Thus you're keeping thousands of file handles and the related iterator objects open, so we would expect memory use to grow. I've confirmed that memory does not grow if you remove the traj.append call.
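This is easy to check directly: once nothing references a reader, CPython reclaims it. A minimal sketch with weakref (using an in-memory io.StringIO stand-in for a file, and sep=r'\s+' in place of delim_whitespace=True):

```python
import gc
import io
import weakref

import pandas as pd

# An in-memory stand-in for one trajectory file.
reader = pd.read_csv(io.StringIO("1 2\n3 4\n"), sep=r'\s+', header=None, chunksize=1)
ref = weakref.ref(reader)

del reader    # no list (or anything else) keeps the parser alive...
gc.collect()  # ...so it is reclaimed, buffers, handle and all
print(ref() is None)  # True
```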

If the issue is that memory use is growing faster than you expect based on the file sizes (per your comment, "It turns out that memory increases by ~1.9 MB per file. The files used in this test are about 800 KB each."), note that the above call does not actually read the file in all the way: because you're using the chunksize parameter, it creates a persistent iterator and file handle on the file. If you only want the first 10000 lines of the file, use

a = next(pd.read_csv(f, squeeze=0, header=None, delim_whitespace=1, chunksize=10000, comment='#'))

This will throw away the handle and the rest of the iterator object, leaving just your data.
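A runnable sketch of the same idea, with an in-memory stand-in for the file and an explicit close() to make the handle's lifetime obvious (sep=r'\s+' stands in for delim_whitespace=True):

```python
import io

import pandas as pd

csv_data = "1 2 3\n4 5 6\n7 8 9\n"  # stand-in for one trajectory file
reader = pd.read_csv(io.StringIO(csv_data), sep=r'\s+', header=None, chunksize=2)
first = next(reader)  # materializes only the first chunk (2 rows)
reader.close()        # release the parser; `first` still holds the data
print(first.shape)    # (2, 3)
```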

@Shirui816


Shirui816 commented Jul 4, 2018

@Liam3851 Thank you very much for the explanation. I increased the file size and re-ran the test; the memory gain per file was still about 1.9 MB. This handle is much larger than the one from the open function....emmmm.... Does the engine='python' option mean the iterator and file handle are held by Python, as with the open function? I am wondering why, after adding this option (and/or the memory_map=... option), the parallel acceleration of MKL doesn't work any more. I have no clue about this problem. Are there any suggested tests to find the reason? The code is in my first post: after creating a list of iterators in trajectory, take a chunk from each handle, then perform an FFT.

The environment is:
CentOS Linux release 7.4.1708 (Core)
Python 3.6.5 :: Anaconda custom (64-bit)
with pandas version 0.23.1
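For what it's worth, the one-chunk-per-file pattern from the previous comment can also be wrapped in a small helper with an explicit close(), so no parser or file handle outlives the call (a sketch; first_chunk is a hypothetical name, and sep=r'\s+' stands in for delim_whitespace=True):

```python
import os
import tempfile

import pandas as pd

def first_chunk(path, n=10000):
    """Read only the first n rows of a whitespace-delimited file,
    closing the parser (and its file handle) before returning."""
    reader = pd.read_csv(path, sep=r'\s+', header=None, chunksize=n, comment='#')
    try:
        return reader.get_chunk()
    finally:
        reader.close()

# Demo on a throwaway file.
tmp = tempfile.NamedTemporaryFile('w', suffix='.dat', delete=False)
tmp.write("# comment line\n1 2\n3 4\n5 6\n")
tmp.close()
chunk = first_chunk(tmp.name, n=2)
os.unlink(tmp.name)
print(chunk.shape)  # (2, 2)
```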
