read_json with lines=True not using buff/cache memory #17048

Closed
louispotok opened this Issue Jul 21, 2017 · 23 comments

louispotok (Contributor) commented Jul 21, 2017

I have a 3.2 GB json file that I am trying to read into pandas using pd.read_json(lines=True). When I run that, I get a MemoryError, even though my system has >12GB of available memory. This is Pandas version 0.20.2.

I'm on Ubuntu, and the free command shows >12GB of "Available" memory, most of which is "buff/cache".

I'm able to read the file into a dataframe by iterating over the file like so:

import itertools
from io import StringIO

import pandas as pd

dfs = []
with open(fp, 'r') as f:
    while True:
        # Pull the next 1000 lines; islice returns an empty list at EOF.
        lines = list(itertools.islice(f, 1000))

        if lines:
            lines_str = ''.join(lines)
            dfs.append(pd.read_json(StringIO(lines_str), lines=True))
        else:
            break

df = pd.concat(dfs)

You'll notice that at the end of this I have the original data in memory twice (in the list and in the final df), but no problems.

It seems that pd.read_json with lines=True doesn't use the available memory, which looks to me like a bug.

gfyoung added the IO JSON label Jul 21, 2017

gfyoung (Member) commented Jul 21, 2017

@louispotok: that behavior does sound buggy to me, but before I label it as such, could you provide a minimal reproducible example for us?

louispotok (Contributor) commented Jul 21, 2017

Happy to, but what exactly would constitute an example here? I can provide an example JSON file, but how would you suggest I reproduce the memory capacity and allocation on my machine?

gfyoung (Member) commented Jul 21, 2017

> I can provide an example JSON file, but how would you suggest I reproduce the memory capacity and allocation on my machine?

Just provide the smallest possible JSON file that causes this MemoryError to occur.

jreback (Contributor) commented Jul 21, 2017

The lines=True implementation is currently not designed this way. If you substitute your solution into the current implementation, does it pass the test suite?

jreback added the Performance label Jul 21, 2017

louispotok (Contributor) commented Jul 24, 2017

@gfyoung I'm still not sure exactly what would be most helpful for you here.

I tried doing head -n 10 path/to/file | testing.py, where testing.py contains df = pd.read_json(sys.stdin, lines=True), and then varied how many lines to pass.

Results: every million lines is about 0.8 GB, according to head -n 1000000 path/to/file | wc -c. I ran each of these a few times in varying orders, always with the same results.

  • 1M lines: success
  • 1.3M lines: success
  • 2M lines: got "Killed"; it also killed a watch in another terminal window with the message "unable to fork process: Cannot allocate memory"
  • 3M lines: got "MemoryError" (I had a watch running here too, no problems at all)
  • Full file: got "MemoryError"
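
For reference, a fleshed-out version of that test script might look like the following minimal sketch; the print of the shape is only there to confirm the load finished, and the explicit python invocation in the pipe is an assumption about how the script is run:

# testing.py -- read line-delimited JSON from stdin into a DataFrame
import sys

import pandas as pd

df = pd.read_json(sys.stdin, lines=True)
print(df.shape)

Invoked as, for example, head -n 1000000 path/to/file | python testing.py.
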
louispotok (Contributor) commented Jul 24, 2017

@jreback I think your question was for me, but I don't know how to do what you described. Are there instructions you could point me to?

gfyoung (Member) commented Jul 24, 2017

> 1.3M lines: success

Yikes! That's a pretty massive file. That certainly helps us understand what we would need to do to reproduce this issue.

gfyoung (Member) commented Jul 24, 2017

> I think your question was for me, but I don't know how to do what you described.

Here is the documentation for making contributions to the repository. Essentially, @jreback is asking whether you could incorporate the workaround from your issue description into the implementation of read_json, which you can find in pandas/io/json/json.py.

A quick glance there indicates what might be the issue: we're putting ALL of the lines into a list in memory! Your workaround might be able to address that.
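
As an illustration of that idea, here is a minimal sketch of the workaround folded into a helper function; the function name, the default chunk size, and the final pd.concat are choices made for the sketch, not the actual pandas internals:

from io import StringIO
from itertools import islice

import pandas as pd

def read_json_lines_chunked(file_obj, chunksize=1000):
    """Parse line-delimited JSON from an open file object chunk by chunk,
    so the full file is never held in memory as one giant list of lines."""
    frames = []
    while True:
        lines = list(islice(file_obj, chunksize))
        if not lines:
            break
        frames.append(pd.read_json(StringIO("".join(lines)), lines=True))
    return pd.concat(frames)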

louispotok (Contributor) commented Jul 24, 2017

Thanks! I added it for one of the possible input types. You can see it here. It passes all the existing tests, and I'm now able to use it to load that file.

I think this is much slower than the previous implementation, and I don't know whether it can be extended to other input types. We could make it faster by increasing the chunk size or doing fewer concats, but at the cost of more memory usage.

gfyoung (Member) commented Jul 24, 2017

> We could make it faster by increasing the chunk size or doing fewer concats, but at the cost of more memory usage.

I think it would make sense to add such a parameter. We have it for read_csv. Try adding that and let us know how it works! This looks pretty good so far.
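
For comparison, this is how the existing parameter behaves in read_csv: passing chunksize makes the reader yield DataFrames piece by piece instead of returning a single frame (the file name and chunk size below are placeholders):

import pandas as pd

# With chunksize, read_csv returns an iterator of DataFrames (a TextFileReader),
# so only one chunk needs to be in memory at a time.
pieces = []
for chunk in pd.read_csv("big_file.csv", chunksize=100000):
    pieces.append(chunk)

df = pd.concat(pieces, ignore_index=True)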

louispotok (Contributor) commented Jul 25, 2017

Using the chunksize param in read_csv returns a TextFileReader, though, right? Won't that be confusing?

gfyoung (Member) commented Jul 25, 2017

@louispotok: IMO, it would not, because there's more confusion when people try to pass the same parameters to one read_* function that they're used to passing to another, only to find out they don't work or don't exist. Thus, you would be doing all read_json users a service by adding a parameter similar to the one in read_csv. 😄

louispotok (Contributor) commented Jul 27, 2017

@gfyoung Makes sense. Here's the latest with the chunksize param.

I still don't know how to make it work on any of the other filepath_or_buffer branches, or really what input types would trigger those. I would need an explanation of what's happening there to extend this.

gfyoung (Member) commented Jul 27, 2017

> I would need an explanation of what's happening there to extend this.

Certainly. We accept three types of inputs for read_json:

  • file path (this option, by the way, is not clearly documented, so a PR to make this clearer is welcome!)
  • file object
  • valid JSON string

Your contribution would address the first two options. You have at this point addressed the first one. The second comes in the conditional that checks whether filepath_or_buffer has a read method. Thus, you should also add your logic under that check (we'll handle refactoring later).
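
A rough sketch of the three input forms as they worked in the pandas of this era (the inline records are invented, and the file-path call is commented out because it assumes a file on disk); the second form is the one gated by the check for a read method:

from io import StringIO

import pandas as pd

payload = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'

# 1. A file path:
# df = pd.read_json("records.jsonl", lines=True)

# 2. A file-like object (anything with a .read method):
df_from_buffer = pd.read_json(StringIO(payload), lines=True)

# 3. A valid JSON string passed directly (newer pandas versions warn about this form):
df_from_string = pd.read_json(payload, lines=True)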

louispotok (Contributor) commented Aug 3, 2017

Okay @gfyoung, thanks for your help. I added it to the conditional you mentioned as well. Latest here. Passes the tests.

I also changed the behavior so that if chunksize is not explicitly passed, we try to read it all at once. My thinking is that using chunksize changes the performance drastically, and it's better to let people make this tradeoff explicitly without changing the default behavior.

From here, what are the next steps? There's probably a bit of cleanup you'd like me to do -- let me know. Thanks again!

gfyoung (Member) commented Aug 3, 2017

@louispotok: Sure thing. Just submit a PR, and we'll be happy to review!

louispotok (Contributor) commented Aug 3, 2017

Here goes: #17168.

alessandrobenedetti commented Jul 14, 2018

Hi,
I am experimenting with JSON files of various sizes. I am using pandas read_json with lines=True and noticing very high memory usage in the parsing phase, even when using a chunksize of 10,000. For example:

  • Input JSON: 280 MB; memory usage: up to 2.6 GB; resulting DataFrame: 400 MB (because of dtypes, not much I can do about this)
  • Input JSON: 4 GB; memory usage: up to 28 GB; resulting DataFrame: 6 GB

It seems the memory needed to parse the JSON is far too much (and I'm not sure whether there are better ways to read big JSON files in pandas). Furthermore, this memory seems to remain allocated to the Python process. I am a Python newbie, so this may be perfectly fine; the memory may just be held by Python as a buffer to be reused when needed (it doesn't grow once the DataFrame starts getting processed), but it looks suspicious.
Let me know if you have noticed the same and found any tips or tricks for it!
Thanks in advance

louispotok (Contributor) commented Jul 15, 2018

@alessandrobenedetti

I've definitely experienced some of what you're describing.

First, the read_json function probably uses more memory overall than it needs to. I don't fully know why that is or how to improve it; that probably belongs in a separate issue if it's important to what you're doing.

Second, when lines=True, I think you're right that all that memory isn't actually being used; it's just not being released back to the OS, so the reported usage is a bit misleading.

Third, if you read with lines=True and a small chunksize, you should be fine either way.
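
For anyone landing here later, that chunked read looks roughly like this (the path and chunk size are placeholders):

import pandas as pd

# With lines=True and a chunksize, read_json yields DataFrames chunk by chunk,
# keeping peak memory close to the size of a single chunk.
chunks = pd.read_json("path/to/file.jsonl", lines=True, chunksize=10000)
df = pd.concat(chunks, ignore_index=True)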

alessandrobenedetti commented Jul 16, 2018

Hi @louispotok, thank you for the kind answer.
I just noticed that even a simpler approach such as:

import pandas as pd

with open(interactions_input_file) as json_file:
    data_lan = []
    for line in json_file:
        data_lan.append(pd.io.json.loads(line))

all_columns = data_lan[0].keys()
print("Size " + str(len(data_lan)))
interactions = pd.DataFrame(columns=all_columns, data=data_lan)

gives me similar memory usage.
I will stop the conversation here as it's slightly off topic.
Should I assume that parsing JSON lines in Python is just that expensive? We are talking about 5-7 times more RAM than the initial file...

rosswait commented Jul 23, 2018

I'm having a similar experience with this function as well, @alessandrobenedetti. I ended up regenerating my data to use read_csv instead, which uses a dramatically smaller amount of RAM.

alessandrobenedetti commented Jul 24, 2018

Thanks @rosswait, I have a small update in case it helps.

My file was heavily string- and list-based (each line was a JSON object with a lot of strings and lists of strings). As a matter of fact, those strings were actually integer IDs, so once I realized that I switched the strings to ints and the lists of strings to lists of ints. This brought the size of the JSON down from 4.5 GB to 3 GB and the memory usage down from 30 GB to 10 GB.
If I end up with stricter memory requirements I will definitely take a look at the csv option.
Thanks!
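
As a rough illustration of that kind of dtype cleanup after loading (the column names and values here are invented):

import pandas as pd

# Hypothetical frame where integer IDs arrived as strings.
df = pd.DataFrame({"user_id": ["101", "102", "103"],
                   "item_ids": [["7", "9"], ["3"], ["4", "8"]]})

# Scalar ID columns can be downcast to a compact integer dtype.
df["user_id"] = pd.to_numeric(df["user_id"], downcast="integer")

# List-valued columns stay object dtype either way; converting their contents
# mainly shrinks the source JSON rather than the in-memory DataFrame.
df["item_ids"] = df["item_ids"].apply(lambda ids: [int(i) for i in ids])

print(df.dtypes)
print(df.memory_usage(deep=True))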
