Conversation

@Alvaro-Kothe
Member


I adapted the test from this PR to Python 3.7 in commit d378852.

@pytest.mark.parametrize("chunksize", [1, 1.0])
@pytest.mark.parametrize("buffer", [BytesIO, StringIO])
def test_readjson_chunks(request, lines_json_df, chunksize, buffer):
    # Basic test that read_json(chunks=True) gives the same result as
    # read_json(chunks=False)
    # GH17048: memory usage when lines=True
    # GH#28906: read binary json lines in chunks

    if buffer == BytesIO:
        lines_json_df = lines_json_df.encode()

    unchunked = read_json(StringIO(lines_json_df), lines=True)
    with buffer(lines_json_df) as buf:
        reader = read_json(buf, lines=True, chunksize=chunksize)
        chunked = pd.concat(reader)

    tm.assert_frame_equal(chunked, unchunked)

Here is the test summary:

$ pytest pandas/tests/io/json/test_readlines.py::test_readjson_chunks -v
...
FAILED pandas/tests/io/json/test_readlines.py::test_readjson_chunks[BytesIO-1] - TypeError: ...
FAILED pandas/tests/io/json/test_readlines.py::test_readjson_chunks[BytesIO-1.0] - TypeError...
=========================== 2 failed, 2 passed, 8 warnings in 0.30s ===========================

The error only occurred when using a context manager.
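
For readers unfamiliar with what the `buffer` parametrization is checking, here is a minimal standard-library sketch of the expected behavior: chunked reading should yield the same records whether the lines come from a `StringIO` or a `BytesIO`. The helper `iter_json_chunks` is hypothetical and is not how pandas implements `read_json`; it assumes UTF-8 input and an integer `chunksize`.

```python
import json
from io import BytesIO, StringIO
from itertools import islice


def iter_json_chunks(buf, chunksize):
    """Yield lists of parsed records, `chunksize` lines at a time.

    Hypothetical helper for illustration only; not part of pandas.
    """
    while True:
        # File-like objects iterate line by line, so islice pulls the
        # next `chunksize` lines and leaves the rest for the next pass.
        lines = list(islice(buf, chunksize))
        if not lines:
            return
        chunk = []
        for line in lines:
            # Binary buffers yield bytes; decode so both buffer types
            # follow the same text-parsing path (UTF-8 is assumed here).
            if isinstance(line, bytes):
                line = line.decode("utf-8")
            if line.strip():
                chunk.append(json.loads(line))
        yield chunk


data = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'

with StringIO(data) as buf:
    from_text = [rec for chunk in iter_json_chunks(buf, 1) for rec in chunk]

with BytesIO(data.encode()) as buf:
    from_bytes = [rec for chunk in iter_json_chunks(buf, 1) for rec in chunk]

# Both buffer types should produce identical records.
assert from_text == from_bytes
```

Decoding per line keeps a single parsing path for both buffer types, which is the property the parametrized test asserts at the DataFrame level.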

Member

@rhshadrach rhshadrach left a comment

lgtm

@rhshadrach rhshadrach added the "IO JSON" (read_json, to_json, json_normalize) and "Needs Tests" (Unit test(s) needed to prevent regressions) labels Nov 16, 2025
@rhshadrach rhshadrach added this to the 3.0 milestone Nov 16, 2025
@rhshadrach rhshadrach merged commit 3508aae into pandas-dev:main Nov 16, 2025
50 of 51 checks passed
rustamali9183 pushed a commit to rustamali9183/pandas that referenced this pull request Nov 17, 2025
Development

Successfully merging this pull request may close these issues.

read_json doesn't work on binary files with lines=True and chunksize
