
read_json doesn't work on binary files with lines=True and chunksize #28906

@nlarusstone

Code Sample, a copy-pastable example if possible

Using test.jsonl with the following contents:

{"test1":10}
{"test2":"hi"}

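import pandas as pd

# Opening the file in binary mode ('rb') triggers the error once chunksize is set
with open('test.jsonl', 'rb') as f:
    reader = pd.read_json(f, lines=True, chunksize=1)
    for chunk in reader:
        print(chunk)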

Problem description

When I call read_json on a file opened in binary mode with lines=True and chunksize set, I get the following error:
TypeError: sequence item 0: expected str instance, bytes found

The same call works when I remove the chunksize parameter. My guess is that the chunked reading path doesn't account for the file handle returning bytes rather than str.
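
The error text is the one Python's str.join raises when it is handed bytes, which fits the guess above. The sketch below reproduces that message without pandas. The helper name join_lines is hypothetical, and whether pandas joins lines this way internally is an assumption, not something confirmed here.

```python
def join_lines(lines):
    """Join an iterable of text lines into one string (hypothetical helper)."""
    return "".join(lines)


# Lines as they come out of a file opened with mode 'rb'
raw_lines = [b'{"test1":10}\n', b'{"test2":"hi"}\n']

try:
    join_lines(raw_lines)
except TypeError as exc:
    # Joining bytes with a str separator fails on the first item
    message = str(exc)
    print(message)

# Decoding each line first makes the join succeed
decoded = join_lines(line.decode("utf-8") for line in raw_lines)
print(decoded)
```

If that is what happens, decoding the lines before they are joined would be enough to avoid the error.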

Expected Output

I expect to be able to read a file opened in binary mode using lines=True and chunksize.
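
Until that works, one possible workaround is to decode the binary handle before passing it to read_json. This is a minimal sketch, assuming the file is UTF-8 encoded. The function name read_jsonl_chunks is hypothetical, and io.TextIOWrapper is a standard-library swap-in rather than anything pandas recommends.

```python
import io

import pandas as pd


def read_jsonl_chunks(path, chunksize=1, encoding="utf-8"):
    """Yield DataFrame chunks from a JSON Lines file opened in binary mode.

    Hypothetical helper: wraps the binary handle so pandas receives str lines.
    """
    with open(path, "rb") as raw:
        # TextIOWrapper decodes bytes to str as the handle is read
        with io.TextIOWrapper(raw, encoding=encoding) as text:
            reader = pd.read_json(text, lines=True, chunksize=chunksize)
            for chunk in reader:
                yield chunk
```

With the test.jsonl file above, iterating over read_jsonl_chunks('test.jsonl') should give one single-row DataFrame per line.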

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None

pandas : 0.25.1
numpy : 1.15.2
pytz : 2019.1
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : None
pytest : 5.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.0.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
s3fs : 0.3.5
scipy : 1.2.0
sqlalchemy : 1.3.9
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

Labels

Enhancement; IO JSON (read_json, to_json, json_normalize); Needs Discussion (requires discussion from core team before further action)
