Description
Code Sample, a copy-pastable example if possible
Using test.jsonl with the following contents:
{"test1":10}
{"test2":"hi"}
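import pandas as pd

# Open the file in binary mode and request one line per chunk.
with open('test.jsonl', 'rb') as f:
    df = pd.read_json(f, lines=True, chunksize=1)
    # Iterating over the reader is what raises the TypeError.
    for chunk in df:
        print(chunk)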
Problem description
When I try to use read_json on a file opened in binary mode with both lines=True and chunksize set, I get the following error:
TypeError: sequence item 0: expected str instance, bytes found
It works when I remove the chunksize parameter. I'm guessing that the chunksize code path doesn't take into account that the file is opened in binary mode, so the lines it reads are bytes rather than str.
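A possible workaround, sketched under the assumption that the file is UTF-8 encoded (the report does not state the encoding): wrap the binary handle in io.TextIOWrapper so that read_json receives str lines. The temporary path and the variable names below are only for illustration.

```python
import io
import os
import tempfile

import pandas as pd

# Recreate the two-line test.jsonl from the report in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"test1":10}\n{"test2":"hi"}\n')

# Assumption: the data is UTF-8. The wrapper decodes bytes to str on the fly,
# so the chunked reader never sees bytes objects.
with open(path, "rb") as raw:
    text = io.TextIOWrapper(raw, encoding="utf-8")
    # Consume the reader while the underlying file is still open.
    chunks = list(pd.read_json(text, lines=True, chunksize=1))

for chunk in chunks:
    print(chunk)
```

If the file can simply be opened in text mode ('r' instead of 'rb'), that should avoid the problem as well; the wrapper is mainly useful when the binary handle comes from somewhere else.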
Expected Output
I expect to be able to read a file opened in binary mode using lines=True and chunksize together, getting one DataFrame per chunk.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
pandas : 0.25.1
numpy : 1.15.2
pytz : 2019.1
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : None
pytest : 5.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.0.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
s3fs : 0.3.5
scipy : 1.2.0
sqlalchemy : 1.3.9
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None