Bug with read_json from str reporting "Protocol not known" #43594

@Peterl777

read_json is supposed to parse JSON from a str or from a file-like object. If a value in the string contains the text http:, pandas raises a ValueError instead of parsing it.

Steps to reproduce:

>>> import json
>>> import pandas as pd
>>> s = '["http://www.example.com"]'
>>> json.loads(s)
['http://www.example.com']          # Proves the JSON is valid
>>> pd.read_json(s)
....
ValueError: Protocol not known: ["http
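
A possible explanation, offered as an assumption rather than something confirmed in this report: the error text suggests the string is being treated as a URL or path before it is ever parsed as JSON. The sketch below is not pandas code. Both function names are hypothetical, and it only illustrates how a substring check for "://" differs from a check anchored at the start of the string.

```python
import re

# Hypothetical illustration only; this is NOT the pandas implementation.
# A scheme must start the string: letters, then letters/digits/+/./-, then "://".
_SCHEME_AT_START = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def looks_like_url_naive(text):
    """Treat any string containing '://' as a URL (too permissive)."""
    return "://" in text


def looks_like_url_anchored(text):
    """Treat a string as a URL only if it begins with a scheme."""
    return _SCHEME_AT_START.match(text) is not None


s = '["http://www.example.com"]'
naive_result = looks_like_url_naive(s)        # the JSON literal is misclassified
anchored_result = looks_like_url_anchored(s)  # the JSON literal is left alone
```

Under this assumption, a naive check would take everything before "://" as the protocol name, which would match the `["http` fragment in the error message.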

Workaround 1:

Write the string to a file.

>>> with open('test.json', 'w') as f:
...     f.write(s)
>>> pd.read_json('test.json') 
                        0
0  http://www.example.com

Workaround 2:

Wrap the string into a file-like object.

>>> import io
>>> pd.read_json(io.StringIO(s))
                        0
0  http://www.example.com
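
Workaround 2 can be wrapped in a small helper so that callers never pass a raw JSON string to read_json. This is a minimal sketch: the name as_json_buffer is hypothetical (it is not part of pandas), and the up-front json.loads check is an added assumption so that malformed input fails early.

```python
import io
import json


def as_json_buffer(text):
    """Return a file-like object holding JSON text (hypothetical helper).

    Validating first means malformed input fails here with
    json.JSONDecodeError (a subclass of ValueError).
    """
    json.loads(text)          # validation only; the parsed result is discarded
    return io.StringIO(text)  # file-like wrapper, as in workaround 2


s = '["http://www.example.com"]'
buf = as_json_buffer(s)
# The buffer can then be passed on, e.g. pd.read_json(buf), as in workaround 2.
```

The helper parses the text twice (once to validate, once inside read_json), which is acceptable for small payloads but worth dropping for large ones.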

Versions

pandas 1.3.3 on Windows

>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit           : 73c68257545b5f8530b7044f56647bd2db92e2ba
python           : 3.9.7.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19043
machine          : AMD64
processor        : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_Australia.1252

pandas           : 1.3.3
numpy            : 1.21.2
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.2.4
setuptools       : 58.0.4
Cython           : 0.29.24
pytest           : None
hypothesis       : None
sphinx           : 4.2.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.27.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 2021.08.1
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : 2.7.3
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : None
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Labels: Bug, Duplicate Report, IO JSON
