-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
Closed
Labels
Duplicate ReportDuplicate issue or pull requestDuplicate issue or pull requestIO JSONread_json, to_json, json_normalizeread_json, to_json, json_normalize
Description
I work with a byte stream (from Azure DataLake https://pypi.python.org/pypi/azure-datalake-store/0.0.19 which only supports byte stream) that has a UTF-8 byte order mark, and want to read it into a data frame.
pandas.read_json fails.
For comparison, pd.read_csv(file, lines=True, encoding='utf-8-sig') works fine with a similar file
import pandas as pd
def skip_utf_8_bom(file):
bom = file.read(3)
# print(bom)
if bom != b'\xef\xbb\xbf': # undo read
file.seek(len(bom), 1)
path = 'sample-utf-8-sig.txt'
#works
with open(path, 'rb') as file:
skip_utf_8_bom(file)
df = pd.read_json(file, lines=True, encoding='utf-8-sig')
df
#fails
with open(path, 'rb') as file:
df = pd.read_json(file, lines=True, encoding='utf-8-sig')
dfProblem description
pd.read_json seems not to be able to process the encoding='utf-8-sig' parameter.
Expected behavior is that it allows to work with byte streams with an utf-8 byte order mark
Metadata
Metadata
Assignees
Labels
Duplicate ReportDuplicate issue or pull requestDuplicate issue or pull requestIO JSONread_json, to_json, json_normalizeread_json, to_json, json_normalize