[Xmlb, JSON] Issue with some special characters #8
Comments
After some tests with the test file, I found that the issue seems to be the decoder (utf-8):
The complete code starting at line 314 is as follows:

    with input_path.open(mode='r', encoding='utf-8-sig') as input_file:
        json_data = input_file.read()

The solutions would be to use different decoder settings or to ignore/replace problematic characters. Any decoder can produce the issue, depending on the input format. However, after some reading and tests, I came up with this (instead of the above code):

    from chardet import detect

    with input_path.open(mode='rb') as input_file:
        raw_data = input_file.read()
    json_data = raw_data.decode(encoding=detect(raw_data)['encoding']).replace("\r", "")  # The decoding sometimes seems to add an extra carriage return
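For comparison, here is a rough standard-library-only sketch of the other two options mentioned above (different decoder settings, and replacing problematic characters). It is not the project's code: the function name `read_text_lenient` and the list of candidate encodings are assumptions made up for illustration, and the right candidates depend on where the input files come from.

```python
from pathlib import Path

# Hypothetical candidate list, tried in order; adjust to the expected inputs.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252")


def read_text_lenient(input_path: Path) -> str:
    """Read a text file, trying several encodings before replacing bad bytes."""
    raw_data = input_path.read_bytes()
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw_data.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # Last resort: decode anyway, marking undecodable bytes with U+FFFD.
        text = raw_data.decode("utf-8", errors="replace")
    # Reading bytes skips newline translation, so normalise line endings here.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Unlike stripping every `"\r"`, this keeps line breaks in files that use bare carriage returns, and it never raises on undecodable input. The trade-off is that a wrong guess from the candidate list silently produces wrong characters, which detection with chardet may handle better.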
Steps to reproduce:
xmlb legal_360.json legal_360.engb
Error message:
Platform information:
(Note: Other JSON programs on my machine seem to have the same issue)
Things I tried:
Å, é, and á in the source JSON file -> worked
xmlb legal_360.xml legal_360.engb -> worked
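One possible explanation, offered as an assumption rather than a diagnosis of this particular file: if the JSON was saved in a single-byte encoding such as Windows-1252, the bytes for these three characters are not valid UTF-8, so a strict UTF-8 decoder rejects them. A small check of that idea:

```python
text = "Åéá"

# Assumption: the file was saved as Windows-1252 rather than UTF-8.
legacy_bytes = text.encode("cp1252")

try:
    legacy_bytes.decode("utf-8-sig")
    decoded_ok = True
except UnicodeDecodeError:
    # 0xC5 opens a two-byte UTF-8 sequence, but 0xE9 is not a continuation byte.
    decoded_ok = False

# The same characters saved as UTF-8 decode without any problem.
utf8_round_trip = text.encode("utf-8").decode("utf-8-sig")
```

Here `decoded_ok` ends up `False` while `utf8_round_trip` equals the original text, which would be consistent with the characters themselves being fine and the file encoding being the culprit.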