Write unescaped unicode characters (Japanese, Chinese, etc) in JSON module when "ensure_ascii=False" #66890

Closed
MichaelKuss mannequin opened this issue Oct 22, 2014 · 6 comments
Labels
extension-modules C modules in the Modules dir topic-unicode type-feature A feature request or enhancement

Comments

@MichaelKuss
Mannequin

MichaelKuss mannequin commented Oct 22, 2014

BPO 22701
Nosy @vstinner, @ezio-melotti, @bitdancer, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2015-02-10.08:43:39.384>
created_at = <Date 2014-10-22.18:55:23.569>
labels = ['extension-modules', 'type-feature', 'expert-unicode']
title = 'Write unescaped unicode characters (Japanese, Chinese, etc) in JSON module when "ensure_ascii=False"'
updated_at = <Date 2015-02-10.08:43:39.383>
user = 'https://bugs.python.org/MichaelKuss'

bugs.python.org fields:

activity = <Date 2015-02-10.08:43:39.383>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2015-02-10.08:43:39.384>
closer = 'serhiy.storchaka'
components = ['Extension Modules', 'Unicode']
creation = <Date 2014-10-22.18:55:23.569>
creator = 'Michael.Kuss'
dependencies = []
files = []
hgrepos = []
issue_num = 22701
keywords = []
message_count = 6.0
messages = ['229830', '229834', '230365', '230417', '230421', '231994']
nosy_count = 5.0
nosy_names = ['vstinner', 'ezio.melotti', 'r.david.murray', 'serhiy.storchaka', 'Michael.Kuss']
pr_nums = []
priority = 'normal'
resolution = 'works for me'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue22701'
versions = ['Python 2.7', 'Python 3.3']

@MichaelKuss
Mannequin Author

MichaelKuss mannequin commented Oct 22, 2014

When running the following:

> json.dump(['name': "港区"], myfile.json, indent=4, separators=(',', ': '), ensure_ascii=False)

the function escapes the unicode, even though I have explicitly asked to not force to ascii:
\u6E2F\u533A

By changing "__init__.py" such that the fp.write call encodes the text as utf-8, the output json file displays the human-readable text required (see below).

OLD (starting line 167):

if (not skipkeys and ensure_ascii and
    check_circular and allow_nan and
    cls is None and indent is None and separators is None and
    encoding == 'utf-8' and default is None and not kw):
    iterable = _default_encoder.iterencode(obj)
else:
    if cls is None:
        cls = JSONEncoder
    iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
        check_circular=check_circular, allow_nan=allow_nan, indent=indent,
        separators=separators, encoding=encoding,
        default=default, **kw).iterencode(obj)
for chunk in iterable:
    fp.write(chunk)

NEW:

if (not skipkeys and ensure_ascii and
    check_circular and allow_nan and
    cls is None and indent is None and separators is None and
    encoding == 'utf-8' and default is None and not kw):
    iterable = _default_encoder.iterencode(obj)
    for chunk in iterable:
        fp.write(chunk)
else:
    if cls is None:
        cls = JSONEncoder
    iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
        check_circular=check_circular, allow_nan=allow_nan, indent=indent,
        separators=separators, encoding=encoding,
        default=default, **kw).iterencode(obj)
    for chunk in iterable:
        fp.write(chunk.encode('utf-8'))

@MichaelKuss MichaelKuss mannequin added extension-modules C modules in the Modules dir topic-unicode type-feature A feature request or enhancement labels Oct 22, 2014
@bitdancer
Member

If I fix your example so it runs:

json.dump({'name': "港区"}, open('myfile.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)

I get the expected output:

rdmurray@pydev:~/python/p34>cat myfile.json
{
    "name": "港区"
}

That example won't work in python2, of course, so you'd have to show us your actual code there.
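For reference, here is a minimal Python 3 sketch of the same call with the file opened in text mode and an explicit `encoding="utf-8"`, so the result does not depend on the platform's default encoding. The temporary path is an assumption made for illustration only; it is not taken from the reporter's script.

```python
import json
import os
import tempfile

# Hypothetical output path in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "myfile.json")

# Text mode plus an explicit encoding: json.dump writes str chunks and the
# file object encodes them as UTF-8.
with open(path, "w", encoding="utf-8") as fp:
    json.dump({"name": "港区"}, fp, indent=4, separators=(",", ": "),
              ensure_ascii=False)

# Read the file back to check that the characters were written unescaped.
with open(path, encoding="utf-8") as fp:
    text = fp.read()

print(text)
```

With `ensure_ascii=False` the non-ASCII characters are passed through to the file object as-is, so whether they end up readable on disk is decided by how the file was opened.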

@ezio-melotti
Member

The example works for me with both python 2 and 3. I'm going to close this in a while if OP doesn't reply.

$ python2 -c "import json; json.dump({'name': '港区'}, open('py2.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py2.json
{
    "name": "港区"
}
$ python3 -c "import json; json.dump({'name': '港区'}, open('py3.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py3.json
{
    "name": "港区"
}

@MichaelKuss
Mannequin Author

MichaelKuss mannequin commented Nov 1, 2014

Pardon the delay - this json dump function is embedded in a much larger script, so it took some untangling to get it running on Python 3.3, and scrub some personal identifying info from it. This script also does not work in Python 3.3:

  File "C:/Users/mkuss/PycharmProjects/TestJSON\dump_list_to_json_file.py", line 319, in dump_list_to_json_file
    json.dump(addresses, outfile, indent=4, separators=(',', ': '))
  File "C:\Python33\lib\json\__init__.py", line 184, in dump
    fp.write(chunk)
TypeError: 'str' does not support the buffer interface

In python 2.7, I still get escaped unicode when I try writing this dictionary using json.dump, so the work-around that I pasted originally is how I'm choosing to accomplish the task for now.

If you'd like, I can spend more time debugging the problem I'm hitting when running the script in Python 3.3, but it may be next week before I have enough time to solve it. THANKS --mike

@bitdancer
Member

That error message indicates you've opened the output file in binary mode instead of text mode.
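A small Python 3 sketch of that difference, using in-memory streams as stand-ins for a file opened with `'wb'` versus `'w'` (the stream objects are an assumption for illustration; the reporter's actual `outfile` is not shown in the thread). The wording of the `TypeError` differs between Python versions, so the message is printed rather than compared.

```python
import io
import json

data = {"name": "港区"}

# A binary stream only accepts bytes. json.dump writes str chunks, so the
# write fails with TypeError, as in the traceback above.
binary_stream = io.BytesIO()
try:
    json.dump(data, binary_stream, ensure_ascii=False)
    binary_failed = False
except TypeError as exc:
    binary_failed = True
    print("binary mode:", exc)

# A text stream accepts str, so the same call succeeds.
text_stream = io.StringIO()
json.dump(data, text_stream, ensure_ascii=False)
text_result = text_stream.getvalue()
print("text mode:", text_result)
```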

@serhiy-storchaka
Member

It looks like either you opened the file with the backslashreplace error handler, or you ran Python with PYTHONIOENCODING set to use the backslashreplace error handler.
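A hedged Python 3 sketch of how that can happen: when the text layer uses an ASCII encoding with `errors="backslashreplace"`, the characters that `json.dump` passes through unescaped are escaped again by the file object itself. The in-memory stream and the `ascii` encoding are assumptions chosen to reproduce the effect; the thread does not confirm this is what the reporter's environment did.

```python
import io
import json

raw = io.BytesIO()

# Text wrapper that cannot represent non-ASCII characters and replaces them
# with backslash escapes instead of raising UnicodeEncodeError.
fp = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")

# json.dump emits the characters unescaped; the escaping happens in fp.
json.dump({"name": "港区"}, fp, ensure_ascii=False)
fp.flush()

written = raw.getvalue()
print(written)
```

In this setup the escapes in the output come from the I/O layer, not from the `json` module, which would explain seeing escaped text even with `ensure_ascii=False`.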

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022