Add an option to json.tool to bypass non-ASCII characters. #71600
This patch adds a command line option "--no-escape" that allows json.tool to display non-ASCII characters, e.g.:
$ echo '"測試"' | python -m json.tool
"\u6e2c\u8a66"
$ echo '"測試"' | python -m json.tool --no-escape
"測試"
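For reference, the escaping shown above corresponds to the ensure_ascii parameter of json.dumps, which is real and defaults to True. The snippet below is only a small illustration of that parameter, not the patch itself; the variable names are made up for the example.

```python
import json

text = "測試"

# Default behaviour: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(text)

# With ensure_ascii=False the characters are emitted unchanged.
raw = json.dumps(text, ensure_ascii=False)

# Both forms decode back to the same Python string.
assert json.loads(escaped) == json.loads(raw) == text
```

Here escaped is the ASCII-only form "\u6e2c\u8a66" and raw is "測試", matching the two command outputs above.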
Maybe name it --no-insure-ascii? Or --insure-ascii=no/yes (or true/false).

If the arguments should be aligned with those in dump/load, then maybe "--no-ensure-ascii" is an option?

Sorry, yes, that's what I meant. I think it will make it easier to understand and remember if the option uses the same terminology as the function.

Use "--no-ensure-ascii" instead.

The patch needs tests and documentation.
I'd go with

Ok, I'll update it later.
If I'm not misreading your comment, this will change the original behavior, right? (because options.no_ensure_ascii will be set to True by default)

Added doc and test.
Test fails on non-utf8 locale.
$ LC_ALL=en_US ./python -m test.regrtest -v -m test_no_ensure_ascii_flag test_json
...
======================================================================
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/test/test_json/test_tool.py", line 115, in test_no_ensure_ascii_flag
    self.assertEqual(out.splitlines(), b'"\\u6e2c\\u8a66"\n'.splitlines())
AssertionError: Lists differ: [b'"\\u00e6\\u00b8\\u00ac\\u00e8\\u00a9\\u00a6"'] != [b'"\\u6e2c\\u8a66"']
First differing element 0:
----------------------------------------------------------------------
Test passed with LC_ALL=en_US and LC_ALL=en_US.UTF-8. I've tried to use locale.getdefaultlocale(), but it seems the output string will vary between locales.
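The escapes in the failing assertion look like the UTF-8 bytes of the test string read back one byte per character. The snippet below reproduces that string under the assumption that the en_US locale decodes input as Latin-1 (ISO-8859-1); that assumption is not confirmed anywhere in this thread, and the variable names are illustrative only.

```python
import json

original = "測試"

# UTF-8 encodes the two characters as six bytes: e6 b8 ac e8 a9 a6.
utf8_bytes = original.encode("utf-8")

# Assumption: a Latin-1 locale decodes each of those bytes as its own character.
misdecoded = utf8_bytes.decode("latin-1")

# Escaping the six mis-decoded characters gives one \u00XX escape per byte.
dumped = json.dumps(misdecoded)
```

Under that assumption, dumped is "\u00e6\u00b8\u00ac\u00e8\u00a9\u00a6", the same value as the left-hand side of the AssertionError above.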
Assuming you also change "ensure_ascii = not options.no_ensure_ascii" to "ensure_ascii = options.no_ensure_ascii", no, it won't change the original behavior. That way you won't need to invert the value of
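One way to get a flag that needs no inversion is argparse's store_false action, whose default is True when the flag is absent. This is a hedged sketch only: build_parser, the prog name, and dest="ensure_ascii" are choices made for this example and are not taken from the patch or from the suggestion quoted above.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the json.tool argument parser.
    parser = argparse.ArgumentParser(prog="json-tool-sketch")
    # store_false: the attribute is True unless the flag is given,
    # so the value can be passed to json.dump(..., ensure_ascii=...) directly.
    parser.add_argument(
        "--no-ensure-ascii",
        dest="ensure_ascii",
        action="store_false",
        help="disable escaping of non-ASCII characters",
    )
    return parser


default_options = build_parser().parse_args([])
flag_options = build_parser().parse_args(["--no-ensure-ascii"])
```

Without the flag, default_options.ensure_ascii is True, so the existing escaping behavior is kept; with the flag it is False.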
The last change just sweeps the problem under the rug. Currently json.tool never fails with valid data. But with the --no-ensure-ascii option it can fail when it outputs a string that is not encodable with the locale encoding. Everything works in common cases in a common UTF-8 environment, but it can fail unexpectedly in a nonstandard environment. It would be better to output encodable characters as is and represent unencodable characters with \uXXXX escapes.
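A rough sketch of that fallback idea, assuming the output encoding is ASCII-compatible so that only characters inside JSON strings can be unencodable. The helper name dumps_for_encoding is hypothetical and this is not how json.tool is implemented; it post-processes the result of json.dumps(..., ensure_ascii=False).

```python
import json


def dumps_for_encoding(obj, encoding):
    """Serialize obj, escaping only characters that `encoding` cannot represent."""
    text = json.dumps(obj, ensure_ascii=False)
    pieces = []
    for ch in text:
        try:
            ch.encode(encoding)
            pieces.append(ch)  # encodable: keep the character as is
        except UnicodeEncodeError:
            # Not encodable: emit \uXXXX per UTF-16 code unit, so characters
            # outside the BMP become a surrogate pair as JSON requires.
            units = ch.encode("utf-16-be", "surrogatepass")
            for i in range(0, len(units), 2):
                code_unit = int.from_bytes(units[i:i + 2], "big")
                pieces.append("\\u{:04x}".format(code_unit))
    return "".join(pieces)


mixed = dumps_for_encoding("é測", "latin-1")
```

With a Latin-1 target, "é" is kept and "測" is escaped, while with a UTF-8 target nothing is escaped. The result still parses back to the original value with json.loads.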
--no-ensure-ascii option. #17472