-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
Trailing whitespace in json dump when using indent #60537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
When using the indent option in json.JSONEncoder, extra trailing whitespace (preceding the newline) is added to list and dict items. For example: >>> import json
>>> json.dumps(['foo', 'bar'], indent=1)
'[\n "foo", \n "bar"\n]' Notice the blank space between "foo", and \n EXPECTED OUTPUT: '[\n "foo",\n "bar"\n]' |
Thanks for the report and patch. I think we'll need a fix for the C version, too (unless it doesn't have the bug). |
I think the patch adds tests to a mixin class used to test the Python and the C code, so if it passes then no fix is needed in C. |
It does, and the C accelerator doesn't seem to be used for this.
|
To Ezio's first point ... yes, I was wondering if trailing whitespace should be left alone in the case of an explicitly defined separator. However, this behavior seems a little odd to me as I don't see a use case for it (happy to change the patch if there are differing opinions on this). Keep in mind that the patch does retain the *leading* white space (for both default and explicit separators) - only trailing whitespace is removed prior to newlines. |
Nice conflict case for explaining why perfect API is not here. Because ', ' is now used as an item separator inline and separator for items that span multiple lines, we get trailing whitespace. If ',' is used, items will collide. To resolve this conflict democratically, a new option should be added, but.. If it is a JSON module then what separator are we talking about in the first place? It is not CSV, other separators are not in specification, so why is it there? The only reason is to get the most compact representation. I doubt anybody uses any value except (',', ':'), so the As for this patch - there should be no doubt that it should be applied to all branches. The logic works only when indent is in force and that already breaks custom separators by inserting indents and newlines. Other argument that nobody will do JSON parsing with whitespace analysis, and even if they do - they'll still need to implement workaround for newlines. |
I don't think the code should fixed. You can use separators=(',', ': ') with indentation. Perhaps the documentation should contains such suggestion. |
Would you mind providing some counter-arguments? |
Trailing whitespace produce visual warnings in diff comparison tools. If you have to read docs on how fix every piece of code that produces readable JSON to avoid this warnings, it is a very-very bad user experience. The argument that by default the output with indent option should be human-friendly. Don't you agree with that? |
What you think about default separators will be (', ', ': ') if indent is None and (',', ': ') if indent is not None? This will allow indent options be human-friendly and keep the flexibility to specify arbitrary separators if they needed. |
The proposed patch changes default *separators* value to (',', ': ') if *indent* is not None. |
',' makes lists less readable, directly the opposite of what the *indent* option is for. The *separators* variable is a insufficient solution, because it was not designed to work with indents. Therefore the original solution to strip trailing space when indent is active is a perfect intuitive default. |
Patch updated. Changing note about YAML is not needed, JSON lefts YAML-compatible even with identation. |
',' used by default only when indentation used. It increases readability. |
Of course, this is a new feature and should be only in 3.4. |
Do you mean that when indentation is used, the separator only appears on line ends? Otherwise I can see how,is,that,more readable, than, that.
No of course. =) Trailing whitespace is a usability bug. It doesn't affect anything. Produced JSON stays JSON. |
I agree with Serhiy, this should be going in 3.4 only. The reason is that people might rely on exact output and it's not nice to break their code in a bugfix release. |
For older Python we need add in in the documentation the suggestion to use "separators=(',', ': ')" when indentation used. |
Apparently: ./python -c 'from json import dumps; print(dumps([[[1,2,3]]*3]*3, indent=2))' |
This code has a very bad smell. Between supporting people who did that and teaching them a lesson I choose the latter, but I bet the situation like you describe either didn't exist or easily fixable on their side. If it's not fixable, then there is always a virtualenv and previous versions. That's about sympathy. Now about technical side of conservative development. There is no promise of binary compatibility for pretty-printed data. Python never made pretty-prints a serialization format. If you insist that people rely on this behavior, let's document it, because at least on this tracker there are already two people who have questions about that. |
"Code smell" and "Easily fixable on their side" are not an arguments that apply here. We have a strong backward compatibility policy that strives not to break working code in bug fix releases. And yes, this means that there are sometimes bugs that we only fix in feature releases. |
The problem with policy and 'common sense' is that not everybody can feel that 'common sense', especially when there is no time to go deep into the issue. Policy is a quick and good 42 no matter that the matter is. The problem that it is also a filter, which puts a barrier in front of all reasonable arguments. You have either to transform yourself to live behind the barrier or to leave. |
Here is a patch which adds a suggestion to use appropriate separators with indent. It also use they in Lib/json/tool.py. I suggest this patch only for old Python, up to 3.3. For 3.4 this is not needed if my previous suggestion will be accepted. |
This was the reason that made me consider backporting this on the other branches. After all I expect this feature to be used from the terminal, while printing JSON in a human-friendly format and in other situations were trailing spaces would have little or no importance. IOW the annoyance of having trailing spaces if it doesn't get fixed evens out the annoyance of having a possibly unwanted change of behavior if it does get fixed. Serhiy patches look good to me (modulo a couple of minor typos), so I will probably apply them as soon as I get a chance. |
Yes, I think the risk of breaking doctests (and breaking them in a very mysterious way...trailing spaces) is high enough that we shouldn't backport the fix, unfortunately. |
I agree with RDM. |
Please left 2.7, 3.2 and 3.3 for documentation changes (the second patch). |
New changeset 78bad589f205 by Ezio Melotti in branch '2.7': New changeset 2a5b183ac3cd by Ezio Melotti in branch '3.2': New changeset 9d6706b6b482 by Ezio Melotti in branch '3.3': New changeset 8b30a764b58d by Ezio Melotti in branch 'default': New changeset e63ac05ccfa8 by Ezio Melotti in branch 'default': |
New changeset e7919cf9b5e5 by Ezio Melotti in branch 'default': |
I committed the patches leaving out the json.tool changes. |
New changeset 1e3b01c52aee by Ezio Melotti in branch 'default': |
Ezio, I noticed in the check-in mail that you added the note before the description of the separators argument (specifically, the descrition of what it does when it’s a tuple). Maybe it would flow best to move the note after that part? Also, I don’t think this is important enought that it warrants a note directive. Regular text would have looked good to me. Kudos for the patch otherwise; I love it when we can at least give workarounds and recipes in stable versions’ docs. |
The note is just after the description of the indent argument, because it's relevant only when a indent value is specified.
I considered this, but I think it's ok. I built the doc and the note is not too bad. The other two alternatives were: 1) use a normal paragraph, but that would have broken the flow, since each paragraph describes a different arg; 2) write it in the same paragraph of "indent", but that would have made it less visible. FTR I also closed the related issues bpo-16476 and bpo-16549. |
All right. :) |
New changeset bb43e8e05a7c by R David Murray in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: