difflib: mention other "problematic" characters in documentation #87855
The documentation currently says of the "?" output lines: "These lines can be confusing if the sequences contain tab characters." From first-hand experience :-), I can assure you they are also very confusing with other kinds of whitespace characters, such as spaces and line breaks. I'd like to add the other characters to the documentation.
The quote is in the following section.
Lines beginning with "?" are entirely synthetic: they were not present in either input. So that's what that part means. I'm not clear on what else could be materially clearer without greatly bloating the text. For example,

>>> d = difflib.Differ()
>>> for L in d.compare(["abcefghijkl\n"], ["a cxefghijkl\n"]):
...     print(L, end="")
- abcefghijkl
?  ^
+ a cxefghijkl
?  ^ +

The "?" lines guide the eye to the places that differ: "b" was replaced by a blank, and "x" was inserted. The marks on the "?" lines are intended to point out exactly where changes (substitutions, insertions, deletions) occurred.

If the second input had a tab instead of a blank, the "+" wouldn't _appear_ to be under the "x" at all. It would instead "look like" a long string of blanks was between "a" and "c" in the second input, and the "+" would appear to be under one of them somewhere near the middle of the empty space. Tough luck. Use tab characters (or any other kind of "goofy" whitespace) in input to visual tools, and you deserve whatever you get :-)
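To make the tab case concrete, here is a small sketch (not part of the original discussion). It reruns the example with a tab in place of the blank and compares where the "+" marker sits by character index with where "x" would land once the tab is expanded. The helper name `diff_lines` and the 8-column tab stop are assumptions for illustration only; real terminals and editors may use other tab widths.

```python
import difflib

def diff_lines(a, b):
    """Return Differ output for two single-line inputs as a list of strings."""
    return list(difflib.Differ().compare([a], [b]))

# Same inputs as in the example above, but with a tab where the blank was.
lines = diff_lines("abcefghijkl\n", "a\tcxefghijkl\n")
plus_line = next(l for l in lines if l.startswith("+ "))
guide_line = lines[lines.index(plus_line) + 1]  # the "?" line that follows it

# By character index, the "+" marker sits exactly under the inserted "x".
marker_index = guide_line.index("+")
x_index = plus_line.index("x")

# Once the tab is expanded for display, "x" moves right; the marker does not.
TAB_WIDTH = 8  # assumption: tab stops every 8 columns
x_column = plus_line.expandtabs(TAB_WIDTH).index("x")

print("marker index:", marker_index)
print("index of x:  ", x_index)
print("column of x: ", x_column)
```

The marker is placed by counting characters, so it matches the index of "x" but not the column where "x" is drawn after tab expansion.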
After 3+ years of GitHub I did not remember that B&W diffs use lines with change position markers, and in particular that they (often? always?) start with ?s. IDLE also uses color to mark positions (for syntax errors). The following would have been clearer to me and likely to people who have never seen such lines: "Location marker lines beginning with ‘?’ use symbols to guide the eye to intraline differences."

Tim, you seem to still think that tabs are especially problematical. Jürgen, without evidence otherwise, I agree with this. Adding other chars to the sentence would dilute the current focus on tabs. Hence my request for examples to justify doing so. Sorry I was not as clear as I could and should have been.
First I need to apologize for not providing more info when I created the issue. Initially, I did not even plan to create an issue, and thought the PR with the context of the current documentation would be sufficient information. Thanks for taking your time anyway! Also, thanks to Tim for explaining the meaning of the question mark in detail. When I read the documentation, I also had to pause a moment to understand the sentence. But I agree with Tim, it is hard to explain it better without getting much more verbose.

My initial reason to read (and then to update) the documentation was an output of pytest, which left me puzzled.

E AssertionError: assert 'ROOT: No tox...ith_no_t0/p\n' == 'ROOT: No tox..._with_no_t0/p'

Here is the screenshot and some discussion:

Using a snippet similar to Tim's, here is a minimal example:

for L in d.compare(["abcdefghijkl"], ["abcdefghijkl\n"]):
    print(L)
?             +

The output is usually pretty obvious, so I never actually noticed the question mark, except when whitespace characters are involved. I was then told that pytest uses difflib, and I was kindly pointed to the Python documentation. As only the tab character was listed, I thought it would be a good idea to add the other whitespace characters as well. After Tim's explanation, I see that tabs could be especially confusing, while all whitespace characters are on a normal level of confusing :-), especially at the end of the diff. I certainly won't forget what I learned, but maybe my proposal helps one fellow Python user or another.
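For anyone hitting the same trailing-newline puzzle, one way to see what the "?" line is pointing at is to print each diff line with `repr()`, so the otherwise invisible newline shows up as `\n`. This is only a sketch built from the inputs above; the variable names are mine.

```python
import difflib

a = ["abcdefghijkl"]
b = ["abcdefghijkl\n"]  # differs only by a trailing newline

result = list(difflib.Differ().compare(a, b))

# repr() makes the invisible character visible in the "+" line, which is
# the character the "+" marker on the "?" line refers to.
for line in result:
    print(repr(line))
```

With plain `print(line)`, the marker appears to point at empty space just past the visible end of the text, which is what makes this case puzzling at first sight.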
I have an alternate replacement: "These lines can be confusing if the sequences contain tab characters or other characters that result in the indicator symbols in these lines being mislocated." Or leave the current sentence as is.

Explanation with the details omitted from the above: Tab is an example of a character that is either displayed as a variable space or a fixed double space ('\t') or larger. If we were to make a change, we should mention, as above, that many non-ascii chars are just as confusing as tabs.

In your example above, the marker at least points to the right place. It correctly indicates some difference beyond the visible end: a non-visible whitespace difference.
Terry, your suggested replacement statement looks like an improvement to me. Perhaps the longer explanation could be placed in a footnote. Note that I'm old ;-) I grew up on plain old ASCII, decades & decades ago, and tabs are in fact the only "characters" I've had a problem with in doctests. But then, e.g., I never in my life used goofy things like ASCII "form feed" characters, or NUL bytes, or ... in text either. I don't use Unicode either, except to the extent that Python forces me to when I'm sticking printable ASCII characters inside string quotes ;-)