assertion diffs for multiline-string can become unreadable soup #6757

wimglenn · 2020-02-18T06:10:26Z

The diff for multiline strings is not great when a string from capsys differs from the expected value only by indentation level and/or some leading/trailing whitespace. I think we can do better.

Here's a reproducer. I just dumped some output of fortune | cowsay into a raw string for example purposes. The "expected" value is intentionally wrong to demonstrate the issue: it adds 4-space indentation, i.e. it was hardcoded as a multiline string within a function body, but textwrap.dedent was never applied to it.

actual = r"""
 ________________________________________
/ You have Egyptian flu: you're going to \
\ be a mummy.                            /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
"""


def test_cowsay_output(capsys):
    print(actual)
    expected = r"""
     ________________________________________
    / You have Egyptian flu: you're going to \
    \ be a mummy.                            /
     ----------------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||
    """
    captured = capsys.readouterr()
    assert captured.out == expected

Even when running with increased verbosity (-vv), it's often hard to figure out what the issue is without stepping into a debugger. The assertion explanation starts with one long line, which the terminal may wrap over several lines:

E       assert ('\n'\n ' ________________________________________\n'\n "/ You have Egyptian flu: you're going to \\\n"\n '\\ be a mummy.                            /\n'\n ' ----------------------------------------\n'\n '        \\   ^__^\n'\n '         \\  (oo)\\_______\n'\n '            (__)\\       )\\/\\\n'\n '                ||----w |\n'\n '                ||     ||\n'\n '\n') == ('\n'\n '     ________________________________________\n'\n "    / You have Egyptian flu: you're going to \\\n"\n '    \\ be a mummy.                            /\n'\n '     ----------------------------------------\n'\n '            \\   ^__^\n'\n '             \\  (oo)\\_______\n'\n '                (__)\\       )\\/\\\n'\n '                    ||----w |\n'\n '                    ||     ||\n'\n '    ')

However, the output makes it look as if the actual value were a tuple: ('\n'\n ' ____ ...

On closer inspection it isn't even valid Python syntax, just a confusingly corrupted repr. The multiline diff that follows isn't very readable either, because the changes are interleaved line by line.

How about an option for the user to suppress pytest's attempts to make a rich diff, and just print one, then the other, between some horizontal "margins" such as this?

==================================== ACTUAL ====================================

 ________________________________________
/ You have Egyptian flu: you're going to \
\ be a mummy.                            /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

=================================== EXPECTED ===================================

     ________________________________________
    / You have Egyptian flu: you're going to \
    \ be a mummy.                            /
     ----------------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||
    
================================================================================
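The proposed layout can be produced by a small pure function. Below is a minimal sketch; `format_vertical` is a hypothetical name, and the 80-column width is an assumption chosen to match the banners above, not a setting pytest exposes.

```python
def format_vertical(actual: str, expected: str, width: int = 80) -> str:
    """Hypothetical helper: print one string, then the other, between '=' banners."""
    return "\n".join(
        [
            " ACTUAL ".center(width, "="),
            actual,  # emitted verbatim: no repr escaping, no diff markers
            " EXPECTED ".center(width, "="),
            expected,
            "=" * width,  # closing margin
        ]
    )
```

Because each string is emitted verbatim, a leading newline or the trailing indentation before a closing `"""` shows up as a blank or whitespace-only line, which is exactly where the reproducer's difference hides.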


nicoddemus · 2020-02-20T00:12:06Z

How about an option for the user to suppress pytest's attempts to make a rich diff, and just print one, then the other, between some horizontal "margins" such as this?

I like this presentation, but I would like to avoid another option if possible.

How about having a threshold for differences based on the total number of characters (say 20%+)? Above it, we would fall back to not displaying a diff and instead show the texts separated by horizontal margins, as you suggest.
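One way to approximate such a threshold is `difflib.SequenceMatcher.ratio()`, which measures similarity as 2*M/T (M = matching characters, T = combined length of both strings). This is a swapped-in measure, not something pytest does today; `should_skip_rich_diff` and its 0.2 default are assumptions based on the "20%+" figure above.

```python
import difflib


def should_skip_rich_diff(left: str, right: str, threshold: float = 0.2) -> bool:
    """Hypothetical check: True if more than `threshold` of the text differs.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of matching
    characters and T the combined length, so 1 - ratio is the differing share.
    """
    # autojunk=False stops difflib from treating very frequent characters
    # (such as the spaces in ASCII art) as junk in strings of 200+ characters.
    matcher = difflib.SequenceMatcher(None, left, right, autojunk=False)
    return (1.0 - matcher.ratio()) > threshold
```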

wimglenn · 2020-02-20T01:11:56Z

That sounds OK. Or perhaps this presentation could simply be included in addition to the rich diff, at high enough verbosity levels.

Something that just occurred to me: does pytest have any opinion, introspection, or convention about which is the "actual" and which is the "expected"? I suppose it might just have to be presented "left" and then "right".

blueyed · 2020-02-20T01:32:06Z

Another good heuristic would be to retry the comparison with indentation removed (textwrap.dedent) and, if that matches, report it as "matches with different indent".

A similarity ratio is also good, of course.
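The indentation heuristic could look roughly like this. `matches_with_different_indent` is a hypothetical helper; note that `textwrap.dedent` also normalizes whitespace-only lines to empty ones, which conveniently absorbs the indented closing `"""` from the reproducer.

```python
import textwrap


def matches_with_different_indent(left: str, right: str) -> bool:
    """Hypothetical heuristic: the strings differ, but not after dedenting both."""
    return left != right and textwrap.dedent(left) == textwrap.dedent(right)
```

A reporter could then print a one-line hint such as "matches with different indent" instead of, or before, the full diff.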

FWIW, I have this in my conftest.py to print more at verbosity level 3 (-vvv); it's very helpful for adjusting existing tests:

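from typing import Any, List, Optional

import pytest
from _pytest.config import Config


@pytest.hookimpl(hookwrapper=True)
def pytest_assertrepr_compare(config: Config, op: str, left: Any, right: Any) -> Optional[List[str]]:
    verbose = config.option.verbose

    outcome = yield
    # Not a firstresult hook: one list of lines per implementation that answered.
    result = outcome.get_result()  # type: List[List[str]]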
    if not result:
        return
    lines = result[0]

    …

    if verbose > 2:
        import pprint
        full = pprint.saferepr(right).splitlines()
        output = [
            "Expected:",
        ] + full
        lines.extend(output)
        full = pprint.saferepr(left).splitlines()
        output = [
            "Actual:",
        ] + full
        lines.extend(output)
        print(left)

    outcome.force_result(result)

does pytest have any opinion, introspection, or convention about which is the "actual" and which is the "expected"? I suppose it might just have to be presented "left" and then "right".

See #3333 / d59adc6

piotrhm · 2020-04-15T13:48:17Z

I played a bit with different outputs for large strings. Here are my thoughts:

Making something similar to:

This increases usability only for strings of reasonable size. If we want to keep this convention, we should batch the output; it would be more readable.

Side-by-side approach, base version:

Version with "?" part:

Definitely more complicated to implement, but offers the best readability.
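As a rough plain-text illustration of the side-by-side approach, `side_by_side` below (a hypothetical helper, not piotrhm's implementation) pairs lines up with `itertools.zip_longest`. The fixed `col_width` cutoff makes the trade-off visible: long lines must be truncated or wrapped to fit two columns into one terminal.

```python
from itertools import zip_longest


def side_by_side(left: str, right: str, col_width: int = 40, sep: str = " | ") -> list[str]:
    """Hypothetical helper: render two multiline strings as two columns."""
    rows = []
    for l_line, r_line in zip_longest(left.splitlines(), right.splitlines(), fillvalue=""):
        # Truncate each side to col_width and pad the left column; wider lines do not fit.
        rows.append(f"{l_line[:col_width]:<{col_width}}{sep}{r_line[:col_width]}")
    return rows
```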

I am interested in solving this issue. Which version is more suitable?

wimglenn · 2020-04-15T19:39:21Z

My preference would be for a vertical presentation, not side-by-side. It has a few wins in basic practicality:

ease of copy-pasting the content
no complications for long lines (side-by-side necessitates double the terminal width avail, or horizontal scrollbar)
simpler to implement

I also don't want to see any extra diff characters inline, like "E" at the start of the line, or things like "?", "-", "++++" within the presentation.

wimglenn · 2024-06-01T18:42:19Z

It is a shame that the PR stalled.

I have this workaround in conftest.py which may be useful for others finding the issue:

import shutil


def pytest_assertrepr_compare(config, op, left, right):
    # https://docs.pytest.org/en/latest/reference/reference.html#pytest.hookspec.pytest_assertrepr_compare
    if isinstance(left, str) and isinstance(right, str) and op == "==":
        left_lines = left.splitlines(keepends=True)
        right_lines = right.splitlines(keepends=True)
        if len(left_lines) > 1 or len(right_lines) > 1:
            width, _ = shutil.get_terminal_size(fallback=(80, 24))
            width = max(width, 40) - 10
            lines = [
                "When comparing multiline strings:",
                f" LEFT ({len(left)}) ".center(width, "="),
                *left_lines,
                f" RIGHT ({len(right)}) ".center(width, "="),
                *right_lines,
            ]
            return lines

Pierre-Sassoulas · 2024-06-01T20:17:10Z

I searched for possible diff algorithms to replace the default difflib algorithm for the final result. Any of git diff's four base algorithms (myers, minimal, patience, or histogram) will highlight every line of the string (effectively displaying all the lines of the old string above all the lines of the new string). Using git --word-diff shows nothing for the given example, because whitespace changes are ignored. This, or, as blueyed said above, the textwrap.dedent versions being equal, could be an indicator that yet another diff algorithm is required. For an indentation issue, using something like git diff --word-diff-regex="[ ]+|[^ ]+" you can get the following diff:

There are specialized diff plugins for git (https://github.com/so-fancy/diff-so-fancy), but I did not check them in detail. Adding a fancy diff for each case might not be much better than simply displaying both strings alongside each other (which is what a git diff would do) when the ratio of differing content to displayed content becomes high.
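The `--word-diff-regex="[ ]+|[^ ]+"` idea can be approximated in Python by splitting each line into runs of spaces and runs of non-spaces, then diffing the token lists with `difflib`. This is only a sketch: `word_diff` is a hypothetical name, and the `[-...-]`/`{+...+}` markers imitate git's plain word-diff output rather than anything pytest prints.

```python
import difflib
import re

# Same token pattern as the git --word-diff-regex quoted above:
# a run of spaces, or a run of non-space characters.
TOKEN_RE = re.compile(r"[ ]+|[^ ]+")


def word_diff(old_line: str, new_line: str) -> str:
    """Hypothetical helper: render a git-style word diff of two lines."""
    old_tokens = TOKEN_RE.findall(old_line)
    new_tokens = TOKEN_RE.findall(new_line)
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(old_tokens[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + "".join(old_tokens[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + "".join(new_tokens[j1:j2]) + "+}")
    return "".join(out)
```

With this tokenization an indentation-only change shows up as a single whitespace token being inserted or replaced, while the words themselves stay unmarked.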

blueyed added the labels "topic: rewrite" (related to the assertion rewrite mechanism) and "type: enhancement" (new feature or API change, should be merged into features branch) on Feb 18, 2020

piotrhm mentioned this issue Apr 19, 2020: Issue 6757 - Improved diff output for similar strings. #7099 (Closed)

altendky mentioned this issue Apr 28, 2020: Weird output for long string equality assert diffs #7127 (Open)
