
add --compatible diff flag to output a diff more compatible with other tools #647

Open: wants to merge 1 commit into base: master


infokiller

For context: dandavison/delta#1256

@infokiller
Author

@vidartf did you get a chance to look at this? Thanks!

@vidartf
Collaborator

vidartf commented Nov 6, 2023

I don't really understand what this change is doing. If it's trying to output the diff as a proper unified diff, we would need to map the JSON diffs back to line/character numbers in the original file, and there is no trivial way to do that. So I assume this change is aiming for some compromise or middle ground, but I'm not sure exactly what. Please add more details to the PR and to the command's help text.

@infokiller
Author

@vidartf sorry for the lack of details. I'll try to explain the problem, and if you are OK with the solution, I'll also update the CLI help. The problem is described in dandavison/delta#1256, an issue I opened because delta (https://github.com/dandavison/delta, syntax highlighting for diffs) didn't work with nbdiff. The issue has the full details, and the original creator of unified diff also commented there. The TL;DR is:

  • The output of nbdiff is not compliant with unified diff
  • Specifically, delta failed because the hunk header (the line starting with @@) was missing
  • I don't think correct line/character numbers are strictly required for the output to count as "compliant", and for a notebook they may not be easy, or even possible, to compute
  • The unified diff author suggested a different format that avoids ## lines so that the output is recognized by unified diff parsers; see the comment in dandavison/delta#1256 ("🐛 does not work with nbdiff (from https://github.com/jupyter/nbdime)")
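For reference, the structure that unified diff parsers such as delta look for (a ---/+++ file header pair followed by @@ hunk headers) is what Python's standard difflib.unified_diff emits. The sketch below applies it to a single cell's source; the function name and the "/cells/0/source" label are hypothetical, and this only illustrates the target structure, not how nbdime builds its output.

```python
import difflib


def cell_source_unified_diff(path, old_source, new_source):
    """Render one cell's source change as a standard unified diff.

    `path` is a hypothetical label such as "/cells/0/source"; this shows the
    ---/+++/@@ structure only and is not nbdime's output format.
    """
    return "".join(
        difflib.unified_diff(
            old_source.splitlines(keepends=True),
            new_source.splitlines(keepends=True),
            fromfile="a" + path,
            tofile="b" + path,
        )
    )


if __name__ == "__main__":
    print(cell_source_unified_diff("/cells/0/source", "x = 1\nprint(x)\n", "x = 2\nprint(x)\n"), end="")
    # --- a/cells/0/source
    # +++ b/cells/0/source
    # @@ -1,2 +1,2 @@
    # -x = 1
    # +x = 2
    #  print(x)
```

Note that difflib computes real line numbers for the @@ header because it diffs plain text; for a whole notebook, deriving equivalent numbers from the JSON diff is the hard part discussed above.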

@vidartf
Collaborator

vidartf commented Nov 10, 2023

Thanks for clarifying! Having output that is more compatible with unified diff does indeed sound useful 👍 Keeping the current default and putting the new behavior behind a flag sounds good. Since your main motivation is for the output to be parsed by other tools, it could be hard to change this output format after the initial release. With that in mind, I would suggest the following:

  • Change the name of the flag to reflect the target, i.e. something with "unified diff" in it. Maybe a --diff-format=unified flag, in case other formats are added in the future?
  • Add some unit tests. We have a decent suite of notebooks producing a large variety of diffs, so ensuring they can all be parsed as unified diff seems like a good first step; after that we will also want to test the actual contents of a good number of them. Now that I think of it, we should probably add baseline testing support for all of our diff tests (i.e. record the current outputs of the diffs in the repo so that changes from future commits are picked up). We can add the baselining, but would appreciate help in testing for unified diff readability and sanity.
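As a rough illustration of the first step (checking that every output can be parsed as unified diff), a test could run each output through a small structural check. The sketch below is a simplified stand-in, not a full unified diff parser and not an existing nbdime helper; the name check_unified_diff is hypothetical. It requires ---/+++ file headers and at least one @@ hunk header, and verifies that each hunk's declared line counts match its body. Lines outside hunks, such as ## lines, are ignored rather than rejected.

```python
import re

# "@@ -start[,count] +start[,count] @@", optionally followed by a section label.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def check_unified_diff(text):
    """Return a list of problems; an empty list means the text looks parsable.

    Simplified illustration only. Lines outside hunks are skipped.
    """
    problems = []
    lines = text.splitlines()
    seen_headers = False
    hunks = 0
    i = 0
    while i < len(lines):
        line = lines[i]
        # A "--- " line immediately followed by "+++ " is a file header pair.
        if line.startswith("--- ") and i + 1 < len(lines) and lines[i + 1].startswith("+++ "):
            seen_headers = True
            i += 2
            continue
        match = HUNK_RE.match(line)
        if not match:
            i += 1
            continue
        hunks += 1
        if not seen_headers:
            problems.append(f"line {i + 1}: hunk header before ---/+++ file headers")
        # An omitted count means a count of 1.
        old_len = int(match.group(2) or 1)
        new_len = int(match.group(4) or 1)
        header_line = i + 1
        old_seen = new_seen = 0
        i += 1
        # Consume body lines until both declared counts are satisfied.
        while i < len(lines) and (old_seen < old_len or new_seen < new_len):
            tag = lines[i][:1]
            if tag == " ":
                old_seen += 1
                new_seen += 1
            elif tag == "-":
                old_seen += 1
            elif tag == "+":
                new_seen += 1
            elif tag != "\\":  # "\ No newline at end of file" is allowed
                break
            i += 1
        if (old_seen, new_seen) != (old_len, new_len):
            problems.append(
                f"line {header_line}: hunk declares -{old_len}/+{new_len} "
                f"but has -{old_seen}/+{new_seen}"
            )
    if not seen_headers:
        problems.append("no ---/+++ file headers found")
    if hunks == 0:
        problems.append("no @@ hunk headers found")
    return problems
```

For well-formed output the function returns an empty list; for text that has -/+ lines but no headers, as in the delta report, it returns problems such as "no @@ hunk headers found", which a test can assert on.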

@infokiller
Author

Thanks @vidartf for the quick and helpful response. Your suggestions sound good. As for tests, I assume they should go into https://github.com/jupyter/nbdime/tree/master/nbdime/tests, is that right?
Are there any specific tests that are good references for what you'd like to see?
Also, should these take the higher-level diff ops or the raw notebooks as input?

@vidartf
Collaborator

vidartf commented Nov 10, 2023

Yes, I would probably make a new file, and then you could start with something like this:

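# Assumed import locations within nbdime; adjust if they differ.
from nbdime import diff_notebooks
from nbdime.args import prettyprint_config_from_args
from nbdime.nbdiffapp import _build_arg_parser
from nbdime.prettyprint import pretty_print_notebook_diff


def test_notebook_diff(any_nb_pair):
    "Test unified diff output on any pair of notebooks in the test suite."
    a, b = any_nb_pair
    diff = diff_notebooks(a, b)

    # Collect everything the pretty-printer writes instead of printing it.
    output = []
    class Printer:
        def write(self, text):
            output.append(text)

    argv = []  # your arguments for diff CLI here
    arguments = _build_arg_parser().parse_args(argv)
    config = prettyprint_config_from_args(arguments, out=Printer())
    pretty_print_notebook_diff(a.name, b.name, a, diff, config)

    # expected_output still needs to be defined, e.g. from a recorded baseline.
    assert "".join(output) == expected_output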
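In the test above, expected_output is not defined yet. One way to supply it, in the spirit of the baselining idea, is to record each pair's output to a file on the first run and compare against that file afterwards. A minimal sketch follows; the helper name check_against_baseline, the <name>.diff file layout and the update switch are all hypothetical, not existing nbdime test utilities.

```python
from pathlib import Path


def check_against_baseline(name, actual, baseline_dir, update=False):
    """Compare `actual` with the stored baseline `<baseline_dir>/<name>.diff`.

    If the baseline file is missing, or update=True, the current output is
    written as the new baseline and the check passes ("record, then compare").
    """
    path = Path(baseline_dir) / f"{name}.diff"
    if update or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual, encoding="utf-8")
        return
    expected = path.read_text(encoding="utf-8")
    # Raise explicitly so the check still works when asserts are disabled (-O).
    if actual != expected:
        raise AssertionError(
            f"diff output for {name!r} changed; rerun with update=True if intended"
        )
```

Inside the test, the final assert could then become check_against_baseline(pair_id, "".join(output), baseline_dir), where pair_id and baseline_dir stand for whatever the suite uses to identify a notebook pair and to store baselines.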
