Diffing and merging notebooks #8

martinal · 2015-12-14T13:07:46Z

No description provided.

ellisonbg · 2015-12-16T01:50:37Z

notebook-diff/notebook-diff.md

+* Command line tools for interactive resolution of merge conflicts
+* Make the merge tool git compatible
+* Make a web gui for displaying notebook diffs
+* Make a web gui for interactive resolution of merge conflicts


I want to make sure that the web gui is fully integrated with Jupyter. Otherwise it will be very difficult for this to gain adoption, especially on systems (remote JupyterHub) where users only have access to the system through Jupyter in the browser.

For the most part, I think this should be standalone, but it should at least play well with the new UI. I would expect most cases to be triggered by command-line calls, with a web view that only lives for a short period of time, dedicated to one diff view. If we properly separate how to request a diff of two notebooks from how they are displayed, adding a UI to request a diff from the web shouldn't be difficult, but I also think it shouldn't be the priority.

We should really have similar APIs for requesting diffs and merges from python, javascript and commandline. If the UI can be built as extensions to other JupyterLab work, that sounds great to me.

@minrk I don't quite understand how to use this pull request for discussion. Should I add text to the document?

minrk · 2016-02-11T14:06:24Z

@ellisonbg @fperez should we merge this enhancement proposal as "accepted"? We are working on the nbdime repo, and can iron out the details of implementation later, but I think the "we are going to do something about this" bar is cleared, yes?

rgbkrk · 2016-02-11T21:31:21Z

I sure think we should merge this as is and keep iterating.

ellisonbg · 2016-02-11T23:31:29Z

+1

On Thu, Feb 11, 2016 at 1:31 PM, Kyle Kelley notifications@github.com
wrote:

I sure think we should merge this as is and keep iterating.

—
Reply to this email directly or view it on GitHub
#8 (comment)
.

Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger@calpoly.edu and ellisonbg@gmail.com

fperez · 2016-02-11T23:36:29Z

notebook-diff/notebook-diff.md

+reuse as much as possible from jupyter notebook. An initial
+version of conflict resolution can be to output a notebook with
+conflicts marked within cells, to be manually edited as a regular
+jupyter notebook.


Even though below it says that merge of arbitrary output isn't in scope, it should clarify what it will produce in the scenario where the input doesn't change but the output does. We don't have tools to edit output, nor to present multiple outputs and delete them selectively. So this important edge case will need to be handled.

fperez · 2016-02-11T23:45:57Z

Yup @minrk , "we are doing something about this" :) But one key thing that seems missing to me (and I don't find it in the nbdime repo either) is a high-level description of the approach taken.

To me, the notebook diff problem seems separable in two phases: identification of changes to the 'container' by finding cell insertions, deletions, transpositions, moves, etc, and then within-cell diffing, that then becomes content/mimetype-specific. This is a more complex version of the problem of finding the location of changes in a linear pure-text file, and then applying the local diff itself.

That's the approach that seems natural to me, but if that's not the case, then what approach is being used should be described.

Second, I think at least mention of what happens to metadata should be made. We have a lot of metadata in the notebook, and there can be differences therein.

So, while I apologize for the delay, I'm -1 on merging this document until these high-level ideas are a bit better described. While I realize that the actual implementation in the nbdime repo is where the 'real work' happens, I think it's also important that the high-level description is reasonably complete.

It shouldn't be too difficult to add a bit of language to that effect, and the document in the long run will be much more valuable. We want our JEPs to serve a similar role to Python's PEPs, in that they provide reasonable stand-alone descriptions of the problem and the proposed solution, that one can read to understand the whole thing at a high-level without digging into the implementation itself.

willingc · 2016-02-12T01:19:38Z

FYI for those that would like more info on Python's PEP process. It's detailed in PEP-0001 https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep

There is also a shorter description of the PEP process in the Python Dev Guide https://docs.python.org/devguide/langchanges.html

Anecdotally, the Achille's heel of the PEP process is that it is a slow slog with long periods of inactivity before acceptance of a PEP. Given Jupyter's rapid pace of development, I wonder if the PEP model, which is more a gatekeeper process than rapid innovation, is the best analogy. I love the focus on openness and transparency for JEPs; I would find it sad to see development flow ebb, due to PEP process inefficiences.

Perhaps "concept" status (which would equate to 5 or so committers/community members thinking it is worth exploring) would be a reasonable step before "draft" status; the benefit would be a place for a document to be worked on in the open and versioned while giving people an opportunity to iterate on the "concept" doc toward "draft" status. 🐢 🐰 🏃

minrk · 2016-02-12T13:02:14Z

@willingc that's a good point, and I think one thing that helps the JEP process be perhaps less cumbersome to the project as a whole is that relatively few changes warrant a JEP (e.g. a totally new project, like this one). The vast majority of changes, even relatively large ones, can go through the regular PR process.

@fperez thanks for the feedback! It's AOK to add things to do here, there just weren't any todo items for a while, so I thought I'd ping for a merge. Perhaps I should have phrased it as "Should we merge, or are there more things people would like added?" We'll get right on addressing your points.

martinal · 2016-02-12T16:26:33Z

@fperez Indeed, we have an algorithm for matching up cells like you describe. We'll write up a high level description for the jpep and document the currently implemented variant better. Probably last half of next week unless @minrk beats me to it.

fperez · 2016-02-12T20:45:58Z

Indeed, the Python PEP process can be too slow sometimes, but I also think that for key ideas it can be useful to stop and think carefully through the key implications. We don't need to set the bar that high, and can still allow for iteration/evolution, but I think it's also good practice to have the base document in a reasonably complete state of ideas before merge.

Thanks @minrk and @martinal for responding, I think with a few such changes, the overall description will be much improved and will help others understand the direction of these ideas. Nothing prevents the actual implementation from evolving further and even changing this initial description, of course.

* Provided a high-level description of the diff algorithm approach. * Updated with the major changes to the diff format that nbdime already implements (mostly copy-paste from current nbdime docs). * Many smaller updates.

Update notebook-diff.md

martinal · 2016-02-23T14:00:47Z

I've updated the JEP in some significant ways, recommend reading through the new version instead of looking at the diffs. Hope this covers the most important things. I believe the questions not covered here are best handled as specific issues towards the nbdime repository.

fperez · 2016-02-23T19:18:18Z

Thanks @martinal! Much better indeed.

The only thing I see missing is at least some acknowledgment of the metadata issue and some indication of what your strategy there would be. It doesn't need to be too detailed, and I'm sure it will evolve. But metadata is pervasive to the notebook, and will be the source of interesting questions and technical challenges, so at least these questions should be acknowledged here for the proposal to really cover the key points of the problem.

Once that is done, I'm happy merging. Thanks again!

bollwyvl · 2016-02-23T20:49:31Z

For the Big Ol Unstructured JSON Data Tree,
https://github.com/benjamine/jsondiffpatch is pretty compelling. Demo here:
http://benjamine.github.io/jsondiffpatch/demo/index.html

They, too, offer yet another transformation language:
https://github.com/benjamine/jsondiffpatch/blob/master/docs/deltas.md

Reminds me of a certain XKCD...

On Tue, Feb 23, 2016 at 2:18 PM Fernando Perez notifications@github.com
wrote:

Thanks @martinal https://github.com/martinal! Much better indeed.

The only thing I see missing is at least some acknowledgment of the
metadata issue and some indication of what your strategy there would be. It
doesn't need to be too detailed, and I'm sure it will evolve. But metadata
is pervasive to the notebook, and will be the source of interesting
questions and technical challenges, so at least these questions should be
acknowledged here for the proposal to really cover the key points of the
problem.

Once that is done, I'm happy merging. Thanks again!

—
Reply to this email directly or view it on GitHub
#8 (comment)
.

martinal · 2016-02-24T11:52:08Z

@fperez I'm not quite sure what "the metadata issue" is. Basically the plan is to treat it as a json blob in general, for which diff and merge basically works fine using already functional generic tools. @minrk has mentioned there are cases where we can know more about what the metadata means, in which case we can add options to do validation, use left, use right, etc. For visualization of arbitrary metadata, the demo @bollwyvl links to looks pretty nice.

@bollwyvl I think I saw this project before and rejected it because the way they approach comparing items in array diffs is too limiting. Also note that their diff format is even more terse than my original format ;) We should make a note of checking if they do something smart though.

fperez · 2016-02-24T15:48:16Z

@martinal, it's possible e.g. to have metadata changes in cells that are otherwise identical. But our UI doesn't by default display this explicitly, and for most users this is "hidden state". So the system will need to surface these changes and handle them in parallel to input/output, which is what users probably think most of the time about. It's not so much a technical problem (I agree with you, they are just another JSON blob), more of a UI/UX/workflow one.

But you don't need much more, just a quick pointer of what your plan is there for now, so we flag the issue. The implementation details can emerge in development.

martinal · 2016-02-25T08:01:49Z

@fperez so I see this as a two part problem.

One is computing the diff (the merge follows), where the idea so far has been to make source code the primary identifier for cell identity, output a secondary identifier, and metadata ignored at the moment. There is a framework in place for implementing a sequence of predicates that include more or less of the cell contents to determine if they're to be treated as 'the same cell'. This can be extended to include e.g. edit distance of metadata blobs at the stricter equality levels, while ignoring metadata at the lower strictness. At the least strict level any change to output and metadata is ignored as long as source is approximately equal. These predicates will need testing and experimenting to make the algorithm behave sensibly.

The other is the UX part. In the diff visualization there can be for example another box above, between or below cell source and output, showing metadata conflicts by showing the json tree, one line for each leaf key:value, aligned with corresponding keys between left/right view. For somewhat structured data I think a simple approach like this could be the best. Either the user can edit the json directly, and we validate it on the fly, or a UI is built to handle editing in a safer way.

add note about displaying metadata

minrk · 2016-03-08T10:05:30Z

@fperez I added a note about diffing metadata. Anything left to do on this?

fperez · 2016-03-08T13:57:20Z

Great, thanks! Merging now.

High-level description of a toolchain for diffing and merging of notebooks. Currently prototype implementation exists as the nbdime project.

fperez · 2016-03-08T13:58:37Z

Done, thanks everyone for patiently taking feedback into consideration.

minrk · 2016-03-09T20:19:28Z

Thanks!

Add JEP notebook-diff.

71b2467

minrk changed the title ~~Add JEP notebook-diff.~~ Diffing and merging notebooks Dec 14, 2015

ellisonbg reviewed Dec 16, 2015
View reviewed changes

This was referenced Dec 17, 2015

Implement conversion to JSON Patch format jupyter/nbdime#15

Open

D4.6: Tools for collaborating on notebooks via version-control OpenDreamKit/OpenDreamKit#95

Closed

fperez reviewed Feb 11, 2016
View reviewed changes

Martin Sandve Alnæs added 2 commits February 23, 2016 14:55

Update notebook-diff.md

afc7615

* Provided a high-level description of the diff algorithm approach. * Updated with the major changes to the diff format that nbdime already implements (mostly copy-paste from current nbdime docs). * Many smaller updates.

Merge pull request #1 from martinal/notebook-diff-v2

48dea94

Update notebook-diff.md

minrk and others added 2 commits March 8, 2016 09:47

add note about displaying metadata

c9dcdf6

Merge pull request #2 from minrk/nbdiff

7ea2c63

add note about displaying metadata

fperez added a commit that referenced this pull request Mar 8, 2016

Merge pull request #8 from martinal/master

340e9d8

High-level description of a toolchain for diffing and merging of notebooks. Currently prototype implementation exists as the nbdime project.

fperez merged commit 340e9d8 into jupyter:master Mar 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diffing and merging notebooks #8

Diffing and merging notebooks #8

martinal commented Dec 14, 2015

ellisonbg Dec 16, 2015

minrk Dec 16, 2015

martinal Dec 17, 2015

martinal Dec 23, 2015

minrk commented Feb 11, 2016

rgbkrk commented Feb 11, 2016

ellisonbg commented Feb 11, 2016

fperez Feb 11, 2016

martinal Feb 12, 2016

fperez commented Feb 11, 2016

willingc commented Feb 12, 2016

minrk commented Feb 12, 2016

martinal commented Feb 12, 2016

fperez commented Feb 12, 2016

martinal commented Feb 23, 2016

fperez commented Feb 23, 2016

bollwyvl commented Feb 23, 2016

martinal commented Feb 24, 2016

fperez commented Feb 24, 2016

martinal commented Feb 25, 2016

minrk commented Mar 8, 2016

fperez commented Mar 8, 2016

fperez commented Mar 8, 2016

minrk commented Mar 9, 2016

Diffing and merging notebooks #8

Diffing and merging notebooks #8

Conversation

martinal commented Dec 14, 2015

ellisonbg Dec 16, 2015

Choose a reason for hiding this comment

minrk Dec 16, 2015

Choose a reason for hiding this comment

martinal Dec 17, 2015

Choose a reason for hiding this comment

martinal Dec 23, 2015

Choose a reason for hiding this comment

minrk commented Feb 11, 2016

rgbkrk commented Feb 11, 2016

ellisonbg commented Feb 11, 2016

fperez Feb 11, 2016

Choose a reason for hiding this comment

martinal Feb 12, 2016

Choose a reason for hiding this comment

fperez commented Feb 11, 2016

willingc commented Feb 12, 2016

minrk commented Feb 12, 2016

martinal commented Feb 12, 2016

fperez commented Feb 12, 2016

martinal commented Feb 23, 2016

fperez commented Feb 23, 2016

bollwyvl commented Feb 23, 2016

martinal commented Feb 24, 2016

fperez commented Feb 24, 2016

martinal commented Feb 25, 2016

minrk commented Mar 8, 2016

fperez commented Mar 8, 2016

fperez commented Mar 8, 2016

minrk commented Mar 9, 2016