Skip to content

Commit

Permalink
Update diff format documentation.
Browse files Browse the repository at this point in the history
  • Loading branch information
Martin Sandve Alnæs committed Jan 29, 2016
1 parent 48db461 commit 68ffd54
Show file tree
Hide file tree
Showing 4 changed files with 76 additions and 62 deletions.
8 changes: 4 additions & 4 deletions docs/source/cli.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@
Commandline interface
=====================

Nbdime adds three commands to jupyter. See
Nbdime provides three CLI commands. See

jupyter nbdiff --help
jupyter nbpatch --help
jupyter nbmerge --help
nbdiff --help
nbpatch --help
nbmerge --help

for usage details.
120 changes: 63 additions & 57 deletions docs/source/diffing.rst
Original file line number Diff line number Diff line change
@@ -1,78 +1,84 @@
============================================================
A description of the recursive diff representation in nbdime
============================================================
================================================
Description of the diff representation in nbdime
================================================

Note: The diff format herein is considered experimental until
development stabilizes. If you have objections or opinions on
the format, please raise them ASAP while the project is in its
early stages.

In the traditional diff of two sequences X and Y, the diff is a
sequence of transformations turning X into Y.
*Note: The diff format herein is still considered experimental until
development stabilizes. If you have objections or opinions on the
format, please raise them ASAP while the project is in its early
stages.*

In nbdime, the objects to diff are json-compatible nested structures
of dicts (limited to string keys) and/or lists of values with
heterogeneous types (strings, ints, floats), warranting a more
flexible diff format. The diff of two objects is still a collection of
transformations, converting one object into the other.

In nbdime there is a single function `nbdiff.patch` that expects an
initial object and a diff object. The expected generic diff format is
defined below. Adhering to this diff format is a number of variations
of diff algorithms for different types of data. The collection of diff
algorithms is under heavy construction and therefore not documented
here yet.
of dicts (with string keys) and lists of values with heterogeneous
types (strings, ints, floats). The difference between these objects
will itself be represented as a json-compatible object in a format
described below.

Note that, for all practical purposes, the diff between two objects is
never unique. However any diff must always be correct in the sense
that "patch(X, diff(X, Y)) == Y" holds. This identity is extensively
used in the test suite. The goal of this project is to develop diff
and merge algorithms tailored for Jupyter notebooks that are not only
correct but also have high enough quality to be useful.

The diff is itself a json-compatible object. Instead of being a
sequence of transformations, it is a tree of transformations, a
hierarchial structure where a transformation may itself contain a
sequence of transformations of a substructure. Each level in the diff
hierarchy applies either to a diff of two dicts, or to a diff of
two sequences (lists or strings). The diff format for dict and
sequence cases are slightly different.
Diff format basics
------------------

For examples of concrete diffs, see e.g. the test suite for patch.
A diff object represents the difference B-A between two objects A and
B as a list of operations (ops) to apply to A to obtain B. Each
operation is represented as a dict with at least two items:

{ "op": <opname>, "key": <key> }

Diff format for dicts (current)
-------------------------------
The objects A and B are either mappings (dicts) or sequences (lists or
strings), and a different set of ops are legal for mappings and
sequences. Depending on the op, the operation dict usually contains an
additional argument, documented below.

A diff of two dicts is a list of diff entries:

key = string
entry = [action, key] | [action, key, argument]
diff = [entry0, entry1, ...]
Diff format for mappings
------------------------

A dict diff entry is a list of action and argument (except for deletion):
For mappings, the key is always a string. Valid ops are:

* ["remove", key]: delete existing value at key
* ["add", key, value]: insert new value at key
* ["replace", key, value]: replace value at key with newvalue
* ["patch", key, diff]: patch value at key with diff

Insert is an error if the key already exists.
Delete, replace and patch are errors if the key does not already exist.
* { "op": "remove", "key": <string> }
- delete existing value at key
* { "op": "add", "key": <string>, "value": <value> }
- insert new value at key not previously existing
* { "op": "replace", "key": <string>, "value": <value> }
- replace existing value at key with new value
* { "op": "patch", "key": <string>, "diff": <diffobject> }
- patch existing value at key with another diffobject


Diff format for sequences (list and string)
-------------------------------------------

A diff of two sequences is an ordered list of diff entries:
For sequences the key is always an integer index. This index is
relative to object A of length N. Valid ops are:

* { "op": "removerange", "key": <string>, "length": <n>}
- delete the values A[key:key+length]
* { "op": "addrange", "key": <string>, "values": <values> }
- insert new items from values list before A[key], at end if key=len(A)
* { "op": "patch", "key": <string>, "diff": <diffobject> }
- patch existing value at key with another diffobject


Relation to JSONPatch
---------------------

The above described diff representation has similarities with the
JSONPatch standard but is different in a few ways. JSONPatch contains
operations "move", "copy", "test" not used by nbdime, and nbdime
contains operations "addrange", "removerange", and "patch" not in
JSONPatch. Instead of providing a recursive "patch" op, JSONPatch uses
a deep JSON pointer based "path" item in each operation instead of the
"key" item nbdime uses. This way JSONPatch can represent the diff
object as a single list instead of the 'tree' of lists that nbdime
uses. To convert a nbdime diff object to the JSONPatch format, use the
function

from nbdime.diff_format import to_json_patch
jp = to_json_patch(diff_obj)

index = integer
entry = [action, index] | [action, index, argument]
diff = [entry0, entry1, ...]
Note that this function is currently a draft and not covered by tests.

A sequence diff entry is a list of action, index and argument (except for deletion):

* ["addrange", index, n]: delete n entries starting at index
* ["removerange", index, newvalues]: insert sequence newvalues before index
* ["patch", index, diff]: patch value at index with diff
Examples
--------

For examples of concrete diffs, see e.g. the test suite in test_patch.py.
8 changes: 8 additions & 0 deletions docs/source/merging.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,14 @@
Representing merge results and conflicts
========================================

Nbdime implements a three-way merge of Jupyter notebooks and a large
subset of generic json objects. The result of a merge operation with a
shared origin object base and modified objects local and remote, is a
fully or partially merged object plus diff objects between the
partially merged objects and the local and remote objects. These two
diff objects represent the merge conflicts that could not be
automatically resolved.

TODO: Define output formats for the merge operation.


Expand Down
2 changes: 1 addition & 1 deletion nbdime/diff_format.py
Original file line number Diff line number Diff line change
Expand Up @@ -251,7 +251,7 @@ def source_as_string(source):
return source


def to_json_patch_format(d, path="/"):
def to_json_patch(d, path="/"):
"""Convert nbdime diff object into the RFC6902 JSON Patch format.
This is untested and will need some details worked out.
Expand Down

0 comments on commit 68ffd54

Please sign in to comment.