Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

[prototype] cell id and diff tool #2342

Closed
wants to merge 30 commits into from

5 participants

@Carreau
Owner

This introduce cell_id and "family tree" for cell to allows smart diffing.

With a little example script and some test notebook in doc to diff notebooks.

You should see that the diff tool is capable of findings cell ancesters, making the diff between the concatenated input of all ancesters, and the current input. Same for split.

There is no detection of deleted cell for now.

Hoping to convince you that adding a little more than just cell_id might allow to do lot more stuff.

Thought ?

ocatave magic
modified octave magic
generated diff

@travisbot

This pull request passes (merged 58d78fc into 5bb0ae4).

@ellisonbg
Owner

The idea of notebook diffs seems quite powerful, but I wonder if there are larger issues here about how we save notebooks. @fperez and @minrk do you think there is an IPEP lurking here? I am not quite sure what it would be though. But I love the idea of being able to do nb diffs, just not sure how to integrate it with the rest of our plans.

@Carreau
Owner

What do you mean with issues here about how we save notebooks?

The main point of this PR is to discuss how we handle UUID, is it worth keeping track of ancestors UUIDs or not.
I think it is for having good custom merging and diffing.

@minrk
Owner

I think it is probably not worth it to track ancestors.

@Carreau
Owner

Then how do you diff when you have cell merge, or uuid changes ?
there are some operation like copying/pasting/merging cell, that can't conserve uuid.

@ellisonbg
Owner
@tkf
tkf commented

Is this diff tool going to be a building block for "Google Doc like" notebook sync across clients?

@minrk
Owner
@Carreau
Owner

@ellisonbg

How do you diff a simple cell move?

just extend the diff syntax to

@@ diff for cell 4 @@
--- 1_old_cell
+++ 1_new_cell
@@ -1,4 +1,3, move from position 4 to 18 @@

for example, or store in the metadata of the newly created cell to show a special UI

How do you handle multiple merges?
it does work with current implementation.

assumming A,B,C to AB,C to C,AB to CAB

your family tree woull be smth like

upper_cell:{
    C
    }
lower_cell:
    {
    AB : { upper_cell: A,
           lower_cell: B
    }

so wether you have (C,AB), or (C,A,B) you can reconstitute the merge.

It seems to me that there are cases where the ancestor id wont work.

Maybe, but introducing cell ID, allows us to do something better than simple
json diffing in many cases.

@tkf,

Is this diff tool going to be a building block for "Google Doc like" notebook
sync across clients?

I don't thinkn so, with live collaborating, document will be in sync, so you
don't need to look for differences. But UUID will help with syncing.

@minrk,

Sometimes the diff won't deduce everything, and might show the change as
larger than a human would.
This happens all the time with regular diffs, and is not generally a
problem.

Yes, and we have the occasion to reduce this effect, we can remove it totally but it will be better than anything.

Lots of interpreters don't show colored tracebacks, and it's not generally a problem.

I just don't think the benefit is worth the complexity.

The added complexity is small, and stored in metadata.

We have the occasion of storing additionnal metadata on the document and to
build a format that might use that to offer better versionning. Like some
Software allow you to see who edited what, let not miss the occasion.

users A and B are both given the same notebook with a single cell
user A does split-cell to create a two-cell notebook
user B creates a new cell, and does copy/paste to move text from one cell
to the next.

Their two notebooks are now identical, and created from the same source.
If we depend on ancestor ID, wouldn't A's diff be significantly different
from B's? This doesn't seem desirable.

I know some people that don't use/know git mv, should I consider that git-mv is worthless ?

I don't say that we should alway rely the extra-information to do diff and/or merge,
but we can have it, in metadata, the point of metadata is to add additionnal
informations, and we have the occasion of adding something that is awfully
difficult to guess afterward, and would make a really good plus for the fileformat.

git allows to manually edit chunk when commiting because diff tool does not always get things right.
let's do better and give versioning tool what they need to do their job better !

@tkf
tkf commented

@Carreau Ah, sorry, this is a sub-cell level diff tool, not worksheet level (this tool is not for detecting reordering of cells, etc.).

@Carreau
Owner

@Carreau Ah, sorry, this is a sub-cell level diff tool, not worksheet level (this tool is not for detecting reordering of cells, etc.).

Yes This is is to detect cell movement and merge.

You don't need to detect cell move and merge while live editting because you are actually dooing it
Live collaboration does not need this tool.

@ellisonbg
Owner

The reason this is related to how notebooks are saved is that if we want to add a UI to the notebook for viewing diffs, we would need access to different versions of the notebooks. This would require some sort of version control for the notebooks. We could build that ourselves or use git, but it definitely gets back to how the notebooks are saved.

But if we just want to have this diff capability as a separate tool, we don't have to think about that.

@ellisonbg
Owner

It also seems like there are other overlaps between:

  • Notebook saving
  • Live collaboration
  • Diffs and version control

Let me try to explain. Live notebook collaboration requires that the client sends cell level diffs to the server, which are then broadcast to other clients. These diffs are things like:

  • Move cell 2 down
  • Merge cells 4 and 5
  • Update the contents of cell 4 with the text "..."

Even without multiple people editing documents, this is exactly the information required to do things like track versions through time, rewind to older versions, etc. Essentially we are inventing structured version control for notebooks.

But there is then the question of where do we store that information. If we don't save it to a db/file, it will be lost. If we want to save it to file, we would have to invent a new file format.

But this all takes me back to my desire to see a new IPEP created where we can discuss all of these things. There are huge issues here that span a number of different areas of the code base.

@Carreau
Owner

Even without multiple people editing documents, this is exactly the information required to do things like track versions through time, rewind to older versions, etc. Essentially we are inventing structured version control for notebooks.

Ok, you think about storing "patches" and use them to do diffs and other.
I didn't thought of this one.

This is (I think) what darcs is trying to do. It is IMHO, a much to complex approach.
Patch could be used to lower the data that have to transit on the wire, but storing and using them give you theoretically more freedom, but would be a nightmare to program.

I have a better understanding of what you want to discuss, and will try to comme up with a new PEP.

@Carreau
Owner

Hi,
This PR is without activities for 2 month now,
I'm going to close it and open an issue to reference it.
This does not mean than we refuse the code that is here, but that we try to keep the number of opened Pull request as low as possible to have better bird view of active code.

Fell free to reopen this PR whenever you want or contact us if you have any questions.

Thanks for contributing.

@Carreau Carreau closed this
@Carreau Carreau deleted the Carreau:cell_id branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jul 27, 2012
  1. @Carreau

    store uuid at heighest level

    Carreau authored
  2. @Carreau

    store cell id in metadata

    Carreau authored
  3. @Carreau
  4. @Carreau

    construct familly tree

    Carreau authored
  5. @Carreau

    generate uuid for notebook 00

    BUSSONNIER Matthias authored Carreau committed
  6. @Carreau

    merge and resplit notebook 00

    Carreau authored
  7. @Carreau

    add exampel with merge

    Carreau authored
  8. @Carreau

    add familly script

    Carreau authored
  9. @Carreau

    use list to have ordered parents

    Carreau authored
    fix typo
  10. @Carreau

    return to 00.ipynb master

    Carreau authored
  11. @Carreau

    uuid hard

    Carreau authored
  12. @Carreau

    generate uuid

    Carreau authored
  13. @Carreau

    test

    Carreau authored
  14. @Carreau

    add merge and resplit

    Carreau authored
  15. @Carreau

    add test merge

    Carreau authored
  16. @Carreau

    mergesplit

    Carreau authored
Commits on Jul 29, 2012
  1. @Carreau

    working

    Carreau authored
  2. @Carreau

    objectify

    Carreau authored
Commits on Aug 25, 2012
  1. @Carreau

    make merge executable

    Carreau authored
Commits on Aug 26, 2012
  1. @Carreau

    working differ

    Carreau authored
  2. @Carreau

    take argument on commandline

    Carreau authored
  3. @Carreau

    get input for text cell

    Carreau authored
  4. @Carreau

    add example merged

    Carreau authored
  5. @Carreau
  6. @Carreau

    keep unchanged call as-is

    Carreau authored
  7. @Carreau

    add octave diff example

    Carreau authored
  8. @Carreau

    octave modified example

    Carreau authored
  9. @Carreau

    try fix split ancester

    Carreau authored
  10. @Carreau
  11. @Carreau
Something went wrong with that request. Please try again.