diff-ability of notebooks #3065

Closed
y-p opened this Issue Mar 23, 2013 · 5 comments

Contributor

y-p commented Mar 23, 2013

I've had a few sessions of reading through notebooks containing lots of text
and sending PRs for fixes. I hit a couple of snags:

  • running cells in a notebook creates a lot of "diff noise" due
    to changes in the "prompt_number" fields.
    This necessitates individually filtering them out with "git add -p", which
    gets tedious quickly.
  • Large blocks of text are serialized as a single row in the notebook.
    I realize this is a constraint imposed by JSON, but it means that
    deleting a spurious comma results in a diff for what could be 50
    lines of text, as diff is line-based.

I can't suggest a solution that doesn't involve changing the nb format,
so I guess this is just fyi.

Owner

minrk commented Mar 23, 2013

running cells in a notebook creates a lot of "diff noise" due to changes in the "prompt_number" fields. This necessitates individually filtering them out with "git add -p", which gets tedious quickly.

Input prompts are output information, and they definitely belong in the cell data. For those who want to strip these for git reasons, it should be easy to write a git pre-commit hook that either just strips the prompts, or does a full reset && run all && save on every changed notebook. But if you re-ran your notebook in a different order from before, the changed prompts are not noise - they are real information.
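A minimal sketch of the "just strip the prompts" variant of such a hook, for illustration only. It assumes the v3 layout (cells under `worksheets`, with `prompt_number` on code cells and on outputs that carry a prompt). The function names are hypothetical, and whether every notebook reader accepts cells without the field has not been checked here.

```python
import json


def strip_prompt_numbers(nb):
    """Delete every "prompt_number" key from a notebook dict, in place.

    Assumes the v3 layout: nb["worksheets"][i]["cells"][j], with the field
    on code cells and on the outputs that carry a prompt.
    Returns the number of keys removed.
    """
    removed = 0
    for worksheet in nb.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            # The cell itself and each of its outputs may carry the field.
            for holder in [cell] + cell.get("outputs", []):
                if "prompt_number" in holder:
                    del holder["prompt_number"]
                    removed += 1
    return removed


def strip_file(path):
    """Rewrite the notebook at `path` without prompt numbers."""
    with open(path) as f:
        nb = json.load(f)
    if strip_prompt_numbers(nb):
        with open(path, "w") as f:
            # indent and sort_keys are assumptions, chosen to keep the
            # output line-oriented and stable between runs.
            json.dump(nb, f, indent=1, sort_keys=True)
            f.write("\n")
```

A pre-commit hook could call `strip_file` on each staged `.ipynb` file and re-add it. Note that this re-serializes the whole file, so if the formatting differs from what the notebook server writes, the first run may itself produce a large diff.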

Large blocks of text are serialized as a single row in the notebook. I realize this is a constraint imposed by JSON, but it means that deleting a spurious comma results in a diff for what could be 50 lines of text, as diff is line-based.

Where are you seeing this? It isn't true of input or markdown.

Contributor

y-p commented Mar 23, 2013

yes, the commit hook solution is what I thought, but that's pretty techy for drive-by contributors.

I see long lines in all markdown cells of v3 notebooks, in the 'source' field. Is there something I should be doing
differently except hitting the "save" button?

Owner

Carreau commented Mar 23, 2013

CodeMirror might be soft-wrapping.
You can hard-wrap manually wherever you want; those breaks would appear as separate lines in the JSON.
I don't know whether CodeMirror has an automatic hard-wrap.

Owner

minrk commented Mar 23, 2013

I see long lines in all markdown cells of v3 notebooks, in the 'source' field. Is there something I should be doing
differently except hitting the "save" button?

Ah, I see - long lines that you type will be preserved (just like a regular text file).
But note that in markdown, a single newline is not semantic unless the line ends with two spaces (two consecutive newlines start a new paragraph).
This means that you can wrap your markdown source just as you probably would in a regular markdown file,
and these lines will be reflected in the JSON. In that way, there is nothing special about the notebook - it lets you make whatever decisions you want regarding text wrapping, and it is reflected in the JSON. If you want to write long lines, we will save long lines. If you break up your lines, they are broken in JSON. It's all up to you.
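To illustrate the point about wrapping, here is a rough sketch rather than the notebook's actual writer. It assumes the on-disk JSON stores a cell's text as a list with one string per source line, and the helpers `serialize` and `changed_lines` are made up for this example. It compares the line diff produced by the same one-character fix in an unwrapped and a hard-wrapped paragraph.

```python
import difflib
import json


def serialize(source):
    """Mimic a line-oriented JSON dump of a markdown cell's source.

    Assumption: the text is stored as a list with one JSON string per
    source line, pretty-printed with one list item per output line.
    """
    return json.dumps({"source": source.splitlines(True)}, indent=1).splitlines()


def changed_lines(before, after):
    """Return the added/removed lines of a unified diff, without headers."""
    diff = difflib.unified_diff(
        serialize(before), serialize(after), lineterm="", n=0
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# The same typo (a doubled comma) in an unwrapped and a hard-wrapped paragraph.
long_before = "one, two,, three four five six"
long_after = "one, two, three four five six"
wrapped_before = "one, two,,\nthree four\nfive six"
wrapped_after = "one, two,\nthree four\nfive six"

long_diff = changed_lines(long_before, long_after)
wrapped_diff = changed_lines(wrapped_before, wrapped_after)

for line in long_diff + wrapped_diff:
    print(line)
```

In both cases the diff has one removed and one added line, but with hard-wrapped source only the short line containing the typo shows up, while the unwrapped version repeats the entire paragraph.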

Contributor

y-p commented Mar 24, 2013

good, in that case that's exactly as it should be. thanks for the tips.

y-p closed this Mar 24, 2013
