Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

notebook nbdime diff: Hide unchanged cells #386

Closed
stas00 opened this issue Jun 11, 2018 · 28 comments
Closed

notebook nbdime diff: Hide unchanged cells #386

stas00 opened this issue Jun 11, 2018 · 28 comments

Comments

@stas00
Copy link

stas00 commented Jun 11, 2018

I'm using nbdime-1.0.1.

http://localhost:8888/nbdime/git-difftool?base=fastai%2Fcourses%2Fdl1%2Flesson1.ipynb

Hitting: Hide unchanged cells

still shows all the code cells, changed and unchanged. The only difference is the the right column where the cell is unchanged is empty. This seems to be a bug, as it's not hiding the unchanged cells.

nbdime-1

Perhaps the issue is that In[\d\d] is different in the original and the modified notebook. Yet if I don't hit 'Hide Unchanged cells' it doesn't display them as changed - and show to the full screen width.

Yet, it does report/hide some unchanged cells properly:

nbdime-3

So, something is incomplete.

both snapshots are from the same diff.

Thank you!

@vidartf
Copy link
Collaborator

vidartf commented Jun 12, 2018

I'm not sure if I follow the issue description completely. For the first screenshot, do you have two comparison images, one with "hide unchanged" checked and one unchecked? You are right that if the execution count (In[N]) changes it will not be hidden.

@stas00
Copy link
Author

stas00 commented Jun 12, 2018

Both screenshots are with identical setting of "hide unchanged" checked - there are just two parts of the same notebook that demonstrate different things.

You probably would agree that any changes in In[N] should be ignored (always or be configurable) - one should always be able to go back to some individual cell/cells and rerun them without affecting the diff of the source code.

@stas00
Copy link
Author

stas00 commented Jun 12, 2018

Doing some more testing it's In[N] that is the cause of this issue.

Is there an existing ignore settings that will ignore this attribute?

Thank you.

@stas00
Copy link
Author

stas00 commented Jun 12, 2018

Also is it possible to make "Hide unchanged cells" configurable to be the default On. I want this tool to be as close to diff(1) as possible, not as a may be later diff and a lot of noise.

Ideally there should be just a single setting that says:

Extension: {
    source-code-diff-only: true
}

and when http://localhost:8888/nbdime/git-difftool?base=file.ipynb is called it shows from the get going only the source code cells that changed, and nothing else. All those colorful splashes of Metadata changed, Output changed, etc. are cool, but most of the time they are just a noise which has no practical usefulness (probably for most users).

Thank you!

@vidartf
Copy link
Collaborator

vidartf commented Jun 13, 2018

There's a bunch of separate items here now, so I'll address them point by point. Hopefully I won't miss any!

You probably would agree that any changes in In[N] should be ignored (always or be configurable) .

Is there an existing ignore settings that will ignore this attribute?

It is configurable: Either by the details option/flag, or by ignoring /cells/*/execution_count. details also does /cells/*/outputs/*/execution_count. See also the next point.

only the source code cells that changed, and nothing else.

This is the flag --source / -s. It is configurable via e.g.:

"Extension": {
  "source": true
}

Or you could use details: false, outputs: false etc here (either set some to true, or some to false, but not mix the two). I realized that this was a little hard to discover, and had some other issues, so I'm submitting a PR to improve that (#390).

Also is it possible to make "Hide unchanged cells" configurable to be the default On.

That does sound reasonable, and maybe also make it configurable.

Both screenshots are with identical setting of "hide unchanged" checked - there are just two parts of the same notebook that demonstrate different things.

Sure, I was just wondering if you had a screenshot of the first region, one with the checkbox checked, and one with it unchecked. I was trying to understand why the unchanged markdown cell was not being hidden.

@stas00
Copy link
Author

stas00 commented Jun 13, 2018

As I specified I already had "/cells/*/outputs": true, (which should cover /cells/*/outputs/*/execution_count) and even if I added the explicit "/cells/*/execution_count": true, (below) the counts are still not being ignored.

{
  "Extension": {
    "Ignore": {
      "/cells/*/outputs": true,
      "/cells/*/attachments": true,
      "/cells/*/execution_count": true,
      "/cells/*/metadata": true
    }
  }

}

Your suggestion of:

"Extension": {
  "source": true
}

does absolutely nothing - as if there is no config at all - i.e. I get all the diffs - for all types of cells.

This doesn't do anything either:

{
  "Extension": {
    "details": false, "outputs": false
  }
}

Would you also please add to the documentation to always run: nbdime --config after changing nbdime_config.json - as more than once I broke the format - but the front-end wasn't able to tell me what was wrong.

Thank you for working on this, Vidar.

@vidartf
Copy link
Collaborator

vidartf commented Jun 14, 2018

does absolutely nothing - as if there is no config at all - i.e. I get all the diffs - for all types of cells.

Did you try with the linked PR? There are some fixes in there.

the front-end wasn't able to tell me what was wrong.

Did you get an exception in the server log though? And when you say you broke it, do you mean that the JSON was invalid, or just an incorrect configuration?

@vidartf
Copy link
Collaborator

vidartf commented Jun 14, 2018

(note, that PR has been merged, so feel free to try with master)

@vidartf
Copy link
Collaborator

vidartf commented Jun 14, 2018

Part of the problem is fixed in #394.

@stas00
Copy link
Author

stas00 commented Jun 15, 2018

ok, tried with the new changes using venv build and the master branch build:

cd ~/build
python3.6 -m venv myenv
source myenv/bin/activate
nbdime --version
1.0.2.dev
jupyter notebook
  1. I had to run:

jupyter serverextension enable nbdime --user --py

to enable nbdiff extension in the notebook - which is new. Until now this wasn't needed. Until now running:

jupyter nbextension enable nbdime --user --py

installed both extensions (nbextension and serverextension).

  1. I don't see any changes wrt ignores. I still can't get code only diff.

Specifically:

None of these does anything at all (i.e. I get all diff categories on). I tried each of these separately. I validated that they were picked by nbdime --config), and restarted jupyter notebook after each change.


{
  "Extension": {
    "details": false, "outputs": false
  }
}

{
  "Extension": {
    "source": true
  }
}

{
  "Extension": {
    "source": true,
    "details": false, 
    "outputs": false,
    "metadata": false
  }
}
  1. The In[N] still doesn't get ignored with:
{
  "Extension": {
    "Ignore": {
      "/cells/*/outputs": true,
      "/cells/*/attachments": true,
      "/cells/*/execution_count": true,
      "/cells/*/metadata": true
    }
  }

}

So I'm not sure what did change.

Thank you for looking into it, Vidar.

If you need any information from me to help you reproduce the issues please let me know how I can assist.

p.s. The user config file is still very error prone. I noticed I can write anything in there and nbconfig --config silently ignores invalid entries, e.g. if I write:

{
  "ExtensionE": {
    "sourcee": true
  }
}

nbconfig --config silently ignores these settings and reports:

Extension:
  Ignore: {}
  attachments: null
  color_words: false
  details: null
  metadata: null
  outputs: null
  source: null

which is OK. But it might be good to add to the docs a note that it's the user's responsibility to run nbconfig --config and to make sure that the reported changes reflect what the user intended.

@vidartf
Copy link
Collaborator

vidartf commented Jun 15, 2018

Replying to your points by number reference:

  1. This is expected when developing. A develop install does not automatically register the extensions. It should probably be a part of the dev install instructions (haven't been updated since adding the extensions).
  2. This will be fixed for the Extension config by Ensure server extension uses ignorable flags #394.
  3. As execution_count is not a list or a map, the syntax needs to be
{
  "Extension": {
    "Ignore": {
      "/cells/*/outputs": true,
      "/cells/*/attachments": true,
      "/cells/*/metadata": true,
      "/cells/*": ["execution_count"]
    }
  }
}

@vidartf
Copy link
Collaborator

vidartf commented Jun 15, 2018

For 3, I guess that syntax can be documented better 😅

@stas00
Copy link
Author

stas00 commented Jun 15, 2018

Great, we are making progress.

{
  "Extension": {
    "Ignore": {
      "/cells/*/outputs": true,
      "/cells/*/attachments": true,
      "/cells/*/metadata": true,
      "/cells/*": ["execution_count"]
    }
  }
}

almost does it. It now perfectly ignores cells with no code-changes.

However, if the code cell changed, it does show all the other stuff, which it should be ignoring, as you can see from the image:

nbdime-1

The outputs and metadata shouldn't be displayed, neither explicitly as in the last cell (output), nor in a suggestive way via the dropdown to show the diff or the blue colored indicator of change (metadata changed, output unchanged in earlier cells).

So the ignore rules at the moment aren't absolute. They are true only if code cell hasn't changed, otherwise they are being ignored. I imagine there are two parts to nbdiff logic - (1) choosing which cells to diff and which not, and then (2) how to display the diff. So currently the ignore rules apply to the choosing part (1), but not to the displaying part (2).

What I'm after is having just code changes, to be as close as possible to diff(1). And everything else ignored.

Thank you!

@stas00
Copy link
Author

stas00 commented Jun 15, 2018

Now I was able to validate your change #394 and this setting

{
  "Extension": {
    "source": true,
    "details": false, 
    "outputs": false,
    "metadata": false
  }
}

works identically to the setting in my comment above. So this is nice as it "speaks" a more implicit language and allows for future changes in the internals of jupyter notebook and still giving the user what they wanted. I'd rather use the latter syntax.

I see that in other parts of the config you have: ignore_transients - won't that be a good mnemonic to source-only-diff, without needing to define anything else? unless it means something else the way it's used now. I have no way of telling.

Back to #394. Note that the partial configuration doesn't do the same as above:

{
  "Extension": {
    "source": true
  }
}

I get:

Extension:
  Ignore: {}
  attachments: null
  color_words: false
  details: null
  metadata: null
  outputs: null
  source: true

and apparently those 'null's default to true, rather than false. And source being true doesn't override them (or perhaps it wasn't meant to override other settings). Perhaps it'd be a good idea to fix those nulls and choose either true or false as the default, so that the user running nbdime --config can tell what is the configuration supposed to do. At the moment with those nulls it's ambiguous and meaningless to the end user.

@vidartf
Copy link
Collaborator

vidartf commented Jun 18, 2018

apparently those 'null's default to true, rather than false.

The intention is for those nulls to default to the opposite of whatever else is defined, e.g. that if one or more fields are set to true, it defaults to false; and if one or more fields are set to false, it defaults to true. As far as I can tell, this is also the behavior I'm seeing from master. Do you have any example notebooks whose diff behave unexpectedly?

@stas00
Copy link
Author

stas00 commented Jun 18, 2018

The intention is for those nulls to default to the opposite of whatever else is defined, e.g. that if one or more fields are set to true, it defaults to false; and if one or more fields are set to false, it defaults to true.

This is far from obvious to the end-user. Since you have the logic that sets them to true/false during the execution, why not display the explicit settings to the user?

@vidartf
Copy link
Collaborator

vidartf commented Jun 19, 2018

This is far from obvious to the end-user. Since you have the logic that sets them to true/false during the execution, why not display the explicit settings to the user?

Simply printing false/true for the unset variables could be misleading, as it would seem to indicate that the values are actually set somewhere. Printing something more verbose would probably make sense for these non-obvious settings though (e.g. <unset, resolves to true>).

@vidartf
Copy link
Collaborator

vidartf commented Jun 19, 2018

Ref #398.

@vidartf
Copy link
Collaborator

vidartf commented Jun 19, 2018

@stas00 There were several issues reported in this thread. Are there any remaining as you can see?

@stas00
Copy link
Author

stas00 commented Jun 19, 2018

Yes, this part is still unresolved: #386 (comment)

@stas00
Copy link
Author

stas00 commented Jun 19, 2018

And thank you for #398 - the config printout is now explicit and clear for Extension. I see nulls in other sections but I haven't delved into those other parts.

@vidartf
Copy link
Collaborator

vidartf commented Jun 20, 2018

Yes, this part is still unresolved: #386 (comment)

The outputs and metadata shouldn't be displayed, neither explicitly as in the last cell (output), nor in a suggestive way via the dropdown to show the diff or the blue colored indicator of change (metadata changed, output unchanged in earlier cells).

  • The notebook level metadata is displayed in that screenshot there since its path is /metadata, which was not included in the config, so that one is as expected. Adding "/metadata": true should fix it.
  • When a cell is added / removed, all the data that was added is included. This is mainly based on the logic that if the source is ignored, it would look as if an empty cell was added (but maybe with outputs), which would look confusing. I'm possibly open for special casing some scenarios here though, but that would require changing how the diff config system is set up.
  • The "output unchanged" was meant as a way to conveniently allow the user to see the unchanged outputs of a changed cell. Currently, the front-end is agnostic about how the diff was produced, so it is not possible to differentiate the cases two cases: 1) the user is not ignoring anything, the source changed, the outputs didn't; vs 2) the user is ignoring changes to the output. While this is possible to change, it would again be more substantial than just a tweak.

@stas00
Copy link
Author

stas00 commented Jun 20, 2018

The notebook level metadata is displayed in that screenshot there since its path is /metadata, which was not included in the config, so that one is as expected. Adding "/metadata": true should fix it.

I'm not sure what to make of it. Why would you suggest to set it to true? So far true was for what the user does want to see, so metadata should be false.

{
  "Extension": {
    "source": true,
    "details": false, 
    "outputs": false,
    "metadata": false
  }
}

Unless "/metadata" != "metadata" and then this is very confusing. The above setting is being used at the moment. If you did mean false and wrote true by mistake, then I already had it as false.

wrt the last 2 bullets of your comments 2) "cell added/removed." 3) "output unchanged" - These features are currently not very coding-friendly/optimal.

I can see 2 major ways of someone using nbdime with notebooks:

  1. to diff all or some of the features of the notebook (e.g. someone preparing a presentation where outputs are important!)
  2. a coding platform, where only code is important and the outputs are ever changing and thus are just noise. The outputs can be useful in deterministic coding, but for example for machine learning the output numbers are always always different.

At the moment falling into the 2nd category, all I want is just a simple flag --emulate-diff(1)-functionality and that would mean - please show me just the source code diff, period, no matter what happened to the content of the notebook. And please don't show me any noise, not even a suggestion of something else changing.

Of course, I understand, as you suggest that perhaps the design of the features was built around a different intention. So this then belongs to the wish-list.

I guess the only "tweak" that can be done easily is to replace full-outputs of non-code outputs with click-to-show UI (ala metadata), thus reducing the amount of noise.

Thank you, Vidar.

@vidartf
Copy link
Collaborator

vidartf commented Jun 21, 2018

Regarding the metadata point, I was talking about the initial problem. Setting up:

"SectionName": {
  "metadata": false
}

is equivalent to:

"SectionName": {
  "Ignore": {
    "/metadata": true,
    "/cells/*/metadata": true,
    "/cells/*/outputs/*/metadata": true
  }
}

@vidartf
Copy link
Collaborator

vidartf commented Jun 21, 2018

Regarding the second point, do you actually want to have the outputs stored in the repository? If not, you could use a git filter to strip the outputs. Then nbdime would never see the outputs.

Otherwise, it would be a more extensive change. That doesn't mean it won't be done, just not in the short term. For now I'm pushing to get out a patch release with the small tweaks that have already been done. If you want to contribute a fix for this to speed it along, I'll be happy to help with that, but again, it would likely mean some refactoring of the code.

@stas00
Copy link
Author

stas00 commented Jun 21, 2018

I still don't follow your logic wrt metadata here: #386 (comment)

if that's the configuration as you displayed it:

"Extension": {
  "metadata": false
}

it should not show metadata, yet it does. And somehow you say that it's how it should be, yet, your equivalent configuration suggests the opposite - that it's set to be ignored.

@stas00
Copy link
Author

stas00 commented Jun 21, 2018

Regarding the second point, do you actually want to have the outputs stored in the repository?

  1. This is not my repository, so I don't have control over it
  2. These are notebooks for a MOOC, so the output is used during teaching, so the owner won't change it.

If not, you could use a git filter to strip the outputs. Then nbdime would never see the outputs.

That would work.

OK, so the remaining issue will be on nbdime's todo-list, correct?

@vidartf
Copy link
Collaborator

vidartf commented Sep 4, 2018

@stas00 Sorry, whenever I get back to this issue, I get lost in the different threads of conversation. If there are any issues remaining here, would you mind opening separate issues for them? I'm going to close this for now.

@vidartf vidartf closed this as completed Sep 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants