Should kernel info be included in notebook signature computation? #296

afshin · 2022-08-18T17:32:24Z

@divyansshhh opened an issue in JupyterLab that is more appropriate for jupyter/nbformat, so this issue is meant to replace the original in JupyterLab.

Problem

At present, when computing the signature of a notebook, the compute_signature function takes the kernel info into account. I believe this shouldn't be done, because the kernel info has no effect on the safety of the output.

Proposed Solution

If there is no reason to include kernel info when computing the notebook signature, then it should be stripped out.
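To make the proposal concrete, here is a minimal sketch. It is not nbformat's implementation. It computes an HMAC-SHA256 over a canonical JSON dump of the notebook and can optionally drop the top-level `kernelspec` and `language_info` metadata first. The secret, the function name `notebook_signature`, and the choice of those two keys as "kernel info" are illustrative assumptions. So is the idea that the frontend rewrites `kernelspec` when the original kernel is missing.

```python
import copy
import hashlib
import hmac
import json

SECRET = b"per-user-secret"  # hypothetical; a real notary keeps its own secret
# Assumption: these top-level metadata keys are what "kernel info" means here.
KERNEL_INFO_KEYS = ("kernelspec", "language_info")


def notebook_signature(nb: dict, include_kernel_info: bool = True,
                       secret: bytes = SECRET) -> str:
    """HMAC-SHA256 over the notebook, minus any stored signature."""
    content = copy.deepcopy(nb)
    metadata = content.setdefault("metadata", {})
    metadata.pop("signature", None)  # never hash the signature itself
    if not include_kernel_info:
        for key in KERNEL_INFO_KEYS:
            metadata.pop(key, None)
    # sort_keys gives identical bytes for notebooks with equal content
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "kernel1", "display_name": "Kernel 1",
                       "language": "python"},
    },
    "cells": [{"cell_type": "code", "source": "print(1)", "metadata": {},
               "outputs": [], "execution_count": None}],
}

# Assumed read-only scenario: the host has no "kernel1", so the frontend
# falls back to another kernel and rewrites the kernelspec name.
reopened = copy.deepcopy(nb)
reopened["metadata"]["kernelspec"]["name"] = "python3"

for include in (True, False):
    same = notebook_signature(reopened, include) == notebook_signature(nb, include)
    print(f"include_kernel_info={include}: still trusted? {same}")
# include_kernel_info=True: still trusted? False
# include_kernel_info=False: still trusted? True
```

With kernel info stripped, only changes to cells, outputs, or other metadata invalidate the signature.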

Additional context

In our use case, we have a custom contents manager that can access notebooks on a remote host in read-only mode. If the original notebook was authored with, say, kernel1, then opening it on a host that doesn't have a kernel named kernel1 makes the notebook untrusted.


Carreau · 2022-08-30T10:01:19Z

I believe this shouldn't be done because it has no effect on the safety of the output

I'm not sure this is true, because I'm not sure that, for the trust model in general, one should consider only the output as being unsafe.

While trust is used this way in JupyterLab and Jupyter Server, that does not mean nbformat isn't used differently by someone else.

Trust could, for example, be used to decide whether to run a notebook in a cron job as root.
And if the kernel can be changed without invalidating the signature, why not have it point to an attacker-controlled executable? Or use code that means different things in two different kernels.
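The risk described above can be illustrated with a hypothetical signer. It is a minimal sketch, not nbformat's code. It computes an HMAC-SHA256 over canonical JSON and can optionally exclude `kernelspec` and `language_info`. The secret, the function names, and the kernel name `attacker-kernel` are assumptions for illustration. If the signature ignores kernel info, a notebook whose kernelspec was swapped still verifies as trusted.

```python
import copy
import hashlib
import hmac
import json

SECRET = b"per-user-secret"  # hypothetical secret for illustration
KERNEL_INFO_KEYS = ("kernelspec", "language_info")  # assumed "kernel info"


def sign(nb: dict, include_kernel_info: bool, secret: bytes = SECRET) -> str:
    """HMAC-SHA256 over canonical JSON, optionally ignoring kernel info."""
    content = copy.deepcopy(nb)
    metadata = content.setdefault("metadata", {})
    metadata.pop("signature", None)
    if not include_kernel_info:
        for key in KERNEL_INFO_KEYS:
            metadata.pop(key, None)
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_trusted(nb: dict, stored: str, include_kernel_info: bool) -> bool:
    # compare_digest avoids leaking timing information about the signature
    return hmac.compare_digest(sign(nb, include_kernel_info), stored)


nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3",
                                "language": "python"}},
    "cells": [{"cell_type": "code", "source": "print('hello')", "metadata": {},
               "outputs": [], "execution_count": None}],
}

# The attacker points the kernelspec at a kernel they control.
tampered = copy.deepcopy(nb)
tampered["metadata"]["kernelspec"]["name"] = "attacker-kernel"

for include in (True, False):
    stored = sign(nb, include_kernel_info=include)
    print(include, is_trusted(tampered, stored, include_kernel_info=include))
# True False
# False True
```

The first line shows the swap being detected when kernel info is signed. The second shows it going unnoticed when kernel info is stripped.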

krassowski · 2022-08-30T19:58:01Z

Loosely related previous discussion: compute_signature should skip all transient properties not only signature #234

I think that there are multiple things that the current trust mechanism attempts to be:

1. integrity certificate ("this notebook or its outputs were not modified manually in another application")
   - improvements in this area (including a timestamp, the previous checkpoint hash, etc.) would be useful for complying with the letter of the FDA guidance on audit trails for drug trial analyses
2. protection from execution of unwanted output (e.g. JavaScript) or Markdown content on startup ("it is safe to open this notebook if you trust the sender"), analogous to the "trust" concept for executable macros in Office
3. protection from kernel-side runtime security pitfalls (e.g. a swapped kernelspec, embedded/invisible code) ("it is safe to run this notebook if you trust the sender")

Should we split those into multiple signatures?

I would also argue that the ecosystem is showing signs of being annoyed with the current trust implementation. Any security mechanism that is sufficiently annoying will be circumvented by users (password complexity requirements being the common example). Here are some quick examples of this happening in the wild with notebook trust:

- Cannot "trust" a notebook with blank cells, jupyterlab/jupyterlab#9765 (comment): a user suggesting removing trust altogether
- NBs becoming untrusted in save hook, fastai/nbdev#892 (comment): an nbdev developer considering disabling trust for notebooks because it does not work reliably

I think that we could introduce a new granular trust system covering the three use cases described above in a backward-compatible way, so the roll-out would be gradual and not too costly.

Edit, clarifying note: (1) should include kernel info, because different kernels could lead to different results, so a change there would indicate (depending on the use case) some form of tampering or a write/read problem; (2) should not include kernel info, as it is irrelevant and annoying; (3) should include kernel info for the reasons outlined above.
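One way to picture the split is one HMAC per use case, each over a different slice of the notebook. This is a hypothetical sketch, not an existing nbformat API. The scope names `integrity`, `display`, and `execution` are assumptions, as are the exact fields each scope covers and the secret. Per the note above, only the display scope leaves kernel info out.

```python
import copy
import hashlib
import hmac
import json

SECRET = b"per-user-secret"  # hypothetical secret for illustration
KERNEL_INFO_KEYS = ("kernelspec", "language_info")  # assumed "kernel info"


def _hmac(secret: bytes, obj) -> str:
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def granular_signatures(nb: dict, secret: bytes = SECRET) -> dict:
    """Return one signature per (hypothetical) trust scope."""
    metadata = {k: v for k, v in nb.get("metadata", {}).items()
                if k != "signature"}
    cells = nb.get("cells", [])
    kernel_info = {k: metadata[k] for k in KERNEL_INFO_KEYS if k in metadata}

    # (1) integrity: the whole notebook, kernel info included
    integrity = {**nb, "metadata": metadata}
    # (2) display: what is rendered on open (code outputs, Markdown source);
    #     kernel info deliberately left out
    display = [
        [c["cell_type"], c.get("outputs", []) if c["cell_type"] == "code"
         else c.get("source", "")]
        for c in cells if c["cell_type"] in ("code", "markdown")
    ]
    # (3) execution: the code the kernel will run, plus which kernel runs it
    execution = {
        "kernel_info": kernel_info,
        "code": [c.get("source", "") for c in cells if c["cell_type"] == "code"],
    }
    return {
        "integrity": _hmac(secret, integrity),
        "display": _hmac(secret, display),
        "execution": _hmac(secret, execution),
    }


nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "kernel1", "display_name": "Kernel 1",
                                "language": "python"}},
    "cells": [
        {"cell_type": "markdown", "source": "# Report", "metadata": {}},
        {"cell_type": "code", "source": "print(1)", "metadata": {},
         "outputs": [], "execution_count": None},
    ],
}

before = granular_signatures(nb)
swapped = copy.deepcopy(nb)
swapped["metadata"]["kernelspec"]["name"] = "python3"
after = granular_signatures(swapped)
print({scope: before[scope] == after[scope] for scope in before})
# {'integrity': False, 'display': True, 'execution': False}
```

A kernelspec change leaves the display signature valid, so outputs could still render. The integrity and execution signatures would flag the change.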

krassowski · 2022-08-30T20:02:19Z

Also linking to a previous discussion on trust in the Real Time Collaboration scenario, where the per-output/widget trust issue was discussed: jupyterlab/jupyterlab#11494.

Carreau · 2022-08-31T12:19:47Z

For me, most of these are either bugs or a conflation of "save" with "export".
Currently, the ipynb is becoming both a persistent store for application state and an exchange format.

IMHO, Jupyter Server should store state somewhere in whatever format it likes (even a binary one that is incompatible between versions), and offer an option to "export", or potentially auto-export versions of the files as .md, .ipynb, or whatever you like.

This is how most applications work today when they have complex structures.
Especially with RTC, which needs complex information, there is no reason to try to shove things into the ipynb.

afshin added the question label Aug 18, 2022

afshin mentioned this issue Aug 18, 2022

Should kernel info be included in notebook signature computation? jupyterlab/jupyterlab#12955

Closed

JasonWeill mentioned this issue Aug 18, 2022

Weekly Triage meetings: Jul-Dec 2022 jupyterlab/frontends-team-compass#151

Closed

fcollonval mentioned this issue Sep 2, 2022

Weekly Team Meetings: Jul-Dec 2022 jupyterlab/frontends-team-compass#152

Closed

krassowski mentioned this issue Apr 9, 2023

Discussion: Rethinking notebook cell types jupyter/enhancement-proposals#95

Open
