Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sign notebooks #4824

Merged
merged 17 commits into from Jan 30, 2014
Merged

sign notebooks #4824

merged 17 commits into from Jan 30, 2014

Conversation

minrk
Copy link
Member

@minrk minrk commented Jan 17, 2014

This adds notebook signing for protecting dynamic output from running at load.

  • signature is stored in notebook.metadata.signature
  • signature is checked on open / written on save server side, in the NotebookManager
  • auth flag is set on cells when sent to Javascript via REST API
  • untrusted outputs are not displayed, instead a warning message is shown and display proceeds to safe output types

Needs tests before this is is ready to merge.

scheme, sig = stored_signature.split(':', 1)
try:
my_signature = notebook_signature(nb, secret, scheme)
except AttributeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad the scheme is caught here as a failure (fake lib).

@rgbkrk
Copy link
Member

rgbkrk commented Jan 18, 2014

The only "issue" I'm seeing so far is that this leaves the choice of scheme available to an attacker (choosing the algorithm to sign with).

scheme is the hashing scheme, which must be an attribute of the hashlib module,
as listed in hashlib.algorithms.
"""
hmac = HMAC(secret, digestmod=getattr(hashlib, scheme))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly happy with your choice to use getattr here, as it restricts them to only having the "pure" hashlib algorithms available.

Depending on the version of openssl that's installed, hashlib.new can pull other algorithms out:

In [27]: hashlib.new('ripemd160')
Out[27]: <ripemd160 HASH object @ 0x1025402b0>
In [28]: getattr(hashlib, 'ripemd160')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-28-c02bfa21eec7> in <module>()
----> 1 getattr(hashlib, 'ripemd160')

AttributeError: 'module' object has no attribute 'ripemd160'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe I should not be using getattr, and using hashlib.new. Not sure, there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more or less saying that getattr may be an unintentional good thing, as we're subject to the built in versions of the primary hashlib algorithms, rather than arbitrary hashing algorithms bundled with openssl. OpenSSL includes at least one hashing algorithm (md4) with known derivable collision attacks. Demo:

In [76]: k1 = binascii.unhexlify("839c7a4d7a92cb5678a5d5b9eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edd45e51fe39708bf9427e9c3e8b9")

In [77]: k2 = binascii.unhexlify("839c7a4d7a92cbd678a5d529eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edc45e51fe39708bf9427e9c3e8b9")

In [78]: k1 == k2
Out[78]: False

In [79]: hash1 = hashlib.new('md4'); hash1.update(k1)

In [80]: hash2 = hashlib.new('md4'); hash2.update(k2)

In [81]: hash1.digest() == hash2.digest()
Out[81]: True

Is this a reasonable means of attack? No. You would have to construct a very well structured notebook, chunking away at partial bytes. This is mitigated by not allowing an attacker to "pick" a weaker algorithm.

@minrk
Copy link
Member Author

minrk commented Jan 18, 2014

The only "issue" I'm seeing so far is that this leaves the choice of scheme available to an attacker (choosing the algorithm to sign with).

That's a good point. I can let the user specify the signature scheme, and just consider it untrusted if the scheme doesn't match.

@takluyver
Copy link
Member

Should we have a whitelist/blacklist of signing schemes?

description="""Sign one or more IPython notebooks with your key,
to trust their dynamic (HTML, Javascript) output."""

examples="""ipython notebook trust mynotebook.ipynb"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it "have" to be notebook ? nbvconvert work with notebook and is not ipython notebook nbconvert <mynotebook>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree with @Carreau. ipython trust mynotebook.ipynb maps the action very simplistically.

@Carreau
Copy link
Member

Carreau commented Jan 18, 2014

Should we try to warn if notebook is read from disk and the "trusted" keys are already present ?

@@ -236,6 +237,10 @@ def save_notebook_model(self, model, name='', path=''):
# Save the notebook file
os_path = self.get_os_path(new_name, new_path)
nb = current.to_notebook_json(model['content'])

if sign.check_trusted_cells(nb):
sign.trust_notebook(nb, self.notary.secret, self.notary.scheme)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not a self.notary.sign(notebook) api ? assuming notary as a unique secret prevent to do ring of trust anf puplic/private key things. I don't say that we would ship such a things, but it make it really hard to implement such a subclass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partly because I wanted a purely functional API. I only added the Notary after everything was already done, and I realized I didn't have a sensible place to put config. I could rename it SignatureConfig, to better represent its current purpose, or give the class an actual API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, since pretty much every function requires secret and key args, and I already have an object to store those two values, that's kind of the definition of a class.

@Carreau
Copy link
Member

Carreau commented Jan 18, 2014

I'm really happy on how this is done. We might want to do that as nbconvert filters at some point, but I think it is fine for now. I saying that because I thing that stripping/adding signature could be done on git clean and smudge filter.

@Carreau
Copy link
Member

Carreau commented Jan 22, 2014

Test relaunched on 3.3. (min added commits, mainly test, so we can re-review)

@ghost ghost assigned ellisonbg Jan 23, 2014
@ellisonbg
Copy link
Member

Looks like hashlib doesn't have an algorithms attribute in Python 3 so the tests are having trouble. What else needs to be done on this?

@Carreau
Copy link
Member

Carreau commented Jan 27, 2014

I would still like

 +            trusted = self.notary.check_signature(nb)
 +            if not trusted:
 +                self.log.warn("Notebook %s/%s is not trusted", model['path'], model['name'])
 +            self.notary.mark_cells(nb, trusted)

and

 +        if self.notary.check_cells(nb):
 +            self.notary.sign(nb)
 +        else:
 +            self.log.warn("Saving untrusted notebook %s/%s", new_path, new_name)

To be methods of the notary object. I would like to implement my own notary that have different logics in thoses place.

And having this logic in FileNotebookManager would make me have to inherit and overwrite FileNotebookManager with a lot of code.

@ellisonbg
Copy link
Member

I have looked through the code - very solid at this point. I like the design. I have also tested it locally and it all works as expected. Other than the test issue, I think this is ready for merge.

@takluyver
Copy link
Member

On Python 3, hashlib has hashlib.algorithms_available and hashlib.algorithms_guaranteed.

@minrk
Copy link
Member Author

minrk commented Jan 27, 2014

@Carreau I don't think those methods belong in Notary, because it doesn't know anything about saving or paths. But I did make them simple methods on the base NotebookManager class.

@Carreau
Copy link
Member

Carreau commented Jan 28, 2014

@Carreau I don't think those methods belong in Notary, because it doesn't know anything about saving or paths. But I did make them simple methods on the base NotebookManager class.

Well, the path and names are log statement, they are not really used by the methods, but seem fine for me.

JS test failed on Py3 I relaunched.

@minrk
Copy link
Member Author

minrk commented Jan 28, 2014

JS tests passed, but I missed another instance of hashlib.algorithms. All tests are passing now on Python 3 (waiting for Travis to agree).

@takluyver
Copy link
Member

Python 2.7 build failed, restarted...

@takluyver
Copy link
Member

All clear on Travis.

@Carreau
Copy link
Member

Carreau commented Jan 28, 2014

@ivanov you were the most defavorable to going this route. Will you press the merge button ?

@ivanov
Copy link
Member

ivanov commented Jan 28, 2014

@Carreau that's not quite accurate - I was just in favor of a simpler mechanism to allow either showing all output, or disabling all potential problematic ones. I was strongly opposed to not treating our users like consenting adults who can make that decision on their own, with the default being the safer disabling problematic output. The signing idea wasn't even verbalized at that time.

With that said - I think we'd be doing a great disservice to current users of the notebook if in 2.0 we did not ship or (oops, missed ipython trust) document for them a simple way to sign their pre-2.0 notebooks.

Its a significant change that we should document for users.

@takluyver
Copy link
Member

This PR does add a subcommand ipython trust which will sign notebooks that you pass it. I agree that it should be documented, though.

protects against notebook author choosing bad hash scheme.
and move App definition to nbformat.sign (maybe it should get its own file).
needed for testing
rather than returning null
@minrk
Copy link
Member Author

minrk commented Jan 29, 2014

Rebased and documented.

@ivanov
Copy link
Member

ivanov commented Jan 29, 2014

Travis seems to be unhappy :\

javascript and HTML output will not be displayed on load,
and must be regenerated by re-executing the cells.

Any notebook that you have executed yourself will be considered trusted,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'have executed in its entirety'? or just have a complete sentence about how partially re-executed foreign notebooks stay untrusted?

so you can wait for at least n outputs
A few waits, little changes to get it running with recent changes
now that get_output_cell raises if there is no such output
so that subclasses have less to duplicate
@minrk
Copy link
Member Author

minrk commented Jan 29, 2014

fixed bug in casper utils.js introduced by bad rebase.

@minrk
Copy link
Member Author

minrk commented Jan 30, 2014

Travis is still failing, but the failing tests are not related to this PR. We have a bunch of random failures in recently added js tests, so Travis failures are mostly useless right now.

@minrk
Copy link
Member Author

minrk commented Jan 30, 2014

While Travis may disagree, relevant tests are indeed passing.

@takluyver
Copy link
Member

Docs look good to me - @ellisonbg , @Carreau , you were reviewing the actual code, so I'll let one of you do the merge.

@ellisonbg
Copy link
Member

Looks good, merging.

ellisonbg added a commit that referenced this pull request Jan 30, 2014
@ellisonbg ellisonbg merged commit c008278 into ipython:master Jan 30, 2014
@takluyver
Copy link
Member

And we should ping the list explaining this change and the existence of the ipython trust command.

@damianavila
Copy link
Member

And we should ping the list explaining this change and the existence of the ipython trust command.

Yes, I agree...

@ellisonbg
Copy link
Member

Let's wait until we start cleaning markdown cells on load before we
publicize that "notebooks are now safe on load" I will work on that code
today.

On Fri, Jan 31, 2014 at 1:01 AM, Damián Avila notifications@github.comwrote:

And we should ping the list explaining this change and the existence of
the ipython trust command.

Yes, I agree...

Reply to this email directly or view it on GitHubhttps://github.com//pull/4824#issuecomment-33770427
.

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@michaelaye
Copy link
Contributor

The docs I see using ipython help trust do not explain what key actually is and if it is required to generate an IPython-only key or if I could use my ssh-key or my GPG email key or ... (I'm not really good in understanding all these differences, but trying to use them whenever I can).
Is there any more documentation somewhere? I'm unclear if above mentioned documentation by @minrk refers to the content of ipython help trust or something else?

@takluyver
Copy link
Member

By default, it generates a random key for you, but if you prefer to give it a key, copy/symlink the file to ~/.ipython/profile_default/security/notebook_secret. Or set the config value c.NotebookNotary.secret_file to point to your key file. It will use the bytes in the file as is - there's no interpretation of the key in the file like removing ascii armour.

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants