New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sign notebooks #4824
sign notebooks #4824
Conversation
scheme, sig = stored_signature.split(':', 1) | ||
try: | ||
my_signature = notebook_signature(nb, secret, scheme) | ||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad the scheme is caught here as a failure (fake lib).
The only "issue" I'm seeing so far is that this leaves the choice of scheme available to an attacker (choosing the algorithm to sign with). |
scheme is the hashing scheme, which must be an attribute of the hashlib module, | ||
as listed in hashlib.algorithms. | ||
""" | ||
hmac = HMAC(secret, digestmod=getattr(hashlib, scheme)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fairly happy with your choice to use getattr
here, as it restricts them to only having the "pure" hashlib algorithms available.
Depending on the version of openssl that's installed, hashlib.new
can pull other algorithms out:
In [27]: hashlib.new('ripemd160')
Out[27]: <ripemd160 HASH object @ 0x1025402b0>
In [28]: getattr(hashlib, 'ripemd160')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-28-c02bfa21eec7> in <module>()
----> 1 getattr(hashlib, 'ripemd160')
AttributeError: 'module' object has no attribute 'ripemd160'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, maybe I should not be using getattr, and using hashlib.new. Not sure, there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm more or less saying that getattr
may be an unintentional good thing, as we're subject to the built in versions of the primary hashlib algorithms, rather than arbitrary hashing algorithms bundled with openssl. OpenSSL includes at least one hashing algorithm (md4) with known derivable collision attacks. Demo:
In [76]: k1 = binascii.unhexlify("839c7a4d7a92cb5678a5d5b9eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edd45e51fe39708bf9427e9c3e8b9")
In [77]: k2 = binascii.unhexlify("839c7a4d7a92cbd678a5d529eea5a7573c8a74deb366c3dc20a083b69f5d2a3bb3719dc69891e9f95e809fd7e8b23ba6318edc45e51fe39708bf9427e9c3e8b9")
In [78]: k1 == k2
Out[78]: False
In [79]: hash1 = hashlib.new('md4'); hash1.update(k1)
In [80]: hash2 = hashlib.new('md4'); hash2.update(k2)
In [81]: hash1.digest() == hash2.digest()
Out[81]: True
Is this a reasonable means of attack? No. You would have to construct a very well structured notebook, chunking away at partial bytes. This is mitigated by not allowing an attacker to "pick" a weaker algorithm.
That's a good point. I can let the user specify the signature scheme, and just consider it untrusted if the scheme doesn't match. |
Should we have a whitelist/blacklist of signing schemes? |
description="""Sign one or more IPython notebooks with your key, | ||
to trust their dynamic (HTML, Javascript) output.""" | ||
|
||
examples="""ipython notebook trust mynotebook.ipynb""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it "have" to be notebook ? nbvconvert work with notebook and is not ipython notebook nbconvert <mynotebook>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely agree with @Carreau. ipython trust mynotebook.ipynb
maps the action very simplistically.
Should we try to warn if notebook is read from disk and the "trusted" keys are already present ? |
@@ -236,6 +237,10 @@ def save_notebook_model(self, model, name='', path=''): | |||
# Save the notebook file | |||
os_path = self.get_os_path(new_name, new_path) | |||
nb = current.to_notebook_json(model['content']) | |||
|
|||
if sign.check_trusted_cells(nb): | |||
sign.trust_notebook(nb, self.notary.secret, self.notary.scheme) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not a self.notary.sign(notebook)
api ? assuming notary as a unique secret prevent to do ring of trust anf puplic/private key things. I don't say that we would ship such a things, but it make it really hard to implement such a subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partly because I wanted a purely functional API. I only added the Notary after everything was already done, and I realized I didn't have a sensible place to put config. I could rename it SignatureConfig, to better represent its current purpose, or give the class an actual API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, since pretty much every function requires secret and key args, and I already have an object to store those two values, that's kind of the definition of a class.
I'm really happy on how this is done. We might want to do that as nbconvert filters at some point, but I think it is fine for now. I saying that because I thing that stripping/adding signature could be done on git clean and smudge filter. |
Test relaunched on 3.3. (min added commits, mainly test, so we can re-review) |
Looks like |
I would still like
and
To be methods of the notary object. I would like to implement my own notary that have different logics in thoses place. And having this logic in |
I have looked through the code - very solid at this point. I like the design. I have also tested it locally and it all works as expected. Other than the test issue, I think this is ready for merge. |
On Python 3, hashlib has |
@Carreau I don't think those methods belong in Notary, because it doesn't know anything about saving or paths. But I did make them simple methods on the base NotebookManager class. |
Well, the path and names are log statement, they are not really used by the methods, but seem fine for me. JS test failed on Py3 I relaunched. |
JS tests passed, but I missed another instance of hashlib.algorithms. All tests are passing now on Python 3 (waiting for Travis to agree). |
Python 2.7 build failed, restarted... |
All clear on Travis. |
@ivanov you were the most defavorable to going this route. Will you press the merge button ? |
@Carreau that's not quite accurate - I was just in favor of a simpler mechanism to allow either showing all output, or disabling all potential problematic ones. I was strongly opposed to not treating our users like consenting adults who can make that decision on their own, with the default being the safer disabling problematic output. The signing idea wasn't even verbalized at that time. With that said - I think we'd be doing a great disservice to current users of the notebook if in 2.0 we did not Its a significant change that we should document for users. |
This PR does add a subcommand |
protects against notebook author choosing bad hash scheme.
and move App definition to nbformat.sign (maybe it should get its own file).
needed for testing
rather than returning null
Rebased and documented. |
Travis seems to be unhappy :\ |
javascript and HTML output will not be displayed on load, | ||
and must be regenerated by re-executing the cells. | ||
|
||
Any notebook that you have executed yourself will be considered trusted, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'have executed in its entirety'? or just have a complete sentence about how partially re-executed foreign notebooks stay untrusted?
so you can wait for at least n outputs
A few waits, little changes to get it running with recent changes
now that get_output_cell raises if there is no such output
so that subclasses have less to duplicate
fixed bug in casper utils.js introduced by bad rebase. |
Travis is still failing, but the failing tests are not related to this PR. We have a bunch of random failures in recently added js tests, so Travis failures are mostly useless right now. |
While Travis may disagree, relevant tests are indeed passing. |
Docs look good to me - @ellisonbg , @Carreau , you were reviewing the actual code, so I'll let one of you do the merge. |
Looks good, merging. |
And we should ping the list explaining this change and the existence of the |
Yes, I agree... |
Let's wait until we start cleaning markdown cells on load before we On Fri, Jan 31, 2014 at 1:01 AM, Damián Avila notifications@github.comwrote:
Brian E. Granger |
The docs I see using |
By default, it generates a random key for you, but if you prefer to give it a key, copy/symlink the file to |
sign notebooks
This adds notebook signing for protecting dynamic output from running at load.
Needs tests before this is is ready to merge.