Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default password hash of sha-1 is insecure #2426

Closed
jnhnum1 opened this issue Apr 19, 2017 · 10 comments · Fixed by #3793
Closed

default password hash of sha-1 is insecure #2426

jnhnum1 opened this issue Apr 19, 2017 · 10 comments · Fixed by #3793

Comments

@jnhnum1
Copy link

jnhnum1 commented Apr 19, 2017

see https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

would be better to use sha256, or even better something like scrypt or bcrypt (which are actually meant for password hashing)

@takluyver
Copy link
Member

It's definitely not a good example for password storage, but a collision attack is not a plausible vector for stealing a password - an attacker would have to trick you into picking a specific long and unlikely password, which they would have to know, in order to generate a different password that results in the same hash. And if someone can persuade you to use a password they know, the collision is irrelevant. It's a good idea to move to a stronger hash anyway, but the recent news doesn't make that particularly urgent.

A collision attack (making two inputs which produce the same hash) is generally much easier to do than a preimage attack (making an input which produces a given hash). The former is what has happened for SHA-1, and I don't believe it makes much difference for this use case.

If authenticating users to access the notebook is important, we recommend using JupyterHub.

@jnhnum1
Copy link
Author

jnhnum1 commented Apr 19, 2017

That's a good point; thanks.

@nealmcb
Copy link

nealmcb commented Sep 7, 2019

The current hashing method is indeed very insecure, but not really because it uses SHA-1.

The problem is that it doesn't iterate the hash, making the hash function so blazing fast that brute-force attacks, using newly-available fast hardware, are way too tempting, even with a salt.
The fact that a collision was found just underscores the speed with which hashes can be done now, and the need for massive numbers of iterations.

Only doing a single iteration is madness. As even the hashlib documentation stated long ago:

As of 2013, at least 100,000 iterations of SHA-256 are suggested.

Since then, hardware acceleration has expanded speeds tremendously, and a homemade rig can try 26 billion trial passwords a second. Even passwords which were once considered pretty strong can be cheaply cracked these days via online services.

Offhand, pull request #3793 to use bcrypt would help with security, but requires a dependency on an external library.

It would seem better offhand to use the hashlib.pbkdf2_hmac function with a big iteration count. It is standard as of Python 3.4, and described in Hashing Passwords in Python – Useful code.

Soon I hope Python will support the much better Argon2 standard for password hashing, which also allows specifying space and parallelism complexity.

Until then, users should either be warned away from hashing passwords feature, or counseled to use really great passwords. And counseled to throw away and never reuse passwords they may have deployed in the past this way.

For more background, see Developers, its 2019, hash password accordingly

@nealmcb
Copy link

nealmcb commented Jun 13, 2020

(...Continuing a conversation from the PR at #3793...)

With either scrypt or argon2, I guess the next question is how much memory overhead we're comfortable with. The default memory_cost is 105 MB (102400 KiB, from the handy python -m argon2 command). And the parameter tuning tips at the latest IETF internet-draft on the topic

currently emphasize maximizing memory (3. Figure out the maximum amount m of memory that each call can afford.)

But the initial size of Jupyter seems more like 50 MB, based on my laptop running Ubuntu 18.04, if I'm interpreting the output of smem correctly. After startup that is likely to grow, but do we want logging in to always take twice the initial size?

What is the maximum (momentary) extra memory hit you're comfortable with? I guess login most often happens at startup, but if the system is really short on memory, you'd hate to cause problems for users trying to log in to a notebook to gracefully shut it down.

So perhaps we go with a smaller value like memory_cost=10240, and a correspondingly higher number of iterations, like time_cost=18 (which takes the same time per password verification as the default parameters on my machine: ~100 ms).

We could also consider different values for the maximum number of threads to use, or pick a different maximum real time.

@nealmcb
Copy link

nealmcb commented Jun 13, 2020

Is any synergy with JupyterHub desired? Offhand it seems that JupyterHub can leverage authentication from either a variety of external sources as noted at Authenticators, or from a native user authenticator. The latter stores password hashes locally, generated via bcrypt with default parameters, as see at nativeauthenticator.py

That seems to call for 2^12 bcrypt rounds, which isn't too bad according to Recommended # of rounds for bcrypt - Information Security Stack Exchange.

But I'll note that bcrypt isn't as memory-hard (4 KiB) as argon2 or scrypt (configurable up to MB or GB), and there is now FGPA support for bcrypt in Jack the Ripper (popular cracking software), which allows low-power cracking with significant speedups, if you can find suitable hardware.

@nealmcb
Copy link

nealmcb commented Jun 26, 2020

I've reviewed the situation some more, and think the consensus continues to shift to Argon2id. So I think we should use argon2-cffi · PyPI with these default parameters:

memory_cost = 10240
time_cost = 18
parallelism = 8
ph = PasswordHasher(memory_cost=memory_cost, time_cost=time_cost, parallelism=parallelism)
ph.hash(password)

as discussed above. That involves using about 10 MiB of memory. Those parameters should be adjusted in future releases. See the Django Argon2 work factor section and class Argon2PasswordHasher(BasePasswordHasher) for their approach, though I think making the parameters adjustable via jupyter_notebook_config.py would be better.

@remram44
Copy link
Contributor

I don't think we have to go with state-of-the-art crypto here. Using something that can easily be installed and won't cause issues for most users probably should be preferred. Like @takluyver, people needing stronger authentication should be using JupyterHub already.

I'd just like to remind everyone that the current code is terribly insecure and can be improved at no cost, we should try to improve that soon.

@nealmcb
Copy link

nealmcb commented Jun 26, 2020

I agree with @remram44 that the priority is addressing the very dangerous use of the current very fast SHA-1 approach.

If another dependency is ok, I think Argon2 is still the best bet. If not, even PBKDF2 would be a huge improvement, and it is the default (with a SHA256 hash and iterations = 260000) for that reason in Django.

I don't think bcrypt is preferable to Argon2.

@nealmcb
Copy link

nealmcb commented Jun 29, 2020

I'm curious to know how this all fits in with JupyterLab. Does it inherit whatever solution we come up with here? Or does it have yet a different way of using passwords with notebook servers?

@blink1073
Copy link
Member

As of today, JupyterLab uses this package as its server. It is migrating to use the new jupyter_server project, so we'd have to port the change there as well. cf jupyter-server/jupyter_server#110

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants