Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert bcrypt to use OpenBSD code #68

Merged
merged 16 commits into from Jun 27, 2016
Merged

Convert bcrypt to use OpenBSD code #68

merged 16 commits into from Jun 27, 2016

Conversation

reaperhulk
Copy link
Member

@reaperhulk reaperhulk commented Jun 21, 2016

This allows us to add bcrypt_pbkdf to support OpenSSH keys much more easily.

Some review things:

  • Verify that the files added are the OpenBSD files (with a few changes like removing static on a definition, adding a header, and removing unused functions.)
  • Verify that the changes to sha2.c to use be64toh, htobe64, and le32toh are correct.
  • Carefully check portable_endian.h to verify it does what we expect on our supported platforms.

retval = _bcrypt.lib.crypt_gensalt_rn(
b"$" + prefix + b"$", rounds, salt, len(salt), output, len(output),
return (
b"$" + prefix + b"$" + ("%2.2u" % rounds).encode("ascii") + b"$" +

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use b"%2.2u" % rounds (or b"$%s$%2.2u$" % (prefix, rounds))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't use % with bytes in Py3 :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that’s what I get for “temporarily” linking python to python2. Sorry!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, no problem. I'm pleased that you're reviewing this!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing history, I did actually check it properly: % works with bytes as of 3.5. (Seeing as 3.3 is the compatibility target, though, that doesn’t really matter.)

@reaperhulk reaperhulk changed the title [WIP] Convert bcrypt to use OpenBSD code Convert bcrypt to use OpenBSD code Jun 22, 2016
@reaperhulk reaperhulk changed the title Convert bcrypt to use OpenBSD code [WIP] Convert bcrypt to use OpenBSD code Jun 22, 2016
@reaperhulk
Copy link
Member Author

Back to WIP because this doesn't work on Windows yet.

@reaperhulk reaperhulk changed the title [WIP] Convert bcrypt to use OpenBSD code Convert bcrypt to use OpenBSD code Jun 22, 2016
@reaperhulk
Copy link
Member Author

This is now tested on Mac (El Capitan), Linux, and Windows under py2 and py3.

We should consider improving the test infra for bcrypt a blocker for merging this as that validation should be provided in the status checks, not just my assertion that I tested it and it works.

salt = os.urandom(16)
output = _bcrypt.ffi.new("unsigned char[]", 30)
_bcrypt.lib.encode_base64(output, salt, len(salt))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to write/use our own base64 encoding instead of using what the standard library already supplies?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see we need our own base64 encoding in the C code to return the hashed password, but I think that this particular call can just use the standard library version. I guess there's an argument to be made for using this here so that we use the same encoding function everywhere, but it feels like to me it'd be better to just not expose the encode_base64 function via cffi and just treat it as an internal implementation detail of the library, and to use the standard library code in Python.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't add trailing = but I don't know if there are any other differences. We may be able to switch (I originally had it using stdlib but converted it during the port for no good reason)

@dstufft
Copy link
Member

dstufft commented Jun 22, 2016

Couple of general questions:

  • Does this support all the bcrypt hash types? ($2a$, $2y$, and $2b$)? I assume it supports at least $2b$ since that came from OpenBSD but what about the other two? If not we should at least document that as a backwards incompatible change.
  • What does updating this code to pull in new code from OpenBSD look like? Is this something that even makes sense to think about?

@dstufft
Copy link
Member

dstufft commented Jun 22, 2016

Another thought: Since we're using different backing library, would it make sense to pull out the test vectors from openwall's code and add that into this library?

@reaperhulk
Copy link
Member Author

$2a$ and $2b$ are both supported. $2y$ is not. I've added some tests pulled from openwall for $2a$ and updated the README to note the supported prefixes.

@dstufft
Copy link
Member

dstufft commented Jun 22, 2016

@reaperhulk so, at one point I think we were defaulting to $2y$ prefixes so that I think could be a bit of a problem. Looking at the change log for the Openwall code I see this:

Version 1.3 adds support for the $2b$ prefix introduced in OpenBSD 5.5+, which behaves exactly the same as crypt_blowfish's $2y$. This way, full compatibility with OpenBSD's bcrypt is achieved at this new prefix.

Which suggests to me that we could regain support for the $2y$ prefix using something like:

import re

_normalize_re = re.compile(b"^\$2y\$")

def _normalize_prefix(hashed):
    return _normalize_re.sub(b"$2b$", hashed)

@dstufft
Copy link
Member

dstufft commented Jun 22, 2016

I do think if we do that though, we should fail on passing a 2y prefix to gensalt, this should only be to support working with existing hashes, not to support continuing to generate 2y hashes.

@reaperhulk
Copy link
Member Author

That seems fair. I'll update this PR to add support for understanding $2y$ prefixes and add tests.

@reaperhulk
Copy link
Member Author

I realize I didn't address the "what does updating this look like" question... Updating this is going to be challenging because we've taken a snapshot of the OpenBSD code, removed various functions we don't need, changed a few function definitions from static, added some include headers to make it work across various platforms, and changed the way it does endian swapping. Updating will require doing a diff of the new version vs ours and then looking at the diff to understand what has changed that we may want to pull in and what is affecting areas we've intentionally removed/changed.

In practice updating is unlikely to be required except in the event of a security vulnerability.

One potential way we could split this up would be to make the C code a separate project called libbcrypt and then consume that for the Python bindings.

@dstufft
Copy link
Member

dstufft commented Jun 23, 2016

@reaperhulk I'm not particularly opposed to the idea that updating is harder as long as we explicitly make that choice and we don't accidentally make it and end up regretting it. Looking at the code base for the openwall code, it appears that it's had practically no changes since it's original implementation besides the prefix handing and that one security fix, so I think it's a pretty safe thing to not really worry too hard about unless we can think of a low impact way to make it easier.

@lvh
Copy link
Member

lvh commented Jun 23, 2016

Are we using the OpenBSD C code verbatim, sans modification? Because man there are some missing braces for some if statements that I don't like.

@reaperhulk
Copy link
Member Author

There have been changes (removal of functions, removal of static on one or two declarations, and the endian change), but I've tried hard to minimize those to make it vaguely possibly to compare them. I'd rather not diverge further, but I do agree that the lack of consistent bracing is pretty unfortunate.


for (i = 0; i < 4; i++) {
for (k = 0; k < 256; k += 2) {
d[0]^= Blowfish_stream2word(data, databytes, &j);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency nitpick: space after right bracket?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing this would make the diff from OpenBSD even harder to read 😢 (most of this is a direct copy from their source tree)

@lvh
Copy link
Member

lvh commented Jun 27, 2016

I have reviewed this PR.

@lvh lvh merged commit 394882d into pyca:master Jun 27, 2016
@reaperhulk
Copy link
Member Author

🎉 💃 🏆

@reaperhulk reaperhulk deleted the bcrypt3 branch June 27, 2016 16:55
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants