Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-16487: allow certificates to be specified from memory #2449

Open
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@jgehrcke
Copy link

commented Jun 27, 2017

Refs:

Compared to the state last discussed via http://bugs.python.org/review/16487, I have

  • rebased on latest master and resolved merge conflicts (nothing complicated)
  • incorporated the changes that I had acked in this review board message

https://bugs.python.org/issue16487

"""
with open(filepath, 'rb') as f:
buf = BytesIO(f.read())
buf.read = buf.getvalue

This comment has been minimized.

Copy link
@pitrou

pitrou Jun 29, 2017

Member

Why do you need that? It should work without.

>>> b = io.BytesIO(b"abc")
>>> b.read()
b'abc'

This comment has been minimized.

Copy link
@jgehrcke

jgehrcke Jun 30, 2017

Author

I think I wanted to have the object be re-read()able, because IIRC that simplified writing the test code quite a bit (it probably allowed me to prepare the various file objects just once, instead of for every load_cert_chain() method invocation):

>>> import io
>>> b = io.BytesIO(b"abc")
>>> b.read()
b'abc'
>>> b.read()
b''

vs.

>>> b = io.BytesIO(b"abc")
>>> b.read = b.getvalue
>>> b.read()
b'abc'
>>> b.read()
b'abc'
}


PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state);

This comment has been minimized.

Copy link
@pitrou

pitrou Jun 29, 2017

Member

Can a password be asked for the certificates (as opposed to the private key)? This sounds a bit weird.

This comment has been minimized.

Copy link
@jgehrcke

jgehrcke Jun 30, 2017

Author

The "certificate file" is actually allowed to contain the private key: https://docs.python.org/3/library/ssl.html#combined-key-and-certificate. That's one of the reasons why this function's input space is so gigantic.

In an extreme (but sane) case this file contains a password-protected private key, an end-entity certificate, and N intermediate CA certificates.

Edit: That argument might not be satisfying, I had all the relevant code comments in place in my dev branch a year ago -- will try to clarify that in a future comment here. In any case this behavior is tested here: https://github.com/python/cpython/pull/2449/files#diff-e144a9cacd10921d4dae0aeac0300a6fR1045

This comment has been minimized.

Copy link
@jgehrcke

jgehrcke Sep 30, 2017

Author

Hey @pitrou. I got around to looking into this again. My argument above was not the correct argument to answer your question. I went into the code and docs again and understood the reasoning again. There are two points of relevance here:

1) cert files might indeed be encrypted

From https://www.openssl.org/docs/man1.0.2/ssl/SSL_CTX_use_certificate_chain_file.html:

The private keys loaded from file can be encrypted. In order to successfully load encrypted keys, a function returning the passphrase must have been supplied, see SSL_CTX_set_default_passwd_cb. (Certificate files might be encrypted as well from the technical point of view, it however does not make sense as the data in the certificate is considered public anyway.)

(emphasis mine)

2) Consistency

That is the code block in question from my patch:

    PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state);
    r = _openssl_use_certificate_chain_from_bio(self->ctx, certbio->bio);
    PySSL_END_ALLOW_THREADS_S(pw_info.thread_state);

whereas _openssl_use_certificate_chain_from_bio() is by definition meant to closely resemble OpenSSL's SSL_CTX_use_certificate_chain_file(). Both can handle the password callback. As of reason (1), I suppose. From the source of SSL_CTX_use_certificate_chain_file() in OpenSSL 1.1.0:

    x = PEM_read_bio_X509_AUX(in, NULL, passwd_callback,
                              passwd_callback_userdata);

If _openssl_use_certificate_chain_from_bio() or SSL_CTX_use_certificate_chain_file() triggers the password callback I believe we better have the wrapper in place.

That's probably what one of the earlier authors thought, too, and so the same ALLOW_THREADS wrapper around SSL_CTX_use_certificate_chain_file() has always been present in CPython so far:

    PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state);
    r = SSL_CTX_use_certificate_chain_file(self->ctx,
        PyBytes_AS_STRING(certfile_bytes));
    PySSL_END_ALLOW_THREADS_S(pw_info.thread_state);

Notably, I do not think that we cover the case of loading an encrypted certificate (where there is no key also present in the cert file) within test_ssl.

This comment has been minimized.

Copy link
@jgehrcke

jgehrcke Oct 1, 2017

Author

I have added a commit with a hopefully clarifying code comment.

Lib/ssl.py Outdated
keyfile_path = keyfile

# Pre CPython 3.7 behavior: let OpenSSL consume the files via
# SSL_CTX_use_certificate_chain_file().

This comment has been minimized.

Copy link
@pitrou

pitrou Jun 29, 2017

Member

Why not read the files in Python and fall through below? This would allow deleting a bunch of C code.

This comment has been minimized.

Copy link
@jgehrcke

jgehrcke Jun 30, 2017

Author

It's intriguing. You are asking this now and I feel like "mhm, I think there was a pretty strong reason to not do it, because I tried hard, but what was the reason again?".

I am glad that I wrote a rather complete summary one year ago. It revived my memories... this is an excerpt from it:


[...]

An important note on backwards compatibility:

>> Antoine, are you suggesting that we remove the current c-level
>> capability to use file system files (using open()) and just go with
>> raw bytes data at the C api level, and then do the 'filename or
>> filelike object' in Python land?
>
> Yes, I think that's reasonable. Hopefully it won't break any existing uses.

I absolutely agree that Python code should be responsible for doing the
distinction between file paths and file objects. However, I do not agree with

> remove the current c-level capability to use file system files

After following the code I am rather sure that we should leave the current C
implementation (`_ssl__SSLContext_load_cert_chain_impl()` in Modules/_ssl.c)
as untouched as possible, and not try to integrate the new logic into that
very same function.

[...]

The central argument for having two separate C code paths is that old code
must proceed using OpenSSL's SSL_CTX_use_certificate_chain_file().

[...]

A bit of background on this can be found in the summary I linked to. I tried hard to understand the OpenSSL docs and the corresponding C code in the CPython repository, and tried hard to come up with what you were suggesting but I failed achieving the level of confidence that's required here. I was unable to achieve the desired level of confidence as of inherently bad OpenSSL API docs, and (this is what I vaguely remember) a few obvious mismatches between code and docs. In my summary I also wrote

When inspecting the OpenSSL code for SSL_CTX_use_certificate_chain_file(), some
of the above statements are not entirely correct.

(referring to statements from API docs)

So I could not rely on doc. Then I probably thought that the only way to gain confidence was via a proper test suite. In this particular case here, however, we probably agree that a test suite that actually matters does not really exist. It's comprised of all the applications in the world that use this code path. As of certificate diversity and pure combinatorics the input space into this function is super large.

From here it was rather easy to conclude for me that the old code path is better left intact, and that having a switch/shunt at a higher level is much safer in terms of backwards compatibility.

Merge remote-tracking branch 'origin/master' into address-issue-16487
Manually resolved conflicts in

	Lib/test/test_ssl.py
	Modules/clinic/_ssl.c.h
@jgehrcke

This comment has been minimized.

Copy link
Author

commented Oct 1, 2017

@pitrou @orsenthil @tiran I would very much appreciate to get another round of feedback. I am available for pushing this further, so that maybe we can still land this in 3.7. With all the information present across the issue tracker and this GH conv I want to stress again what's probably easy to miss: this is meant to be a conceptually backwards-compatible change, because the old code path remains intact.

Updated:

  • Responded more in-depth to review.
  • Merged current master into this branch; manually resolved conflicts (that was required especially after #3058 and #1955 landed recently).
  • Added code comments based on review feedback.
  • Built successfully on Fedora 26's gcc version 7.2.1 (new dereferencing pointer to incomplete type errors newly appeared compared to my work from a year ago).
  • Confirmed that ./python -m test -v test_ssl passes (one error message emitted by OpenSSL seems to have changed from OpenSSL 1.0.2 to 1.1.0, and I changed one assertRaisesRegex(SSLError, ...) back to simply assertRaises(SSLError))
  • Added NEWS entry.

jgehrcke added some commits Oct 1, 2017

@pitrou

This comment has been minimized.

Copy link
Member

commented Oct 2, 2017

I'm still surprised that this functionality needs so much support code, but I'll let @tiran review it.

@jgehrcke

This comment has been minimized.

Copy link
Author

commented Oct 2, 2017

I'm still surprised that this functionality needs so much support code, but I'll let @tiran review it.

Gotcha. At the core that's because there is unfortunately no such thing like SSL_CTX_use_certificate_chain_bio() provided by the OpenSSL API: a bio pendant corresponding to SSL_CTX_use_certificate_chain_file() would be super convenient.

That's what this patch adds and what's called _openssl_use_certificate_chain_from_bio().

Thanks for giving this another quick thought @pitrou, much appreciated.

Edit: I've tried to write this before, but I want to retry with more clarity: _openssl_use_certificate_chain_from_bio() is effectively static int use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file) straight from the OpenSSL source, just without BIO creation.

OpenSSL's use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file) expects a file path as input and then immediately performs BIO creation from a file (via BIO_read_filename(in, file)), and then processes this BIO with the "OpenSSL cert chain loading logic" (a term I will use again below).

In this patch, we create the BIO using the nice existing Python API for Memory BIO management (which was added in Python 3.5). That results in PySSLMemoryBIO objects that hold the OpenSSL-native bio structs internally.

The input to the _openssl_use_certificate_chain_from_bio() added by this patch is precisely the native OpenSSL Bio struct. The function then proceeds with the "OpenSSL cert chain loading logic" copied from OpenSSL' use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file).

This is why _openssl_use_certificate_chain_from_bio() is OpenSSL's use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file) modulo BIO creation.

That

  1. leaves the old use_certificate_chain_file()-based code path intact (for conceptually ensuring backwards compat).
  2. should conceptually ensure that the new code path has the same effects on a given SSLContext when using the same input data, just from memory instead of from file.

@brettcannon brettcannon requested a review from tiran Mar 27, 2019

@tiran
Copy link
Member

left a comment

I'm still -1 on this change. @reaperhulk and I are going to have another stab in PEP 543, which will provide a better way to load certs and keys. As Antoine pointed out, it's a lot of additional code. I don't want to maintain additional code and multiple APIs to load key and cert material.

@bedevere-bot

This comment has been minimized.

Copy link

commented Mar 28, 2019

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.