Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End-to-end SSL #2055

Merged
merged 71 commits into from
Oct 17, 2018
Merged

End-to-end SSL #2055

merged 71 commits into from
Oct 17, 2018

Conversation

tgmachina
Copy link
Contributor

@tgmachina tgmachina commented Jul 23, 2018

These changes are in service of turning on SSL internally for JupyterHub (issue #370).

To generate certs, I created a separate module Certipy that wraps pyOpenSsl. Certipy has also been uploaded to PyPI.

To test these changes, I added 3 separate test files that turn on internalSSL via monkey patching. Tests intermittently fail when run together, but pass individually. @minrk had initially suggested making all requests to the hub public_url, but I wasn't able to work out a good scheme for that. I set up the 3 separate test files to get these changes through review ASAP, but I'd be happy to iterate on changing these tests.

These changes must be used in conjunction with changes made to CHP in order to work (likely automated github testing will fail because of this). I wasn't entirely sure how to specify a certain CHP version as a requirement.

Happy to iterate on whatever changes you see necessary. Thanks for your help and patience!

Add Localhost to trusted alt names

Update to match refactored certipy names

Add the FQDN to cert alt names for hub

Ensure notebooks do not trust each other

Drop certs in user's home directory

Refactor cert creation and movement

Make alt names configurable

Make attaching alt names more generic

Setup ssl_context for the singleuser hub check
Import socket when needed

Move pwd import since more than one thing uses it.
Setup general ssl request, not just to api

Basic tests comprised of non-ssl test copies

Create the context only when request is http

Refactor ssl key, cert, ca names

Configure the AsyncHTTPClient at app start

Change tests to import existing ones with ssl on

Override __new__ in MockHub to turn on SSL
Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

I need to figure out a bit how the cert-staging step is going to work for container-based deployments, but I suspect separating .create_certs and .move_certs is already most of what's needed.

We'll need to make the API for .move_certs very clear, since most Spawners will need to reimplement it. As I understand it:

  • key_pair is the dict of keyfile, certfile, cafile paths on the Hub filesystem
  • return value is (keyfile, certfile, cafile) paths on the notebook server filesystem

e.g. on Docker this could mean creating a docker volume and staging the files into it. On Kubernetes, it could mean creating a Secret.

It would be good to document exactly what files are created, shared, etc. and when, so we know exactly what files are needed by which processes and when, and which are recreated. One important use case is the proxy being in a separate pod/container from the Hub, so we need to make sure that we can create the certs the proxy needs prior to hub start.

if self.internal_ssl:
key, cert, ca = self.move_certs(self.create_certs())

args.append('--keyfile="%s"' % key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this in env instead of args. It's easier to pass things in environment variables with certain deployments than CLI args, and makes things more flexible for alternate entrypoints.

except KeyError:
self.log.debug("User {} not found on system, not moving certs".format(self.user.name))

return [key, cert, ca]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe return the dict instead of tuple, so it is clearer what's what. That will also make it easier to preserve backward-compatibility in the future.



def upgrade():
op.add_column('servers', sa.Column('certfile', sa.Unicode(4096)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to persist values in the database if these are not predictable per-server. If these are the same (or even deterministic) for all servers, as I think they are, we can leave them out of the database and have them live only in memory.

@@ -1641,6 +1709,52 @@ def write_pid_file(self):
cfg = self.config.copy()
cfg.JupyterHub.merge(cfg.JupyterHubApp)
self.update_config(cfg)
if self.internal_ssl:
from certipy import Certipy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put this in a .init_internal_ssl method for slightly easier organization

home = user.pw_dir

# Create dir for user's certs wherever we're starting
out_dir = "{home}/.jupyter/certs".format(home=home)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this .jupyterhub/jupyterhub-certs just to be clear

@tgmachina
Copy link
Contributor Author

For the case where the proxy lives separately from the hub, would it be worthwhile to enable a command like jupyterhub --generate-config except to set up the initial certificate authorities/certs that internal_ssl uses, for example, something like: jupyterhub --generate-certs? That would allow users to get and move the certs they need manually.

As for your assessment of .move_certs, that is correct. I have to think about the container case a bit more too, because its also important to get the appropriate domain/ip info into the cert in order for ssl verification to work. That capability exists with the use of alt_names at certificate creation, but I am unsure if the info is available or known before container start. I'll think about that some more and work on clarifying this API, good call.

I'm working through these changes now. I think the only thing I'm curious about are the docs. I was thinking of expanding the docstrings for .create_certs and .move_certs as well as including more notes within the proxy.md, spawner.md and websecurity.md docs. Seem reasonable?

@tgmachina tgmachina changed the title End to end ssl End-to-end SSL Jul 27, 2018
@minrk
Copy link
Member

minrk commented Aug 1, 2018

something like: jupyterhub --generate-certs?

That's worth a try. I'm not 100% sure what will work best.

because its also important to get the appropriate domain/ip info into the cert in order for ssl verification to work

oooh, that makes things more difficult because it might be hard/impossible to know this before the server starts in some cases. I'm not quite sure how to deal with that.

I was thinking of expanding the docstrings for .create_certs and .move_certs as well as including more notes within the proxy.md, spawner.md and websecurity.md docs. Seem reasonable?

Yes, absolutely!

@tgmachina
Copy link
Contributor Author

Ok, just as an update, here's what I'm adding/changing/pondering:

  • Nearly done with updating the docs, just pending the changes below.
  • The Proxy class will have reference to its own certs either:
    • Created and signed on Proxy init using the Hub CA (basically how its done now).
    • Created by some third-party mechanism (Let's Encrypt, Certipy, self-signed, etc) for which the signing CA file is specified in the proxy config so that trust can be propagated appropriately.
  • Docker/Kube cases w.r.t internal_ssl:
    • Swarm/Kube managed deployments: as I understand them, the networking can be set such that containers are connected with tls. If containers/pods running the single-user notebooks can't reach each other, this seems like it would have about the same effect as using internal_ssl.
    • Docker does have a mechanism for specifying an ip address prior to launch. This would entail drawing a random, unused ip from Docker's reserved subnet for container ips. I can investigate what enabling that would entail.

To better accommodate external certificate management
as well as building of trust, Certipy was refactored.
This included general improvements to file and
record handling. In the process, some of Certipy's
APIs changed slightly, but should be more stable now
going forward.
This is used to be able to access JupyterHub's CA
information and (manually) move it to components
that need them (like externally managed proxies).
This reverts commit bcebf0e.

Setting change-origin introduces CORS problems
can have consequences if args are re-used
avoids leaving lingering proxy if app fails to start
must be module-scoped, not session-scoped, or it will get reused inconsistently
avoids possible conflict e.g. if a user had the name 'hub-internal'
- trust subdomain_host by default
- JupyterHub.trusted_alt_names is inherited by Spawners by default. Do we need Spawner.ssl_alt_names to be separately configurable?
to cover any protocol mismatches
@minrk
Copy link
Member

minrk commented Oct 16, 2018

Okay, after resolving the subdomain issues with trusted_alt_names, I believe this is totally working. I just need to update the tests with real-database backends, because test_api can't actually be run multiple times against the same database.

@minrk minrk merged commit 2d94b29 into jupyterhub:master Oct 17, 2018
@minrk
Copy link
Member

minrk commented Oct 17, 2018

Woo, it works!

@consideRatio
Copy link
Member

Wooooooooooooo wow this PR! Massive work done here! I've been keeping an eye on it, well done @tgmachina and @minrk !!! 🎉

@tgmachina
Copy link
Contributor Author

Wow! Thanks so much for all your help @minrk!

@tgamblin
Copy link

This is awesome! Thanks @tgmachina and @minrk!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants