-
Notifications
You must be signed in to change notification settings - Fork 791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use DNS names instead of IPv4 addresses to be IPv6 friendly #1643
Conversation
This change also gets rid of some deprecated JupyterHub configuration settings which cause warnings during startup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping to push the project forward!
Switching to DNS sounds reasonable but see what everyone else thinks first. I suspect the use of environment variables is due to the age of this project.
Would you mind writing a bit about how you expect this to behave on an IPV4-only cluster?
Okay, CI stills fails because Tornados httpserver does special things here: https://github.com/tornadoweb/tornado/blob/master/tornado/netutil.py#L144 |
Okay, CI is now green :) kind has support for IPv6. Is there some interest to extend the CI pipeline with a test case on an IPv6 Kubernetes cluster? |
More tests are always good 🙂 Perhaps you could combine it with the Helm 3 test to avoid adding another matrix job? zero-to-jupyterhub-k8s/.travis.yml Line 64 in 6f7505f
@consideRatio thoughts on testing this, and also on whether to replace the env-vars with K8s DNS? |
I choose to use env variables to link rather than DNS because I had not installed kube-dns in the hand-build wikimedia cluster I started this codebase on :D So @manics is right about the age... |
My take on testing
|
I'm planning to update the Helm 3 docs for Helm 3.2.0 (includes a |
@yuvipanda I switched the service discovery to DNS now. @consideRatio Thanks for the review. I addressed your comments in my last commit @manics IPv6 in Travis proofs to be difficult as Travis does not support IPv6 in the VM based runners: https://docs.travis-ci.com/user/reference/overview/#virtualisation-environment-vs-operating-system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some code suggestions to stop assuming ports used by various k8s services as hardcoded.
@@ -74,11 +73,18 @@ def camelCaseify(s): | |||
cfg_key = camelCaseify(trait) | |||
set_config_if_not_none(c.JupyterHub, trait, 'hub.' + cfg_key) | |||
|
|||
c.JupyterHub.ip = os.environ['PROXY_PUBLIC_SERVICE_HOST'] | |||
c.JupyterHub.port = int(os.environ['PROXY_PUBLIC_SERVICE_PORT']) | |||
c.JupyterHub.bind_url = f'http://proxy-public:{os.environ['PROXY_PUBLIC_SERVICE_PORT']}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@minrk can you help me understand this with the JupyterHub docs?
Is it correct, that since we don't let Z2JH's JupyterHub process create the CHP
proxy as a separate process, the configuration of bind_url is pointless in z2jh?
-
bind_url
## The public facing URL of the whole JupyterHub application. # # This is the address on which the proxy will bind. Sets protocol, ip, base_url # Default: 'http://:8000' # c.JupyterHub.bind_url = 'http://:8000'
-
hub_connect_url
## The URL for connecting to the Hub. Spawners, services, and the proxy will use # this URL to talk to the Hub. # # Only needs to be specified if the default hub URL is not connectable (e.g. # using a unix+http:// bind url). # # .. seealso:: # JupyterHub.hub_connect_ip # JupyterHub.hub_bind_url # # .. versionadded:: 0.9 # Default: '' # c.JupyterHub.hub_connect_url = ''
-
hub_bind_url
## The URL on which the Hub will listen. This is a private URL for internal # communication. Typically set in combination with hub_connect_url. If a unix # socket, hub_connect_url **must** also be set. # # For example: # # "http://127.0.0.1:8081" # "unix+http://%2Fsrv%2Fjupyterhub%2Fjupyterhub.sock" # # .. versionadded:: 0.9 # Default: '' # c.JupyterHub.hub_bind_url = ''
if ipv6_enabled_cluster | ||
c.JupyterHub.hub_bind_url = f'http://[::]:{hub_container_port}' | ||
else: | ||
c.JupyterHub.hub_bind_url = f'http://0.0.0.0:{hub_container_port}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@minrk @stv0g I was considering this limitation about only being able to listen either on IPv4 or IPv6? I could not find any clear confirmation on this.
Perhaps though JupyterHub ends up limiting us even though Tornado's httpserver does. What JupyterHub ends up doing to actively listen is this call here, but before this call, some processing of the configuration is made.
I wonder if it would work to set this to fhttp://:{hub_container_port}
instead, as my understanding is that tornado would support both IPv4 or IPv6 if a blank string or None value was used for the IP.
Hmmm... But looking more in depth, i see that there seem to be three ways to start Tornado, and we seem to follow the "single process" pattern. See: https://www.tornadoweb.org/en/stable/httpserver.html#http-server.
I find that the listen
function we use is documented here, in the TCPServer class which HTTPServer is derived from.
The interesting part is defined in the bind
function documentation.
Address may be an empty string or None to listen on all available interfaces. Family may be set to either socket.AF_INET or socket.AF_INET6 to restrict to IPv4 or IPv6 addresses, otherwise both will be used if available.
It sounds to me, that as long as we pass an empty string when calling listen
, we should be able to listen to both IPv4 and IPv6 - could this be correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use http://:8081
and that worked well for IPv4, but based on the Tornado documentation I think it will also work for IPv6 as well.
A discussion for this commit can be found in jupyterhub#1643 (comment), but in short it may allow for the Tornado webserver to listen both on IPv4 and IPv6 traffic instead of only one or the other.
PR Summary by @consideRatio
*_SERVICE_HOST
environment variables were replaced with the DNS names of the services.hub_connect_ip
andhub_connect_port
were combined intohub_connect_url
(how to reach the hub from other pods)hub_ip
were replaced withhub_bind_url
(what ports the JupyterHub process listen to within the container)ip
and andport
were not combined tobind_url
but instead removed, as this is configuration influencing a CHP proxy started up by JupyterHub, which we don't do at all.Original PR text
This PR lets JupyterHub use an IPv6 compatible configuration and allows us to run z2jh on an IPv6 only Kubernetes cluster.
I am not really convinced by the use of the old "Docker-style" link environment variables.
From my understanding best practice for service discovery in Kubernetes is by using DNS.
Is there any reason why DNS is not being used by z2jh?
If there are no objections, I will update my PR to rely on DNS..
This would also have avoided the problems we now encounter with IPv6 compatability.
This PR also eliminates warnings about deprecated JupyterHub configuration settings during startup. Most of the
*_port
and*_ip
settings have been replaced by an*_url
variant.This also closes #1624