Create only one service for JupyterHub and make type a parameter #99

Closed
jlewi opened this issue Jan 6, 2018 · 8 comments

Comments

@jlewi
Contributor

jlewi commented Jan 6, 2018

We currently create two services for JupyterHub: one of type ClusterIP and one of type LoadBalancer.

   jupyterHubService: {
     "apiVersion": "v1",
     "kind": "Service",
     "metadata": {
       "labels": {
         "app": "tf-hub"
       },
       "name": "tf-hub-0",
       "namespace": namespace,
     },
     "spec": {
       "clusterIP": "None",
       "ports": [
         {
           "name": "hub",
           "port": 8000
         }
       ],
       "selector": {
         "app": "tf-hub"
       }
     }
   },

   jupyterHubLoadBalancer: {
     "apiVersion": "v1",
     "kind": "Service",
     "metadata": {
       "labels": {
         "app": "tf-hub"
       },
       "name": "tf-hub-lb",
       "namespace": namespace,
     },
     "spec": {
       "ports": [
         {
           "name": "http",
           "port": 80,
           "targetPort": 8000
         }
       ],
       "selector": {
         "app": "tf-hub"
       },
       "type": "LoadBalancer"
     }
   },

We should just create a single service and make the type a configurable parameter.

@foxish @yuvipanda Am I missing something? Is there any reason to have two services?

@jlewi
Contributor Author

jlewi commented Jan 7, 2018

It looks like the first service is a headless service that the individual Jupyter servers need in order to connect back to the hub, so it has to stay headless.

The second service creates an entry point from outside the cluster for users to connect to.

@yuvipanda
Contributor

While I haven't looked at the code yet (been swamped!), I suspect you need the headless service because the pods need to know where to find the hub, and by default this just uses the hub's hostname. Since you're using a StatefulSet, that would be tf-hub-0, which is why you need the extra headless service.

This is configurable - see the 4 hub_connect_* options used in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/images/hub/jupyterhub_config.py#L128 to get an idea of how to configure it. Setting those in your config should allow you to get rid of the headless service.
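As a rough sketch of what setting those options might involve, the snippet below collects the connection values from environment variables. The option names (`JupyterHub.hub_connect_ip`, `JupyterHub.hub_connect_port`, and the `KubeSpawner` equivalents) are an assumption based on the linked config and may differ between versions. The `HUB_SERVICE_HOST` and `HUB_SERVICE_PORT` variables assume a regular Service named `hub`, and the helper `hub_connect_settings` is hypothetical.

```python
import os


def hub_connect_settings(env=None):
    """Return the hub_connect_* values as a dict keyed by 'Class.option'.

    The option names are assumptions, not a confirmed API for any
    particular JupyterHub or KubeSpawner release.
    """
    env = os.environ if env is None else env
    # Kubernetes injects <NAME>_SERVICE_HOST / <NAME>_SERVICE_PORT for Services;
    # this assumes a Service called "hub" exists.
    host = env["HUB_SERVICE_HOST"]
    port = int(env["HUB_SERVICE_PORT"])
    return {
        "JupyterHub.hub_connect_ip": host,
        "JupyterHub.hub_connect_port": port,
        "KubeSpawner.hub_connect_ip": host,
        "KubeSpawner.hub_connect_port": port,
    }


# Inside jupyterhub_config.py, where `c` is the config object JupyterHub
# provides, the values could be applied along these lines:
#   for key, value in hub_connect_settings().items():
#       cls, option = key.split(".")
#       setattr(getattr(c, cls), option, value)
```

With these set, the spawned pods would be told an explicit address for the hub instead of relying on the hub's hostname.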

@jlewi
Contributor Author

jlewi commented Jan 7, 2018

Thanks. So my conclusion is that we do in fact need two services:

  • The first service is for internal use: the spawned pods use it to connect to the hub.
  • The second service is for users to connect to the hub.

@jlewi jlewi closed this as completed Jan 7, 2018
@yuvipanda
Contributor

yuvipanda commented Jan 7, 2018 via email

@jlewi
Contributor Author

jlewi commented Jan 8, 2018

I tried it and ran into problems. Specifically I tried having a single tf-hub-0 service of type ClusterIP.

I ran into an error like #78 with the Jupyter pods not being able to connect to the hub. My guess is that there is something about headless services and how these pods connect to the hub.

I didn't bother to investigate because it seems much cleaner to have 2 services.

Users will want to configure how they connect to JupyterHub: for example, they might want to use an external load balancer, an ingress, or kubectl proxy. I can't think of a reason why external access to JupyterHub should affect how the pods connect to the hub. That just seems like it's creating more ways for things to go wrong.

@yuvipanda
Contributor

yuvipanda commented Jan 8, 2018 via email

@jlewi
Contributor Author

jlewi commented Jan 8, 2018

Thanks for the input. I think we'll stick with two.

With two services, the pods always use the headless service to connect to the hub, so we don't have to worry about the configuration of ingress to JupyterHub interfering with pod-to-hub communication.
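To make the split concrete, here is a minimal sketch in Python (rather than the project's Jsonnet) that keeps the internal headless service fixed and exposes only the external service's type as a parameter. The function name `build_hub_services` and the `kubeflow` namespace in the usage lines are hypothetical; the names, labels, and ports are copied from the two services quoted at the top of this issue.

```python
import json


def build_hub_services(namespace, external_type="LoadBalancer"):
    """Return (internal, external) Service manifests for the hub as dicts."""
    labels = {"app": "tf-hub"}
    # Internal service: always headless, used by the spawned pods.
    internal = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "tf-hub-0", "namespace": namespace, "labels": dict(labels)},
        "spec": {
            "clusterIP": "None",
            "ports": [{"name": "hub", "port": 8000}],
            "selector": dict(labels),
        },
    }
    # External service: the only part that varies with how users connect.
    external = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "tf-hub-lb", "namespace": namespace, "labels": dict(labels)},
        "spec": {
            "type": external_type,
            "ports": [{"name": "http", "port": 80, "targetPort": 8000}],
            "selector": dict(labels),
        },
    }
    return internal, external


if __name__ == "__main__":
    # Example: expose the hub as a plain ClusterIP service (e.g. behind an ingress).
    internal, external = build_hub_services("kubeflow", external_type="ClusterIP")
    print(json.dumps(external, indent=2))
```

Changing `external_type` never touches the internal manifest, which is the property argued for above.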

@foxish
Contributor

foxish commented Jan 8, 2018

I ran into an error like #78 with the Jupyter pods not being able to connect to the hub. My guess is that there is something about headless services and how these pods connect to the hub.

We could actually unify them. The headless service differs in the kind of DNS name it imparts to the pod, which I think is why it wasn't interoperable when you tried it. But I agree that they serve different purposes: one is essential for JupyterHub to connect back and creates a DNS entry, the other exposes JupyterHub outside the cluster. So I'm also in favor of keeping both of them around, at least for now.
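A small sketch of the DNS difference, assuming the default `cluster.local` cluster domain: a service's DNS name has the same form either way, but a headless service (`clusterIP: None`) resolves to the IPs of the selected pods, while a regular service resolves to a single virtual cluster IP. The helper names are hypothetical, and this does not claim to pin down the exact cause of the error seen above.

```python
def is_headless(service):
    """A Service is headless when spec.clusterIP is the literal string 'None'."""
    return service.get("spec", {}).get("clusterIP") == "None"


def service_dns_name(service, cluster_domain="cluster.local"):
    """Build <name>.<namespace>.svc.<cluster domain> for a Service manifest."""
    meta = service["metadata"]
    return "{}.{}.svc.{}".format(meta["name"], meta["namespace"], cluster_domain)


def describe_resolution(service):
    """Summarize what the Service's DNS name resolves to."""
    if is_headless(service):
        target = "the IPs of the selected pods"
    else:
        target = "a single virtual cluster IP"
    return "{} resolves to {}".format(service_dns_name(service), target)


# Example manifests, reduced to the fields that matter here.
headless = {"metadata": {"name": "tf-hub-0", "namespace": "default"},
            "spec": {"clusterIP": "None"}}
regular = {"metadata": {"name": "tf-hub-lb", "namespace": "default"},
           "spec": {"type": "LoadBalancer"}}

print(describe_resolution(headless))
print(describe_resolution(regular))
```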
