bump r2d version on prod #47

Merged
6 commits merged on May 1, 2019

Conversation

@jhamman (Member) commented on Apr 16, 2019

No description provided.

@jhamman (Member, Author) commented on Apr 23, 2019

Though the deployment CI passed, there seems to be a problem with staging:

~  kubectl get pods --namespace staging
NAME                                                   READY   STATUS              RESTARTS   AGE
binder-669bf6fd6d-md2vr                                1/1     Running             0          7d
hub-7f46cdccdc-zr6n6                                   1/1     Running             0          7d
proxy-6b86d4679b-6l8hw                                 1/1     Running             0          7d
staging-dind-jw589                                     1/1     Running             0          1h
staging-dind-jxx6b                                     0/1     CrashLoopBackOff    2013       7d
staging-dind-ldxhb                                     1/1     Running             0          1h
staging-dind-pdch9                                     1/1     Running             0          1h
staging-dind-qp42x                                     1/1     Running             0          1h
staging-dind-qpzj9                                     1/1     Running             0          1h
staging-dind-wbjnh                                     1/1     Running             0          1h
staging-grafana-68cc4f9db7-7lqb9                       1/1     Running             0          7d
staging-image-cleaner-259sv                            0/1     ContainerCreating   0          7d
staging-image-cleaner-96mwp                            1/1     Running             0          1h
staging-image-cleaner-cjpml                            1/1     Running             0          1h
staging-image-cleaner-g7ktn                            1/1     Running             0          1h
staging-image-cleaner-mrt54                            1/1     Running             0          1h
staging-image-cleaner-tnnp6                            1/1     Running             0          1h
staging-image-cleaner-x57pc                            1/1     Running             0          1h
staging-kube-lego-8448dcd848-xltxg                     1/1     Running             0          7d
staging-nginx-ingress-controller-8469cbdc78-bc77x      1/1     Running             0          7d
staging-nginx-ingress-controller-8469cbdc78-vl6n2      1/1     Running             0          7d
staging-nginx-ingress-default-backend-69d96f4b-7jp6w   1/1     Running             0          7d
~  kubectl logs --namespace staging staging-dind-jxx6b
time="2019-04-23T20:09:29.121108491Z" level=warning msg="could not change group /var/run/dind/docker.sock to docker: group docker not found"
Failed to load listeners: can't create unix socket /var/run/dind/docker.sock: is a directory

This seems to be entirely related to our recent addition of dind to the chart. I suspect @yuvipanda will know what this means but I'm at a loss.

@yuvipanda (Member) commented:

Are your prod and staging deployments in the same cluster? If so, the DIND daemonsets for both deployments are racing for access to /var/run/dind on all your nodes, causing failures like this.

The solution would be to make the host dind path something like /var/run/dind/{{Release.Name}}/docker.sock.
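A minimal values sketch of that idea (assuming the chart's dind.hostSocketDir setting that the diff in this PR sets under binderhub:; the paths are illustrative):

```yaml
# Sketch: give each deployment its own host socket directory so the staging and
# prod dind daemonsets on shared nodes stop colliding on /var/run/dind.
binderhub:
  dind:
    hostSocketDir: /var/run/dind/staging   # prod would use /var/run/dind/prod
```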

@jhamman (Member, Author) commented on Apr 30, 2019

@yuvipanda - yes, both prod and staging are in the same GKE cluster (different namespaces). Is this not how it's done with mybinder.org?

Can you, by chance, point to an example where the host dind path is set? Googling isn't getting me anywhere.

@jhamman (Member, Author) commented on Apr 30, 2019

I'm also wondering if this namespace isolation of dind shouldn't happen in the binderhub chart? I'm thinking it would be best set here: https://github.com/jupyterhub/binderhub/blob/master/helm-chart/binderhub/templates/dind/daemonset.yaml
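As a rough illustration only (a hypothetical template fragment, not the actual BinderHub daemonset.yaml), the chart could scope the hostPath by release so parallel deployments never share a socket directory:

```yaml
# Hypothetical sketch of a release-scoped hostPath volume in a dind daemonset template.
volumes:
  - name: dind-socket
    hostPath:
      path: /var/run/dind/{{ .Release.Name }}
      type: DirectoryOrCreate
```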

@yuvipanda (Member) commented:

https://github.com/jupyterhub/binderhub/blob/master/helm-chart/binderhub/templates/dind/daemonset.yaml#L34 should let you do it, I think. On mybinder.org we have separate clusters for staging and prod.

@jhamman (Member, Author) commented on May 1, 2019

Thanks @yuvipanda - seems to be working on staging, giving it a try on prod.

@jhamman merged commit 4689924 into prod on May 1, 2019
@@ -16,6 +16,9 @@ binderhub:
     hosts:
       - binder.pangeo.io
+
+  dind:
+    hostSocketDir: /var/run/dind/prod/docker.sock
A Member commented on the diff:

This should be /var/run/dind/prod, without the docker.sock, since it is only specifying the socket directory and not the socket itself.
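In other words, the corrected line in the prod config (a sketch of the diff above after this fix) would read:

```yaml
dind:
  hostSocketDir: /var/run/dind/prod   # directory that will hold docker.sock, not the socket path itself
```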

@yuvipanda (Member) commented:

I think you also need to set dind.hostLibDir to /var/lib/dind/staging or /var/lib/dind/prod. Sorry I missed this last time.
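Pulling these suggestions together, a per-environment sketch of the dind block might look like this (prod shown; staging would use the matching /var/run/dind/staging and /var/lib/dind/staging paths):

```yaml
binderhub:
  dind:
    hostSocketDir: /var/run/dind/prod   # per-release socket directory on each node
    hostLibDir: /var/lib/dind/prod      # per-release docker storage on each node
```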

@yuvipanda (Member) commented:

I think you might need to ssh to your nodes and rm -rf the current contents of /var/run/dind before merging the changes, since you'll already have a directory created at /var/run/dind/staging/docker.sock, so a socket can't be made there.
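A rough sketch of that cleanup step (assumes the GKE nodes are reachable with gcloud compute ssh; --zone/--project flags may be needed, and the path is worth double-checking before running rm -rf):

```sh
# Clear the stale contents of /var/run/dind on every node so the dind daemonsets
# can create fresh sockets under the new per-release directories.
for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  gcloud compute ssh "$node" --command="sudo rm -rf /var/run/dind/*"
done
```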
