Invalid certificate for binderhub #21

Closed
ltetrel opened this issue Nov 11, 2019 · 28 comments
Labels: bug, help wanted, Priority: HIGH

Comments

@ltetrel
Member

ltetrel commented Nov 11, 2019

Sometimes, after destroying and re-creating an instance a few times, we get a certificate error on the BinderHub instance. As a result, we cannot use HTTPS.

@ltetrel added the bug, help wanted, and Priority: HIGH labels Nov 11, 2019
@ltetrel
Member Author

ltetrel commented Nov 11, 2019

Related to this issue:
jupyterhub/binderhub#284

@ltetrel
Member Author

ltetrel commented Nov 14, 2019

This seems to be related to the ACMEv1 end-of-life plan, as reported by pod kube-lego-kube-lego-75964c7d5b-5257k:
https://community.letsencrypt.org/t/end-of-life-plan-for-acmev1/88430

@ltetrel
Member Author

ltetrel commented Nov 14, 2019

This issue prevents BinderHub from creating new users:
[E 191114 00:49:44 launcher:76] Error accessing Hub API (using https://binder-wksh2.conp.cloud/jupyter/hub/api/users/zhangyu2ustc-gcn_tutorial_test-cmfssc2o): HTTP 599: SSL certificate problem: unable to get local issuer certificate
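
The `unable to get local issuer certificate` message means the client could not build a chain from the served certificate to a trusted root. Below is a minimal sketch for reproducing that check from any machine with the Python standard library. The function names are illustrative and not part of the deployment scripts.

```python
import socket
import ssl


def classify_tls_error(exc: Exception) -> str:
    """Map a connection exception to a short, human-readable label."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        # verify_message is only set on errors raised by a real handshake.
        return "untrusted certificate: " + str(getattr(exc, "verify_message", exc))
    if isinstance(exc, ssl.SSLError):
        return "TLS error: " + str(exc)
    return "connection error: " + str(exc)


def check_certificate(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and report the outcome."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # The issuer is a tuple of RDNs, each holding (key, value) pairs.
                issuer = dict(item[0] for item in tls.getpeercert()["issuer"])
                return "ok, issued by " + issuer.get("organizationName", "unknown")
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return classify_tls_error(exc)
```

Calling `check_certificate("binder-wksh2.conp.cloud")` while the problem is present should return an `untrusted certificate: ...` label rather than an issuer name.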

@ltetrel
Member Author

ltetrel commented Nov 14, 2019

Important documentation on Let's Encrypt rate limits, which could explain our certificate issue:
https://letsencrypt.org/docs/rate-limits/
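
To see how repeated destroy/create cycles can run into such a limit, here is a small sliding-window sketch. The default of 5 issuances per 7 days for the same set of hostnames is an assumption for illustration; the linked page is the authoritative source for the actual values.

```python
from datetime import datetime, timedelta


def can_issue(previous: list[datetime], now: datetime,
              limit: int = 5, window: timedelta = timedelta(days=7)) -> bool:
    """Return True if one more issuance stays within `limit` per `window`.

    `limit` and `window` are assumed values, not read from Let's Encrypt.
    """
    # Only issuances that are still inside the window count against the limit.
    recent = [t for t in previous if now - t < window]
    return len(recent) < limit
```

With one certificate requested per rebuild, five rebuilds inside the window would already exhaust this assumed limit.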

@ltetrel
Member Author

ltetrel commented Nov 14, 2019

What I have used so far to replace kube-lego with cert-manager:
https://docs.cert-manager.io/en/latest/tutorials/acme/quick-start/index.html

@ltetrel
Member Author

ltetrel commented Nov 14, 2019

The external IP of the NGINX Ingress has been pending for hours.
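
A pending external IP shows up as an empty `status.loadBalancer` in the Service object. Here is a minimal sketch for checking it from the output of `kubectl get service <name> -o json`. The sample JSON is hand-written for illustration, not captured from the cluster.

```python
import json


def external_addresses(service: dict) -> list[str]:
    """Return the addresses assigned to a Service, or [] while it is pending."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    # Cloud providers report either an IP or a hostname for each entry.
    return [entry.get("ip") or entry.get("hostname") for entry in ingress]


# Hand-written sample of a LoadBalancer Service that never got an address.
pending = json.loads('{"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}')
print(external_addresses(pending))  # prints []
```

An empty list that persists suggests that no load balancer implementation is answering the request.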

@ltetrel
Member Author

ltetrel commented Dec 9, 2019

@anibalsolon gave me some details, thanks to him! He will try following this tutorial:
https://medium.com/@Amet13/wildcard-k8s-4998173b16c8

@ltetrel
Member Author

ltetrel commented Jan 7, 2020

Here is the reply I got from Darren Boss (Compute Canada). Basically, he is saying that the LoadBalancer service type cannot work on Arbutus instances:

Loic Tetrel,

While we have instances of Kubernetes running on our clouds, there are a few Kubernetes integrations with the OpenStack cloud driver which are not currently supported by our clouds, and you have run into one of them. Our clouds do not support the LoadBalancer type for services because we don't have a load balancer API and implementation available. If you take a look at the default values used for the nginx-ingress controller, the Nginx reverse proxy uses a service that requests type: LoadBalancer. See line 259 in https://github.com/helm/charts/blob/master/stable/nginx-ingress/values.yaml.

What you can do is download this values.yaml file and change LoadBalancer to NodePort. Instead of running an IP load balancer, 2 high-numbered ports will be forwarded to the nginx proxy. To get cert-manager to work, you will have to set up an external load balancer yourself on another VM instance, which can be done using HAProxy or another Nginx reverse proxy. That VM will listen on ports 80/443 and connect to the k8s nodes on the high ports that are used in the NodePort service. There is a -f option to helm to load in the values from this values.yaml file, which will override the default values.

Hopefully this is enough information to get you going.
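
The override described in the reply can be sketched as a small generator. It assumes the `controller.service.type` and `controller.service.nodePorts` keys of the stable/nginx-ingress chart, and the port numbers 30080 and 30443 are arbitrary examples. Since JSON is valid YAML, the printed output can be saved to a file and passed to helm with `-f`.

```python
import json


def nodeport_values(http_port: int, https_port: int) -> dict:
    """Build a values override switching the controller Service to NodePort."""
    for port in (http_port, https_port):
        # Default Kubernetes NodePort range; a cluster may configure another.
        if not 30000 <= port <= 32767:
            raise ValueError(f"{port} is outside the default NodePort range")
    return {
        "controller": {
            "service": {
                "type": "NodePort",
                "nodePorts": {"http": http_port, "https": https_port},
            }
        }
    }


if __name__ == "__main__":
    # Example ports only; pick the ones the external proxy will forward to.
    print(json.dumps(nodeport_values(30080, 30443), indent=2))
```

Fixing the node ports in the override, rather than letting Kubernetes pick them, keeps the external proxy configuration stable across redeployments.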

@anibalsolon
Member

Interesting.

So, from my understanding, it is not possible from within Kubernetes? Even with an nginx load balancer?

@agahkarakuzu
Collaborator

@anibalsolon that's kinda the solution: deploying an external load balancer and placing it in front of the k8s BinderHub cluster. We just cannot describe this in config files; it won't work out of the box because the OpenStack API available to CC is missing the functionality.
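
As a rough illustration of what such an external load balancer could look like, the sketch below renders an HAProxy configuration that listens on 80/443 and forwards plain TCP to the NodePorts on each k8s node. The node IPs, port numbers and timeouts in the usage note are hypothetical; TCP mode is chosen here so that TLS still terminates at the ingress inside the cluster.

```python
def render_haproxy(nodes: list[str], http_port: int, https_port: int) -> str:
    """Render TCP-mode frontends on 80/443 forwarding to the given NodePorts."""
    lines = [
        "defaults",
        "    mode tcp",
        "    timeout connect 5s",
        "    timeout client 50s",
        "    timeout server 50s",
        "",
    ]
    for name, listen, target in (("http", 80, http_port), ("https", 443, https_port)):
        lines += [
            f"frontend {name}_in",
            f"    bind *:{listen}",
            f"    default_backend k8s_{name}",
            "",
            f"backend k8s_{name}",
        ]
        # One server line per k8s node, health-checked by HAProxy.
        lines += [f"    server node{i} {ip}:{target} check"
                  for i, ip in enumerate(nodes, start=1)]
        lines.append("")
    return "\n".join(lines)
```

For example, `print(render_haproxy(["10.0.0.1", "10.0.0.2"], 30080, 30443))` produces a configuration for two nodes.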

@agahkarakuzu
Collaborator

agahkarakuzu commented Jan 7, 2020

If I remember correctly, we also need to migrate from kube-lego to cert-manager.

Yeah, @ltetrel already gave this a shot #21 (comment).

@ltetrel
Member Author

ltetrel commented Feb 21, 2020

A useful resource that helped me debug k8s networking: https://www.digitalocean.com/community/tutorials/how-to-inspect-kubernetes-networking

@ltetrel
Member Author

ltetrel commented Feb 25, 2020

An important reference if we don't have a load balancer available:
https://kubernetes.github.io/ingress-nginx/deploy/baremetal/

@agahkarakuzu
Collaborator

I gave MetalLB a brief try after I saw it on one of the Gitter threads. I could not get it running on the first try, but it would be a useful tool if CC is going to be the only option available. There was one more alternative to that; I'll check my notes.

@ltetrel
Member Author

ltetrel commented Feb 25, 2020

On my side, I read about and experimented intensively with load balancing on k8s last week. I had a lot of exchanges with Darren from Compute Canada, trying to debug the current configuration.

@agahkarakuzu
Collaborator

On Arbutus or on the new OpenStack? Are you trying to debug the Helm Chart or the previous installation?

P.S. I installed MetalLB using Helm (https://github.com/helm/charts/tree/master/stable/metallb) instead of using the manifest, trying to avoid direct k8s interaction as much as possible. If you are using Helm, you can just add it as a dependency and it'll bring that up.

@ltetrel
Member Author

ltetrel commented Feb 25, 2020

On Arbutus.

@ltetrel
Member Author

ltetrel commented Feb 27, 2020

Some additional clues on why cert-manager is not working:
cert-manager/cert-manager#2319 (comment)

@ltetrel
Member Author

ltetrel commented Mar 5, 2020

Issue when trying to complete the HTTP-01 challenge from cert-manager:

"error"="acme: authorization error for binder-dev.conp.cloud: 400 urn:ietf:params:acme:error:connection: During secondary validation: Fetching http://binder-dev.conp.cloud/.well-known/acme-challenge/IESIcs2PBiHLuNsHd7gFLRkxa-5o8vpIecL-MDMDHyg: Connection refused
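
For an HTTP-01 challenge, the ACME server fetches a token over plain HTTP under `/.well-known/acme-challenge/`, so a `Connection refused` there usually points at port 80 reachability rather than at the certificate itself. Here is a minimal sketch for probing that URL from outside the cluster; the helper names are illustrative. Because the error mentions secondary validation, a successful probe from one machine does not guarantee that every validation vantage point can connect.

```python
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the URL an ACME server fetches for an HTTP-01 challenge."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch the URL and summarize the result as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a success status.
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, connection refused, timeout, and similar.
        return f"unreachable: {exc}"
```

Any `HTTP ...` result means that something is listening on port 80, whereas `unreachable: ...` matches the kind of failure reported in the log above.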

I think this is due to some Cloudflare protections, because I made a huge number of HTTP requests these past weeks (every time I create a BinderHub infrastructure).

@agahkarakuzu
Collaborator

agahkarakuzu commented Mar 5, 2020

Arbutus? I was wondering if you gave it a try with the Helm chart (https://github.com/agahkarakuzu/neurolibre-helm) with your new settings?

😆 I would not have imagined that these attempts would arouse DDoS suspicion on Cloudflare's end. Crazy.

@ltetrel
Member Author

ltetrel commented Mar 5, 2020

Do you have other suggestions as to why this error occurs?

@agahkarakuzu
Collaborator

I don’t think that I can have any with my questions unanswered.

I did not run into any of these issues with the Helm chart I used.

@ltetrel
Member Author

ltetrel commented Mar 10, 2020

So this issue is finally resolved.
There were issues with Let's Encrypt rate limits (so I am using the staging environment), Cloudflare bandwidth limits (I just waited a few days to let Cloudflare rest!), cert-manager web requests (timing and k8s networking issues), and not using a load balancer (using the ClusterIP ingress type instead of NodePort or LoadBalancer).
And of course, some configuration files were not correct. You can check the working scripts, mostly here.

I updated the instructions accordingly.
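
For readers who want to reproduce the staging setup, here is a hedged sketch that builds a cert-manager ClusterIssuer pointing at either Let's Encrypt environment. This is not the repository's actual configuration: the issuer name, the email argument and the `nginx` ingress class are assumptions, and `cert-manager.io/v1` is the current API version, whereas older releases such as those in use at the time of this thread used earlier versions like `cert-manager.io/v1alpha2`.

```python
import json

# Public ACME v2 directory URLs for Let's Encrypt.
ACME_DIRECTORIES = {
    "staging": "https://acme-staging-v02.api.letsencrypt.org/directory",
    "production": "https://acme-v02.api.letsencrypt.org/directory",
}


def cluster_issuer(environment: str, email: str, ingress_class: str = "nginx") -> dict:
    """Build a ClusterIssuer manifest using the HTTP-01 solver."""
    name = f"letsencrypt-{environment}"  # hypothetical naming scheme
    return {
        "apiVersion": "cert-manager.io/v1",  # assumption: adjust to the installed version
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": ACME_DIRECTORIES[environment],
                "email": email,
                "privateKeySecretRef": {"name": name},
                "solvers": [{"http01": {"ingress": {"class": ingress_class}}}],
            }
        },
    }


if __name__ == "__main__":
    # The JSON output can be applied with `kubectl apply -f`.
    print(json.dumps(cluster_issuer("staging", "admin@example.org"), indent=2))
```

Staging certificates are not trusted by browsers, so this only helps to validate the pipeline without consuming production rate limits.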

@ltetrel closed this as completed Mar 10, 2020
@pbellec
Member

pbellec commented Mar 10, 2020

It would be useful to summarize these issues in a blog post, to share with the mybinder community.
The post could be hosted on this github repository's wiki.

@ltetrel
Member Author

ltetrel commented Mar 10, 2020

@ltetrel
Member Author

ltetrel commented Jun 18, 2020

Another issue came up with the cert-manager webhook:
cert-manager/cert-manager#2918 (comment)

@ltetrel
Member Author

ltetrel commented Jun 18, 2020

Issue with resolver:
cert-manager/cert-manager#3021

4 participants