Traefik in the autohttps pod should reprovision on redeploy if needed #1602
Comments
Thanks for finding and debugging this. I'm not sure exactly what the way forward is, but I'd start by investigating a solution like the one in zero-to-jupyterhub-k8s/jupyterhub/templates/hub/deployment.yaml (lines 26 to 28 in a5127ae).
We'd need to find the right bit of config to include in the sha256. Probably something from the Ingress objects? The problem is that a pod only gets recreated if something about its configuration changes, and pointing the domain name to the right IP changes nothing in that configuration. Maybe cert-manager would have retried getting the certificate, but my guess is that the wait time for that is quite long compared to how quickly one wants to deploy things.
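To illustrate why a checksum-based restart does not fire here, the following is a minimal Python sketch of the idea, not the chart's actual template. The function names, the `checksum/config` annotation key, and the sample config strings are all hypothetical. It only shows that a hash over the rendered configuration stays the same when nothing but DNS changes.

```python
import hashlib


def config_checksum(rendered_config: str) -> str:
    """Return the SHA-256 hex digest of a rendered config string."""
    return hashlib.sha256(rendered_config.encode("utf-8")).hexdigest()


def pod_template_annotations(rendered_config: str) -> dict:
    """Build pod template annotations carrying the config checksum.

    The annotation key is a hypothetical example. A Deployment only rolls
    its pods when the pod template changes, so the pod is recreated only
    if this checksum changes.
    """
    return {"checksum/config": config_checksum(rendered_config)}


# Hypothetical rendered config for the first deploy.
before = "hosts:\n  - jhub.example.com\n"

# Pointing DNS at the load balancer happens outside the chart, so the
# rendered config on the next deploy is byte-for-byte identical.
after_dns_change = before

# Only an actual config change produces a different checksum.
after_config_change = "hosts:\n  - hub.example.org\n"

same = pod_template_annotations(before) == pod_template_annotations(after_dns_change)
different = pod_template_annotations(before) != pod_template_annotations(after_config_change)
print("pod unchanged after DNS update:", same)
print("pod recreated after config update:", different)
```

Under these assumptions, a checksum would only help if the hashed input included something that changes once the domain resolves correctly, which is the open question raised above.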
Thank you for a clearly written and formatted issue @mikebranski! ❤️ Note that We want to have Traefik try the ACME challenge interaction again with Let's Encrypt when the domain name points to the proxy-public loadbalancer IP. I don't think there is a sensible mechanism to do so.
Summary: I don't see a fix other than documenting that one may need to do a
@consideRatio this issue has been bugging me for a while... It seems that even if the domain name points to the I use Am I missing something? I would be happy to help with a PR documenting that or give a shot at a fix, let me know :)
I opened #2150, @pvanliefland!
Let's say I'm setting up a new install at `jhub.example.com` on AWS EKS. I need to point that host to the `proxy-public` pod sitting in front of the hub, which I can't do until it's been deployed and I can run `kubectl get svc proxy-public`. The problem is the cert-manager's `traefik` container is going to fail to provision a certificate from letsencrypt because `jhub.example.com` isn't pointing to anything yet and it can't complete the ACME challenge.
That makes sense. Now, if I point `jhub.example.com` to the proxy's public URL and re-deploy, I would expect cert-manager to try provisioning again, but it doesn't. The `autohttps-*` pod does not get recreated, either.
The hub is still unreachable – both through the proxy's public URL and the subdomain – and there's only one new entry about upgrading in the logs.
Finally, if I then delete the `autohttps-*` pod, it gets recreated and attempts another provision, which succeeds and everything loads as I'd expect.
This was my first foray into JupyterHub and Kubernetes, so I could be missing something very elementary, but I've been working with the 0.9.x chart for a few months and this has been plaguing me the entire time. Am I missing something or doing something incorrectly, or could this behavior be changed or called out to be more clear?
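The manual workaround described above, deleting the `autohttps-*` pod so it is recreated and retries the ACME challenge, could be scripted roughly as follows. This is a sketch under assumptions: the label selector `app=jupyterhub,component=autohttps` and the namespace name `jhub` are not stated in this issue and may differ in a given deployment, and `kubectl` must already be configured for the cluster.

```python
import subprocess

# Assumed label selector for the autohttps pod; verify it against your
# deployment, e.g. with `kubectl get pods --show-labels`.
DEFAULT_SELECTOR = "app=jupyterhub,component=autohttps"


def delete_autohttps_pod_command(namespace: str, selector: str = DEFAULT_SELECTOR) -> list:
    """Build the kubectl argv that deletes the autohttps pod.

    The pod's Deployment then recreates it, and the new pod attempts
    certificate provisioning again.
    """
    return [
        "kubectl", "delete", "pod",
        "--namespace", namespace,
        "--selector", selector,
    ]


def delete_autohttps_pod(namespace: str, selector: str = DEFAULT_SELECTOR) -> None:
    """Run the delete command; raises CalledProcessError if kubectl fails."""
    subprocess.run(delete_autohttps_pod_command(namespace, selector), check=True)


# Example (hypothetical namespace "jhub"); only prints the command.
print(" ".join(delete_autohttps_pod_command("jhub")))
```

Calling `delete_autohttps_pod("jhub")` would actually run the command. It only makes sense after the domain already points at the `proxy-public` address, since otherwise the new pod fails the challenge for the same reason as before.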
Here is our `values.yml` for posterity. Our real one has a lot more to it, but the issue was reproducible with just this subset.