envoy container in CrashLoopBackOff: error initializing configuration #3264
Do you have any error/status information for that, and does your config match `contour/examples/contour/03-envoy.yaml` (line 98 at commit 78c434f)?
Here is the full `kubectl describe` output for the pod:
As you can see above, the init container matches the suggested YAML configuration, unless you spot an error in it. The `kubectl logs -c envoy-initconfig` output for the pod is empty. Is there another way to access those logs, or to increase the debug level? Thanks!
Hi @vinzo99, sorry you have this problem, it's definitely not good! The key error here appears to be that `/config/resources/sds/xds-tls-certificate.json` does not exist. That file is part of the system we use to secure the communication between Contour and Envoy. That system requires the Contour namespace (i.e. …). I'd start by checking whether the secrets are present, and whether the …
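For readers following along, here is a minimal sketch of how these pieces fit together in the Envoy DaemonSet. The bootstrap arguments are the ones quoted later in this thread. The `envoy-initconfig` init container runs `contour bootstrap`, which writes `/config/envoy.json` and the SDS files under `/config/resources` into a shared `emptyDir` volume. The `envoy` container then starts from that same directory. The volume names (`envoy-config`, `envoycert`) and the secret name are assumptions based on the Contour example manifests, so check them against your own YAML. Images, ports, probes and other containers are omitted.

```yaml
# Excerpt of the Envoy DaemonSet pod template (spec.template.spec).
# Volume and secret names are assumed from the Contour example manifests.
initContainers:
- name: envoy-initconfig
  command: ["contour"]
  args:
  - bootstrap
  - /config/envoy.json                  # bootstrap config Envoy loads at startup
  - --xds-address=contour
  - --xds-port=8001
  - --xds-resource-version=v3
  - --resources-dir=/config/resources   # SDS files such as sds/xds-tls-certificate.json
  - --envoy-cafile=/certs/ca.crt
  - --envoy-cert-file=/certs/tls.crt
  - --envoy-key-file=/certs/tls.key
  volumeMounts:
  - name: envoy-config
    mountPath: /config                  # written here by the init container...
  - name: envoycert
    mountPath: /certs                   # certificates from the secret created by certgen
    readOnly: true
containers:
- name: envoy
  command: ["envoy"]
  args: ["-c", "/config/envoy.json"]
  volumeMounts:
  - name: envoy-config
    mountPath: /config                  # ...and read here by Envoy
  - name: envoycert
    mountPath: /certs
    readOnly: true
volumes:
- name: envoy-config
  emptyDir: {}                          # shared scratch volume, lives as long as the pod
- name: envoycert
  secret:
    secretName: envoycert               # assumed secret name
```

If `/config/resources/sds/xds-tls-certificate.json` is missing when Envoy starts, there are two possibilities: the init container never wrote it, or the `envoy` container cannot read what was written. Either way, inspecting the contents and ownership of `/config` from inside the Envoy container is a useful next step.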
Hi @youngnick, I just checked what you suggested:
1°) the job ran successfully:
2°) here are the job details:
3°)
but … Thanks!
@youngnick, meanwhile I also tried to manually perform what … Still no … All secrets are now here:
The Envoy pod status is still …
The …
@stevesloka sure:
I can't exec into the container:
Could you try killing the Envoy pod and letting it restart? At one point, some folks did see an issue where the Envoy pod would try to start before the secrets were ready in the shared secret (but that shouldn't happen, because it's done in an initContainer).
I already tried that, with the same result. And you're right: since the job is performed in the initContainer, the envoy container should end up starting. It does not; it keeps restarting indefinitely. See here, after 3+ days:
Hi, any hints on this issue?
Hi @vinzo99, you can see that @skriss has put this one in "Needs Investigation" on our project board. That means that one of us will need to try to reproduce the issue to see if we can figure out what's causing it. The way this whole setup works is:
So, I have a couple of questions for you:
Hi @youngnick, regarding your two questions:
1°) the pods can indeed create …
2°) I killed the Envoy pod using this command:
I'm not sure what we can do to monitor the bootstrap process, apart from creating a dummy pod that recreates all actions performed by … Thanks!
@vinzo99 I wonder if we're chasing the wrong thing. One item that might be an issue in your environment is the default … I would expect some sort of error relating to … but I'm just trying to think of what else it might be. If not, the other path we could try is removing the initContainer and certgen and pushing the bits manually.
FYI: when I first tried to deploy Contour with the default YAML files, I faced the following issue when launching …
I quickly solved this issue by adding the following rules in …
which leads us to the current state with … Not sure this is what you meant, though. Thanks!
@vinzo99 yup, this might be it, but let's confirm. =) In the examples, the Envoy DaemonSet, which deploys the Envoy pods, has two … Can you remove those and see if your pod spins up properly? It may not work, because we need to swap the service values around, but it will tell us what the problem is and where to go from there. Thanks!
@stevesloka I removed both …
Thanks!
Are you using Pod Security Policies? Can you share any information about your cluster? It seems, as @youngnick suggested, that something in the initContainer isn't working to create this default config. Let me see if I can pull the bits out into a ConfigMap; then you can apply that and see if you can get it working.
@stevesloka the cluster has two main Pod Security Policies, …
and the following rules have been added in …
In any case, hope this helps. Thanks!
I just spun up a minikube cluster with PSP enabled and I had to change a few things to get this to work:
I didn't need to modify the … Which Helm chart did you use? I can try to recreate that. I never use the Helm chart, just the examples, but I want to double-check that setup (maybe it's different from the Contour repo). I can also put together the files to avoid the initContainer, but I wanted to double-check the Helm chart bits first.
@stevesloka, a few inputs: I tried adding a … Regarding the other suggestions: unfortunately I need the Envoy pods to act as listeners for the ingress controller, and therefore to listen on … To achieve a configuration without a K8s LoadBalancer, using host networking (as explained here: https://projectcontour.io/docs/v1.11.0/deploy-options/#host-networking), I used the example charts provided here: https://github.com/projectcontour/contour/blob/release-1.11/examples/render/contour.yaml
I had to make slight modifications to the charts, such as:
Those modifications were successfully tested on the same cluster with Envoy/Contour versions released before the new certgen process was introduced a few months back. Thanks!
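For context, the host-networking option linked above comes down to two fields on the Envoy DaemonSet's pod template. Below is a minimal sketch of just those fields. It does not reproduce the exact modifications made here.

```yaml
# Excerpt of the Envoy DaemonSet pod template (spec.template.spec).
hostNetwork: true                   # Envoy's listeners bind directly on the node's network interfaces
dnsPolicy: ClusterFirstWithHostNet  # keep cluster DNS so --xds-address=contour still resolves
```

Because the listeners then bind on the node itself, serving directly on 80 and 443 means the Envoy process must be allowed to bind ports below 1024. A non-root process normally cannot do that without `CAP_NET_BIND_SERVICE`.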
Thanks @vinzo99. I still think it's likely that the files are not being created properly by the bootstrap. I thought of a way to check this. It's not ideal, but it should let you see what the Envoy container is seeing. In the Envoy DaemonSet, replace the Envoy container's `args` and `command` with:
```yaml
args:
- -c
- "sleep 86400"
command:
- /bin/bash
```
This will just run a sleeping bash job instead of trying to run Envoy. Then you should be able to exec into the container. The things we need to know to find out more about this are:
If the …
```yaml
args:
- bootstrap
- /config/envoy.json
- --xds-address=contour
- --xds-port=8001
- --xds-resource-version=v3
- --resources-dir=/config/resources
- --envoy-cafile=/certs/ca.crt
- --envoy-cert-file=/certs/tls.crt
- --envoy-key-file=/certs/tls.key
command:
- contour
```
The key one is the …
Hi @youngnick, thanks for your suggestions! I just replaced this part with the suggested one in the Envoy DaemonSet, to start a standard shell with a 24-hour sleep instead of the envoy command and get access to the container:
The Envoy pod starts. For some reason I am not able to log into the container (the kubectl command returns right away), but I can still run single commands, which basically show that
I am able to touch a file in
but not in
I believe this directory is created by … I also checked the bootstrap part, which seems correct:
Thanks!
Thanks for that, @vinzo99. I think you may have missed the … I'll check the permissions for the created directory. It sounds promising that the problem is something about how that directory is created.
Edit: Yes, I can see that this is an "Envoy is not running as the root user" problem. The initContainer runs as root, but Envoy is running as the user … I'd rather not make the …
Hi @youngnick! I followed your suggestion and added a security context to the initContainer so that it runs with the same user:group as Envoy.
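Here is a minimal sketch of that workaround. It assumes the Envoy container runs as a fixed, known non-root UID/GID. The `4444` values are placeholders borrowed from the capabilities example later in this thread, not the Envoy image's actual user. Running the init container as the same user means the files `contour bootstrap` writes into the shared `/config` volume are owned by the user that later reads them.

```yaml
# Excerpt of the Envoy DaemonSet pod template (spec.template.spec).
# 4444 is a placeholder UID/GID, not the Envoy image's real user: replace it
# with whatever `id` reports inside the running envoy container.
initContainers:
- name: envoy-initconfig
  securityContext:
    runAsUser: 4444    # create /config files as the same user Envoy runs as...
    runAsGroup: 4444   # ...and with the same primary group
  # command, args and volumeMounts unchanged from the example manifest
```

This only helps when the Envoy container's UID is fixed and known in advance (for example, set in the pod spec or enforced by a PSP). That is why it is a workaround rather than a general fix.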
Now the Envoy pod starts, but the containers on the edge node fail to bind on
I remember @stevesloka suggested removing … Here is the …
Any hints on this? FYI, now that the Envoy pod has started, I am able to log into the Envoy container, which should make troubleshooting a lot easier. Thanks!
Hey @vinzo99, after taking out the … So your steps are:
Hi @stevesloka! Like I said, I have no other choice but to use port … I still get a "Permission denied" error when binding:
Just to clear up any doubts, I installed an older working version (Envoy 1.14.1 + Contour 1.4.0) on the same cluster, and there is no binding issue, as you can see here:
I compared the configurations of the old and current charts. Apart from a few unrelated changes (…), I believe something else is preventing Envoy from binding on … Thanks!
Not being able to bind ports 80 and 443 is going to be related either to the security context or to something weird going on with the hostPort setup. If you can post your Envoy DaemonSet YAML, we can take a look. Without that, I'm not sure how much more we can help.
@youngnick sure! The configuration is based on the template https://github.com/projectcontour/contour/blob/release-1.11/examples/render/contour.yaml, plus the following:
I also suspect the issue is related to the security context workaround. Thanks!
Ah, I think you may need to explicitly tell the security context that the pod needs to bind to low ports. You can do this either by running the pod as privileged, or by adding the `CAP_NET_BIND_SERVICE` capability:
```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 4444
  runAsGroup: 4444
  capabilities:
    add:
    - NET_BIND_SERVICE
```
When you are not running as root, you can't bind ports below 1024 without that capability. Note that the PSP you've set up will permit this binding, because it allows all capabilities, but it won't add them for you. That's what the securityContext does.
Edit: A great gotcha with capabilities is that you have to drop the …
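A related detail on placement: `capabilities` is only accepted in a container-level `securityContext`. The pod-level `securityContext` (PodSecurityContext) has no such field, so setting it once for the whole DaemonSet will not add the capability. Below is a minimal sketch of the placement, reusing the example values above (4444 is still a placeholder).

```yaml
# Excerpt of the Envoy DaemonSet pod template (spec.template.spec).
securityContext:            # pod level: accepts runAsNonRoot/runAsUser/fsGroup, but no capabilities field
  runAsNonRoot: true
containers:
- name: envoy
  securityContext:          # container level: the only place capabilities can be set
    runAsUser: 4444
    runAsGroup: 4444
    capabilities:
      add:
      - NET_BIND_SERVICE    # allows binding ports below 1024 without root
```

There is one more caveat. The capability is added to the container, but whether a non-root process actually ends up with it in its effective set may depend on the container runtime. It may also depend on whether the `envoy` binary carries a matching file capability, since Kubernetes does not set ambient capabilities. This may explain why adding it has no visible effect in some setups.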
Hi @youngnick, thanks for your suggestion. I tried it, but unfortunately I still get the same error. I guess the capability needs to be added in the … I also tried setting a global securityContext for the whole DaemonSet (which should apply to all containers), with the same result. Since this issue appears to come from the securityContext workaround, maybe we can try a different approach: do you have any hint on a fix that would make … Thanks!
We can change the … The change should definitely explain why we do that and refer back to this issue, though.
Fixes: projectcontour#3264 Signed-off-by: Amey Bhide <amey15@gmail.com>
@youngnick I have a draft PR out: #3390
Yep, I'm also running into this issue because I'm using the https://hub.docker.com/r/bitnami/envoy image instead of the envoyproxy/envoy one, so my Envoy doesn't run as root.
Fixes: #3264 Signed-off-by: Amey Bhide <amey15@gmail.com>
Thanks! So the fix is available in …?
Yes, that is correct.
Hi,
We are deploying Contour (v1.11.0) with Envoy (v1.16.2) as a DaemonSet, using the following YAML templates:
https://github.com/projectcontour/contour/blob/release-1.11/examples/render/contour.yaml
We only applied minor changes to fit our configuration (such as pointing to our local image repository, adding RBAC privileges, etc.).
When we run the Helm installation, the Envoy pod goes into CrashLoopBackOff, with the following error in the envoy container:
This error did not occur with older versions.
For information, the contour-certgen job ran successfully, and the Contour pods are up and running.
Can you please advise ?
Thanks