
unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied #4061

Closed
puzzledz opened this issue May 3, 2019 · 17 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@puzzledz

puzzledz commented May 3, 2019

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):


Is this a BUG REPORT or FEATURE REQUEST? (choose one):
FEATURE REQUEST

NGINX Ingress controller version:
0.24.1

Kubernetes version (use kubectl version):
v1.14.1

Environment:

  • Cloud provider or hardware configuration: VMware environment
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.1
  • Kernel (e.g. uname -a): 4.15.0-47-generic
  • Install tools:
  • Others:

What happened:
I deployed ingress-nginx, but the pod fails:

NAME                                        READY   STATUS             RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
nginx-ingress-controller-5694ccb578-csn72   0/1     CrashLoopBackOff   8          20m   10.244.1.141   node2   <none>           <none>

This is part of the log:

W0503 09:23:37.549224       1 flags.go:214] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
nginx version: nginx/1.15.6
W0503 09:23:37.559659       1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0503 09:23:37.564858       1 main.go:205] Creating API client for https://10.96.0.1:443
I0503 09:23:37.618028       1 main.go:249] Running in Kubernetes cluster version v1.14 (v1.14.1) - git (clean) commit b7394102d6ef778017f2ca4046abbaa23b88c290 - platform linux/amd64
F0503 09:23:37.925586       1 main.go:121] unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied
What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:
node1 is the master and node2 is a regular worker. I applied the YAML file and the ingress-nginx pod was deployed on node2. Is this related to host permissions?

@puzzledz

I have checked the deployment manifest:

    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx
            - --annotations-prefix=nginx.ingress.kubernetes.io
          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            # www-data -> 33
            runAsUser: 33
          env:
            - name: POD_NAME

@puzzledz

root@node3:/home/z/kubeadm/ingress-nginx# kubectl get pods -n ingress-nginx
NAME                                        READY   STATUS             RESTARTS   AGE
nginx-ingress-controller-5694ccb578-mzb4v   0/1     CrashLoopBackOff   6          6m57s
root@node3:/home/z/kubeadm/ingress-nginx# kubectl logs nginx-ingress-controller-5694ccb578-mzb4v -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.24.1
  Build:      ce418168f
  Repository: https://github.com/kubernetes/ingress-nginx
-------------------------------------------------------------------------------

W0515 14:31:45.066437       1 flags.go:214] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
nginx version: nginx/1.15.6
W0515 14:31:45.095746       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0515 14:31:45.096599       1 main.go:205] Creating API client for https://10.96.0.1:443
I0515 14:31:45.195979       1 main.go:249] Running in Kubernetes cluster version v1.14 (v1.14.1) - git (clean) commit b7394102d6ef778017f2ca4046abbaa23b88c290 - platform linux/amd64
F0515 14:31:46.356556       1 main.go:121] unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied

@NicolasT

For what it's worth, I ran into the very same issue using containerd from EPEL on CentOS 7.6 (containerd-1.2.1-1.el7). Before that, I ran into an issue with nginx being denied permission to bind to 0.0.0.0:80, which I could resolve by running the process as UID 0.

All of this hinted at issues with ACLs or xattrs on the binary, the cert directory, and so on, so I ran a Google query and came across containerd/containerd#2942

Indeed, removing the images from the system, then upgrading to containerd-1.2.4-1.fc30 (it's a static Go binary after all...) made the controller container start just fine after (re)pulling the image.

So, if your environment uses containerd (possibly as part of Docker; we don't run Docker, just plain CRI to containerd) and is affected by that bug, you may want to upgrade and try again.
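To quickly flag a node whose containerd might predate the fix, a version comparison along these lines can help. This is a rough sketch: 1.2.4 is simply the first version reported working in this thread, not necessarily the exact release that fixed the xattr bug (see containerd/containerd#2942 for the authoritative history).

```shell
# Sketch: flag containerd versions older than 1.2.4, the first version
# reported working in this thread. The version string is an assumption
# here; in practice take it from the output of `containerd -v`.
ver="1.2.1"
known_good="1.2.4"
oldest=$(printf '%s\n%s\n' "$ver" "$known_good" | sort -V | head -n1)
if [ "$oldest" = "$ver" ] && [ "$ver" != "$known_good" ]; then
  echo "containerd $ver may be affected by the xattr-unpacking bug"
else
  echo "containerd $ver should be OK"
fi
```

Note that `sort -V` (version sort) is a GNU coreutils extension, so this check assumes a typical Linux node.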

NicolasT added a commit to scality/metalk8s that referenced this issue May 28, 2019
The `nginx-ingress-controller` container relies on filesystem extended
attributes to allow permissions to the `nginx` binary and temporary
storage ACLs. There's a bug in `containerd` 1.2.1 which causes such
xattrs not to be properly unpacked from container image layers. To
work-around this, we need a newer version, which is not packaged for
CentOS/EPEL. However, since `containerd` is a pure Golang static binary,
we can use a package from Fedora.

See: kubernetes/ingress-nginx#4061 (comment)
See: containerd/containerd#2942
@puzzledz

Yes, I ran ingress-nginx on Docker, but my containerd version is already 1.2.5. What is CRI? Do you suggest upgrading containerd to containerd-1.2.4-1.fc30?

root@node3:/home/z# containerd -v
containerd github.com/containerd/containerd 1.2.5 bb71b10fd8f58240ca47fbb579b9d1028eea7c84

@NicolasT

NicolasT commented Jun 1, 2019

I am not suggesting anything at all, just providing some more information for the devs to dig into this issue, if applicable.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 30, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 29, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mcambal

mcambal commented Jan 16, 2020

In my case I used the helm upgrade command without specifying the chart version, which caused me to use the chart for nginx-ingress version 0.27.x with image version 0.26.2.

There is a breaking change in the default of the runAsUser attribute due to the migration to Alpine Linux.

@dignajar

dignajar commented Mar 9, 2020

Solution: Change runAsUser: 33 to runAsUser: 101.
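Applied to the deployment snippet quoted earlier in this thread, the fix would look something like this (a sketch; only runAsUser changes, the rest is as posted above):

```yaml
securityContext:
  allowPrivilegeEscalation: true
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE
  # www-data is UID 101 in the Alpine-based images (it was 33 before)
  runAsUser: 101
```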

@alter

alter commented Mar 17, 2020

This solution doesn't work for me with:

helm ls |grep ingress
test-ingress   	1       	Tue Mar 17 14:08:06 2020	DEPLOYED	nginx-ingress-1.33.5      	0.30.0     	test

Check user:

k -n test get deployment.apps/test-ingress-nginx-ingress-controller -o yaml |grep runAsUser
runAsUser: 101

@muenchdo

@alter can you double check that your ingress controller pods are running a version >= 0.27.0? I had the same issue but figured out that I had overwritten the image tag in my Helm values.
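One way to spot a tag/default mismatch is to compare the running image tag against 0.27.0, the release where the default runAsUser reportedly changed. This is a sketch: the image string below is a stand-in for what kubectl would report, and the 0.27.0 boundary is taken from the comments above, not verified independently.

```shell
# Hypothetical image string; in practice obtain it with something like:
#   kubectl -n <namespace> get deploy <name> -o jsonpath='{..image}'
image="quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.2"
tag="${image##*:}"     # everything after the last colon
min="0.27.0"
if [ "$(printf '%s\n%s\n' "$min" "$tag" | sort -V | head -n1)" = "$min" ]; then
  echo "image tag $tag is >= $min: chart default runAsUser 101 should match"
else
  echo "image tag $tag predates $min: it expects runAsUser 33, not 101"
fi
```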

@Cultuzz-India

Cultuzz-India commented May 16, 2020

I have the same problem with 0.32.0. The default runAsUser is already set to 101.

Interestingly, it works well in one node but not in another in the cluster. I'm running it in daemonset mode.

@kylegalea

> I have the same problem with 0.32.0. The default runAsUser is already set to 101.
>
> Interestingly, it works well in one node but not in another in the cluster. I'm running it in daemonset mode.

I have the same issue. I tried changing imagePullPolicy to Always to pull a fresh image, but the issue persisted.

I then changed the ingress controller version to 0.30.0, manually deleted the 0.32.0 image on the worker node, and changed the version back to 0.32.0. It seems to be working now; let's see how long it lasts.

sreis added a commit to opstrace/opstrace that referenced this issue Dec 2, 2020
kubernetes/ingress-nginx#4061 (comment)

Signed-off-by: Simão Reis <sreis@opstrace.com>
jgehrcke pushed a commit to opstrace/opstrace that referenced this issue Dec 2, 2020
kubernetes/ingress-nginx#4061 (comment)

Signed-off-by: Simão Reis <sreis@opstrace.com>
@mattroark

Helm Chart Version: 2.13.0
Nginx Ingress Controller Version: v0.35.0
Kubernetes Version: v1.15.12
Docker Version: v18.09.9

The controller was deployed about 300 days ago without any interruption, and then the deployment/pod suddenly started failing with the initial error described.

It is able to partially start with runAsUser set to 0 (root); however, it eventually fails trying to chown a tmp file.

I0726 20:14:08.897721       7 main.go:105] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
I0726 20:14:08.905766       7 ssl.go:528] loading tls certificate from certificate path /usr/local/certificates/cert and key path /usr/local/certificates/key
I0726 20:14:08.946139       7 nginx.go:263] Starting NGINX Ingress controller
...
Error: exit status 1
nginx: the configuration file /tmp/nginx-cfg213950217 syntax is ok
2021/07/26 20:14:16 [emerg] 55#55: chown("/tmp/client-body", 101) failed (1: Operation not permitted)
nginx: [emerg] chown("/tmp/client-body", 101) failed (1: Operation not permitted)
nginx: configuration file /tmp/nginx-cfg213950217 test failed
/etc/nginx # stat /tmp/client-body
  File: /tmp/client-body
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: 300020h/3145760d	Inode: 12389077    Links: 2
Access: (0700/drwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-07-26 20:18:27.000000000
Modify: 2021-07-26 20:17:42.000000000
Change: 2021-07-26 20:17:42.000000000

If I add the CHOWN capability to the securityContext, exec into the pod, and then perform chown -R 101:101 /etc/ingress-controller, things start flowing temporarily, but it then fails loading again shortly thereafter:

I0727 15:39:53.689868       7 status.go:275] updating Ingress ... status from [{10.1.0.61 }] to []
I0727 15:39:55.891559       7 nginx.go:388] Stopping admission controller
I0727 15:39:55.891637       7 nginx.go:396] Stopping NGINX process
E0727 15:39:55.891675       7 nginx.go:329] http: Server closed
2021/07/27 15:39:55 [emerg] 73#73: cannot load certificate "/etc/ingress-controller/ssl/default-fake-certificate.pem": BIO_new_file() failed (SSL: error:0200100D:system library:fopen:Permission denied:fopen('/etc/ingress-controller/ssl/default-fake-certificate.pem','r') error:2006D002:BIO routines:BIO_new_file:system lib)

To work around this further, I also added the SETGID and SETUID capabilities to the securityContext:

        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            - CHOWN
            - SETGID
            - SETUID
            drop:
            - ALL
          runAsUser: 0

The deployment is finally "fixed". Any other combination still results in the initial failure. What caused this deployment to go haywire?

jayaddison added a commit to openculinary/infrastructure that referenced this issue Oct 17, 2021
@vitvavra

vitvavra commented May 5, 2022

I am using
k8s.gcr.io/ingress-nginx/controller:v1.2.0
Kubernetes v1.23.6 / v1.24.0 (tested on the latter, but it should work on both since 1.22)

The only thing that solved the issue for me was resetting .spec.template.spec.containers[].securityContext to its default and adding this to the Deployment's .spec.template.spec.securityContext:

sysctls:
  - name: net.ipv4.ip_unprivileged_port_start
    value: "1"

That way runAsUser: 0 is not needed for the *:80 and *:443 ports to work, and you can use the pod as intended (non-root).

Edit: it also allows the creation of /etc/ingress-controller/ssl/default-fake-certificate.pem.
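Putting that together, the relevant part of the Deployment might look like this. This is a sketch based on the description above; the image is the one mentioned earlier in this comment, and the container name is an assumption.

```yaml
spec:
  template:
    spec:
      # Pod-level securityContext: allow non-root binding to ports >= 1
      securityContext:
        sysctls:
          - name: net.ipv4.ip_unprivileged_port_start
            value: "1"
      containers:
        - name: controller   # hypothetical name
          image: k8s.gcr.io/ingress-nginx/controller:v1.2.0
          # container-level securityContext left at its default
```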
