Kubernetes system pods are not showing up but containers are in a healthy state #62566

Closed
iponnam opened this Issue Apr 13, 2018 · 12 comments

iponnam commented Apr 13, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind support

/kind feature

What happened:
Enabled --client-cert-auth=true in the etcd yaml to communicate over TLS and restarted the kubelet for the changes to take effect. The kube-apiserver, controller-manager, scheduler, and etcd containers came up, along with the pause container. However, kubectl -n kube-system get pods (or anything else, for that matter) returns empty results, and the worker nodes that had already joined no longer show up.

kubectl get pods -n kube-system
No resources found.
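For context, the change boils down to flags along these lines (a sketch only; the file paths and cert names here are placeholders, not taken from the attached yaml files):

# etcd.yaml - serve clients over TLS and require client certificates
- --client-cert-auth=true
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --key-file=/etc/kubernetes/pki/etcd/server.key

# kube-apiserver.yaml - present a client certificate that etcd trusts
- --etcd-servers=https://127.0.0.1:2379
- --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
- --etcd-certfile=/etc/kubernetes/pki/etcd/etcd-client.crt
- --etcd-keyfile=/etc/kubernetes/pki/etcd/etcd-client.key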

When we reverted the etcd yaml to remove the TLS settings, everything came back online.

The kubelet keeps throwing errors for all four components:

pods "kube-controller-manager-azwushubqaadmmaster01" is forbidden: no providers available to validate pod request

What you expected to happen:
The nodes that were registered before enabling TLS on etcd should still show up.

How to reproduce it (as minimally and precisely as possible):
Run kubeadm init on the master and join the nodes.
Create certs for the etcd server and client, following the documentation.
Apply the changes as shown in the attached yaml files.
Attached: etcd.yaml & kube-apiserver.yaml
tls.zip

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.9.5

  • Cloud provider or hardware configuration:

  • OS (e.g. from /etc/os-release): Ubuntu

  • Kernel (e.g. uname -a): 4.13.0-1012-azure #15-Ubuntu SMP Thu Mar 8 10:47:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:
    @kubernetes/sig-auth-bug

iponnam commented Apr 13, 2018

@kubernetes/sig-auth-bugs

k8s-ci-robot commented Apr 13, 2018

@iponnam: Reiterating the mentions to trigger a notification:
@kubernetes/sig-auth-bugs

In response to this:

@kubernetes/sig-auth-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

liggitt commented Apr 14, 2018

It sounds like the etcd client credentials given to the apiserver are not correct.

Can you capture the apiserver logs (from the very beginning of startup, with etcd client credentials) and the output of these URLs?

https://<apiserver>:6443/healthz
https://<apiserver>:6443/healthz/etcd
iponnam commented Apr 14, 2018

Good morning @liggitt,

Thank you for the quick reply.
Attached are the logs from the api-server and etcd.
The output of healthz and healthz/etcd is also included in the zip.
k8s-logs.zip

iponnam commented Apr 14, 2018

When the certs were passed to curl, healthz passed:

curl -v -k --cacert ca.crt --key etcd-client.key --cert etcd-client.crt https://127.0.0.1:6443/healthz
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 6443 (#0)
* found 1 certificates in ca.crt
* found 594 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification SKIPPED
*        server certificate status verification SKIPPED
*        common name: kube-apiserver (does not match '127.0.0.1')
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=kube-apiserver
*        start date: Fri, 30 Mar 2018 19:33:05 GMT
*        expire date: Sat, 30 Mar 2019 19:33:06 GMT
*        issuer: CN=kubernetes
*        compression: NULL
* ALPN, server accepted to use http/1.1
> GET /healthz HTTP/1.1
> Host: 127.0.0.1:6443
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 14 Apr 2018 19:38:21 GMT
< Content-Length: 2
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact

healthz/etcd output:

 curl -v -k --cacert ca.crt --key etcd-client.key --cert etcd-client.crt https://127.0.0.1:6443/healthz/etcd
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 6443 (#0)
* found 1 certificates in ca.crt
* found 594 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification SKIPPED
*        server certificate status verification SKIPPED
*        common name: kube-apiserver (does not match '127.0.0.1')
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=kube-apiserver
*        start date: Fri, 30 Mar 2018 19:33:05 GMT
*        expire date: Sat, 30 Mar 2019 19:33:06 GMT
*        issuer: CN=kubernetes
*        compression: NULL
* ALPN, server accepted to use http/1.1
> GET /healthzz//etcd HTTP/1.1
> Host: 127.0.0.1:6443
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Content-Type: application/json
< X-Content-Type-Options: nosniff
< Date: Sat, 14 Apr 2018 19:38:43 GMT
< Content-Length: 247
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/healthzz//etcd\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
* Connection #0 to host 127.0.0.1 left intact
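Note that the request line in the log above is GET /healthzz//etcd, so this 403 is for a mistyped path. Note too that the request was authorized as system:anonymous, meaning the etcd client cert does not chain to the apiserver's --client-ca-file; the first call only returned 200 because /healthz is typically readable anonymously on this version, while other paths are not. A corrected invocation would look like this (a sketch; admin.crt/admin.key are assumed names for a client cert the apiserver actually trusts, such as the one in the admin kubeconfig):

curl -v -k --cacert ca.crt --cert admin.crt --key admin.key https://127.0.0.1:6443/healthz/etcd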

liggitt commented Apr 14, 2018

When we reverted the etcd yaml to remove the TLS settings, everything came back online.

Do you have the API server yaml for the reverted mode that is working? Were the etcd TLS settings the only difference between the two setups? Based on the logs, it looks like you enabled the PodSecurityPolicy admission plugin without creating a privileged policy and granting access to the kubelet (which it needs to report on the static apiserver pod).

As an example, this is the policy GCE sets up to give the kubelet access:
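(The policy itself did not survive in this copy of the thread; the following is a reconstruction from the GCE addon manifests of that era, so treat the exact names as approximate rather than authoritative.)

apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: gce.privileged
  annotations:
    kubernetes.io/description: privileged allows full unrestricted access to pod features,
      as if the PodSecurityPolicy controller was not enabled.
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  privileged: true
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  volumes:
  - '*'
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  hostIPC: true
  hostPID: true
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gce:podsecuritypolicy:privileged
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - extensions
  resources:
  - podsecuritypolicies
  resourceNames:
  - gce.privileged
  verbs:
  - use
---
# Bind to every node (system:nodes) and to the legacy "kubelet" user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gce:podsecuritypolicy:nodes
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gce:podsecuritypolicy:privileged
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubelet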

iponnam commented Apr 14, 2018

Yes, the etcd TLS settings are the only difference.
etcd-yaml-without-tls.txt

kube-apiserver-yaml-without-tls.txt

iponnam commented Apr 14, 2018

We have followed the same document to enable PSP.

iponnam commented Apr 16, 2018

@liggitt
We are seeing this in the kube-apiserver logs:

I0416 16:45:49.963699       1 wrap.go:42] GET /api/v1/nodes/azwushubqaadmmaster01?resourceVersion=0: (680.211µs) 404 [[kubelet/v1.9.5 (linux/amd64) kubernetes/f01a2bf] 10.248.66.21:60654]
I0416 16:45:50.031507       1 wrap.go:42] GET /api/v1/nodes/azwushubqaadmagent01?resourceVersion=0: (626.311µs) 404 [[kubelet/v1.9.5 (linux/amd64) kubernetes/f01a2bf] 10.248.66.22:59256]
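Those 404s mean the Node objects the kubelets are polling for do not exist in the API at all, i.e. the nodes have not (re)registered rather than merely not being listed. A couple of quick checks, sketched (command forms are my own, not taken from the thread):

# Are there any Node objects at all?
kubectl get nodes
# Is the kubelet's self-registration being rejected?
journalctl -u kubelet | grep -iE 'register|forbidden'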
iponnam commented Apr 16, 2018

PSP Applied:

apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    kubernetes.io/description: privileged allows full unrestricted access to pod features,
      as if the PodSecurityPolicy controller was not enabled.
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  creationTimestamp: 2018-04-02T22:55:42Z
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: k8s.priv
  resourceVersion: "358329"
  selfLink: /apis/extensions/v1beta1/podsecuritypolicies/k8s.priv
  uid: f8b75ff2-36c8-11e8-ae83-000d3a35faea
spec:
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  fsGroup:
    rule: RunAsAny
  hostIPC: true
  hostNetwork: true
  hostPID: true
  hostPorts:
  - max: 65535
    min: 0
  privileged: true
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'

RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    kubernetes.io/description: Allow nodes to create privileged pods. Should be used
      in combination with the NodeRestriction admission plugin to limit nodes to mirror
      pods bound to themselves.
  creationTimestamp: 2018-04-02T22:56:31Z
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: k8spriv:privileged:nodes
  namespace: kube-system
  resourceVersion: "358398"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/kube-system/rolebindings/k8spriv%3Aprivileged%3Anodes
  uid: 15c14497-36c9-11e8-ae83-000d3a35faea
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s:psp:privileged
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubelet

ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: 2018-04-02T22:55:08Z
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: k8s:psp:privileged
  resourceVersion: "358282"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/k8s%3Apsp%3Aprivileged
  uid: e44d019a-36c8-11e8-ae83-000d3a35faea
rules:
- apiGroups:
  - extensions
  resourceNames:
  - k8s.priv
  resources:
  - podsecuritypolicies
  verbs:
  - use
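One way to verify that these grants actually reach the kubelet is impersonation (a sketch; assumes the kubeconfig user is allowed to impersonate, which cluster-admin is):

# Can the legacy "kubelet" user use the privileged policy?
kubectl auth can-i use podsecuritypolicy/k8s.priv -n kube-system --as=kubelet
# Can a node, via the system:nodes group, use it?
kubectl auth can-i use podsecuritypolicy/k8s.priv -n kube-system --as=system:node:azwushubqaadmmaster01 --as-group=system:nodes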
iponnam commented Apr 17, 2018

Good evening Jordan [ @liggitt ],

Thank you for all the assistance you have provided, not only on this issue but also on a few other issues I have come across; the solutions you posted brought me to where I am today.

Thank you very much.

Pavan Surya Prakash Ponnam

iponnam commented Apr 17, 2018

Closing this issue.

iponnam closed this Apr 17, 2018
