
Kubelet with anonymous-auth documentation is lacking #3891

Closed
josselin-c opened this issue Nov 19, 2017 · 23 comments
Labels: area/security, lifecycle/rotten

Comments

@josselin-c

  1. What kops version are you running?
    Version 1.7.1

  2. What Kubernetes version are you running?
    kubectl version
    Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}

  3. What cloud provider are you using?
    AWS

  4. What commands did you run? What is the simplest way to reproduce this issue?

kops create cluster --dns-zone=lab-aws.jossctz-test.com --zones=us-east-1a,us-east-1b --name=k8s-kops.lab-aws.jossctz-test.com
kops edit cluster k8s-kops.lab-aws.jossctz-test.com
# Then I add to the spec section:
kubelet:
  anonymousAuth: false
# And finally
kops update cluster k8s-kops.lab-aws.jossctz-test.com --yes
  5. What happened after the commands executed?
    The cluster isn't working: kubectl fails to connect to the API server, and kubectl get pods returns an I/O timeout.

  6. What did you expect to happen?
    My cluster is running and safe from the kind of attacks described in https://github.com/kayrus/kubelet-exploit.

  7. Please provide your cluster manifest.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-11-19T15:15:36Z
  name: k8s-kops.lab-aws.jossctz-test.com
spec:
  api:
    dns: {}
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://jossctz/k8s-kops.lab-aws.jossctz-test.com
  dnsZone: lab-aws.jossctz-test.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: events
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.7.10
  masterInternalName: api.internal.k8s-kops.lab-aws.jossctz-test.com
  masterPublicName: api.k8s-kops.lab-aws.jossctz-test.com
  networkCIDR: 172.20.0.0/16
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  - cidr: 172.20.64.0/19
    name: us-east-1b
    type: Public
    zone: us-east-1b
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-11-19T15:15:36Z
  labels:
    kops.k8s.io/cluster: k8s-kops.lab-aws.jossctz-test.com
  name: master-us-east-1a
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m3.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-11-19T15:15:36Z
  labels:
    kops.k8s.io/cluster: k8s-kops.lab-aws.jossctz-test.com
  name: nodes
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: t2.medium
  maxSize: 2
  minSize: 2
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b

I want to set up a cluster with kops where pods can't talk directly to the kubelet. I think setting anonymousAuth: false should be enough, but it doesn't work.
I looked through other issues and the documentation and tried a few things, but nothing worked.
Maybe the procedure should be easier to find, or more explicit.

@josselin-c
Author

ping @gambol99 @justinsb

@chrislovecnm
Contributor

/cc @bradgeesaman

What is the best practice security setup?

@bgeesaman

bgeesaman commented Nov 20, 2017

@josselin-c What are the full command-line options that your kubelet is running with? --anonymous-auth=false is only part of the solution; you also need to ensure --client-ca-file=/path/ca.pem and --authorization-mode=Webhook are set. Are you missing other TLS-related items?

For reference, https://github.com/bgeesaman/kubernetes-the-hard-way/blob/0aaf79ec93356f3afee534d67e17acca273c5d25/docs/09-bootstrapping-kubernetes-workers.md is a working kubelet command set from a prior release of Kubernetes the Hard Way.

@bgeesaman

bgeesaman commented Nov 20, 2017

--anonymous-auth=false only tells the kubelet not to accept anonymous API calls. So, basically, no calls to the kubelet will work in your cluster at the moment.

The kubelet needs --client-ca-file to obtain the subject from the client token/cert, which it then sends in a webhook call to the API server's SubjectAccessReview API to get an allow/deny answer.
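
Roughly, for an exec request the kubelet's webhook authorizer ends up asking the API server something like the following. This is a hedged sketch only: the API version, subject, and node name are illustrative assumptions, while the verb/resource mapping follows the documented kubelet authorization scheme.

apiVersion: authorization.k8s.io/v1beta1
kind: SubjectAccessReview
spec:
  user: kubelet-api-client              # hypothetical: the subject taken from the verified client cert/token
  groups:
  - system:nodes                        # hypothetical: groups carried by that credential
  resourceAttributes:
    verb: create                        # a POST such as exec/attach maps to create
    resource: nodes
    subresource: proxy                  # exec/attach/portforward requests map to the proxy subresource
    name: ip-172-20-36-15.ec2.internal  # the node's own name (illustrative)

The API server replies with allowed: true or false, and the kubelet enforces that answer.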

@gambol99
Contributor

> What is the best practice security setup?

So you should definitely have anonymousAuth=false; you can get up to a lot of mischief otherwise, assuming you are not blocking the local kubelet API port from your containers.

In kops, if you set --anonymous-auth=false it automatically adds the --client-ca-file flag, and the related options are switched on at the kube-apiserver. Adding authorization-mode=Webhook is a good shout though ... I don't think componentconfig.go currently exposes this ...

@josselin-c
Author

Thanks for the pointers; I have a better understanding of what options I need to set now:

Kubelet: 
  anonymous-auth=false
  authorization-mode=Webhook
  client-ca-file=/var/lib/kubernetes/ca.pem
APIServer: 
  nothing?

It doesn't seem possible to set authorization-mode=Webhook in the kubelet configuration, though. I don't see a matching attribute in the KubeletConfigSpec object (in componentconfig.go).

@gambol99
Contributor

Correct ... authorization-mode=Webhook isn't available for configuration at the moment. Adding that one requires a little more thought, as it would require changes to RBAC ... @justinsb @chrislovecnm ??

@bgeesaman

@josselin-c Can you manually edit that setting for your kubelet on one worker node (SSH in, edit, restart the kubelet) and see if that node still functions correctly (pods schedule, you can kubectl exec and kubectl logs that pod, etc.)? If not, can you post the RBAC logs that show any deny entries related to the kubelet on that node?

Over the next few weeks, I'll be looking into these specifics myself, but this was the process I took when submitting the PR I linked above to Kubernetes the Hard Way to validate the configuration.
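
For example, something along these lines would exercise those paths (a hedged sketch; the pod name is made up and the grep is only a rough way to spot denials):

kubectl run kubelet-auth-test --image=busybox --restart=Never -- sleep 3600
kubectl logs kubelet-auth-test
kubectl exec -ti kubelet-auth-test -- sh
# on the master, look for authorization denials, e.g.:
grep -i forbidden /var/log/kube-apiserver.log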

@josselin-c
Author

Okay, here is what I did:
Edited /etc/sysconfig/kubelet so it looks like this (added --anonymous-auth=false --authorization-mode=Webhook --client-ca-file=/var/lib/kubernetes/ca.pem):

root@ip-172-20-36-15:/home/admin# cat /etc/sysconfig/kubelet
DAEMON_ARGS="--allow-privileged=true --cgroup-root=/ --cloud-provider=aws --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --hostname-override=ip-172-20-36-15.ec2.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin-mtu=9001 --network-plugin=kubenet --node-labels=kubernetes.io/role=node,node-role.kubernetes.io/node= --non-masquerade-cidr=100.64.0.0/10 --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --require-kubeconfig=true --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/ --network-plugin-dir=/opt/cni/bin/ --anonymous-auth=false --authorization-mode=Webhook --client-ca-file=/var/lib/kubernetes/ca.pem"

Created the /var/lib/kubernetes/ca.pem file from the certificate-authority-data field of the /var/lib/kubelet/kubeconfig file:
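
Something along these lines, assuming the usual kubeconfig layout (a hedged sketch):

grep certificate-authority-data /var/lib/kubelet/kubeconfig | awk '{print $2}' | base64 -d > /var/lib/kubernetes/ca.pem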

Made /etc/sysconfig/kubelet immutable so it isn't replaced at the next boot: chattr +i /etc/sysconfig/kubelet

Then: reboot

After that, the node is marked as Running and I can schedule pods on it, but I cannot exec into them:

$ kubectl exec -ti debian-2251103498-p5sqp bash
error: unable to upgrade connection: Unauthorized

Maybe I set the wrong CA? /var/lib/kubernetes/ca.pem didn't exist before I created it.

@bgeesaman

Ok, I spun up a 1.7.1 cluster with everything as defaults and hand-edited things until they worked. This is not a final solution per se, but rather a map of how to get to a potentially working destination.

NOTE: Do not deploy these changes to a cluster you care about. I enable RBAC here, and I guarantee that I'm breaking other services via missed RBAC policies.

On the workers, go into /var/lib/kubelet.

  1. Extract the ca.pem, kubelet.cert, and kubelet.key files out of kubeconfig into /var/lib/kubelet (copy each section and base64-decode it to a file; see the sketch after this list)
  2. chmod 640 kubelet.*
  3. Add --authorization-mode=Webhook --anonymous-auth=false --client-ca-file=/var/lib/kubelet/ca.pem --tls-cert-file=/var/lib/kubelet/kubelet.cert --tls-private-key-file=/var/lib/kubelet/kubelet.key to /etc/sysconfig/kubelet
  4. Restart the kubelet: systemctl daemon-reload && systemctl restart kubelet
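
A hedged sketch of step 1, assuming the standard certificate-authority-data / client-certificate-data / client-key-data fields in the kubeconfig:

cd /var/lib/kubelet
grep certificate-authority-data kubeconfig | awk '{print $2}' | base64 -d > ca.pem
grep client-certificate-data kubeconfig | awk '{print $2}' | base64 -d > kubelet.cert
grep client-key-data kubeconfig | awk '{print $2}' | base64 -d > kubelet.key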

On the master, edit /etc/kubernetes/manifests/kube-apiserver.manifest:

  1. Change authorization-mode from AlwaysAllow to RBAC
    --authorization-mode=RBAC
  2. Add the following options (a quick check is sketched after this list):
    --kubelet-client-certificate=/srv/kubernetes/kubelet.cert --kubelet-client-key=/srv/kubernetes/kubelet.key --audit-log-path=-
    (The audit-log-path option is optional but very useful for debugging RBAC; the audit entries will show up in /var/log/kube-apiserver.log. Not good for disk space in production, FYI.)
  3. Copy kubelet.key and kubelet.cert from a worker into /var/lib/kubernetes on this master and chmod 640 kubelet.*
  4. Restart the kubelet: systemctl daemon-reload && systemctl restart kubelet
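
A quick, hedged check that the restarted kube-apiserver picked up the new flags (the log path is the one mentioned in step 2):

ps aux | grep '[k]ube-apiserver'        # should now show --authorization-mode=RBAC plus the kubelet client cert/key flags
tail -f /var/log/kube-apiserver.log     # with --audit-log-path=-, audit entries (and RBAC denials) end up here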

Run $ kubectl edit clusterrolebinding system:node
and edit it to look like this:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes

Run $ kubectl edit clusterroles system:node
and add:

- apiGroups:
  - ""
  resources:
  - nodes/proxy
  verbs:
  - create
  - get

You should now be able to perform exec and log actions again.
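
A quick, hedged sanity check of the RBAC change (the impersonated user name is hypothetical; what matters is the system:nodes group):

kubectl auth can-i create nodes/proxy --as=kubelet --as-group=system:nodes
# should answer "yes" once the system:node binding and the nodes/proxy rule above are in place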

So, I know that isn't an "add this option to kops in a yaml and go" solution, but it does outline some of the work needed to make this function as intended.

@bgeesaman

Issue #1231 is what is causing the additional RBAC items to be necessary (instead of following the naming convention that gets the permissions automatically applied)

@chrislovecnm
Contributor

/area security

@chrislovecnm
Contributor

@josselin-c this issue should be covered by getting #1231 working in kops. Agreed? If so, can we close this as a duplicate?

@justinsb added this to the 1.8.1 milestone Dec 1, 2017
@josselin-c
Author

josselin-c commented Dec 1, 2017

Probably.
For info, here is what I'm using on my cluster with flannel (no network policies) to "fix" the issue:
EDIT: This solution doesn't work; see bgeesaman's comments below.

kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  namespace: kube-system
  name: blackhole-kubelet
  labels:
    app: blackhole-kubelet
spec:
  template:
    metadata:
      labels:
        app: blackhole-kubelet
    spec:
      hostPID: true
      containers:
        - name: blackhole-kubelet
          image: gcr.io/google-containers/startup-script:v1
          securityContext:
            privileged: true
          env:
          - name: STARTUP_SCRIPT
            value: |
              #! /bin/bash
              while true; do
                iptables-save | grep INPUT | grep -q "KUBELET-BLACKHOLE"
                if [ $? -ne 0 ]; then
                   echo "Missing kubelet Blackhole rule, adding it"
                   iptables -I INPUT -s 100.64.0.0/10 -m comment --comment "KUBELET-BLACKHOLE: block kubelet access from pods" -m tcp  -p tcp --dport 10250 -j REJECT --reject-with icmp-port-unreachable
                fi
                sleep 60
              done

@bgeesaman

@josselin-c Clever! Are you using this successfully in a cluster running calico/weave/other?

@josselin-c
Author

On clusters with calico/weave I'd look into NetworkPolicies; I had to use a DaemonSet because flannel doesn't support them.

@bgeesaman

Ah, that makes sense. IME, Calico monitors “foreign” iptables rules and removes them automatically. In cases where networkpolicy doesn’t yet support egress filtering, rules like these can be a useful stopgap if they can be made to “stick”.

@bgeesaman

bgeesaman commented Dec 1, 2017

I take that back. Kops 1.7.1 (k8s 1.7.10) with calico does not modify the INPUT chain in this case, so this policy stays in place and works to block pods hitting the local node's kubelet port. However, it does NOT prevent pods on one worker node from crossing over and hitting other worker nodes' and, more importantly, the master node's kubelet port. This is because, by that point, the traffic has been NAT-ed out via the eth0 IP.

This is the shortened output of a simple shell script that I run inside a pod to see what it can see/do:

# ./audit.sh 
...snip...
 7 - Access Kubelet on local host (https://172.20.50.151:10250/runningpods/): False
 8 - Access Kubelet on another worker host (https://172.20.53.168:10250/runningpods/): True
 9 - Access Kubelet on master host (https://172.20.57.132:10250/runningpods/): True
...snip...

Notice how #7 is blocked but #8 and #9 still succeed.
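
Each check boils down to an unauthenticated request against the kubelet API on port 10250; a hedged sketch of the probe, using the master IP from the output above:

# Lists the node's running pods when anonymous kubelet access is possible;
# returns Unauthorized or fails to connect when it is locked down or blocked.
curl -sk https://172.20.57.132:10250/runningpods/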

This is what the traffic from the audit pod (100.97.190.131/32) on worker (172.20.50.151) going to the master (172.20.57.132) looks like on its way out:

# ssh 172.20.50.151
# tcpdump -ni eth0 port 10250
16:40:38.679005 IP 172.20.50.151.40304 > 172.20.57.132.10250: Flags [S], seq 2002535152, win 29200, options [mss 1460,sackOK,TS val 2247126 ecr 0,nop,wscale 9], length 0
16:40:38.679326 IP 172.20.57.132.10250 > 172.20.50.151.40304: Flags [S.], seq 3126813096, ack 2002535153, win 26847, options [mss 8961,sackOK,TS val 2247315 ecr 2247126,nop,wscale 9], length 0
16:40:38.679365 IP 172.20.50.151.40304 > 172.20.57.132.10250: Flags [.], ack 1, win 58, options [nop,nop,TS val 2247126 ecr 2247315], length 0

A crude but certainly workable stop-gap is to edit the outbound rules on the worker security group from:

| All traffic | All | All | 0.0.0.0/0 |

to be something like:

| Custom TCP Rule | TCP | 10251 - 65535 | 0.0.0.0/0 |  
| Custom TCP Rule | TCP | 0 - 10249     | 0.0.0.0/0 |  
| All UDP         | UDP | 0 - 65535     | 0.0.0.0/0 |  
| All ICMP - IPv4 | All | N/A           | 0.0.0.0/0 |

Now, the run looks like:

...snip...
 7 - Access Kubelet on local host (https://172.20.50.151:10250/runningpods/): False
 8 - Access Kubelet on another worker host (https://172.20.53.168:10250/runningpods/): False
 9 - Access Kubelet on master host (https://172.20.57.132:10250/runningpods/): False
...snip...

Of course, these workarounds aren't needed with 1.8.x+ and CNI plugins (like calico) that support egress NetworkPolicy, where egress policies on namespaces can block this pod-to-node management traffic. Even once the kubelet is configured to perform authn/authz via webhook, you still don't want those ports exposed. Defense in depth, etc.
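
For instance, on 1.8+ with a CNI that enforces egress policy, something along these lines should cut pods off from the node network (and therefore port 10250) while leaving other egress open. A hedged sketch only: the CIDR is the networkCIDR from the cluster spec above, and the namespace/selector choices are assumptions.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-to-nodes
  namespace: default              # repeat per namespace as needed
spec:
  podSelector: {}                 # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 172.20.0.0/16           # the node network (networkCIDR) from the cluster spec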

@josselin-c
Author

Thanks for the review; indeed, blocking traffic from the pod CIDR wasn't enough.
Here is another try at fixing the issue while staying with flannel-only networking:
https://gist.github.com/josselin-c/3002e9bac8be27305b579ba6650ad8da

@bgeesaman

Again, very clever!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 22, 2018
@justinsb removed this from the 1.9.0 milestone May 26, 2018
@justinsb added this to the 1.10 milestone May 26, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 25, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
