Kops 1.16 unable to setup docker on recent Amazon Linux 2 #8827

Closed
elisiano opened this issue Apr 1, 2020 · 8 comments

Comments

@elisiano
Contributor

elisiano commented Apr 1, 2020

1. What kops version are you running? The command kops version will display
this information.

$ kops version
Version 1.16.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

running 1.15.7, trying to update to 1.16.8

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Updated the cluster spec, set up the kops env vars, then ran:

$ kops replace -f file.yaml
$ kops update cluster # then again with --yes
$ kops rolling-update cluster # then again with --yes

5. What happened after the commands executed?
When the first master gets killed/respawned it's unable to join the cluster because kops is unable to install the docker-ce package.
Here are the relevant logs from nodeup:

Apr  1 16:27:00 ip-172-21-75-89 nodeup: I0401 16:27:00.100833    2739 executor.go:143] No progress made, sleeping before retrying 1 failed task(s)
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.100963    2739 executor.go:103] Tasks: 84 done / 94 total; 1 can run
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.101012    2739 executor.go:176] Executing task "Package/docker-ce": Package: docker-ce
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.101067    2739 package.go:206] Listing installed packages: /usr/bin/rpm -q docker-ce --queryformat %{NAME} %{VERSION}
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.127189    2739 package.go:267] Installing package "docker-ce" (dependencies: [Package: docker-ce-cli Package: containerd.io Package: container-selinux])
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.156133    2739 files.go:100] Hash matched for "/var/cache/nodeup/packages/docker-ce": sha1:0b656dcdbddfc231f871ae78e3f5ac76716b5914
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.177012    2739 files.go:100] Hash matched for "/var/cache/nodeup/packages/docker-ce-cli": sha1:0c51b1339a95bd732ca305f07b7bcc95f132b9c8
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.207794    2739 files.go:100] Hash matched for "/var/cache/nodeup/packages/containerd.io": sha1:f6447e84479df3a58ce04a3da87ccc384663493b
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.207932    2739 files.go:100] Hash matched for "/var/cache/nodeup/packages/container-selinux": sha1:7de4211fa0dfd240d8827b93763e1eb5f0d56411
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.207948    2739 package.go:304] running command [/usr/bin/rpm -i /var/cache/nodeup/packages/docker-ce /var/cache/nodeup/packages/docker-ce-cli /var/cache/nodeup/packages/containerd.io /var/cache/nodeup/packages/container-selinux]
Apr  1 16:27:10 ip-172-21-75-89 nodeup: W0401 16:27:10.258129    2739 executor.go:128] error running task "Package/docker-ce" (2m9s remaining to succeed): error installing package "docker-ce": exit status 4: warning: /var/cache/nodeup/packages/docker-ce: Header V4 RSA/SHA512 Signature, key ID 621e9f35: NOKEY
Apr  1 16:27:10 ip-172-21-75-89 nodeup: warning: /var/cache/nodeup/packages/container-selinux: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Apr  1 16:27:10 ip-172-21-75-89 nodeup: error: Failed dependencies:
Apr  1 16:27:10 ip-172-21-75-89 nodeup: selinux-policy >= 3.13.1-216.el7 is needed by container-selinux-2:2.107-1.el7_6.noarch
Apr  1 16:27:10 ip-172-21-75-89 nodeup: selinux-policy-base >= 3.13.1-216.el7 is needed by container-selinux-2:2.107-1.el7_6.noarch
Apr  1 16:27:10 ip-172-21-75-89 nodeup: selinux-policy-targeted >= 3.13.1-216.el7 is needed by container-selinux-2:2.107-1.el7_6.noarch
Apr  1 16:27:10 ip-172-21-75-89 nodeup: I0401 16:27:10.258155    2739 executor.go:143] No progress made, sleeping before retrying 1 failed task(s)
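
To pull just the unmet requirements out of a longer nodeup log, a small filter along these lines may help. This is only a sketch: `failed_deps` is a made-up helper name (not part of kops or nodeup), and it assumes the log lines have the same "is needed by" shape as the excerpt above.

```shell
# failed_deps: hypothetical helper, not part of kops or nodeup.
# Reads nodeup log text on stdin and prints each unmet RPM requirement
# (the text between "nodeup: " and " is needed by"), one per line,
# with duplicates removed.
failed_deps() {
  sed -n 's/^.*nodeup: \(.*\) is needed by .*$/\1/p' | sort -u
}

# Example use on a node (assumes nodeup output lands in /var/log/messages,
# as the syslog-style prefix in the excerpt suggests):
#   grep nodeup /var/log/messages | failed_deps
```

For the log above this would list the three selinux-policy requirements that container-selinux asks for, which is the part of the failure that matters.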

6. What did you expect to happen?
For kops to be able to install docker.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

$ kops get -o yaml | sed -e 's/'${KOPS_CLUSTER_NAME}'/<REDACTED>/g' -e 's/ami-.*/ami-<REDACTED>/g' -e 's/\(oidc.*\):.*/\1: <REDACTED>/'
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  generation: 4
  name: <REDACTED>
spec:
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": ["sts:AssumeRole"],
          "Resource": ["*"]
        }
      ]
    node: |
      [
        {
          "Effect": "Allow",
          "Action": ["sts:AssumeRole"],
          "Resource": ["*"]
        }
      ]
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://<REDACTED>-kops-state-store/<REDACTED>
  encryptionConfig: false
  etcdClusters:
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-west-1c
      name: c
    name: main
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-west-1c
      name: c
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    admissionControl:
    - PodSecurityPolicy
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - ResourceQuota
    - PersistentVolumeLabel
    - NodeRestriction
    - Priority
    oidcClientID: <REDACTED>
    oidcIssuerURL: https: <REDACTED>
    oidcUsernameClaim: <REDACTED>
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    featureGates:
      TTLAfterFinished: "true"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.16.8
  masterInternalName: api.internal.<REDACTED>
  masterPublicName: api.<REDACTED>
  networkCIDR: 172.21.0.0/16
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.21.32.0/19
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 172.21.64.0/19
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 172.21.96.0/19
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 172.21.0.0/22
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 172.21.4.0/22
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 172.21.8.0/22
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c
  topology:
    bastion:
      bastionPublicName: bastion.<REDACTED>
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-04-01T15:57:28Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: <REDACTED>
  name: bastions
spec:
  image: ami-<REDACTED>
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
  - utility-eu-west-1a
  - utility-eu-west-1b
  - utility-eu-west-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-04-01T15:57:29Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: <REDACTED>
  name: master-eu-west-1a
spec:
  image: ami-<REDACTED>
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-04-01T15:57:29Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: <REDACTED>
  name: master-eu-west-1b
spec:
  image: ami-<REDACTED>
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1b
  role: Master
  subnets:
  - eu-west-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-04-01T15:57:30Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: <REDACTED>
  name: master-eu-west-1c
spec:
  image: ami-<REDACTED>
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1c
  role: Master
  subnets:
  - eu-west-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-04-01T15:57:31Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: <REDACTED>
  name: nodes
spec:
  image: ami-<REDACTED>
  machineType: m5.xlarge
  maxSize: 6
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

See logs from point 5

9. Anything else we need to know?
We use custom AMIs generated with Packer. We have been running this setup for more than 2 years, and so far we have never had issues with Amazon Linux being treated like CentOS.

I tried installing RPMs from CentOS 7, but it got messy very quickly and the effort doesn't seem worthwhile.
I've noticed that #8525 might eventually fix this.
What are my options in this scenario?
One option would be to tell kops not to install Docker and to build my AMIs with one of the Docker versions provided by Amazon Linux extras. However, those versions do not line up exactly with the ones listed in nodeup/pkg/model/docker.go: 18.09.9 looks like a match, but the latest version kops supports is 19.03.4, whereas the OS would provide 19.03.6:

# yum search docker --show-duplicates|grep ^docker
docker-17.06.2ce-1.102.amzn2.x86_64 : Automates deployment of containerized
docker-17.12.1ce-2.amzn2.x86_64 : Automates deployment of containerized
docker-18.03.1ce-2.amzn2.x86_64 : Automates deployment of containerized
docker-18.03.1ce-3.amzn2.x86_64 : Automates deployment of containerized
docker-18.03.1ce-5.amzn2.x86_64 : Automates deployment of containerized
docker-18.06.1ce-2.amzn2.x86_64 : Automates deployment of containerized
docker-18.06.1ce-4.amzn2.x86_64 : Automates deployment of containerized
docker-18.06.1ce-5.amzn2.x86_64 : Automates deployment of containerized
docker-18.06.1ce-6.amzn2.x86_64 : Automates deployment of containerized
docker-18.06.1ce-7.amzn2.x86_64 : Automates deployment of containerized
docker-18.06.1ce-8.amzn2.x86_64 : Automates deployment of containerized
docker-18.06.1ce-10.amzn2.x86_64 : Automates deployment of containerized
docker-18.09.9ce-2.amzn2.x86_64 : Automates deployment of containerized
docker-19.03.6ce-1.amzn2.x86_64 : Automates deployment of containerized
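
The comparison above can be scripted roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, the parsing assumes the `yum search` output format shown above, and `SUPPORTED` holds only the two versions named in this report, not the full list from nodeup/pkg/model/docker.go.

```shell
# Hypothetical list: only the versions mentioned in this issue, NOT the full
# set from nodeup/pkg/model/docker.go. Fill it in from the kops source you run.
SUPPORTED="18.09.9 19.03.4"

# available_versions: reads `yum search docker --show-duplicates` output on
# stdin and prints the unique upstream Docker versions (e.g. 18.09.9).
available_versions() {
  sed -n 's/^docker-\([0-9][0-9.]*\)ce-.*/\1/p' | sort -u
}

# matching_versions: prints the available versions that also appear in
# SUPPORTED, one per line.
matching_versions() {
  available_versions | while read -r v; do
    for s in $SUPPORTED; do
      if [ "$v" = "$s" ]; then echo "$v"; fi
    done
  done
}

# Example use on a node:
#   yum search docker --show-duplicates | matching_versions
```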

Is #8803 (comment) still the best course of action?

@Mikulas
Contributor

Mikulas commented Apr 3, 2020

We are also experiencing this issue with Amazon Linux 2. With kops, docker-ce fails to install on these stock AMIs:

  • amazon.com/amzn2-ami-hvm-2.0.20190313-x86_64-gp2
  • amazon.com/amzn2-ami-hvm-2.0.20200304.0-x86_64-gp2

@Mikulas
Contributor

Mikulas commented Apr 3, 2020

I can confirm the bypass mentioned in #8803 (comment) does work with amzn2-ami-hvm-2.0.20200304.0-x86_64-gp2 even with stable kops 1.16.

@hakman
Member

hakman commented Apr 4, 2020

I can also confirm that, until Kops 1.18 is released, #8803 (comment) will be the only way to install Kubernetes on Amazon Linux 2. This is possible only because Kops 1.18 is aware of and manages containerd and not just Docker.

Please test Amazon Linux 2 support in Kops 1.18 alphas and betas when released and report any issues so that they can be addressed.

@elisiano
Contributor Author

elisiano commented Apr 6, 2020

Unfortunately I’m not in a position where I can test alpha releases (only stable). I’m closing this ticket, given there is a known workaround.

@elisiano elisiano closed this as completed Apr 6, 2020
@hakman
Member

hakman commented Apr 6, 2020

For future reference, this is the place where latest Kops is tested daily with Amazon Linux 2:
https://testgrid.k8s.io/sig-cluster-lifecycle-kops#kops-aws-distro-amazonlinux-2

@throrin19

Any news about this? I have the same problem when using the latest Amazon Linux 2 AMI directly.

@olemarkus
Member

As mentioned above, only the current kops alpha is expected to work. As we haven't released 1.17 yet, it will unfortunately take a while for 1.18 to go stable.

@hakman
Member

hakman commented May 26, 2020

For now, the only known workaround is #8803 (comment).
