Nodeup can't find container-selinux-2.68-1.el7.noarch.rpm when trying to bootstrap a new node to a cluster #7608

Closed
igarcia-sugarcrm opened this issue Sep 17, 2019 · 24 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@igarcia-sugarcrm
Contributor

igarcia-sugarcrm commented Sep 17, 2019

1. What kops version are you running? The command kops version, will display
this information.

Version 1.13.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Version 1.13.0

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Adding a node to a cluster causes nodeup to try downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm", which no longer exists since the CentOS 7.7 release.

5. What happened after the commands executed?

kops tries to bootstrap the node, but nodeup fails because it points to a nonexistent package.

6. What did you expect to happen?
New node bootstrapped and joined to the cluster

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

Sep 17 19:59:07  nodeup: I0917 19:59:07.667801    3560 executor.go:103] Tasks: 40 done / 48 total; 1 can run
Sep 17 19:59:07  nodeup: I0917 19:59:07.667844    3560 executor.go:178] Executing task "Package/docker-ce": Package: docker-ce
Sep 17 19:59:07  nodeup: I0917 19:59:07.667883    3560 package.go:206] Listing installed packages: /usr/bin/rpm -q docker-ce --queryformat %{NAME} %{VERSION}
Sep 17 19:59:07 nodeup: I0917 19:59:07.693153    3560 package.go:267] Installing package "docker-ce" (dependencies: [Package: container-selinux])
Sep 17 19:59:07  nodeup: I0917 19:59:07.747296    3560 files.go:100] Hash matched for "/var/cache/nodeup/packages/docker-ce": sha1:5369602f88406d4fb9159dc1d3fd44e76fb4cab8
Sep 17 19:59:07 nodeup: I0917 19:59:07.747368    3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07  nodeup: I0917 19:59:07.747458    3560 http.go:77] Downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm"
Sep 17 19:59:07  nodeup: I0917 19:59:07.891339    3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07  nodeup: W0917 19:59:07.891385    3560 executor.go:130] error running task "Package/docker-ce" (2m20s remaining to succeed): downloaded from "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm"
 but hash did not match expected "sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0"
@igarcia-sugarcrm igarcia-sugarcrm changed the title Nodeup can't find container-selinux-2.68-1.el7.noarch.rpm when trying to bootstrap and add a new node to a cluster Nodeup can't find container-selinux-2.68-1.el7.noarch.rpm when trying to bootstrap a new node to a cluster Sep 17, 2019
@elisiano
Contributor

I'm seeing this as well

@eytan-avisror

eytan-avisror commented Sep 17, 2019

We are seeing this issue as well.

Looks like this package was removed from the CentOS repo, which now returns a 404:

wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
--2019-09-17 15:10:16--  http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
Resolving mirror.centos.org (mirror.centos.org)... 23.254.0.226
Connecting to mirror.centos.org (mirror.centos.org)|23.254.0.226|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-09-17 15:10:17 ERROR 404: Not Found.

This causes a major issue when considering autoscaling (cluster-autoscaler) which takes down nodes and new ones never join the cluster.

Ideally, for resiliency, kops should not resolve artifacts required for nodeup/bootstrapping from public repos at node runtime. Not sure if this is the way to go, but possibly consider placing such critical rpms/binaries in the state store during init and fetching them from there at runtime?
Also, if the package is already installed (some may choose to bake it into their AMI), nodeup should skip fetching it (not sure if this is already the current behavior).

rifelpet added a commit to rifelpet/kops that referenced this issue Sep 17, 2019
I noticed that the recent container-selinux issue on centos was reporting a hash mismatch rather than a 404.

See the error message here: kubernetes#7608 and the "actual" sha1 response is that of the 404 page:

```
curl http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
curl http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm | shasum -a 1
```
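The difference between a 404 and a genuine hash mismatch can be checked by hand. Below is a minimal sketch, assuming `curl` and `sha1sum` are available on the node; `verify_sha1` is a hypothetical helper name for illustration, not part of nodeup.

```shell
#!/bin/sh
# verify_sha1 FILE EXPECTED_HEX
# Hypothetical helper: returns 0 only when the SHA-1 of FILE equals EXPECTED_HEX.
verify_sha1() {
    actual=$(sha1sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Usage (not run here). With --fail, curl exits non-zero on a 404 instead of
# saving the error page, so a missing file is never mistaken for a bad hash:
#   curl --fail -sS -o /tmp/container-selinux.rpm "$URL" \
#     && verify_sha1 /tmp/container-selinux.rpm d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
```

The expected hash in the usage comment is the one nodeup printed in the log in the issue description.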
@ianlmk

ianlmk commented Sep 18, 2019

Experiencing this in a production cluster as well. Is there any way to fast track this?

Added a PR
#7612

@rdjy

rdjy commented Sep 18, 2019

A manual workaround is to download the following file from a working node
/var/cache/nodeup/packages/container-selinux
and upload it to the same path on the new node.

Some CentOS mirror sites might still have the old RPM file; see https://mirror-status.centos.org/

@gjtempleton
Member

This has just bitten us as well; #7609 should resolve it, however.

mikesplain pushed a commit to mikesplain/kops that referenced this issue Sep 18, 2019
I noticed that the recent container-selinux issue on centos was reporting a hash mismatch rather than a 404.

See the error message here: kubernetes#7608 and the "actual" sha1 response is that of the 404 page:

```
curl http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
curl http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm | shasum -a 1
```
@a8j8i8t8

@rdjy Thanks for the answer, it did the trick for us.

@dojadop

dojadop commented Sep 18, 2019

Now that #7609 is merged how would I be able to leverage this change? Do I have to wait for a new kops release or how is nodeup released?

@mikesplain
Contributor

We're working on getting a 1.13/1.14 cut with these fixes asap.

You'll need to either build and deploy your own version of kops (including protokube and kubeup), apply a workaround as suggested above (you can probably use a hook to automate it: https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md#hooks), or wait for a release, which we're actively working on getting out asap!

@alexinthesky

Hi,

I had no luck using a hook to curl the correct file, as hooks seem to run AFTER nodeup. All I can think of is to build a custom AMI instead of using vanilla Amazon Linux 2.

@hrzbrg

hrzbrg commented Sep 19, 2019

Indeed, hooks won't work. We figured that out at the exact same time as @alexinthesky 😂

Then we switched to the Debian AMI to avoid further damage from dying spot instances.
kope.io/k8s-1.14-debian-stretch-amd64-hvm-ebs-2019-08-16

@CarpathianUA

+1 Seeing the same

@elisiano
Contributor

It's a bit involved, but we found a workaround until a new release is cut (especially for people having this issue in production).
The bottom line is:

  • create a public S3 bucket and place a tar there with what you need (we did this with all of /var/cache/nodeup; it's around 200 MB)
  • copy the current launch configuration of the AutoScalingGroup into a new one (make sure you select the right IP policy based on your topology) and add one line at the beginning:
     curl https://yourBucket/var_cache_nodeup.tgz | tar -C / -xzf -
    
    (adjust the tar extraction path depending on how you created your tar)
  • update the AutoScalingGroup to use the newly created LaunchConfiguration.

This way the cache is in place before nodeup runs.
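For the `tar -C / -xzf -` line above to restore files to /var/cache/nodeup, the archive needs paths relative to the filesystem root. A minimal sketch of one way to build such an archive on a healthy node; `pack_nodeup_cache` is a hypothetical helper name, and the output path in the usage comment is only an example.

```shell
#!/bin/sh
# pack_nodeup_cache ROOT OUT
# Hypothetical helper: archives ROOT/var/cache/nodeup with paths relative to ROOT,
# so extracting with `tar -C / -xzf` puts the files back under /var/cache/nodeup.
pack_nodeup_cache() {
    tar -C "$1" -czf "$2" var/cache/nodeup
}

# On a node that bootstrapped successfully (not run here):
#   pack_nodeup_cache / /tmp/var_cache_nodeup.tgz
# Then upload /tmp/var_cache_nodeup.tgz to the bucket used in the launch configuration.
```

Listing the archive with `tar -tzf` before uploading shows whether entries start with `var/cache/nodeup/`, which is what the extraction command expects.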

@rdjy

rdjy commented Sep 20, 2019

Below is an improved workaround, inspired by previous comments and pull requests. Kops supports arbitrary userdata. The snippet needs to be added to each instance group spec.

spec:
  additionalUserData:
  - content: |
      bootcmd:
        - mkdir -p /var/cache/nodeup/packages
        - curl --proxy http://my.proxy:3128 -o /var/cache/nodeup/packages/container-selinux http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
    name: workaround-container-selinux
    type: text/cloud-config

@dignajar

Hi,
I just faced the same issue when recreating one of the master nodes.

I connected to the node via ssh and downloaded the package from another URL.

curl http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux

@bgopalakrishnan1986

Was able to work around the issue by running the commands below on both masters and nodes:

curl http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux
yum install -y selinux-policy selinux-policy-base selinux-policy-targeted

@vobrien-axway

This workaround no longer works. As of today, http://mirror.centos.org/centos/7.6.1810/ has been deprecated. This also breaks the fix that went into kops 1.13.1: #7609

As a workaround you can use http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm

But really, container-selinux needs to be updated to http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-3.el7.noarch.rpm along with its associated dependencies.
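Since individual mirror URLs keep disappearing, a pre-population script can try several locations in turn and accept only a file whose hash matches. A minimal sketch, assuming `curl` and `sha1sum` are present; `fetch_verified` is a hypothetical helper name, and the URLs and hash in the usage comment are the ones quoted in this thread.

```shell
#!/bin/sh
# fetch_verified DEST EXPECTED_SHA1 URL...
# Hypothetical helper: tries each URL in order and keeps the first download
# whose SHA-1 matches. Returns 1 and removes DEST if no URL yields a match.
fetch_verified() {
    dest=$1
    expected=$2
    shift 2
    for url in "$@"; do
        # --fail: an HTTP error such as 404 gives a non-zero exit, not a saved error page
        if curl --fail -sS -L -o "$dest" "$url" \
            && [ "$(sha1sum "$dest" | cut -d' ' -f1)" = "$expected" ]; then
            return 0
        fi
    done
    rm -f "$dest"
    return 1
}

# Usage (not run here):
#   mkdir -p /var/cache/nodeup/packages
#   fetch_verified /var/cache/nodeup/packages/container-selinux \
#     d9f87f7f4f2e8e611f556d873a17b8c0c580fec0 \
#     http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm \
#     https://mirror.rackspace.com/centos-vault/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
```

Checking the hash before nodeup runs means a stale or replaced file fails early in the workaround script rather than later during the docker-ce package task.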

@justinsb
Member

OK so looks like we'll be doing 1.13.2 this morning. I'd also really prefer to get away from the OS packaging (towards "tar.gz" installation), as it seems to be introducing more problems than it solves.

For 2.68-1 -> 2.107-3: We try not to make potentially breaking changes once we have released the 1.x.0 of kops, but we do so for security fixes etc. So we can look at getting it into 1.14.0 (which hasn't quite released yet). But is it a security fix (in which case we would get it into 1.13.0)?

@justinsb
Member

Here's the changelog; it looks like there's no strict security fix vs feature distinction, so we probably shouldn't introduce the new version in kops 1.13:

* Fri Aug 02 2019 Jindrich Novy <jnovy@redhat.com> - 2:2.107-3
- use 2.107 in RHEL7u7
- add build.sh script

* Thu Jul 11 2019 Lokesh Mandvekar <lsm5@redhat.com> - 2:2.107-2
- Resolves: #1626215

* Mon Jun 24 2019 Lokesh Mandvekar <lsm5@redhat.com> - 2:2.107-1
- bump to v2.107

* Tue Apr 23 2019 Lokesh Mandvekar <lsm5@redhat.com> - 2:2.99-1
- built commit b13d03b

* Tue Apr 02 2019 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.95-2
- rebase

* Thu Feb 28 2019 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.84-2
- rebase

* Tue Jan 08 2019 Frantisek Kluknavsky <fkluknav@redhat.com> - 2.77-1
- backported fixes from upstream

* Mon Nov 12 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.76-1
- Allow containers to use fuse file systems by default
- Allow containers to sendto dgram socket of container runtimes
- Needed to run container runtimes in notify socket unit files.

* Fri Oct 19 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.74-1
- Allow containers to setexec themselves

* Tue Sep 18 2018 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.73-3
- tweak macro for fedora - applies to rhel8 as well

* Mon Sep 17 2018 Frantisek Kluknavsky <fkluknav@redhat.com> - 2:2.73-2
- moved changelog entries:
- Define spc_t as a container_domain, so that container_runtime will transition
to spc_t even when setup with nosuid.
- Allow container_runtimes to setattr on callers fifo_files
- Fix restorecon to not error on missing directory

* Thu Sep 06 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.69-3
- Make sure we pull in the latest selinux-policy

* Wed Jul 25 2018 Dan Walsh <dwalsh@fedoraproject.org> - 2.69-2
- Add map support to container-selinux for RHEL 7.5
- Dontudit attempts to write to kernel_sysctl_t

@nigeldunn

nigeldunn commented Sep 26, 2019

Can the packages be externalised into a yaml/json file that nodeup reads in instead of being compiled into the binary? That would enable people to source the rpm and store it locally (s3, cloud storage, etc).

I've opted to save the rpm in S3 and then add it into kops with this in the instance groups:

spec:
  additionalUserData:
  - content: |
      bootcmd:
        - mkdir -p /var/cache/nodeup/packages
        - aws s3 cp s3://<my-s3-bucket>/container-selinux /var/cache/nodeup/packages/container-selinux
    name: workaround-container-selinux
    type: text/cloud-config

Then you just need to sort out the bucket policy and iam privileges for kops to read from the bucket. This is in an AWS environment obviously, I'm sure there are similar approaches for the other cloud platforms.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 24, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.


@jbdoto

jbdoto commented Mar 1, 2022

Hi Everyone,

Our team encountered this issue yesterday on a Kops 1.14.8 cluster we have, related to this vault.centos.org issue.

We had previously successfully used this fix on an older cluster, but we had an issue using the bootcmd approach detailed in that comment. We ended up using the following approach in an additionalUserData stanza:

  additionalUserData:
    - name: initialize-cache.sh
      type: text/x-shellscript
      content: |
        #!/bin/sh
        ( mkdir -p /var/cache/nodeup/packages && curl -o /var/cache/nodeup/packages/container-selinux https://mirror.rackspace.com/centos-vault/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm )

For full disclosure, our actual fix pointed to our company's internal yum repo, so if you have the ability to do that, it's probably a better solution than relying on a public mirror.

Hope this helps save everyone else some pain!
