
Re-add handling of rename from official image "k8s.gcr.io/coredns" to "k8s.gcr.io/coredns/coredns" to support deprecated installations #114978

Closed
AndiDog wants to merge 1 commit

Conversation


@AndiDog AndiDog commented Jan 11, 2023

What type of PR is this?

/kind bug
/kind regression

What this PR does / why we need it:

This handling was accidentally replaced in 50bea1d (PR #109938). By re-adding it, `kubeadm join` will no longer fail with `failed to pull image k8s.gcr.io/coredns:v1.8.6` for clusters that still have `k8s.gcr.io` explicitly configured as the image repository, such as CAPI-managed clusters.

Example failure that I'm facing:

  • The cluster was created via CAPI for an older Kubernetes version. The default value k8s.gcr.io got stored in the cluster (kubectl get configmap -n kube-system kubeadm-config -o yaml | grep imageRepository: # prints the k8s.gcr.io value)
  • The cluster is upgraded by CAPI to v1.23.15, or any other version that switched to registry.k8s.io. This means starting a new cloud server as a node and running kubeadm join on it.
  • kubeadm (buggy version v1.23.15) reads the in-cluster configuration, sees ClusterConfiguration.imageRepository=k8s.gcr.io and then fails with failed to pull image k8s.gcr.io/coredns:v1.8.6, because the renaming logic for official images is gone (see the sketch after this list)
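A minimal sketch of the rename handling this PR re-adds (simplified; function and parameter names are illustrative, not the exact kubeadm symbols):

package main

import "fmt"

// getDNSImage sketches kubeadm's CoreDNS image resolution with the
// re-added rename handling. Both official registries host CoreDNS under
// a "/coredns" subpath; custom repositories are left untouched.
func getDNSImage(imageRepository, coreDNSTag string) string {
	if imageRepository == "registry.k8s.io" || imageRepository == "k8s.gcr.io" {
		imageRepository += "/coredns"
	}
	return fmt.Sprintf("%s/coredns:%s", imageRepository, coreDNSTag)
}

func main() {
	fmt.Println(getDNSImage("k8s.gcr.io", "v1.8.6"))       // k8s.gcr.io/coredns/coredns:v1.8.6
	fmt.Println(getDNSImage("registry.k8s.io", "v1.8.6"))  // registry.k8s.io/coredns/coredns:v1.8.6
	fmt.Println(getDNSImage("private.repo.com", "v1.8.6")) // private.repo.com/coredns:v1.8.6
}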

Which issue(s) this PR fixes:

Did not create an issue

Context:

Special notes for your reviewer:

Maybe @dims, who authored the related migration, wants to have a look 😉

Does this PR introduce a user-facing change?

Fix `kubeadm join` pulls of the CoreDNS image for installations that still use the old registry `k8s.gcr.io`

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


… "k8s.gcr.io/coredns/coredns" to support deprecated installations

This was accidentally replaced in
kubernetes@50bea1d.
By re-adding it, `kubeadm join` will no longer fail with `failed to pull image
k8s.gcr.io/coredns:v1.8.6` for clusters that still have `k8s.gcr.io` explicitly
configured as image repository, such as CAPI-managed clusters.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 11, 2023
@k8s-ci-robot
Contributor

Welcome @AndiDog!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 11, 2023
@k8s-ci-robot
Contributor

@AndiDog: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

Hi @AndiDog. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 11, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AndiDog
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval by writing /assign @neolit123 in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 11, 2023
Member

@neolit123 neolit123 left a comment


Cluster got created via CAPI

capi has specific handling of the migration that they were working on recently. please contact the capi maintainers about your use case.

@@ -48,8 +48,9 @@ func GetDNSImage(cfg *kubeadmapi.ClusterConfiguration) string {
 	if cfg.DNS.ImageRepository != "" {
 		dnsImageRepository = cfg.DNS.ImageRepository
 	}
-	// Handle the renaming of the official image from "registry.k8s.io/coredns" to "registry.k8s.io/coredns/coredns
 	if dnsImageRepository == kubeadmapiv1beta3.DefaultImageRepository {
+	// Handle the renaming of the official image from "registry.k8s.io/coredns" to "registry.k8s.io/coredns/coredns"
Member

@neolit123 neolit123 Jan 11, 2023


this is actually not valid. the official image remains with a /coredns subpath. if the user has a custom repo the subpath is always absent.

Author


I don't understand what's not valid here.

$ docker pull k8s.gcr.io/coredns:v1.8.6
Trying to pull k8s.gcr.io/coredns:v1.8.6...
Error: initializing source docker://k8s.gcr.io/coredns:v1.8.6: reading manifest v1.8.6 in k8s.gcr.io/coredns: manifest unknown: Failed to fetch "v1.8.6" from request "/v2/coredns/manifests/v1.8.6".

$ docker pull k8s.gcr.io/coredns/coredns:v1.8.6
Trying to pull k8s.gcr.io/coredns/coredns:v1.8.6...
Getting image source signatures
Copying blob sha256:88efb86cbcab356a7db0ba9ea09694d394f2e088f3e5119aa331025c1e6cb7fa
[...]

$ docker pull registry.k8s.io/coredns:v1.8.6
Trying to pull registry.k8s.io/coredns:v1.8.6...
Error: initializing source docker://registry.k8s.io/coredns:v1.8.6: reading manifest v1.8.6 in registry.k8s.io/coredns: manifest unknown: Failed to fetch "v1.8.6"

$ docker pull registry.k8s.io/coredns/coredns:v1.8.6
Trying to pull registry.k8s.io/coredns/coredns:v1.8.6...
Getting image source signatures
Copying blob sha256:88efb86cbcab356a7db0ba9ea09694d394f2e088f3e5119aa331025c1e6cb7fa
[...]

So k8s.gcr.io and registry.k8s.io seem to behave the same, and therefore require the same workaround. Before 50bea1d, this exact code location did the rewrite for k8s.gcr.io. Or am I missing something?

Member


this code:
50bea1d#diff-54f02c1435b3faae5f85c0507e8ccb2d3986f080287d9de55473b6587e126fa9R51
is about appending /coredns to registry.k8s.io so that the location ends up as registry.k8s.io/coredns/coredns:foo and is the valid one.
for newer versions of kubeadm we should not track the old registry k8s.gcr.io.
at that point kubeadm considers k8s.gcr.io a "custom" registry and the coredns image location is k8s.gcr.io/coredns:foo
this is by design and was done for backwards compatibility once coredns maintainers introduced the /coredns subpath.
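For example (illustrative tag): with the default repository the CoreDNS image resolves to registry.k8s.io/coredns/coredns:v1.9.3, while a custom repository such as private.repo.com resolves to private.repo.com/coredns:v1.9.3, with no subpath added.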

@neolit123
Member

neolit123 commented Jan 11, 2023

for clusters that still have k8s.gcr.io explicitly configured as image repository, such as CAPI-managed clusters.

this should not be done as it will trip kubeadm upgrade. you should remove the explicit setting and let kubeadm swap to registry.k8s.io. this applies to kubeadm standalone, not sure about capi.

@AndiDog
Author

AndiDog commented Jan 11, 2023

Cluster got created via CAPI

capi has specific handling of the migration that they were working on recently. please contact the capi maintainers about your use case.

Yes, I had dug through all migration code related to registry.k8s.io. The code in CAPI only auto-migrates the CoreDNS image if the official repos (k8s.gcr.io / registry.k8s.io) are used. But since my use case has a custom CoreDNS installation, I don't actually want kubeadm to even pull CoreDNS (see kubernetes/kubeadm#2603 for skipping image pulls). While that bug is still open, we could fix the image pull attempts as suggested in this PR. So my use case isn't CAPI-specific: CAPI only delegates to kubeadm for joining a cluster and won't rewrite my custom CoreDNS image (e.g. private.repo.com/coredns).

for clusters that still have k8s.gcr.io explicitly configured as image repository, such as CAPI-managed clusters.

this should not be done as it will trip kubeadm upgrade. you should remove the explicit setting and let kubeadm swap to registry.k8s.io. this applies to kubeadm standalone, not sure about capi.

That would mean manual intervention on every managed cluster that has the explicit setting. If Kubernetes allows smooth, automatic upgrades of existing clusters from old versions, users will be more likely to upgrade. With a later major version, Kubernetes could then automatically replace any image pulls of k8s.gcr.io with registry.k8s.io instead of allowing the old registry or having to support migration code.

AndiDog added a commit to giantswarm/cluster-aws that referenced this pull request Jan 11, 2023
….15 tries to pull the official image incorrectly

While kubeadm is buggy (kubernetes/kubernetes#114978) and tries to download the CoreDNS image despite us skipping that addon (kubernetes/kubeadm#2603), we try to use the new official repository already. This fixes `kubeadm join` for new Kubernetes versions and therefore avoids stuck node upgrades.
AndiDog added a commit to giantswarm/cluster-aws that referenced this pull request Jan 11, 2023
…s to pull the official image incorrectly

While kubeadm is buggy (kubernetes/kubernetes#114978) and tries to download the CoreDNS image despite us skipping that addon (kubernetes/kubeadm#2603), we try to use the new official repository already. This fixes `kubeadm join` for new Kubernetes versions and therefore avoids stuck node upgrades. CAPI propagates this value before attempting the node upgrade.
@neolit123
Member

neolit123 commented Jan 11, 2023

see kubernetes/kubeadm#2603 for skipping image pulls

that is actually a complicated problem and all solutions had trade-offs, IIRC.

That would mean manual intervention on every managed cluster that has the explicit setting. If Kubernetes allows smooth, automatic upgrades of existing Kubernetes clusters with old versions, users will be more likely to upgrade. Then with a later major version, Kubernetes could automatically replace any image pulls for k8s.gcr.io with registry.k8s.io instead of allowing the old registry or having to support some migration code.

kubeadm (no capi) users had the following problem:

  • they heard about the k8s.gcr.io -> registry.k8s.io migration and pinned k8s.gcr.io in their ClusterConfiguration.imageRepository, to delay the migration
  • that is not going to work, since kubeadm upgrade/join/... (from a binary that has registry.k8s.io as the default) will see k8s.gcr.io and think that this is a "custom" / unknown registry
  • coredns image pulls will fail since k8s.gcr.io/coredns does not exist
  • the fix is for them to remove the k8s.gcr.io pin and let kubeadm migrate from old default -> new default registry

we want users to stop using k8s.gcr.io ASAP which means that the transition had to be immediate and not track the old registry at all. k8s.gcr.io is currently still generating traffic that costs a lot of $$.

@AndiDog
Author

AndiDog commented Jan 11, 2023

Fully agree with all the explanations! I didn't want to immediately suggest a more drastic change as a first-time contributor. Would it then make sense to enforce the migration by replacing k8s.gcr.io/coredns with registry.k8s.io/coredns/coredns? The image pull of k8s.gcr.io/coredns:TAG already fails right now, so we wouldn't make things worse. And the change could be scoped to the CoreDNS image for now in order to avoid any impact on cluster upgrades/joins. A sketch of what I mean follows below.
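A minimal sketch of that suggestion (hypothetical helper; kubeadm contains no such function):

package main

import (
	"fmt"
	"strings"
)

// migrateCoreDNSImage illustrates the suggested forced migration, scoped to
// the CoreDNS image only. Hypothetical code, not part of kubeadm.
func migrateCoreDNSImage(image string) string {
	const old = "k8s.gcr.io/coredns:"
	// Only the known-broken official location is rewritten; custom
	// repositories such as private.repo.com/coredns stay untouched.
	if strings.HasPrefix(image, old) {
		return "registry.k8s.io/coredns/coredns:" + strings.TrimPrefix(image, old)
	}
	return image
}

func main() {
	fmt.Println(migrateCoreDNSImage("k8s.gcr.io/coredns:v1.8.6"))       // registry.k8s.io/coredns/coredns:v1.8.6
	fmt.Println(migrateCoreDNSImage("private.repo.com/coredns:v1.8.6")) // unchanged
}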

@neolit123
Member

this branch now tracks .27. i am generally not in favour of keeping track of the old registry in kubeadm code for .27 or backporting more changes to .26.

of course, the other reviewers might have a different opinion.

AndiDog added a commit to giantswarm/cluster-aws that referenced this pull request Jan 11, 2023
…s to pull the official image incorrectly

While kubeadm is buggy (kubernetes/kubernetes#114978) and tries to download the CoreDNS image despite us skipping that addon (kubernetes/kubeadm#2603), we try to use the new official repository already. This fixes `kubeadm join` for new Kubernetes versions and therefore avoids stuck node upgrades. CAPI propagates this value before attempting the node upgrade.
AndiDog added a commit to giantswarm/cluster-aws that referenced this pull request Jan 11, 2023
…s to pull the official image incorrectly (#197)

While kubeadm is buggy (kubernetes/kubernetes#114978) and tries to download the CoreDNS image despite us skipping that addon (kubernetes/kubeadm#2603), we try to use the new official repository already. This fixes `kubeadm join` for new Kubernetes versions and therefore avoids stuck node upgrades. CAPI propagates this value before attempting the node upgrade.
@pacoxu
Member

pacoxu commented Mar 13, 2023

kubernetes/kubeadm#2714 (comment)

  • If I understand correctly, you may try to use the old image registry by using a custom configuration like the one below:
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  imageRepository: k8s.gcr.io/coredns
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: 1.25.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
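Presumably this configuration would be applied at init time, e.g. kubeadm init --config kubeadm-config.yaml (file name assumed). The dns.imageRepository override makes kubeadm compose the working location k8s.gcr.io/coredns/coredns, since custom DNS repositories get no extra subpath appended.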

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2023
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 15, 2023
@AndiDog AndiDog closed this Jun 15, 2023