Error: failed to list objects for the "infrastructure.cluster.x-k8s.io/v1alpha4, Kind=AWSClusterControllerIdentity" during upgrade to CAPI 1.0 #2955

Closed
wmgroot opened this issue Nov 15, 2021 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@wmgroot

wmgroot commented Nov 15, 2021

/kind bug

What steps did you take and what happened:
I am trying to upgrade a cluster from CAPI 0.3 to CAPI 1.0.

I've successfully updated our test cluster from 0.3 to 0.4 by following these steps:

  1. Download clusterctl 0.4.4
  2. Run clusterctl upgrade plan and clusterctl upgrade apply
  3. Ensure all CRDs are now using alpha4 instead of alpha3
  4. Verify by rolling all nodes in the cluster with an AWSMachineTemplate update (succeeded)
clusterctl-0.4.4 upgrade apply --contract v1alpha4
Checking cert-manager version...
Deleting cert-manager Version="v1.1.0"
Installing cert-manager Version="v1.5.3"
Waiting for cert-manager to be available...
Performing upgrade...
Scaling down Provider="cluster-api" Version="v0.3.22" Namespace="capi-system"
Scaling down Provider="bootstrap-kubeadm" Version="v0.3.22" Namespace="capi-kubeadm-bootstrap-system"
Scaling down Provider="control-plane-kubeadm" Version="v0.3.22" Namespace="capi-kubeadm-control-plane-system"
Scaling down Provider="infrastructure-aws" Version="v0.6.7" Namespace="capa-system"
Deleting Provider="cluster-api" Version="v0.3.22" Namespace="capi-system"
Installing Provider="cluster-api" Version="v0.4.4" TargetNamespace="capi-system"
Deleting Provider="bootstrap-kubeadm" Version="v0.3.22" Namespace="capi-kubeadm-bootstrap-system"
Installing Provider="bootstrap-kubeadm" Version="v0.4.4" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="v0.3.22" Namespace="capi-kubeadm-control-plane-system"
Installing Provider="control-plane-kubeadm" Version="v0.4.4" TargetNamespace="capi-kubeadm-control-plane-system"
Deleting Provider="infrastructure-aws" Version="v0.6.7" Namespace="capa-system"
Installing Provider="infrastructure-aws" Version="v0.7.1" TargetNamespace="capa-system"
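
For reference, the check in step 3 could be sketched roughly as follows. This is a minimal sketch, not the procedure actually used above: it assumes "using alpha4 instead of alpha3" is judged by each CRD's `status.storedVersions`, and `crds_storing_version` is a hypothetical helper name that is not part of kubectl or clusterctl.

```shell
#!/bin/sh
# Sketch: report CRDs whose status.storedVersions still includes a given API version.
# Input (stdin): one CRD per line, "<crd-name> <storedVersion> [<storedVersion>...]".
# crds_storing_version is a hypothetical helper, not a kubectl or clusterctl command.
crds_storing_version() {
  version="$1"
  awk -v want="$version" '{
    # Field 1 is the CRD name; every later field is one stored version.
    for (i = 2; i <= NF; i++) {
      if ($i == want) { print $1; break }
    }
  }'
}

# Possible usage against a management cluster (assumption, not run here):
#   kubectl get crd \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.storedVersions[*]}{"\n"}{end}' \
#     | grep 'cluster.x-k8s.io' | crds_storing_version v1alpha3
```

Any CRD name printed by the usage line would still have v1alpha3 recorded as a stored version. Checking the served versions in `spec.versions` instead would be an equally reasonable reading of step 3.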

From there I performed the same steps using clusterctl 1.0.1, expecting to see beta1 replace alpha4.
However, I hit the following error while upgrading the AWS infrastructure provider.

clusterctl-1.0.1 upgrade apply --contract v1beta1
Checking cert-manager version...
Cert-manager is already up to date
Performing upgrade...
Scaling down Provider="cluster-api" Version="v0.4.4" Namespace="capi-system"
Scaling down Provider="bootstrap-kubeadm" Version="v0.4.4" Namespace="capi-kubeadm-bootstrap-system"
Scaling down Provider="control-plane-kubeadm" Version="v0.4.4" Namespace="capi-kubeadm-control-plane-system"
Scaling down Provider="infrastructure-aws" Version="v0.7.1" Namespace="capa-system"
Deleting Provider="cluster-api" Version="v0.4.4" Namespace="capi-system"
Error: failed to list objects for the "infrastructure.cluster.x-k8s.io/v1alpha4, Kind=AWSClusterControllerIdentity" GroupVersionKind: conversion webhook for infrastructure.cluster.x-k8s.io/v1alpha3, Kind=AWSClusterControllerIdentity failed: Post "https://capa-webhook-service.capa-system.svc:443/convert?timeout=30s": dial tcp 44.145.89.35:443: connect: connection refused

What did you expect to happen:
I expected an upgrade from 0.4.4 to 1.0.1 to apply cleanly.

Anything else you would like to add:
I did not do anything to clean up the old alpha3 CRDs in this cluster before attempting the upgrade to 1.0.

Environment:

  • Cluster-api-provider-aws version: 0.7.1
  • Kubernetes version: (use kubectl version): 1.21.5
  • OS (e.g. from /etc/os-release): ubuntu capi AMI
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority labels Nov 15, 2021
@k8s-ci-robot
Contributor

@wmgroot: This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 15, 2021
@sedefsavas sedefsavas added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Nov 15, 2021
@sedefsavas sedefsavas added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 15, 2021
@sedefsavas sedefsavas added this to the v1.1.0 milestone Nov 15, 2021
@sedefsavas
Contributor

@randomvariable any ideas why this error shows up:
Error: failed to list objects for the "infrastructure.cluster.x-k8s.io/v1alpha4, Kind=AWSClusterControllerIdentity" GroupVersionKind: conversion webhook for infrastructure.cluster.x-k8s.io/v1alpha3, Kind=AWSClusterControllerIdentity failed: Post "https://capa-webhook-service.capa-system.svc:443/convert?timeout=30s": dial tcp 44.145.89.35:443: connect: connection refused

RBAC allows listing AWSClusterControllerIdentity in those versions.

@sedefsavas
Contributor

A fix for this on the clusterctl side will be in the next cluster-api release: kubernetes-sigs/cluster-api#5681

@wmgroot
Author

wmgroot commented Nov 16, 2021

When was the capa-webhook-service introduced?

I can see that it does not exist on our clusters running capi 0.3, provider-aws 0.6.7.

kubectl get svc -n capa-system
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
capa-controller-manager-metrics-service   ClusterIP   44.144.190.132   <none>        8443/TCP   54d

However, it does exist after attempting to run the upgrade from capi 0.4 to 1.0:

kubectl get svc -n capa-system
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
capa-controller-manager-metrics-service   ClusterIP   44.145.17.252   <none>        8443/TCP   23h
capa-webhook-service                      ClusterIP   44.145.89.35    <none>        443/TCP    23h

I believe the error is a direct result of the Pods for this Service not being available: the connection is refused because no Pod exists to serve the request made to capa-webhook-service. It is not clear to me whether these Pods should be created as part of the upgrade process for 0.4 -> 1.0, or whether they should have existed after upgrading from 0.3 -> 0.4.

kubectl get pod -n capa-system
No resources found in capa-system namespace.
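
The same diagnosis could be scripted roughly as follows. This is a minimal sketch under stated assumptions: `webhook_target` and `has_ready_backends` are hypothetical helper names, and the sketch assumes the failing URL in the error has the in-cluster form `https://<service>.<namespace>.svc:<port>/...` as it does above.

```shell
#!/bin/sh
# Sketch: pull "<namespace> <service>" out of a conversion webhook error line.
# webhook_target is a hypothetical helper; it only matches in-cluster URLs of
# the form https://<service>.<namespace>.svc followed by ":" or "/".
webhook_target() {
  sed -n 's#.*https://\([a-z0-9-]*\)\.\([a-z0-9-]*\)\.svc[:/].*#\2 \1#p'
}

# Sketch: succeed only if stdin contains at least one endpoint IP address.
# Intended input: the output of
#   kubectl get endpoints <service> -n <namespace> \
#     -o jsonpath='{.subsets[*].addresses[*].ip}'
# which is expected to be empty when no ready Pod backs the Service.
has_ready_backends() {
  grep -q '[0-9]'
}

# Possible usage (assumption, not run here):
#   kubectl get endpoints capa-webhook-service -n capa-system \
#     -o jsonpath='{.subsets[*].addresses[*].ip}' \
#     | has_ready_backends || echo "capa-webhook-service has no ready backends"
```

With no Pods in capa-system, the usage line would be expected to report that the Service has no ready backends, which matches the "connection refused" in the error.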

@wmgroot
Author

wmgroot commented Nov 16, 2021

Just saw your note, thanks for the update.

@sedefsavas sedefsavas modified the milestones: v1.1.0, v1.2.0 Nov 19, 2021
@sedefsavas
Copy link
Contributor

Closing, as this issue should be fixed by kubernetes-sigs/cluster-api#5684.
The v1alpha3 --> v1beta1 e2e test is now passing.
