
Conversation

@sbueringer (Member)

What this PR does / why we need it:
See linked issue

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Partially implements #4966

@k8s-ci-robot k8s-ci-robot added this to the v0.3 milestone Jul 23, 2021
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 23, 2021
@k8s-ci-robot k8s-ci-robot requested review from benmoss and justinsb July 23, 2021 20:05
@sbueringer sbueringer force-pushed the pr-kcp-block-mgmt-cluster-updates branch from 3d081f6 to 14ec45d Compare July 26, 2021 15:04
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 26, 2021
@sbueringer sbueringer force-pushed the pr-kcp-block-mgmt-cluster-updates branch from 14ec45d to 0ea76df Compare July 26, 2021 15:09
@sbueringer (Member Author)

@vincepri @fabriziopandini PTAL :)

@sbueringer (Member Author)

/test pull-cluster-api-test-release-0-3
(flake afaik)

@sbueringer (Member Author)

/retest
(I assume it's a flake, as I tested the PR locally)

@sbueringer (Member Author)

Okay nope, moving it up one layer broke the tests. I'll have to investigate.

@sbueringer sbueringer changed the title ✨ v0.3.x: KCP: block mgmt cluster updates to v1.22.0 [WIP] ✨ v0.3.x: KCP: block mgmt cluster updates to v1.22.0 Jul 26, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 26, 2021
@sbueringer (Member Author) commented Jul 26, 2021

My local env on the old branch with kubebuilder does not really work (even with `make test`; I assume the problem is that macOS security is blocking a lot of stuff), so I'll probably need a few tries to get this fixed...

@sbueringer (Member Author)

/test pull-cluster-api-e2e-full-release-0-3

@sbueringer sbueringer changed the title [WIP] ✨ v0.3.x: KCP: block mgmt cluster updates to v1.22.0 ✨ v0.3.x: KCP: block mgmt cluster updates to v1.22.0 Jul 26, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 26, 2021
@sbueringer (Member Author) commented Jul 26, 2021

@fabriziopandini Should be ready for review now. I moved the func into the WorkloadCluster interface, as it's just a lot easier to test that way. I'm not sure what I would have to refactor to get the regular reconcile unit tests working again if I added it in one of the reconcile funcs.

As I'm not that familiar with the release-0.3 e2e tests: is the test coverage good enough, or do you think I should test something manually?

/test pull-cluster-api-e2e-full-release-0-3

@fabriziopandini (Member) left a comment

/lgtm

E2E could provide a signal that the change does not introduce regressions, but it would be great if you could do a manual test with 1.20, then upgrade to 1.21 (everything should work), then upgrade to 1.22 (it should block).

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 28, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 28, 2021
@sbueringer (Member Author)

/retest

Looks like flakes to me.

@vincepri Thanks for the review :) Findings should be fixed, PTAL.

@sbueringer (Member Author)

Updated the list of uncached types (although I'm not entirely sure it includes all necessary types). I'll brute-force the flakes with retests after reviews :) (unit tests were successful locally)

@vincepri (Member)

/retest

```go
	return nil, &RemoteClusterConnectionError{Name: clusterKey.String(), Err: err}
}

newClient := util.DelegatingClientFuncWithUncached(
```
Member

@sbueringer Sorry for the confusion, could we actually try to just use the plain `client.New` without the delegating client and retrieve with `PartialObjectMetadata` directly? It should work, considering that it has support for it: https://github.com/kubernetes-sigs/controller-runtime/blob/release-0.5/pkg/client/client.go#L111

Member Author

Oh yeah, will try. At least I'm learning a lot :)

@sbueringer (Member Author) Jul 29, 2021

Works. Tested it locally with all relevant code branches of `IsKubernetesVersionSupported`, and it uses the metadata client as expected.

@sbueringer (Member Author)

/test pull-cluster-api-e2e-full-release-0-3

@sbueringer (Member Author) commented Jul 29, 2021

/test pull-cluster-api-test-release-0.3

@fabriziopandini Tested it locally and it worked. There were a few interesting pitfalls (cgroup-driver and the local dev image), so for reference:

**Prepare KCP image**

```shell
export REGISTRY=gcr.io/k8s-staging-cluster-api
export TAG=dev
export ARCH=amd64
export PULL_POLICY=IfNotPresent
make docker-build
```
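For reference, the env vars above compose into the image ref used in the later steps (a sketch; the `${REGISTRY}/kubeadm-control-plane-controller-${ARCH}:${TAG}` naming is an assumption inferred from the ref used below, not taken from the Makefile):

```shell
# Hypothetical re-composition of the image ref from the exported vars
REGISTRY=gcr.io/k8s-staging-cluster-api
TAG=dev
ARCH=amd64
echo "${REGISTRY}/kubeadm-control-plane-controller-${ARCH}:${TAG}"
# prints gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev
```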

**Setup Management cluster**

```shell
kind create cluster --config /tmp/kind-cluster-with-extramounts.yaml
clusterctl-v0.3.22 init --infrastructure docker
# use KCP dev image
kind load docker-image "gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev"
kind get kubeconfig --name=$(kind get clusters | grep kind) | k8s-ctx-import; kctx kind-$(kind get clusters | grep kind)
kubectl patch deployment -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager --type json -p='[{"op": "replace", "path": "/spec/template/spec/containers/1/image", "value": "gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev"}]'
```

**Install v1.20.7 Workload Cluster**

```shell
clusterctl-v0.3.22 config cluster capi-quickstart --kubernetes-version v1.20.7 --control-plane-machine-count=1 --worker-machine-count=1 --flavor=development > capi-quickstart.yaml
# Edit capi-quickstart.yaml to add `cgroup-driver: cgroupfs` kubeletExtraArgs
kubectl -n default apply -f ./capi-quickstart.yaml
kubectl -n default get secret capi-quickstart-kubeconfig -o json | jq '.data.value' -r | base64 -d > /tmp/kubeconfig
sed -i -e "s/certificate-authority-data:.*/insecure-skip-tls-verify: true/g" /tmp/kubeconfig
sed -i -e "s/server:.*/server: https:\/\/$(docker port capi-quickstart-lb 6443/tcp | sed "s/0.0.0.0/127.0.0.1/")/g" /tmp/kubeconfig
kubectl --kubeconfig=/tmp/kubeconfig apply -f https://docs.projectcalico.org/v3.15/manifests/calico.yaml
```
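The two `sed` substitutions above can be sanity-checked against a stand-in kubeconfig (a sketch with hypothetical values and a fixed host:port instead of the `docker port` lookup):

```shell
# Stand-in kubeconfig fragment (hypothetical values)
cat > /tmp/kubeconfig-demo <<'EOF'
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTg==
    server: https://172.18.0.5:6443
EOF
# Same substitutions as above: drop the CA data, skip TLS verify,
# and point the server at the load balancer's published port
sed -i -e "s/certificate-authority-data:.*/insecure-skip-tls-verify: true/g" /tmp/kubeconfig-demo
sed -i -e "s/server:.*/server: https:\/\/127.0.0.1:40123/g" /tmp/kubeconfig-demo
cat /tmp/kubeconfig-demo
```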

**Pivot workload cluster to self-hosted management cluster**

```shell
clusterctl-v0.3.22 init --infrastructure docker --kubeconfig=/tmp/kubeconfig
# use KCP dev image
kind --name capi-quickstart load docker-image "gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev"
kubectl --kubeconfig=/tmp/kubeconfig patch deployment -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager --type json -p='[{"op": "replace", "path": "/spec/template/spec/containers/1/image", "value": "gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev"}]'
clusterctl-v0.3.22 move --to-kubeconfig=/tmp/kubeconfig
```

**Update to v1.21.2**

```shell
kubectl --kubeconfig=/tmp/kubeconfig patch kcp capi-quickstart-control-plane --type json -p='[{"op": "replace", "path": "/spec/version", "value": "v1.21.2"}]'
# load KCP dev image on the new node
kind --name capi-quickstart load docker-image "gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev"
kubectl --kubeconfig=/tmp/kubeconfig patch md capi-quickstart-md-0 --type json -p='[{"op": "replace", "path": "/spec/template/spec/version", "value": "v1.21.2"}]'
# load KCP dev image on the new node
kind --name capi-quickstart load docker-image "gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev"
```

**Update to v1.22.0**

```shell
kubectl --kubeconfig=/tmp/kubeconfig patch kcp capi-quickstart-control-plane --type json -p='[{"op": "replace", "path": "/spec/version", "value": "v1.22.0"}]'
```

The update to 1.21 worked, and the last step resulted in the following log line in KCP:

```
I0729 13:25:34.603463       1 upgrade.go:59]  "msg"="Kubernetes version \"v1.22.0\" is not supported for management clusters" "cluster-name"="capi-quickstart" "name"="capi-quickstart-control-plane" "namespace"="default"
```
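The gate itself boils down to a minor-version comparison. A shell sketch of the semantics (assuming the cut-off is the v1.22 minor, per the log line above; the actual check is Go code inside KCP, and this helper is purely hypothetical):

```shell
# Hypothetical helper mirroring the version gate, not the real implementation
is_supported_mgmt_version() {
  local minor
  # extract the minor version from a vX.Y.Z string
  minor=$(printf '%s' "$1" | sed -E 's/^v?[0-9]+\.([0-9]+)\..*/\1/')
  [ "${minor}" -lt 22 ]
}
is_supported_mgmt_version v1.21.2 && echo "v1.21.2 supported"
is_supported_mgmt_version v1.22.0 || echo "v1.22.0 blocked"
```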

@k8s-ci-robot (Contributor)

@sbueringer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-release-0-3
  • /test pull-cluster-api-make-release-0-3
  • /test pull-cluster-api-verify-release-0-3
  • /test pull-cluster-api-test-release-0-3
  • /test pull-cluster-api-e2e-release-0-3

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-release-0-3
  • /test pull-cluster-api-e2e-full-release-0-3

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-build-release-0-3
  • pull-cluster-api-make-release-0-3
  • pull-cluster-api-apidiff-release-0-3
  • pull-cluster-api-verify-release-0-3
  • pull-cluster-api-test-release-0-3
  • pull-cluster-api-e2e-release-0-3

In response to this:

> /test pull-cluster-api-test-release-0.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer (Member Author)

/test pull-cluster-api-test-release-0-3

@sbueringer (Member Author)

/test @k8s-ci-robot
pull-cluster-api-test-release-0-3

@k8s-ci-robot (Contributor)

@sbueringer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-release-0-3
  • /test pull-cluster-api-make-release-0-3
  • /test pull-cluster-api-verify-release-0-3
  • /test pull-cluster-api-test-release-0-3
  • /test pull-cluster-api-e2e-release-0-3

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-release-0-3
  • /test pull-cluster-api-e2e-full-release-0-3

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-build-release-0-3
  • pull-cluster-api-make-release-0-3
  • pull-cluster-api-apidiff-release-0-3
  • pull-cluster-api-verify-release-0-3
  • pull-cluster-api-test-release-0-3
  • pull-cluster-api-e2e-release-0-3

In response to this:

> /test @k8s-ci-robot
> pull-cluster-api-test-release-0-3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer (Member Author)

/test pull-cluster-api-test-release-0-3

@vincepri (Member)

@sbueringer Let's squash :)

Signed-off-by: Stefan Büringer buringerst@vmware.com
@sbueringer sbueringer force-pushed the pr-kcp-block-mgmt-cluster-updates branch from 95f8b2c to 8456ae8 Compare July 29, 2021 14:51
@sbueringer (Member Author)

> @sbueringer Let's squash :)

Done :)

@vincepri (Member) left a comment

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2021
@k8s-ci-robot k8s-ci-robot merged commit 1c68fe9 into kubernetes-sigs:release-0.3 Jul 29, 2021
@sbueringer sbueringer deleted the pr-kcp-block-mgmt-cluster-updates branch July 29, 2021 17:24