Add optional v1alpha4 and self-hosted management cluster tests #2833

randomvariable · 2021-10-06T20:09:07Z

Signed-off-by: Naadir Jeewa jeewan@vmware.com

What type of PR is this?

/kind failing-test
/area testing

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #2788
Part of #2793

Special notes for your reviewer:

Checklist:

squashed commits
includes documentation
adds unit tests
adds or updates e2e tests

Release note:

NONE

randomvariable · 2021-10-06T20:09:54Z

/test ?

k8s-ci-robot · 2021-10-06T20:09:55Z

@randomvariable: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

randomvariable · 2021-10-06T20:10:03Z

/test pull-cluster-api-provider-aws-e2e

randomvariable · 2021-10-06T20:10:34Z

/milestone v1.0.0
/triage accepted
/priority critical-urgent

randomvariable · 2021-10-06T21:19:28Z

/test pull-cluster-api-provider-aws-e2e

randomvariable · 2021-10-07T12:06:36Z

/test pull-cluster-api-provider-aws-e2e

randomvariable · 2021-10-07T12:07:42Z

/test pull-cluster-api-provider-aws-e2e

randomvariable · 2021-10-07T12:08:02Z

/test pull-cluster-api-provider-aws-e2e

sedefsavas · 2021-10-08T06:52:25Z

I am able to replicate locally. After the upgrade, all deployment machines come up but MachineDeployment is not being reconciled hence never detects scaling up is successful.

randomvariable · 2021-10-08T06:58:05Z

I am able to replicate locally. After the upgrade, all deployment machines come up but MachineDeployment is not being reconciled hence never detects scaling up is successful.

Is that a separate issue to the one that keeps showing up for all the ones using a remote management cluster?

"reason": "ProgressDeadlineExceeded",
[1] "message": "ReplicaSet "capa-controller-manager-7bd56cd58" has timed out progressing."

sedefsavas · 2021-10-08T07:07:11Z

They are the same, that deadline is exceeded because CAPI controllers don't reconcile anything.
KCP show only one error:
KubeadmControlPlane" "cluster"="clusterctl-upgrade-vexwe7" "name"="clusterctl-upgrade-vexwe7-control-plane" "namespace"="clusterctl-upgrade" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane" E1008 06:28:50.096627 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.22.2/tools/cache/reflector.go:167: Failed to watch *v1.PartialObjectMetadata: unknown

And CAPI logs do not show any error but also it doesn't reconcile when I make changes to MachineDeployment.

I will check with Stefan/Fabrizio when they are online.

[UPDATE]: Failures I saw was probably due to using an unsupported version of K8s.

randomvariable · 2021-10-08T07:16:55Z

If you trace the lines back, then it's waiting for the deployment of CAPA controllers to succeed with the appv1.Deployment of the controller coming up. Following the trace to

[1]     /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.0.0/framework/deployment_helpers.go:72
[1] 
[1]     Full Stack Trace
[1]     sigs.k8s.io/cluster-api/test/framework.WaitForDeploymentsAvailable(0x3096840, 0xc000054140, 0x7fe5f0056d40, 0xc0006e40e0, 0xc0003b1400, 0xc001215d80, 0x2, 0x2)
[1]     	/home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.0.0/framework/deployment_helpers.go:72 +0x2cf
[1]     sigs.k8s.io/cluster-api/test/framework/clusterctl.UpgradeManagementClusterAndWait(0x3096840, 0xc000054140, 0x30b47a0, 0xc0011220c0, 0xc000e43800, 0x31, 0x2c27eda, 0x7, 0xc000658ec0, 0x32, ...)
[1]     	/home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.0.0/framework/clusterctl/clusterctl_helpers.go:155 +0x8ab
[1]     sigs.k8s.io/cluster-api/test/e2e.ClusterctlUpgradeSpec.func2()

gets https://github.com/kubernetes-sigs/cluster-api/blob/main/test/framework/clusterctl/clusterctl_helpers.go#L153-L166
and https://github.com/kubernetes-sigs/cluster-api/blob/main/test/framework/deployment_helpers.go#L52-L73

randomvariable · 2021-10-08T07:17:15Z

Just ran in us-west-2, and it's working fine locally. :(

randomvariable · 2021-10-08T07:24:37Z

One thing to try might be to see if the e2e image upload is actually working. Will need to to look at the logs mid run, but there will be a log line along the lines of:

[1] STEP: Uploading the e2e image to s3:///capi-images.us-west-2.921092657406.oci-images/sha256:545ee55243f44f063048ea53fc99e13fc487bcd88c9b794fcf0d3e72d2b36fde
[1] STEP: Making the image public
[1] STEP: Searching for AMI: name=capa-ami-ubuntu-18.04-1.21.2*

at that point , run

aws s3 cp  s3://capi-images.us-west-2.921092657406.oci-images/sha256:545ee55243f44f063048ea53fc99e13fc487bcd88c9b794fcf0d3e72d2b36fde local.tar

Note that you'll need to change s3:/// to s3:// due to a typo in my log line.

sedefsavas · 2021-10-08T07:33:53Z

Thanks, will check out that.

/test pull-cluster-api-provider-aws-e2e

randomvariable · 2021-10-08T07:52:12Z

Yeah, the bucket is empty during the test run. I think something is deleting the S3 image. Maybe it's a lambda or some security thing deleting any object made publicly readable.

Will probably need to create an S3 bucket policy that allows members of the AWS account to read, but not set publicly readable. Might have to check if this is what is happening with k8s infra / sig testing folk.

sedefsavas · 2021-10-08T07:53:24Z

It is now in mid run and looks like the image is missing:
fatal error: An error occurred (404) when calling the HeadObject operation: Key "sha256:2395e936830e46b0138729b29c253f97404d382edecf4e098d268f0c7694b144" does not exist

randomvariable · 2021-10-08T07:55:15Z

I think I'm confident to say that the upgrades are working based on local testing, and would suggest reverting the timeout changes and the skip, and then changing Describe to PDescribe for the v1alpha3 upgrade test (rather than comment out), merge in and do a release.

Thoughts @richardcase ?. PS EKS upgrade test currently isn't possible because of #2828 dependencies.

sedefsavas · 2021-10-08T08:25:46Z

/test pull-cluster-api-provider-aws-e2e
Last one time before making the release.

richardcase · 2021-10-08T12:39:30Z

Yay @sedefsavas the e2e passed.

sedefsavas · 2021-10-08T12:45:40Z

/lgtm
/approve

k8s-ci-robot · 2021-10-08T12:45:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sedefsavas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sedefsavas]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

richardcase · 2021-10-08T12:46:13Z

I think I'm confident to say that the upgrades are working based on local testing, and would suggest reverting the timeout changes and the skip, and then changing Describe to PDescribe for the v1alpha3 upgrade test (rather than comment out), merge in and do a release.

Sounds good to me, especially as the e2e is passing. The timeout changes remain changed but all good:

/lgtm

sedefsavas · 2021-10-08T12:51:25Z

@richardcase These tests were already passing. The issue is with the upgrade test which I disabled in this PR.
I also confirm that it passes locally. It is not passing here because of the reason randomvariable found: the e2e image pushed to s3 is getting deleted in the prow.

sedefsavas · 2021-10-08T13:29:15Z

/hold cancel

k8s-ci-robot requested review from sedefsavas and shivi28 October 6, 2021 20:09

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 6, 2021

k8s-ci-robot added this to the v1.0.0 milestone Oct 6, 2021

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 6, 2021

randomvariable mentioned this pull request Oct 7, 2021

When leader election is disabled controllers fail to start conversion webhooks when upgrading from v1alpha3 to v1beta1 #2834

Closed

randomvariable force-pushed the e2e-upgrade-test branch from 3cc609c to fc08e02 Compare October 7, 2021 12:07

randomvariable force-pushed the e2e-upgrade-test branch from fc08e02 to 795baa8 Compare October 7, 2021 12:58

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 7, 2021

randomvariable changed the title ~~Enable v1alpha3 upgrade e2e test and add optional v1alpha4 and self-hosted management cluster tests~~ Add optional v1alpha4 and self-hosted management cluster tests Oct 8, 2021

sedefsavas added 2 commits October 8, 2021 01:05

Minor log fix

f3aa975

Disable upgrade from v1alpha3 to v1beta1 test

cc00bcc

sedefsavas force-pushed the e2e-upgrade-test branch from 8131c39 to cc00bcc Compare October 8, 2021 08:09

k8s-ci-robot assigned sedefsavas Oct 8, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2021

k8s-ci-robot assigned richardcase Oct 8, 2021

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 8, 2021

k8s-ci-robot merged commit 82f3c98 into kubernetes-sigs:main Oct 8, 2021

k8s-ci-robot modified the milestones: v1.0.0, v1.x Oct 8, 2021

sedefsavas mentioned this pull request Nov 15, 2021

Enable v1alpha3/v1alpha4 to v1beta1 upgrade test #2950

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add optional v1alpha4 and self-hosted management cluster tests #2833

Add optional v1alpha4 and self-hosted management cluster tests #2833

randomvariable commented Oct 6, 2021 •

edited

randomvariable commented Oct 6, 2021

k8s-ci-robot commented Oct 6, 2021

randomvariable commented Oct 6, 2021

randomvariable commented Oct 6, 2021

randomvariable commented Oct 6, 2021

randomvariable commented Oct 7, 2021

randomvariable commented Oct 7, 2021

randomvariable commented Oct 7, 2021

sedefsavas commented Oct 8, 2021

randomvariable commented Oct 8, 2021 •

edited

sedefsavas commented Oct 8, 2021 •

edited

randomvariable commented Oct 8, 2021 •

edited

randomvariable commented Oct 8, 2021

randomvariable commented Oct 8, 2021

sedefsavas commented Oct 8, 2021

randomvariable commented Oct 8, 2021

sedefsavas commented Oct 8, 2021 •

edited

randomvariable commented Oct 8, 2021

sedefsavas commented Oct 8, 2021

richardcase commented Oct 8, 2021

sedefsavas commented Oct 8, 2021

k8s-ci-robot commented Oct 8, 2021

richardcase commented Oct 8, 2021

sedefsavas commented Oct 8, 2021 •

edited

sedefsavas commented Oct 8, 2021

Add optional v1alpha4 and self-hosted management cluster tests #2833

Add optional v1alpha4 and self-hosted management cluster tests #2833

Conversation

randomvariable commented Oct 6, 2021 • edited

randomvariable commented Oct 6, 2021

k8s-ci-robot commented Oct 6, 2021

randomvariable commented Oct 6, 2021

randomvariable commented Oct 6, 2021

randomvariable commented Oct 6, 2021

randomvariable commented Oct 7, 2021

randomvariable commented Oct 7, 2021

randomvariable commented Oct 7, 2021

sedefsavas commented Oct 8, 2021

randomvariable commented Oct 8, 2021 • edited

sedefsavas commented Oct 8, 2021 • edited

randomvariable commented Oct 8, 2021 • edited

randomvariable commented Oct 8, 2021

randomvariable commented Oct 8, 2021

sedefsavas commented Oct 8, 2021

randomvariable commented Oct 8, 2021

sedefsavas commented Oct 8, 2021 • edited

randomvariable commented Oct 8, 2021

sedefsavas commented Oct 8, 2021

richardcase commented Oct 8, 2021

sedefsavas commented Oct 8, 2021

k8s-ci-robot commented Oct 8, 2021

richardcase commented Oct 8, 2021

sedefsavas commented Oct 8, 2021 • edited

sedefsavas commented Oct 8, 2021

randomvariable commented Oct 6, 2021 •

edited

randomvariable commented Oct 8, 2021 •

edited

sedefsavas commented Oct 8, 2021 •

edited

randomvariable commented Oct 8, 2021 •

edited

sedefsavas commented Oct 8, 2021 •

edited

sedefsavas commented Oct 8, 2021 •

edited