
[WIP] Use E2E test framework #69

Conversation

@fabriziopandini (Member)

What this PR does / why we need it:
While there is already something under the hack tests, it would be nice to have a setup similar to what we have in CAPI, so we can easily run and debug E2E tests from IDEs (see the sketch below).
Also, if we later move those tests into CAPI, any provider will be able to quickly set up an autoscaler test.

Which issue this PR fixes:
fixes #67

Special notes for your reviewer:

Release notes:

Start using the CAPI E2E test framework
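
For context, here is a minimal, self-contained sketch of the Ginkgo/Gomega scaffolding that CAPI-style E2E suites build on; the suite name and the spec body are placeholders, not this PR's actual autoscaler test. Because the suite is driven by a plain `go test` entry point, it can be run and debugged from an IDE like any other Go test:

```go
// A minimal, self-contained sketch of the Ginkgo/Gomega scaffolding that
// CAPI-style E2E suites build on. The suite name and the spec body are
// placeholders, not this PR's actual autoscaler test.
package e2e_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// TestE2E is a plain `go test` entry point, which is what makes these suites
// easy to run and debug from an IDE.
func TestE2E(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "E2E Suite") // placeholder suite name
}

var _ = Describe("When scaling workload clusters with the autoscaler [Autoscaler]", func() {
	It("should eventually satisfy pending pods by adding nodes", func() {
		// Placeholder body: the real spec would create a workload cluster,
		// deploy the cluster-autoscaler, scale a test Deployment, and assert
		// that new Machines/Nodes are created.
		Expect(true).To(BeTrue())
	})
})
```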

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 23, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 23, 2022
Review thread on Makefile (outdated, resolved)
@elmiko (Contributor) commented Dec 23, 2022

this is fantastic @fabriziopandini !

i look forward to taking a deeper review in the new year

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 23, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2023
@fabriziopandini (Member, Author)

made some progress, still figuring out why the autoscaler test does not work

@elmiko (Contributor) commented Jan 23, 2023

thanks Fabrizio, i am planning to try out these tests this week

@elmiko (Contributor) left a comment

i think the main issue is just getting the autoscaler configured properly for the capi clusters; the manifests and code looked mostly good to me. i couldn't quite get it working locally, hopefully we can sync up soon.

Review thread on test/e2e/data/autoscaler/autoscaler.yaml (outdated, resolved)
Image: "busybox",
Resources: corev1.ResourceRequirements{
Requests: map[corev1.ResourceName]resource.Quantity{
corev1.ResourceMemory: resource.MustParse("2G"), // TODO: consider if to make this configurable
Inline review comment (Contributor) on the snippet above:

this is probably fine for the default tests, we just want to synchronize the requests for the workload with the capacity for the kubemark nodes. by default they are 4GB ram, 2 core cpu. this deployment should be able to fit a single replica per node.
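
To make that sizing concrete, here is a hedged sketch (an assumed helper, not the PR's exact code) of the scale-up workload container using the same corev1/resource types as the snippet above. With a 2G memory request per replica and kubemark nodes advertising roughly 4GB RAM / 2 CPU cores, only one replica fits per node, so each additional replica should force the autoscaler to add a node:

```go
// Hedged sketch of the scale-up workload container discussed in the review
// comment above; the function and container names are hypothetical.
package e2e

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func scaleUpContainer() corev1.Container {
	return corev1.Container{
		Name:  "workload", // hypothetical name
		Image: "busybox",
		Resources: corev1.ResourceRequirements{
			Requests: map[corev1.ResourceName]resource.Quantity{
				// ~half of a 4GB kubemark node, but together with system overhead
				// large enough that a second replica does not co-schedule on the
				// same node, so extra replicas stay pending until nodes are added.
				corev1.ResourceMemory: resource.MustParse("2G"),
			},
		},
	}
}
```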

@fabriziopandini (Member, Author)

@elmiko when you have some time PTAL
I have changed the code to deploy the autoscaler in the workload cluster, and then did some tricks to get it working with the docker/kind networking.

The autoscaler seems to start, but when I create the test deployment requiring additional capacity it panics...

root@autoscaler-gjn4jz-control-plane-wgdfg:/# k logs --namespace cluster-autoscaler-system   cluster-autoscaler-57b558b4db-lzvtn
I0205 15:28:47.759694       1 leaderelection.go:248] attempting to acquire leader lease kube-system/cluster-autoscaler...
I0205 15:28:47.763337       1 leaderelection.go:258] successfully acquired lease kube-system/cluster-autoscaler
W0205 15:28:47.775334       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0205 15:28:47.788731       1 clusterapi_controller.go:351] Using version "v1beta1" for API group "cluster.x-k8s.io"
I0205 15:28:47.789978       1 clusterapi_controller.go:429] Resource "machinedeployments" available
I0205 15:28:47.891517       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0205 15:28:47.911260       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 19.718917ms
W0205 15:28:57.924666       1 clusterstate.go:428] AcceptableRanges have not been populated yet. Skip checking
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x3468e60]

goroutine 97 [running]:
k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling.(*HintingSimulator).findNode(0x400033fc68, 0x5195560?, {0x5195560, 0x4000c160b0}, 0x4000be4000, 0x57?)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling/hinting_simulator.go:114 +0x140
k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling.(*HintingSimulator).TrySchedulePods(0x380db20?, {0x5195560, 0x4000c160b0}, {0x4000b2dc60, 0x4, 0x34f33f0?}, 0x4000523478?, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling/hinting_simulator.go:70 +0x224
k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor.(*filterOutSchedulablePodListProcessor).filterOutSchedulableByPacking(0x4000c160c0, {0x4000b2dc60, 0x4, 0x4}, {0x5195560, 0x4000c160b0})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor/filter_out_schedulable.go:101 +0xb4
k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor.(*filterOutSchedulablePodListProcessor).Process(0x0?, 0x4000754c00, {0x4000b2dc60?, 0x4, 0x4})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor/filter_out_schedulable.go:66 +0x8c
k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor.(*defaultPodListProcessor).Process(0x4000477230, 0x4000dfd1a0?, {0x4000b2dc60?, 0x2?, 0x2?})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor/pod_list_processor.go:45 +0x6c
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0x4000b31360, {0x4?, 0x0?, 0x7f268a0?})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:477 +0x16a4
main.run(0x4000ab8600?, {0x518d098, 0x4000960c00})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:442 +0x268
main.main.func2({0x0?, 0x0?})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x28
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x114

I'm pretty sure I'm doing something wrong in one of the two YAMLs I'm using to install the autoscaler, but I can't catch it.
"autoscaler-to-workload-management.yaml" is the YAML creating the service account in the management cluster.
"autoscaler-to-workload-workload.yaml" is the YAML deploying the autoscaler in the workload cluster.

@elmiko (Contributor) commented Feb 10, 2023

hey @fabriziopandini , i haven't had a chance to follow up here, hoping to take a look next week.

@elmiko (Contributor) commented Feb 18, 2023

the configurations look ok to me, i'm not quite sure what is happening here.

this line

I0205 15:28:47.789978       1 clusterapi_controller.go:429] Resource "machinedeployments" available

makes me think that the connection to the management cluster is working fine. it found the crd and is using it.

i'm double checking the rbac we use in openshift, there might be some differences.
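
For reference while comparing RBAC, a hedged sketch (not this PR's actual rules, and possibly incomplete) of the kind of permissions the autoscaler's service account in the management cluster needs so the clusterapi provider can read and scale the Cluster API objects; exact groups, resources, and verbs may differ per setup:

```go
// Hedged sketch of management-cluster permissions for the clusterapi provider;
// not this PR's actual RBAC and possibly incomplete.
package e2e

import rbacv1 "k8s.io/api/rbac/v1"

var autoscalerManagementRules = []rbacv1.PolicyRule{
	{
		APIGroups: []string{"cluster.x-k8s.io"},
		Resources: []string{"machinedeployments", "machinesets", "machines"},
		Verbs:     []string{"get", "list", "watch"},
	},
	{
		// Scale subresources are used when the provider resizes a node group.
		APIGroups: []string{"cluster.x-k8s.io"},
		Resources: []string{"machinedeployments/scale", "machinesets/scale"},
		Verbs:     []string{"get", "update"},
	},
}
```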

@fabriziopandini (Member, Author) commented Mar 10, 2023

@elmiko I found the problem.
The panic was an actual error in the autoscaler, which has been fixed in a patch release.
I'm moving all the helpers into CAPI, so we can then pick up CAPI 1.5 and clean up this PR.

@elmiko (Contributor) commented Mar 10, 2023

oh wow, good find. thanks for the update @fabriziopandini . tag me on the pr you make in capi, happy to help review.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 8, 2023
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@elmiko (Contributor) commented Jun 8, 2023

i am still very curious to see this come to fruition, i have some free hack time coming up, i might try to see if i can continue with the effort. with your blessing @fabriziopandini 🙏

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 8, 2023
@fabriziopandini (Member, Author)

FYI the code we prototyped in this PR is now merged into the CAPI test framework, so the first step will be to pick up the latest CAPI release and then start using the tests from CAPI (hopefully the code of this PR will be hugely simplified).

@elmiko (Contributor) commented Jun 12, 2023

ack, thanks @fabriziopandini , i have a pr up to change the cluster-api dep to version 1.4.3 (#86), is that sufficient or will we need the next release?

@fabriziopandini (Member, Author)

Those changes are on main, so we can wait for the first 1.5.0 beta around EOM, pick a commit or ask the release team to cut an alpha tag

@elmiko (Contributor) commented Jun 13, 2023

no need to cut an alpha, i don't have a ton of time to hack on kubemark currently, but am trying to clean up what i can and learn a little more.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
