
[WIP] Use E2E test framework #69

Conversation

@fabriziopandini (Member)

What this PR does / why we need it:
While there is already something under the hack tests, it would be nice to have a setup similar to what we have in CAPI, so we can easily run and debug E2E tests from IDEs (see the sketch below).
Also, if we later move those tests into CAPI, any provider will be able to quickly set up an autoscaler test.

Which issue this PR fixes:
fixes #67

Special notes for your reviewer:

Release notes:

Start using the CAPI E2E test framework
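
For context, here is a minimal, self-contained sketch of the Ginkgo/Gomega scaffolding that CAPI-style E2E suites build on; the suite name and the spec body are placeholders, not this PR's actual autoscaler test. Because the suite is driven by a plain `go test` entry point, it can be run and debugged from an IDE like any other Go test:

```go
// A minimal, self-contained sketch of the Ginkgo/Gomega scaffolding that
// CAPI-style E2E suites build on. The suite name and the spec body are
// placeholders, not this PR's actual autoscaler test.
package e2e_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// TestE2E is a plain `go test` entry point, which is what makes these suites
// easy to run and debug from an IDE.
func TestE2E(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "E2E Suite") // placeholder suite name
}

var _ = Describe("When scaling workload clusters with the autoscaler [Autoscaler]", func() {
	It("should eventually satisfy pending pods by adding nodes", func() {
		// Placeholder body: the real spec would create a workload cluster,
		// deploy the cluster-autoscaler, scale a test Deployment, and assert
		// that new Machines/Nodes are created.
		Expect(true).To(BeTrue())
	})
})
```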

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 23, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 23, 2022
Review thread on Makefile (outdated, resolved)
@elmiko (Contributor) commented Dec 23, 2022

this is fantastic @fabriziopandini !

i look forward to taking a deeper review in the new year

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 23, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2023
@fabriziopandini (Member, Author)

made some progress, still figuring out why the autoscaler test does not work

@elmiko (Contributor) commented Jan 23, 2023

thanks Fabrizio, i am planning to try out these tests this week

@elmiko (Contributor) left a comment

i think the main issue is just getting the autoscaler configured properly for the capi clusters; the manifests and code looked mostly good to me. i couldn't quite get it working locally, hopefully we can sync up soon.

Review thread on test/e2e/data/autoscaler/autoscaler.yaml (outdated, resolved)
Image: "busybox",
Resources: corev1.ResourceRequirements{
Requests: map[corev1.ResourceName]resource.Quantity{
corev1.ResourceMemory: resource.MustParse("2G"), // TODO: consider if to make this configurable
Inline review comment (Contributor) on the snippet above:

this is probably fine for the default tests, we just want to synchronize the requests for the workload with the capacity for the kubemark nodes. by default they are 4GB ram, 2 core cpu. this deployment should be able to fit a single replica per node.
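
To make that sizing concrete, here is a hedged sketch (an assumed helper, not the PR's exact code) of the scale-up workload container using the same corev1/resource types as the snippet above. With a 2G memory request per replica and kubemark nodes advertising roughly 4GB RAM / 2 CPU cores, only one replica fits per node, so each additional replica should force the autoscaler to add a node:

```go
// Hedged sketch of the scale-up workload container discussed in the review
// comment above; the function and container names are hypothetical.
package e2e

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func scaleUpContainer() corev1.Container {
	return corev1.Container{
		Name:  "workload", // hypothetical name
		Image: "busybox",
		Resources: corev1.ResourceRequirements{
			Requests: map[corev1.ResourceName]resource.Quantity{
				// ~half of a 4GB kubemark node, but together with system overhead
				// large enough that a second replica does not co-schedule on the
				// same node, so extra replicas stay pending until nodes are added.
				corev1.ResourceMemory: resource.MustParse("2G"),
			},
		},
	}
}
```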

@fabriziopandini (Member, Author)

@elmiko when you have some time PTAL
I have changed the code to deploy the autoscaler in the workload cluster, and then did some tricks to get it working with the docker/kind networking.

The autoscaler seems to start, but when I create the test deployment requiring additional capacity it panics...

root@autoscaler-gjn4jz-control-plane-wgdfg:/# k logs --namespace cluster-autoscaler-system   cluster-autoscaler-57b558b4db-lzvtn
I0205 15:28:47.759694       1 leaderelection.go:248] attempting to acquire leader lease kube-system/cluster-autoscaler...
I0205 15:28:47.763337       1 leaderelection.go:258] successfully acquired lease kube-system/cluster-autoscaler
W0205 15:28:47.775334       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0205 15:28:47.788731       1 clusterapi_controller.go:351] Using version "v1beta1" for API group "cluster.x-k8s.io"
I0205 15:28:47.789978       1 clusterapi_controller.go:429] Resource "machinedeployments" available
I0205 15:28:47.891517       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0205 15:28:47.911260       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 19.718917ms
W0205 15:28:57.924666       1 clusterstate.go:428] AcceptableRanges have not been populated yet. Skip checking
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x3468e60]

goroutine 97 [running]:
k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling.(*HintingSimulator).findNode(0x400033fc68, 0x5195560?, {0x5195560, 0x4000c160b0}, 0x4000be4000, 0x57?)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling/hinting_simulator.go:114 +0x140
k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling.(*HintingSimulator).TrySchedulePods(0x380db20?, {0x5195560, 0x4000c160b0}, {0x4000b2dc60, 0x4, 0x34f33f0?}, 0x4000523478?, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduling/hinting_simulator.go:70 +0x224
k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor.(*filterOutSchedulablePodListProcessor).filterOutSchedulableByPacking(0x4000c160c0, {0x4000b2dc60, 0x4, 0x4}, {0x5195560, 0x4000c160b0})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor/filter_out_schedulable.go:101 +0xb4
k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor.(*filterOutSchedulablePodListProcessor).Process(0x0?, 0x4000754c00, {0x4000b2dc60?, 0x4, 0x4})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor/filter_out_schedulable.go:66 +0x8c
k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor.(*defaultPodListProcessor).Process(0x4000477230, 0x4000dfd1a0?, {0x4000b2dc60?, 0x2?, 0x2?})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/podlistprocessor/pod_list_processor.go:45 +0x6c
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0x4000b31360, {0x4?, 0x0?, 0x7f268a0?})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:477 +0x16a4
main.run(0x4000ab8600?, {0x518d098, 0x4000960c00})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:442 +0x268
main.main.func2({0x0?, 0x0?})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x28
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x114

I'm pretty sure I'm doing something wrong in one of the two YAMLs I'm using to install the autoscaler, but I can't catch it.
"autoscaler-to-workload-management.yaml" is the YAML creating the service account in the management cluster.
"autoscaler-to-workload-workload.yaml" is the YAML deploying the autoscaler in the workload cluster.

@elmiko (Contributor) commented Feb 10, 2023

hey @fabriziopandini , i haven't had a chance to follow up here, hoping to take a look next week.

@elmiko (Contributor) commented Feb 18, 2023

the configurations look ok to me, i'm not quite sure what is happening here.

this line

I0205 15:28:47.789978       1 clusterapi_controller.go:429] Resource "machinedeployments" available

makes me think that the connection to the management cluster is working fine. it found the crd and is using it.

i'm double checking the rbac we use in openshift, there might be some differences.
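
For reference while comparing RBAC, a hedged sketch (not this PR's actual rules, and possibly incomplete) of the kind of permissions the autoscaler's service account in the management cluster needs so the clusterapi provider can read and scale the Cluster API objects; exact groups, resources, and verbs may differ per setup:

```go
// Hedged sketch of management-cluster permissions for the clusterapi provider;
// not this PR's actual RBAC and possibly incomplete.
package e2e

import rbacv1 "k8s.io/api/rbac/v1"

var autoscalerManagementRules = []rbacv1.PolicyRule{
	{
		APIGroups: []string{"cluster.x-k8s.io"},
		Resources: []string{"machinedeployments", "machinesets", "machines"},
		Verbs:     []string{"get", "list", "watch"},
	},
	{
		// Scale subresources are used when the provider resizes a node group.
		APIGroups: []string{"cluster.x-k8s.io"},
		Resources: []string{"machinedeployments/scale", "machinesets/scale"},
		Verbs:     []string{"get", "update"},
	},
}
```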

@fabriziopandini (Member, Author) commented Mar 10, 2023

@elmiko I found the problem.
The panic was an actual error in the autoscaler, which has been fixed in a patch release.
I'm moving all the helpers into CAPI, so we can then pick up CAPI 1.5 and clean up this PR.

@elmiko (Contributor) commented Mar 10, 2023

oh wow, good find. thanks for the update @fabriziopandini . tag me on the pr you make in capi, happy to help review.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 8, 2023
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@elmiko (Contributor) commented Jun 8, 2023

i am still very curious to see this come to fruition, i have some free hack time coming up, i might try to see if i can continue with the effort. with your blessing @fabriziopandini 🙏

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 8, 2023
@fabriziopandini (Member, Author)

FYI the code we prototyped in this PR is now merged into the CAPI test framework, so the first step will be to pick up the latest CAPI release and then start using the tests from CAPI (hopefully the code of this PR will be hugely simplified).

@elmiko (Contributor) commented Jun 12, 2023

ack, thanks @fabriziopandini , i have a pr up to change the cluster-api dep to version 1.4.3 (#86), is that sufficient or will we need the next release?

@fabriziopandini (Member, Author)

Those changes are on main, so we can wait for the first 1.5.0 beta around EOM, pick a commit or ask the release team to cut an alpha tag

@elmiko (Contributor) commented Jun 13, 2023

no need to cut an alpha, i don't have a ton of time to hack on kubemark currently, but am trying to clean up what i can and learn a little more.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
