scheduler-perf: run as integration tests #118202
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 9b65d9e to 1ba89fd.
But there's no failure message 😢 /retest
Here's why the failure is not shown: verbosity is a bit high. The V(5) log messages from graph_builder.go are too much. I'm not sure yet where that gets increased. The default should be -v2.
06b49cb
to
af79207
Compare
af79207
to
95c8e50
Compare
The verbosity problem was fixed by reconfiguring defaults for integration tests, and the test failure should be fixed by disabling the sampling interval check. It's only relevant when we measure performance and is more likely to fail when many Go tests run in parallel.
@Huang-Wei @ahg-g @alculquicondor can one of you please approve/lgtm?
@kerthcet could you take a look as you recently reviewed similar PRs in perf tests. Thanks!
Force-pushed from 62844bc to 0358fdb.
I've gone through the review feedback and tried to address everything. For now I continue to use gomega.Eventually. @alculquicondor: if you still prefer …
Each benchmark test case runs with a fresh etcd instance. Therefore it is not necessary to delete objects after a run. A future unit test might reuse etcd, therefore cleanup is optional.
Force-pushed from 0358fdb to 7066d05.
/retest
Addressed all my concerns, so
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kerthcet, mimani68, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/hold
/retest
// - k8s api server
// - scheduler
// It returns regular and dynamic clients, and destroyFunc which should be used to
// remove resources after finished.
// Notes on rate limiter:
// - client rate limit is set to 5000.
-func mustSetupScheduler(ctx context.Context, b *testing.B, config *config.KubeSchedulerConfiguration, enabledFeatures map[featuregate.Feature]bool) (informers.SharedInformerFactory, clientset.Interface, dynamic.Interface) {
+func mustSetupScheduler(ctx context.Context, tb testing.TB, config *config.KubeSchedulerConfiguration, enabledFeatures map[featuregate.Feature]bool) (informers.SharedInformerFactory, clientset.Interface, dynamic.Interface) {
It still says Scheduler here.
Also, please update the comment.
Fixed.
@@ -127,7 +132,10 @@ func StartFakePVController(ctx context.Context, clientSet clientset.Interface, i
 	claimRef := obj.Spec.ClaimRef
 	pvc, err := clientSet.CoreV1().PersistentVolumeClaims(claimRef.Namespace).Get(ctx, claimRef.Name, metav1.GetOptions{})
 	if err != nil {
This should avoid any races:
-if err != nil {
+if err != nil && errors.Is(err, context.Canceled) {
We have to return when there was an error, otherwise the code below would use a nil pvc. What needs to be fixed is the check whether that error should be logged. I had the logic backwards. It now is:
if err != nil {
// Note that the error can be anything, because components like
// apiserver are also shutting down at the same time, but this
// check is conservative and only ignores the "context canceled"
// error while shutting down.
if ctx.Err() == nil || !errors.Is(err, context.Canceled) {
klog.Errorf("error while getting %v/%v: %v", claimRef.Namespace, claimRef.Name, err)
}
return
}
sg
test/integration/util/util.go (outdated)
@@ -136,7 +144,10 @@ func StartFakePVController(ctx context.Context, clientSet clientset.Interface, i
 	metav1.SetMetaDataAnnotation(&pvc.ObjectMeta, pvutil.AnnBindCompleted, "yes")
 	_, err := clientSet.CoreV1().PersistentVolumeClaims(claimRef.Namespace).Update(ctx, pvc, metav1.UpdateOptions{})
 	if err != nil {
-		klog.Errorf("error while updating %v/%v: %v", claimRef.Namespace, claimRef.Name, err)
+		if ctx.Err() != nil {
same here
func() {
	_, ctx := ktesting.NewTestContext(t)
	// 30 minutes is for *all* tests using this configuration.
	ctx, cancel := context.WithTimeout(ctx, 30*time.Minute)
why not just context.WithCancel then?
// We need to wait here because even with deletion time stamp set, |
-// We need to wait here because even with deletion time stamp set,
+// We need to wait here because even with deletion timestamp set,
Done.
pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
if err != nil {
	tb.Fatalf("failed to list pods in %q: %v", namespace, err)
}
for _, pod := range pods.Items {
	if err := client.CoreV1().Pods(namespace).Delete(ctx, pod.Name, deleteNow); err != nil {
		tb.Fatalf("failed to delete pod %q in namespace %q: %v", pod.Name, namespace, err)
	}
}
Can we use a single DeleteCollection? IIUC, it also accepts DeleteOptions
Good idea, changed.
Seems I forgot to push the commit...
Once the context is canceled, the controller can stop processing events. Without this change it prints errors when the apiserver is already down.
This becomes relevant when doing more fine-grained leak checking.
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects, so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running.
- Finally, the namespace controller is needed to get rid of deleted namespaces.
This runs workloads that are labeled as "integration-test". The apiserver and scheduler are only started once per unique configuration, followed by each workload using that configuration. This makes execution faster. In contrast to benchmarking, we care less about starting with a clean slate for each test.
…imeout

This is done for the sake of consistency. The failure message becomes less useful.
Force-pushed from 2ff7891 to 0d41d50.
I didn't quite finish yesterday after leaving my initial replies. I think I addressed everything now and pushed: https://github.com/kubernetes/kubernetes/compare/2ff7891706089c1a3ae58b6cf6cd116b8e462dab..0d41d509d2d96ccc3473924cb4e1b8e1b3e4c170
/lgtm
LGTM label has been added. Git tree hash: b66b490fb759b8567b59abbd483c435bbdb2fc06
/hold cancel
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This has two purposes:
Special notes for your reviewer:
Split into multiple stand-alone commits to simplify reviews, but probably none of those are worth merging by their own.
Does this PR introduce a user-facing change?