Utilize jobframework setups #1630

Merged

tenzen-y
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

I made the jobframework setup functions reusable so that platform developers can avoid copying them into separate in-house controllers.

Which issue(s) this PR fixes:

Part-of #1601

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Expose utility functions to set up jobframework reconcilers and webhooks

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jan 22, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2024

netlify bot commented Jan 22, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 4ab6b5f |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/65b16fabb36a49000820f33b |

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 22, 2024
@tenzen-y tenzen-y changed the title Utilize jobframework setups WIP: Utilize jobframework setups Jan 22, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2024
@tenzen-y tenzen-y force-pushed the utilize-jobframework-setup branch 2 times, most recently from 9425583 to b60d7ed Compare January 22, 2024 12:45
@tenzen-y tenzen-y changed the title WIP: Utilize jobframework setups Utilize jobframework setups Jan 22, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2024
@tenzen-y tenzen-y changed the title Utilize jobframework setups WIP: Utilize jobframework setups Jan 22, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2024
@tenzen-y tenzen-y force-pushed the utilize-jobframework-setup branch 3 times, most recently from bfedab3 to 5d17773 Compare January 22, 2024 13:36
@tenzen-y tenzen-y changed the title WIP: Utilize jobframework setups Utilize jobframework setups Jan 22, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2024
return sets.New(cfg.Integrations.Frameworks...)
}

func IsWaitForPodsReadyEnable(cfg *configapi.Configuration) bool {
Contributor

Suggested change
- func IsWaitForPodsReadyEnable(cfg *configapi.Configuration) bool {
+ func IsWaitForPodsReadyEnabled(cfg *configapi.Configuration) bool {

maybe this will read better

Member Author

SGTM

gotWls := &kueue.WorkloadList{}
if tc.wantFieldMatcherError {
// Given that the `wantFieldMatcherError` is `true`, a list operation without fieldMatcher should succeed.
if gotListErr := k8sClient.List(ctx, gotWls, client.InNamespace(testNamespace)); gotListErr != nil {
Contributor

IIUC, this operation would succeed also when wantFieldMatcherError=false, right? If so, what is the purpose of this check and making it conditional?

Member Author

> IIUC, this operation would succeed also when wantFieldMatcherError=false, right?

Yes, that's right.

> the purpose of this check

I wanted to verify that this operation would succeed when we don't use the fieldMatcher.
We probably also should verify that the gotWls has the proper workloads here.

> making it conditional?

I wanted to avoid unnecessary checks when the wantFieldMatcherError=false.

Contributor

> We probably also should verify that the gotWls has the proper workloads here.

I think so.

> I wanted to avoid unnecessary checks when the wantFieldMatcherError=false.

I see, but maybe we can remove the conditional for simplicity. Or, alternatively, add a comment explaining why it is checked.

Member Author

> I see, but maybe we can remove the conditional for simplicity.

I'm OK with removing the conditional since this is a unit test, and the increase in test execution time would be negligible.

Member Author

updated.

}
for name, tc := range cases {
t.Run(name, func(t *testing.T) {
ctx := context.Background()
Contributor

Make sure the ctx is closed at the end of each test case

Member Author

Does this mean the following?

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

IIUC, context.Background() returns an empty context, so I don't think we need this handling.
If my understanding isn't correct, let me know.

Contributor

I see, it is empty. Still, we need to be careful passing it to other functions. When a function starts goroutines using the context they may leak because the context is not closed. Thus, it seems a safe practice to use the snippet as you pasted to make sure the context passed to any functions is closed after the test case.

Member Author

It makes sense. Let's make sure that the context is closed.

Member Author

Done.
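
For readers following this thread, here is a minimal sketch of why the cancellable context matters. It is not code from this PR: `startWorker` and `runCase` are hypothetical names, and the closure inside `runCase` only stands in for the body of one `t.Run` case. The worker goroutine exits only because the deferred `cancel()` fires when the case body returns.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startWorker is a hypothetical stand-in for any function under test that
// launches a goroutine tied to the context it receives. The returned channel
// is closed when the goroutine exits.
func startWorker(ctx context.Context) <-chan struct{} {
	stopped := make(chan struct{})
	go func() {
		defer close(stopped)
		<-ctx.Done() // Blocks until the caller cancels the context.
	}()
	return stopped
}

// runCase mimics one test case body. It reports whether the worker goroutine
// stopped after the case body returned.
func runCase() bool {
	var stopped <-chan struct{}
	func() {
		ctx, cancel := context.WithCancel(context.Background())
		defer cancel() // Cancels the context when the case body returns.
		stopped = startWorker(ctx)
	}()
	select {
	case <-stopped:
		return true
	case <-time.After(time.Second):
		return false // The goroutine would have leaked.
	}
}

func main() {
	fmt.Println("worker stopped:", runCase())
}
```

With a bare `context.Background()` and no cancel, the worker in this sketch would block forever, which is the leak the reviewer describes.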

},
modifyOpts: func(fwkName string, opts ...Option) ([]Option, error) { return opts, nil },
},
"modifyOptions returns errors": {
Contributor

can we add a test case when kubeflow is enabled, but there is no mapper for it?

Member Author

Proving that no errors occur would be valuable in that case. Thanks!

log.Info("No matching API in the server for job framework, skipped setup of controller and webhook")

Member Author

Done.
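
The behaviour discussed in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `errNoMatch`, `lookupFunc`, and `setupFramework` are hypothetical names that replace the real REST mapper with a plain function, and only `errFailedMappingResource` and the log message are taken from the diff. The point is that a missing API leads to a skipped setup without an error, while any other lookup failure is returned.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMatch is a hypothetical sentinel standing in for the condition a REST
// mapper reports when the API for a framework is not installed.
var errNoMatch = errors.New("no matching API in the server")

// errFailedMappingResource mirrors the sentinel error that appears in the diff.
var errFailedMappingResource = errors.New("restMapper failed mapping resource")

// lookupFunc is a hypothetical stand-in for a REST mapping lookup.
type lookupFunc func(framework string) error

// setupFramework reports whether the framework was set up. A missing API is
// not treated as an error: the framework is skipped and a message is printed.
// Any other lookup failure is wrapped and returned to the caller.
func setupFramework(framework string, lookup lookupFunc) (bool, error) {
	err := lookup(framework)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errNoMatch):
		fmt.Printf("No matching API in the server for job framework %q, skipped setup of controller and webhook\n", framework)
		return false, nil
	default:
		return false, fmt.Errorf("%s: %w", framework, err)
	}
}

func main() {
	ok, err := setupFramework("kubeflow.org/tfjob", func(string) error { return errNoMatch })
	fmt.Println(ok, err)

	ok, err = setupFramework("batch/job", func(string) error { return errFailedMappingResource })
	fmt.Println(ok, err)
}
```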

filter client.ListOption
wantError error
wantFieldMatcherError bool
wantList []string
Contributor

Suggested change
- wantList []string
+ wantWorkloads []string

nit

Member Author

Done.


cases := map[string]struct {
opts []Option
wls []client.Object
Contributor

Suggested change
- wls []client.Object
+ workloads []client.Object

nit

Member Author

Done.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 22, 2024
@mimowo
Contributor

mimowo commented Jan 23, 2024

/lgtm
The remaining comment from me #1630 isn't blocking.
/assign @alculquicondor

@@ -168,3 +169,14 @@ func Load(scheme *runtime.Scheme, configFile string) (ctrl.Options, configapi.Co
addTo(&options, &cfg)
return options, cfg, err
}

func EnabledFrameworks(cfg *configapi.Configuration) sets.Set[string] {
Contributor

Suggested change
- func EnabledFrameworks(cfg *configapi.Configuration) sets.Set[string] {
+ func EnabledFrameworks(integrations *configapi.Integrations) sets.Set[string] {

Member Author

Done.

return sets.New(cfg.Integrations.Frameworks...)
}

func IsWaitForPodsReadyEnabled(cfg *configapi.Configuration) bool {
Contributor

Suggested change
- func IsWaitForPodsReadyEnabled(cfg *configapi.Configuration) bool {
+ func IsWaitForPodsReadyEnabled(cfg *configapi.WaitForPodsReady) bool {

Member Author

Done.

@@ -65,3 +66,9 @@ func ManageCerts(mgr ctrl.Manager, cfg config.Configuration, setupFinished chan
RequireLeaderElection: false,
})
}

func WaitForCertsReady(setupLog logr.Logger, certsReady chan struct{}) {
Contributor

Suggested change
- func WaitForCertsReady(setupLog logr.Logger, certsReady chan struct{}) {
+ func WaitForCertsReady(log logr.Logger, certsReady chan struct{}) {

Member Author

Done.

@tenzen-y
Member Author

@tenzen-y: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kueue-verify-main | 67ef390 | link | unknown | /test pull-kueue-verify-main |
| pull-kueue-test-unit-main | 67ef390 | link | unknown | /test pull-kueue-test-unit-main |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

It seems that I need to rebase this PR.

@tenzen-y tenzen-y force-pushed the utilize-jobframework-setup branch 2 times, most recently from 1d86f51 to eefbe1e Compare January 23, 2024 17:34
@tenzen-y
Member Author

/assign @alculquicondor

- jobframework.WithManageJobsWithoutQueueName(manageJobsWithoutQueueName),
- jobframework.WithWaitForPodsReady(waitForPodsReady(cfg)),
+ jobframework.WithManageJobsWithoutQueueName(cfg.ManageJobsWithoutQueueName),
+ jobframework.WithWaitForPodsReady(config.IsWaitForPodsReadyEnabled(cfg.WaitForPodsReady)),
Contributor

What about something like this?
I don't think we should need two separate functions to process the configuration.

Suggested change
- jobframework.WithWaitForPodsReady(config.IsWaitForPodsReadyEnabled(cfg.WaitForPodsReady)),
+ jobframework.WithWaitForPodsReady(cfg.WaitForPodsReady),

Member Author

It makes sense.

Member Author

Also, I think we can apply the same approach to EnabledFrameworks.

Member Author

Done.
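
As a rough illustration of the suggestion in this thread, the option constructors can accept the configuration sections directly and handle a nil section themselves, so callers need no separate helper functions. The sketch below assumes simplified stand-in types: `WaitForPodsReady`, `Integrations`, `Options`, and `processOptions` are hypothetical here and only loosely modelled on the names in the diff, and a plain map replaces `sets.Set[string]`.

```go
package main

import "fmt"

// WaitForPodsReady and Integrations are simplified, hypothetical stand-ins
// for the corresponding configuration sections.
type WaitForPodsReady struct{ Enable bool }
type Integrations struct{ Frameworks []string }

// Options collects the values the reconcilers need.
type Options struct {
	WaitForPodsReady  bool
	EnabledFrameworks map[string]struct{}
}

// Option mutates Options; this is the usual functional-options pattern.
type Option func(*Options)

// WithWaitForPodsReady takes the config section as-is and treats a nil
// section as "disabled".
func WithWaitForPodsReady(w *WaitForPodsReady) Option {
	return func(o *Options) { o.WaitForPodsReady = w != nil && w.Enable }
}

// WithEnabledFrameworks takes the integrations section as-is and leaves the
// set empty when the section is nil.
func WithEnabledFrameworks(i *Integrations) Option {
	return func(o *Options) {
		if i == nil {
			return
		}
		for _, name := range i.Frameworks {
			o.EnabledFrameworks[name] = struct{}{}
		}
	}
}

// processOptions applies the options on top of the defaults.
func processOptions(opts ...Option) Options {
	o := Options{EnabledFrameworks: map[string]struct{}{}}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := processOptions(
		WithWaitForPodsReady(&WaitForPodsReady{Enable: true}),
		WithEnabledFrameworks(&Integrations{Frameworks: []string{"batch/job"}}),
	)
	fmt.Println(o.WaitForPodsReady, len(o.EnabledFrameworks))
}
```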


func SetupControllers(
mgr ctrl.Manager,
setupLog logr.Logger,
Contributor

Suggested change
- setupLog logr.Logger,
+ log logr.Logger,

Member Author

Done.

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
@tenzen-y
Member Author

/hold cancel

@alculquicondor I rebased this PR. Please take another look.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 24, 2024
Comment on lines 40 to 42
// The controllers won't work until the webhooks are operating, and the webhook won't work until the
// certs are all in place.
cert.WaitForCertsReady(log, certsReady)
Contributor

Did you forget to remove this from main?

Or maybe it shouldn't be here.

Member Author

Both this call and the one in main are intended, although I should replace

kueue/cmd/kueue/main.go

Lines 222 to 226 in 78acd53

// The controllers won't work until the webhooks are operating, and the webhook won't work until the
// certs are all in place.
setupLog.Info("Waiting for certificate generation to complete")
<-certsReady
setupLog.Info("Certs ready")
with cert.WaitForCertsReady().

I think we need to call cert.WaitForCertsReady() here because, in an in-house kueue-manager for custom jobs, only this function launches controllers and webhooks.

However, the kueue-manager itself launches controllers and webhooks both in this function and in main's setupControllers().

So, I think both are needed. @alculquicondor WDYT?

Contributor

Yeah, but maybe they didn't set up certs at all? I think they can call the function if they need it.

Member Author

It makes sense. Let's remove it from here.
They can call WaitForCertsReady() since it is an exported one.

Member Author

Done.
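
For context, the exported helper discussed in this thread can be sketched from the lines quoted from main.go above. This is an approximation, not the merged code: the standard library `*log.Logger` is an assumption standing in for `logr.Logger`, and `stdoutLogger` is a hypothetical variable added only for the demo. The function simply blocks until the `certsReady` channel is closed, so a caller that sets up certificates can invoke it before starting controllers, and a caller that does not can leave it out.

```go
package main

import (
	"log"
	"os"
)

// stdoutLogger is a hypothetical logger used only for this demo.
var stdoutLogger = log.New(os.Stdout, "", 0)

// WaitForCertsReady blocks until certsReady is closed. The *log.Logger
// parameter is an assumption that replaces logr.Logger from the diff.
func WaitForCertsReady(logger *log.Logger, certsReady chan struct{}) {
	logger.Println("Waiting for certificate generation to complete")
	<-certsReady
	logger.Println("Certs ready")
}

func main() {
	certsReady := make(chan struct{})
	// Simulate the certificate manager signalling readiness.
	go close(certsReady)
	WaitForCertsReady(stdoutLogger, certsReady)
}
```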

errFailedMappingResource = errors.New("restMapper failed mapping resource")
)

func SetupControllers(mgr ctrl.Manager, log logr.Logger, certsReady chan struct{}, opts ...Option) error {
Contributor

Add comments for how/when this should be used

Member Author

It makes sense.

Member Author

Done.

Contributor

maybe this file should be named setup.go?

Member Author

SGTM

Member Author

Done.

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
@tenzen-y
Member Author

tenzen-y commented Jan 24, 2024

The above CI error occurred due to a connection error with the go proxy server.

../../../pkg/mod/github.com/onsi/ginkgo/v2@v2.15.0/ginkgo/labels/labels_command.go:15:2: golang.org/x/tools@v0.16.1: read "https:/proxy.golang.org/@v/v0.16.1.zip": stream error: stream ID 23; INTERNAL_ERROR; received from peer
make: *** [Makefile:322: ginkgo] Error 1

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_kueue/1630/pull-kueue-test-e2e-main-1-29/1750246773613400064

@tenzen-y
Member Author

@alculquicondor I addressed all comments. PTAL, thanks.

Contributor

alculquicondor left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 24, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: f0805ea43f86cb0c0af3ba4d381d4e41ebe05325

@k8s-ci-robot k8s-ci-robot merged commit 93de69a into kubernetes-sigs:main Jan 24, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.6 milestone Jan 24, 2024
@tenzen-y tenzen-y deleted the utilize-jobframework-setup branch January 24, 2024 21:35
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

4 participants