Skip to content

Conversation

@rimolive
Copy link
Member

@rimolive rimolive commented Apr 8, 2024

Description of your changes:
Closes #10488

Merge all the kfp-tekton compiler code to pipelines.

Checklist:

@rimolive
Copy link
Member Author

rimolive commented Apr 8, 2024

/hold

Still working on the tests


// TODO: Verify/add support for file:///.
if ms[1] != "gs://" && ms[1] != "s3://" && ms[1] != "minio://" {
if ms[1] != "gs://" && ms[1] != "s3://" && ms[1] != "minio://" && ms[1] != "mem://" {
Copy link
Member

@Tomcli Tomcli Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mem:// here is to support in memory storage for launcher tests under environments that doesn't have access to public gcs endpoints

"google.golang.org/grpc/metadata"

api "github.com/kubeflow/pipelines/backend/api/v1beta1/go_client"
apiv2 "github.com/kubeflow/pipelines/backend/api/v2beta1/go_client"
Copy link
Member

@Tomcli Tomcli Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We move report client to use the new v2 api

numWorkerName = "numWorker"
clientQPSFlagName = "clientQPS"
clientBurstFlagName = "clientBurst"
executionTypeFlagName = "executionType"
Copy link
Member

@Tomcli Tomcli Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new flag is for changing executionType, default is Argo

@rimolive
Copy link
Member Author

rimolive commented Apr 9, 2024

/test kubeflow-pipeline-e2e-test

1 similar comment
@rimolive
Copy link
Member Author

rimolive commented Apr 9, 2024

/test kubeflow-pipeline-e2e-test

@rimolive
Copy link
Member Author

rimolive commented Apr 9, 2024

@Tomcli @chensun I need a help in here. The e2e tests that are failling are job tests, and I just ran locally but they're passing. Did I miss someting?

Copy link
Member

@Tomcli Tomcli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rimolive for resolving the tests. I don't have access the the backend test, but it seems like only the api integration test is failing.

@chensun
Copy link
Member

chensun commented Apr 9, 2024

The failure are on jobs, aka scheduled workflow:

integration-test-dg828-1458653792: === RUN   TestJobApi/TestJobApis
integration-test-dg828-1458653792:     job_api_test.go:374: 
integration-test-dg828-1458653792:         	Error Trace:	/go/src/github.com/kubeflow/pipelines/backend/test/integration/job_api_test.go:374
integration-test-dg828-1458653792:         	Error:      	Expected nil, but got: &errors.errorString{s:"expected runs to be length 1, got: 0"}
integration-test-dg828-1458653792:         	Test:       	TestJobApi/TestJobApis
integration-test-dg828-1458653792:     job_api_test.go:395: 
integration-test-dg828-1458653792:         	Error Trace:	/go/src/github.com/kubeflow/pipelines/backend/test/integration/job_api_test.go:395
integration-test-dg828-1458653792:         	Error:      	Expected nil, but got: &errors.errorString{s:"expected runs to be length 1, got: 0"}
integration-test-dg828-1458653792:         	Test:       	TestJobApi/TestJobApis
integration-test-dg828-1458653792: === RUN   TestJobApi/TestJobApis_SwfNotFound
integration-test-dg828-1458653792: === RUN   TestJobApi/TestJobApis_noCatchupOption
integration-test-dg828-1458653792:     job_api_test.go:512: 
integration-test-dg828-1458653792:         	Error Trace:	/go/src/github.com/kubeflow/pipelines/backend/test/integration/job_api_test.go:512
integration-test-dg828-1458653792:         	Error:      	Expected nil, but got: &errors.errorString{s:"expected runsWhenCatchupTrue with periodic schedule to be 2, got: 0"}
integration-test-dg828-1458653792:         	Test:       	TestJobApi/TestJobApis_noCatchupOption
integration-test-dg828-1458653792:     job_api_test.go:540: 
integration-test-dg828-1458653792:         	Error Trace:	/go/src/github.com/kubeflow/pipelines/backend/test/integration/job_api_test.go:540
integration-test-dg828-1458653792:         	Error:      	Expected nil, but got: &errors.errorString{s:"expected runsWhenCatchupFalse with periodic schedule to be 1, got: 0"}
integration-test-dg828-1458653792:         	Test:       	TestJobApi/TestJobApis_noCatchupOption
integration-test-dg828-1458653792: --- FAIL: TestJobApi (176.21s)
integration-test-dg828-1458653792:     --- FAIL: TestJobApi/TestJobApis (93.56s)
integration-test-dg828-1458653792:     --- PASS: TestJobApi/TestJobApis_SwfNotFound (0.50s)
integration-test-dg828-1458653792:     --- FAIL: TestJobApi/TestJobApis_noCatchupOption (81.82s)

Not sure how you tested this locally, consider adding debug logs to the test functions?
https://github.com/kubeflow/pipelines/blob/ac399315e66d6ed2666dc9dbaecbce4938f87356/backend/test/integration/job_api_test.go#L156C27-L156C38

Signed-off-by: Ricardo M. Oliveira <rmartine@redhat.com>
@rimolive
Copy link
Member Author

/unhold

@Tomcli @chensun Finally I got the tests passing but unfortunately I needed to follow Tommy's recommentation to remove the scheduled workflow and rethink the kfp-tekton code for that.

Feel free to lgtm/approve, but take into consideration that this will be a partial feature. I believe it's not a big deal given that historically we could handle partial feature implementations to make progress.

Copy link
Member

@Tomcli Tomcli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @rimolive for fixing some of the tests and dependencies.
/lgtm

For the scheduledworkflow, we probably need to also update the api test to use the new swf v2 client. Let's do that in a follow up PR so it won't block the core KFP feature.

@chensun
Copy link
Member

chensun commented Apr 15, 2024

/unhold

@Tomcli @chensun Finally I got the tests passing but unfortunately I needed to follow Tommy's recommentation to remove the scheduled workflow and rethink the kfp-tekton code for that.

Feel free to lgtm/approve, but take into consideration that this will be a partial feature. I believe it's not a big deal given that historically we could handle partial feature implementations to make progress.

@rimolive Just to confirm, by partial feature you meant scheduledworkflow doesn't work on kfp-tekton yet, but there's no regression to the feature on kfp-argo, right?

@Tomcli
Copy link
Member

Tomcli commented Apr 15, 2024

/unhold
@Tomcli @chensun Finally I got the tests passing but unfortunately I needed to follow Tommy's recommentation to remove the scheduled workflow and rethink the kfp-tekton code for that.
Feel free to lgtm/approve, but take into consideration that this will be a partial feature. I believe it's not a big deal given that historically we could handle partial feature implementations to make progress.

@rimolive Just to confirm, by partial feature you meant scheduledworkflow doesn't work on kfp-tekton yet, but there's no regression to the feature on kfp-argo, right?

Yes, there's no regression on all the kfp-argo feature.

ENV NUM_WORKERS 2
ENV LOG_LEVEL info

CMD persistence_agent --logtostderr=true --namespace=${NAMESPACE} --ttlSecondsAfterWorkflowFinish=${TTL_SECONDS_AFTER_WORKFLOW_FINISH} --numWorker ${NUM_WORKERS} --logLevel=${LOG_LEVEL}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be an unintentional change? I see the default value remains the same, but we should keep this customization capability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I will rollback that change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this line is completely wrong, it should have the EXECUTIONTYPE env var set by default and send as a flag. I'm fixing it.

// k8s.io/client-go/rest/config.go#RESTClientFor
flag.Float64Var(&clientQPS, clientQPSFlagName, 5, "The maximum QPS to the master from this client.")
flag.IntVar(&clientBurst, clientBurstFlagName, 10, "Maximum burst for throttle from this client.")
flag.StringVar(&executionType, executionTypeFlagName, "Workflow", "Custom Resource's name of the backend Orchestration Engine")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Workflow is too generic as the executionType, should we name the default "ArgoWorkflow" instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code uses the kind field of the pipeline to find its execution type. Changing to ArgoWorkflow would break the code.

We can think of a better way to do this, maybe using annotation. However, I think this is out of scope of this PR.


}

// TODO: wait for up stream to officially update to v2beta1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The comment needs some update--there's no upstream once this is merged in to kfp repro.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But v2beta1 is already released, right? If so, we can just remove the comment.

return nil, util.NewInternalServerError(util.NewInvalidInputError("ScheduledWorkflow doesn't exist: %s", job.K8SName), "Failed to create a run due to invalid name")
}
executionSpec.SetOwnerReferences(swf)
// executionSpec.SetExecutionName(run.Description)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove?

if executionSpecStr, ok := wfr.Spec.(string); ok {
return NewPipelineRunFromScheduleWorkflowSpecBytesJSON([]byte(executionSpecStr))
}
// fall back to Argo WorkflowSpec, need to marshal back to json string then unmarshal to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update comment

assert.Empty(t, execSpec)
assert.Error(t, err)
assert.EqualError(t, err, "InternalServerError: type:PipelineRun: ExecutionType is not supported")
assert.EqualError(t, err, "Invalid input error: not PipelineRun struct")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there should be a better error message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Get converts this object to a workflowapi.Workflow.
// func (pr *PipelineRun) Get() *pipelineapi.PipelineRun {
// return pr.PipelineRun
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove

// if newWF.Spec.ActiveDeadlineSeconds != nil && *newWF.Spec.ActiveDeadlineSeconds == 0 {
// // if it was terminated, unset the deadline
// newWF.Spec.ActiveDeadlineSeconds = nil
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove all the dead code.

func readTaskRunMetricsJSONOrEmpty(
runID string, nodeStatus pipelineapi.PipelineRunTaskRunStatus,
retrieveArtifact RetrieveArtifact) (string, error) {
// Tekton doesn't support any artifact spec, artifact records are done by our custom metadata writers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: remove dead code

@google-oss-prow google-oss-prow bot removed the lgtm label Apr 15, 2024
Signed-off-by: Ricardo M. Oliveira <rmartine@redhat.com>
@google-oss-prow
Copy link

google-oss-prow bot commented Apr 15, 2024

@rimolive: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
kubeflow-pipelines-samples-v2 39991f2 link false /test kubeflow-pipelines-samples-v2
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Member

@chensun chensun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks!

@google-oss-prow google-oss-prow bot added the lgtm label Apr 15, 2024
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun, Tomcli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 60a443e into kubeflow:master Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feature] Add the new Tekton packages into the KFP backends

3 participants