Move kubeflow jobs to https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/prowjobs #14343
@jlewi what do you think about moving the kubeflow jobs over to https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/prowjobs and prow.gflocks.com?

Comments
@fejta That seems fine with me. What else would need to change? i.e., do testgrids move? @scottilee is this something you could help with? |
@fejta How urgent is this on your end? |
Not urgent. Testgrid can stay the same. @chases2 we should probably set up GCP/oss-test-infra to work like istio -- where we can annotate prowjobs there and have them show up in this testgrid. |
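For context, a rough sketch (not the actual config) of the Istio-style approach being referred to: each prowjob carries TestGrid annotations, so the job shows up on a dashboard without separate TestGrid configuration. The job name, dashboard, tab, email, and image below are placeholders.

```yaml
# Sketch of TestGrid annotations on a prowjob; all values are placeholders.
periodics:
- name: example-kubeflow-periodic
  interval: 24h
  annotations:
    testgrid-dashboards: example-kubeflow-dashboard   # dashboard(s) the job reports to
    testgrid-tab-name: example-periodic               # tab name within the dashboard
    testgrid-alert-email: someone@example.com         # where failure alerts are sent
  spec:
    containers:
    - image: gcr.io/example/test-image:latest
      command: ["./run-tests.sh"]
```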
@fejta Correct, we manually upload our files to GCS right now, but we could probably switch to using pod_utils. |
@fejta Can you share some more info (e.g., a link to a ticket or document with explanation if available) on why the move from prow.k8s.io to prow.gflocks.com? Also, would it just be creating a "kubeflow" folder in https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/prowjobs and moving the files in https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubeflow to there? Lastly, I'm not familiar with pod-utils. I'm assuming it's this https://github.com/kubernetes/test-infra/tree/master/prow/pod-utils? Any more info anywhere so I can read up on it? |
prow.k8s.io is for Kubernetes (or at least CNCF) projects. Eventually we want to migrate all non-CNCF projects out of prow.k8s.io. And yes, ideally it is just that.
Let's not worry about this for now; see https://github.com/kubernetes/test-infra/blob/master/prow/jobs.md#pod-utilities for more detail. Test containers should no longer need to check out repos and/or upload results to GCS. Sidecar containers will do this for you. |
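For reference, a minimal sketch of what a pod-utilities ("decorated") job looks like: with `decorate: true`, the utility containers check out the repo and upload logs and artifacts to GCS, so the test container only runs the tests. The job name, repo, and image below are placeholders, not actual Kubeflow config.

```yaml
# Minimal sketch of a decorated prowjob; names and image are placeholders.
presubmits:
  kubeflow/kubeflow:
  - name: example-kubeflow-presubmit   # placeholder job name
    decorate: true                     # pod utilities handle clone, logs, and artifact upload
    spec:
      containers:
      - image: gcr.io/example/test-image:latest   # placeholder test image
        command: ["make", "test"]                 # repo is already checked out by the utilities
```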
@fejta I apologize for the delay on this. I started a PR at GoogleCloudPlatform/oss-test-infra#93, which is probably wrong 🙄 but it's a start! Let me know what's missing... |
@scottilee Kubeflow already has a Kubernetes cluster in project kubeflow-ci which we use for testing purposes. So I believe with the new approach the goal would be to have prow schedule the jobs for Kubeflow on that cluster. I'm not sure what we need to do to make that happen. I suspect we need to install some CRs and other infra on our test cluster. Given that we are getting close to 0.7, we might want to be careful not to make any infra changes that could keep us from releasing on time. |
@jlewi can I either get access to the kubeflow-ci project or can you create the |
If you need access to the CI cluster, please join this group. Let's proceed cautiously in terms of moving our prow infrastructure, because we are getting ready to do a release and don't want to disrupt our test infra. |
@scottilee opened up kubeflow/testing#475 to track changes to our test infra. I will run mkbuild-cluster as soon as I can. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
PR open in #16898 |
#16898 moved the prow jobs onto the kubeflow testing cluster but we are still using the kubernetes instance of prow. So I think this issue should remain open. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
I'm fixing build cluster setup for tests of other test clusters. Current error: https://oss-prow.knative.dev/log?job=kubeflow-testing-presubmit&id=1362976347659440128 |
Other Kubeflow tests do not use pod-utils; they directly write to the prow GCS bucket to report status back to prow. @chaodaiG what would be the recommended location for these artifacts in GCP OSS prow? |
Corresponding trigger code lives in https://github.com/kubeflow/testing/blob/aad63589e8c98fd021121e17a08c948f4658c889/images/run_workflows.sh |
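To make the old approach concrete (a sketch only, assuming the script writes the usual files that prow and Gubernator read, such as started.json and finished.json, directly under the job's path in the shared bucket): the custom key/values, such as a workflow URL, would live in the metadata map of finished.json. The values below are illustrative, not real output.

```yaml
# Rough sketch of a finished.json (shown as YAML for readability; values are illustrative).
timestamp: 1614211200        # unix time the job finished
result: SUCCESS              # SUCCESS | FAILURE | ABORTED
passed: true
metadata:
  workflow-url: https://tekton.example.com/run/123   # placeholder custom key/value
```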
I'd like some feedback. I'm starting to think that I should just refactor it to use pod-utils instead; it seems we'd have less to maintain that way. There's one feature I'm not sure how to use with pod-utils: how can I add key-value properties dynamically during the test, like above? Showing the URI helps us navigate to the dashboards where the actual tests were running (a Tekton UI). |
and I found our code to manually create |
The corresponding Gubernator GCS path for GCP OSS prow seems to look like:
(Found on https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-integration-test/1362898955494494208 -> click the "artifacts" button.) Can we add our CI prow service account (kf-ci-v1-prow@kubeflow-ci.iam.gserviceaccount.com) with write access to this gs://oss-prow/logs directory? |
Migrating to pod-utils is highly recommended. The expected metadata can be provided by writing a metadata.json file; see for example https://prow.knative.dev/view/gs/knative-prow/logs/ci-knative-serving-continuous/1363096292372254720 |
A prow job doesn't and shouldn't need to know where the results are stored, anything stored under |
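A sketch of the mechanism under discussion, assuming the standard pod-utilities behaviour where everything under $ARTIFACTS is uploaded and a metadata.json there is merged into the job's metadata; the job name, image, key, and URL below are placeholders.

```yaml
# Sketch only: a decorated job whose test step records a custom key/value by
# writing ${ARTIFACTS}/metadata.json (assumed sidecar pickup; placeholders throughout).
presubmits:
  kubeflow/pipelines:
  - name: example-kubeflow-pipelines-presubmit
    decorate: true
    spec:
      containers:
      - image: gcr.io/example/test-image:latest
        command: ["/bin/sh", "-c"]
        args:
        - |
          ./run-tests.sh
          # Record where the Tekton run can be inspected (placeholder URL).
          echo '{"tekton-url": "https://tekton.example.com/run/123"}' > "${ARTIFACTS}/metadata.json"
```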
Looks great. Can I confirm whether the custom key-value pairs show up while the test is still running? |
I'm afraid it's not. I just inspected a run of the same job as in the example I pointed out above, and after 1 hour it's still not displayed (the metadata.json file should have been created long before that). |
Hmm, we'd want to show a URL to a Tekton instance running the steps, and ideally while the job is still running. |
The biggest concern with the old approach is that the service account used for running the test pod can access a shared GCS bucket that all prow jobs use, which is technically a security concern. But if it's preferable to you, we can work together on a plan for this one-time exception. There are a couple of possible approaches in my mind: |
We have uniform bucket ACLs enabled so 1 is not an option. 2 should work though, and may be worthwhile either way since we'd like to move toward per-team GCS buckets. Please check out the docs for the pod utilities to see if migrating would be feasible. If you don't believe it is, we can rely on option 2 to ensure that the custom upload logic doesn't interfere with other tenants' job results. |
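A sketch of what option 2 (a per-team bucket) might look like for a decorated job, assuming the standard decoration_config knobs; the bucket name and credentials secret below are placeholders.

```yaml
# Sketch: pointing a decorated job's uploads at a team-owned bucket (placeholders throughout).
presubmits:
  kubeflow/pipelines:
  - name: example-kubeflow-presubmit
    decorate: true
    decoration_config:
      gcs_configuration:
        bucket: example-kubeflow-ci-artifacts            # per-team bucket
        path_strategy: explicit
      gcs_credentials_secret: example-kubeflow-gcs-sa    # SA key with write access to that bucket
    spec:
      containers:
      - image: gcr.io/example/test-image:latest
        command: ["make", "test"]
```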
To explain a little bit of history: there was quite a chunk of code written for Kubeflow CI that duplicated the responsibilities of the pod utilities, and no one familiar with it is still on the team, so migrating is difficult because of the unmaintained test code. Anyway, I've decided to finish the migration first by not setting up presubmit tests for the remaining repos, since they are not under active development. My team can either migrate them or build new tests later. |
The migration is done! |
/close |
@Bobgy: You can't close an active issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Actually, there's one last step of adding back the testgrid in gcp oss prow. |
/assign @chaodaiG |
Yes, https://testgrid.k8s.io/googleoss-kubeflow-pipelines is online now. Great to see this finally done! /close |
@chaodaiG: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |