
Argo workflow to run E2E tests #72

Merged: 53 commits into kubeflow:master on Jan 6, 2018

Conversation

@jlewi (Contributor) commented Dec 28, 2017

  • Create an Argo workflow to run the E2E test for Kubeflow deployment
  • Create a ksonnet app for deploying Argo in our test infrastructure
  • Create a ksonnet component to trigger the E2E workflow.
  • Add tensorflow/k8s as a git submodule because we want to reuse some Python scripts in that project to
    write our tests.
  • bootstrap.sh is the entrypoint for our prow jobs
    • It will be used to check out the repo at the commit corresponding to the prow job and then invoke
      a test script in the repo. This ensures that the bulk of our test logic is pulled from the repo at the
      commit being tested.
  • checkout.sh is a script for checking out the source to be used as the first step in our workflows
  • The Argo workflow uses an NFS share to store test data so that we can have multiple steps running
    in parallel and accessing the same files.
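As a rough illustration of the bootstrap flow described above, the entrypoint's job can be sketched as follows. This is not the actual bootstrap.sh: the function name, the run_e2e_workflow.sh driver path, and the DRY_RUN switch are assumptions added so the command sequence can be shown without touching the network.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the prow entrypoint: check out the repo at the commit
# under test, then hand off to a test driver that lives inside the repo.
set -euo pipefail

run() {
  # With DRY_RUN=1 print each command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

bootstrap() {
  local src_dir="$1"
  run git clone "https://github.com/${REPO_OWNER}/${REPO_NAME}.git" "$src_dir"
  run git -C "$src_dir" checkout "${PULL_PULL_SHA}"
  # Invoking a script from the checkout means the bulk of the test logic is
  # pulled from the commit being tested, not baked into the prow image.
  run "$src_dir/testing/run_e2e_workflow.sh"
}
```

For example, `DRY_RUN=1 REPO_OWNER=google REPO_NAME=kubeflow PULL_PULL_SHA=deadbeef bootstrap /tmp/src` prints the clone, checkout, and driver invocation in order.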

* Log output and test results are written to a local directory.

* In a follow on PR we will add a binary to copy all the outputs from the
  local directory to Gubernator. This way most steps in the test don't
  need to deal with GCS.
* Add a Python lint rc file.
@jlewi jlewi changed the title [Wip] Argo workflow to run E2E tests Argo workflow to run E2E tests Jan 3, 2018
@jlewi (Contributor, Author) commented Jan 3, 2018

@foxish this is ready for review but #71 should be submitted first.

@foxish (Contributor) commented Jan 3, 2018

Maybe someone from the Argo team could take a look as well, to spot potential issues with our use of the DAGs.

@jlewi (Contributor, Author) commented Jan 3, 2018

@jessesuen any chance you could look at our Argo workflow and provide suggestions? Our workflow is generated using ksonnet from workflows.libsonnet, but I've provided the actual YAML spec below.

Here are some key points regarding our use of Argo

  • I ended up not using GitHub artifact support. Instead I added an explicit step which executes a shell script to check out the code.

    • I did this because GitHub artifact support was insufficient.
    • For example, we needed to fetch refs (to checkout code for PRs) and also initialize sub modules.
    • So given that we always needed to run some git commands manually, it seemed simpler to just use a custom script.
  • I used a custom script, rather than Artifact support, to copy relevant files to GCS for Gubernator

    • This seemed simpler than using the Artifacts' S3 support with Minio to reach GCS.
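To make the checkout reasoning concrete, the two git operations that the built-in artifact support couldn't express are sketched below. This is not the real checkout.sh: fetch_pr is a hypothetical name, and DRY_RUN is added so the commands can be inspected without network access.

```shell
#!/usr/bin/env bash
# Sketch of the extra git steps a PR checkout needs: fetching the pull
# request's head ref and initializing submodules.
set -euo pipefail

run() {
  # With DRY_RUN=1 print each command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

fetch_pr() {
  local src_dir="$1" pull_number="$2"
  # GitHub exposes each PR head at refs/pull/<N>/head; a plain clone does not
  # fetch those refs, which is one reason artifact support fell short.
  run git -C "$src_dir" fetch origin "pull/${pull_number}/head:pr"
  run git -C "$src_dir" checkout pr
  # tensorflow/k8s is vendored as a submodule, so initialize it as well.
  run git -C "$src_dir" submodule update --init --recursive
}
```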
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: e2e-test-presubmit-20180102-204405
  namespace: kubeflow-test-infra
spec:
  entrypoint: e2e
  templates:
  - name: e2e
    steps:
    - - name: checkout
        template: checkout
    - - name: test-deploy
        template: test-deploy
      - name: create-started
        template: create-started
      - name: create-pr-symlink
        template: create-pr-symlink
    - - name: create-finished
        template: create-finished
    - - name: copy-artifacts
        template: copy-artifacts
  - container:
      args:
      - /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src
      command:
      - /usr/local/bin/checkout.sh
      env:
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
    name: checkout
  - container:
      command:
      - python
      - -m
      - testing.test_deploy
      - --project=mlkube-testing
      - --cluster=kubeflow-testing
      - --zone=us-east1-d
      - --github_token=$(GIT_TOKEN)
      - --test_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output/artifacts
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: test-deploy
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - create_started
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: create-started
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - create_pr_symlink
      - --bucket=mlkube-testing_temp
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: create-pr-symlink
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - create_finished
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: create-finished
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - copy_artifacts
      - --bucket=mlkube-testing_temp
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: copy-artifacts
  volumes:
  - name: github-token
    secret:
      secretName: github-token
  - name: gcp-credentials
    secret:
      secretName: kubeflow-testing-credentials
  - name: kubeflow-test-volume
    persistentVolumeClaim:
      claimName: kubeflow-testing

@jessesuen
Sure lemme take a look

@jessesuen commented Jan 3, 2018

Here are my suggestions:

  1. It seems the only difference among all the steps is the Python command that is executed. If that's the case, the workflow could be simplified to just three templates: the checkout template, the e2e template, and a generic kubeflow-testing template. The Python command could be passed in as an input parameter to the kubeflow-testing template.
  2. I noticed that the workflow name e2e-test-presubmit-20180102-204405 is used as part of the volume paths. The workflow name is already made available as the variable {{workflow.name}} so you can choose to use that instead of doing the substitution in ksonnet.
  3. Along the lines of suggestion 2, PULL_NUMBER, PULL_PULL_SHA, and BUILD_NUMBER appear to be inputs to the workflow, so these could be made workflow-level parameters and globally referenced using {{workflow.parameters.XXXX}}.

Suggestions 2 and 3 require the 2.0.0-alpha3 version. Here is a simplified YAML after incorporating the above suggestions.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: e2e-test-presubmit-20180102-204405
  namespace: kubeflow-test-infra
spec:
  entrypoint: e2e
  arguments:
    parameters:
    - name: pull-number
      value: "72"
    - name: pull-pull-sha
      value: pr
    - name: build-number
      value: "101"
  templates:
  - name: e2e
    steps:
    - - name: checkout
        template: checkout
    - - name: test-deploy
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: |
              python -m testing.test_deploy --project=mlkube-testing --cluster=kubeflow-testing --zone=us-east1-d 
              --github_token=$(GIT_TOKEN) --test_dir=/mnt/test-data-volume/{{workflow.name}} 
              --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output/artifacts
      - name: create-started
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output create_started
      - name: create-pr-symlink
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output create_pr_symlink --bucket=mlkube-testing_temp
    - - name: create-finished
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output create_finished
    - - name: copy-artifacts
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output copy_artifacts --bucket=mlkube-testing_temp
  - container:
      args:
      - /mnt/test-data-volume/{{workflow.name}}/src
      command:
      - /usr/local/bin/checkout.sh
      env:
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "{{workflow.parameters.pull-number}}"
      - name: PULL_PULL_SHA
        value: "{{workflow.parameters.pull-pull-sha}}"
      - name: BUILD_NUMBER
        value: "{{workflow.parameters.build-number}}"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
    name: checkout
  - inputs:
      parameters:
      - name: cmd
    container:
      command: [sh, -c]
      args: ["{{inputs.parameters.cmd}}"]
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/{{workflow.name}}/src:/mnt/test-data-volume/{{workflow.name}}/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "{{workflow.parameters.pull-number}}"
      - name: PULL_PULL_SHA
        value: "{{workflow.parameters.pull-pull-sha}}"
      - name: BUILD_NUMBER
        value: "{{workflow.parameters.build-number}}"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: kubeflow-testing

http://127.0.0.1:8001/api/v1/proxy/namespaces/kubeflow-test-infra/services/argo-ui:80/

TODO(jlewi): We can probably make the UI publicly available since I don't think it offers any ability to launch workflows.


argo-ui has a feature and an API to exec into running containers (enabled using the --enable-web-console install option). This is disabled by default in alpha3, but be aware that you would not want this enabled if the UI were made public-facing.

@jlewi (Contributor, Author) commented Jan 3, 2018

Thanks Jesse

It seems the only difference among all the steps is the Python command that is executed. If that's the case, the workflow could be simplified to just three templates: the checkout template, the e2e template, and a generic kubeflow-testing template. The Python command could be passed in as an input parameter to the kubeflow-testing template.

Since we're using ksonnet to avoid unnecessary code duplication, I think there's less of an incentive to use Argo's templating feature. It looks like using Argo templates would require us to wrap the command in a shell, because we can only do string substitution, not list substitution.

So I think I prefer duplicating the templates.

I noticed that the workflow name e2e-test-presubmit-20180102-204405 is used as part of the volume paths. The workflow name is already made available as the variable {{workflow.name}} so you can choose to use that instead of doing the substitution in ksonnet.

I prefer to do all the substitution in ksonnet; I think this makes things simpler.

Along the lines of suggestion 2, PULL_NUMBER, PULL_PULL_SHA, and BUILD_NUMBER appear to be inputs to the workflow, so these could be made workflow-level parameters and globally referenced using {{workflow.parameters.XXXX}}.

That won't work because the set of environment variables is unknown. The input to the workflow is a comma-separated list of key-value pairs, which is turned into environment variables. We use ksonnet to split this list and turn it into an array of environment variables.

As an example, PULL_NUMBER won't always be set; it's only set for presubmit jobs.
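The project does this splitting in ksonnet, but the shape of the transformation can be sketched in shell for illustration. The "KEY=VALUE,KEY=VALUE" input format matches the description above; the export_env_list name is an assumption, not the real implementation.

```shell
#!/usr/bin/env bash
# Sketch: expand a comma-separated "KEY=VALUE,KEY=VALUE" string into exported
# environment variables, so an unknown set of variables can be passed through.
set -euo pipefail

export_env_list() {
  local list="$1" pair
  local -a pairs
  # Split the list on commas into an array of KEY=VALUE entries.
  IFS=',' read -ra pairs <<< "$list"
  for pair in "${pairs[@]}"; do
    # Each entry is already KEY=VALUE, so it can be exported directly.
    export "$pair"
  done
}
```

For instance, `export_env_list "PULL_NUMBER=72,BUILD_NUMBER=101"` leaves PULL_NUMBER and BUILD_NUMBER set in the environment, without the workflow having to know the variable names in advance.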

@jlewi (Contributor, Author) commented Jan 5, 2018

@foxish Approved?

@jlewi jlewi merged commit 17aa4ce into kubeflow:master Jan 6, 2018
@k8s-ci-robot (Contributor) commented:

@jlewi: The following test failed, say /retest to rerun them all:

Test name: kubeflow-presubmit
Commit: b19ae59
Rerun command: /test kubeflow-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

yanniszark pushed a commit to arrikto/kubeflow that referenced this pull request Nov 1, 2019