
Argo workflow to run E2E tests #72

Merged: 53 commits into kubeflow:master on Jan 6, 2018

Conversation

@jlewi (Contributor) commented Dec 28, 2017

  • Create an Argo workflow to run the E2E test for Kubeflow deployment
  • Create a ksonnet app for deploying Argo in our test infrastructure
  • Create a ksonnet component to trigger the E2E workflow.
  • Add tensorflow/k8s as a git submodule because we want to reuse some Python scripts in that project to
    write our tests.
  • bootstrap.sh is the entrypoint for our prow jobs
    • It will be used to check out the repo at the commit corresponding to the prow job and then invoke
      a test script in the repo. This ensures that the bulk of our test logic is pulled from the repo at the
      commit being tested.
  • checkout.sh is a script for checking out the source to be used as the first step in our workflows
  • The Argo workflow uses an NFS share to store test data so that we can have multiple steps running
    in parallel and accessing the same files.
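As a rough illustration of the bootstrap flow described above, the entrypoint's job can be sketched as follows. This is not the actual bootstrap.sh: the function name, the run_e2e_workflow.sh driver path, and the DRY_RUN switch are assumptions added so the command sequence can be shown without touching the network.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the prow entrypoint: check out the repo at the commit
# under test, then hand off to a test driver that lives inside the repo.
set -euo pipefail

run() {
  # With DRY_RUN=1 print each command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

bootstrap() {
  local src_dir="$1"
  run git clone "https://github.com/${REPO_OWNER}/${REPO_NAME}.git" "$src_dir"
  run git -C "$src_dir" checkout "${PULL_PULL_SHA}"
  # Invoking a script from the checkout means the bulk of the test logic is
  # pulled from the commit being tested, not baked into the prow image.
  run "$src_dir/testing/run_e2e_workflow.sh"
}
```

For example, `DRY_RUN=1 REPO_OWNER=google REPO_NAME=kubeflow PULL_PULL_SHA=deadbeef bootstrap /tmp/src` prints the clone, checkout, and driver invocation in order.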

* Log output and test results are written to a local directory.

* In a follow on PR we will add a binary to copy all the outputs from the
  local directory to Gubernator. This way most steps in the test don't
  need to deal with GCS.
* Add a Python lint rc file.
@jlewi jlewi changed the title [Wip] Argo workflow to run E2E tests Argo workflow to run E2E tests Jan 3, 2018
@jlewi (Contributor, Author) commented Jan 3, 2018

@foxish this is ready for review but #71 should be submitted first.

@foxish (Contributor) commented Jan 3, 2018

Maybe someone from the Argo team could take a look as well, to spot potential issues with our use of the DAGs.

@jlewi (Contributor, Author) commented Jan 3, 2018

@jessesuen any chance you could look at our Argo workflow and provide suggestions? Our workflow is generated using ksonnet from workflows.libsonnet, but I've provided the actual YAML spec below.

Here are some key points regarding our use of Argo

  • I ended up not using GitHub artifact support. Instead I added an explicit step which executes a shell script to check out the code.

    • I did this because GitHub artifact support was insufficient.
    • For example, we needed to fetch refs (to checkout code for PRs) and also initialize sub modules.
    • So given that we always needed to run some git commands manually, it seemed simpler to just use a custom script.
  • I used a custom script, rather than Artifact support, to copy relevant files to GCS for Gubernator

    • This seemed simpler than using the Artifacts' S3 support with Minio to reach GCS.
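To make the checkout reasoning concrete, the two git operations that the built-in artifact support couldn't express are sketched below. This is not the real checkout.sh: fetch_pr is a hypothetical name, and DRY_RUN is added so the commands can be inspected without network access.

```shell
#!/usr/bin/env bash
# Sketch of the extra git steps a PR checkout needs: fetching the pull
# request's head ref and initializing submodules.
set -euo pipefail

run() {
  # With DRY_RUN=1 print each command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

fetch_pr() {
  local src_dir="$1" pull_number="$2"
  # GitHub exposes each PR head at refs/pull/<N>/head; a plain clone does not
  # fetch those refs, which is one reason artifact support fell short.
  run git -C "$src_dir" fetch origin "pull/${pull_number}/head:pr"
  run git -C "$src_dir" checkout pr
  # tensorflow/k8s is vendored as a submodule, so initialize it as well.
  run git -C "$src_dir" submodule update --init --recursive
}
```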
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: e2e-test-presubmit-20180102-204405
  namespace: kubeflow-test-infra
spec:
  entrypoint: e2e
  templates:
  - name: e2e
    steps:
    - - name: checkout
        template: checkout
    - - name: test-deploy
        template: test-deploy
      - name: create-started
        template: create-started
      - name: create-pr-symlink
        template: create-pr-symlink
    - - name: create-finished
        template: create-finished
    - - name: copy-artifacts
        template: copy-artifacts
  - container:
      args:
      - /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src
      command:
      - /usr/local/bin/checkout.sh
      env:
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
    name: checkout
  - container:
      command:
      - python
      - -m
      - testing.test_deploy
      - --project=mlkube-testing
      - --cluster=kubeflow-testing
      - --zone=us-east1-d
      - --github_token=$(GIT_TOKEN)
      - --test_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output/artifacts
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: test-deploy
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - create_started
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: create-started
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - create_pr_symlink
      - --bucket=mlkube-testing_temp
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: create-pr-symlink
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - create_finished
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: create-finished
  - container:
      command:
      - python
      - -m
      - testing.prow_artifacts
      - --artifacts_dir=/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/output
      - copy_artifacts
      - --bucket=mlkube-testing_temp
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src:/mnt/test-data-volume/e2e-test-presubmit-20180102-204405/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "72"
      - name: PULL_PULL_SHA
        value: pr
      - name: BUILD_NUMBER
        value: "101"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: copy-artifacts
  volumes:
  - name: github-token
    secret:
      secretName: github-token
  - name: gcp-credentials
    secret:
      secretName: kubeflow-testing-credentials
  - name: kubeflow-test-volume
    persistentVolumeClaim:
      claimName: kubeflow-testing

@jessesuen
Sure lemme take a look

@jessesuen commented Jan 3, 2018

Here are my suggestions:

  1. It seems the only difference among all the steps is the Python command that is executed. If that's the case, the workflow could be simplified to just three templates: the checkout template, the e2e template, and a generic kubeflow-testing template. The Python command could be passed in as an input parameter to the kubeflow-testing template.
  2. I noticed that the workflow name e2e-test-presubmit-20180102-204405 is used as part of the volume paths. The workflow name is already made available as the variable {{workflow.name}} so you can choose to use that instead of doing the substitution in ksonnet.
  3. Along the lines of suggestion 2, PULL_NUMBER, PULL_PULL_SHA, and BUILD_NUMBER appear to be inputs to the workflow, so these could be made workflow-level parameters and globally referenced using {{workflow.parameters.XXXX}}.

Suggestions 2 and 3 require the 2.0.0-alpha3 version. Here is a simplified YAML after incorporating the above suggestions.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: e2e-test-presubmit-20180102-204405
  namespace: kubeflow-test-infra
spec:
  entrypoint: e2e
  arguments:
    parameters:
    - name: pull-number
      value: "72"
    - name: pull-pull-sha
      value: pr
    - name: build-number
      value: "101"
  templates:
  - name: e2e
    steps:
    - - name: checkout
        template: checkout
    - - name: test-deploy
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: |
              python -m testing.test_deploy --project=mlkube-testing --cluster=kubeflow-testing --zone=us-east1-d 
              --github_token=$(GIT_TOKEN) --test_dir=/mnt/test-data-volume/{{workflow.name}} 
              --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output/artifacts
      - name: create-started
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output create_started
      - name: create-pr-symlink
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output create_pr_symlink --bucket=mlkube-testing_temp
    - - name: create-finished
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output create_finished
    - - name: copy-artifacts
        template: kubeflow-testing
        arguments:
          parameters:
          - name: cmd
            value: python -m testing.prow_artifacts --artifacts_dir=/mnt/test-data-volume/{{workflow.name}}/output copy_artifacts --bucket=mlkube-testing_temp
  - container:
      args:
      - /mnt/test-data-volume/{{workflow.name}}/src
      command:
      - /usr/local/bin/checkout.sh
      env:
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "{{workflow.parameters.pull-number}}"
      - name: PULL_PULL_SHA
        value: "{{workflow.parameters.pull-pull-sha}}"
      - name: BUILD_NUMBER
        value: "{{workflow.parameters.build-number}}"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
    name: checkout
  - inputs:
      parameters:
      - name: cmd
    container:
      command: [sh, -c]
      args: ["{{inputs.parameters.cmd}}"]
      env:
      - name: PYTHONPATH
        value: /mnt/test-data-volume/{{workflow.name}}/src:/mnt/test-data-volume/{{workflow.name}}/src/tensorflow_k8s
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /secret/gcp-credentials/key.json
      - name: GIT_TOKEN
        valueFrom:
          secretKeyRef:
            key: github_token
            name: github-token
      - name: REPO_OWNER
        value: google
      - name: REPO_NAME
        value: kubeflow
      - name: PULL_NUMBER
        value: "{{workflow.parameters.pull-number}}"
      - name: PULL_PULL_SHA
        value: "{{workflow.parameters.pull-pull-sha}}"
      - name: BUILD_NUMBER
        value: "{{workflow.parameters.build-number}}"
      - name: JOB_NAME
        value: kubeflow-presubmit
      image: gcr.io/mlkube-testing/kubeflow-testing
      volumeMounts:
      - mountPath: /mnt/test-data-volume
        name: kubeflow-test-volume
      - mountPath: /secret/github-token
        name: github-token
      - mountPath: /secret/gcp-credentials
        name: gcp-credentials
    name: kubeflow-testing

http://127.0.0.1:8001/api/v1/proxy/namespaces/kubeflow-test-infra/services/argo-ui:80/

TODO(jlewi): We can probably make the UI publicly available since I don't think it offers any ability to launch workflows.


argo-ui has a feature and an API to exec into running containers (enabled using the --enable-web-console install option). This is disabled by default in alpha3, but be aware that you would not want this enabled if the UI were made public-facing.

@jlewi (Contributor, Author) commented Jan 3, 2018

Thanks Jesse

It seems the only difference among all the steps is the Python command that is executed. If that's the case, the workflow could be simplified to just three templates: the checkout template, the e2e template, and a generic kubeflow-testing template. The Python command could be passed in as an input parameter to the kubeflow-testing template.

Since we're using ksonnet to avoid unnecessary code duplication, I think there's less of an incentive to use Argo's templating feature. It looks like using Argo templates would require us to wrap the command in a shell, because we can only do string substitution, not list substitution.

So I think I prefer duplicating the templates.

I noticed that the workflow name e2e-test-presubmit-20180102-204405 is used as part of the volume paths. The workflow name is already made available as the variable {{workflow.name}} so you can choose to use that instead of doing the substitution in ksonnet.

I prefer to do all the substitution in ksonnet; I think this makes things simpler.

Along the lines of suggestion 2, PULL_NUMBER, PULL_PULL_SHA, and BUILD_NUMBER appear to be inputs to the workflow, so these could be made workflow-level parameters and globally referenced using {{workflow.parameters.XXXX}}.

That won't work because the set of environment variables is unknown. The input to the workflow is a comma-separated list of key-value pairs, which is turned into environment variables. We use ksonnet to split this list and turn it into an array of environment variables.

As an example, PULL_NUMBER won't always be set; it's only set for presubmit jobs.
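The project does this splitting in ksonnet, but the shape of the transformation can be sketched in shell for illustration. The "KEY=VALUE,KEY=VALUE" input format matches the description above; the export_env_list name is an assumption, not the real implementation.

```shell
#!/usr/bin/env bash
# Sketch: expand a comma-separated "KEY=VALUE,KEY=VALUE" string into exported
# environment variables, so an unknown set of variables can be passed through.
set -euo pipefail

export_env_list() {
  local list="$1" pair
  local -a pairs
  # Split the list on commas into an array of KEY=VALUE entries.
  IFS=',' read -ra pairs <<< "$list"
  for pair in "${pairs[@]}"; do
    # Each entry is already KEY=VALUE, so it can be exported directly.
    export "$pair"
  done
}
```

For instance, `export_env_list "PULL_NUMBER=72,BUILD_NUMBER=101"` leaves PULL_NUMBER and BUILD_NUMBER set in the environment, without the workflow having to know the variable names in advance.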

@jlewi (Contributor, Author) commented Jan 5, 2018

@foxish Approved?

@jlewi jlewi merged commit 17aa4ce into kubeflow:master Jan 6, 2018
@k8s-ci-robot (Contributor) commented:

@jlewi: The following test failed, say /retest to rerun them all:

Test name: kubeflow-presubmit
Commit: b19ae59
Rerun command: /test kubeflow-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

yanniszark pushed a commit to arrikto/kubeflow that referenced this pull request Nov 1, 2019