Re-enable click_deploy_test #2364

Closed
lluunn opened this issue Feb 1, 2019 · 11 comments

Comments


lluunn commented Feb 1, 2019

https://github.com/kubeflow/kubeflow/blob/master/prow_config.yaml#L6

I think it's not being run:
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-periodic-master

@kunmingg mentioned the cluster might need more quota.

lluunn changed the title from "Reenable click_deploy_test" to "Re-enable click_deploy_test" on Feb 1, 2019

kunmingg commented Feb 2, 2019

An easy approach is to make the e2e test clean up before each run; we can reuse what the prober test does today:
https://github.com/kubeflow/kubeflow/blob/master/testing/test_deploy_app.py#L363
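
As an illustration of the "clean up before each run" idea, here is a minimal Python sketch. It is not the code in test_deploy_app.py: the record shape (`name`, `created`), the `select_stale` helper, and the prefix and age threshold are all hypothetical. It only shows how leftover deployments from earlier runs might be picked out for deletion before a new run starts.

```python
from datetime import datetime, timedelta, timezone


def select_stale(deployments, now, prefix, max_age=timedelta(hours=2)):
    """Return names of test deployments older than max_age.

    `deployments` is a list of dicts with a "name" string and a
    timezone-aware "created" datetime (a hypothetical shape, chosen
    for this sketch only).
    """
    stale = []
    for dep in deployments:
        is_test_deployment = dep["name"].startswith(prefix)
        is_old = now - dep["created"] > max_age
        if is_test_deployment and is_old:
            stale.append(dep["name"])
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    example = [
        {"name": "e2e-old", "created": now - timedelta(hours=5)},
        {"name": "e2e-new", "created": now - timedelta(minutes=5)},
    ]
    # Only the deployment older than two hours is selected.
    print(select_stale(example, now, prefix="e2e-"))
```

Filtering by both prefix and age keeps the cleanup from touching unrelated resources or a run that is still in progress.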


lluunn commented Feb 2, 2019

It looks like it should be running in the kubeflow-ci/kubeflow-testing cluster:
https://github.com/kubeflow/kubeflow/blob/master/testing/workflows/components/click_deploy_test.jsonnet#L192

@kunmingg If it's a quota problem, shouldn't the test still run and then fail?


kunmingg commented Feb 2, 2019

Let's set up a dedicated cluster for it.
I'll double-check the periodic test config; maybe some PRs broke it.


lluunn commented Feb 4, 2019

From this run: https://gubernator.k8s.io/build/kubernetes-jenkins/logs/kubeflow-periodic-master/719

[Screenshot taken 2019-02-04 at 14:41:35]

The log shows: the container main in pod XXX is not available: Bad Request


lluunn commented Feb 4, 2019

Manually launched the workflow and saw:

CreateContainerConfigError: secrets "kubeflow-oauth" not found


lluunn commented Feb 4, 2019

The plan is to deploy to a new project (kubflow_ci_bootstrap or something under kubeflow.org) so that quotas/resources are easier to clean up.


lluunn commented Feb 5, 2019

Created project: kubeflow-ci-deployment


lluunn commented Feb 7, 2019

[Screenshot taken 2019-02-07 at 10:42:28]

Still failing:

go: finding github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910
go: finding google.golang.org/genproto v0.0.0-20180831171423-11092d34479b
go: finding go.opencensus.io v0.18.1-0.20181204023538-aab39bd6a98b
go: finding github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973
go: error loading module requirements
+ result=1
+ set -e
+ [[ 1 -eq 0 ]]
+ echo Command failed: /mnt/test-data-volume/kubeflow-periodic-master-deployapp-736-7f63/src/github.com/kubeflow/kubeflow/bootstrap/build_image.sh /mnt/test-data-volume/kubeflow-periodic-master-deployapp-736-7f63/src/github.com/kubeflow/kubeflow/bootstrap/Dockerfile gcr.io/kubeflow-ci/bootstrapper kubeflow-periodic-master-deployapp-736-7f63
Command failed: /mnt/test-data-volume/kubeflow-periodic-master-deployapp-736-7f63/src/github.com/kubeflow/kubeflow/bootstrap/build_image.sh /mnt/test-data-volume/kubeflow-periodic-master-deployapp-736-7f63/src/github.com/kubeflow/kubeflow/bootstrap/Dockerfile gcr.io/kubeflow-ci/bootstrapper kubeflow-periodic-master-deployapp-736-7f63
+ echo 'command didn'\''t succeed'
+ exit 1

The cleanup also failed:

+ kubectl delete namespace kubeflow-periodic-master-deployapp-736-7f63
Error from server (NotFound): namespaces "kubeflow-periodic-master-deployapp-736-7f63" not found
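
The NotFound error above suggests the namespace was never created or was already gone, so the cleanup step could treat that case as success. A minimal Python sketch, assuming the cleanup shells out to kubectl; the `delete_namespace` helper and its injectable `run` parameter are hypothetical and are not taken from the actual test scripts:

```python
import subprocess


def delete_namespace(name, run=subprocess.run):
    """Delete a namespace; report "already absent" instead of failing.

    Returns True if the namespace was deleted, False if it did not exist.
    Any other kubectl failure is raised as RuntimeError.
    `run` is injectable so the logic can be exercised without a cluster.
    """
    result = run(
        ["kubectl", "delete", "namespace", name],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if "NotFound" in result.stderr:
        # Nothing to clean up: the namespace is already absent.
        return False
    raise RuntimeError(result.stderr.strip())
```

kubectl also accepts `--ignore-not-found` on `delete`, which gives a similar result directly in a shell script.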


lluunn commented Feb 7, 2019

Verified that the error is the same when building locally. The problem is that one dependency needs hg (Mercurial) to clone.


lluunn commented Feb 7, 2019

[Screenshot taken 2019-02-07 at 15:28:24]

Now the error is this line:
https://github.com/kubeflow/kubeflow/blob/master/bootstrap/cmd/bootstrap/app/ksServer.go#L1176

While waiting for the deployment manager, it got the error "read: connection reset by peer".
@kunmingg have you seen this before?

The deployment eventually succeeded.
[Screenshot taken 2019-02-07 at 15:30:43]
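
Since the deployment succeeded despite the "connection reset by peer" error, a polling loop that retries on transient connection errors might avoid this failure. A minimal Python sketch of that idea follows. The real wait loop lives in the Go file ksServer.go and is not reproduced here; `wait_until_done`, its parameters, and the `poll` callback are hypothetical.

```python
import time


def wait_until_done(poll, attempts=30, delay=10.0, sleep=time.sleep):
    """Call poll() until it returns True, retrying on connection errors.

    ConnectionError covers ConnectionResetError, so a single reset does
    not abort the wait. Raises TimeoutError once attempts are exhausted.
    """
    last_error = None
    for _ in range(attempts):
        try:
            if poll():
                return True
        except ConnectionError as err:
            # Transient network failure: remember it and keep polling.
            last_error = err
        sleep(delay)
    raise TimeoutError("gave up waiting for the deployment") from last_error
```

Bounding the number of attempts keeps a genuinely broken deployment from hanging the test, while still riding out an occasional dropped connection.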


lluunn commented Feb 11, 2019

https://gubernator.k8s.io/build/kubernetes-jenkins/logs/kubeflow-periodic-master/756
It's passing now, but not shown on testgrid;
will fix in kubeflow/testing#301.

lluunn closed this as completed on Feb 11, 2019