Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Estimator e2etest #814

Merged
merged 2 commits into from
Sep 5, 2018
Merged

Conversation

richardsliu
Copy link
Contributor

@richardsliu richardsliu commented Aug 31, 2018

Related to #762

This change is Reviewable

@richardsliu
Copy link
Contributor Author

/uncc @ankushagarwal
/uncc @ChanYiLin

@coveralls
Copy link

coveralls commented Aug 31, 2018

Coverage Status

Coverage remained the same at 56.807% when pulling 6f6f2fe on richardsliu:estimator_test into a71fac1 on kubeflow:master.

author Richard Liu <ricliu@google.com> 1535586798 -0700
committer Richard Liu <ricliu@google.com> 1536118792 -0700

Modify dockerfile

Add runconfig to server

Remove test env var

Remove test env var

Add workflow

Dump response as json

Test sending runconfig rpc

Fix errors

Fix param parsing

Fix errors

Verify before waiting for job to end

Fixes

Fix lint error

Fix docker image

Verify runconfig

Fix pylint

Fix error message

Make test fail

Fix docker image version

Fix test

Fix a few things
@richardsliu richardsliu changed the title WIP: Estimator e2etest Estimator e2etest Sep 5, 2018
@richardsliu
Copy link
Contributor Author

/assign @jlewi

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/assign @jlewi

@jlewi
Copy link
Contributor

jlewi commented Sep 5, 2018

nit: @richardsliu please reference issues in the PR description e.g. #762

terminate_replica(masterHost, namespace, full_target)

# TODO(richardsliu):
# There are lots of verifications in this file, consider refactoring them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; I would suggest creating a separate test driver for this test e.g. like
https://github.com/kubeflow/tf-operator/blob/master/py/test_invalid_job.py

But its fine to do this later.

containers: [
{
name: "tensorflow",
image: "gcr.io/kubeflow-images-staging/tf-operator-test-server:v20180904-7d89548b",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Make the image a parameter so you can easily update the image of the test server.

@jlewi
Copy link
Contributor

jlewi commented Sep 5, 2018

There are 3 different cases we want to test.

  1. There is a chief replica which is the chief
  2. There is no chief replica in which case we use worker 0 as the chief
  3. There is a replica named master (for older versions of TF which didn't use chief) (see tensorflow: Support old versions of estimator #809).

For cases 1 and 2 we already have
master_is_chief_v1alpha2.jsonnet
worker0_is_chief_v1alpha2.jsonnet
in https://github.com/kubeflow/tf-operator/tree/master/test/workflows/components

So maybe we could just add the TFConfig verification to those jobs?

For 3 we could add a similar workflow.

@richardsliu I know you're headed OOO so if you can't address the above before you do just go ahead and submit this PR (cancel the hold) as is. This will make it easier for someone else to make the additional changes in a follow on pr.

/lgtm
/approve
/hold

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@richardsliu
Copy link
Contributor Author

Fixes #762

@richardsliu
Copy link
Contributor Author

@jlewi Thanks. These are good suggestions but require a bit of refactoring. I'll address these when I come back.

/hold cancel

@richardsliu richardsliu merged commit 6af5924 into kubeflow:master Sep 5, 2018
@richardsliu richardsliu deleted the estimator_test branch September 5, 2018 17:39
@k8s-ci-robot
Copy link

@richardsliu: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
kubeflow-tf-operator-presubmit 6f6f2fe link /test kubeflow-tf-operator-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants