make pending timeout customizable #1268

Merged
k8s-ci-robot merged 2 commits into kubeflow:master on May 5, 2019

cheyang
Contributor

@cheyang cheyang commented May 1, 2019

This change is Reviewable

@@ -29,7 +29,7 @@ def estimator_op(name, image, command,
 evaluator=False, evaluator_cpu_limit=0, evaluator_memory_limit=0,
 env=[], data=[], sync_source=None,
 metrics=['Train-accuracy:PERCENTAGE'],
-arena_image='cheyang/arena_launcher:v0.5',
+arena_image='cheyang/arena_launcher:v0.6',
Contributor

Should these images be moved to a more neutral docker registry name?

Contributor Author

I agree with your suggestion. I'd like to move these images to a more neutral Docker registry. I'll keep it in mind and look for one. Thanks.

@@ -62,7 +62,7 @@ def parameter_servers_op(name, image, command, env, data, sync_source, annotatio
 tensorboard,
 worker_port, ps_port,
 metrics=['Train-accuracy:PERCENTAGE'],
-arena_image='cheyang/arena_launcher:v0.5',
+arena_image='cheyang/arena_launcher:v0.6',
Contributor

See above

@@ -250,6 +250,9 @@ def main(argv=None):
 parser.add_argument('--timeout-hours', type=int,
 default=200,
 help='Time in hours to wait for the Job submitted by arena to complete')
+parser.add_argument('--pending-timeout-minutes', type=int,
+default=360,
+help='Time in hours to wait for the Job submitted by arena from pending to running')
Contributor

The flag is named --pending-timeout-minutes, but the description says hours?

Contributor Author

Thank you for spotting it! Fixed.
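
For reference, a minimal sketch of the two flags with help text that matches each flag's unit, as the review comment above asks for. The flag names and defaults come from the diff; `build_parser` and `timeouts_from_args` are hypothetical helpers added here for illustration and are not part of the launcher.

```python
import argparse
from datetime import timedelta


def build_parser():
    """Build a parser with the two timeout flags discussed in this thread."""
    parser = argparse.ArgumentParser(description='arena launcher (sketch)')
    parser.add_argument('--timeout-hours', type=int,
                        default=200,
                        help='Time in hours to wait for the Job submitted by arena to complete')
    parser.add_argument('--pending-timeout-minutes', type=int,
                        default=360,
                        help='Time in minutes to wait for the Job submitted by arena from pending to running')
    return parser


def timeouts_from_args(args):
    """Convert both flags to timedelta so later code does not mix units.

    Hypothetical helper: returns (overall_timeout, pending_timeout).
    """
    return (timedelta(hours=args.timeout_hours),
            timedelta(minutes=args.pending_timeout_minutes))


if __name__ == '__main__':
    # Override only the pending timeout; the overall timeout keeps its default.
    args = build_parser().parse_args(['--pending-timeout-minutes', '30'])
    overall, pending = timeouts_from_args(args)
    print(overall, pending)
```

Converting to `timedelta` right after parsing is one way to keep an hours-based flag and a minutes-based flag from being compared directly by mistake.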

@animeshsingh
Contributor

Thanks for the changes.
/lgtm
/approve

@cheyang
Contributor Author

cheyang commented May 2, 2019

/assign @hongye-sun

@@ -249,7 +249,10 @@ def main(argv=None):
 parser.add_argument('--tensorboard-image', type=str, default='tensorflow/tensorflow:1.12.0')
 parser.add_argument('--timeout-hours', type=int,
 default=200,
-help='Time in hours to wait for the Job submitted by arena to complete')
+help='Time in minutes to wait for the Job submitted by arena to complete')
Contributor

It should be hours, right?

Contributor Author

Yes, it should be hours, because training always takes more than an hour.
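
To illustrate why the two flags can reasonably use different units, here is a rough sketch of how a launcher might enforce a short pending timeout separately from a long overall timeout. This is an assumption about the intended behavior, not the launcher's actual code, which is not shown in this thread. The status strings, the `get_status` callable and the `wait_for_job` function are all hypothetical, and the sketch assumes the pending period is measured from the start of the wait.

```python
import time
from datetime import timedelta


def wait_for_job(get_status, timeout, pending_timeout,
                 poll_interval=timedelta(seconds=30),
                 clock=time.monotonic, sleep=time.sleep):
    """Poll a job until it finishes or one of two timeouts expires.

    get_status is a hypothetical callable returning one of
    'PENDING', 'RUNNING', 'SUCCEEDED' or 'FAILED'.
    """
    start = clock()
    while True:
        status = get_status()
        elapsed = timedelta(seconds=clock() - start)
        if status in ('SUCCEEDED', 'FAILED'):
            return status
        # A job that never leaves PENDING is given up on early.
        if status == 'PENDING' and elapsed >= pending_timeout:
            return 'PENDING_TIMEOUT'
        # A job that is running is bounded by the overall timeout.
        if elapsed >= timeout:
            return 'TIMEOUT'
        sleep(poll_interval.total_seconds())
```

With the defaults from the diff, a caller would pass `timeout=timedelta(hours=200)` and `pending_timeout=timedelta(minutes=360)`. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.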

@hongye-sun
Contributor

/approve
/hold

Please unhold the PR when you feel it's ready to merge. Thanks.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: animeshsingh, hongye-sun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cheyang
Contributor Author

cheyang commented May 5, 2019

/unhold

@cheyang
Contributor Author

cheyang commented May 5, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot merged commit 36d31fe into kubeflow:master May 5, 2019
hamedhsn pushed a commit to hamedhsn/pipelines that referenced this pull request May 5, 2019
* make pending timeout customizable

* fix the description of arg
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023