Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow negative --idle-timeout on cml-runner #700

Merged
merged 2 commits into from
Sep 7, 2021

Conversation

0x2b3bfa0
Copy link
Member

@0x2b3bfa0 0x2b3bfa0 commented Aug 20, 2021

See iterative/cml.dev#91 (comment); using 0 to convey disabled is not especially friendly, numerically speaking.

@0x2b3bfa0 0x2b3bfa0 added the ui/ux User interface/experience label Aug 20, 2021
@0x2b3bfa0 0x2b3bfa0 self-assigned this Aug 20, 2021
@0x2b3bfa0 0x2b3bfa0 temporarily deployed to internal August 20, 2021 19:35 Inactive
@0x2b3bfa0 0x2b3bfa0 marked this pull request as ready for review August 20, 2021 19:39
bin/cml-runner.js Outdated Show resolved Hide resolved
Co-authored-by: Casper da Costa-Luis <casper.dcl@physics.org>
@0x2b3bfa0 0x2b3bfa0 temporarily deployed to internal August 20, 2021 19:58 Inactive
@casperdcl
Copy link
Contributor

should we also change RUNNER_IDLE_TIMEOUT = 5 * 60 to actually be -1 if we're not in GH/GL/BB?

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Aug 20, 2021

@casperdcl, do you intend to automate the default value to detect if it has been launched by humans or if it's just an ad-hoc runner launched by a previous step in a workflow? It looks like a good idea, but having too many different defaults could also confuse users.

@casperdcl
Copy link
Contributor

casperdcl commented Aug 20, 2021

We already do custom checks for e.g. cml.driver == 'github' to retry before 72h. I think the point of CML is precisely this sort of convenient default config.

"Seconds to wait for jobs before shutting down. Default: 300 in CI or -1 (disabled) otherwise."

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Aug 20, 2021

Yes, CML already has a fair amount of moving parts to infer whether it's running on a CI/CD platform or not, which platform it's running on and quite a few other things that try to offload as much choices as possible from users.

This could be nice if it's what users expect, but promising a default value when running cml-runner --help locally and doing something completely different on the cloud could be less boring than what we would desire.

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Aug 20, 2021

It wouldn't be the first time we take a choice based on the environment, but altering a default value still is a bit risky, even if we document it.

@0x2b3bfa0
Copy link
Member Author

I'm inclined to think that the cloud use case is the prevalent one and the few users that want to use cml-runner on-premises can afford typing a few characters more.

Is that extra option on the documentation verbose enough to be replaced by "smart" code? iterative/cml.dev#93 (comment)

Copy link
Contributor

@DavidGOrtega DavidGOrtega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 lgtm

@casperdcl
Copy link
Contributor

casperdcl commented Sep 1, 2021

@DavidGOrtega what about #700 (comment) / #700 (comment)?

@DavidGOrtega
Copy link
Contributor

@casperdcl I think is a nice idea butI wonder how many breaks introduces. As I stated I launch self hosted GPU runners (2 titans) and expect to shutdown the machine after no more jobs. Its not a big deal, but that makes me think that the change is not really needed since:

  • it behaves the same in cloud or other env
  • maybe we are introducing a riddle to some users

@casperdcl casperdcl merged commit d8de143 into master Sep 7, 2021
@casperdcl casperdcl deleted the allow-negative-idle-timeout branch September 7, 2021 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ui/ux User interface/experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants