
Implement deployments for system jobs #3644

Open
preetapan opened this issue Dec 11, 2017 · 18 comments
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/scheduling type/enhancement

Comments

@preetapan
Contributor

preetapan commented Dec 11, 2017

As of 0.7.0, deployments - update stanzas with auto_revert, canary, etc. - are only implemented for service-type jobs. We need to implement auto-revert and other reconciliation features for system jobs. There should also be a way to stop a rolling upgrade of a bad system job across the fleet.
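
For context, a minimal sketch of the update stanza that service jobs already support (job name and values are illustrative); the request here is for the same deployment machinery to apply when type = "system".

```hcl
# Illustrative only: deployment settings honored for service jobs as of 0.7.0.
# None of these fields currently have any effect on a system job.
job "web" {
  type = "service"

  update {
    max_parallel     = 1        # allocations updated at a time
    health_check     = "checks" # gate each allocation on its health checks
    min_healthy_time = "30s"
    healthy_deadline = "5m"
    canary           = 1        # place one canary before promoting the rest
    auto_revert      = true     # roll back automatically if the deploy fails
  }

  # group/task definitions omitted for brevity
}
```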

@schmichael schmichael changed the title from "Auto reverts and other reconciliation features for system jobs" to "Implement deployments for system jobs" on Dec 11, 2017
@mihasya

mihasya commented Dec 12, 2017

/me clicks on issue linked by co-worker, reads issue, sees @schmichael in the activity feed, strokes beard while nodding with approval

👋

@alxark

alxark commented Feb 15, 2018

I'm so tired of non-transparent updates for system jobs. This feature is really needed.

@jrasell
Member

jrasell commented Jun 8, 2018

@preetapan @schmichael @dadgar this is something I really want to see, and I'm happy to have a crack at it unless it's already being worked on internally. If it isn't, any thoughts, ideas, or tips would be greatly appreciated.

@preetapan
Contributor Author

@jrasell We want to make several improvements to the system scheduler, including implementing deployments, as well as bringing in other improvements that are in the reconciler. This is a fairly large-scoped project, and implementing it will involve a set of non-trivial changes. We are currently targeting this for a future release, likely after Nomad 0.9.0.

@jpasichnyk

jpasichnyk commented Jan 19, 2019

@preetapan any update on a timeline for this? I specifically am looking for canary support for system jobs.

@mgeggie

mgeggie commented Jul 11, 2019

@preetapan we've just launched into the world of Nomad, and found this issue when deploying our first system-level job to our cluster. Any update on when we can expect healthchecking for system jobs?

@taer

taer commented Nov 22, 2019

@preetapan I see Nomad 0.10 was recently released. Any updates on this feature?

@burdandrei
Contributor

@jrasell will you take this into your hands now? ;)

@dpn

dpn commented Apr 30, 2020

We'd also love to see this functionality, so +1 from our end! 👍

@xsikor

xsikor commented Jun 15, 2020

Any updates on the status of this functionality?
Or maybe it's already done, but only for the Enterprise version?

@tgross tgross added the stage/accepted Confirmed, and intend to work on. No timeline commitment though. label Aug 24, 2020
@schmichael
Member

Or maybe it's already done, but only for the Enterprise version?

Nope, it will be OSS!

This is roadmapped. As everyone can probably guess, there's a lot going on in everyone's lives, so a timeline has been very tricky. We're very excited to see the initial PR #8841 from @dubadub and hope to have someone dig into it with them. There are some very tricky aspects to deployments for system jobs that we need to get right to maximize usability and minimize complexity.

For example if we spun up canaries concurrently with the stable version's allocation on the same node, there would likely be resource conflicts (static ports, host volumes) that block placement or prevent proper functioning. Therefore it seems like system deployments should diverge from service deployments in that canaries should act as replacements instead of additional capacity.

To further complicate matters: as @dubadub discovered in #8841 the code in question could use some refactoring. The layout of Nomad's scheduler has basically never changed, so as you can imagine there are some opportunities for cleanup.

So please keep the use cases coming! The more detailed you can be about the desired behaviors, the better! I know it seems like we're silent sometimes, but we definitely parse, discuss, and rehash every word of GitHub comments to ensure we're meeting the desired use case.
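
To make the replacement-canary idea above concrete, here is a purely hypothetical sketch; nothing in it is supported today, and the field meanings shown are assumptions about one possible design, not a settled plan.

```hcl
# Hypothetical sketch (not implemented): a system-job deployment in which
# canaries replace the stable allocation on the chosen nodes rather than
# running alongside it, avoiding conflicts over static ports and host volumes.
job "node-exporter" {
  type = "system"

  update {
    canary       = 2    # assumed meaning: replace the alloc on 2 nodes first
    max_parallel = 5    # assumed meaning: after promotion, roll 5 nodes at a time
    auto_revert  = true # assumed meaning: restore the old version on nodes where the new one fails
  }
}
```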

@josegonzalez
Contributor

This is something we'd love to have. We built a system on top of Nomad that allows our developers to know how far along a job has been rolled out. At the moment, we can sort of fudge it by parsing the annotations in a nomad plan and stringing that along into our allocation-tailing process, though this makes the code much more complicated than it would be if we could just use the same method we use to roll out service jobs.

@weargoggles
Contributor

I'd love to see this arrive. I would be completely satisfied with the design as I understand it in #8841, which allows a fixed number of allocations to be replaced at one time.

@johnnyplaydrums

johnnyplaydrums commented Mar 2, 2022

Clarifying a use case: de-risking system job updates with the auto_revert flag. Currently, if an update to a system job is deployed, and the new allocations fail due to health checks or task state, that system job will be down across the cluster. Allowing the system job to be auto reverted would make rolling out updates to system jobs much less stressful 😅 I think this use case was mentioned in the initial comment but just wanted to spell it out a bit more. Thanks y'all!
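
As a concrete illustration of this use case, here is a hypothetical jobspec fragment; the update stanza shown is not honored for system jobs at the time of writing, and the job name is made up.

```hcl
# Desired behavior, not current behavior: if the new version fails its health
# checks, each node reverts to the last known-good version instead of the
# system job being down across the whole cluster.
job "log-shipper" {
  type = "system"

  update {
    health_check     = "checks"
    healthy_deadline = "5m"
    auto_revert      = true
  }
}
```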

@m1keil

m1keil commented Jun 25, 2022

Even without auto_revert, one of the big problems is that there is no feedback from the Nomad CLI about deployment issues, as there is for regular service jobs. The monitor returns 0 no matter what. This confuses both CI/CD and the user because it makes it seem as if the job started successfully. A word of warning would be nice, to make clear that the user shouldn't expect the same guarantees as with a service job.

@axsuul
Contributor

axsuul commented Aug 5, 2022

Supporting this would be valuable for us since we use system jobs as a way to scale up our app server and background worker Nomad jobs. That is, all we need to do is launch new nodes and Nomad automatically schedules new allocations to run on these nodes. But with system jobs, we have to deal with the drawback of not being able to use canaries, etc. even though these aren't typical system jobs.

@robloxrob

Supporting this would be valuable for us since we use system jobs as a way to scale up our app server and background worker Nomad jobs. That is, all we need to do is launch new nodes and Nomad automatically schedules new allocations to run on these nodes. But with system jobs, we have to deal with the drawback of not being able to use canaries, etc. even though these aren't typical system jobs.

Great point here. This is the same case for how we use system jobs.

@komapa

komapa commented Nov 22, 2022

Another strong +1 for this feature. Everything we face was already mentioned by other comments so I will not repeat those great points, just wanted to bump this issue once again. Thank you!

Projects
Status: 1.9 & 1.10 Shortlist (uncommitted)