Implement deployments for system jobs #3644
/me clicks on issue linked by co-worker, reads issue, sees @schmichael in the activity feed, strokes beard while nodding with approval 👋
I'm tired of opaque updates for system jobs. We really need this feature.
@preetapan @schmichael @dadgar this is something I really want to see, and I'm happy to have a crack at it unless it's already being worked on internally? If not, any thoughts, ideas, or tips would be greatly appreciated.
@jrasell We want to make several improvements to the system scheduler, including implementing deployments as well as bringing in other improvements that are in the reconciler. This is a fairly large-scoped project, and implementing it will involve a set of non-trivial changes. We are currently targeting this for a future release, likely after Nomad 0.9.0.
@preetapan any update on a timeline for this? I'm specifically looking for canary support for system jobs.
@preetapan we've just launched into the world of Nomad and found this issue while deploying our first system-level job to our cluster. Any update on when we can expect health checking for system jobs?
@preetapan I see Nomad 0.10 was recently released. Any updates on this feature?
@jrasell will you take this on now? ;)
We'd also love to see this functionality, so +1 from our end! 👍
Any updates on the status of this functionality?
Nope, it will be OSS! This is roadmapped. As everyone can probably guess, there's a lot going on in everyone's lives, so a timeline has been very tricky. We're very excited to see the initial PR #8841 from @dubadub and hope to have someone dig into it with them.

There are some very tricky aspects to deployments for system jobs that we need to get right to maximize usability and minimize complexity. For example, if we spun up canaries concurrently with the stable version's allocation on the same node, there would likely be resource conflicts (static ports, host volumes) that block placement or prevent proper functioning. Therefore it seems like system deployments should diverge from service deployments in that canaries should act as replacements instead of additional capacity.

To further complicate matters: as @dubadub discovered in #8841, the code in question could use some refactoring. The layout of Nomad's scheduler has basically never changed, so as you can imagine there are some opportunities for cleanup.

So please keep the use cases coming! The more detailed you can be about the desired behaviors, the better. I know it seems like we're silent sometimes, but we definitely parse, discuss, and rehash every word of GitHub comments to ensure we're meeting the desired use cases.
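To make the "canaries as replacements" idea concrete, here is a purely hypothetical sketch. The `update` stanza below reuses today's service-job syntax; none of these settings currently work for `type = "system"` jobs, and the job name and image are made up:

```hcl
# Hypothetical only: none of this works for system jobs today. It sketches
# the "canary as replacement" behavior discussed above, where the canary
# replaces the stable allocation on a node instead of running beside it.
job "metrics-agent" {
  datacenters = ["dc1"]
  type        = "system" # one allocation per eligible node

  update {
    max_parallel = 1    # touch one node at a time
    canary       = 1    # hypothetical: replace one node's allocation first
    auto_revert  = true # roll the fleet back if the new version is unhealthy
  }

  group "agent" {
    task "agent" {
      driver = "docker"
      config {
        image = "example/metrics-agent:2.0"
      }
    }
  }
}
```

Because the canary replaces capacity rather than adding it, static ports and host volumes on the node stay free of conflicts; the trade-off is a brief gap in coverage on the canary node while the new allocation starts.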
This is something we'd love to have. We built a system on top of Nomad that lets our developers see how far along a job rollout is. At the moment we can sort of fudge it by parsing the annotations in a nomad plan and threading that through our allocation-tailing process, but this makes the code much more complicated than it would be if we could just use the same method we use for service job rollouts.
I'd love to see this arrive. I would be completely satisfied with the design as I understand it in #8841, which allows a fixed number of allocations to be replaced at one time.
Clarifying a use case: de-risking system job updates with the `canary` feature.
Supporting this would be valuable for us, since we use system jobs as a way to scale up our app server and background worker Nomad jobs: all we need to do is launch new nodes, and Nomad automatically schedules new allocations on them. But with system jobs we have to accept the drawback of not being able to use canaries, etc., even though these aren't typical system jobs.
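As a minimal sketch of the pattern described above (assuming a Docker driver; the job name, image, and port are invented for illustration):

```hcl
# A system job runs one allocation per eligible node, so launching new
# nodes scales the app automatically. The static port below is also the
# kind of resource that would conflict with a side-by-side canary.
job "app-server" {
  datacenters = ["dc1"]
  type        = "system" # no count; Nomad places one alloc per node

  group "web" {
    task "server" {
      driver = "docker"
      config {
        image = "example/app-server:1.4"
      }
      resources {
        network {
          port "http" {
            static = 8080
          }
        }
      }
    }
  }
}
```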
Great point here. This matches how we use system jobs as well.
Another strong +1 for this feature. Everything we face has already been mentioned in other comments, so I won't repeat those great points; I just wanted to bump this issue once again. Thank you!
As of 0.7.0, deployments - update stanzas with `auto_revert`, `canary`, etc. - are only implemented for `service` type jobs. We need to implement auto reverts and other reconciliation features for system jobs. There should be a way to stop a rolling upgrade of a bad system job across the fleet.
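For reference, this is roughly how the `update` stanza already behaves for `service` jobs (a sketch with illustrative values; the request here is to bring the same controls to `system` jobs):

```hcl
# The update stanza as implemented for service-type jobs since the 0.7/0.8
# era; none of it currently applies to type = "system" jobs.
job "api" {
  type = "service"

  update {
    max_parallel     = 2     # update at most two allocations at a time
    canary           = 1     # run one canary alongside the old version
    min_healthy_time = "30s" # an alloc must stay healthy this long
    healthy_deadline = "5m"  # fail the deployment if not healthy in time
    auto_revert      = true  # roll back to the last stable job on failure
  }

  # groups and tasks elided
}
```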