Provide for dependencies between tasks in a group #419

Open
mfischer-zd opened this issue Nov 15, 2015 · 61 comments

@mfischer-zd

@mfischer-zd mfischer-zd commented Nov 15, 2015

Tasks in a group sometimes need to be ordered to start up correctly.

For example, to support the Ambassador pattern, proxy containers (P[n]) used for outbound request routing by a dependent application may be started only after the dependent application (A) is started. This is because Docker needs to know the name of A to configure shared-container networking when launching P[n].

In the first approximation of the solution, ordering can be simple, e.g., by having the task list in a group be an array.

@ChrisHines
Contributor

@ChrisHines ChrisHines commented Nov 16, 2015

+1

Batch jobs need task dependencies to manage data-flow between tasks.

I would expand this to allow dependencies between tasks in a job rather than just a task group. Data flow is not constrained to a single node.

@gsuyashs

@gsuyashs gsuyashs commented Nov 30, 2015

Any tentative date to add this enhancement feature in Nomad?

@dadgar
Contributor

@dadgar dadgar commented Nov 30, 2015

@suyash1983: we don't have any dates yet but it will most likely be something that comes in 0.4

@blalor
Contributor

@blalor blalor commented Jan 15, 2016

This is really going to be core to a number of use cases, like allowing EBS volumes to be mounted on a host before the application starts.

@c4milo
Contributor

@c4milo c4milo commented Mar 4, 2016

or pulling down certificates and secrets from Vault before starting a given service.

@vrenjith
Contributor

@vrenjith vrenjith commented Mar 7, 2016

+1
This will be a great addition for large enterprise cluster deployments

@steve-jansen
Contributor

@steve-jansen steve-jansen commented Mar 16, 2016

+1

Running consul-template alongside an app...

@dadgar
Contributor

@dadgar dadgar commented Mar 16, 2016

@steve-jansen You can run consul-template currently. We do this in our Nomad deployment. Essentially, you have a script that runs consul-template, which produces your config, and then runs your binary.
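
A minimal sketch of such a wrapper, assuming consul-template is on the PATH and is invoked with its `-once` and `-template "source:dest"` flags. The template path, output path, and application command are hypothetical placeholders, not details from the deployment described above.

```python
import os
import subprocess
from typing import List


def render_command(template: str, output: str) -> List[str]:
    """Build a one-shot consul-template invocation that renders a single file."""
    # -once renders the template a single time and exits instead of watching.
    return ["consul-template", "-once", "-template", f"{template}:{output}"]


def render_then_exec(template: str, output: str, app_argv: List[str]) -> None:
    """Render the config first; start the application only if rendering succeeded."""
    # check=True raises CalledProcessError if consul-template exits non-zero.
    subprocess.run(render_command(template, output), check=True)
    # Replace this wrapper process with the application so that the scheduler
    # supervises the real binary rather than the wrapper.
    os.execvp(app_argv[0], app_argv)


# Hypothetical usage as a task's entry point:
# render_then_exec("app.conf.ctmpl", "app.conf", ["/usr/local/bin/app", "-config", "app.conf"])
```

Using `exec` at the end means the application receives signals from the scheduler directly, which is usually what you want for a single-task wrapper.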

@steve-jansen
Contributor

@steve-jansen steve-jansen commented Mar 16, 2016

@dadgar nice, thanks for sharing. It's always great to hear how a company uses their own products. I imagine consul-template creates the env vars in the job config...

One catch for us, we aspire to have consul-template rewrite the config in response to changes in a Consul k/v, or more importantly, when Vault rotates a secret. We're trying to have consul-template signal the co-scheduled task to reload its config.

Indeed, my need for task dependencies is very narrow. In a perfect world for me, Nomad would have integration with Consul K/V and/or Vault. My need is config that updates dynamically for tasks. That would eliminate my need for task dependencies.

@dadgar
Contributor

@dadgar dadgar commented Mar 16, 2016

@steve-jansen consul-template can supervise the process and restart/signal it whenever the config changes. @sethvargo to verify

@sethvargo
Contributor

@sethvargo sethvargo commented Mar 16, 2016

Hi @steve-jansen

Vault currently does not support blocking queries, which is what Consul (and CT) use to give you that "real-time" trigger when something changes. CT will renew secrets at lease_duration/2.0, but it's not currently possible to trigger a change in Vault and have that immediately notify another process. There is an open issue in Vault for blocking queries. Please note this was an intentional design decision in Vault for performance reasons, and not a bug in CT or Vault.

CT will start the process, but it is not a supervisor or monitor. CT expects the given command to return within 30s, typically by delegating to some supervisor. If you're running in pure Docker, CT can optionally clean up PIDs and act like PID 1, but it doesn't monitor the process itself (which is best monitored by the scheduler anyway).

@OferE

@OferE OferE commented Sep 18, 2016

+1 - any updates on this?

@nugend

@nugend nugend commented Oct 10, 2016

I'd also like to see this land. I'm evaluating nomad for service orchestration and it would be very nice to be able to explicitly express dependencies.

Alternatively, or as a stop gap, maybe a recipe for doing this with the existing APIs could be written up (#1065 mentions this is possible)

@mbravorus

@mbravorus mbravorus commented Dec 18, 2016

Currently, the use cases which require dependencies and cron-like control tend to push me towards airbnb's Chronos which implies Mesos. I would be delighted if I could just use Nomad.

Recurring jobs are possible (starting from #540, ultimately described in https://www.nomadproject.io/docs/job-specification/periodic.html), but in a very common scenario where periodic tasks need interlocking dependency control, there is no way to do it with Nomad currently (or I didn't manage to find it)

@nugend

@nugend nugend commented Dec 20, 2016

Given that this doesn't seem to be a high priority, could someone from the project briefly explain the suggestion about how to achieve this with existing APIs as mentioned in #1065?

@dadgar
Contributor

@dadgar dadgar commented Dec 20, 2016

@nugend You would essentially need to put a wrapper around the task that uses the allocation endpoint to determine whether the task it should wait for has finished successfully, and then starts itself. It is a non-trivial amount of work, though.
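
A rough sketch of what such a wrapper could look like, not an official recipe. It assumes the job's allocations can be listed at `/v1/job/<job_id>/allocations` on a local agent and that each entry carries a `TaskStates` map whose values have `State` and `Failed` fields; check these names against the API documentation for your Nomad version. The agent address, polling interval, and timeout are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: local agent on the default port


def fetch_allocations(job_id: str) -> List[Dict]:
    """Fetch the allocation list for a job from the Nomad HTTP API."""
    url = f"{NOMAD_ADDR}/v1/job/{job_id}/allocations"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def upstream_succeeded(allocs: List[Dict], task: str) -> bool:
    """True if some allocation reports the task as dead and not failed."""
    for alloc in allocs:
        state = (alloc.get("TaskStates") or {}).get(task)
        if state and state.get("State") == "dead" and not state.get("Failed"):
            return True
    return False


def wait_for_upstream(
    job_id: str,
    task: str,
    fetch: Callable[[str], List[Dict]] = fetch_allocations,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> bool:
    """Poll until the upstream task has finished successfully or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if upstream_succeeded(fetch(job_id), task):
            return True
        time.sleep(interval)
    return False
```

The wrapper would call `wait_for_upstream(...)` and start the real task only when it returns `True`. Failure handling (upstream failed, agent unreachable, allocation rescheduled) is where most of the non-trivial work lies and is left out here.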

@RobertAtomic

@RobertAtomic RobertAtomic commented Sep 27, 2017

+1 for DAG support... It's quite important for my use case.

@tduffield

@tduffield tduffield commented Sep 28, 2017

I too would love to see this.

@SunSparc

@SunSparc SunSparc commented Oct 21, 2017

+1 for task sequence/dependency

@nugend

@nugend nugend commented Oct 23, 2017

For anyone really pining for this, there is a slightly easier method provided you are registering the upstream tasks as services: use a template stanza to create a run script and make that the command of your task. Then you can use a conditional dependent on the service being available to either launch the actual process or sleep forever.

Works pretty well, though it does require a bit of scripting (and I’m not sure how to do it in Windows)
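
To make the conditional concrete, here is a small sketch in which Python stands in for the template rendering; it is not the template syntax itself. The function takes the list of upstream service instances that the template would see and produces the run script. The function name, the instance address format, and the example command are hypothetical.

```python
import shlex
from typing import List, Sequence


def render_run_script(upstream_instances: Sequence[str], command: List[str]) -> str:
    """Render a run script that starts the process only if the upstream is registered."""
    lines = ["#!/bin/sh"]
    if upstream_instances:
        # Upstream service is available: launch the actual process.
        lines.append("exec " + " ".join(shlex.quote(arg) for arg in command))
    else:
        # Upstream is not registered yet: idle until the script is rendered again.
        lines.append("while true; do sleep 3600; done")
    return "\n".join(lines) + "\n"
```

The approach relies on the script being re-rendered, and the task restarted, once the upstream service shows up in the catalog.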

@vrenjith
Contributor

@vrenjith vrenjith commented Nov 4, 2017

@abohne

@abohne abohne commented Dec 12, 2017

Is there any update on where this might fall on the roadmap?

@dpisklov

@dpisklov dpisklov commented Jun 3, 2019

@cgbaker hi Chris, would you be able to provide a (rough) date for 0.10?

@cgbaker
Member

@cgbaker cgbaker commented Jun 3, 2019

@dpisklov, we are still trying to figure out which features are going into the first 0.10.0 release and which into later point releases. Feel free to keep checking back here!

@shantanugadgil
Contributor

@shantanugadgil shantanugadgil commented Jun 7, 2019

Please consider prioritizing this for a more recent release than 0.10.0 🏆

Task dependency is something that many folks (including me) could/would benefit from in designing application deployment workflows without getting into hacks like writing sentinel files for subsequent tasks.

Thanks and Regards,
Shantanu Gadgil

@obikay200

@obikay200 obikay200 commented Jun 13, 2019

Please can this be prioritised ASAP, as we are trying to convince clients to move over to Nomad and away from Jenkins pipelines. They are not building but executing tasks, so a sequential ability would improve things tenfold.

@kainoaseto

@kainoaseto kainoaseto commented Jun 28, 2019

This would be really great to have to solve deployment issues where there are implicit dependencies between tasks, such as container networks, that nomad will not prioritize for. Along with all the other use-cases mentioned here, this would be a very welcomed addition to Nomad.

If this can be prioritized even ahead of 0.10.0 that would be fantastic but at least making the 0.10.0 release would be extremely welcomed!

@manish-panwar

@manish-panwar manish-panwar commented Aug 7, 2019

Are we ever going to implement this feature? 4 years is a long time. I think this is an important feature. We are evaluating Nomad as our enterprise orchestrator, but now I am wondering whether we are making the right decision to use Nomad.

@kcwong-verseon

@kcwong-verseon kcwong-verseon commented Aug 7, 2019

This is not an easy matter. I'd rather the HashiCorp folks take the time to think this through. Since Nomad has job types, one can see task dependency having different implications for different job types. For service jobs, the simplest approach would forgo task dependency and just let restarts sort things out. However, in the use case stated in this issue, things like namespace dependence cannot be addressed via restart. Other use cases that require dependency management include setup and teardown for a task, which doesn't really work well with a job type of service (a more viable solution may be pre- and post-task "tasklets").

In the case of batch jobs, a task group may represent a pipeline in which a subsequent task may depend on the output of a previous one.

Added to these complexities is the handling of various failure scenarios. What to do if a task in a group fails? What to do if you need to drain the node? What about the Nomad agent crashing? Thank goodness I don't have to noodle on how to deal with them in a comprehensive manner.

One thing I really like about Nomad is a reasonable progression in feature addition (I'm staring at that giant tornado that's k8s). Who knows, maybe they'll conclude job-type is too much of a simplification...

@keith6014

@keith6014 keith6014 commented Aug 28, 2019

@kcwong-verseon no one said it was going to be easy. But people said for years that the feature would exist. I'd rather they had said "no" so we didn't have to hold our breath. We moved on to Airflow and other custom solutions.

@keith6014

@keith6014 keith6014 commented Aug 28, 2019

> Are we ever going to implement this feature? 4 years is a long time. I think this is an important feature. We are evaluating Nomad as our enterprise orchestrator, but now I am wondering whether we are making the right decision to use Nomad.

We stopped offering it internally for this precise reason. Almost 4 years since the issue was raised.

@nugend

@nugend nugend commented Aug 28, 2019

@keith6014 Is Airflow now suitable for service graphs as well as task pipelines? Or were you referring to using Nomad for building task pipelines?

@kcwong-verseon

@kcwong-verseon kcwong-verseon commented Aug 28, 2019

@keith6014 I'm sure you know Airflow is a very different beast than Nomad. They have completely different primary objectives. They may be able to work together, however.

@keith6014

@keith6014 keith6014 commented Aug 28, 2019

> @keith6014 Is Airflow now suitable for service graphs as well as task pipelines? Or were you referring to using Nomad for building task pipelines?

For building task pipelines. Data flow, ETL.

@ValFadeev

@ValFadeev ValFadeev commented Sep 28, 2019

Something that might cater for the use case: https://github.com/ValFadeev/rundeck-nomad-plugin
Basically, using Rundeck as an enhanced UI for Nomad and taking advantage of its workflow-building and time scheduling functionality.
Cons:

  • adding Rundeck as another moving piece;
  • project needs updating for more recent versions.

Pros:

  • still running jobs on Nomad.

@yishan-lin
Contributor

@yishan-lin yishan-lin commented Oct 10, 2019

Hi everyone - thank you for the patience.

We are working on implementing native task dependencies now and are exploring a potential Airflow integration.

Would love support in adding feedback + your interest in this ticket to the Apache Airflow committee so they may understand the demand. Ideally, we'd like to optimize the experience by providing a first-class integration, rather than a maintained fork.

https://issues.apache.org/jira/browse/AIRFLOW-5633

cc @jazzyfresh

@CarlosDomingues

@CarlosDomingues CarlosDomingues commented Oct 17, 2019

@yishan-lin a Nomad executor for Airflow would be absolutely brilliant.

@Sea-Flying

@Sea-Flying Sea-Flying commented Dec 3, 2019

Watching, and looking forward to this.

@sagarrakshe

@sagarrakshe sagarrakshe commented Jan 21, 2020

I faced a similar issue in our deployments, so I created a tool.
https://github.com/sagarrakshe/nomad-dtree

@DhashS

@DhashS DhashS commented Jan 31, 2020

We needed this enough that we implemented it ourselves. We have an AST for nomad jobs and interpret it to figure out which consul health checks to watch, wait for their success/fail timeout, and add the unblocked jobs to the work queue.
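
The general shape of that loop, as a simplified sketch rather than the implementation described above: the health lookup is reduced to a callback, the success/fail timeout is replaced by a bounded number of rounds, and the job names in the usage note are illustrative.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set


def unblocked_jobs(deps: Dict[str, Set[str]], healthy: Set[str], submitted: Set[str]) -> List[str]:
    """Jobs not yet submitted whose dependencies are all healthy."""
    return sorted(job for job, needs in deps.items()
                  if job not in submitted and needs <= healthy)


def run(deps: Dict[str, Set[str]],
        is_healthy: Callable[[str], bool],
        submit: Callable[[str], None],
        max_rounds: int = 100) -> List[str]:
    """Submit jobs as their dependencies become healthy; return the submission order."""
    submitted: Set[str] = set()
    order: List[str] = []
    queue: Deque[str] = deque()
    for _ in range(max_rounds):
        # Stand-in for watching health checks: ask which submitted jobs are healthy.
        healthy = {job for job in submitted if is_healthy(job)}
        queue.extend(unblocked_jobs(deps, healthy, submitted))
        while queue:
            job = queue.popleft()
            submit(job)
            submitted.add(job)
            order.append(job)
        if len(submitted) == len(deps):
            break
    return order
```

For example, with `deps = {"zookeeper": set(), "kafka": {"zookeeper"}}` and a health callback that always returns `True`, `run` submits `zookeeper` in the first round and `kafka` in the second.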

@recursionbane

@recursionbane recursionbane commented Feb 1, 2020

Agreed, we could not wait, either.

We ended up writing a DAG parser to evaluate eligibility of a node based on complex boolean dependencies, only exposing eligible nodes to Nomad for scheduling.

Not ideal, since we are now reliant on a single-threaded process for scheduling, but we are able to schedule several thousand jobs per minute this way. This might pay off in the long term, since it is unlikely Nomad's dependency roadmap includes boolean/complex dependencies.
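
As an illustration of what evaluating boolean dependencies can look like (a toy sketch, not the parser described above): each node's dependency is either a node name or a nested tuple of `"and"`, `"or"`, or `"not"` over other expressions, and a node is eligible once its expression is satisfied by the set of completed nodes. The expression encoding and node names are assumptions made for the example.

```python
from typing import Dict, Set, Union

# A dependency expression is a node name, or a tuple such as
# ("and", expr, ...), ("or", expr, ...) or ("not", expr).
Expr = Union[str, tuple]


def satisfied(expr: Expr, done: Set[str]) -> bool:
    """Evaluate a dependency expression against the set of completed nodes."""
    if isinstance(expr, str):
        return expr in done
    op, *args = expr
    if op == "and":
        return all(satisfied(arg, done) for arg in args)  # ("and",) is always true
    if op == "or":
        return any(satisfied(arg, done) for arg in args)
    if op == "not":
        (arg,) = args
        return not satisfied(arg, done)
    raise ValueError(f"unknown operator: {op!r}")


def eligible(nodes: Dict[str, Expr], done: Set[str]) -> Set[str]:
    """Nodes that are not yet done and whose dependency expression is satisfied."""
    return {name for name, expr in nodes.items()
            if name not in done and satisfied(expr, done)}
```

Only the names returned by `eligible` would be handed to the scheduler; everything else stays hidden until a later evaluation.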

@yishan-lin
Contributor

@yishan-lin yishan-lin commented Mar 19, 2020

Hey all, for those that missed our Nomad Virtual Day livestream last week: task dependencies are coming in Nomad 0.11, and folks will hear more about them in the coming weeks.

Here is a recording of the wonderful demo and presentation for reference that @jazzyfresh did on the feature - https://www.hashicorp.com/resources/preview-of-nomad-0-11-task-dependencies

For more complex dependencies as @recursionbane mentioned, we are targeting an integration with Apache Airflow to support such functionality.

@eigengrau

@eigengrau eigengrau commented Mar 20, 2020

That’s great news. @jazzyfresh I have a question related to this issue: I presume if we wanted to have a database server up and running before the main task, we would declare it as a pre-start, sidecar task in Nomad v0.11. Does the new lifecycle-hook mechanism observe the Consul health-check of the database service before moving on with the main lifecycle phase? Or would we need to leverage Apache Airflow for this?

@DhashS

@DhashS DhashS commented Apr 5, 2020

@yishan-lin that's awesome! Prestart and Poststop hooks are definitely not just a nice-to-have, and i'm super happy that you added them.

However, i don't think that those hooks count as "task dependencies". Consider a group with 5 containers, one that needs to run before (prestart), one that needs to run after (poststop), and the other three containers need to be brought up in sequence.
Prestart and poststop partition the scheduling space into 3 chunks, not N chunks like a true "task dependencies" addition would.

An example of this is how we bring up ZK/Kafka in our software (we run them on nomad with host volumes). We have to submit two different jobs since there's no way to have "generic" task dependencies, so we're forced to wait until ZK's health check comes back before submitting the kafka job. True task dependencies would allow us to coalesce them into one job.
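
To illustrate the difference, here is a sketch of the five-container group expressed as a generic dependency graph, using Python's standard-library `graphlib` (Python 3.9+). The task names are hypothetical, and the sketch models ordering only; it does not capture the stop-time semantics of a poststop hook.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical names: each task maps to the set of tasks it must wait for.
GROUP = {
    "setup": set(),
    "first": {"setup"},
    "second": {"first"},
    "third": {"second"},
    "cleanup": {"third"},
}
```

With prestart and poststop alone, `first`, `second`, and `third` all land in the same main phase, so the ordering among them cannot be expressed; with generic dependencies, each edge is stated explicitly and the chain has N ordered steps.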

@yishan-lin
Contributor

@yishan-lin yishan-lin commented May 4, 2020

Hey Dhash - you and I synced on this offline but recapping it here for visibility for all. The 5 container group example you mentioned is the kind of DAG functionality that I'd look for our Apache Airflow integration to cover, which is on our roadmap and coming soon!

@DhashS

@DhashS DhashS commented May 21, 2020

Our use case has been worked around well by the use of consul_service_health and nomad_job in terraform.

We now use terraform to submit all our nomad jobs, and the wait_for parameter in consul_service_health keeps the data dependency for the next nomad job from being fulfilled until all checks are passing.
