Provide for dependencies between tasks in a group #419

Open
mfischer-zd opened this issue Nov 15, 2015 · 65 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/core, type/enhancement

Comments

@mfischer-zd

mfischer-zd commented Nov 15, 2015

Tasks in a group sometimes need to be ordered to start up correctly.

For example, to support the Ambassador pattern, the proxy containers (P[n]) that a dependent application (A) uses for outbound request routing can be started only after A has started, because Docker needs to know A's name to configure shared-container networking when launching each P[n].

As a first approximation, the ordering can be simple, e.g., by making the task list in a group an array.
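To make the ordering constraint concrete, here is a minimal Python sketch (not a Nomad feature) that emits `docker run` invocations in the order the Ambassador pattern needs: the application container A first, then each proxy P[n] joining A's network namespace through Docker's `--network=container:<name>` mode. The helper name `ambassador_launch_plan`, the container names, and the images are hypothetical placeholders.

```python
from typing import Dict, List


def ambassador_launch_plan(app: str, app_image: str, proxies: Dict[str, str]) -> List[List[str]]:
    """Return `docker run` argument lists in the order they must be executed.

    The application container is started first because each proxy joins its
    network namespace with --network=container:<app>, which Docker can only
    resolve once a container with that name exists.
    """
    plan = [["docker", "run", "-d", "--name", app, app_image]]
    # Dicts keep insertion order (Python 3.7+), so proxies start in the order given.
    for name, image in proxies.items():
        plan.append(["docker", "run", "-d", "--name", name,
                     f"--network=container:{app}", image])
    return plan


if __name__ == "__main__":
    demo = ambassador_launch_plan(
        "app", "example/app:latest",  # hypothetical names and images
        {"proxy-1": "example/proxy:latest", "proxy-2": "example/proxy:latest"},
    )
    for argv in demo:
        print(" ".join(argv))
```

Treating a group's task list as an ordered array amounts to executing such a plan front to back, waiting for each container to exist before launching the next.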

@ChrisHines
Contributor

ChrisHines commented Nov 16, 2015

+1

Batch jobs need task dependencies to manage data-flow between tasks.

I would expand this to allow dependencies between tasks in a job rather than just a task group. Data flow is not constrained to a single node.

@suyash-repo

suyash-repo commented Nov 30, 2015

Is there a tentative date for adding this enhancement to Nomad?

@dadgar
Contributor

dadgar commented Nov 30, 2015

@suyash1983: we don't have any dates yet, but it will most likely come in 0.4.

@blalor
Contributor

blalor commented Jan 15, 2016

This is really going to be core to a number of use cases, like allowing EBS volumes to be mounted on a host before the application starts.

@c4milo
Contributor

c4milo commented Mar 4, 2016

or pulling down certificates and secrets from Vault before starting a given service.

@vrenjith
Contributor

vrenjith commented Mar 7, 2016

+1
This will be a great addition for large enterprise cluster deployments.

@steve-jansen
Contributor

steve-jansen commented Mar 16, 2016

+1

Running consul-template alongside an app...

@dadgar
Contributor

dadgar commented Mar 16, 2016

@steve-jansen You can run consul-template today; we do this in our own Nomad deployment. Essentially, you have a script that runs consul-template to produce your config and then runs your binary.
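A minimal Python sketch of such a wrapper, assuming `consul-template` is on the task's `PATH`: it renders the config once using consul-template's `-once` and `-template "in:out"` flags, then replaces itself with the real binary via `os.execvp`, so Nomad ends up supervising the application process directly. The helper names and the file and command names in the usage line are hypothetical.

```python
import os
import subprocess
import sys
from typing import List


def render_command(template: str, output: str) -> List[str]:
    """consul-template invocation that renders `template` to `output` once and exits."""
    return ["consul-template", "-once", "-template", f"{template}:{output}"]


def main(argv: List[str]) -> None:
    template, output, *command = argv
    # check=True makes the wrapper exit non-zero if rendering fails.
    subprocess.run(render_command(template, output), check=True)
    # Replace this process with the application; signals from Nomad now reach it directly.
    os.execvp(command[0], command)


if __name__ == "__main__":
    if len(sys.argv) < 4:
        print("usage: wrapper.py TEMPLATE OUTPUT COMMAND [ARGS...]", file=sys.stderr)
    else:
        main(sys.argv[1:])
```

Usage as the task's command might look like `wrapper.py app.ctmpl app.conf /usr/local/bin/app -config app.conf`. Note that this renders only once at start-up; it does not re-render when Consul or Vault data changes later.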

@steve-jansen
Contributor

steve-jansen commented Mar 16, 2016

@dadgar nice, thanks for sharing. It's always great to hear how a company uses their own products. I imagine consul-template creates the env vars in the job config...

One catch for us, we aspire to have consul-template rewrite the config in response to changes in a Consul k/v, or more importantly, when Vault rotates a secret. We're trying to have consul-template signal the co-scheduled task to reload its config.

Indeed, my need for task dependencies is very narrow. In a perfect world for me, Nomad would have integration with Consul K/V and/or Vault. My need is config that updates dynamically for tasks. That would eliminate my need for task dependencies.

@dadgar
Contributor

dadgar commented Mar 16, 2016

@steve-jansen consul-template can supervise the process and restart or signal it whenever the config changes. @sethvargo can verify.

@sethvargo
Contributor

sethvargo commented Mar 16, 2016

Hi @steve-jansen

Vault currently does not support blocking queries, which is what Consul (and CT) use to give you that "real-time" trigger when something changes. CT will renew secrets at lease_duration/2.0, but it's not currently possible to trigger a change in Vault and have that immediately notify another process. There is an open issue in Vault for blocking queries. Please note this was an intentional design decision in Vault for performance reasons, and not a bug in CT or Vault.
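The renewal rule above is simple arithmetic. A small Python sketch, assuming every renewal grants the same `lease_duration` (real leases can be shortened, for example by a max TTL) and, as a simplification, that a changed secret is only observed at the next renewal:

```python
from typing import List


def renewal_times(issued_at: float, lease_duration: float, count: int) -> List[float]:
    """Times (in seconds) of the first `count` renewals when each one fires at
    half of a lease whose duration never changes."""
    interval = lease_duration / 2.0
    return [issued_at + interval * k for k in range(1, count + 1)]


def worst_case_staleness(lease_duration: float) -> float:
    """Longest a rotated secret could go unnoticed if changes are only seen on renewal."""
    return lease_duration / 2.0


if __name__ == "__main__":
    print(renewal_times(0.0, 3600.0, 3))  # [1800.0, 3600.0, 5400.0]
    print(worst_case_staleness(3600.0))   # 1800.0
```

With a one-hour lease, a rotation performed just after a renewal would, under these assumptions, surface up to 30 minutes later, which is why there is no immediate notification.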

CT will start the process, but it is not a supervisor or monitor. CT has the expectation that the given command will return within 30s, typically by delegating to some supervisor. If you're running in pure Docker, CT can optionally clean up PIDs and act like PID 1, but it doesn't monitor the process itself (which is best monitored by the scheduler anyway).

@OferE

OferE commented Sep 18, 2016

+1 - any updates on this?

@nugend

nugend commented Oct 10, 2016

I'd also like to see this land. I'm evaluating Nomad for service orchestration, and it would be very nice to be able to explicitly express dependencies.

Alternatively, or as a stopgap, maybe a recipe for doing this with the existing APIs could be written up (#1065 mentions this is possible).

@mbravorus

mbravorus commented Dec 18, 2016

Currently, the use cases that require dependencies and cron-like control tend to push me towards Airbnb's Chronos, which implies Mesos. I would be delighted if I could just use Nomad.

Recurring jobs are possible (starting from #540, ultimately described in https://www.nomadproject.io/docs/job-specification/periodic.html), but in the very common scenario where periodic tasks need interlocking dependency control, there is currently no way to do it with Nomad (or I haven't managed to find one).

@nugend

nugend commented Dec 20, 2016

Given that this doesn't seem to be a high priority, could someone from the project briefly explain the suggestion about how to achieve this with existing APIs as mentioned in #1065?

@dadgar
Contributor

dadgar commented Dec 20, 2016

@nugend You would essentially need to put a wrapper around the task that uses the allocation endpoint to determine whether the task it should wait for has finished successfully, and only then start itself. It is a non-trivial amount of work, though.
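A minimal Python sketch of that wrapper, under stated assumptions: the task being waited for runs in the same allocation (so Nomad's `NOMAD_ALLOC_ID` environment variable identifies it), the Nomad HTTP API is reachable at `NOMAD_ADDR` (defaulting here to `http://127.0.0.1:4646`), and ACLs are disabled (otherwise an `X-Nomad-Token` header would be needed). It reads `TaskStates` from `GET /v1/allocation/:alloc_id` and treats a task as finished once its `State` is `dead`, successful if `Failed` is false. The task name `setup` and the binary path are hypothetical.

```python
import json
import os
import time
import urllib.request
from typing import Any, Dict, Optional


def task_outcome(alloc: Dict[str, Any], task: str) -> Optional[bool]:
    """True if `task` finished successfully, False if it failed, None if not finished yet."""
    state = (alloc.get("TaskStates") or {}).get(task)
    if state is None or state.get("State") != "dead":
        return None
    return not state.get("Failed", False)


def fetch_allocation(addr: str, alloc_id: str) -> Dict[str, Any]:
    with urllib.request.urlopen(f"{addr}/v1/allocation/{alloc_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_task(task: str, poll_s: float = 5.0) -> bool:
    """Block until `task` in this allocation finishes; return whether it succeeded."""
    addr = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")
    alloc_id = os.environ["NOMAD_ALLOC_ID"]  # set by Nomad inside each task
    while True:
        outcome = task_outcome(fetch_allocation(addr, alloc_id), task)
        if outcome is not None:
            return outcome
        time.sleep(poll_s)


if __name__ == "__main__" and "NOMAD_ALLOC_ID" in os.environ:
    # Hypothetical: wait for a "setup" task, then hand over to the real binary.
    if wait_for_task("setup"):
        os.execvp("/usr/local/bin/app", ["/usr/local/bin/app"])
    raise SystemExit("dependency task 'setup' failed")
```

Waiting on a task in a different job would instead mean listing that job's allocations, which adds the question of which allocation to trust; that is part of why the approach is non-trivial.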

@RobertAtomic

RobertAtomic commented Sep 27, 2017

+1 for DAG support... It's quite important for my use case.

@tduffield

tduffield commented Sep 28, 2017

I too would love to see this.

@SunSparc

SunSparc commented Oct 21, 2017

+1 for task sequence/dependency

@nugend

nugend commented Oct 23, 2017

For anyone really pining for this, there is a slightly easier method provided you are registering the upstream tasks as services: use a template stanza to create a run script and make that the command of your task. Then you can use a conditional dependent on the service being available to either launch the actual process or sleep forever.

Works pretty well, though it does require a bit of scripting (and I’m not sure how to do it on Windows).
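The trick relies on two behaviors: consul-template's `service` function returns only healthy instances by default, and a Nomad `template` stanza's default `change_mode` restarts the task when the rendered file changes. The script first renders to a sleep; once the upstream service becomes healthy, it re-renders and the restarted task execs the real process. The Python function below only mimics what such a template would render; the service name `db`, the binary path, and the template body in the comment are illustrative assumptions, and `sleep infinity` assumes GNU coreutils.

```python
from typing import List

# Illustrative template body for a Nomad `template` stanza (consul-template syntax):
#
#   #!/bin/sh
#   {{ if service "db" }}exec /usr/local/bin/app
#   {{ else }}exec sleep infinity
#   {{ end }}
#
# An empty list of healthy "db" instances is falsy, so the else branch renders.


def render_run_script(healthy_db: List[str], app: str = "/usr/local/bin/app") -> str:
    """Python stand-in for the rendered output of the template above."""
    if healthy_db:
        return f"#!/bin/sh\nexec {app}\n"
    # GNU `sleep` accepts "infinity"; other implementations may need a large number instead.
    return "#!/bin/sh\nexec sleep infinity\n"


if __name__ == "__main__":
    print(render_run_script([]), end="")
    print(render_run_script(["10.0.0.5:5432"]), end="")
```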

@vrenjith
Contributor

vrenjith commented Nov 4, 2017

@abohne

abohne commented Dec 12, 2017

Is there any update on where this might fall on the roadmap?

@kcwong-verseon
Contributor

kcwong-verseon commented Aug 7, 2019

This is not an easy matter. I'd rather the HashiCorp folks take the time to think this through. Since Nomad has job types, task dependencies can have different implications for different job types. For service jobs, the simplest approach would forgo task dependencies and just let restarts sort things out. However, in the use case stated in this issue, things like namespace dependence cannot be addressed via restarts. Other use cases that require dependency management include setup and teardown for a task, which doesn't fit well with a job type of service (a more viable solution may be pre- and post-task "tasklets").

In the case of batch jobs, a task group may represent a pipeline in which a subsequent task may depend on the output of a previous one.

Adding to these complexities is the handling of various failure scenarios. What to do if a task in a group fails? What to do if you need to drain the node? What about the Nomad agent crashing? Thank goodness I don't have to noodle on how to deal with them in a comprehensive manner.

One thing I really like about Nomad is its reasonable progression in feature addition (I'm staring at that giant tornado that's k8s.) Who knows, maybe they'll conclude the job type is too much of a simplification...

@keith6014

keith6014 commented Aug 28, 2019

@kcwong-verseon no one said it was going to be easy. But people said for years that the feature would exist. I'd rather they had said "no" so we didn't have to hold our breath. We moved on to Airflow and other custom solutions.

@keith6014

keith6014 commented Aug 28, 2019

Is this feature ever going to be implemented? Four years is a long time, and I think this is an important feature. We are evaluating Nomad as our enterprise orchestrator, but now I am wondering whether we are making the right decision in using Nomad.

We stopped offering it internally for this precise reason. Almost 4 years since the issue was raised.

@nugend

nugend commented Aug 28, 2019

@keith6014 Is Airflow now suitable for service graphs as well as task pipelines? Or were you referring to using Nomad for building task pipelines?

@kcwong-verseon
Contributor

kcwong-verseon commented Aug 28, 2019

@keith6014 I'm sure you know Airflow is a very different beast than Nomad. They have completely different primary objectives. They may be able to work together, however.

@keith6014

keith6014 commented Aug 28, 2019

Quoting @nugend: "@keith6014 Is Airflow now suitable for service graphs as well as task pipelines? Or were you referring to using Nomad for building task pipelines?"

For building task pipelines: data flow, ETL.

@ValFadeev

ValFadeev commented Sep 28, 2019

Something that might cater for the use case: https://github.com/ValFadeev/rundeck-nomad-plugin
Basically, using Rundeck as an enhanced UI for Nomad and taking advantage of its workflow-building and time scheduling functionality.
Cons:

  • adding Rundeck as another moving piece;
  • project needs updating for more recent versions;

Pros:

  • still running jobs on Nomad;

@yishan-lin
Contributor

yishan-lin commented Oct 10, 2019

Hi everyone - thank you for the patience.

We are working on implementing native task dependencies now and are exploring a potential Airflow integration.

We would love your support in adding feedback and registering your interest in this ticket with the Apache Airflow committee, so they can understand the demand. Ideally, we'd like to optimize the experience by providing a first-class integration rather than a maintained fork.

https://issues.apache.org/jira/browse/AIRFLOW-5633

cc @jazzyfresh

@CarlosDomingues

CarlosDomingues commented Oct 17, 2019

@yishan-lin a Nomad executor for Airflow would be absolutely brilliant.

@sfs77

sfs77 commented Dec 3, 2019

Watching this, and looking forward to it.

@sagarrakshe

sagarrakshe commented Jan 21, 2020

I faced a similar issue in our deployments, so I created a tool:
https://github.com/sagarrakshe/nomad-dtree

@DhashS

DhashS commented Jan 31, 2020

We needed this enough that we implemented it ourselves. We have an AST for Nomad jobs and interpret it to figure out which Consul health checks to watch, wait for their success/fail timeout, and add the unblocked jobs to the work queue.
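The approach described above can be sketched as a dependency-driven work queue. This is a minimal Python illustration, not the commenter's code: `submit` and `wait_healthy` are hypothetical callbacks standing in for submitting a Nomad job and for watching its Consul health checks until they pass or time out. For simplicity it waits on one job at a time; a real implementation would watch several jobs concurrently.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def run_dependency_queue(
    deps: Dict[str, Set[str]],
    submit: Callable[[str], None],
    wait_healthy: Callable[[str], bool],
) -> List[str]:
    """Submit each job once all of its dependencies have become healthy.

    deps maps a job to the set of jobs it depends on; every job must appear as a key.
    wait_healthy(job) returns True when the job's checks pass, False on failure/timeout.
    Returns the jobs in the order they were submitted.
    """
    remaining = {job: set(d) for job, d in deps.items()}
    dependents: Dict[str, List[str]] = {}
    for job, ds in deps.items():
        for d in ds:
            dependents.setdefault(d, []).append(job)

    queue = deque(job for job, ds in remaining.items() if not ds)
    order: List[str] = []
    while queue:
        job = queue.popleft()
        submit(job)
        order.append(job)
        if not wait_healthy(job):
            continue  # unhealthy: everything downstream of this job stays blocked
        for nxt in dependents.get(job, []):
            remaining[nxt].discard(job)
            if not remaining[nxt]:
                queue.append(nxt)  # all dependencies healthy, so the job is unblocked
    return order


if __name__ == "__main__":
    jobs = {"zookeeper": set(), "kafka": {"zookeeper"}, "consumer": {"kafka"}}
    print(run_dependency_queue(jobs, lambda j: print("submit", j), lambda j: True))
```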

@recursionbane

recursionbane commented Feb 1, 2020

Agreed, we could not wait, either.

We ended up writing a DAG parser to evaluate eligibility of a node based on complex boolean dependencies, only exposing eligible nodes to Nomad for scheduling.

Not ideal, since we are now reliant on a single-threaded process for scheduling, but we are able to schedule several thousand jobs per minute this way. This might pay off in the long term, since it is unlikely Nomad's dependency roadmap includes boolean/complex dependencies.
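One way such boolean dependency conditions could be represented and evaluated, as a minimal Python sketch (the tuple encoding and the job names are assumptions, not the commenter's parser): each job has a condition that is either a job name, meaning "that job succeeded", or an `and`/`or` combination of conditions. Only jobs whose condition holds and that have not started yet would be exposed to Nomad. Negation is left out because "has not succeeded" is ambiguous while a job is still pending.

```python
from typing import Dict, Set, Tuple, Union

# A condition is a job name ("that job succeeded") or ("and" | "or", cond, cond, ...).
Cond = Union[str, Tuple]


def holds(cond: Cond, succeeded: Set[str]) -> bool:
    if isinstance(cond, str):
        return cond in succeeded
    op, *args = cond
    if op == "and":
        return all(holds(a, succeeded) for a in args)  # ("and",) is vacuously true
    if op == "or":
        return any(holds(a, succeeded) for a in args)
    raise ValueError(f"unknown operator: {op!r}")


def eligible(graph: Dict[str, Cond], succeeded: Set[str], started: Set[str]) -> Set[str]:
    """Jobs that have not started yet and whose condition is satisfied."""
    return {job for job, cond in graph.items() if job not in started and holds(cond, succeeded)}


if __name__ == "__main__":
    graph = {
        "extract": ("and",),                   # no dependencies
        "clean_a": "extract",
        "clean_b": "extract",
        "load": ("or", "clean_a", "clean_b"),  # either cleaner is enough
    }
    print(sorted(eligible(graph, set(), set())))  # ['extract']
```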

@yishan-lin
Contributor

yishan-lin commented Mar 19, 2020

Hey all, for those who missed our Nomad Virtual Day livestream last week: task dependencies are coming in Nomad 0.11, and you'll hear more about them in the coming weeks.

Here is a recording, for reference, of the wonderful demo and presentation @jazzyfresh gave on the feature: https://www.hashicorp.com/resources/preview-of-nomad-0-11-task-dependencies

For more complex dependencies as @recursionbane mentioned, we are targeting an integration with Apache Airflow to support such functionality.

@eigengrau

eigengrau commented Mar 20, 2020

That’s great news. @jazzyfresh I have a question related to this issue: I presume that if we wanted a database server up and running before the main task, we would declare it as a prestart sidecar task in Nomad v0.11. Does the new lifecycle-hook mechanism observe the Consul health check of the database service before moving on to the main lifecycle phase? Or would we need to leverage Apache Airflow for this?

@DhashS

DhashS commented Apr 5, 2020

@yishan-lin that's awesome! Prestart and poststop hooks are definitely not just a nice-to-have, and I'm super happy that you added them.

However, I don't think those hooks count as "task dependencies". Consider a group with 5 containers: one that needs to run before (prestart), one that needs to run after (poststop), and three others that need to be brought up in sequence.
Prestart and poststop partition the scheduling space into 3 chunks, not N chunks like a true "task dependencies" addition would.
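The difference can be made concrete by grouping tasks into start "waves" derived from an explicit dependency graph (Kahn-style topological levels). A minimal Python sketch; the task names model the five-container example above and are hypothetical. Prestart and poststop hooks give three phases, whereas a dependency graph yields as many waves as its longest chain.

```python
from typing import Dict, List, Set


def start_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; each task starts only after every task it depends on,
    all of which sit in earlier waves. Every dependency must also appear as a key.
    Raises ValueError on a cycle."""
    remaining = {t: set(d) for t, d in deps.items()}
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)  # tasks in this wave no longer block anyone
    return waves


if __name__ == "__main__":
    group = {"pre": set(), "c1": {"pre"}, "c2": {"c1"}, "c3": {"c2"}, "post": {"c3"}}
    print(start_waves(group))  # [['pre'], ['c1'], ['c2'], ['c3'], ['post']]
```

Independent tasks land in the same wave, so the graph still allows parallel start-up where the dependencies permit it.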

An example of this is how we bring up ZK/Kafka in our software (we run them on Nomad with host volumes). We have to submit two different jobs since there's no way to express "generic" task dependencies, so we're forced to wait until ZK's health check comes back before submitting the Kafka job. True task dependencies would allow us to coalesce them into one job.
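Until such dependencies exist within one job, the two-step submission can at least be scripted. A minimal Python sketch, assuming a Consul agent at `http://127.0.0.1:8500`, ACLs disabled, and hypothetical names for the ZooKeeper service and the Kafka job file passed on the command line: it polls Consul's `GET /v1/health/checks/:service` endpoint until every check reports `passing`, then runs `nomad job run`.

```python
import json
import subprocess
import sys
import time
import urllib.request
from typing import Any, Dict, List


def all_passing(checks: List[Dict[str, Any]]) -> bool:
    """True when at least one check exists and every check's Status is "passing"."""
    return bool(checks) and all(c.get("Status") == "passing" for c in checks)


def wait_for_service(service: str, consul: str = "http://127.0.0.1:8500",
                     timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll Consul until all health checks for `service` pass, or give up at the deadline."""
    url = f"{consul}/v1/health/checks/{service}"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(url, timeout=10) as resp:
            if all_passing(json.load(resp)):
                return True
        time.sleep(poll_s)
    return False


if __name__ == "__main__" and len(sys.argv) == 3:
    service, job_file = sys.argv[1], sys.argv[2]  # e.g. zookeeper kafka.nomad
    if not wait_for_service(service):
        raise SystemExit(f"{service} did not become healthy in time")
    subprocess.run(["nomad", "job", "run", job_file], check=True)
```

An empty check list (for example, before ZooKeeper has registered) counts as not ready, so the script keeps polling instead of submitting early.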

@yishan-lin
Contributor

yishan-lin commented May 4, 2020

Hey Dhash - you and I synced on this offline but recapping it here for visibility for all. The 5 container group example you mentioned is the kind of DAG functionality that I'd look for our Apache Airflow integration to cover, which is on our roadmap and coming soon!

@DhashS

DhashS commented May 21, 2020

We have worked around our use case well by using consul_service_health and nomad_job in Terraform.

We now use Terraform to submit all our Nomad jobs, and the wait_for parameter in consul_service_health keeps the data dependency for the next Nomad job unfulfilled until all checks are passing.

tgross added the stage/accepted label (Confirmed, and intend to work on. No timeline commitment though.) Aug 24, 2020
@evandam

evandam commented Dec 9, 2020

Hey @yishan-lin, I was just curious if there are any updates on the Airflow integration? We would love to see a Nomad executor!

@retarpt

retarpt commented May 24, 2021

Hi, does anyone here have experience using Nomad for scheduling Airflow tasks (or vice-versa)? I am looking to constrain resources of individual tasks within an Airflow DAG by isolating them with cgroups and namespaces provided by Nomad's exec driver. Any help, resources, or advice would be so very much appreciated! Thank you, all.

@Oloremo

Oloremo commented Jun 30, 2021

Interested in that as well

benbuzbee pushed a commit to benbuzbee/nomad that referenced this issue Jul 21, 2022
@ahmedwonolo

ahmedwonolo commented Aug 8, 2022

Any update on this?
