
Support for multiple job dependencies #279

Closed
wants to merge 20 commits into from

Conversation

selwin (Collaborator) commented Oct 25, 2013

I started with @jchia's awesome initial pull request and reworked some parts to make it more readable (hopefully).

As far as I'm concerned, this PR is mostly complete except for some pipelining that we need to do for efficiency reasons.

Would be great if @nvie and @jchia can do another quick review on this PR so we can get this merged and get 0.4 out the door.

nvie (Collaborator) commented Oct 26, 2013

This looks great, @selwin. I'm terribly busy these days with non-open source stuff. Thanks for keeping the shop open while I'm out :) I might look into this either tomorrow or Monday. Please excuse me.

selwin (Collaborator, Author) commented Oct 26, 2013

@nvie no worries :). If you have the time, do take a peek also at #233 and #152

selwin (Collaborator, Author) commented Oct 27, 2013

@jchia I have addressed most of your comments, except for job saving being moved outside the pipeline. When refactoring, I moved the logic for checking remaining dependencies into job.register_dependencies for better separation of concerns, readability, and testability.

With the way things are now, I'm not sure how we can pipeline the dependency registration and job saving while keeping the APIs simple and elegant. Any suggestions?
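One possible shape (purely illustrative, not rq's actual API): let save() and register_dependencies() accept an optional pipeline argument and leave execution to the caller, so both land in one round trip. The sketch below models Redis with an in-memory dict so the idea is runnable; all key names and method signatures here are assumptions.

```python
class FakePipeline:
    """Buffers commands and applies them in one shot, like MULTI/EXEC."""
    def __init__(self, store):
        self.store = store
        self.commands = []

    def sadd(self, key, value):
        self.commands.append(('sadd', key, value))

    def hset(self, key, field, value):
        self.commands.append(('hset', key, field, value))

    def execute(self):
        for cmd, key, *args in self.commands:
            if cmd == 'sadd':
                self.store.setdefault(key, set()).add(args[0])
            else:
                self.store.setdefault(key, {})[args[0]] = args[1]
        self.commands = []


class Job:
    def __init__(self, job_id, depends_on=()):
        self.id = job_id
        self.depends_on = list(depends_on)

    def save(self, pipeline):
        # Persist job state; only queued when the caller executes.
        pipeline.hset('rq:job:%s' % self.id, 'status', 'deferred')

    def register_dependencies(self, pipeline):
        # Record both the forward and the reverse relation.
        for dep_id in self.depends_on:
            pipeline.sadd('rq:job:%s:remaining' % self.id, dep_id)
            pipeline.sadd('rq:job:%s:dependents' % dep_id, self.id)


store = {}
pipe = FakePipeline(store)
job = Job('a', depends_on=['b', 'c'])
job.register_dependencies(pipeline=pipe)
job.save(pipeline=pipe)
pipe.execute()  # one round trip applies everything
print(sorted(store['rq:job:a:remaining']))  # ['b', 'c']
```

The API stays simple because callers who don't care about batching can be given a default pipeline that executes immediately.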

jchia (Contributor) commented Oct 28, 2013

I think from among interface simplicity, implementation simplicity, and performance, we can only have two. I thought of a way to get API simplicity and performance, but the implementation seemed rather more involved than the current one.

jchia (Contributor) commented Oct 29, 2013

BTW, I mean this generally for rq, not just the dependency registration and job saving.

@@ -367,6 +380,8 @@ def cancel(self):
without worrying about the internals required to implement job
cancellation. Technically, this call is (currently) the same as just
deleting the job hash.

NOTE: Any job that depends on this job becomes orphaned.
Collaborator commented on the diff:

How are these cleaned up?

nvie (Collaborator) commented Oct 29, 2013

My general take on this pull request is that the code is pretty hard to read if you don't know what we're trying to achieve here. I'm hesitant to pull this in its current state, as it would basically make things harder to change later on. Also, I think performance has a much lower priority right now than readability of the implementation, at least in the first version. Let's get it correct first, with a reasonable amount of certainty.

I don't think the problem at hand is complex enough to justify this hairy implementation.

In my blunt opinion, I think these things are important:

  1. Every job should get a set containing all of the job's dependencies. AFAIC, this set can live in a separate key, next to the job's hash—it doesn't need to be part of it.
  2. Every job should get a collection of all of the job's "dependents". This can also live next to the job hash. It should be a list rather than a set, though, as order matters here.
  3. Personally, I find the naming pretty confusing. Dependencies and "dependents" are really similar and hard to keep apart. I think I like the more verbose "reverse_dependencies" better, still. It takes no mental effort to distinguish it from the dependencies set.
  4. Whenever you add a dependency to a job, you need to immediately (atomically!) add the reverse relation to some other "reverse dependency" set. Ideally, this atomic action would be taken care of in a method explicitly. Now, this logic is a bit scattered and depends on register_dependencies() being called at the correct time.
  5. Similarly, when removing a job dependency, the reverse relation needs to be removed as well (atomically).
  6. Take into account cascading relationships. When we have a chain of dependencies like A <- B <- C (A goes first), and A gets canceled, it should cascade-cancel B and C as well, I think, so we don't leak Redis memory. If the situation is like A <- B, B <- C, D <- C, and A gets cancelled, B and C should be cascade-cancelled, too, but D should continue with an empty reverse_dependencies list.
  7. I think failure, as opposed to cancellation, is a different case. When a job fails that others depend on, it should go to the failed queue like normal, and support being re-queued. This should semantically have the same effect as its initial run, meaning jobs that depend on it get executed normally. If they fail, the same semantics apply throughout the whole chain.
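Points 4-6 above could be sketched roughly as follows, with plain dicts of sets standing in for the Redis keys. In real Redis, each paired update would run inside one MULTI/EXEC pipeline to stay atomic; all names here are illustrative, not rq's API.

```python
dependencies = {}          # job -> set of jobs it still waits on
reverse_dependencies = {}  # job -> set of jobs waiting on it

def add_dependency(job, dep):
    # In Redis, this pair of SADDs would run in one pipeline so the
    # forward and reverse relations can never drift apart (point 4).
    dependencies.setdefault(job, set()).add(dep)
    reverse_dependencies.setdefault(dep, set()).add(job)

def cancel(job, cancelled=None):
    if cancelled is None:
        cancelled = set()
    cancelled.add(job)
    # Detach this job from anything it was still waiting on (point 5).
    for dep in dependencies.pop(job, set()):
        reverse_dependencies.get(dep, set()).discard(job)
    # Cascade-cancel everything that was waiting on this job (point 6),
    # since one of its dependencies can now never finish.
    for dependent in reverse_dependencies.pop(job, set()):
        if dependent not in cancelled:
            cancel(dependent, cancelled)
    return cancelled

# A <- B, B <- C, D <- C: cancelling A takes B and C with it, but D
# is left alone with an empty set of dependents.
add_dependency('B', 'A')
add_dependency('C', 'B')
add_dependency('C', 'D')
cancelled_jobs = cancel('A')
print(sorted(cancelled_jobs))  # ['A', 'B', 'C']
```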

@selwin selwin mentioned this pull request Nov 8, 2013
selwin (Collaborator, Author) commented Dec 6, 2013

Sorry but I've been a bit busy in the past few weeks. I'll try to find some time to work on it in the next few weeks. What are your thoughts about releasing the current master as 0.4.0?

We can then work on getting this pull request ready for 0.4.1. The depends_on syntax will still be backwards compatible anyway.

selwin (Collaborator, Author) commented Dec 9, 2013

@nvie mind taking another look? The goal of my commit here is to make the logic easier to review. Replying to your concerns above:

  1. We already have job dependencies stored in a separate key job.remaining_dependencies_key. I assume you're talking about job.dependencies which I'll address in my reply to number 4.
  2. I'm not sure why order should matter as we just need to add or remove them from the set. Reverse dependencies may finish in any order.
  3. Done
  4. Isn't this already implemented using a combination of pipeline and watch in register_dependencies? As far as I know, this is the only place where dependencies and reverse dependencies are added. We have job.dependencies, but it's never really used to check for dependencies at run time and is only there for bookkeeping purposes. Should we just remove job.dependencies?
  5. Is removing reverse relation necessary? If the concern is to not leak Redis memory, perhaps we should just remove it during deletion.
  6. 0.4.1? ;)

selwin (Collaborator, Author) commented Dec 23, 2013

Ping @nvie :)

nvie (Collaborator) commented Jan 6, 2014

I'm really trying to find some time this week to properly look at this, or add some extra details I'd wish to see addressed. One thing in particular I got as a suggestion is to support on_success and on_failure dependencies, which I like too. This is on my mind to verify / add:

  • on success / on failure dependencies
  • making sure this implementation does not "leak" any jobs
  • general naming convention

Conflicts:
	rq/job.py
	rq/worker.py
	tests/test_job.py
olingerc (Contributor) commented:

I really like the current depends_on feature and am using it heavily in my project. Specifically, I use depends_on to queue an "orchestrator" job after each job. My jobs are organized in job groups with complex workflows: one job can depend on multiple others, and a finished job can launch multiple others. The orchestrator handles all of that.
I added a few lines of code to the push_job_id function in the queue module. Since the worker currently queues the dependent job in the same queue as the job that triggers it, it's possible for it to get queued behind many other jobs when there are few workers, blocking the job group. So when an orchestrator job is queued, I use LPUSH to put it in front of all the others.

    def push_job_id(self, job_id, job_description=''):  # noqa
        """Pushes a job ID on the corresponding Redis queue."""
        # If it is the orchestrator job, it should not have to wait
        # on other tasks in the same queue.
        if job_description.startswith('bioseq_tasks.orchestrator.orchestrator'):
            self.connection.lpush(self.key, job_id)
        else:
            self.connection.rpush(self.key, job_id)

https://github.com/olingerc/rq/blob/master/rq/queue.py#L138

Do you think it would be helpful to have this as a new parameter for the enqueue function? Something like "immediate_execution" or "queue_before_all". I could also imagine letting the job decide which queue it gets pushed into (I have a special "fast running" queue).
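For illustration, the suggested parameter might look like an explicit flag on push_job_id rather than matching on the job description (rq later grew a similar `at_front` option on enqueue). The snippet below is a hypothetical sketch with an in-memory list standing in for the Redis list.

```python
class Queue:
    def __init__(self):
        self.jobs = []  # index 0 is the head of the queue

    def push_job_id(self, job_id, at_front=False):
        """LPUSH when at_front is set, RPUSH otherwise."""
        if at_front:
            self.jobs.insert(0, job_id)   # connection.lpush(self.key, job_id)
        else:
            self.jobs.append(job_id)      # connection.rpush(self.key, job_id)


q = Queue()
q.push_job_id('slow-1')
q.push_job_id('slow-2')
q.push_job_id('orchestrator', at_front=True)
print(q.jobs)  # ['orchestrator', 'slow-1', 'slow-2']
```

Letting the caller pass the flag keeps the queue module free of application-specific job names like the bioseq_tasks prefix above.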

jchia (Contributor) commented Mar 27, 2014

@selwin In 26add7, the way you broke the original bump_reverse_dependencies() down into remove_dependency() and friends is not safe with multiple workers. If job A depends on jobs B and C, and B and C finish at around the same time on two different workers, both workers may try to enqueue A.
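The race and its usual fix can be sketched as follows: the "remove my id, then check what's left" step has to be a single atomic operation (in Redis, SREM plus SCARD inside one MULTI/EXEC), so that exactly one finisher observes the set becoming empty. In this illustrative model a lock stands in for that atomicity; all names are hypothetical.

```python
import threading

remaining = {'A': {'B', 'C'}}  # A still waits on B and C
enqueued = []
lock = threading.Lock()  # stands in for Redis's atomic MULTI/EXEC

def on_finish(finished_dep):
    # Remove ourselves and read the remaining count in ONE atomic step.
    # Split into two separate Redis calls, both workers could observe a
    # non-empty set (or both an empty one) and A would be enqueued twice.
    with lock:
        deps = remaining['A']
        deps.discard(finished_dep)
        left = len(deps)
    # Exactly one finisher sees the set become empty.
    if left == 0:
        enqueued.append('A')

threads = [threading.Thread(target=on_finish, args=(d,)) for d in ('B', 'C')]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(enqueued)  # ['A']
```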

jchia (Contributor) commented Mar 27, 2014

@selwin It also fails to delete the reverse_dependencies_key.

jchia (Contributor) commented Mar 27, 2014

One more problem I realized: if A depends on B, A will ultimately be enqueued to B's queue, even though enqueue_call() was called with another queue.
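One hypothetical fix is to record the destination queue on the job when enqueue_call() runs and use that recorded origin once its dependencies resolve, instead of inheriting the finished dependency's queue. The sketch below is illustrative, not rq's implementation.

```python
class Job:
    def __init__(self, job_id, origin):
        self.id = job_id
        self.origin = origin  # queue name captured at enqueue_call() time


queues = {'high': [], 'low': []}

def enqueue_job(job):
    # Always push to the queue the job was originally destined for,
    # not the queue the finished dependency happened to run on.
    queues[job.origin].append(job.id)


# A was enqueued to 'high' even though its dependency B ran on 'low'.
a = Job('A', origin='high')
enqueue_job(a)
print(queues)  # {'high': ['A'], 'low': []}
```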

@selwin selwin mentioned this pull request Mar 27, 2014
aneilbaboo commented:
@selwin - any plans to complete multi-dependency - or were there insurmountable problems? I'd be willing to take a crack at it if you think it's worth it.

selwin (Collaborator, Author) commented Aug 5, 2014

@aneilbaboo no real technical hurdles; I just haven't had the need to finish the implementation since I don't currently use this feature, so I've been working on other parts of RQ.

From my end, it's just a matter of tidying things up to find the cleanest possible implementation. @nvie has some concerns regarding the way job dependency is implemented, so I'd advise you to start with #387 before continuing this pull request :)

jim-bo commented Aug 7, 2014

I would second the request to merge the multi-dependency branch. I attempted to merge it myself but failed quickly, I think due to my unfamiliarity with the codebase.

ThisGuyCodes (Contributor) commented:

Shameless bump. Any chance of this getting some more love? If not, and I were to tackle this, is the implementation in this branch still applicable, or would a refactoring of the dependency system be prudent first?

DerekMarshall commented:
+1 for getting this merged - my work-arounds suck

julen commented Feb 20, 2015

Besides the merge conflicts, what's holding this from being merged?

olingerc (Contributor) commented:

Is there something holding this back? I could get rid of a whole module in my application once this is introduced. I offer my help if you need it.
(In fact, this is another shameless bump.)

yaghmr commented Jul 23, 2015

What's the status of this ticket? What's holding this from being merged?

NewbiZ commented Oct 27, 2015

I'm also very interested in this issue :) Any ETA?
