Support for multiple job dependencies #279
Conversation
This looks great, @selwin. I'm terribly busy these days with non-open source stuff. Thanks for keeping the shop open while I'm out :) I might look into this either tomorrow or Monday. Please excuse me.
@jchia I have addressed most of your comments except for job saving being moved outside the pipeline. When refactoring, I moved the logic of checking for remaining dependencies to ... With the way things are now, I'm not sure how we can pipeline the dependency registration and job saving while keeping the APIs simple and elegant. Any suggestions?
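(For illustration: "pipelining the dependency registration and job saving" means batching both writes into one Redis round trip. Below is a hypothetical sketch, not rq's actual API; the key names and the helper are invented, and a tiny recording stand-in replaces a real redis-py pipeline so the snippet runs standalone.)

```python
class RecordingPipeline:
    """Stand-in for redis-py's Pipeline so this sketch runs without a
    server; in real code you would use connection.pipeline()."""

    def __init__(self):
        self.commands = []

    def sadd(self, key, member):
        self.commands.append(('SADD', key, member))

    def hset(self, key, field, value):
        self.commands.append(('HSET', key, field, value))

    def execute(self):
        # A real pipeline sends all queued commands in one round trip.
        return self.commands


def save_job_with_dependencies(pipe, job_id, depends_on):
    """Hypothetical helper: register job_id as a dependent of each parent
    job and save the job hash, all in the same batched round trip."""
    for parent_id in depends_on:
        pipe.sadd('rq:job:%s:dependents' % parent_id, job_id)
    pipe.hset('rq:job:%s' % job_id, 'status', 'deferred')
    return pipe.execute()
```

For example, `save_job_with_dependencies(RecordingPipeline(), 'A', ['B', 'C'])` queues two SADDs and one HSET before a single `execute()`, which is the round-trip saving the pipelining is meant to buy.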
I think that of interface simplicity, implementation simplicity, and performance, we can only have two. I thought of a way to get API simplicity and performance, but the implementation seemed rather more involved than the current one.
BTW, I mean this generally for rq, not just the dependency registration and job saving.
@@ -367,6 +380,8 @@ def cancel(self):
    without worrying about the internals required to implement job
    cancellation. Technically, this call is (currently) the same as just
    deleting the job hash.

    NOTE: Any job that depends on this job becomes orphaned.
How are these cleaned up?
My general take on this pull request is that the code is pretty hard to read when you don't know what we're trying to achieve here. I'm hesitant to pull this in its current state, as it would basically make things harder to change later on. Also, I think performance has much lower priority right now than readability of the implementation, at least in the first version. Let's get it correct first, with a reasonable amount of certainty. I don't think the problem at hand is complex enough to justify this hairy implementation. In my blunt opinion, I think these things are important:
Sorry, but I've been a bit busy in the past few weeks. I'll try to find some time to work on it in the next few weeks. What are your thoughts about releasing the current master as 0.4.0? We can then work on getting this pull request ready for 0.4.1.
@nvie mind taking another look? The goal of my commit here is to make the logic easier to review. Replying to your concerns above:
Ping @nvie :)
I'm really trying to find some time this week to properly look at this, or to add some extra details I'd like to add. One thing in particular that I got as a suggestion is to support on_success and on_failure dependencies, which I like, too. This is on my mind to verify / add:
Conflicts:
    rq/job.py
    rq/worker.py
    tests/test_job.py
I really like the current depends_on function and am using it heavily in my project. Specifically, I use depends_on to queue an "orchestrator" job after each job. My jobs are organized in job groups with complex workflows. One job can depend on multiple others, or a finished job can launch multiple others. This orchestrator handles all that.

    def push_job_id(self, job_id, job_description=''):  # noqa
        """Pushes a job ID on the corresponding Redis queue."""
        # If it is the orchestrator job, it should not have to wait
        # on other tasks in the same queue.
        if job_description.startswith('bioseq_tasks.orchestrator.orchestrator'):
            self.connection.lpush(self.key, job_id)
        else:
            self.connection.rpush(self.key, job_id)

https://github.com/olingerc/rq/blob/master/rq/queue.py#L138

Do you think it would be helpful to have this as a new parameter for the enqueue function? Like "immediate_execution" or "queue_before_all". I could also imagine that the job could decide which queue it gets pushed into (I have a special "fast running" queue).
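(The suggested enqueue parameter could look roughly like this. A sketch only, not rq's actual API: the parameter name `at_front` and the in-memory `FakeRedisList` stand-in are assumptions made so the snippet is self-contained.)

```python
class FakeRedisList:
    """In-memory stand-in for the two Redis list commands used below,
    so this sketch runs without a Redis server."""

    def __init__(self):
        self.items = []

    def lpush(self, key, value):
        self.items.insert(0, value)  # LPUSH puts the value at the head

    def rpush(self, key, value):
        self.items.append(value)     # RPUSH puts the value at the tail


def push_job_id(connection, key, job_id, at_front=False):
    """Hypothetical queue helper: push a job ID onto the queue,
    optionally at the front so workers pick it up next."""
    if at_front:
        connection.lpush(key, job_id)
    else:
        connection.rpush(key, job_id)
```

With workers popping from the head of the list, a job pushed with `at_front=True` runs before everything already queued, which matches the orchestrator behavior described above without hard-coding a job name into the queue.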
@selwin In 26add7, the way you broke down the original bump_reverse_dependencies() into remove_dependency() and friends is not safe with multiple workers. If job A depends on jobs B and C, and B and C finish at around the same time on two different workers, both workers may try to enqueue A.
@selwin It also fails to delete the reverse_dependencies_key.
One more problem I realized is that if A depends on B, A will ultimately get truly enqueued to B's queue even though enqueue_call() was called with another queue.
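(The race described above is why "remove a finished dependency, then check whether any remain" has to be atomic: in Redis terms, SREM and SCARD inside one MULTI/EXEC, so that exactly one worker observes the set becoming empty. A thread-based sketch of that invariant, illustrative only and not rq's code:)

```python
import threading


class DependencyTracker:
    """Tracks the unfinished dependencies of one job.

    remove_dependency() returns True for exactly one caller: the one
    whose removal empties the set. Under the lock this mirrors running
    SREM + SCARD in a single Redis transaction, so only one worker
    goes on to enqueue the dependent job."""

    def __init__(self, dependencies):
        self._remaining = set(dependencies)
        self._lock = threading.Lock()

    def remove_dependency(self, finished_job_id):
        with self._lock:
            removed = finished_job_id in self._remaining
            self._remaining.discard(finished_job_id)
            # True only for the call that removed the *last* dependency;
            # a repeated or unknown job_id can never return True.
            return removed and not self._remaining
```

If B's worker and C's worker both call `remove_dependency`, the lock serializes them: one sees a non-empty set and backs off, the other empties the set and is the sole enqueuer of A.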
@selwin - any plans to complete multi-dependency, or were there insurmountable problems? I'd be willing to take a crack at it if you think it's worth it.
@aneilbaboo no real technical hurdles; I just haven't had the need to finish up the implementation since I currently don't need this feature, so I'm working on other features in RQ. From my end, it's just a matter of tidying things up to find the cleanest possible implementation. @nvie has some concerns regarding the way job dependency is implemented, so I'd advise you to start with #387 before continuing this pull request :)
I would second the request for a multi-dependency branch merge. I attempted to merge them myself but failed quickly, I think due to my unfamiliarity.
Shameless bump. Any chance of this getting some more love? If not, and I were to tackle this, is the implementation in this branch still applicable, or would a refactoring of the dependency system be prudent before it is attempted?
+1 for getting this merged. My workarounds suck.
Besides the merge conflicts, what's holding this back from being merged?
Is there something holding this back? I could get rid of a whole module in my application when this is introduced. I offer my help if you need it.
What's the status of this ticket? What's holding this from being merged?
I'm also very interested in this issue :) Any ETA?
I started with @jchia's awesome initial pull request and reworked some parts to make it more readable (hopefully).
As far as I'm concerned, this PR is mostly complete except for some pipelining that we need to do for efficiency reasons.
Would be great if @nvie and @jchia can do another quick review on this PR so we can get this merged and get 0.4 out the door.