Only reserves resources for workers that have recently called get_work #1272
Conversation
I'm confused. I thought you could only hold on to a resource by running a task that contains resources. No?
No, in get_work we also do speculative scheduling and reserve resources for higher-priority tasks. That way, if you have a worker with a lot of low-priority tasks, it can't hold on to all the resources just because it's the first to ask when one of its jobs finishes. The problem this fixes is that I was scheduling a very large backfill, and it included some high-priority tasks that blocked resource usage for the 30+ minutes that scheduling took.
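To make the behavior described above concrete, here is a minimal sketch of speculative resource reservation during get_work. The names and data shapes are illustrative only, not Luigi's actual scheduler API: pending tasks are walked in priority order, and a high-priority task that cannot run yet still reserves its resources so lower-priority tasks can't claim them first.

```python
def get_work(available, pending):
    """Return the id of the first runnable task, reserving resources
    for higher-priority tasks that cannot run yet.

    available: dict of resource name -> total units free
    pending:   list of (priority, task_id, needs) tuples, where needs
               is a dict of resource name -> units required
    """
    reserved = {}
    # Walk tasks from highest to lowest priority.
    for priority, task_id, needs in sorted(pending, key=lambda t: -t[0]):
        fits = all(
            available.get(r, 0) - reserved.get(r, 0) >= n
            for r, n in needs.items()
        )
        if fits:
            return task_id
        # Not enough free resources: reserve them anyway for this
        # higher-priority task, so no lower-priority task can grab
        # them in the meantime (the speculative part).
        for r, n in needs.items():
            reserved[r] = reserved.get(r, 0) + n
    return None


# With 1 gpu free, the high-priority task (needs 2) can't run, but it
# reserves the gpu, so the low-priority task is blocked too.
print(get_work({"gpu": 1},
               [(10, "high", {"gpu": 2}), (1, "low", {"gpu": 1})]))
```

Without the reservation step, the low-priority task would grab the single gpu here, and the high-priority task could be starved indefinitely.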
@@ -343,6 +343,7 @@ def test_do_not_lock_resources_when_not_ready(self):
        self.assertEqual('C', self.sch.get_work(worker='Y')['task_id'])

    def test_lock_resources_when_one_of_multiple_workers_is_ready(self):
        self.sch.get_work(worker='X')
Can you add a comment for these new lines explaining why you're doing them?
Even with the context of this PR, I'm not certain. I suppose it's to make the worker "active"?
yeah, comment added
I wonder if this really is a good solution. Maybe it would be better to "just implement" that the scheduling and get_work phases happen in parallel. Otherwise this LGTM.
I'd still want this solution because it helps when a worker stops requesting more jobs, for example after a SIGUSR1. I think long-term we probably want a better solution for how the reservations work, but I think that would involve basing reservations on get_work calls too.
Force-pushed from 57bc755 to 51a6c8e.
Ah, the SIGUSR1 case makes sense. I'm off now but I'll review it again and potentially merge by tomorrow.
Makes sense to me. Thanks @daveFNbuck!
Only reserves resources for workers that have recently called get_work
This prevents workers that are scheduling or being phased out from
holding on to resources that they're not ready to use yet or will never
request. To accomplish this, we only reserve resources for workers that
have recently made a get_work request. We define "recently" as within
the worker_disconnect_delay. Really this should be the wait_interval,
but the scheduler doesn't have access to that value.
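The recency check described in the commit message can be sketched as follows. This is a hedged illustration, not Luigi's actual implementation: the function names, the module-level dict, and the delay value are all hypothetical, standing in for the scheduler's per-worker bookkeeping and its worker_disconnect_delay setting.

```python
import time

# Illustrative stand-in for the scheduler's worker_disconnect_delay
# config value (the commit message notes wait_interval would be the
# more correct choice, but the scheduler can't see it).
WORKER_DISCONNECT_DELAY = 60.0  # seconds

# worker_id -> timestamp of that worker's most recent get_work call
last_get_work = {}


def record_get_work(worker_id, now=None):
    """Record that a worker just asked the scheduler for work."""
    last_get_work[worker_id] = time.time() if now is None else now


def should_reserve_for(worker_id, now=None):
    """Reserve resources only for workers that called get_work recently.

    Workers that are still scheduling, or that have stopped asking for
    work (e.g. after a SIGUSR1), fail this check and so can no longer
    pin resources they may never use.
    """
    now = time.time() if now is None else now
    last = last_get_work.get(worker_id)
    return last is not None and now - last < WORKER_DISCONNECT_DELAY
```

Usage: the scheduler would call record_get_work inside its get_work handler, and consult should_reserve_for before reserving resources for a worker's high-priority pending tasks.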