Introduce planner_multi_t #349

dongahn · 2018-05-26T07:27:13Z

This PR is still experimental: I thought I should post this now as I will be busy with travel for awhile.

This PR mainly addresses a minimum time resource tree problem within the planner layer. This reverse search tree is designed to satisify a query like "when is the earliest scheduled point at which I can schedule n amounts of resources" in O(lg N) time. However, this advanced algorithm only works when it deals with a single resource type and doesn't work correctly when it deals with multiple resource types.

In response, this PR limits the planner layer such that that it only manages a single resource type. Instead, it adds planner_multi_t layer to handle multiple resources. It also adjusted the rest of the resource code for this change.

Finally a bunch of test cases has also been added.

When I circle back to this, I will do further cleanup and tighten some loosen ends to ensure correctness for a few more corner cases.

codecov-io · 2018-05-26T07:44:35Z

Codecov Report

Merging #349 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #349   +/-   ##
=======================================
  Coverage   73.35%   73.35%           
=======================================
  Files          59       59           
  Lines        9829     9829           
=======================================
  Hits         7210     7210           
  Misses       2619     2619

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b89717f...6448e7f. Read the comment docs.

coveralls · 2018-05-26T07:44:48Z

Coverage increased (+0.6%) to 75.421% when pulling 6448e7f on dongahn:planner_fix into b89717f on flux-framework:master.

dongahn · 2018-07-05T20:46:12Z

@SteVwonder and I discussed this. We will try to land this with a minimal amount of work so that @SteVwonder can start to integrate the hwloc reader into the new graph-based infrastructure.

SteVwonder · 2018-07-05T21:38:19Z

@dongahn can you rebase onto master? I can merge afterwards.

Also, you mentioned in the PR description

When I circle back to this, I will ... tighten some loosen ends to ensure correctness for a few more corner cases.

Do you happen to remember what those corner cases were? If so, can you document them in an issue so that we do not forget to circle back around to them?

dongahn · 2018-07-05T21:39:09Z

Yes, I will do those. Thanks!

dongahn · 2018-07-06T04:30:09Z

@SteVwonder: OK. I did rebase this to the upstream master. I think the main thing was to change some method names in traverser classes and etc to increase readability and add generally more testing. Both can come later.

f99c21e is so that users can program prune filters more effectively. But this can also come much later.

Perhaps I can change this commit as "add a placeholder command line option" This way, completing this functionality by me don't have to be in the way of your progress.

SteVwonder · 2018-07-06T05:36:00Z

Sounds like a good idea

SteVwonder

I really like this abstraction into single and multi! Most of the complexity has been isolated to the "single" case and the "multi" case is a straightforward use of the "single" case.

My main review comments are really just nitpicking the naming of the "resource counts". I think the phrase request is overloaded and causes a bit of confusion in places. resource_counts and request_counts seem more clear to me. There are also a few inconsistencies in the naming across the "single" and "multi" interfaces as well as within the "multi" interfaces.

The testing looks good for now. It seems to cover the common cases. As per the discussion earlier today, the extended testing of the corner cases will be saved for later. For example, I don't think passing a duration of 0 is tested.

SteVwonder · 2018-07-06T19:24:44Z

resource/planner/planner.c


 typedef struct request {
    int64_t on_or_after;
    uint64_t duration;
-    resource_array_t resources;
-    size_t dimension;
+    int64_t request;


I've been tripping over the use of request in the planner a little bit. It is an internal struct, a member of that struct, and an argument in many of the public functions. So you end up with things like request = ctx->current_request->request at line 794. I think the struct name makes a lot of sense, but maybe size or count would make more sense here and in the function arguments. So that would turn the previous example into size = ctx->current_request->size.

To keep in line with what is done in planner_multi.h maybe resource_count would be best?

SteVwonder · 2018-07-06T20:07:55Z

resource/planner/planner_multi.h

+ */
+int planner_multi_avail_during (planner_multi_t *ctx, int64_t at,
+                                uint64_t duration,
+                                const uint64_t *resource_requests, size_t len);


Based on the comments and other functions, resource_requests -> resource_counts

Actually, on further thought, maybe for the functions that consume the counts, resource_requests makes more sense. And for functions that copy the counts into the array, resource_counts makes more sense.

I am not exactly sure, and I don't have super strong feelings either way. But there is an inconsistency between the function argument and the comments.

SteVwonder · 2018-07-06T20:13:00Z

resource/planner/planner_multi.h

+ *                                  range, e.g., reserving more than available.
+ */
+int64_t planner_multi_add_span (planner_multi_t *ctx, int64_t start_time,
+                                uint64_t duration, const uint64_t *request,


At least should be changed to the plural (requests), but I think like in planner_multi_avail_time_first and planner_multi_avail_during, the naming request_counts would fit too.

dongahn · 2018-07-06T21:00:32Z

@SteVwonder: Thank you for the review. Yes, I will have one more pass on those names and at least have those names consistent.

SteVwonder · 2018-07-10T17:31:22Z

Does this PR fix #319?

dongahn · 2018-07-10T17:32:25Z

Yes.

Solve a problem where the mintime resource tree within planner does not work for multi-resource types. The problem is described in Issue flux-framework#319. Change the API and implementation of the planner layer so that it now only concerns only a single resource type. With this change, the upper layer (resource-query etc) will now have to use multiple planner objects if it needs to track multiple resource types. Note that we considered other alternatives: e.g., introducing kd tree or making each scheduled point node in the resource binary search tree to maintain sub-resource BST for the next significant resource types. But these alternatives will significantly increase the software complexity for a single software layer, and we decide to push the complexity to the upper layer. Finally, adjust the unit test case as well.

Add an option to allow users to configure the planner-based caches to speed up graph walks in the future.

dongahn · 2018-07-14T16:08:48Z

@SteVwonder: I think I addressed all of your suggestions. Please take a look and merge if this looks good to you. Thanks.

SteVwonder · 2018-07-15T19:24:14Z

Looks like one of the Travis jobs stalled out and didn't even produce any output. I just restart it. Once Travis turns green, I will merge. Thanks @dongahn!

SteVwonder added the Status: In Progress label May 26, 2018

dongahn force-pushed the planner_fix branch 2 times, most recently from 182db05 to 5335e23 Compare May 26, 2018 17:08

dongahn force-pushed the planner_fix branch from 5335e23 to f99c21e Compare July 6, 2018 04:24

SteVwonder reviewed Jul 6, 2018

View reviewed changes

SteVwonder mentioned this pull request Jul 10, 2018

resource: software engineering clean up #356

Closed

dongahn added 7 commits July 14, 2018 09:04

planner: Add a thin abstraction to manage multiple planners

5c25efc

build: Add planner_multi support

dbb0793

test: Add test cases for planner_multi

b370981

resource: Use planner_multi when dealing with multiple resource types

240d2dc

test: fix a minor comment typo

d250bd5

query: Add a placeholder for prune filters configuration cli option

6448e7f

Add an option to allow users to configure the planner-based caches to speed up graph walks in the future.

dongahn force-pushed the planner_fix branch from f99c21e to 6448e7f Compare July 14, 2018 16:06

dongahn added enhancement and removed Status: In Progress labels Jul 14, 2018

SteVwonder mentioned this pull request Jul 15, 2018

travis: change email notifications to go to the flux-sched-dev listserv #353

Merged

SteVwonder merged commit 95862f6 into flux-framework:master Jul 16, 2018

dongahn mentioned this pull request Jul 17, 2018

Planner: Need an algorithmic fix in mintime resource tree for multiple resource types #319

Closed

dongahn mentioned this pull request Sep 20, 2018

Need FY-end performance analysis + tuning for resource #390

Closed

dongahn mentioned this pull request Oct 31, 2018

Wire in --prune-filter option for resource-query and resource matching service #399

Closed

dongahn deleted the planner_fix branch July 13, 2019 21:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce planner_multi_t #349

Introduce planner_multi_t #349

dongahn commented May 26, 2018

codecov-io commented May 26, 2018 •

edited

coveralls commented May 26, 2018 •

edited

dongahn commented Jul 5, 2018

SteVwonder commented Jul 5, 2018

dongahn commented Jul 5, 2018

dongahn commented Jul 6, 2018

SteVwonder commented Jul 6, 2018

SteVwonder left a comment •

edited

SteVwonder Jul 6, 2018

SteVwonder Jul 6, 2018

SteVwonder Jul 6, 2018

SteVwonder Jul 6, 2018 •

edited

SteVwonder Jul 6, 2018

dongahn commented Jul 6, 2018

SteVwonder commented Jul 10, 2018

dongahn commented Jul 10, 2018

dongahn commented Jul 14, 2018

SteVwonder commented Jul 15, 2018

Introduce planner_multi_t #349

Introduce planner_multi_t #349

Conversation

dongahn commented May 26, 2018

codecov-io commented May 26, 2018 • edited

Codecov Report

coveralls commented May 26, 2018 • edited

dongahn commented Jul 5, 2018

SteVwonder commented Jul 5, 2018

dongahn commented Jul 5, 2018

dongahn commented Jul 6, 2018

SteVwonder commented Jul 6, 2018

SteVwonder left a comment • edited

Choose a reason for hiding this comment

SteVwonder Jul 6, 2018

Choose a reason for hiding this comment

SteVwonder Jul 6, 2018

Choose a reason for hiding this comment

SteVwonder Jul 6, 2018

Choose a reason for hiding this comment

SteVwonder Jul 6, 2018 • edited

Choose a reason for hiding this comment

SteVwonder Jul 6, 2018

Choose a reason for hiding this comment

dongahn commented Jul 6, 2018

SteVwonder commented Jul 10, 2018

dongahn commented Jul 10, 2018

dongahn commented Jul 14, 2018

SteVwonder commented Jul 15, 2018

codecov-io commented May 26, 2018 •

edited

coveralls commented May 26, 2018 •

edited

SteVwonder left a comment •

edited

SteVwonder Jul 6, 2018 •

edited