Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce planner_multi_t #349

Merged
merged 7 commits into from
Jul 16, 2018
Merged

Conversation

dongahn
Copy link
Member

@dongahn dongahn commented May 26, 2018

This PR is still experimental: I thought I should post this now as I will be busy with travel for awhile.

This PR mainly addresses a minimum time resource tree problem within the planner layer. This reverse search tree is designed to satisify a query like "when is the earliest scheduled point at which I can schedule n amounts of resources" in O(lg N) time. However, this advanced algorithm only works when it deals with a single resource type and doesn't work correctly when it deals with multiple resource types.

In response, this PR limits the planner layer such that that it only manages a single resource type. Instead, it adds planner_multi_t layer to handle multiple resources. It also adjusted the rest of the resource code for this change.

Finally a bunch of test cases has also been added.

When I circle back to this, I will do further cleanup and tighten some loosen ends to ensure correctness for a few more corner cases.

@codecov-io
Copy link

codecov-io commented May 26, 2018

Codecov Report

Merging #349 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #349   +/-   ##
=======================================
  Coverage   73.35%   73.35%           
=======================================
  Files          59       59           
  Lines        9829     9829           
=======================================
  Hits         7210     7210           
  Misses       2619     2619

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b89717f...6448e7f. Read the comment docs.

@coveralls
Copy link

coveralls commented May 26, 2018

Coverage Status

Coverage increased (+0.6%) to 75.421% when pulling 6448e7f on dongahn:planner_fix into b89717f on flux-framework:master.

@dongahn
Copy link
Member Author

dongahn commented Jul 5, 2018

@SteVwonder and I discussed this. We will try to land this with a minimal amount of work so that @SteVwonder can start to integrate the hwloc reader into the new graph-based infrastructure.

@SteVwonder
Copy link
Member

@dongahn can you rebase onto master? I can merge afterwards.

Also, you mentioned in the PR description

When I circle back to this, I will ... tighten some loosen ends to ensure correctness for a few more corner cases.

Do you happen to remember what those corner cases were? If so, can you document them in an issue so that we do not forget to circle back around to them?

@dongahn
Copy link
Member Author

dongahn commented Jul 5, 2018

Yes, I will do those. Thanks!

@dongahn
Copy link
Member Author

dongahn commented Jul 6, 2018

@SteVwonder: OK. I did rebase this to the upstream master. I think the main thing was to change some method names in traverser classes and etc to increase readability and add generally more testing. Both can come later.

f99c21e is so that users can program prune filters more effectively. But this can also come much later.

Perhaps I can change this commit as "add a placeholder command line option" This way, completing this functionality by me don't have to be in the way of your progress.

@SteVwonder
Copy link
Member

Sounds like a good idea

Copy link
Member

@SteVwonder SteVwonder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this abstraction into single and multi! Most of the complexity has been isolated to the "single" case and the "multi" case is a straightforward use of the "single" case.

My main review comments are really just nitpicking the naming of the "resource counts". I think the phrase request is overloaded and causes a bit of confusion in places. resource_counts and request_counts seem more clear to me. There are also a few inconsistencies in the naming across the "single" and "multi" interfaces as well as within the "multi" interfaces.

The testing looks good for now. It seems to cover the common cases. As per the discussion earlier today, the extended testing of the corner cases will be saved for later. For example, I don't think passing a duration of 0 is tested.


typedef struct request {
int64_t on_or_after;
uint64_t duration;
resource_array_t resources;
size_t dimension;
int64_t request;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been tripping over the use of request in the planner a little bit. It is an internal struct, a member of that struct, and an argument in many of the public functions. So you end up with things like request = ctx->current_request->request at line 794. I think the struct name makes a lot of sense, but maybe size or count would make more sense here and in the function arguments. So that would turn the previous example into size = ctx->current_request->size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep in line with what is done in planner_multi.h maybe resource_count would be best?

*/
int planner_multi_avail_during (planner_multi_t *ctx, int64_t at,
uint64_t duration,
const uint64_t *resource_requests, size_t len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comments and other functions, resource_requests -> resource_counts

Copy link
Member

@SteVwonder SteVwonder Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on further thought, maybe for the functions that consume the counts, resource_requests makes more sense. And for functions that copy the counts into the array, resource_counts makes more sense.

I am not exactly sure, and I don't have super strong feelings either way. But there is an inconsistency between the function argument and the comments.

* range, e.g., reserving more than available.
*/
int64_t planner_multi_add_span (planner_multi_t *ctx, int64_t start_time,
uint64_t duration, const uint64_t *request,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least should be changed to the plural (requests), but I think like in planner_multi_avail_time_first and planner_multi_avail_during, the naming request_counts would fit too.

@dongahn
Copy link
Member Author

dongahn commented Jul 6, 2018

@SteVwonder: Thank you for the review. Yes, I will have one more pass on those names and at least have those names consistent.

@SteVwonder
Copy link
Member

Does this PR fix #319?

@dongahn
Copy link
Member Author

dongahn commented Jul 10, 2018

Yes.

Solve a problem where the mintime resource tree within planner
does not work for multi-resource types.
The problem is described in Issue flux-framework#319.

Change the API and implementation of the planner layer
so that it now only concerns only a single resource type.
With this change, the upper layer (resource-query etc) will
now have to use multiple planner objects if it needs to
track multiple resource types.

Note that we considered other alternatives: e.g.,
introducing kd tree or  making each scheduled point node in the
resource binary search tree to maintain sub-resource BST
for the next significant resource types. But these
alternatives will significantly increase the software
complexity for a single software layer, and we decide
to push the complexity to the upper layer.

Finally, adjust the unit test case as well.
Add an option to allow users to configure the planner-based
caches to speed up graph walks in the future.
@dongahn
Copy link
Member Author

dongahn commented Jul 14, 2018

@SteVwonder: I think I addressed all of your suggestions. Please take a look and merge if this looks good to you. Thanks.

@SteVwonder
Copy link
Member

Looks like one of the Travis jobs stalled out and didn't even produce any output. I just restart it. Once Travis turns green, I will merge. Thanks @dongahn!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants