Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched: Add queue-depth=1 FCFS perf/scalability optimization #294

Merged
merged 2 commits into from Mar 23, 2018

Conversation

Projects
None yet
4 participants
@dongahn
Copy link
Contributor

dongahn commented Mar 23, 2018

Avoid traversing the entire resource tree to release reservations at every schedule loop invocation when a very simple, high performance, in-order (queue-depth=1) FCFS policy is used.

See the discussion in flux-framework/flux-core#1358 (comment)

dongahn added some commits Mar 23, 2018

sched: performance/scalability optimization for simple inorder FCFS
Avoid traversing the entire resource tree to release
reservations at every schedule loop invocation when a very simple,
high performance, in-order (queue-depth=1) FCFS policy is used.

Add get_sched_properties() into the scheduler policy plugin API
so that the scheduler framework comms module can get the properties
of the loaded policy scheduler plugin in order to turn on/off
this optimization.

Initially, only support whether the loaded sched plugin is
out-of-order scheduling capable. Backfill schedulers and FCFS
with queue-depth > 1 can always schedule jobs out of order.

Somewhat counterintuitive, but FCFS schedulers requires
the reservation capability for the general case.

If the first job doesn't find all of the required resources,
the FCFS scheduler reserves any partially found resources so that
it can move on to the next job to schedule.

The reason for this out of order behavior is that the next job
may only require resources of different type than what the first job
needs, and the fact that the first job is not scheduled
shouldn't prevent it from being scheduled -- that is,
without having to use the resources that the first job can use
at the next schedule loop.

But with queue-depth=1, this out-of-order behavior isn't
possible and we use this for optimization -- bypassing the
reservation release.

We can view this as part of scheduler specialization.
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Mar 23, 2018

Codecov Report

Merging #294 into master will increase coverage by <.01%.
The diff coverage is 81.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #294      +/-   ##
==========================================
+ Coverage   74.13%   74.14%   +<.01%     
==========================================
  Files          49       49              
  Lines        9257     9279      +22     
==========================================
+ Hits         6863     6880      +17     
- Misses       2394     2399       +5
Impacted Files Coverage Δ
sched/sched_backfill.c 90.44% <100%> (+0.24%) ⬆️
sched/sched_fcfs.c 94.31% <100%> (+0.34%) ⬆️
sched/plugin.c 55.73% <60%> (+0.08%) ⬆️
sched/sched.c 73.67% <72.72%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 822f739...d691856. Read the comment docs.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 23, 2018

Coverage Status

Coverage increased (+0.004%) to 75.886% when pulling d691856 on dongahn:sched_prop into 822f739 on flux-framework:master.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Mar 23, 2018

@twrs: if you want to try this before merge, you can cherry-pick this.

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Mar 23, 2018

LGTM! Pressing the button.

@SteVwonder SteVwonder merged commit 09ad395 into flux-framework:master Mar 23, 2018

4 checks passed

codecov/patch 81.48% of diff hit (target 74.13%)
Details
codecov/project 74.14% (+<.01%) compared to 822f739
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.004%) to 75.886%
Details

@grondo grondo referenced this pull request May 11, 2018

Closed

Need 0.5.0 Release #340

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.