Add EASY/HYBRID/CONSERVATIVE policies #504

dongahn · 2019-07-30T01:06:40Z

This PR adds various backfill queuing polices and queue-depth support for optimization:

Add EASY/HYBRID/CONSERVATIVE policies
Introduce queue_policy_bf_base.hpp and queue_policy_bf_base_impl.hpp that implement core algorithms for a majority of backfillling strategies. Derive from this class to implement EASY/HYBRID/CONSERVATIVE policies.
Add queue-depth support for backfill policies

This PR builds on PR #502 and PR #503 so please don't merge it before those PRs land. Resolve #476, #482, and #498.

dongahn · 2019-07-30T01:24:30Z

It is likely a test case will fail on one of the travis config. So I will work on that before our review. It would be best if this PR is reviewed last anyway.

codecov-io · 2019-07-30T17:41:27Z

Codecov Report

Merging #504 into master will increase coverage by 0.33%.
The diff coverage is 78.73%.

@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
+ Coverage   75.61%   75.95%   +0.33%     
==========================================
  Files          60       63       +3     
  Lines        6210     6442     +232     
==========================================
+ Hits         4696     4893     +197     
- Misses       1514     1549      +35

Impacted Files	Coverage Δ
qmanager/policies/queue_policy_fcfs.hpp	`100% <ø> (ø)`	⬆️
qmanager/policies/base/queue_policy_base.hpp	`100% <ø> (ø)`	⬆️
qmanager/policies/queue_policy_easy_impl.hpp	`100% <100%> (+100%)`	⬆️
qmanager/policies/queue_policy_bf_base.hpp	`100% <100%> (ø)`
resource/traversers/dfu_impl.cpp	`83.75% <100%> (+0.02%)`	⬆️
qmanager/policies/queue_policy_fcfs_impl.hpp	`97.05% <100%> (+1.82%)`	⬆️
...anager/policies/queue_policy_conservative_impl.hpp	`100% <100%> (ø)`
qmanager/policies/queue_policy_factory_impl.hpp	`66.66% <64.7%> (+33.33%)`	⬆️
qmanager/policies/base/queue_policy_base_impl.hpp	`71.35% <66.1%> (-2.22%)`	⬇️
qmanager/modules/qmanager.cpp	`70.55% <70.14%> (-0.1%)`	⬇️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59c2417...a93d023. Read the comment docs.

dongahn · 2019-07-30T22:04:05Z

OK. This PR fixed all the issues including CI test failures and new/free mismatch problem that @grondo reported. This should be ready for your reviews -- But still preferably after PR #503 and #502.

dongahn · 2019-07-31T04:00:33Z

Rebased to the master after PR #502 and forced a push.

SteVwonder

Thanks @dongahn! Some questions, comments, and suggestions below. I also opened #508 to document some reservation optimizations that we can make to these new backfilling policies down the road.

qmanager/config/queue_system_defaults.hpp

resource/modules/resource_match.cpp

SteVwonder · 2019-08-09T20:06:49Z

resource/modules/resource_match.cpp

 {
    job_lifecycle_t state = job_lifecycle_t::INIT;

-    if (id < 0 || at < 0) {
+    if (id < 0 || now < 0 || at < 0) {


might also be useful to check for and error out when at < now

Sure. Thanks.

qmanager/policies/base/queue_policy_base_impl.hpp

qmanager/policies/queue_policy_hybrid_impl.hpp

qmanager/policies/queue_policy_conservative_impl.hpp

dongahn · 2019-09-16T16:20:42Z

@SteVwonder: just a heads-up. I will get to this today unless something else pops up.

Add MATCH_ALLOCATE_W_SATISFIABILITY. This operation attempts to allocate first. And if succeeds, it returns the matching info as before. If fails, however, it sets the scheduled time to a point as late as possible within the time box of the planner and try the match. Then, if it can't still find matching resources at that scheduled point, the jobspec is deemed unsatisfiable. This satisfiability query must not change the visible state of the graph data store. To support this, overload the update method that only cleans up the state changes created by the satisfiability checking walk. dfu_impl_t::schedule() (and therefore also run()) returns this information with -1 return code with errno=ENODEV. This is different than EBUSY, which is set when dfu_impl_t::schedule() (and therefore also run()) cannot find the matching resources at a given scheduled point (e.g., now). Cleaned up how various errnos are propagated in dfu_impl.cpp to simplify how errno=ENODEV and errno=EBUSY are determined at dfu_impl_t::schedule(). Update in-line documentation with error numbers for the public dfu_traverser_t::run() method.

Decompose queue_policy_fcfs_t<reapi_type>::run_sched_loop() into two orthogonal methods: cancel_completed_jobs() and allocate_jobs().

dongahn · 2019-09-17T00:48:02Z

@SteVwonder: I rebased this PR to the current master and addressed all of your (excellent) review items. If CI turns green, this can use your second review. The only changes you will need to review are the ones marked with [squash].

dongahn · 2019-09-17T01:00:34Z

OK. Travis CI just turned green!

SteVwonder · 2019-09-17T22:20:13Z

LGTM! Squash away and I'll push the button.

Problem: Upcoming queuing policies (EASY, HYBRID and CONSERVATIVE) will have to reserve and cancel jobs many times as part of their backfill algorithms. If a cancelled job remains in the resource matching service module, a subsequent allocate_orelse_reserve on the same job will immediately fail with ENOENT. Completely remove jobs from the resource module when they are successfully cancelled (via cancel callback). Adjust our cancel tests in t/t4003-cancel-info.t with this new cancel semantics. Also adjust how performance statstics are computed via the stat callback. We now explicitly track the number of jobs that have been matched instead of inferring it by looking up the number of jobs that is in the resource module.

Problem: We use time 0 as base time for allocate or allocate_orelse_reserve. While it was handy to test one schedule loop within our resource infrastructure, this doesn't allow for correct implementations for plan-based scheduling algorithms including EAST and CONSERVATIVE. For these algorithms, we need to move forward the base time as the wallclock time moves forward. Take the current time via gettimeofday() upon receiving each match-allocate request within resource. If the matched time is equal to the current time, we deem this case to be "allocated." If the matched time is greater than the current time, we deem this to be "reserved," instead.

Use the ISO 8601 format to display the scheduled times reported by the flux-resource command. Now that we use gettimeofday() as base time for scheduling, this command can no longer display "Now". ISO 8601 provides a more human readable time representation than the epoch time reported by gettimeofday().

Add support to pass parameters pertaining to the queuing policy into the base class of queue policy layer. Each parameter is a key-value pair specified as key=value. Multiple parameters can be passed in threeo different ways: 1) call the set_params method multiple times with a single key-value pair; 2) call the set_params method with multiple comma-delimited key-value key-value pairs (e.g., reservation-depth=3,other-policy-param1=true); ; or 3) something in between. Introduce apply_params() method into the the base class of queue policy layer. This method must be called to effect the parameters that have been passed so far. Note that this is a virtual method which can be overriden by derived queue policy classes. This way dervied classes can customize how to enforce their parameter setting. If not overriden, the default apply_params() is a no-op.

Add support for queue-policy=<policy> where only fcfs policy is added. Add support for policy-params as well. The key-value pair(s) passed via queue-params will control general queuing behaviors (such as queue-depth=k in the future).

Add an implementation of EASY backfilling queuing policy. EASY backfilling: Only the 1st waiting job is considered for allocation, with a guaranteed starting time. When this 1st job cannot start right away because its requested amount of resources is not available, the algorithm browses the list of waiting jobs to find candidates for backfilling. These candidates are jobs that can start immediately, but without delaying the 1st job of the list. Introduce queue_policy_bf_base.hpp and queue_policy_bf_base_impl.hpp that implement core algorithms for a majority of backfillling strategies. Then, queue_policy_easy.hpp and queue_policy_easy_impl.hpp implement an EASY backfilling algorithm by deriving from this base class while specifializing it with reservation-depth=1.

CONSERVATIVE backfilling: A less aggressive alternative to EASY with similar performance. It determines an allocation for each job when it enters the system. Then a job can be a candidate to backfilling􏰁if and only if it can begin its execution immediately without delaying any of the other pre-allocated jobs. HYBRID backfilling: An algorithm lies in between EASY and CONSERVATIVE backfilling strategies. It allows for limiting the number of waiting jobs that must have a starting time guarantee to be K where K is configurable. K must be greater than 1 and less than the system max. Implement HYBRID in queue_policy_hybrid.hpp and queue_policy_bybrid_impl.hpp and CONSERVATIVE in queue_policy_conservative.hpp and queue_policy_conservative_impl.hpp. Derived from the backfill base class while specifializing it with reservation-depth=system-max or reservation-depth=K where K is configurable.

When queue-params=queue-depth=<K> is specified as part of qmanager module load commandline, run_sched_loop() in each different queuing policy classes will not look beyond Kth job in their pending job queue. Improve scheduling performance in exchange for potentially less effective backfilling. Set reservation depth to queue depth when bigger.

Problem: Max values are set relatively low compared to the practices used by other schedulers. Max queue depth is now one million and max reservation depth is now 100000.

Generate a bunch of differently sized and timed jobs using flux-jobspec srun. Test whether those jobs are scheduled in the order defined by different backfilling strategies: EASY, HYBRID and CONSERVATIVE.

Add hwloc-whitelist=node,core,gpu into resource's rc1 script so that our graph scheduler can build and operate on a significantly trimmed graph for scheduling (thus fast).

With the default hwloc-whitelist support within resource's rc1, pu will not even be built into the graph populated out of hwloc. So prune-filter=ALL:pu won't do much in helping reducing tree walks for matching. Change this to prune-filters=ALL:core as core will be present in most cases and by default it will be at the fringes of the tree graph built with hwloc.

Add t1004-qmanager-optimize.t to populate test cases pertaining to queue optimization. Add cases for queue-depth optimization for backfilling policies: EASY, HYBRID and CONSERVATIVE.

t/t1003-qmanager-policy.t misses a double amphesand at the end of one intermediate statements.

Fix a new/free mismatch detected by Valgrind.

Problem: We currently set the queue depth and reservation depth parameters to the default values when the parameter values given by users are greater than the MAX values. This can be non-intuitive. Set them to the max values instead as this behavior would be less confusing to the users.

dongahn · 2019-09-18T17:56:49Z

@SteVwonder: I squashed most of them but decide to keep two commits on their own.

I thought it would be reasonable to leave a trail about these changes. If you are okay with these change, this PR is ready to go in once Travis turns green.

dongahn · 2019-09-18T18:19:39Z

OK. It is green now.

SteVwonder · 2019-09-18T19:55:14Z

LGTM! Thanks @dongahn!

Copy flux-core's t5000-valgrind.t script, since a few things have changed. Adjust for flux-sched: - use SHARNESS_BUILD_DIRECTORY, not FLUX_BUILD_DIR - reference config.h at top level of build tree - drop unused BROKER variable Fixes flux-framework#504

dongahn force-pushed the queue-policies branch from 437c378 to 1b8c32e Compare July 30, 2019 19:44

dongahn requested review from SteVwonder and grondo July 30, 2019 22:04

dongahn force-pushed the queue-policies branch from 1b8c32e to f124848 Compare July 31, 2019 03:58

dongahn force-pushed the queue-policies branch from f124848 to e8250ea Compare August 7, 2019 22:45

SteVwonder mentioned this pull request Aug 9, 2019

qmanager: preserve reservations across scheduler loops (and optimize reservation clearing) #508

Closed

SteVwonder reviewed Aug 9, 2019

View reviewed changes

dongahn added 2 commits September 16, 2019 17:42

queue: refactor FCFS queue policy class

e3299f6

Decompose queue_policy_fcfs_t<reapi_type>::run_sched_loop() into two orthogonal methods: cancel_completed_jobs() and allocate_jobs().

dongahn force-pushed the queue-policies branch from 6cdeff4 to b267c45 Compare September 17, 2019 00:43

dongahn added 12 commits September 18, 2019 10:31

resource: Add time units for flux-resource stat's measurement

6ed5305

qmanager: Module-load commandline support

6ca06bf

Add support for queue-policy=<policy> where only fcfs policy is added. Add support for policy-params as well. The key-value pair(s) passed via queue-params will control general queuing behaviors (such as queue-depth=k in the future).

qmanager: Integrate EASY backfilling policy

045660d

queue: Set higher values for MAX queue and reservation depth

30bb692

Problem: Max values are set relatively low compared to the practices used by other schedulers. Max queue depth is now one million and max reservation depth is now 100000.

test: the correctness of various backfill policies

6c128e3

Generate a bunch of differently sized and timed jobs using flux-jobspec srun. Test whether those jobs are scheduled in the order defined by different backfilling strategies: EASY, HYBRID and CONSERVATIVE.

dongahn added 8 commits September 18, 2019 10:52

build: Integrate t1003-qmanager-policy.t tests

892f86f

resource: Add default hwloc-whitelist into rc1

2cbf670

Add hwloc-whitelist=node,core,gpu into resource's rc1 script so that our graph scheduler can build and operate on a significantly trimmed graph for scheduling (thus fast).

policy: Add correctness tests for queue-depth optimization

592416f

Add t1004-qmanager-optimize.t to populate test cases pertaining to queue optimization. Add cases for queue-depth optimization for backfilling policies: EASY, HYBRID and CONSERVATIVE.

build: Integrate t1004-qmanager-optimize.t

14ac6cd

test: fix an error detected by sharness lint

426814c

t/t1003-qmanager-policy.t misses a double amphesand at the end of one intermediate statements.

qmanager: use delete for a buffer allocated with new

83e529f

Fix a new/free mismatch detected by Valgrind.

dongahn force-pushed the queue-policies branch from b267c45 to a93d023 Compare September 18, 2019 17:53

SteVwonder merged commit 17553f7 into flux-framework:master Sep 18, 2019

This was referenced Sep 18, 2019

Mismatched new/free #505

Closed

Default hwloc-whitelist for our next near-term target? #498

Closed

Satisfiability check within resource #478

Closed

Add EASY and CONSERVATIVE queueing policy into qmanager #482

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add EASY/HYBRID/CONSERVATIVE policies #504

Add EASY/HYBRID/CONSERVATIVE policies #504

dongahn commented Jul 30, 2019

dongahn commented Jul 30, 2019

codecov-io commented Jul 30, 2019 •

edited

Loading

dongahn commented Jul 30, 2019

dongahn commented Jul 31, 2019

SteVwonder left a comment

SteVwonder Aug 9, 2019

dongahn Sep 16, 2019

dongahn commented Sep 16, 2019

dongahn commented Sep 17, 2019

dongahn commented Sep 17, 2019

SteVwonder commented Sep 17, 2019

dongahn commented Sep 18, 2019

dongahn commented Sep 18, 2019

SteVwonder commented Sep 18, 2019

Add EASY/HYBRID/CONSERVATIVE policies #504

Add EASY/HYBRID/CONSERVATIVE policies #504

Conversation

dongahn commented Jul 30, 2019

dongahn commented Jul 30, 2019

codecov-io commented Jul 30, 2019 • edited Loading

Codecov Report

dongahn commented Jul 30, 2019

dongahn commented Jul 31, 2019

SteVwonder left a comment

Choose a reason for hiding this comment

SteVwonder Aug 9, 2019

Choose a reason for hiding this comment

dongahn Sep 16, 2019

Choose a reason for hiding this comment

dongahn commented Sep 16, 2019

dongahn commented Sep 17, 2019

dongahn commented Sep 17, 2019

SteVwonder commented Sep 17, 2019

dongahn commented Sep 18, 2019

dongahn commented Sep 18, 2019

SteVwonder commented Sep 18, 2019

codecov-io commented Jul 30, 2019 •

edited

Loading