Add EASY/HYBRID/CONSERVATIVE policies #504
This PR adds various backfill queuing policies and queue-depth support for optimization:
@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
+ Coverage   75.61%   75.95%   +0.33%
==========================================
  Files          60       63       +3
  Lines        6210     6442     +232
==========================================
+ Hits         4696     4893     +197
- Misses       1514     1549      +35
Decompose queue_policy_fcfs_t<reapi_type>::run_sched_loop() into two orthogonal methods: cancel_completed_jobs() and allocate_jobs().
Add MATCH_ALLOCATE_W_SATISFIABILITY. This operation attempts to allocate first. If it succeeds, it returns the matching info as before. If it fails, it sets the scheduled time to a point as late as possible within the time box of the planner and tries the match again. If it still cannot find matching resources at that scheduled point, the jobspec is deemed unsatisfiable. This satisfiability query must not change the visible state of the graph data store. To support this, add an overload of the update method that only cleans up the state changes created by the satisfiability-checking walk. dfu_impl_t::schedule() (and therefore also run()) reports an unsatisfiable jobspec by returning -1 with errno=ENODEV. This differs from EBUSY, which is set when dfu_impl_t::schedule() (and therefore also run()) cannot find matching resources at a given scheduled point (e.g., now). Clean up how various errnos are propagated in dfu_impl.cpp to simplify how errno=ENODEV and errno=EBUSY are determined in dfu_impl_t::schedule(). Update the in-line documentation of the public dfu_traverser_t::run() method with these error numbers.
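The decision order described above can be sketched as follows. This is only an illustration under stated assumptions: try_match_fn and allocate_with_satisfiability are hypothetical names, the callback stands in for a single match attempt at a given time, and the real code walks the resource graph and undoes the state changes of the second attempt.

```cpp
#include <cerrno>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for one match attempt at start time `at`.
// Returns true when matching resources are found at that time.
using try_match_fn = std::function<bool (int64_t at)>;

// Returns 0 when the job can be allocated at `now`. Otherwise returns -1
// with errno=EBUSY (satisfiable, but not at `now`) or errno=ENODEV
// (no match even at `latest`, i.e., unsatisfiable).
int allocate_with_satisfiability (const try_match_fn &try_match,
                                  int64_t now,
                                  int64_t latest)
{
    if (try_match (now))
        return 0;           // allocated at the requested time
    if (try_match (latest)) {
        errno = EBUSY;      // resources exist but are busy at `now`
        return -1;
    }
    errno = ENODEV;         // nothing matches even at the latest point
    return -1;
}
```

A caller can then distinguish "retry later" (EBUSY) from "reject the job" (ENODEV) by inspecting errno after a -1 return.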
Problem: Upcoming queuing policies (EASY, HYBRID and CONSERVATIVE) will have to reserve and cancel jobs many times as part of their backfill algorithms. If a cancelled job remains in the resource matching service module, a subsequent allocate_orelse_reserve on the same job will immediately fail with ENOENT. Completely remove jobs from the resource module when they are successfully cancelled (via the cancel callback). Adjust our cancel tests in t/t4003-cancel-info.t to match the new cancel semantics. Also adjust how performance statistics are computed via the stat callback: we now explicitly track the number of jobs that have been matched instead of inferring it from the number of jobs in the resource module.
Add an implementation of the EASY backfilling queuing policy. EASY backfilling: Only the 1st waiting job is considered for allocation, with a guaranteed starting time. When this 1st job cannot start right away because its requested amount of resources is not available, the algorithm browses the list of waiting jobs to find candidates for backfilling. These candidates are jobs that can start immediately without delaying the 1st job of the list. Introduce queue_policy_bf_base.hpp and queue_policy_bf_base_impl.hpp, which implement the core algorithms for a majority of backfilling strategies. Then, queue_policy_easy.hpp and queue_policy_easy_impl.hpp implement an EASY backfilling algorithm by deriving from this base class while specializing it with reservation-depth=1.
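For readers unfamiliar with EASY, here is a minimal sketch of the rule on a toy model. It is not the code in queue_policy_easy_impl.hpp: it assumes a cluster of identical nodes and exact job durations, and running_t, pending_t and easy_pick are hypothetical names. The PR itself delegates these decisions to the resource matching service.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct running_t { int nodes; int64_t end; };                  // toy model
struct pending_t { int id; int nodes; int64_t duration; };     // toy model

// Return the ids of pending jobs that may start at `now` under EASY.
std::vector<int> easy_pick (int total_nodes,
                            std::vector<running_t> running,
                            const std::vector<pending_t> &queue,
                            int64_t now)
{
    std::vector<int> started;
    int free_nodes = total_nodes;
    for (const auto &r : running)
        free_nodes -= r.nodes;

    // Start jobs in queue order for as long as the head job fits.
    std::size_t i = 0;
    for (; i < queue.size () && queue[i].nodes <= free_nodes; ++i) {
        free_nodes -= queue[i].nodes;
        running.push_back ({queue[i].nodes, now + queue[i].duration});
        started.push_back (queue[i].id);
    }
    if (i == queue.size ())
        return started;

    // Reserve the blocked head job: find the earliest time ("shadow")
    // at which enough nodes will have been released.
    std::sort (running.begin (), running.end (),
               [] (const running_t &a, const running_t &b) {
                   return a.end < b.end;
               });
    const pending_t &head = queue[i];
    int avail = free_nodes;
    int64_t shadow = now;
    for (const auto &r : running) {
        avail += r.nodes;
        shadow = r.end;
        if (avail >= head.nodes)
            break;
    }
    if (avail < head.nodes)
        return started;     // head can never fit; toy model stops here
    int extra = avail - head.nodes;   // nodes the head leaves idle at shadow

    // Backfill: a candidate must fit now and must not delay the head job.
    for (std::size_t j = i + 1; j < queue.size (); ++j) {
        const pending_t &c = queue[j];
        if (c.nodes > free_nodes)
            continue;
        if (now + c.duration <= shadow) {
            free_nodes -= c.nodes;    // finishes before the reservation
            started.push_back (c.id);
        } else if (c.nodes <= extra) {
            free_nodes -= c.nodes;    // overlaps, but only on spare nodes
            extra -= c.nodes;
            started.push_back (c.id);
        }
    }
    return started;
}
```

With 4 nodes, one running job holding 3 nodes until t=10, and a 4-node job at the head of the queue, a 1-node job of duration 5 is backfilled while a 1-node job of duration 20 is not, because the latter would delay the head job.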
Add support for queue-policy=<policy> where only fcfs policy is added. Add support for policy-params as well. The key-value pair(s) passed via queue-params will control general queuing behaviors (such as queue-depth=k in the future).
Add support for passing parameters pertaining to the queuing policy into the base class of the queue policy layer. Each parameter is a key-value pair specified as key=value. Multiple parameters can be passed in three different ways: 1) call the set_params method multiple times, each with a single key-value pair; 2) call the set_params method once with multiple comma-delimited key-value pairs (e.g., reservation-depth=3,other-policy-param1=true); or 3) any mix of the two. Introduce an apply_params() method in the base class of the queue policy layer. This method must be called for the parameters passed so far to take effect. Note that this is a virtual method that derived queue policy classes can override. This way, derived classes can customize how they enforce their parameter settings. If not overridden, the default apply_params() is a no-op.
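A minimal sketch of this set_params/apply_params split, assuming a simple string map as storage. The class names params_base_sketch_t and depth_policy_sketch_t, the member names, and the error convention (-1 on a malformed pair) are illustrative assumptions, not the PR's actual declarations.

```cpp
#include <cstddef>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Hypothetical base class: accumulates key=value pairs until applied.
class params_base_sketch_t {
public:
    virtual ~params_base_sketch_t () = default;

    // Accepts "k=v" or "k1=v1,k2=v2". Returns -1 on a malformed pair;
    // pairs parsed before the malformed one remain stored.
    int set_params (const std::string &params)
    {
        std::istringstream in (params);
        std::string pair;
        while (std::getline (in, pair, ',')) {
            std::size_t eq = pair.find ('=');
            if (eq == std::string::npos || eq == 0 || eq + 1 == pair.size ())
                return -1;
            m_params[pair.substr (0, eq)] = pair.substr (eq + 1);
        }
        return 0;
    }

    // Default is a no-op; derived policies override to enforce settings.
    virtual int apply_params () { return 0; }

protected:
    std::map<std::string, std::string> m_params;
};

// Hypothetical derived policy that understands reservation-depth.
class depth_policy_sketch_t : public params_base_sketch_t {
public:
    int apply_params () override
    {
        auto it = m_params.find ("reservation-depth");
        if (it == m_params.end ())
            return 0;
        try {
            m_reservation_depth = std::stoul (it->second);
        } catch (const std::exception &) {
            return -1;      // value is not a valid unsigned number
        }
        return 0;
    }
    unsigned long m_reservation_depth = 1;
};
```

Nothing takes effect until apply_params() is called, so a caller can mix single-pair and comma-delimited set_params calls in any order before applying them once.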
Use the ISO 8601 format to display the scheduled times reported by the flux-resource command. Now that we use gettimeofday() as the base time for scheduling, this command can no longer display "Now". ISO 8601 provides a more human-readable time representation than the epoch time reported by gettimeofday().
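As an illustration of the conversion, epoch seconds can be rendered in ISO 8601 with strftime. This is a sketch, not the flux-resource code: to_iso8601_utc is a hypothetical helper, and the choice of UTC with a trailing "Z" is an assumption (the command may well print local time).

```cpp
#include <ctime>
#include <string>

// Format epoch seconds as an ISO 8601 UTC timestamp.
// Returns an empty string if the conversion fails.
std::string to_iso8601_utc (std::time_t epoch)
{
    std::tm tm_utc {};
    if (gmtime_r (&epoch, &tm_utc) == nullptr)      // POSIX, thread-safe
        return "";
    char buf[32];
    if (std::strftime (buf, sizeof (buf), "%Y-%m-%dT%H:%M:%SZ", &tm_utc) == 0)
        return "";
    return buf;
}
```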
Problem: We use time 0 as the base time for allocate or allocate_orelse_reserve. While this was handy for testing one schedule loop within our resource infrastructure, it doesn't allow for correct implementations of plan-based scheduling algorithms, including EASY and CONSERVATIVE. For these algorithms, we need to move the base time forward as the wallclock time moves forward. Take the current time via gettimeofday() upon receiving each match-allocate request within resource. If the matched time is equal to the current time, we deem this case to be "allocated." If the matched time is greater than the current time, we deem it to be "reserved" instead.
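The allocated-versus-reserved rule can be sketched as below. match_status_t, current_epoch_seconds and classify are hypothetical names, and the one-second granularity is an assumption. A matched time earlier than the current time is not discussed above; this sketch treats it as allocated.

```cpp
#include <cstdint>
#include <sys/time.h>

enum class match_status_t { ALLOCATED, RESERVED };

// Current wallclock time in whole seconds, or -1 on failure.
int64_t current_epoch_seconds ()
{
    struct timeval tv;
    if (gettimeofday (&tv, nullptr) < 0)
        return -1;
    return static_cast<int64_t> (tv.tv_sec);
}

// A match at `now` is an allocation; a match in the future is a
// reservation. (Assumption: matched_at < now is treated as allocated.)
match_status_t classify (int64_t matched_at, int64_t now)
{
    return (matched_at > now) ? match_status_t::RESERVED
                              : match_status_t::ALLOCATED;
}
```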
CONSERVATIVE backfilling: A less aggressive alternative to EASY with similar performance. It determines an allocation for each job when it enters the system. A job can then be a candidate for backfilling if and only if it can begin its execution immediately without delaying any of the other pre-allocated jobs. HYBRID backfilling: An algorithm that lies in between the EASY and CONSERVATIVE backfilling strategies. It limits the number of waiting jobs that must have a starting time guarantee to K, where K is configurable. K must be greater than 1 and less than the system max. Implement HYBRID in queue_policy_hybrid.hpp and queue_policy_hybrid_impl.hpp and CONSERVATIVE in queue_policy_conservative.hpp and queue_policy_conservative_impl.hpp. Both derive from the backfill base class while specializing it with reservation-depth=K (HYBRID), where K is configurable, or reservation-depth=system-max (CONSERVATIVE).
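The three policies differ only in how many waiting jobs get a start-time guarantee. The sketch below shows one pass over the pending queue parameterized by that depth. It is illustrative only: backfill_pass and outcome_t are hypothetical names, and the can_start_now callback is assumed to already account for reservations made earlier in the pass, which the real code gets from the resource matching service.

```cpp
#include <functional>
#include <vector>

enum class outcome_t { ALLOCATED, RESERVED, SKIPPED };

// One scheduling pass. EASY would use reservation_depth=1, HYBRID a
// configurable K, and CONSERVATIVE the system max.
std::vector<outcome_t> backfill_pass (
    const std::vector<int> &queue,
    unsigned reservation_depth,
    const std::function<bool (int job)> &can_start_now)
{
    std::vector<outcome_t> out;
    unsigned reserved = 0;
    for (int job : queue) {
        if (can_start_now (job)) {
            out.push_back (outcome_t::ALLOCATED);   // starts immediately
        } else if (reserved < reservation_depth) {
            ++reserved;                             // start time guaranteed
            out.push_back (outcome_t::RESERVED);
        } else {
            out.push_back (outcome_t::SKIPPED);     // no guarantee left
        }
    }
    return out;
}
```

With a queue of four jobs where only jobs 2 and 4 can start now, a depth of 1 reserves job 1 and skips job 3, while a depth of 2 reserves both.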
When queue-params=queue-depth=<K> is specified on the qmanager module load command line, run_sched_loop() in each queuing policy class will not look beyond the Kth job in its pending job queue. This improves scheduling performance in exchange for potentially less effective backfilling. When the reservation depth is greater than the queue depth, set it to the queue depth.
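The interaction between the two depths amounts to two small rules, sketched here with hypothetical names (depths_t, reconcile, jobs_to_consider) rather than the PR's actual members.

```cpp
#include <algorithm>
#include <cstddef>

struct depths_t {
    unsigned queue_depth;
    unsigned reservation_depth;
};

// A reservation depth larger than the queue depth is pointless, because
// the loop never looks that far; lower it to the queue depth.
depths_t reconcile (depths_t d)
{
    d.reservation_depth = std::min (d.reservation_depth, d.queue_depth);
    return d;
}

// Number of pending jobs one scheduling loop examines.
std::size_t jobs_to_consider (std::size_t pending, unsigned queue_depth)
{
    return std::min<std::size_t> (pending, queue_depth);
}
```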
Generate a bunch of differently sized and timed jobs using flux-jobspec srun. Test whether those jobs are scheduled in the order defined by different backfilling strategies: EASY, HYBRID and CONSERVATIVE.
Add hwloc-whitelist=node,core,gpu to resource's rc1 script so that our graph scheduler can build and operate on a significantly trimmed graph, which makes scheduling faster.
With the default hwloc-whitelist support within resource's rc1, pu will not even be built into the graph populated from hwloc. So prune-filters=ALL:pu won't do much to reduce tree walks for matching. Change this to prune-filters=ALL:core, as core will be present in most cases and by default will be at the fringes of the tree graph built with hwloc.
Add t1004-qmanager-optimize.t to populate test cases pertaining to queue optimization. Add cases for queue-depth optimization for backfilling policies: EASY, HYBRID and CONSERVATIVE.
t/t1003-qmanager-policy.t is missing a double ampersand at the end of one of its intermediate statements.
Problem: We currently set the queue depth and reservation depth parameters to the default values when the parameter values given by users are greater than the MAX values. This can be non-intuitive. Set them to the max values instead as this behavior would be less confusing to the users.