Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EASY/HYBRID/CONSERVATIVE policies #504

Merged
merged 22 commits into from
Sep 18, 2019

Conversation

dongahn
Copy link
Member

@dongahn dongahn commented Jul 30, 2019

This PR adds various backfill queuing polices and queue-depth support for optimization:

  • Add EASY/HYBRID/CONSERVATIVE policies
    Introduce queue_policy_bf_base.hpp and queue_policy_bf_base_impl.hpp that implement core algorithms for a majority of backfillling strategies. Derive from this class to implement EASY/HYBRID/CONSERVATIVE policies.

  • Add queue-depth support for backfill policies

This PR builds on PR #502 and PR #503 so please don't merge it before those PRs land. Resolve #476, #482, and #498.

@dongahn
Copy link
Member Author

dongahn commented Jul 30, 2019

It is likely a test case will fail on one of the travis config. So I will work on that before our review. It would be best if this PR is reviewed last anyway.

@codecov-io
Copy link

codecov-io commented Jul 30, 2019

Codecov Report

Merging #504 into master will increase coverage by 0.33%.
The diff coverage is 78.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
+ Coverage   75.61%   75.95%   +0.33%     
==========================================
  Files          60       63       +3     
  Lines        6210     6442     +232     
==========================================
+ Hits         4696     4893     +197     
- Misses       1514     1549      +35
Impacted Files Coverage Δ
qmanager/policies/queue_policy_fcfs.hpp 100% <ø> (ø) ⬆️
qmanager/policies/base/queue_policy_base.hpp 100% <ø> (ø) ⬆️
qmanager/policies/queue_policy_easy_impl.hpp 100% <100%> (+100%) ⬆️
qmanager/policies/queue_policy_bf_base.hpp 100% <100%> (ø)
resource/traversers/dfu_impl.cpp 83.75% <100%> (+0.02%) ⬆️
qmanager/policies/queue_policy_fcfs_impl.hpp 97.05% <100%> (+1.82%) ⬆️
...anager/policies/queue_policy_conservative_impl.hpp 100% <100%> (ø)
qmanager/policies/queue_policy_factory_impl.hpp 66.66% <64.7%> (+33.33%) ⬆️
qmanager/policies/base/queue_policy_base_impl.hpp 71.35% <66.1%> (-2.22%) ⬇️
qmanager/modules/qmanager.cpp 70.55% <70.14%> (-0.1%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59c2417...a93d023. Read the comment docs.

@dongahn
Copy link
Member Author

dongahn commented Jul 30, 2019

OK. This PR fixed all the issues including CI test failures and new/free mismatch problem that @grondo reported. This should be ready for your reviews -- But still preferably after PR #503 and #502.

@dongahn
Copy link
Member Author

dongahn commented Jul 31, 2019

Rebased to the master after PR #502 and forced a push.

Copy link
Member

@SteVwonder SteVwonder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongahn! Some questions, comments, and suggestions below. I also opened #508 to document some reservation optimizations that we can make to these new backfilling policies down the road.

qmanager/config/queue_system_defaults.hpp Outdated Show resolved Hide resolved
resource/modules/resource_match.cpp Outdated Show resolved Hide resolved
{
job_lifecycle_t state = job_lifecycle_t::INIT;

if (id < 0 || at < 0) {
if (id < 0 || now < 0 || at < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might also be useful to check for and error out when at < now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thanks.

qmanager/policies/base/queue_policy_base_impl.hpp Outdated Show resolved Hide resolved
qmanager/policies/queue_policy_hybrid_impl.hpp Outdated Show resolved Hide resolved
qmanager/policies/queue_policy_conservative_impl.hpp Outdated Show resolved Hide resolved
@dongahn
Copy link
Member Author

dongahn commented Sep 16, 2019

@SteVwonder: just a heads-up. I will get to this today unless something else pops up.

Add MATCH_ALLOCATE_W_SATISFIABILITY.

This operation attempts to allocate first.
And if succeeds, it returns the matching info as before.

If fails, however, it sets the scheduled time to a point
as late as possible within the time box of the planner
and try the match.
Then, if it can't still find matching resources
at that scheduled point, the jobspec is deemed
unsatisfiable.
This satisfiability query must not change the visible
state of the graph data store. To support this,
overload the update method that only cleans up
the state changes created by the satisfiability
checking walk.

dfu_impl_t::schedule() (and therefore also run())
returns this information with -1 return code
with errno=ENODEV.
This is different than EBUSY, which is set when
dfu_impl_t::schedule() (and therefore also run())
cannot find the matching resources
at a given scheduled point (e.g., now).

Cleaned up how various errnos are propagated
in dfu_impl.cpp to simplify how errno=ENODEV
and errno=EBUSY are determined at dfu_impl_t::schedule().

Update in-line documentation with error numbers
for the public dfu_traverser_t::run() method.
Decompose queue_policy_fcfs_t<reapi_type>::run_sched_loop() into
two orthogonal methods: cancel_completed_jobs() and allocate_jobs().
@dongahn
Copy link
Member Author

dongahn commented Sep 17, 2019

@SteVwonder: I rebased this PR to the current master and addressed all of your (excellent) review items. If CI turns green, this can use your second review. The only changes you will need to review are the ones marked with [squash].

@dongahn
Copy link
Member Author

dongahn commented Sep 17, 2019

OK. Travis CI just turned green!

@SteVwonder
Copy link
Member

LGTM! Squash away and I'll push the button.

Problem: Upcoming queuing policies (EASY, HYBRID and
CONSERVATIVE) will have to reserve and cancel jobs
many times as part of their backfill algorithms.
If a cancelled job remains in the resource matching
service module, a subsequent allocate_orelse_reserve
on the same job will immediately fail with ENOENT.

Completely remove jobs from the resource module when
they are successfully cancelled (via cancel callback).

Adjust our cancel tests in t/t4003-cancel-info.t
with this new cancel semantics.

Also adjust how performance statstics are computed
via the stat callback. We now explicitly track
the number of jobs that have been matched instead
of inferring it by looking up the number of jobs
that is in the resource module.
Problem: We use time 0 as base time for allocate
or allocate_orelse_reserve. While it was handy to test one
schedule loop within our resource infrastructure, this
doesn't allow for correct implementations for plan-based
scheduling algorithms including EAST and CONSERVATIVE.
For these algorithms, we need to move forward the base
time as the wallclock time moves forward.

Take the current time via gettimeofday() upon receiving each
match-allocate request within resource. If the matched
time is equal to the current time, we deem this case to
be "allocated." If the matched time is greater than the
current time, we deem this to be "reserved," instead.
Use the ISO 8601 format to display the scheduled times
reported by the flux-resource command.

Now that we use gettimeofday() as base time for scheduling,
this command can no longer display "Now".
ISO 8601 provides a more human readable time
representation than the epoch time reported by
gettimeofday().
Add support to pass parameters pertaining to the queuing policy
into the base class of queue policy layer.

Each parameter is a key-value pair specified as key=value. Multiple
parameters can be passed in threeo different ways: 1) call
the set_params method multiple times with a single key-value pair;
2) call the set_params method with multiple comma-delimited key-value
key-value pairs (e.g., reservation-depth=3,other-policy-param1=true);
; or 3) something in between.

Introduce apply_params() method into the the base class
of queue policy layer. This method must be called to effect the
parameters that have been passed so far. Note that this is a virtual
method which can be overriden by derived queue policy classes. This
way dervied classes can customize how to enforce their parameter
setting. If not overriden, the default apply_params() is a no-op.
Add support for queue-policy=<policy> where only fcfs
policy is added.

Add support for policy-params as well.
The key-value pair(s) passed via queue-params will control
general queuing behaviors (such as queue-depth=k in the future).
Add an implementation of EASY backfilling queuing policy.

EASY backfilling: Only the 1st waiting job is considered
for allocation, with a guaranteed starting time. When this
1st job cannot start right away because its requested amount
of resources is not available, the algorithm browses the list
of waiting jobs to find candidates for backfilling.
These candidates are jobs that can start immediately, but
without delaying the 1st job of the list.

Introduce queue_policy_bf_base.hpp and
queue_policy_bf_base_impl.hpp that implement core algorithms
for a majority of backfillling strategies. Then,
queue_policy_easy.hpp and queue_policy_easy_impl.hpp
implement an EASY backfilling algorithm by deriving from
this base class while specifializing it with reservation-depth=1.
CONSERVATIVE backfilling: A less aggressive alternative
to EASY with similar performance. It determines an allocation
for each job when it enters the system. Then a job can be a candidate
to backfilling􏰁if and only if it can begin its execution immediately
without delaying any of the other pre-allocated jobs.

HYBRID backfilling: An algorithm lies in between EASY and
CONSERVATIVE backfilling strategies. It allows for limiting
the number of waiting jobs that must have a starting time
guarantee to be K where K is configurable. K must be greater than
1 and less than the system max.

Implement HYBRID in queue_policy_hybrid.hpp and
queue_policy_bybrid_impl.hpp and CONSERVATIVE
in queue_policy_conservative.hpp and
queue_policy_conservative_impl.hpp.

Derived from the backfill base class while specifializing
it with reservation-depth=system-max or reservation-depth=K
where K is configurable.
When queue-params=queue-depth=<K> is specified as part
of qmanager module load commandline, run_sched_loop()
in each different queuing policy classes will not look
beyond Kth job in their pending job queue.

Improve scheduling performance in exchange for potentially
less effective backfilling.

Set reservation depth to queue depth when bigger.
Problem: Max values are set relatively low compared to
the practices used by other schedulers.

Max queue depth is now one million and max reservation
depth is now 100000.
Generate a bunch of differently sized and timed jobs
using flux-jobspec srun.

Test whether those jobs are scheduled in the order
defined by different backfilling strategies: EASY,
HYBRID and CONSERVATIVE.
Add hwloc-whitelist=node,core,gpu into resource's rc1 script
so that our graph scheduler can build and operate
on a significantly trimmed graph for scheduling (thus fast).
With the default hwloc-whitelist support within
resource's rc1, pu will not even be built into the
graph populated out of hwloc. So prune-filter=ALL:pu
won't do much in helping reducing tree walks
for matching.

Change this to prune-filters=ALL:core as core will
be present in most cases and by default it will
be at the fringes of the tree graph built with hwloc.
Add t1004-qmanager-optimize.t to populate test cases
pertaining to queue optimization.

Add cases for queue-depth optimization for backfilling
policies: EASY, HYBRID and CONSERVATIVE.
t/t1003-qmanager-policy.t misses a double amphesand
at the end of one intermediate statements.
Fix a new/free mismatch detected by Valgrind.
Problem: We currently set the queue depth and reservation
depth parameters to the default values when the parameter
values given by users are greater than the MAX values.
This can be non-intuitive.

Set them to the max values instead as this behavior
would be less confusing to the users.
@dongahn
Copy link
Member Author

dongahn commented Sep 18, 2019

@SteVwonder: I squashed most of them but decide to keep two commits on their own.

I thought it would be reasonable to leave a trail about these changes. If you are okay with these change, this PR is ready to go in once Travis turns green.

@dongahn
Copy link
Member Author

dongahn commented Sep 18, 2019

OK. It is green now.

@SteVwonder SteVwonder merged commit 17553f7 into flux-framework:master Sep 18, 2019
@SteVwonder
Copy link
Member

LGTM! Thanks @dongahn!

garlick added a commit to garlick/flux-sched that referenced this pull request Jul 25, 2020
Copy flux-core's t5000-valgrind.t script, since a few things
have changed.  Adjust for flux-sched:
- use SHARNESS_BUILD_DIRECTORY, not FLUX_BUILD_DIR
- reference config.h at top level of build tree
- drop unused BROKER variable

Fixes flux-framework#504
garlick added a commit to garlick/flux-sched that referenced this pull request Jul 25, 2020
Copy flux-core's t5000-valgrind.t script, since a few things
have changed.  Adjust for flux-sched:
- use SHARNESS_BUILD_DIRECTORY, not FLUX_BUILD_DIR
- reference config.h at top level of build tree
- drop unused BROKER variable

Fixes flux-framework#504
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add queuing optimization support
3 participants