Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure scheduling correctness with different prune filter configurations #419

Merged
merged 14 commits into from Jan 17, 2019

Conversation

Projects
None yet
3 participants
@dongahn
Copy link
Contributor

dongahn commented Dec 18, 2018

This PR add various fixes to ensure correctness around the use of pruning filters.

  • Use at least one pruning filter type to ensure the schedule method within dfu.cpp can traverse resource state changing scheduled points for all cases.

  • Enable the pruning filter planner, even if the specified filter types are not present at the subtree of the anchor type. This ensures correctness when such a condition ever arises.

  • Add test cases using previous failed test codes

  • Add test cases to test various module load options for the resource match service

  • Other misc. changes

Resolve Issue #400
Resolve Issue #388

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Dec 18, 2018

@SteVwonder: I had a few more things I wanted to address. But before I will be out until the end of this year, I thought I wanted to get this much work to be reviewed and possibly merged. Thanks.

@dongahn dongahn requested a review from SteVwonder Dec 18, 2018

@dongahn dongahn force-pushed the dongahn:prune_correctness branch from 4b55339 to c8e64ad Jan 3, 2019

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jan 3, 2019

Codecov Report

Merging #419 into master will increase coverage by 0.04%.
The diff coverage is 80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #419      +/-   ##
==========================================
+ Coverage   75.62%   75.67%   +0.04%     
==========================================
  Files          67       67              
  Lines       10992    11026      +34     
==========================================
+ Hits         8313     8344      +31     
- Misses       2679     2682       +3
Impacted Files Coverage Δ
resource/policies/base/matcher.hpp 100% <ø> (ø) ⬆️
resource/generators/gen.cpp 86.66% <100%> (+0.71%) ⬆️
resource/traversers/dfu.cpp 73.68% <100%> (ø) ⬆️
resource/modules/resource_match.cpp 75.28% <100%> (+3.65%) ⬆️
resource/traversers/dfu_impl.hpp 100% <100%> (ø) ⬆️
resource/planner/planner.c 77.93% <100%> (ø) ⬆️
resource/traversers/dfu_impl.cpp 82.66% <100%> (+0.03%) ⬆️
resource/policies/base/matcher.cpp 66.91% <60%> (-6.31%) ⬇️
resource/utilities/resource-query.cpp 39.09% <88.88%> (+1.93%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf92d6d...b9602d0. Read the comment docs.

@dongahn dongahn force-pushed the dongahn:prune_correctness branch from c8e64ad to 11366ef Jan 3, 2019

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 3, 2019

@SteVwonder: I have added all of the fixes that I had originally planned, and this PR is ready to be reviewed.

@dongahn dongahn force-pushed the dongahn:prune_correctness branch from 11366ef to 609d253 Jan 8, 2019

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 8, 2019

OK. Rebased it to pick up @grondo's PR #420.

@dongahn dongahn force-pushed the dongahn:prune_correctness branch from 609d253 to 8925325 Jan 10, 2019

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 10, 2019

OK. Rebased.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 11, 2019

Ok. Rebased and forced a push for PR #427.

@SteVwonder
Copy link
Member

SteVwonder left a comment

Thanks @dongahn for this fix (and giving me the high-level overview on the whiteboard). Generally, looks good to me!

Left a few comments that could be optimizations or could be misunderstandings of the logic on my end.

Related: in #385, I change the default prune filter for resource-match to ALL:pu but didn't change resource-query. So resource-query and resource-match no longer have the same default prune filters (ALL:core and ALL:pu, respectively). Since resource-query still uses ALL:core, several of the tests in t3011-resource-filt.t are actually testing the same pruning filter set. Do you mind switching resource-query to ALL:pu in this PR? Sorry about that!

Show resolved Hide resolved resource/policies/base/matcher.cpp Outdated
Show resolved Hide resolved resource/traversers/dfu_impl.cpp Outdated
Show resolved Hide resolved resource/traversers/dfu_impl.cpp
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 14, 2019

@SteVwonder: Thank you for thorough review!

Related: in #385, I change the default prune filter for resource-match to ALL:pu but didn't change resource-query. So resource-query and resource-match no longer have the same default prune filters (ALL:core and ALL:pu, respectively).

This is actually intended. We use resource-query on GRUG whose test cases don't have pu, I wanted to use core as the filter type. Plus, I thought having pu on the resource matching service and core on the CLI could increase our testing coverage.

Since resource-query still uses ALL:core, several of the tests in t3011-resource-filt.t are actually testing the same pruning filter set.

Also intended. Wanted to make sure specifying the same filter type as the default leads to correct scheduling behavior.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 14, 2019

@SteVwonder: I will try to get to some of your review points maybe tomorrow. Let me know if you disagree with anything I commented above. Thanks.

@dongahn dongahn force-pushed the dongahn:prune_correctness branch from d7c85d7 to 87d8d83 Jan 15, 2019

dongahn added some commits Dec 15, 2018

resource: Use a more meaningful method name
Change count to count_relevant_types. Longer method name
but this should improve readability.
resource: Update prune filters even for a jobspec without filter types
E.g., Given that --prune-filters=ALL:core and the jobsepc is

resources:
    - type: cluster
      count: 1
      with:
        - type: rack
          count: 1
          with:
            - type: slot
              count: 2
              label: default
              with:
                - type: node
                  count: 1

Since there is no change in the core-type resource, the old
scheme simply won't update the subtree planner
(so called prune filter).

However, this can lead to a side effect in that you can't use
this planner to retrieve all of the state changing
scheduled point.

The most notable problem arises with respect to the filter
at the root (e.g., cluster) vertex.

Within the dfu_traverser_t::schedule method, we use
planner_multi_avail_time_first and _next to quickly
traverse the resource state changing scheduled points.
In the worse case, if the previous jobspecs had not included
the type of resources specified in the prune filter,
this traversal will simply lead to a failure
-- finding no scheduled point!.
resource: Maintain at least one prune filter as default
Problem: With no prune filter installed, the schedule method
in dfu.cpp cannot correctly traverse the available scheduled
point for reservation.

If an argument to --prune-filters is given, we now append
these additional filter types to the list instead of
replacing the list.

Redundant types are resolved in the set_pruning_type method
by the use of std::set as the value field of the std::map.
resource: Add get_my_prune_types method
Return all of the qualified pruning type given the anchor type.

Make it easy for the upper layer to traverse through all
of the pruning types at each anchor type.
resource: Add all of the qualified types into the prune filter
Enable the pruning filter planner, even if the specified filter
types are not present at the subtree of the anchor type.

Ensure correctness when such a condition ever arises.

dongahn added some commits Jan 16, 2019

resource: Bug fix for when anchor_type is ANY_RESOURCE_TYPE
Problem: If anchor_type is ANY_RESOURCE_TYPE, the
get_my_pruning_types method copies the pruning types
twice into the out argument.

Fix it with an additional conditional.
resource: Better handle redundant prune-filter specs.
Problem: when prune filters are specified redundantly,
the matcher code has some loose end. For example,
given prune-filters="ALL:core,node:core",
get_my_pruning_types will return two exactly
same entries (whose type is "core"). In this case,
the logic should be to detect that "core" has already
been installed as a prune type for all of the
resource vertices and simply ignore "node:core."
The same is true when prune-filters="node:core,ALL:core".

Fix the above problem.

@dongahn dongahn force-pushed the dongahn:prune_correctness branch from 87d8d83 to b9602d0 Jan 16, 2019

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 16, 2019

@SteVwonder: The last three commits should address your comments. I decide to change the set_pruning_type method such that all redundant pruning filter type specifications will be handled correctly.

Rebased to the current master. Let me know if I miss something from your reviews.

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Jan 16, 2019

LGTM! Thanks @dongahn! Let me know whether or not you want to squash any commits. If not, I'm happy to merge as is, otherwise, I will wait for your signal to merge.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 16, 2019

Thanks. @SteVwonder. I don't think we need to squash. Once this is merged, we should have all the changes needed for our next release!

@SteVwonder SteVwonder merged commit 0524359 into flux-framework:master Jan 17, 2019

3 checks passed

codecov/patch 80% of diff hit (target 75.62%)
Details
codecov/project 75.67% (+0.04%) compared to 270c160
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jan 17, 2019

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.