Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hwloc whitelist support #467

Merged
merged 6 commits into from Jul 6, 2019
Merged

Conversation

@dongahn
Copy link
Contributor

dongahn commented Jun 11, 2019

This PR adds support for hwloc whitelist, which allows the resource-query and -match to load in hwloc resources that are in the whitelist.

This alleviate two problems: 1) the graph date store created with hwloc is relatively large with no clear benefit to scheduling; 2) because hwloc topology oftentimes exhibit variations between different platforms, one jobspec that work on one platform doesn't necessarily work for another platform.

This also has a fix for a couple bugs: Issue #466.

Resolve Issue Issue #454 and #466.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jun 11, 2019

Codecov Report

Merging #467 into master will increase coverage by 0.08%.
The diff coverage is 73.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #467      +/-   ##
==========================================
+ Coverage   76.23%   76.31%   +0.08%     
==========================================
  Files          45       45              
  Lines        5478     5535      +57     
==========================================
+ Hits         4176     4224      +48     
- Misses       1302     1311       +9
Impacted Files Coverage Δ
resource/utilities/command.hpp 100% <ø> (ø) ⬆️
resource/traversers/dfu_impl.hpp 100% <ø> (ø) ⬆️
resource/generators/gen.cpp 87.04% <68.11%> (-0.75%) ⬇️
resource/modules/resource_match.cpp 71.46% <77.77%> (-0.01%) ⬇️
resource/traversers/dfu_impl.cpp 82.84% <82.35%> (+0.11%) ⬆️
resource/utilities/resource-query.cpp 49.71% <90.9%> (+1.16%) ⬆️
resource/writers/match_writers.cpp 93.59% <0%> (+0.6%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9720656...ca87b14. Read the comment docs.

@dongahn dongahn force-pushed the dongahn:whitelist branch 3 times, most recently from 327d24a to 5c2e852 Jun 14, 2019
@dongahn dongahn requested a review from SteVwonder Jun 14, 2019
@dongahn dongahn changed the title WIP: Add hwloc whitelist support Add hwloc whitelist support Jun 14, 2019
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jun 14, 2019

@SteVwonder: OK. Travis is green. This is ready for your review. This is kind of important to support minimal portable jobspec. One caveat is this may still not support the complete portability of a jobspec. Even though you will whitelist and downselect resources to populate the resource graph data store with this, the logic preserves the relative hierarchy of those resources. So if the relative hierarchy of a resource as reported by hwloc is different between two platforms, one jobspec may not work across both platforms. I suspect, GPUs could be in that category. I can imagine in one platform, hwloc may report this as:

node->socket->cpu
    ->gpu

And in another platform:

node->socket->cpu
            ->gpu

Although I haven't seen this yet.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jun 14, 2019

I don't understand why codecov/patch coverage is reduced significantly...

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jun 16, 2019

I don't understand why codecov/patch coverage is reduced significantly...

@SteVwonder or @grondo: Should this be just okay? I vaguely remember there were problems with code coverage with docker support.

Copy link
Member

SteVwonder left a comment

Thanks @dongahn for putting this together! A few comments are below from my pass. I still need to give this a quick spin.

resource/generators/gen.cpp Outdated Show resolved Hide resolved
t/data/resource/commands/basics/cmds12.in Outdated Show resolved Hide resolved
t/data/resource/commands/basics/cmds12.in Outdated Show resolved Hide resolved
t/t4004-match-hwloc.t Outdated Show resolved Hide resolved
t/t4004-match-hwloc.t Outdated Show resolved Hide resolved
t/t4004-match-hwloc.t Outdated Show resolved Hide resolved
resource/traversers/dfu_impl.hpp Show resolved Hide resolved
resource/traversers/dfu_impl.cpp Outdated Show resolved Hide resolved
@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Jun 19, 2019

Should this be just okay? I vaguely remember there were problems with code coverage with docker support.

From what I'm seeing, it looks like the overall coverage stayed basically the same, but of the code that changed in this PR, a slightly smaller percentage was covered than the percentage covered in the repo as a whole. Peeking at the coverage diff breakdown, it looks like the only changes that aren't covered by testing are error paths. So I'm fine with this coverage as-is.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jul 4, 2019

Will try to get to this tomorrow.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jul 5, 2019

@SteVwonder: I push my changes that address all of your (great) review comments. Can you please quickly take a look the new commits? If all is good, I will squash them and then this should be good to be merged.

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Jul 5, 2019

Thanks @dongahn. LGTM! Ready for squashing and then merging.

@dongahn dongahn force-pushed the dongahn:whitelist branch from 3957869 to 0f135d3 Jul 6, 2019
dongahn added 6 commits Jun 11, 2019
Problem: 1) the graph date store created with hwloc is relatively large
with no clear benefit to scheduling; 2) because hwloc topology
oftentimes exhibit variations between different platforms, one
jobspec that work on one platform doesn't necessarily work
for another platform.

Introduce hwloc whitelist support to mitigate these problems.

For query, specifying --hwloc-whitelist=node,socket,core will only
add these resource types (if detected from hwloc) into the
graph data store.

For resource-match service, hwloc-whiltelist=... as a module
load option will have the same effect.
Problem: We had a partial match logic to allow a jobspec
to omit the prefix of hierarchical resource requests.
It turned out this partial match logic accidentally
and thus incorrectly attempts partial matching for non-prefix,
lower-level components as well. For example,
  - with resource graph: cluster->node->socket->numanode->core,
  - and jobspec: node->socket->core
This logic doesn't appear to fail matching when it couldn't find
a qualifed resource at the socket level. Notice at this level,
resource graph has numanode while the jobspec looks for core.
This resulted in successful match with only partially matched R
("node->socket" but no "core"!).

Fix this by differentiating no-match case for prefix omission
support vs. other types of no-match.
Enumerate the former condition as PRESTINE_NONE_MATCH
and the latter NONE_MATCH: Traverser returns unsuccessful
match when it detects NONE_MATCH.
@dongahn dongahn force-pushed the dongahn:whitelist branch from 0f135d3 to ca87b14 Jul 6, 2019
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jul 6, 2019

@SteVwonder: Thanks. Squashed and pushed. Once Travis is ok, this is good to go.

@SteVwonder SteVwonder merged commit 3ed3da7 into flux-framework:master Jul 6, 2019
2 of 3 checks passed
2 of 3 checks passed
codecov/patch 73.91% of diff hit (target 76.23%)
Details
codecov/project 76.31% (+0.08%) compared to 9720656
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Jul 6, 2019

Thanks @dongahn!

@dongahn dongahn deleted the dongahn:whitelist branch Jul 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.