Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resource: support for hwloc ingestion #385

Merged
merged 8 commits into from Dec 13, 2018

Conversation

Projects
None yet
5 participants
@SteVwonder
Copy link
Member

SteVwonder commented Aug 31, 2018

Current state of the code for populating the resource library using HWLOC resource information.

Supports both an HWLOC XML file and the HWLOC in the Flux KVS as input.

Closes #404
Closes #355

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch from a61a5fd to 3f43362 Aug 31, 2018

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 31, 2018

Coverage Status

Coverage decreased (-1.03%) to 76.515% when pulling 3f43362 on SteVwonder:resource-hwloc into 3f36f33 on flux-framework:master.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Sep 1, 2018

Thanks @SteVwonder. I will take a look at this PR as soon as practical. I have just posted a PR to coordinate my work better with yours. Please look at the proposed steps to see if they sound okay with your plan.

Show resolved Hide resolved resource/Makefile.am Outdated
Show resolved Hide resolved resource/generators/gen.cpp Outdated
Show resolved Hide resolved resource/generators/gen.hpp Outdated
Show resolved Hide resolved resource/generators/gen.hpp Outdated
Show resolved Hide resolved resource/utilities/resource-query.cpp Outdated
Show resolved Hide resolved resource/Makefile.am Outdated
Show resolved Hide resolved resource/generators/gen.cpp
Show resolved Hide resolved resource/generators/gen.cpp
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Sep 6, 2018

Codecov Report

Merging #385 into master will increase coverage by 0.09%.
The diff coverage is 77.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #385      +/-   ##
==========================================
+ Coverage   75.53%   75.62%   +0.09%     
==========================================
  Files          67       67              
  Lines       10833    10992     +159     
==========================================
+ Hits         8183     8313     +130     
- Misses       2650     2679      +29
Impacted Files Coverage Δ
resource/generators/spec.hpp 100% <ø> (ø) ⬆️
resource/utilities/command.hpp 100% <ø> (ø) ⬆️
resource/modules/resource_match.cpp 71.63% <61.11%> (-0.03%) ⬇️
resource/utilities/resource-query.cpp 37.16% <62.5%> (+1.42%) ⬆️
resource/generators/gen.cpp 85.95% <87.96%> (+0.9%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70bf109...fdd8a18. Read the comment docs.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Oct 24, 2018

@SteVwonder: Any progress for this one? I have knocked down most of the cleanup work I needed to do and my next task would be R format. I believe landing this PR can significantly help us to make progress on R. Thanks!

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Nov 21, 2018

@dongahn: checkpointing before the holiday.

Note: I had to switch the prune filter to ALL:pu as we did while debugging in order for matches to occur against the hwloc resource graph. Once #400 is closed, we can go back to using the default prune-filter.

I created a separate issue for the generator class hierarchy (#411).

Still TODO:

  1. watch-once only on the 'resource.hwloc.loaded' key rather than every rank string
  2. switch to async KVS operations

For 1, I think this change would only take an hour or so to implement.
For 2, this is more involved, and will be much easier once flux-framework/flux-core#1826 is done. Do we want to make a separate issue & PR for that or bundle it in with this PR? My preference would be to do it in a separate PR.

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch from ab3a8cc to d52cf86 Nov 22, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Nov 22, 2018

Note: I had to switch the prune filter to ALL:pu as we did while debugging in order for matches to occur against the hwloc resource graph.

Yes, this is fine. We can perhaps even pass this as a resource match module load option.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Nov 22, 2018

Do we want to make a separate issue & PR for that or bundle it in with this PR? My preference would be to do it in a separate PR.

We definitely want a separate PR so that we can make progress on other work that depends on hwloc!

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Nov 22, 2018

checkpointing before the holiday.

Have a great Thanksgiving, Stephen!

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Nov 22, 2018

Yes, this is fine. We can perhaps even pass this as a resource match module load option.

Sorry, I should have been more clear. The code still has ALL:core as the default. I'm passing prune-filter=ALL:pu to the resource module when it is being loaded: https://github.com/flux-framework/flux-sched/pull/385/files#diff-18dbb5848dd9395d8821569fdae9a452R47

Have a great Thanksgiving, Stephen!

You too!

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Nov 27, 2018

Restarted a hung builder. Travis seems happy now. Coverage is low due to the light testing of resource-query (the testing is currently done through the matching service). I can add a test that uses resource-query to load an hwloc xml file and run a few queries against the resource graph. Since there is also no testing against an hwloc with GPUs, I'll try and get two birds with one stone.

Also, I thought resource.hwloc.loaded was a single key, but it is actually a directory of keys, so reducing the total number of kvs watches will require entering the reactor (to watch the entire directory) which is a bigger refactor than I want to do for this PR. So let's push that off into a separate PR as well.

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch from d52cf86 to 84d1451 Nov 28, 2018

@SteVwonder SteVwonder changed the title [WIP] resource: support for hwloc ingestion resource: support for hwloc ingestion Nov 28, 2018

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch 2 times, most recently from 544a6b2 to a68cb53 Nov 28, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Nov 28, 2018

Travis is happy. Coveralls is happy. I think this PR is (finally) ready for review. Sorry for the delay on this one.

The only piece of code (outside of error paths) that isn't tested is the opencl co-processor piece of hwloc parsing. At some point, we should get the hwloc output from a machine with AMD GPUs and add it to the test suite (making an issue about that now).

Show resolved Hide resolved resource/utilities/Makefile.am Outdated
Show resolved Hide resolved resource/utilities/Makefile.am Outdated
Show resolved Hide resolved resource/utilities/resource-query.cpp Outdated
@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Dec 5, 2018

@SteVwonder: I skimmed through the changes. By and large, this looks pretty good.

Please look at my inline review comments above. It seems there are some interim changes that are later removed and/or fixed. If you can squash those changes appropriately, it will help me to finish up my reviews sooner.

As part of that, could you also rebase?

Thanks!

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch 2 times, most recently from d3ef6f5 to 198ffea Dec 5, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Dec 5, 2018

If you can squash those changes appropriately, it will help me to finish up my reviews sooner.
As part of that, could you also rebase?

👍 Rebased onto master and incremental development has been squashed.

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch 2 times, most recently from 3d07944 to 585e62c Dec 5, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Dec 5, 2018

flux (and jansson) have now been removed from resource-query but support for reading from an hwloc xml file was left intact.

@SteVwonder SteVwonder referenced this pull request Dec 6, 2018

Closed

compilation: remove_if #416

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch from 585e62c to d5d427d Dec 6, 2018

Show resolved Hide resolved resource/generators/gen.cpp Outdated
Show resolved Hide resolved resource/generators/gen.cpp Outdated
Show resolved Hide resolved resource/generators/gen.cpp Outdated
Show resolved Hide resolved resource/generators/gen.cpp
Show resolved Hide resolved resource/generators/gen.cpp Outdated
Show resolved Hide resolved resource/modules/resource_match.cpp Outdated
Show resolved Hide resolved resource/modules/resource_match.cpp Outdated
Show resolved Hide resolved resource/modules/resource_match.cpp Outdated
Show resolved Hide resolved resource/modules/resource_match.cpp Outdated
Show resolved Hide resolved resource/policies/base/matcher.cpp Outdated
@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Dec 10, 2018

@SteVwonder: OK. I finally managed to give my full review. This PR looks great with many needed functionalities!

There are a few minor nits for which I added in-line comments above. Once those are addressed, I will look at it one last time before this can land. Please rebase though.

@garlick and @grondo: there is one question for you regarding a comms. module calling assert

Thanks.

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Dec 12, 2018

Thanks @dongahn. How would you prefer that I add the changes before your next review? Is it easier for you to review a single additional commit on top of the existing commits? Or should I immediately squash the changes into the existing commits?

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Dec 12, 2018

For this PR, squashing the changes into the existing commits should work for me. This should minimize your work anyway. For a much more complex PR, I understand why you want to take the former approach. Thanks!

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch from d5d427d to e659e19 Dec 12, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Dec 12, 2018

@dongahn, I believe I addressed all of your comments. I pushed them as a single commit on top of the existing commits, thinking that will be easier to review. I can squash that down before your next pass, if you want, otherwise, I'll squash it down before this gets merged into master.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Dec 12, 2018

@SteVwonder: ok. I will get to this soonish...

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Dec 12, 2018

Sorry! I apparently broke the build. Probably a silly syntax error (I thought with the minor changes, I was relatively safe). I'll try and fix later this evening, otherwise I do tomorrow.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Dec 12, 2018

generators/gen.cpp: In function 'int check_hwloc_version()':
generators/gen.cpp:631:9: error: 'm_err_msg' was not declared in this scope
         m_err_msg += msg.str();
         ^~~~~~~~~
generators/gen.cpp: In member function 'int Flux::resource_model::resource_generator_t::read_ranked_hwloc_xml(const char*, int, const ggv_t&, Flux::resource_model::resource_graph_db_t&)':
generators/gen.cpp:666:71: error: invalid operands of types 'const char*' and 'const char [3]' to binary 'operator+'
             m_err_msg += "Failed to load hwloc xml from rank " + rank + "; ";
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
Makefile:820: recipe for target 'generators/libresource_la-gen.lo' failed

My guess is there is no overloaded operator+ that takes the const char * type for both of its arguments .

 m_err_msg += std::string ("Failed to load hwloc xml from rank ")
                             + std::to_string (rank) + std::string ("; ");

should probably work.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Dec 12, 2018

@SteVwonder: OK. Changes look good. You can squash those changes and see if Travis CI will be happy. Once that's done, I will look through the code one last time before merging. Thanks.

SteVwonder added some commits Nov 9, 2018

m4/yamlcpp: add msg about optional check for Node::Mark
Add a message to separate output from an optional check that can fail
from the previous test that is required to pass.
resource-query: enable initializing resource graph from hwloc xml
valid hwloc xml sources include an xml file
resource_match: refactor argument error handling
continue processing even if unknown arguments are present.
make unknown arguments a non-fatal error.
make other initialization errors fatal.

@SteVwonder SteVwonder force-pushed the SteVwonder:resource-hwloc branch from e659e19 to 915084d Dec 13, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Dec 13, 2018

All green now. Had to restart 3 hung builders. Hopefully Travis was just having a bad night last night, and that it is not a symptom of a larger problem.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Dec 13, 2018

@SteVwonder: LGTM. Thank you for much for finishing up this much anticipated PR!

If others want to give their reviews, this is the time to tell me.

Otherwise, I will merge 2PMish this afternoon.

@dongahn dongahn merged commit 781b95e into flux-framework:master Dec 13, 2018

3 checks passed

codecov/patch 77.52% of diff hit (target 75.53%)
Details
codecov/project 75.62% (+0.09%) compared to 70bf109
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@SteVwonder SteVwonder deleted the SteVwonder:resource-hwloc branch Feb 16, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.