resource: stop collecting/reducing hwloc XML #4263
Oh darn, looks like fluxion is failing a few tests:
I reran the sched test after sched merged the PR to disable the failing tests, but it looks like there may be another set that I missed before?
f5cda56 to 974ba5c
Codecov Report
@@           Coverage Diff            @@
##           master    #4263    +/-   ##
========================================
  Coverage   83.54%   83.55%
========================================
  Files         387      387
  Lines       64867    64725   -142
========================================
- Hits        54194    54081   -113
+ Misses      10673    10644    -29
Rebased on current master and added a commit that improves error responses for unknown service methods (noticed while tracking down fluxion's use of the deprecated This won't pass the sched CI test until flux-framework/flux-sched#929 is merged.
The sched PR was merged, so kicking off CI again.
LGTM!
Thanks!
Problem: the resource module collects hwloc XML to rank 0 by tacking it onto its Rlocal -> R reduction. This was to support fluxion while an Rv1 reader was being developed. That has now been completed, so this can be removed. Don't aggregate hwloc XML at rank 0. Drop the resource.get-xml RPC used by fluxion. Update resource tests.
Problem: the message dispatcher returns a human readable error when a request doesn't match a handler, but the topic is not included, which is inconvenient when diagnosing failures. Add the topic to the error response. Also, clean up message topic usage in this function:
- set topic to "unknown" if flux_msg_get_topic() fails to avoid uninitialized variable contents being passed to caliper on failure
- eliminate duplicate flux_msg_get_topic() in fprintf call that is used as a last resort when a message is going to be dropped
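The topic handling described in this commit message can be illustrated with a small standalone sketch. This is not the flux-core dispatcher code: `struct sketch_msg`, `sketch_get_topic()`, and `sketch_format_nomatch()` are hypothetical stand-ins, and the wording of the error string is an assumption. It only shows the pattern of falling back to "unknown" when the topic lookup fails, so that no uninitialized pointer is used, and of including the topic in the error text.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a message; NOT flux-core's flux_msg_t. */
struct sketch_msg {
    const char *topic;   /* NULL or "" simulates a failed topic lookup */
};

/* Hypothetical stand-in for a topic getter: returns 0 on success,
 * -1 on failure. On failure *topic is left untouched, which is why
 * the caller must not use it without a fallback. */
static int sketch_get_topic (const struct sketch_msg *msg, const char **topic)
{
    if (!msg || !msg->topic || strlen (msg->topic) == 0) {
        errno = EINVAL;
        return -1;
    }
    *topic = msg->topic;
    return 0;
}

/* Format the human readable error for a request that matched no handler.
 * The topic is looked up once and reused; if the lookup fails, "unknown"
 * is substituted so the pointer is always valid.
 * The message wording here is an assumption, not flux-core's text. */
static int sketch_format_nomatch (const struct sketch_msg *msg,
                                  char *buf, size_t len)
{
    const char *topic;

    assert (buf != NULL && len > 0);
    if (sketch_get_topic (msg, &topic) < 0)
        topic = "unknown";
    return snprintf (buf, len, "Unknown service method '%s'", topic);
}
```

With this sketch, a message whose topic is `resource.get-xml` yields an error string containing that topic, while a message with no retrievable topic yields one containing "unknown".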
Problem: a comment in topo.c states that XML is reduced along with R_local, however this is not true. XML is no longer reduced since flux-framework#4263 (in flux-core-0.39.0). Drop that part of the comment.
Now that fluxion has the Rv1 reader (flux-framework/flux-sched#921), it no longer requires the resource.get-xml RPC to fetch aggregated hwloc XML for the instance. This PR removes that RPC and leaves the hwloc XML out of the Rlocal ➡️ R reduction, greatly reducing the amount of data that needs to be moved before the scheduler can be started when resources are dynamically discovered (e.g. under slurm).

This PR is based on top of #4262 (remove flux-hwloc).