Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux-shell: populate PMI_process_mapping key #2278

Merged
merged 6 commits into from Jul 31, 2019

Conversation

@garlick
Copy link
Member

commented Jul 30, 2019

I forgot to populate this key, which, if present, is used by MPI to determine which procs can used shared memory to communicate because they are on the same node.

Add a helper function to libpmi, then use it in the shell's pmi implementation.

Add some tests that verify cliques work as intended for simple cases.

garlick added 5 commits Jul 30, 2019
Add a helper for encoding the PMI_process_mapping value
from an array of 'struct pmi_map_block's.

Link clique.c in with both libpmi_server.so and libpmi_client.so
instead of just the client, so this server-side helper
can be used by server implementations.
Problem: PMI_process_mapping is unavailable for clique
computation by MPI tasks.

Add logic to generate this key, and add it to the PMI
KVS hash on each shell.

If the key cannot be set because the value exceeds the
static value length, emit a log message on stderr but
don't treat it as an error since the client can run
without clique information.

Fixes #2277
Problem: this description of callbacks from the PMI protocol
engine was unclear.  It sounds like the whole thing is blocked
while one callback is handled.

Reword to emphasize that only the task that caused the
callback is effectively blocked while awaiting a response.
Other tasks may continue to work in the protocol engine,
and the shell's reactor is never stalled.
@garlick garlick force-pushed the garlick:PMI_process_mapping branch from bf45e5c to fdea25c Jul 30, 2019
PMI_process_mapping is added to the local PMI hash
at initialization, and should have the same value
on every shell.  But because it's in the hash, it's
considered "dirty data" to be committed to the shared
KVS during a PMI_Barrier.

Explicitly exclude this key from the KVS transaction
committed during the PMI_Barrier, and thus avoid overhead
that has no effect.
@codecov-io

This comment has been minimized.

Copy link

commented Jul 30, 2019

Codecov Report

Merging #2278 into master will increase coverage by <.01%.
The diff coverage is 85%.

@@            Coverage Diff             @@
##           master    #2278      +/-   ##
==========================================
+ Coverage   80.84%   80.84%   +<.01%     
==========================================
  Files         213      213              
  Lines       33419    33458      +39     
==========================================
+ Hits        27016    27049      +33     
- Misses       6403     6409       +6
Impacted Files Coverage Δ
src/common/libpmi/clique.c 100% <100%> (ø) ⬆️
src/shell/pmi.c 74.53% <76%> (+0.08%) ⬆️
@grondo

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

Can't find anything wrong here! Merging.

@grondo grondo merged commit 505e7c1 into flux-framework:master Jul 31, 2019
4 checks passed
4 checks passed
Summary 1 potential rule
Details
codecov/patch 85% of diff hit (target 80.84%)
Details
codecov/project 80.84% (+<.01%) compared to 94cf968
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@garlick

This comment has been minimized.

Copy link
Member Author

commented Jul 31, 2019

@garlick garlick deleted the garlick:PMI_process_mapping branch Jul 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.