
Simplify the PMIx_Get processing: #114

Closed (wants to merge 1 commit)

Conversation

@rhc54 (Contributor) commented Jul 18, 2016

If the key starts with "pmix", then it is a PMIx-namespace key and can only be in the data provided at startup (which is stored in the "internal" hash table). Otherwise, it is data that was PMIx_Put and is in the "modex" hash table. PMIx job-level data is stored in the "internal" hash table against rank=PMIX_RANK_WILDCARD.

This reduces all calls to PMIx_Get to a single hash-table lookup.

@jladd-mlnx

👍

@rhc54 (Contributor, Author) commented Jul 18, 2016

@dsolt Please see if this fixes the problem you observed.

@rhc54 (Contributor, Author) commented Jul 19, 2016

I think the Jenkins server is picking up a stale aclocal.m4 left over from the MCA pull request. I can't replicate a build problem with this PR, including when doing make_dist_tarball.

@rhc54 rhc54 force-pushed the topic/simplifyget branch 2 times, most recently from da3fe06 to f6fe9e6 Compare July 21, 2016 03:39
"pmix: found requested value");
if (PMIX_SUCCESS != (rc = pmix_bfrop.copy((void**)&val, cur_kval->value, PMIX_VALUE))) {
"pmix: unpacked key %s", cur_kval->key);
if (PMIX_SUCCESS != (rc = pmix_hash_store(&nptr->modex, cur_rank, cur_kval))) {

This is Get, why do we pmix_hash_store here?


OK, I see. This is the response path: we store the value as well as return it.

@rhc54 (Contributor, Author) commented Jul 21, 2016

In addition to the line-based reviews, I've hit something in the OMPI cleanup that bothers me a bit and likely merits more discussion. The issue arose out of support for heterogeneous systems. In that case, each process "puts" its locally detected architecture, the info is circulated as part of the "fence", and then retrieved via "get". However, the architecture can also be provided by the local daemon (so each proc isn't probing it), and even a hetero setup could be known by the RM, thus enabling all archs to be provided in the job info.

My point here is the question of which prefix to use when retrieving arch info. If the RM provided it, then it needs to have the "pmix" prefix. However, if the process "puts" it, then we won't find it if we use a key that starts with "pmix" - the process must "put" it with a non-PMIx key.

So I'm wondering if we need to rethink this optimization? I could see adding a check on "put" that returns an error if the key uses a "pmix" prefix - that would at least prevent someone from making the mistake. Alternatively, we could take all "put" data that uses a key with "pmix" prefix and separate it out so it gets stored in the right place on the remote end.

Opinions? I have a slight preference for the latter approach as I don't like returning errors if it can be avoided.

@artpol84 (Contributor)

Can PMIx itself probe for the architecture? In that case, procs would never need to probe their arch, and PMIx is free to use the "pmix" prefix.

@rhc54 (Contributor, Author) commented Jul 21, 2016

I don't think so, as this isn't something standardized - every library has its own representation for the architecture. That's why the RM has a problem in providing it. In this case, OMPI's internal daemon is "acting" as the RM, and so the values are consistent - but that is the exception, not the rule.

So we could just define a more generic architecture info that the RM can provide, and then let the libraries convert it to their internal representation. Then the key for this generic value would have a "pmix" prefix, and if the library wants to circulate their internal representation, then that must not have a "pmix" prefix.

Does that make sense?

@artpol84 (Contributor)

Sounds OK to me


@rhc54 (Contributor, Author) commented Jul 21, 2016

I have addressed the comments - if acceptable, I'd like to push off the change regarding the architecture value to another PR, and defer the dstor integration to @artpol84 and friends for after this is committed.

@rhc54 (Contributor, Author) commented Jul 21, 2016

@artpol84 A question arose about whether we really need to impose the constraint on the rank value - i.e., that all job-related values must be requested with the "wildcard" rank, and non-job-related values with their specific rank. This makes things a little harder on users, as they have to understand which PMIx keys pertain to job-related data vs. which pertain to proc-specific data.

While the constraint saves us one hash table search, we were a little uncomfortable with imposing that knowledge on the user. It is what tripped us up in OMPI, and I confess I had to think hard on every single pmix put/get call to ensure I had it right.

It occurred to us that the dstor might actually resolve the problem by making the performance hit of looking in two places essentially disappear - we could check the job-info memory location and then check the rank-specific one without any significant time penalty.

Does this sound reasonable to you based on what you guys have done for dstor? Assuming someone asks for a PMIx-prefixed key, can we check job-level data (i.e., data stored with rank wildcard) and then check rank-level data (assuming they gave us a non-wildcard rank) with little penalty?

@rhc54 (Contributor, Author) commented Jul 21, 2016

I suppose we also need to ask: what do we do about backward compatibility here? Do we need to worry about 1.1.4-based apps that didn't follow this rule? In the "cross-version" code, we know what library version the app is using - do we need to take that into account here?

Seems like we probably do...sigh...

@ggouaillardet (Contributor)

@rhc54 It seems this has already landed in ompi master.
I found an issue (an unpack failure) when running mpirun -np 1 ./dynamic/spawn,
and the root cause was that the nspace was not unpacked.
Here is a patch for this branch (from open-mpi/ompi@21e7f31):

diff --git a/src/client/pmix_client_get.c b/src/client/pmix_client_get.c
index b167344..714d0e5 100644
--- a/src/client/pmix_client_get.c
+++ b/src/client/pmix_client_get.c
@@ -1,7 +1,7 @@
 /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
  * Copyright (c) 2014-2016 Intel, Inc.  All rights reserved.
- * Copyright (c) 2014-2015 Research Organization for Information Science
+ * Copyright (c) 2014-2016 Research Organization for Information Science
  *                         and Technology (RIST). All rights reserved.
  * Copyright (c) 2014      Artem Y. Polyakov <artpol84@gmail.com>.
  *                         All rights reserved.
@@ -298,6 +298,15 @@ static void _getnb_cbfunc(struct pmix_peer_t *pr, pmix_usock_hdr_t *hdr,
         if (PMIX_SUCCESS == (rc = pmix_bfrop.unpack(buf, &bptr, &cnt, PMIX_BUFFER))) {
             /* if the rank is WILDCARD, then this is an nspace blob */
             if (PMIX_RANK_WILDCARD == cur_rank) {
+                char *nspace;
+                /* unpack the nspace - we don't really need it, but have to
+                 * unpack it to maintain sequence */
+                cnt = 1;
+                if (PMIX_SUCCESS != (rc = pmix_bfrop.unpack(bptr, &nspace, &cnt, PMIX_STRING))) {
+                    PMIX_ERROR_LOG(rc);
+                    return;
+                }
+                free(nspace);
                 pmix_client_process_nspace_blob(cb->nspace, bptr);
             } else {
                 cnt = 1;

@rhc54 (Contributor, Author) commented Aug 1, 2016

@ggouaillardet Thanks! Yes, this has been on the OMPI master for the last several weeks as some groups needed it there.

@rhc54 rhc54 changed the title Simplify the PMIx_get processing: Simplify the PMIx_Get processing: Aug 1, 2016
@rhc54 (Contributor, Author) commented Aug 1, 2016

@artpol84 Looks like Jenkins is failing to find libevent - can you folks take a look?

@artpol84 (Contributor) commented Aug 1, 2016

@rhc54 ok

@artpol84 (Contributor) commented Aug 1, 2016

bot:retest

@artpol84 (Contributor) commented Aug 1, 2016

Looks like libevent migrated from SourceForge to GitHub.

@rhc54 (Contributor, Author) commented Aug 1, 2016

oho - interesting!

@artpol84 (Contributor) commented Aug 1, 2016

bot:retest

@rhc54 (Contributor, Author) commented Aug 1, 2016

@artpol84 Thanks!

@rhc54 rhc54 force-pushed the topic/simplifyget branch 3 times, most recently from ad81e00 to 0f7c690 Compare August 2, 2016 19:31
@rhc54 rhc54 added this to the v2.0 milestone Aug 2, 2016
@rhc54 rhc54 self-assigned this Aug 2, 2016
@rhc54 (Contributor, Author) commented Aug 2, 2016

It occurred to me that this commit isn't complete - for example, if we are going to require that someone specify the nodeid when requesting the number of peers executing on that node, then we must have stored that info as a function of the nodeid. Our current storage code is based solely on rank and has no concept of other indices. With this change, we will need to revamp that system so it can recognize data that is rank-independent and store it accordingly.

@rhc54 (Contributor, Author) commented Aug 4, 2016

Per telecon of Aug 4th:

  • Primary concern was over how to handle node-specific information. Our current storage is based on rank, not nodeid, and so is the PMIx_Get function. We discussed ways of indicating that the PMIx_Get call is requesting node-level data, but it rapidly became confusing and clunky.
  • We propose to do the following:
    • Use the PMIx_Query function to retrieve node-level information, thus avoiding further overload of PMIx_Get
    • Modify PMIx_Query to use a new pmix_query_t structure that contains:
      • char **keys - identify information being requested
      • pmix_info_t qualifiers[] - array of qualifiers to direct the search
      • size_t nqual - number of qualifiers in array
    • Return function for PMIx_Query to remain the same, with each pmix_info_t in the returned array corresponding to a pmix_query_t. Multiple values returned for a single pmix_query_t will be provided in the pmix_data_array_t field of the pmix_info_t's value
    • Create a new hash table for node-level data

@artpol84 (Contributor) commented Aug 8, 2016

bot:retest

@rhc54 rhc54 force-pushed the topic/simplifyget branch 2 times, most recently from 33797f7 to ea88862 Compare August 11, 2016 22:00
If the key starts with "pmix", then it is a PMIx-namespace key and can only be in the data provided at startup (which is stored in the "internal" hash table). Otherwise, it is data that was PMIx_Put and is in the "modex" hash table. PMIx job-level data is stored in the "internal" hash table against rank=PMIX_RANK_WILDCARD.

This reduces all calls to PMIx_Get to a single hash-table lookup.

Fix the problem of a proc asking for job-level data from another nspace by ensuring that the server detects the situation, and then returns the nspace blob for the other nspace.

Backport of Coverity warning fixes

Address review line comments from @artpol84.

Backport changes from OMPI

free(nspace) in job_data: the nspace is read and allocated but not used; it still needs to be freed
even though we don't use it.

Fix two cases of pack being called with the wrong pointer type

When reading a namespace blob from a namespace other than our own, we still have to read
the nspace name off first, and then we need to actually check whether the blob contains what
we are looking for.

Update comments on keys per telecon

Check incoming keys to PMIx_Put for namespace reservation; reject them if they violate the PMIx reservation

Remove generated file

Add missing file back

Reset common file
@dsolt (Contributor) commented Aug 16, 2016

When I use this branch with open-mpi master, it fails in MPI_Init. I'm using the external component for pmix. I have not tried simply copying the code into place. Has anyone else tested this with open-mpi master?

@rhc54 (Contributor, Author) commented Aug 16, 2016

It won't work without an update of the OMPI integration, as there is further redefinition of the types. I believe I have a branch to support it, but it is also out of date while I work on the node-info support.

@rhc54 (Contributor, Author) commented Aug 16, 2016

So go back on vacation, dude! I'll have this up next week when you get back.

@rhc54 (Contributor, Author) commented Jun 2, 2017

I think events have overtaken this PR, and it is no longer valid or required.

@rhc54 rhc54 closed this Jun 2, 2017
@rhc54 rhc54 deleted the topic/simplifyget branch July 6, 2017 18:53
5 participants