@rhc54
Contributor

@rhc54 rhc54 commented Jun 28, 2017

[skip ci]
bot:notest

Just for others to debug with.

@artpol84 @karasevb There is a problem in the dstore support: after the fence stores its data, the keys a client reads back are garbage. Everything works if I restrict PMIx to the "hash" GDS component. Can you please take a look?

Contributor Author

rhc54 commented Jun 28, 2017

I think one possible problem is that we release the pmix_kval_t after we call "store". The "hash" component does a PMIX_RETAIN on it, which ensures proper reference counting. However, if the "dstore" component just points at the kval's data, that data will be freed on release and the store will be left pointing at garbage.
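
A minimal sketch of the retain pattern described above, in a single process. Everything here is hypothetical and only illustrates the idea: `rc_kval_t`, `rc_kval_retain`, `rc_kval_release`, and `rc_store_retaining` are stand-ins, not the real pmix_kval_t, PMIX_RETAIN, or GDS "store" code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted object; NOT the real pmix_kval_t. */
typedef struct {
    int refcount;
    char *key;   /* heap-allocated, owned by the object */
} rc_kval_t;

/* Create an object that owns a private copy of key; refcount starts at 1. */
rc_kval_t *rc_kval_new(const char *key)
{
    rc_kval_t *kv = malloc(sizeof(*kv));
    if (NULL == kv) return NULL;
    size_t len = strlen(key) + 1;
    kv->key = malloc(len);
    if (NULL == kv->key) { free(kv); return NULL; }
    memcpy(kv->key, key, len);
    kv->refcount = 1;
    return kv;
}

/* Take an additional reference (the role PMIX_RETAIN plays in the comment). */
void rc_kval_retain(rc_kval_t *kv)
{
    kv->refcount++;
}

/* Drop one reference; returns 1 if the object was destroyed, else 0. */
int rc_kval_release(rc_kval_t *kv)
{
    assert(kv->refcount > 0);
    if (--kv->refcount > 0) return 0;
    free(kv->key);
    free(kv);
    return 1;
}

/* Hypothetical "store" that keeps the object alive by retaining it. */
rc_kval_t *rc_store_retaining(rc_kval_t *kv)
{
    rc_kval_retain(kv);
    return kv;
}

/* Caller stores, then releases; the stored copy must remain readable. */
int rc_demo(void)
{
    rc_kval_t *kv = rc_kval_new("rank");
    if (NULL == kv) return -1;
    rc_kval_t *stored = rc_store_retaining(kv);
    int destroyed = rc_kval_release(kv);   /* caller's release after "store" */
    int ok = (0 == destroyed) && (0 == strcmp(stored->key, "rank"));
    rc_kval_release(stored);               /* the store drops its reference */
    return ok ? 0 : 1;
}
```

Under these assumptions, `rc_demo()` returns 0 because the caller's release only drops the count from 2 to 1. A store that skipped the retain and kept the raw pointer would be reading freed memory at the `strcmp`.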

If that's the root cause, you can avoid it by setting the kval's pointers to NULL after you copy them to the shmem location. This will avoid having the pmix_kval_t destructor free the memory.
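
The workaround above might look roughly like the following. This is a sketch under stated assumptions, not the dstore implementation: `demo_kval_t`, `demo_slot_t`, and `demo_store_take` are hypothetical names, and the "slot" is ordinary heap memory in one process rather than a real shared-memory segment. It only models the ownership hand-off.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key-value object; NOT the real pmix_kval_t. */
typedef struct {
    char *key;
    char *value;
} demo_kval_t;

/* Hypothetical slot standing in for the store's location. */
typedef struct {
    char *key;
    char *value;
} demo_slot_t;

/* Duplicate a string with malloc; returns NULL on allocation failure. */
char *demo_dup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (NULL != copy) memcpy(copy, s, len);
    return copy;
}

/* Destructor: frees whatever the kval still owns (free(NULL) is a no-op). */
void demo_kval_free(demo_kval_t *kv)
{
    free(kv->key);
    free(kv->value);
    free(kv);
}

demo_kval_t *demo_kval_new(const char *key, const char *value)
{
    demo_kval_t *kv = calloc(1, sizeof(*kv));
    if (NULL == kv) return NULL;
    kv->key = demo_dup(key);
    kv->value = demo_dup(value);
    if (NULL == kv->key || NULL == kv->value) {
        demo_kval_free(kv);
        return NULL;
    }
    return kv;
}

/* Copy the pointers into the slot, then NULL them in the kval so the
 * kval's destructor no longer frees memory the slot now points at. */
void demo_store_take(demo_slot_t *slot, demo_kval_t *kv)
{
    assert(NULL != slot && NULL != kv);
    slot->key = kv->key;
    slot->value = kv->value;
    kv->key = NULL;
    kv->value = NULL;
}

/* Store, release the kval, then read back through the slot. */
int demo_roundtrip(void)
{
    demo_slot_t slot = { NULL, NULL };
    demo_kval_t *kv = demo_kval_new("rank", "42");
    if (NULL == kv) return -1;
    demo_store_take(&slot, kv);
    demo_kval_free(kv);   /* safe: the kval's pointers are NULL now */
    int ok = (0 == strcmp(slot.key, "rank")) &&
             (0 == strcmp(slot.value, "42"));
    free(slot.key);       /* the slot is now responsible for cleanup */
    free(slot.value);
    return ok ? 0 : 1;
}
```

In this sketch `demo_roundtrip()` returns 0. Without the two NULL assignments in `demo_store_take`, `demo_kval_free` would free the strings and the later `strcmp` calls would read freed memory, which is the "garbage keys" symptom hypothesized above.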

rhc54 commented Jun 28, 2017

Note: you have to run this on two nodes to see the problem.

Ralph Castain added 2 commits July 20, 2017 11:12
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Contributor Author

rhc54 commented Jul 20, 2017

The problems were fixed by the latest PMIx master commits.

@rhc54 rhc54 merged commit 4d4dec6 into open-mpi:master Jul 20, 2017
@rhc54 rhc54 deleted the topic/pmix210 branch July 20, 2017 19:22