Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create file for file backed shared memory in process job session dir. #3739

Merged
merged 2 commits into from Aug 15, 2017

Conversation

cniethammer
Copy link
Contributor

Prevent file collisions and allow cleaning of temporary files by orte-clean used for shared file pointer management: Create file for file backed shared memory in process job session dir instead in base /tmp.

Prevents file collisions and can also be cleaned by orte-clean properly.

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
@hppritcha
Copy link
Member

hppritcha commented Jun 29, 2017

@edgargabriel something for you to review when you're back from vacation

@hppritcha
Copy link
Member

I'll open PRs against the release branches once this is merged in to master.

@edgargabriel
Copy link
Member

I liked originally the idea behind this commit, and I can confirm that it works as expected. My hesitation comes from a more theoretical scenario, and maybe I don't understand the session directories well enough, so somebody please feel free to correct me.

Basically, the scenario that could cause trouble when using this patch is: a) an MPI job connecting with an MPI job, b) using Intercomm_merge to create an intra communicator that contains processes from the two different jobs (i.e. they will have different session directories if I understand correctly).

I understand that this is a very theoretical scenario, but it would not work with this patch. Could we bcast the session directory of rank zero, such that everybody uses that one, similarly to how we deal with the jobid?

@rhc54
Copy link
Contributor

rhc54 commented Jul 6, 2017

Your basic understanding of the session directories is correct. I think the critical question here is: what is supposed to happen to this file when the two jobs disconnect? Does the file get removed during the disconnect procedure? If so, then it can exist in one job's session directory since neither job can leave before disconnecting (else it's an error and the file gets dumped anyway).

If one job is supposed to inherit the file, then it gets a little trickier. If we know at the beginning which job is to "own" the file, then we put it there. Otherwise, we'd have to figure out how to deal with it.

If multiple jobs connect, then we get the same question: if one job disconnects, does the shared file go away? Or does it stay so long as any of the jobs remains alive?

@edgargabriel
Copy link
Member

thanks for the explanations @rhc54 . Yes, the file is supposed to be removed whenever the file has been closed and the communicator that contains the processes from the connected jobs is automatically freed during the MPI_File_close. Maybe we can talk briefly about that at the devel meeting next week.

@rhc54
Copy link
Contributor

rhc54 commented Jul 7, 2017

I added it to the agenda

@edgargabriel
Copy link
Member

I think its probably a 5 min conversation between you and me, since I really have mostly some administrative questions on how to approach this.

@rhc54
Copy link
Contributor

rhc54 commented Jul 7, 2017

Agreed - just wanted a reminder 😄

@edgargabriel
Copy link
Member

We had a conversation with @rhc54 at the OMPI meeting, and I think the consensus is to not worry about dynamically connected jobs. Shared memory components are also are not being used for communication in dynamically connected jobs since they run into exactly the same problems that we would run into here, namely where to place the backend shared memory file, how to ensure that processes from two different jobs have read/write permission to the different session directories (which might have been launched by different user ids) etc.

I will have one minor change request to this pr, and will make a modification to disable the component for dynamically connected communicators in the component query function.

Copy link
Member

@edgargabriel edgargabriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the session directory is unique on a per-job basis, we do not need to have the jobid in the filename anymore, so I would suggest to remove that part of the code.

We could add instead the communicator id instead of the job-id as part of the filename. This would allow to support the scenario, where a code opens the same file multiple times and have unique shared file pointers per opening instance. Since Open MPI does a comm-dup on every file open, the communicator id for each file open operation is unique and could be used for the identification.

…terjobid.

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
Copy link
Member

@edgargabriel edgargabriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks @cniethammer

@bwbarrett
Copy link
Member

bot:retest

I think this got caught in a different bug in master; the failures are related to using MPI headers in non-MPI code. Let's make sure...

@edgargabriel
Copy link
Member

bot:mellanox:retest

@edgargabriel edgargabriel merged commit 99c7482 into open-mpi:master Aug 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants