Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct cleanup of internal resources associated with communicators #925

Open
bosilca opened this issue Feb 15, 2024 · 3 comments
Open

Correct cleanup of internal resources associated with communicators #925

bosilca opened this issue Feb 15, 2024 · 3 comments
Assignees

Comments

@bosilca
Copy link

bosilca commented Feb 15, 2024

This issue is not entirely UCC related, but relates to UCC integration into an MPI library (in this case Open MPI).

Trigger: If an application calls MPI_Finalize without properly freeing all communicators there is no way to properly release UCC resources.

The root cause is the way UCC tracks OMPI communicators destruction, not by refcounting the collective referenced by the communicator but by adding an attribute (COMM_ATTR) on each communicator and expecting the attribute to be properly deleted (which will call into ucc_comm_attr_del_fn). This is the case when the user free a communicator via MPI_Comm_free, but the MPI standard does not required the release of attributes for non freed communicators in MPI_Finalize (at least not for attributes not associated with MPI_COMM_WORLD). The outcome is a disconnect between OMPI (where all communicators are released) and UCC where they are not.

A proper solution could be complicated to implement. It would require to refcount all the collectives used by OMPI, or to track all communicators internally and release them when the attribute deletion function (ucc_comm_attr_del_fn) is called on MPI_COMM_WORLD.

@Sergei-Lebedev
Copy link
Contributor

@bosilca can we close this issue since open-mpi/ompi#12429 was merged?

@bosilca
Copy link
Author

bosilca commented May 31, 2024

Technically yes, but this is not completely over yet for UCC and HCOL. In open-mpi/ompi#12429 I have not removed the attribute on MPI_COMM_WORLD that both libraries are using to detect MPI_Finalize. I don't think it is needed anymore, but such a change must come from y'all.

@Sergei-Lebedev
Copy link
Contributor

I see, let's keep this issue open then, thanks for update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants