@mdosanjh
Contributor

The current persist module of partitioned communication uses a single counter to assign the send tags used by the underlying persistent sends. This can lead to integer overflow when a large number of partitioned requests are created. This pull request addresses the problem by moving to one counter per peer; while rollover is still possible, it is significantly less likely.
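To make the per-peer idea concrete, here is a minimal sketch, not the code in this pull request. The type `peer_tags_t`, its functions, and the explicit wrap to 0 at `tag_ub` are all assumptions made for illustration; a real module would take the upper bound from the communicator's `MPI_TAG_UB` attribute.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-peer tag allocator; names are illustrative only. */
typedef struct {
    int      num_peers; /* number of ranks in the communicator */
    int32_t  tag_ub;    /* largest usable tag (assumed; MPI_TAG_UB in practice) */
    int32_t *next;      /* next[p] is the next tag to hand out for peer p */
} peer_tags_t;

/* Allocate one zeroed counter per peer. Returns 0 on success, -1 on failure. */
static int peer_tags_init(peer_tags_t *t, int num_peers, int32_t tag_ub)
{
    assert(num_peers > 0 && tag_ub > 0);
    t->next = calloc((size_t)num_peers, sizeof *t->next);
    if (t->next == NULL) {
        return -1;
    }
    t->num_peers = num_peers;
    t->tag_ub = tag_ub;
    return 0;
}

/* Return the next tag for `peer`, wrapping to 0 after tag_ub.
 * Counters for different peers advance independently. */
static int32_t peer_tags_next(peer_tags_t *t, int peer)
{
    assert(peer >= 0 && peer < t->num_peers);
    int32_t tag = t->next[peer];
    t->next[peer] = (tag == t->tag_ub) ? 0 : tag + 1;
    return tag;
}

static void peer_tags_free(peer_tags_t *t)
{
    free(t->next);
    t->next = NULL;
}
```

With a single shared counter, every request to any peer consumes from the same range; here a peer's counter only advances for requests that target that peer, which is why rollover becomes less likely without being ruled out.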

@mdosanjh mdosanjh changed the title Optimizing use of tags within partitioned operations (perist module) Optimizing use of tags within partitioned operations (persist module) Jun 21, 2021
@bosilca
Member

bosilca commented Jun 21, 2021

The current approach is neither scalable nor thread safe. Why do you even need an additional tag when one has already been provided? If you really need something unique, expand the handshake to deal with it.

@mdosanjh
Contributor Author

> The current approach is neither scalable nor thread safe. Why do you even need an additional tag when one has already been provided? If you really need something unique, expand the handshake to deal with it.

While I think your criticisms are valid, the real answer to them is that we eventually need to implement this directly over the btls/mtls, which is something I'm looking into.

The current implementation uses persistent communication as its back-end, which is admittedly a limitation. Because of this, each persistent send (which currently maps to a send-side partition) requires a unique tag to distinguish it from the other partitions and from other partitioned requests. These persistent sends currently use a module-owned communicator for data transfer, to avoid conflicts with other communication the user is doing.

As for thread safety, I do need to move the tag assignment into the lock-protected region of the MPI_Psend_init function, which is something I will do in the near future.
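As a rough illustration of what moving the assignment under a lock means, the sketch below reserves a run of tags inside a mutex-protected region. It assumes a POSIX mutex and hypothetical names (`tag_lock`, `next_tag`, `reserve_tags`); the module's actual lock and counters are not shown in this conversation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical module state; names are illustrative, not Open MPI's. */
static pthread_mutex_t tag_lock = PTHREAD_MUTEX_INITIALIZER;
static int32_t next_tag = 0;

/* Reserve `count` consecutive tags (for example, one per partition) so that
 * two threads calling this concurrently never receive overlapping ranges.
 * Returns the first tag of the reserved range. */
static int32_t reserve_tags(int32_t count)
{
    assert(count > 0);
    pthread_mutex_lock(&tag_lock);
    int32_t first = next_tag;
    next_tag += count; /* rollover handling is omitted in this sketch */
    pthread_mutex_unlock(&tag_lock);
    return first;
}
```

The read of the counter and its update sit in the same critical section; doing either one outside the lock would reintroduce the race.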

@bosilca
Member

bosilca commented Jun 30, 2021

As you have your own communicator, you do not care about tag collisions, so you can use the entire range with preallocated blocks. This would allow you to have a variable number of sends/receives as you could use probe to identify the pending messages, before pulling the data. This solution might not be very efficient in all cases, which points me toward a more discrete use of dynamic windows.
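One possible reading of "preallocated blocks" is sketched below: the tag range of the private communicator is split into fixed-size blocks, each partitioned request owns one block, and a partition's tag is its offset within that block. The struct, the function names, and the fixed block size are assumptions for illustration; the only MPI fact relied on is that `MPI_TAG_UB` is guaranteed to be at least 32767.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the tag space on a module-owned communicator. */
typedef struct {
    int32_t tag_ub;     /* largest usable tag (query MPI_TAG_UB in practice) */
    int32_t block_size; /* tags reserved for each partitioned request */
} tag_blocks_t;

/* Number of whole blocks that fit in the range [0, tag_ub]. */
static int32_t tag_blocks_count(const tag_blocks_t *b)
{
    assert(b->block_size > 0 && b->tag_ub >= 0);
    return (int32_t)(((int64_t)b->tag_ub + 1) / b->block_size);
}

/* Tag for partition `partition` of request number `request`,
 * or -1 if either index falls outside the preallocated layout. */
static int32_t tag_for(const tag_blocks_t *b, int32_t request, int32_t partition)
{
    if (request < 0 || request >= tag_blocks_count(b)) {
        return -1;
    }
    if (partition < 0 || partition >= b->block_size) {
        return -1;
    }
    return request * b->block_size + partition;
}
```

Because the mapping is pure arithmetic, both sides can compute a partition's tag without a shared counter, and a receiver that probes a message can recover the request and partition from the tag by division and remainder.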

Have you looked into replacing the additional communicator with a dynamic window, and using the user-provided tag just for the initial/final handshake?
