@bwbarrett bwbarrett commented Nov 24, 2025

Release 5.0.9 + the single Fabric/Domain per process patch series (which will be in v5.0.10).

bot:notacherrypick

abouteiller and others added 30 commits July 10, 2025 16:38
`requested` in `MPI_Init_thread` would invoke the error handler, even
though it is a useful override in some threaded library use cases.

Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>
(cherry picked from commit 27332fc)
(single,etc) in addition to numeric 0-3 values

Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>
(cherry picked from commit 3de2489)
…ages

Including, but not limited to:

* Added much more description of and distinction between the MPI world
  model and the MPI session model.  Updated a lot of old,
  pre-MPI-world-model/pre-MPI-session-model text that was now stale /
  outdated, especially in the following pages:
  * MPI_Init(3), MPI_Init_thread(3)
  * MPI_Initialized(3)
  * MPI_Finalize(3)
  * MPI_Finalized(3)
  * MPI_Session_init(3)
  * MPI_Session_finalize(3)
* Numerous formatting updates
* Slightly improve the C code examples
* Describe the mathematical relationship between the various
  MPI_THREAD_* constants in MPI_Init_thread(3)
  * Note that the mathematical relationships render nicely in HTML,
    but don't render entirely properly in nroff.  This commit author
    is of the opinion that the nroff rendering is currently "good
    enough", and some Sphinx maintainer will fix it someday.
* Add descriptions about the $OMPI_MPI_THREAD_LEVEL env variable and
  how it is used in MPI_Init_thread(3)
* Added more seealso links

Signed-off-by: Jeff Squyres <jeff@squyres.com>
(cherry picked from commit aff3afd)
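
For readers following the thread-level commits above, here is a minimal sketch of two ideas they touch on: the ordering the MPI standard defines among the thread levels (MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE), and accepting a level either as a name or as a numeral 0-3. The enum `thread_level_t`, the functions `eq_nocase` and `parse_thread_level`, and the exact spellings accepted are all hypothetical. This is not Open MPI's implementation, and the numeric values only mirror the 0-3 range mentioned in the commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the MPI_THREAD_* constants. The values are
 * an assumption chosen to mirror the 0-3 range and the monotonic
 * ordering required by the MPI standard. */
typedef enum {
    LEVEL_SINGLE = 0,
    LEVEL_FUNNELED = 1,
    LEVEL_SERIALIZED = 2,
    LEVEL_MULTIPLE = 3
} thread_level_t;

/* Case-insensitive string equality (hypothetical helper). */
int eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Parse a thread level given as a single digit "0".."3" or as a name.
 * Returns 0 and stores the level in *out on success, -1 otherwise.
 * The accepted names are an assumption for illustration. */
int parse_thread_level(const char *s, thread_level_t *out)
{
    static const char *names[] = {
        "single", "funneled", "serialized", "multiple"
    };

    if (s == NULL || out == NULL) {
        return -1;
    }
    if (s[0] >= '0' && s[0] <= '3' && s[1] == '\0') {
        *out = (thread_level_t)(s[0] - '0');
        return 0;
    }
    for (int i = 0; i < 4; i++) {
        if (eq_nocase(s, names[i])) {
            *out = (thread_level_t)i;
            return 0;
        }
    }
    return -1;
}
```

Because the levels are ordered, a caller can compare a parsed value against the level it needs with a plain `>=`. Unrecognized input is reported to the caller rather than silently mapped to a default.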
…it doc.

Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>
Signed-off-by: Aurelien Bouteiller <abouteil@amd.com>
Thanks to Ben Menadue for pointing out that ompi_fortran_string_c2f()
missed a case to properly terminate the resulting Fortran string when
copying from a longer C source string.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
(cherry picked from commit 694e78a)
Followup to commit 694e78a: Ben Menadue correctly pointed out that <
should have been <=.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
(cherry picked from commit cc03d5b)
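
To illustrate the class of bug the two Fortran string commits above address, here is a minimal sketch of copying a C string into a fixed-length Fortran character buffer. Fortran strings have no NUL terminator and are blank-padded to their declared length, so the copy has to handle a C source that is shorter than, equal to, or longer than the destination. The function `c2f_copy` is hypothetical and is not the code of `ompi_fortran_string_c2f()`; the exact comparison changed by the follow-up commit is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy a NUL-terminated C string into a Fortran
 * character buffer of exactly flen bytes. The result carries no NUL
 * terminator; any unused trailing bytes are filled with blanks. */
void c2f_copy(const char *cstr, char *fstr, size_t flen)
{
    size_t clen = strlen(cstr);
    /* Truncate when the C source is longer than the Fortran buffer. */
    size_t n = (clen <= flen) ? clen : flen;

    memcpy(fstr, cstr, n);
    /* Blank-pad the remainder; a no-op when n == flen. */
    memset(fstr + n, ' ', flen - n);
}
```

The sketch never writes past `flen` bytes, and the equal-length case falls out of the same path as the longer-source case.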
The table added in 061f908 (A variety of docs updates:, 2022-09-12)
mentioning the different prefixes for Open MPI, PMIx and PRRTE MCA
parameters set via environment variables has one too many "R"'s in
'PRRTE_MCA_': the correct prefix is 'PRTE_MCA_'. Fix that, and make it
clear that it is not a typo.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
(cherry picked from commit bd9adb4)
…-c2f-string-copy

v5.0.x: fortran: fix ompi string c2f where len(fstr) < len(cstr)
…ar-prefix-5.0

v5.0.x: docs/mca.rst: fix MCA environment variable prefix for PRRTE
…level-ignored@v5

v5.0.x: Thread level set from ENV crashes (cherry open-mpi#13211)
PMIx  v5.0.9
PRRTE v3.0.12

Signed-off-by: Ralph Castain <rhc@pmix.org>
Check PMIx/PRRTE release branches prior to release
… or failure

The MCA_PML_OB1_ADD_ACK_TO_PENDING method creates a mca_pml_ob1_pckt_pending_t
to hold an ack to be sent later. This method builds the pending packet then puts
it on the mca_pml_ob1.pckt_pending list for later transmission. It does not,
however, set the required hdr_size field on the struct. This leads to issues
when the packet is later sent because the field could contain any value. With
some BTLs this will lead to memory corruption (if the size is not checked against
btl_max_send_size) or just allocation failure because the size is too big. In
other situations it could lead to a truncated packet being sent (if the size
previously in hdr_size is smaller than an ack).

To fix the issue this commit gets rid of the macro entirely and replaces it with
a new inline helper method that does the same thing. This helper uses the
existing mca_pml_ob1_add_to_pending helper (which sets hdr_size) to reduce
duplicated code.

Tested and verified this fixes a critical issue triggered on our hardware.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 48490b9)
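
The shape of the fix described in the commit above can be sketched as follows: rather than a macro that fills in a pending packet by hand (and can forget a field), an ACK-specific inline wrapper delegates to one generic helper that sets every required field. All names here (`pending_pkt_t`, `add_to_pending`, `add_ack_to_pending`, `HDR_TYPE_ACK`, `ACK_HDR_SIZE`) and the struct layout are hypothetical simplifications, not the ob1 PML's actual types or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified pending-packet record. */
typedef struct pending_pkt {
    size_t hdr_size;             /* bytes of header to send later */
    int hdr_type;                /* which kind of header is stored */
    struct pending_pkt *next;    /* singly linked pending list */
} pending_pkt_t;

enum { HDR_TYPE_ACK = 1 };
enum { ACK_HDR_SIZE = 32 };      /* assumed size, for illustration only */

/* Generic helper: allocate a record, fill in every required field,
 * and append it to the tail of the pending list.
 * Returns the new record, or NULL if allocation fails. */
pending_pkt_t *add_to_pending(pending_pkt_t **list, int hdr_type,
                              size_t hdr_size)
{
    pending_pkt_t *pkt = malloc(sizeof(*pkt));
    if (pkt == NULL) {
        return NULL;
    }
    pkt->hdr_type = hdr_type;
    pkt->hdr_size = hdr_size;    /* the field the old macro left unset */
    pkt->next = NULL;

    while (*list != NULL) {
        list = &(*list)->next;
    }
    *list = pkt;
    return pkt;
}

/* ACK-specific wrapper: reuses the generic helper so hdr_size cannot
 * be forgotten on this path. */
static inline pending_pkt_t *add_ack_to_pending(pending_pkt_t **list)
{
    return add_to_pending(list, HDR_TYPE_ACK, (size_t)ACK_HDR_SIZE);
}
```

Routing every path through one helper means a newly required field only has to be initialized in one place.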
…where_pending_packets_can_have_incorrect_header_sizes

Fix bug in MCA_PML_OB1_ADD_ACK_TO_PENDING that causes memory overruns…
- Update VERSION file to v5.0.9rc1 with correct date (23 September 2025)
- Update NEWS with actual changes from v5.0.8 to v5.0.9rc1 including:
  * PMIx v5.0.9 and PRRTE v3.0.12 updates
  * GPFS 5.2.3-0+ support
  * OFI accelerator memory enhancements
  * Critical PML OB1 bug fix for memory overruns
  * Fortran string conversion fixes
  * Threading improvements
  * Various documentation and build system fixes

Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>
Signed-off-by: Mikhail Brinskii <mikhailb@nvidia.com>
Signed-off-by: Sergey Lebedev <sergeyle@nvidia.com>
(cherry picked from commit 0caae60)
…ocal_id_v5

COLL/UCC: set node local id - v5.0.x
OMPI/MCA/PML/UCX: Set node local id - v5.0.x
In some cases the CUDA install directory contains two libcuda.so and
this breaks OMPI CUDA detection. Picking the first of these libraries seems
to be a good solution for all cases.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
(cherry picked from commit c7e27b9)
Signed-off-by: xbw <78337767+xbw22109@users.noreply.github.com>
(cherry picked from commit 0999325)
Signed-off-by: charlesgwaldman <120225331+charlesgwaldman@users.noreply.github.com>
(cherry picked from commit a6b8cd3)
Fix `see-also` errors in the document (v5.0.x)
Use unique, NVIDIA-specific workflow names so that it's easier to
identify these workflows on the github dashboard backend.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
(cherry picked from commit dcac103)
…ia-github-actions

NVIDIA github workflows: use unique workflow names (v5.0.x)
…ng-fix

Update history.rst (spelling) (v5.0.x)
Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>
janjust and others added 10 commits October 15, 2025 15:06
v5.0.x: prepare v5.0.9rc2 release
Signed-off-by: Kento Hasegawa <hasegawa.kento@fujitsu.com>
(cherry picked from commit 4b1b9a9)
…nitialization

COLL/UCC: Fix initialization in non-blocking (v5.0.x)
Signed-off-by: Tomislav Janjusic <tomislavj@nvidia.com>
Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
(cherry picked from commit f65f900)
Add FI_COMPLETION flag to ensure completion entries are generated
for all data transfer operations.

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
(cherry picked from commit 15fe246)
Share the domain between the MTL and BTL layers to reduce the total
number of domains created. This helps avoid hitting system resource
limits on platforms with high core counts.

Instead of having the common code allocate a single domain with the
superset of all required capabilities, we attempt to reuse an existing
fabric and domain if the providers can support MTL’s and BTL’s different
capability sets. This approach allows providers that support domain
sharing to reuse resources efficiently while still preserving
flexibility. If the providers cannot reuse the fabric and domain due to
incompatible requirements, separate domains will be created as before.

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
(cherry picked from commit 69d2737)
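
The reuse-or-fall-back decision described in the commit above can be modeled roughly as follows. This is only a conceptual sketch: the capability bits, `shared_domain_t`, `domain_open`, and `domain_try_share` are hypothetical and do not correspond to libfabric's API or to the Open MPI common OFI code. It assumes that a domain can be shared when the provider behind it covers every capability the second layer requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; these are not libfabric's FI_* flags. */
enum {
    CAP_TAGGED = 1 << 0,
    CAP_RMA    = 1 << 1,
    CAP_ATOMIC = 1 << 2
};

/* Hypothetical record of a domain that one layer has already opened. */
typedef struct {
    bool open;               /* has a domain been opened yet? */
    uint64_t provider_caps;  /* capabilities the provider supports */
    int refcount;            /* number of layers using this domain */
} shared_domain_t;

/* Record a freshly opened domain with a single user. */
void domain_open(shared_domain_t *d, uint64_t provider_caps)
{
    d->open = true;
    d->provider_caps = provider_caps;
    d->refcount = 1;
}

/* Try to reuse an existing domain. Succeeds only if a domain is open
 * and its provider covers every required capability. On failure the
 * caller is expected to open a separate domain, as before. */
bool domain_try_share(shared_domain_t *d, uint64_t required)
{
    if (!d->open || (d->provider_caps & required) != required) {
        return false;
    }
    d->refcount++;
    return true;
}
```

In this model a failed share attempt leaves the existing domain untouched, which matches the commit's fallback of creating separate domains when requirements are incompatible.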
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
@bwbarrett bwbarrett requested a review from jiaxiyan November 24, 2025 22:20
@github-actions github-actions bot added this to the v5.0.8 milestone Nov 24, 2025
@open-mpi open-mpi deleted a comment from github-actions bot Nov 24, 2025
@bwbarrett bwbarrett merged commit adf9a96 into open-mpi:v5.0.x-aws Nov 25, 2025
17 of 18 checks passed
@bwbarrett bwbarrett deleted the dist/5.0.9amzn1-prep branch November 25, 2025 00:16