Open MPI Weekly Telcon
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Howard Pritchard
- Josh Hursey
- Arm Patinyasakdikul
- Joshua Ladd
- Nathan Hjelm
- Ryan Grant
- Sylvain Jeaugey
- Todd Kordenbrock
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
- No news / Good news.
- Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
- Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- 2.0.0 mostly ready to go.
- RDMACM is broken in multi-threaded (MT) builds. It never worked and was never tested.
- This is not a regression, so the community is okay with shipping it as-is.
- Will ask Chelsio for their opinion.
- In terms of functionality, we are good to go for 2.0.0.
- PR 1248 and PR 1249 (needed)
- Merge both of these to v2.x.
- Let both run in MTT tonight.
- Tomorrow, create Release Candidate 4 after these merges.
- 1251: fix RDMACM hang/locking bug.
- Nathan will pull the Chelsio fix out into a separate PR and run it through tests tonight.
- Mellanox will temporarily remove the RDMACM issue with check-ins.
- Goal: release 2.0.0 next Tuesday.
- After that, wait a few days, then start merging in 2.0.1 changes a few at a time.
- Then we need a conversation about when to merge the repos.
- MTT results have improved: 233 failures, currently on Cisco.
- Cisco: many Cisco failures are local cluster issues. Art is working on cleaning them up.
- Jeff put a patch into MTT to allow thread hangs to be marked as hangs.
- NVIDIA failures are all PMIx failures.
- Gilles found a race condition in PMIx 2.0.
- v2.x failures on Comm_spawn_loop.
- Overall, not too bad.
MTT Dev status:
- Face to face coming up
- Need to discuss ways to take payments.
- WebSite transitions
- Website itself
- Nightly tarballs
- Archives of mailing lists entries.
- We also have mbox archives of all the lists. But once we move things, where do new posts get archived?
- Travis was hung over the weekend; not sure why.
- IBM Jenkins was off over the weekend; it should be fixed now.
- Have Arm; got usNIC BTL THREAD_MULTIPLE support into master.
- Lots of minor bug fixing and 2.x items.
- Been more focused on libfabric work.
- Watching MTT
- When a node has 2 CPUs and an IB card, we might want to use the IB card for transfers between GPUs.
Status Update Rotation
- Mellanox, Sandia, Intel
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA