
WeeklyTelcon_20230613

Geoffrey Paulsen edited this page Jun 13, 2023 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoff Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Brian Barrett (Amazon)
  • Luke Robison (Amazon)
  • Quincey (AWS)
  • Thomas Huber (Cornelis Networks)
  • Todd Kordenbrock
  • Tomislav Janjusic (NVIDIA)
  • Joseph Schuchart

v4.1

  • Issue # with OMPI v4.1.5 - and latest PMIx (v4.2.5???)
    • Just got a patch to fix it.
    • Will drive a v4.1.x release
    • fix is on the branch, so works in latest nightlies
  • Potential Issue #11749 - OMPI won't spawn

v5.0

  • #11532 - mca base params file - No progress yet

  • submodule pointer update got merged.

    • Are we on tags in these submodule pointers?
  • Doc Issues: https://github.com/open-mpi/ompi/projects/3

  • async modex issues - just need to increase the timeout

    • Trying to make the timeout directly related to scale.
    • Very hard to know what this will look like
      • Please update help message
  • Opened many other document Issues to make it easier to

  • NIC selection coming back from main PR #11739

    • Somewhat large change. We really want this back, but it broke EFA.
  • #11683 - Grequest issue, just a straight up bug fix.

    • Not a v5.0.0 blocker
  • Quincey is still working on mpirun docs #11730

  • Quincey talked to PRRTE last week to see how we could better manage documentation across repos

    • Okay if we update the text in PRRTE to make this easier.
    • Like to have text up to date
      • mpirun --help and prterun --help pull from plain text.
      • Can pull in text into .rst with includes
    • He's updated it so manpage output is pulled from same plain text file
  • Idea: what if the "source of truth" were in rst instead?

    • then render rst into text, and then build man pages, and docs.
    • Would require rst anyway, this is already needed/done in places.
      • Can do makefile logic to be optional
  • One thing we're losing with the manpage option is the ability to have internal links to jump around.

    • Maybe we could keep this if we adopt the new idea (calling it the "inverted process").
  • Trying to keep all of the source for the document in the PRRTE repo, so need support from their community

    • If we can't get PRRTE support for this, how about the current approach?
      • Bummer, because we lose nice pretty HTML text in RST.
      • We moved to RST thinking this HTML would be the primary source for users
  • Goal: one set of source that generates man, help, and HTML

    • x2 for mpirun and prte
  • PMIX v4.2 async modex issue: https://github.com/openpmix/openpmix/issues/3077

    • Workaround: -x PMIX_MCA_gds= or enable opal_pmix_collect_all_data
    • Need to up the timeout: fix in OMPI before PMIX_Get, increasing the timeout as a function of scale, with a user override.
    • Likely that the original issue is missing an additional variable for async modex to ompi_pml_base_check_pml
    • A new parameter exists for v5.0.x; it MUST be documented.
  • MCA Params issues are biggest issues now - no new updates.

  • Need to cherry-pick NIC selection (distances PR fixes) to v5.0.x

    • Several PRs will go into main, including coverity fixes.
    • Amir to open up a v5.0.x PR to track all main commits and cherry-pick to v5.0.x when finished.
    • Pending review -
    • Will create initial v5.0.x PR as a pre-PR for the NIC selection: needs review
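
The async modex item above proposes a timeout that grows with job scale and can be overridden by the user. The notes do not say which scaling function was chosen, so the following is only a minimal sketch in Python under stated assumptions: the baseline, the per-doubling increment, the logarithmic growth, and the EXAMPLE_MODEX_TIMEOUT variable name are all hypothetical and are not Open MPI or PMIx parameters.

```python
import math
import os

# All three values below are hypothetical, chosen only for illustration.
BASE_TIMEOUT_S = 30.0                   # assumed baseline for a single process
PER_DOUBLING_S = 5.0                    # assumed extra seconds per doubling of job size
ENV_OVERRIDE = "EXAMPLE_MODEX_TIMEOUT"  # made-up name, not a real MCA parameter


def modex_timeout(num_procs, env=None):
    """Return a timeout in seconds that grows with job size.

    A positive numeric value in the override variable wins; anything
    else (unset, non-numeric, zero, negative) falls back to the
    scale-based default.
    """
    env = os.environ if env is None else env
    override = env.get(ENV_OVERRIDE)
    if override is not None:
        try:
            value = float(override)
            if value > 0:
                return value
        except ValueError:
            pass  # ignore a malformed override and use the computed default

    procs = max(1, num_procs)
    # Logarithmic growth is an assumption; the notes only say "a function of scale".
    return BASE_TIMEOUT_S + PER_DOUBLING_S * math.log2(procs)


if __name__ == "__main__":
    for n in (1, 64, 1024):
        print(n, modex_timeout(n, env={}))
```

Whatever function is finally used, the structure is the same: compute a default from the job size just before the blocking lookup, and let an explicit user setting take precedence.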

Previous Issues:

  • #11726 -N bind ppr:X:node, map by package (socket), or core
    • What we've confirmed is that there is a change to the way that binding works if you just specify -N
    • Seems like we should try to change the schizo component so that we maintain behavior from v4 to v5.
    • With this, we can decide what to do.
  • #11722 - Cannot build+install with out of source builds (VPATH)
    • Possible blocker, need to update submodule pointers.
      • only on main
      • main needs submodule update - Austen