
WeeklyTelcon_20160322

Geoff Paulsen edited this page Mar 22, 2016 · 17 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres
  • Geoff Paulsen
  • Brad Benton
  • Geoffroy Vallee
  • Howard Pritchard
  • Josh Hursey
  • Nathan Hjelm
  • Nysal
  • Ralph Castain
  • Sylvain Jeaugey
  • Todd Kordenbrock
  • Tommy Janjusic (Mellanox)

Agenda

Review 1.10

Review 2.0.x

  • Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
  • Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
    • Issue 1406
      • TCP BTL THREAD_MULTIPLE deadlock
        • If the old TCP BTL and the new rewrite are "compatible", then we can switch between them based on the thread level requested at MPI_Init.
        • Or we could require two different TCP components, "tcp" / "tcpmt", and expose this issue to users.
      • George working on patch.
    • Is Issue 299 an issue on 2.0.0? If so, that should also be a blocker!
      • Issue linked on mailing list. User sent
      • Related to Issue 429?
        • Would merging opal and orte into one lib work around this? - Probably, but that's a lot of work.
      • Still issue with UCX (who does hooking?)
      • If we make it a blocker, can we accelerate the fix? - Yes.
      • Some assembly required. Might need someone to
      • Assign to Mark Allen. - HIGH PRIORITY.
    • Unblock PR 1353 move to 2.1?
      • For 2.0 document what behavior is, and push change back to 2.1.
    • 1038 - waiting on review from Mellanox
    • 1015 - segv, IBM - Howard
    • 1014 - don't return Err_pending from collectives?
    • Jeff will test USOCK today. - Coverity found an issue; Ralph will look at it tomorrow.
    • George has a patch to fix TCP (smaller patch to fix
  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
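
The two options discussed under Issue 1406 can be sketched as a selection rule. This is a minimal illustration in Python, not Open MPI code: the function `select_tcp_variant`, the `compatible` flag, and the `ThreadLevel` enum are hypothetical, and it assumes that "tcpmt" denotes the variant intended for MPI_THREAD_MULTIPLE. Only the component names and the idea of switching on the thread level come from the notes above.

```python
from enum import IntEnum


class ThreadLevel(IntEnum):
    # Mirrors the ordering of the MPI thread-support levels (hypothetical values).
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def select_tcp_variant(provided, compatible, user_choice=None):
    """Return the TCP BTL variant to use under the two options in the notes."""
    if compatible:
        # Option 1: the variants are compatible, so switch automatically
        # on the thread level established at MPI_Init.
        return "tcpmt" if provided == ThreadLevel.MULTIPLE else "tcp"
    # Option 2: two separate components; the choice is exposed to the user.
    if user_choice not in ("tcp", "tcpmt"):
        raise ValueError("user must select 'tcp' or 'tcpmt' explicitly")
    return user_choice
```

Under the first option the user never sees the split; under the second, a missing or unknown choice is rejected rather than guessed.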

Review Master?

  • Master tests are failing.
  • The C++ bindings failing to compile is a symptom. Cyclic dependencies?

MTT status:

  • Working on the submit interface to make it easier for the client to submit. Should help Ralph's team.

  • PR1483 - allows base btl iterators to determine flag values.

    • The only issue is how this is exposed to MTT
    • Expose to Tools Working Group
    • Gives you a comma-delimited list, and gives you all possible values.
  • PR1482 - Support in MCA base to force components to always be on.

    • Applies if you make your component static and the component declares itself always-on.
    • Key use for this is forcing BTL self (if you say not-self, it will give an error and abort).
    • Useful for hook licensing framework.
    • Need to look at other components (maybe coll basic; same with libnbc).
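
The always-on behavior described for PR1482 can be sketched as follows. This is a minimal illustration in Python, not the MCA base implementation: `resolve_components` and its parameters are hypothetical, and the static-build condition is not modeled. It uses the MCA-style selection string, where a comma-separated list includes components and a leading `^` excludes them. Keeping always-on components when an include list omits them is an assumption; the notes only state that excluding one is an error.

```python
def resolve_components(available, always_on, spec):
    """Apply an MCA-style selection string such as "tcp,self" or "^self"."""
    names = [n for n in spec.lstrip("^").split(",") if n]
    if spec.startswith("^"):
        blocked = set(names) & set(always_on)
        if blocked:
            # Per the notes: excluding an always-on component is an error.
            raise RuntimeError(
                f"cannot exclude always-on component(s): {sorted(blocked)}"
            )
        return [c for c in available if c not in names]
    # Assumption: an include list implicitly keeps always-on components.
    return [c for c in available if c in names or c in always_on]
```

For example, with `self` marked always-on, a spec of `^self` raises an error, while a spec of `tcp` still yields both `self` and `tcp`.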

Status Updates:

  • Mellanox -
  • Sandia - Todd Kordenbrock - Made PR 1443, PR 1037 for rendezvous
  • Intel - Working on Debugger Attach. If Ralph simulates debugger attach it works okay.
    • Ralph doesn't have partner license.
    • Not sure what the issue is, does mpirun not know if it's being debugged? Is the message not getting through?
      • How does this relate to USOCK issue?
      • Ralph was hoping to fix this, and debugger attach, and USOCK
      • Once John helps Ralph reproduce the issue, can do USOCK right away.
    • Still working a lot on PMIx event reservation system.
    • Still working on ORTE launch fun.

Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM

