WeeklyTelcon_20190212

Geoffrey Paulsen edited this page Mar 12, 2019 · 3 revisions

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoff Paulsen
  • Jeff Squyres
  • Brian Barrett
  • Geoffroy Vallee
  • Josh Hursey
  • Matias Cabral
  • Ralph Castain
  • Thomas Naughton
  • Todd Kordenbrock
  • Xin Zhao

not there today (I keep this for easy cut-n-paste for future notes)

  • David Bernholdt
  • Matthew Dosanjh
  • George
  • Akshay Venkatesh
  • Edgar Gabriel
  • Howard Pritchard
  • Josh Hursey
  • Aravind Gopalakrishnan (Intel)
  • Joshua Ladd
  • Nathan Hjelm
  • Dan Topa (LANL)
  • Akshay Venkatesh (nVidia)
  • Arm (UTK)
  • Peter Gottesman (Cisco)
  • mohan

Agenda/New Business

  • The HostGator web site (open-mpi.org) is coming up for renewal. We need to decide what we are going to do about it.
    • Hosting expires in the summer, on July 27th. Start working on this in May.
    • Need to move domain names. (Who owns that?)
    • It'd be nice to move to AWS.
    • DNS should be owned by SPI. Still need to transfer that.
    • Topic for April.
  • Nathan Hjelm's day job will no longer involve Open MPI, so if you want him to review something, please check with him first.
  • Next face-to-face is April 23-25 at Cisco in San Jose.

Minutes

Review All Open Blockers

Review v3.0.x Milestones v3.0.3

Review v3.1.x Milestones v3.1.0

Review v4.0.x Milestones v4.0.1

  • Schedule: waiting for Issue6278 fix
  • v4.0.0
  • Consider disabling pmix-new-shmem mca param. (see PMIx Issue 1114)
  • Adding OSHMEM API - bugfix. Need to rev .so versions correctly
  • Serious issue https://github.com/open-mpi/ompi/issues/6198, but won't hold v4.0.1
  • OOB version checking - discussed in a meeting in December, but nothing was implemented.
    • An issue for certain container models (mpirun outside, mpids inside Docker model)
    • Cross-compatibility issue between versions because of the OOB selection logic.
    • Some chatter on this PR about how to deal with it.
    • Could do something for v4.0.1
    • https://github.com/open-mpi/ompi/pull/6157
    • Could just set the "major" version to 4 for OOB protocol.
    • Probably don't need to worry about mpirun-oob trying to talk to orted-oob.
    • ssh launches the docker container, and sets env var to container, so mpid has a way to connect back to mpirun
    • Think we'll merge this patch, then file an issue on master to make the OOB backwards compatible with OMPI v4.x.
      That leaves us with the same risk profile as today.
    • PMIx handles this compatibility issue: it detects the versions inside and outside and adjusts.
    • We cannot guarantee this use case, but we can say we make a "Best Effort". It's not just OOB: also OPAL datatypes, the messages we send, and a whole bunch more.
    • If we start testing this, people will expect us to fix it when it breaks.
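
As an illustration of the "set the major version to 4" idea discussed above, here is a minimal sketch of a major-version-only compatibility check. It is not Open MPI's actual OOB code: the function names and the "major.minor.release" string format are assumptions made for this example only.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.release' string into three integers (assumed format)."""
    major, minor, release = (int(part) for part in version.split("."))
    return major, minor, release


def oob_versions_compatible(local: str, peer: str) -> bool:
    """Hypothetical check: accept a peer whenever the major versions match."""
    return parse_version(local)[0] == parse_version(peer)[0]


if __name__ == "__main__":
    # mpirun outside the container at 4.0.0, daemon inside at 4.0.1
    print(oob_versions_compatible("4.0.0", "4.0.1"))
    # a v3.1.x peer trying to talk to a v4.0.x peer
    print(oob_versions_compatible("3.1.3", "4.0.1"))
```

Comparing only the major number is what lets v4.0.0 and v4.0.1 interoperate in this sketch. As noted above, the OOB check is only one piece of cross-version compatibility, so this stays "Best Effort".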

v5.0.0

  • Schedule: Delaying until after Summer.
  • Discussion of schedule depends on scope discussion
    • Do we want to separate ORTE out for that? That would push it a bit past summer.
    • Gilles has a prototype of PRTE replacing ORTE
  • Want to open up release-manager elections.
    • Now that we're delaying, will decide at face2face.
  • Is anyone pushing for a Summer of 2019 schedule?
    • It seems too aggressive to everyone on the call
    • One driver was to remove things to break ABI.
    • Not a bad idea to DO v5.0, but summer timing is bad.
    • Delaying would allow for switching to PRTE.
    • PMIx Tools support
  • A v4.1 release from master is now a possibility.
    • If we do a v4.1 instead, we'd need some things fixed on master.
  • Will discuss more at the face-to-face.

Master

PMIx

  • New alert on the PMIx side: PMIx Issue 1114, wrong answer in the shared memory component.
    • Should disable new shared memory segment in PMIx until resolved.
    • Considering adding mca param to disable in v3.1 (internal and external), in v3.0 (external, as internal probably not in that PMIx version)

MTT

New topics

  • PMIX direct call / PRTE replacement for ORTE.
  • Howard has been changing the places in OMPI or OPAL that call the PMIx framework to use PMIx data structures directly in the code.
    • Doesn't look like Howard would step on Ralph's toes.
  • March 4th is next MPI Forum (then June)
  • We have a new open-mpi Slack channel for Open MPI developers.
    • Not for users, just developers.
    • Email Jeff if you're interested in being added.

face to face -

  • How do we get more participation and make MTT more meaningful?

Review Master Pull Requests

  • Didn't discuss today.

Review Master MTT testing

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
