Marc-Andre Hermanns edited this page Nov 4, 2019 · 4 revisions

Participants

  • Marc-Andre Hermanns
  • Bengisu Elis
  • Chris Chambreau
  • Joachim Protze
  • Josh Cottingham
  • Martin Schulz

Topics

QMPI

  • In OMPT, the address given to the tool is the return address, i.e. where execution jumps back to in the application

    • It is usually still in a register within the first call outside of the application
      • Cheap to get
    • For a wrapper this is trivial to get
      • for GCC: __builtin_return_address(0) (see GCC Docs)
      • for Visual C/C++: _ReturnAddress (see MS Docs)
    • Moving from a plain PMPI wrapper to PnMPI
      • Access to this address has recently been added to PnMPI
      • On x86 architectures you can
        1. subtract 1 from this address
        2. feed this to addr2line
        3. get source line info from where the function was called
      • cheapest way to obtain such information
        • no stack tracing needed
        • you can store just the address and resolve on demand (or only once)
      • Helps with address-space randomisation
      • Can the address help with stack walking?
        • Not directly, as it is a pointer to the next instruction
          • no stack information
        • Frame address of the first frame in the runtime would be interesting for this
          • Pointer to the stack frame where the application entered MPI
          • Should also be easy for the MPI implementation to obtain
    • Should we expose this information through a semi-opaque type like MPI_Status?
      • Quick access to known parameters
      • Allow implementation to provide internal information as well
      • Just a single additional argument to the QMPI callbacks (future-proof)
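The return-address and call-info points above can be sketched in C as follows, assuming GCC or Clang on x86-64. A stand-in for a PMPI wrapper captures its return address with `__builtin_return_address(0)` and its own frame with `__builtin_frame_address(0)`, packs both into a small semi-opaque struct, and passes it to a tool callback as a single extra argument. The type `qmpi_callinfo_t`, its fields, and the functions `fake_mpi_barrier`, `tool_callback` and `app_loop_body` are hypothetical names chosen for illustration, not part of QMPI, PnMPI, or any MPI implementation. The tool keeps only `return_address - 1` per call site plus a counter, so each site can be resolved once, later.

```c
#include <stdint.h>

/* Hypothetical semi-opaque call-info type, in the spirit of MPI_Status:
   public fields tools may read directly, plus implementation-private space. */
typedef struct {
    void *return_address; /* where execution resumes in the application */
    void *frame_address;  /* frame of the first runtime function (the wrapper) */
    char reserved[32];    /* opaque to tools; the MPI library may use it */
} qmpi_callinfo_t;

/* Tool side: one entry per distinct call site, resolved later on demand. */
#define MAX_SITES 64
typedef struct {
    uintptr_t site;      /* return address minus 1 */
    unsigned long count; /* calls seen from this site */
} call_site_t;

static call_site_t sites[MAX_SITES];
static int num_sites;

static void tool_callback(const qmpi_callinfo_t *info)
{
    /* On x86 the return address points just past the call instruction;
       stepping back one byte lands inside it, so addr2line reports the
       source line of the call itself. */
    uintptr_t site = (uintptr_t)info->return_address - 1;
    for (int i = 0; i < num_sites; i++) {
        if (sites[i].site == site) {
            sites[i].count++;
            return;
        }
    }
    if (num_sites < MAX_SITES) {
        sites[num_sites].site = site;
        sites[num_sites].count = 1;
        num_sites++;
    }
}

/* Stand-in for a PMPI wrapper such as MPI_Barrier; a real wrapper would
   go on to call PMPI_Barrier. noinline keeps it a genuine call, so the
   builtins observe the application's call site. */
__attribute__((noinline)) int fake_mpi_barrier(void)
{
    qmpi_callinfo_t info = {0};
    info.return_address = __builtin_return_address(0);
    info.frame_address = __builtin_frame_address(0);
    tool_callback(&info);
    return 0;
}

/* Application code with a single call site. Storing the result keeps the
   compiler from turning the call into a tail jump. */
static volatile int sink;
__attribute__((noinline)) void app_loop_body(void)
{
    sink = fake_mpi_barrier();
}
```

Each stored site can then be resolved offline with `addr2line -e <executable> <address>`. For position-independent executables, which are subject to address-space randomisation, the address first has to be made relative to the load address of the containing module.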
  • Thread safety when registering and deregistering tools at runtime

    • Dynamic registration/deregistration at runtime may become problematic
    • Global registry would be needed
      • needs to be locked every time to look into the table
      • runtime overhead not worth the additional flexibility
    • In OMPT, a tool should simply return from its callback instead of trying to deregister
    • For QMPI, query the table at the beginning; the data can then live in a local variable
    • atomics would not really help
      • memory fences prevent caching and slow down performance
      • all threads look at same data structure (no copies possible)
        • access across NUMA boundaries (incurs a performance hit)
      • Maybe less of a problem for MPI, as calls are less frequent?
        • This would be good to verify on a broader set of platforms
    • What do we want to optimize for?
      • OMPT -> optimize for no tool in the chain
        • Some additional penalty for adding a tool
        • Use static branch prediction (assume the code path without a tool is the likely one)
        • Would probably be favoured by implementations
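A minimal sketch of how the "query once" and "optimize for no tool" ideas combine, assuming GCC or Clang: the tool chain is copied into a fixed array once during initialization (standing in for a step inside MPI_Init), after which it is only read, so the call path needs no lock. Each call reads the tool count once into a local and marks the tool branch as unlikely with `__builtin_expect`. A tool that wants to stop simply returns early from its callback, as in OMPT. `tool_fn`, `init_tools`, `fake_mpi_send` and `counting_tool` are hypothetical names; the real QMPI registration interface may look different.

```c
#include <stddef.h>

/* Hypothetical tool callback signature. */
typedef void (*tool_fn)(int call_id);

#define MAX_TOOLS 8
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Snapshot of the tool chain: written once during initialization and only
   read afterwards, so the call path needs neither a lock nor atomics. */
static tool_fn tool_chain[MAX_TOOLS];
static size_t num_tools;

/* Must run before other threads start issuing MPI calls (e.g. in MPI_Init). */
void init_tools(const tool_fn *tools, size_t n)
{
    if (n > MAX_TOOLS)
        n = MAX_TOOLS;
    for (size_t i = 0; i < n; i++)
        tool_chain[i] = tools[i];
    num_tools = n;
}

int fake_mpi_send(int call_id)
{
    size_t n = num_tools; /* read once into a local */
    if (UNLIKELY(n != 0)) {
        /* Tool path: assumed rare, kept off the hot path. */
        for (size_t i = 0; i < n; i++)
            tool_chain[i](call_id);
    }
    /* ... the actual communication would happen here ... */
    return 0;
}

/* Example tool: counts invocations. Instead of deregistering, it stops by
   returning early once tool_active is cleared (single-threaded here; use
   an atomic flag if other threads may read it concurrently). */
static int calls_seen;
static int tool_active = 1;
static void counting_tool(int call_id)
{
    (void)call_id;
    if (!tool_active)
        return;
    calls_seen++;
}
```

The price of this design is the one discussed above: tools cannot be added or removed once the snapshot has been taken.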
  • Static linking/loading

    • Always both present or not?
    • What about extensions rather than tools?
    • Can you make the same static library active at the same time?
      • Tool needs to handle this
    • Dynamic tools may be provided as
      • a dynamic library linked at link time (loaded at program start)
      • a dynamic library opened via dlopen
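For the dlopen variant, a loader could look roughly like the following sketch, which uses only the POSIX `dlopen`, `dlsym`, `dlerror` and `dlclose` calls. The environment variable `QMPI_TOOL_LIB`, the exported entry point `qmpi_tool_init`, and the functions `load_tool` and `load_tool_from_env` are hypothetical conventions invented for this example. A tool library linked at link time is instead mapped by the dynamic linker at program start and does not go through this path.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical entry point that every tool library would export. */
typedef int (*tool_init_fn)(void);

/* Returns 0 if no tool is configured or the tool initialized successfully,
   -1 if the library or its entry point could not be loaded or init failed. */
int load_tool(const char *path)
{
    if (path == NULL)
        return 0; /* no tool requested: nothing to do */

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "cannot load tool %s: %s\n", path,
                err ? err : "unknown error");
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    /* POSIX requires that the dlsym result can be converted to a
       function pointer. */
    tool_init_fn init = (tool_init_fn)dlsym(handle, "qmpi_tool_init");
    if (init == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "%s has no qmpi_tool_init: %s\n", path,
                err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* The handle is intentionally kept open: the tool has to stay mapped
       for as long as its callbacks may be invoked. */
    return init() == 0 ? 0 : -1;
}

/* Hypothetical convention: the tool library path comes from the environment. */
int load_tool_from_env(void)
{
    return load_tool(getenv("QMPI_TOOL_LIB"));
}
```

On glibc versions before 2.34, the program needs to be linked with `-ldl`.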