
2014 05 15

Andre Merzky edited this page May 22, 2014 · 3 revisions
  • Who: Mark, Matteo, Shantenu, Ole, AndreM, Antons

  • Agenda:

    • open TODOs
      • TODO OW: get quantitative EnMDTK requirements
      • TODO AM: micro benchmarks for RP
      • TODO AM: re-check 2048 ceiling
      • DONE OW: provide MPI prototype for Stampede
      • TODO MS: base MPI agent on this
      • TODO OW: start on Cray agent, based on AT's scripts
      • DONE AM/OW: set up regular 10min meets between AT and OW
      • TODO AT: expand scripts toward MPI jobs, and further to inter-node MPI jobs
      • TODO SJ: check pipeline example (MTMS)
      • DONE MS: repost data proposal on list
      • TODO ALL: provide feedback
    • MS-7 checkpoints:
      • May 8:
        • OW: simple MPI support for Stampede complete (prototype)
        • AT, OW: draft architecture for Cray agent
      • May 15:
        • MS: implementation proposal for MPI support beyond Stampede
        • AM: MPI integration tests set up
        • OW: first prototype of non-MPI agent for Cray
        • ALL: agree on implementation plan for Cray agent
    • status reports
    • discussion on Mark's data proposal
    • benchmarking plans
    • (?) what role does scheduling play at the agent level?
  • Notes:

    • TODO MS, OW: check module load / shell startup issues

    • ibrun vs. mpiexec

    • TODO OW: bootstrap for agent on Archer

    • mongodb on headnode of Archer

    • DONE OW: email about port forwarding to Iain(?)

    • Antons integrates scripts in agent, expands towards MPI / aprun

    • Antons: might not need agent hierarchy

    • data feedback:

      • OW: clunky, decoupled from CU (cannot refer to data from other CUs)
      • MS: it acts within the sandbox, which was not possible before; it's a building block
      • MS: CU deps can / will be addressed above
      • OW: actual deps are out of scope anyways...
      • MS: next: higher abstraction, implicit data locations for intermediate data
      • MS: lifetime management of staging area is up to higher levels
      • implementation: now agent can also pull data and copy/link/move
      • adds saga dependency to agent: should be optional then
      • OW: staging-area is transient, may want to use proper object?
      • next steps: come up with serious pilot data
    • MS: agent is very stand-alone in terms of code, does not even share constants, nor data-db abstraction layer, should be addressed in the long run

    • benchmarking: benchmarks != tracing

    • MT: want cancel() on any state

    • TODO OW: yes, makes sense, will do

    • MS: RP state model: doesn't easily cover actively staging agent

    • TODO MS: proposal
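Sketch for the cancel() item above: one possible reading of "cancel() on any state" is that the call is accepted in every state and becomes a no-op once a unit has reached a final state. The state names, the `Unit` class and the transition table below are hypothetical and for illustration only; they are not RP's actual state model, and whether staging states belong in it is exactly what the pending proposal is meant to settle.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names, for illustration only (not RP's real states)
    NEW = "new"
    STAGING_INPUT = "staging_input"
    EXECUTING = "executing"
    STAGING_OUTPUT = "staging_output"
    DONE = "done"
    FAILED = "failed"
    CANCELED = "canceled"


# States from which no further transition happens
FINAL = {State.DONE, State.FAILED, State.CANCELED}

# Assumed regular forward transitions
NEXT = {
    State.NEW: State.STAGING_INPUT,
    State.STAGING_INPUT: State.EXECUTING,
    State.EXECUTING: State.STAGING_OUTPUT,
    State.STAGING_OUTPUT: State.DONE,
}


class Unit:
    """Hypothetical unit that walks through the states above."""

    def __init__(self):
        self.state = State.NEW

    def advance(self):
        # Move one step along the regular path; final states cannot advance
        if self.state in FINAL:
            raise RuntimeError(f"cannot advance from final state {self.state.name}")
        self.state = NEXT[self.state]
        return self.state

    def cancel(self):
        # Accepted in any state; a unit that is already final keeps its state
        if self.state not in FINAL:
            self.state = State.CANCELED
        return self.state
```

In this sketch a caller never needs to check the current state before calling `cancel()`, which is the main point of allowing it everywhere.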
