2015 09 10

Andre Merzky edited this page Sep 10, 2015 · 5 revisions
  • Agenda:
    • open TODOs:
      • WIP AM: review communication model with MS
      • WIP AM/MS: prepare action/support plan for activities on BW
        • objectives, challenges, timelines, phase 1
      • WIP IP: anaconda support on client side?
        • client side seems to 'just work', which is good
        • agent side is expected to fail, and does
        • there seem to be differences between system and user space installs of RP (sdist builds)
      • DONE AM: how much does AWS tutorial backup cost
        • two weeks 24/7, 8 cores, 15GB, 1TB, 5GB transfer: $250,-

      • HOLD AM: check if we can switch to HeartbeatMonitor for pilot health checks
      • HOLD AM: suggest alternatives for PTY layer resource consumption
      • HOLD MS: Anaconda/SuperMUC (October)
      • HOLD MS: add NAMD examples eventually? (Tom Bishop)
      • HOLD AM: set up example on how to use synapse as RP workload
      • HOLD AM: check documentation of state diagram in released docs
      • HOLD MT: move semantic elements of tools into RP.utils
      • HOLD AM: proposal for JSON export to persistent storage
      • HOLD MS: proposal for persistent experimental data storage
    • Development Progress:
      • release plan:
        • 0.36: mid September
          • 1 week merging of branches (agent split, profiling)
          • 1 week of testing
          • -> delayed O(days).
          • TODO: start tutorial preps in parallel
        • 0.37: September 23rd
          • documentation, examples, tutorials
          • -> as planned
        • 0.38: end October
          • module refactor
          • final state model
          • -> as planned
      • testing:
        • TODO AT:
          • move to RADICAL-Jenkins (with one fixture)
          • this week
      • Yarn:
        • DONE IP: merge with agent_split
        • TODO IP: toward dynamic multi node (lower priority)
        • TODO AM: daemon startup over LMs?
        • WIP IP: check what (non)queue system is used on chameleon(?) cloud
          • no batch system
          • TODO IP: open ticket for '+ssh://'
        • when is YARN release scheduled? undetermined.
      • Spark:
        • HOLD GC: compare to Yarn integration
      • BW/OSG:
        • reworked the bootstrapping scheme (customized sub-agent system setup)
        • reworked the sub-agent startup mechanism
        • clean handling of network information (interfaces)
        • having fun with compiler licenses :P
        • -> at the moment, we can't get the communication channels to work between sub-agents
        • spreading out to other Crays once this is solved (WIP already)
        • SJ: OTP token for our allocation is still pending
        • titan token interaction is painful (all ssh slaves need tokens)
        • bw token can create proxies which are valid for 10+ days
    • Data Roadmap:
    • Experiments:
      • micro vs. macro benchmarks
      • profile status
    • Publications:
    • AOB:
      • CECAM Tutorial
        • online documentation vs. online tutorial
        • begin to work on interactive examples (which involve user activity)
          • how to submit n tasks of size A and m tasks of size B, toward hosts X and Y
          • TODO AT: simple repex example
            • TODO AT: check with SJ about suitable example / exercise mode
          • TODO VB: simple MD example
          • TODO AM: simple RP example
        • execution env, software stack, applications/libraries
        • TODO AM: assign documentation tickets (Ming, Nikhil, MT, SJ, MS, AM)
      • SC15 Tutorial
  • Notes: