2021.05.13 Meeting Notes

Joshua S Brown edited this page May 14, 2021 · 3 revisions

Agenda

  • Individual/group updates
  • Status of Argonne GPU Hackathon
  • Paper
  • Review non-WIP PRs

Individual/group updates

LANL CS

Joshua Brown

  • Currently running a weak scaling test with Jonas's HDF5 compression implementation
  • Submitted a PR to add mpi-performance-regression tests
  • Submitted a PR to fix scheduled CI runs of the performance regression tests
  • Code review

Jonas Lippuner

  • Been redoing the sparse interface after getting feedback from Jonah
  • Moved communication into the initialization
  • Sparse variables will now be the same as cell variables. Metadata now exists everywhere, so every block and rank will have the same structure.
  • Also been working on GPU hackathon

LANL Physics

Jonah

  • GPU Hackathon
  • Spent time testing performance in Phoebus instead of RIOT, because the sparse PR needs to be merged before it can be used fully in RIOT.
  • Spent time analyzing why Parthenon actually became slower after implementing a fix. Discovered that the old machinery was being called instead of the newer, faster implementation. Fixed the problem and saw a factor of 10 improvement on 16 cubed cells.
  • Hackathon improvements look to have improved the remeshing algorithm by about 30%

Ben Ryan

  • Been working on downstream code with Phoebus; it seems to be working well.

AthenaPK

Philipp Grete

  • Busy with implementing 3 components from GPU Hackathon.
  • Worked on the remeshing functionality of Parthenon. Previously a large number of allocations were made for buffers, which was slow. With the new PR, a single large allocation is made instead.
  • Buffer pack in one was a drop-in replacement for the communication routines and is now also possible with cell-centered variables. There is also a new CMake variable that turns the buffer packing machinery on.
  • Been looking into the buffer packing function and playing around with different patterns, which perform significantly better or worse depending on whether they are run on the device or the host.
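
For readers unfamiliar with the "single large allocation" idea mentioned above, here is a minimal sketch in Python. It is not Parthenon's implementation; the function names `pack_offsets` and `carve` and the example sizes are hypothetical and only illustrate carving one allocation into per-buffer views.

```python
def pack_offsets(sizes):
    """Return (offsets, total) so buffer i occupies [offsets[i], offsets[i] + sizes[i])."""
    offsets = []
    total = 0
    for size in sizes:
        offsets.append(total)
        total += size
    return offsets, total


def carve(sizes):
    """Make ONE allocation and hand out non-overlapping views into it."""
    offsets, total = pack_offsets(sizes)
    pool = bytearray(total)   # the single large allocation
    view = memoryview(pool)   # slicing a memoryview does not copy
    buffers = [view[o:o + s] for o, s in zip(offsets, sizes)]
    return pool, buffers


# Hypothetical buffer sizes in bytes, for illustration only.
pool, buffers = carve([8, 16, 4])
buffers[1][0] = 7  # writes through to the shared pool at offset 8
```

The point of the sketch is that the cost of allocation is paid once, and each buffer is only an offset and a length into the shared pool.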

Forrest Glines

  • Has been helping with each of the PRs that were submitted from the Hackathon
  • Has been working on some performance improvements in the Python regression testing framework to make downstream integration with AthenaPK easier.
  • Also worked to speed up the Python HDF5 diff script, from 5 minutes to a second on some files.

Agenda

Hackathon

  • Around 8 hours were spent searching for a bug that turned out not to be a bug. While changing from multiple buffer allocations to a single large allocation, the team noticed that more memory was being used than expected. Max from Nvidia discovered that the cause was page management on Nvidia devices: because the large buffer allocation was a little over 2 megabytes, it created a new page that was barely utilized. Subsequent allocations were not small enough to backfill the new page, which led to wasted memory. The solution is to use a memory pool. Max suggested waiting, since there seems to be movement on Nvidia's side toward implementing something behind the scenes in Kokkos.

  • Another result from the Hackathon that bears looking into is the difference in performance between using 1 vector with 10 components and 10 vectors each with 1 component. For some reason the 10 vectors with 1 component each performed better; this might have something to do with inner loop access patterns.
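
The memory overhead described in the first Hackathon item can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that each allocation is rounded up to whole pages of 2 MiB; the actual page size and the device allocator's behavior were not recorded in these notes, and the names `PAGE_BYTES`, `reserved_bytes`, and `wasted_bytes` are hypothetical.

```python
PAGE_BYTES = 2 * 1024 * 1024  # ASSUMED page granularity, for illustration only


def reserved_bytes(request_bytes, page_bytes=PAGE_BYTES):
    """Bytes reserved if a request is rounded up to a whole number of pages."""
    pages = -(-request_bytes // page_bytes)  # integer ceiling division
    return pages * page_bytes


def wasted_bytes(request_bytes, page_bytes=PAGE_BYTES):
    """Reserved bytes that the request itself does not use."""
    return reserved_bytes(request_bytes, page_bytes) - request_bytes


# A request slightly larger than one page spills into a second,
# almost empty page under this model.
request = PAGE_BYTES + 4096
print(reserved_bytes(request), wasted_bytes(request))
```

Under this simplified model, a request just over one page reserves two pages and leaves nearly a full page unused unless later allocations are small enough to fit in the remainder, which is the situation a memory pool is meant to avoid.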

Paper

TODO

  • Add formatting capability to Python scripts - Josh