vectorflux edited this page Aug 26, 2015 · 6 revisions

PI: Fabien Delalondre

Team:

  • Fabien Delalondre
  • Timothée Ewart
  • Pramod Kumbhar
  • Aleksandr Ovcharenko
  • Ben Cumming (mentor)
  • Jakob Progsch (mentor)

Institution: École polytechnique fédérale de Lausanne (EPFL)

Application Area: The CoreNeuron application is used to simulate the electrical activity of networks of morphologically detailed neurons.

Strategy for the Hackathon: Fabien and Timothée ported four of the most representative CoreNeuron kernels to the GPU using OpenACC directives. Meanwhile, Pramod and Aleksandr ported the CoreNeuron workflow to OpenACC (modifying data structures and copying data back and forth between CPU and GPU memory). The two subteams synchronized periodically so that the new kernel implementations could be integrated into the scientific application.

Approach:

Split into two teams: one to port a MiniApp to OpenACC, the other to port data / workflow to accelerators. Used OpenACC 2.0 APIs.
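As a rough illustration of how the two tracks fit together (directives on a kernel, plus explicit data movement around it), here is a minimal sketch in C. The function names `decay_voltage` and `run_steps`, the arrays, and the update rule are hypothetical and are not taken from CoreNeuron. When compiled without OpenACC support, the pragmas are ignored and the loops run on the CPU.

```c
/* Hypothetical kernel: relax each voltage toward a resting value.
   Not a CoreNeuron kernel; it only shows where the directives go. */
void decay_voltage(double *v, const double *rest, int n, double rate)
{
    /* 'present' states that the arrays were already placed on the
       device by an enclosing data region. */
    #pragma acc parallel loop present(v[0:n], rest[0:n])
    for (int i = 0; i < n; ++i) {
        v[i] += rate * (rest[i] - v[i]);
    }
}

/* Run 'steps' iterations while the arrays stay resident on the device. */
void run_steps(double *v, const double *rest, int n, double rate, int steps)
{
    /* copy:   host -> device on entry, device -> host on exit.
       copyin: host -> device on entry only. */
    #pragma acc data copy(v[0:n]) copyin(rest[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            decay_voltage(v, rest, n, rate);
        }
    }
}
```

Keeping the data region outside the time-step loop means the arrays cross the CPU/GPU boundary once per run rather than once per kernel launch.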

Preliminary results: The memory structure of the code has been adapted for the GPU. The first GPU kernels show an 8x speedup over single-threaded CPU execution. Further development is needed to a) account for dependencies between kernels so that results remain correct and b) improve the performance of the ported kernels. Initially, the GPU version was 1.4x slower than the code running on one full node with 8 threads. The solver, which consumes 5% of the computation time, becomes the bottleneck and still needs to be ported to the GPU (future work). After using multiple threads to launch kernels, the code with 8 threads is now 1.6x faster on the GPU.
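To see why a solver that takes only 5% of the runtime can become the bottleneck, an Amdahl's-law estimate helps. The sketch below assumes, purely for illustration, that everything except the solver is accelerated by a uniform factor; the function names are hypothetical and the numbers it produces are not measurements from the hackathon.

```c
/* Share of the total runtime spent in the non-accelerated part,
   after the rest of the code has been sped up by 'speedup'. */
double residual_share(double serial_fraction, double speedup)
{
    double accelerated = (1.0 - serial_fraction) / speedup;
    return serial_fraction / (serial_fraction + accelerated);
}

/* Overall speedup predicted by Amdahl's law. */
double overall_speedup(double serial_fraction, double speedup)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / speedup);
}
```

Under that assumption, `residual_share(0.05, 8.0)` is roughly 0.30 and `overall_speedup(0.05, 8.0)` is roughly 5.9: the unported 5% grows to nearly a third of the runtime and caps the achievable gain.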

Issues:

  • complex data structure, but fortunately the leaves are contiguous arrays
  • Cray: lack of compiler information; CrayPat/perftools has limitations when excluding routines.
  • PGI: vectorization and inlining
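The first issue can be sketched as follows: when the leaves of a nested structure are contiguous arrays, each leaf can be aliased through a local pointer and handed to the data clauses directly, so the parent structures never need a deep copy. The types `Mechanism` and `ThreadData` and the function `scale_leaves` are hypothetical stand-ins, not the actual CoreNeuron data structures.

```c
/* Hypothetical nested layout; the real CoreNeuron structures differ. */
typedef struct {
    int n;         /* number of entries in the leaf array */
    double *data;  /* contiguous leaf array */
} Mechanism;

typedef struct {
    int n_mech;        /* number of mechanisms */
    Mechanism *mechs;  /* one entry per mechanism */
} ThreadData;

/* Scale every leaf array by 'factor'. Local aliases of the leaf pointer
   and its length are used in the data clause, so only the flat array
   is transferred, never the parent structs. */
void scale_leaves(ThreadData *td, double factor)
{
    for (int m = 0; m < td->n_mech; ++m) {
        double *leaf = td->mechs[m].data;
        int n = td->mechs[m].n;

        #pragma acc parallel loop copy(leaf[0:n])
        for (int i = 0; i < n; ++i) {
            leaf[i] *= factor;
        }
    }
}
```

This per-call `copy` is the simplest form; in a real port the leaves would more likely be placed on the device once and reused across kernels.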

Future:

Port solver to GPU.

Final Presentation