Investigate alternative communication backends for Heat #2270
brownbaerchen started this conversation in Student projects
Background
When Heat does distributed linear algebra, a lot of data needs to be communicated between processes. At the moment, communication is handled via MPI (Message Passing Interface), but this has a few drawbacks for GPUs. For a detailed description of issues with MPI and possible alternatives for Heat, see this issue.
In this project, you will experiment with different communication backends and explore their advantages and disadvantages. Which is the fastest? Which is the most portable and actually works on all of the machines we support? Which is the best compromise overall?
You will do a deep dive into the landscape of available communication backends and learn a lot about this crucial part of modern computing. Recent large machines are all multi-GPU systems, and communicating data between GPUs is as critical to performance as the algorithm or other implementation details.
You will implement communication with various backends and benchmark them: Why is the communication taking so long? Is the interconnect bandwidth saturated, or is time lost to idle waiting?
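To give a flavour of what such a benchmark can look like, here is a minimal sketch using only the Python standard library. It is not Heat code: `time_call`, `effective_bandwidth`, `saturation` and `fake_exchange` are hypothetical names made up for illustration, and the local memory copy merely stands in for a real send/receive through whichever backend is under test. The peak bandwidth figure is also an assumed placeholder, not a measurement of any machine.

```python
import time
from statistics import median


def time_call(fn, repeats=20, warmup=3):
    """Return a list of wall-clock times (seconds) for repeated calls of fn()."""
    # Warm-up calls are not timed, so one-off setup costs do not skew results.
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


def effective_bandwidth(nbytes, seconds):
    """Bytes per second actually achieved by one transfer."""
    return nbytes / seconds


def saturation(nbytes, seconds, peak_bytes_per_s):
    """Fraction of the nominal peak bandwidth that was achieved."""
    return effective_bandwidth(nbytes, seconds) / peak_bytes_per_s


if __name__ == "__main__":
    payload = bytearray(8 * 1024 * 1024)  # 8 MiB message

    def fake_exchange():
        # Hypothetical stand-in for a real send/receive: a local memory copy.
        bytes(payload)

    t = median(time_call(fake_exchange))
    assumed_peak = 25e9  # assumed placeholder peak in bytes/s, not a real spec
    print(f"median time: {t:.6f} s")
    print(f"bandwidth:   {effective_bandwidth(len(payload), t) / 1e9:.2f} GB/s")
    print(f"saturation:  {saturation(len(payload), t, assumed_peak):.1%}")
```

A saturation value close to 1 would suggest the link itself is the limit, while a much lower value points to time lost elsewhere, for example in waiting or synchronization. When timing GPU transfers, note that device operations are typically asynchronous, so the device has to be synchronized before the timer is read.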
Who is this for?
This project is suitable for both Bachelor's and Master's theses. Identifying the best way forward could be handled within a Bachelor's thesis, while actually implementing a more performant communication backend in Heat is better suited to a Master's thesis.
We expect you to be comfortable with the Python language, but you need not be an expert. A basic grasp of linear algebra and why distributed implementations are useful is advantageous, but not necessary to successfully complete this project. We encourage interested students from all sorts of STEM fields to apply for this project.
If this sounds interesting to you, please get in touch via this GitHub discussion. We are looking forward to hearing from you!