
Would Pangeo applications on InfiniBand-based clusters speed up with an RDMA-optimized communication library? #43

Open
tinaok opened this issue Jan 26, 2021 · 3 comments

Comments

@tinaok
Collaborator

tinaok commented Jan 26, 2021

A basic installation of Pangeo on an InfiniBand cluster uses TCP/IP for communication, so it does not benefit from the network's real high-speed, high-bandwidth links. Using RDMA connections between Dask processes running on an InfiniBand-based cluster should speed up their communication.
There are benchmarks of Dask on InfiniBand clusters with GPUs using UCX-Py or MPI4Dask (https://blog.dask.org/2019/06/09/ucx-dgx, https://www.hpcadvisorycouncil.com/events/2020/australia-conference/pdf/HighPerfDeepMachineLearnonHPCSyst_010920_DKPanda.pdf, slides 46-47, http://hibd.cse.ohio-state.edu/features/#mpi4dask).
Our Pangeo benchmarks are CPU-based, and the results in our repo were obtained on InfiniBand-based HPC clusters. Communication-bound Pangeo workloads (e.g. rechunking) may therefore see a speedup.
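To make the proposal more concrete, here is a minimal sketch of how a Dask cluster might be switched from TCP to UCX (the transport the UCX-Py benchmarks above rely on for RDMA). It is not how this repo is set up today. Assumptions: `ucx-py` (Python module `ucp`) and a UCX build with InfiniBand support are installed, the InfiniBand interface is named `ib0`, and the `distributed.comm.ucx.*` config key names match distributed's config as of early 2021 (newer releases may rename or ignore some of them). The helper names `ucx_config`, `scheduler_address`, and `start_local_cluster` are hypothetical.

```python
"""Minimal sketch: pointing Dask at UCX (and hence RDMA) instead of TCP.

Assumptions, not from the issue: ucx-py and an InfiniBand-capable UCX are
installed, and the IB interface is "ib0". The config keys follow the
`distributed.comm.ucx.*` section of distributed as of early 2021.
"""
import importlib.util

SUPPORTED_PROTOCOLS = ("tcp", "ucx")


def ucx_config(infiniband: bool = True, rdmacm: bool = False) -> dict:
    """Return Dask config overrides asking UCX to use InfiniBand transports."""
    return {
        # Also enable UCX's TCP transport, e.g. where IB is not available.
        "distributed.comm.ucx.tcp": True,
        "distributed.comm.ucx.infiniband": infiniband,
        "distributed.comm.ucx.rdmacm": rdmacm,
    }


def scheduler_address(host: str, port: int = 8786, protocol: str = "ucx") -> str:
    """Build a scheduler address such as 'ucx://10.0.0.1:8786'."""
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return f"{protocol}://{host}:{port}"


def start_local_cluster(protocol: str = "ucx", interface: str = "ib0", n_workers: int = 2):
    """Start a LocalCluster bound to `interface` using `protocol`.

    Requires dask/distributed, plus ucx-py when protocol == "ucx".
    """
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if protocol == "ucx" and importlib.util.find_spec("ucp") is None:
        raise RuntimeError("ucx-py (module 'ucp') is not installed")

    # Imported lazily so the helpers above work without dask installed.
    import dask
    from distributed import Client, LocalCluster

    if protocol == "ucx":
        # Set globally before the cluster starts so workers pick it up.
        dask.config.set(ucx_config())
    cluster = LocalCluster(n_workers=n_workers, protocol=protocol, interface=interface)
    return Client(cluster)
```

A benchmark could then run the same rechunking workload once with `start_local_cluster(protocol="tcp")` and once with `protocol="ucx"` and compare wall-clock times. `LocalCluster` only exercises a single node; on a multi-node HPC deployment the same idea would apply to the scheduler and workers started by the batch system (the `dask-scheduler` and `dask-worker` CLIs expose `--protocol` and `--interface` options). MPI4Dask is a different, MPI-based route and is not covered by this sketch.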

@kmpaul
Collaborator

kmpaul commented Jan 26, 2021

This is great, @tinaok! Thanks for the ping.

By the way, my colleague (@halehawk) is working on some changes in a fork of this repository and is planning to merge them at some point in the future. One of the things @halehawk is working on is a platform service to hold all benchmarking results and plots submitted by other people using the same benchmarking utility. She has also done some work to address other issues in this repo.

Anyway, I am hopeful that after the merge, we can collaborate on this and maybe get some benchmarking measurements with Dask+InfiniBand!

@halehawk
Collaborator

halehawk commented Jan 26, 2021 via email

@kmpaul
Collaborator

kmpaul commented Jan 26, 2021

@halehawk: Yes. It sounds like (@tinaok, correct me if I'm wrong) the new Dask+InfiniBand work will use RDMA optimization, which could be a huge benefit!
