Would pangeo application using Infiniband based cluster speed up using RDMA optimised communication lib? #43
Comments
This is great, @tinaok! Thanks for the ping. By the way, my colleague (@halehawk) is working on some things in a fork of this repository (https://github.com/NCAR/benchmarking) and is planning on doing a merge at some point in the future. One of the things @halehawk is working on is a platform service to hold all benchmarking results and plots submitted by other people using the same benchmarking utility. She's done some things to address other issues in this repo, too. Anyway, I am hopeful that after the merge we can collaborate on this and maybe get some benchmarking measurements with Dask+InfiniBand!
@tinaok @kmpaul, it is a good idea. I am just wondering whether Dask works with an RDMA-optimised communication library or not. If not, how much effort would be needed to make it available?
A basic installation of Pangeo on an InfiniBand cluster uses TCP/IP communication, and thus does not benefit from the cluster's 'real' high-speed, high-bandwidth interconnect. Using RDMA connections between Dask clients running on an InfiniBand-based cluster should speed up their communication.
There are benchmarks on InfiniBand clusters with GPUs using UCX-Py or MPI4Dask (https://blog.dask.org/2019/06/09/ucx-dgx, https://www.hpcadvisorycouncil.com/events/2020/australia-conference/pdf/HighPerfDeepMachineLearnonHPCSyst_010920_DKPanda.pdf, slides 46-47, http://hibd.cse.ohio-state.edu/features/#mpi4dask).
Our Pangeo benchmark is CPU-based, and the results we have in our repo come from InfiniBand-based HPC clusters. Pangeo benchmarks that are communication-bound (like rechunking) may get a speed-up.
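One way to check this on a given system might be to time a communication-heavy rechunk once over TCP and once over UCX. The sketch below is only an illustration under several assumptions: it uses a `LocalCluster` rather than the multi-node, batch-scheduler setup an HPC site would need (on a single node it will not exercise the InfiniBand fabric at all), the array size and worker count are placeholders, the `ucx://` protocol requires the optional UCX-Py package and a UCX build with InfiniBand support, and the helper names `time_rechunk` and `speedup` are hypothetical, not part of the benchmarking repository.

```python
import time


def time_rechunk(protocol="tcp://", n=8000, chunk=1000, n_workers=4):
    """Return the seconds taken to rechunk an n x n array from row blocks
    to column blocks, which forces an all-to-all exchange between workers.

    Dask is imported inside the function so this module still loads on
    machines where Dask is not installed.
    """
    import dask.array as da
    from dask.distributed import Client, LocalCluster, wait

    # Assumption: protocol="ucx://" only works when UCX-Py is installed.
    with LocalCluster(n_workers=n_workers, threads_per_worker=1,
                      protocol=protocol) as cluster, Client(cluster):
        # Materialise the input first so only the rechunk is timed.
        x = da.random.random((n, n), chunks=(chunk, n)).persist()
        wait(x)

        start = time.perf_counter()
        y = x.rechunk((n, chunk)).persist()
        wait(y)
        return time.perf_counter() - start


def speedup(baseline_seconds, candidate_seconds):
    """Ratio above 1 means the candidate transport was faster."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds
```

With Dask installed, something like `speedup(time_rechunk("tcp://"), time_rechunk("ucx://"))` would give a rough ratio. Because the local cluster starts worker processes, the calls should sit under an `if __name__ == "__main__":` guard when run as a script.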