How Useful is Communication Scheduling for Distributed Training?

Introduction

This is the repository for the paper "How Useful is Communication Scheduling for Distributed Training?".
The code is forked from BytePS.
For PS and all-reduce, please use the code in the bytescheduler branch. Please refer to the README in the bytescheduler directory of that branch for detailed usage.
For BytePS, please use the code in the master branch. Please refer to xxx for detailed usage.
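As a minimal sketch (the HTTPS clone URL is assumed from the repository name), switching between the two branches looks like:

```bash
# Clone the repository and switch to the bytescheduler branch for PS / all-reduce experiments
git clone https://github.com/netx-repo/byteps.git
cd byteps
git checkout bytescheduler

# For the BytePS experiments, use the master branch instead:
# git checkout master
```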

Content

Testing scripts to reproduce the results in our paper.

Environment requirement

We used the EC2 image Deep Learning Base AMI (Ubuntu 18.04) Version 32.0 (ami-0404ddec9491a5a31) with CUDA 10.0.
Below are environment setup scripts based on Docker images with BytePS/ByteScheduler and MXNet/PyTorch/TensorFlow (the TensorFlow environment needs some additional operators...).
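As an illustrative sketch only (the image name and run flags are assumptions based on the upstream BytePS project, not on the setup scripts in this repository), pulling a prebuilt image and starting a GPU container might look like:

```bash
# Hypothetical example: pull an upstream BytePS image and start a container.
# Replace the framework tag (pytorch/mxnet/tensorflow) as needed and prefer the
# setup scripts provided in this repository for the exact environment.
docker pull bytepsimage/pytorch
nvidia-docker run -it --net=host --shm-size=32768m bytepsimage/pytorch bash
```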

How to reproduce the results

Make sure that each machine can be reached from every other machine. You can edit the environment variables to change the experiment configuration; we have provided our sample settings in each script file.
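A minimal connectivity check, assuming passwordless SSH and hypothetical hostnames worker1/worker2, could look like:

```bash
# Hypothetical hostnames; replace with the machines in your cluster.
for host in worker1 worker2; do
  ssh -o BatchMode=yes "$host" hostname || echo "cannot reach $host"
done
```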

For Horovod 0.16.1, the default cycle time of 5 ms is too long and results in long pauses between all-reduce calls; 1-2 ms tends to be more suitable (the exact value depends on your machines). Similarly, the default fusion buffer threshold of 64 MB is often too small for models such as ResNet50 with fp32 gradients; raising it to 128 MB can significantly improve throughput.
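Both knobs can be set through Horovod's environment variables before launching training; a short sketch (exact values should be tuned for your machines) is:

```bash
# Shorten the tensor-fusion cycle time from the 5 ms default to 1 ms
export HOROVOD_CYCLE_TIME=1
# Raise the fusion buffer threshold from 64 MB to 128 MB (value is in bytes)
export HOROVOD_FUSION_THRESHOLD=134217728
```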

Contact

For any questions, please contact zhaoyh98 at pku dot edu dot cn.
