Update Gloo api for data layer #1120
Updated the AllGather(), AllReduce(), and Broadcast() operations in the data layer to use the unified Gloo APIs.
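For readers less familiar with the three collectives named above, here is a minimal pure-Python sketch of what each one is expected to leave on every rank. It does not use Gloo or the code in this PR: the function names, the list-of-lists representation of per-rank buffers, and the sum reduction are assumptions made purely for illustration.

```python
def allreduce(per_rank):
    """Each rank ends up with the element-wise sum of all ranks' vectors."""
    total = [sum(column) for column in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def allgather(per_rank):
    """Each rank ends up with all ranks' vectors concatenated in rank order."""
    gathered = [value for vector in per_rank for value in vector]
    return [list(gathered) for _ in per_rank]


def broadcast(per_rank, root=0):
    """Each rank ends up with a copy of the root rank's vector."""
    return [list(per_rank[root]) for _ in per_rank]


# Three simulated ranks, each holding a two-element buffer (illustrative data).
ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

reduced = allreduce(ranks)          # every rank holds the same summed vector
gathered = allgather(ranks)         # every rank holds all six values
bcast = broadcast(ranks, root=1)    # every rank holds rank 1's buffer
```

A real implementation exchanges data between processes instead of reading a shared list, but the resulting buffer contents on each rank should match this simulation.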
Benchmark against MPI (metric: images processed per second with ResNet50)
The results above were obtained entirely on CPU. For a better demonstration, we compute the gradients on GPU and perform the allreduce on CPU. The results are as follows:
9 times, most recently Jun 3, 2019
3 times, most recently Jun 4, 2019
alsrgv left a comment
Left a few comments, and have a question about the benchmark: is it done on CPU servers? The values look too small. Does it mean that with batch size 64 you'd only allreduce every 20 seconds? If that's the case, the variability is likely not due to the allreduce algorithm, but due to variability of execution on CPUs (due to thread starvation or noisy neighbors).
I'd suggest running benchmarks on GPU (but keeping allreduce on CPU).
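The arithmetic behind the "every 20 seconds" question can be sketched as follows. This assumes one allreduce per training step and that the reported metric is per-worker throughput. The 3.2 images/sec figure is not taken from the benchmark tables; it is only the value implied by a batch size of 64 and a 20-second step, and the helper names are hypothetical.

```python
def images_per_second(batch_size, step_seconds):
    """Throughput implied by processing one batch every `step_seconds`."""
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    return batch_size / step_seconds


def seconds_per_step(batch_size, throughput):
    """Time between allreduce calls, assuming one allreduce per batch."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return batch_size / throughput


# Batch size and step time from the review comment; throughput is derived.
implied_throughput = images_per_second(64, 20.0)
interval = seconds_per_step(64, implied_throughput)

print(f"implied throughput: {implied_throughput:.1f} images/sec")
print(f"allreduce interval: {interval:.1f} s")
```

At that rate communication is a tiny fraction of each step, which is why run-to-run differences would mostly reflect CPU compute variability and not the allreduce implementation.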