
Optcast

Optcast is an implementation of a reduction server written in Rust, specifically designed for enhanced performance in distributed machine learning environments utilizing the NVIDIA NCCL library for the AllReduce collective operation. Although still in its prototype stage, Optcast has achieved a 50% speed improvement in NCCL's AllReduce operation under certain conditions.

Ring-AllReduce vs. Reduction Server

AllReduce is a collective communication operation used heavily in distributed machine learning, and optimizing it is a major performance concern.

In distributed machine learning environments with NVIDIA GPUs, NCCL commonly implements AllReduce using the Ring-AllReduce algorithm. Ring-AllReduce arranges the GPU nodes in a ring; each node adds the gradients received from the preceding node to its own and passes the result on to the next node. The algorithm is simple and uses network bandwidth efficiently in many environments, but each GPU ends up sending (and receiving) roughly twice the volume of the gradient data.

A reduction server, by contrast, simply receives the gradients from every GPU node, sums them, and sends the result back to all GPU nodes. This requires provisioning separate servers for the reduction, but it can theoretically double AllReduce throughput, because each GPU transfers only half the data volume it would with Ring-AllReduce.
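
As a rough back-of-the-envelope check of the bandwidth argument above, the sketch below compares how many bytes each GPU has to send per AllReduce under the two schemes. It is purely illustrative and not part of Optcast; the function names and the 1 GB / 8 GPU figures are made up for the example.

```rust
/// Bytes each GPU sends for one AllReduce of an `s`-byte gradient buffer with a
/// bandwidth-optimal ring (reduce-scatter followed by all-gather):
/// 2 * (n - 1) / n * s, which approaches 2 * s as the number of GPUs grows.
fn ring_allreduce_bytes_sent_per_gpu(s: f64, n: f64) -> f64 {
    2.0 * (n - 1.0) / n * s
}

/// With a reduction server, each GPU sends its gradients exactly once (s bytes)
/// and receives the reduced result once, so the send direction carries only s bytes.
fn reduction_server_bytes_sent_per_gpu(s: f64) -> f64 {
    s
}

fn main() {
    let s = 1.0e9; // 1 GB of gradients (illustrative)
    let n = 8.0; // 8 GPUs (illustrative)
    println!(
        "ring: {:.2} GB sent per GPU, reduction server: {:.2} GB sent per GPU",
        ring_allreduce_bytes_sent_per_gpu(s, n) / 1e9,
        reduction_server_bytes_sent_per_gpu(s) / 1e9
    );
}
```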

For more detailed explanations of Ring-AllReduce and reduction servers, refer to this blog article.

Furthermore, Ring-AllReduce is known to be susceptible to precision issues, which, in principle, a reduction server can avoid. Optcast has not yet been evaluated from this angle, but it is an intriguing topic for future exploration.
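
The precision point comes down to the fact that floating-point addition is not associative, so the order in which gradients are accumulated can change the result. The toy snippet below is purely illustrative (it is not taken from Optcast) and simply shows two groupings of the same three FP32 values producing different sums.

```rust
fn main() {
    let values = [1.0e8f32, -1.0e8f32, 1.0f32];

    // Accumulate left to right, the way a partial sum travels around a ring.
    let left_to_right = (values[0] + values[1]) + values[2]; // == 1.0

    // The same operands, grouped differently: the 1.0 is absorbed into -1.0e8
    // because FP32 carries only about 7 decimal digits of precision.
    let regrouped = values[0] + (values[1] + values[2]); // == 0.0

    assert_ne!(left_to_right, regrouped);
    println!("left_to_right = {left_to_right}, regrouped = {regrouped}");
}
```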

Features

  • Implemented in Rust:

    • Optcast is developed in Rust. Since it makes extensive use of multi-threading, Rust, with its promise of Fearless Concurrency, was a natural choice.
  • Support for Multiple Transport Protocols:

    • Communication between the GPU servers and the reduction server goes through the NCCL Net Plugin, NCCL's pluggable network transport layer. As a result, Optcast can operate over any transport NCCL enables this way, including sockets, InfiniBand, RoCE, and EFA.
  • Utilization of Rust Portable SIMD:

    • FP32 addition operations are accelerated using SIMD instructions. Rust's Portable SIMD is used so the implementation is not tied to any specific CPU architecture (see the sketch after this list).
  • FP16/BF16 Support:

    • Basic FP16/BF16 support is provided using half-rs.
    • On aarch64 processors that support the fp16 target feature (for example, Graviton3), the FP16 path is also SIMD-optimized, allowing FP16 AllReduce to run at performance equivalent to FP32.
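
As a concrete sketch of the Portable SIMD approach mentioned in the feature list, the function below accumulates one FP32 buffer into another eight lanes at a time, with a scalar fallback for the tail. It is a simplified illustration rather than the actual Optcast implementation, and it requires a nightly toolchain because portable_simd is still an unstable feature.

```rust
#![feature(portable_simd)] // Portable SIMD is nightly-only at the time of writing.

use std::simd::f32x8;

/// Element-wise accumulation: dst[i] += src[i].
/// Simplified sketch of a SIMD-accelerated FP32 reduction, not the Optcast code.
fn add_assign_f32(dst: &mut [f32], src: &[f32]) {
    assert_eq!(dst.len(), src.len());
    const LANES: usize = 8;
    let chunks = dst.len() / LANES;

    // Vectorized body: 8 f32 lanes per iteration, portable across x86_64 and aarch64.
    for i in 0..chunks {
        let off = i * LANES;
        let a = f32x8::from_slice(&dst[off..off + LANES]);
        let b = f32x8::from_slice(&src[off..off + LANES]);
        (a + b).copy_to_slice(&mut dst[off..off + LANES]);
    }

    // Scalar tail for buffer lengths that are not a multiple of the lane count.
    for i in chunks * LANES..dst.len() {
        dst[i] += src[i];
    }
}

fn main() {
    let mut acc = vec![1.0f32; 19];
    let grad = vec![2.0f32; 19];
    add_assign_f32(&mut acc, &grad);
    assert!(acc.iter().all(|&x| x == 3.0));
}
```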

Similar Technologies

  • NVIDIA SHARP
    • NVIDIA SHARP optimizes AllReduce by performing the reduction inside the InfiniBand switch, eliminating the need for a separate reduction server. SHARP ships with its own NCCL plugin, so using it requires no changes to application code. However, SHARP is currently available only in InfiniBand environments.
  • Google Vertex AI Reduction Server
    • This is the reduction server available in Google Vertex AI. According to the blog article, it roughly doubled the speed of each step of BERT fine-tuning compared to NCCL Ring-AllReduce.
  • Parameter Servers (e.g., BytePS):
    • Parameter servers resemble reduction servers but are not limited to optimizing AllReduce: they centrally manage the parameters of the model being trained. Rather than a collective-communication acceleration component, they are offered as distributed machine learning frameworks and require modifications to the application code.

Documentation

License

Optcast is licensed under the BSD 3-Clause License.