Reading notes of "Communication Efficient Distributed Machine Learning with the Parameter Server" by Li #138

zou000 commented Dec 28, 2018

Communication Efficient Distributed Machine Learning with the Parameter Server

This paper proposed several ideas to optimize communication between workers and parameter servers. It demonstrated the efficacy of the approach by implementing the proximal gradient method in this framework, and it also proved mathematically that the method converges under the bounded-delay consistency model (see below). The most important thing for us, I think, is the categorization of PS consistency models.

Sequential distributed SGD

  • Worker loop, at iteration k

    Load training dataset
    Compute gradient g(k) using model m(k)
    Push g(k) to PS
    Pull m(k+1) from PS
    
  • PS loop, at iteration k

    Sum g(k) from all workers
    Compute m(k+1) from m(k) and the gradient sum
    

Relaxations

The latency of the sequential approach is dominated by the slowest worker. There are two possible relaxations:

Eventual consistency

This is essentially the same as Downpour SGD in DistBelief, where both the PS and the workers run asynchronously.

Bounded Delay

This method limits the staleness of the parameters: for example, a worker at iteration k will block until all parameter computations from τ iterations ago have finished.

The authors also observed that as the delay bound increases, the learning rate should be decreased to ensure convergence.
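
A small sketch of one reading of the blocking condition, assuming simple logical iteration counters rather than the paper's task-dependency machinery (the function name and arguments are illustrative). Note that τ = 0 recovers the sequential case and τ = ∞ recovers eventual consistency:

    # Bounded-delay check: a worker about to start iteration k proceeds only
    # if the PS has already applied all updates up to iteration k - tau.
    # Logical counters stand in for the paper's dependency tracking.
    def may_start(worker_iter: int, ps_completed_iter: int, tau: int) -> bool:
        """True if the worker's staleness is within the delay bound tau."""
        return worker_iter - ps_completed_iter <= tau

    # Example: the PS has applied updates through iteration 7, tau = 2.
    assert may_start(worker_iter=9, ps_completed_iter=7, tau=2)       # may proceed
    assert not may_start(worker_iter=10, ps_completed_iter=7, tau=2)  # must block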

Other Communication-Saving Techniques

The authors also proposed several methods to reduce the bandwidth cost of transmitting parameters; the first two are sketched after the list.

  • Only push parameters with significant changes.
  • Only push a random subset of parameters.
  • (In proximal GD) Only push gradients that affect the parameter computation on the PS.
  • Cache ranges of keys and use a hash of the range when pulling parameters.
  • Use lossy or lossless compression on parameter values.
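
A sketch of the first two filters as a worker might apply them before a push; the threshold, keep fraction, and function names are illustrative assumptions, not values from the paper:

    # Worker-side filters applied to a gradient vector before pushing it.
    import numpy as np

    def significant_change_filter(grad, threshold=1e-3):
        # Keep only entries whose magnitude exceeds the threshold and send
        # them as (index, value) pairs instead of the dense vector.
        idx = np.flatnonzero(np.abs(grad) > threshold)
        return idx, grad[idx]

    def random_subset_filter(grad, keep_fraction=0.1, rng=None):
        # Push only a random subset of coordinates this iteration.
        rng = rng or np.random.default_rng()
        k = max(1, int(keep_fraction * len(grad)))
        idx = rng.choice(len(grad), size=k, replace=False)
        return idx, grad[idx]

    g = np.random.default_rng(1).normal(scale=1e-3, size=1000)
    idx, vals = significant_change_filter(g)
    print(f"significant-change filter keeps {len(idx)} of {len(g)} entries")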