
Introduction

Massive amounts of data are generated around the globe every day. Organizations collect and store this data in geographically distributed data centers. The combination of these large datasets with faster processors and the parallel computing capabilities of data centers has made large-scale, complex machine learning models with unprecedented accuracy a reality [6]. A lot of research effort has gone into the challenging algorithmic and systems problems that arise when training such massive models in a single data center [1]. Training these models across multiple data centers, where each data center holds different data, is a problem that has received little attention until recently [7]. Nonetheless, ML over geo-distributed data is of particular interest because larger and more diverse datasets can benefit ML models applied to areas such as recommender systems, finance, and predictive maintenance.

ML Over Geo-Distributed Data: Challenges

The main challenge of training a machine learning model over geo-distributed data is the communication bottleneck between data centers, which is aggravated by the sheer quantity of data that organizations have to deal with. Realistic datasets range from 1 TB to 1 PB, and the models generated by various algorithms are known to have 10^9 to 10^12 parameters [3]. Such models do not fit into the memory of a single compute node and have to be sharded across multiple nodes.

Parameter servers were developed to facilitate training large models on even larger datasets. Since their introduction [3,4], parameter servers have gained a lot of traction in academia and industry. In this model there is a pool of nodes called parameter servers. Each server stores a range of the model's parameters as dense or sparse vectors and matrices. Worker nodes read the training data and compute gradients with respect to the model parameters. The crux of the idea is that in any given iteration a worker does not need access to all of the parameters: it pulls only the required parameters from the servers, computes a subgradient, and pushes it back to the servers, which aggregate the subgradients from the workers and compute new model parameters, as sketched below.
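
To make the pull/compute/push data flow concrete, here is a minimal single-process sketch. The `ParameterServer` class, the `worker_step` function, and the least-squares subgradient are illustrative assumptions for this write-up, not ps-lite's actual API; a real deployment shards the key range across many server nodes and communicates over the network.

```python
import numpy as np

class ParameterServer:
    """Stand-in for a server node: holds parameters addressable by key."""
    def __init__(self, dim):
        self.params = np.zeros(dim)

    def pull(self, keys):
        # Workers fetch only the parameters they need for the current minibatch.
        return self.params[keys]

    def push(self, keys, subgrad, lr=0.1):
        # The server folds the worker's subgradient into its stored parameters.
        self.params[keys] -= lr * subgrad

def worker_step(server, X_batch, y_batch):
    # Only the features that actually appear in the minibatch are needed.
    keys = np.unique(X_batch.nonzero()[1])
    w = server.pull(keys)
    # A least-squares subgradient restricted to the active keys, for illustration.
    residual = X_batch[:, keys] @ w - y_batch
    subgrad = X_batch[:, keys].T @ residual / len(y_batch)
    server.push(keys, subgrad)

# Toy usage: a sparse-ish minibatch so that only a subset of keys is pulled.
rng = np.random.default_rng(0)
ps = ParameterServer(dim=1000)
X = rng.normal(size=(32, 1000)) * (rng.random((32, 1000)) < 0.01)
y = rng.normal(size=32)
worker_step(ps, X, y)
```

In ps-lite terms, the two calls above play the role of the key-value pull and push operations; everything runs in one process here purely to show the data flow.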

The parameter server model has been extensively applied to running distributed machine learning algorithms on large datasets within a single data center. But as far as we are aware, there is no research on extending this model to large datasets distributed across multiple data centers. We therefore propose a line of research that evaluates how the parameter server model extends to a geo-distributed setting.

Geo-Distributed Parameter Server: Proposal

In this project we will leverage the parameter server idea for geo-distributed machine learning. We will build both a hierarchical and a peer-to-peer architecture in which each data center has a pool of local parameter servers that exchange model information with the parameter servers in other data centers; a rough sketch of the hierarchical variant follows.
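
As a first approximation of the hierarchical variant, each data center's server pool can be thought of as holding a local copy of the model that is aggregated frequently within the data center and only periodically across data centers. The sketch below is a hypothetical single-process illustration of that idea (the `LocalServerPool` class, the averaging strategy, and the sync interval are our own placeholders, not a committed design); the peer-to-peer variant would exchange parameters directly between pools instead of through a global aggregation step.

```python
import numpy as np

class LocalServerPool:
    """Stands in for the pool of parameter servers inside one data center."""
    def __init__(self, dim):
        self.params = np.zeros(dim)

    def apply_local_updates(self, worker_subgrads, lr=0.1):
        # Cheap, frequent aggregation within the data center (LAN traffic).
        self.params -= lr * np.mean(worker_subgrads, axis=0)

def cross_dc_sync(pools):
    # Expensive, infrequent aggregation across data centers (WAN traffic):
    # average the per-data-center models and broadcast the result back.
    global_params = np.mean([pool.params for pool in pools], axis=0)
    for pool in pools:
        pool.params = global_params.copy()
    return global_params

# Example: three data centers, syncing over the WAN every 10 local steps.
pools = [LocalServerPool(dim=100) for _ in range(3)]
rng = np.random.default_rng(0)
for step in range(1, 31):
    for pool in pools:
        pool.apply_local_updates([rng.normal(size=100) for _ in range(4)])
    if step % 10 == 0:
        cross_dc_sync(pools)
```

The interesting trade-off is exactly the one we plan to measure: longer sync intervals save WAN bandwidth but let the per-data-center models drift apart, which can hurt training loss.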

We will build on the MXNet machine learning framework [2] and the ps-lite parameter server [3,4] to implement hierarchical and peer-to-peer geo-distributed parameter server architectures with various model transmission and aggregation strategies. Our evaluation will center on measuring the training loss, training time, and network bandwidth trade-offs of these architectures using algorithms such as distributed subgradient descent and sparse logistic regression, sketched below.
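
For concreteness, the sketch below shows one possible per-minibatch subgradient computation for L1-regularized (sparse) logistic regression, written with NumPy/SciPy rather than MXNet for brevity; the function name and hyperparameters are our own placeholders, and a worker would plug this computation into the pull/compute/push loop described earlier.

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random

def logistic_subgradient(w, X, y, l1=1e-4):
    """Subgradient of the L1-regularized logistic loss on one minibatch.

    X: csr_matrix of shape (n, d), y: labels in {-1, +1}, w: dense weights (d,).
    """
    margins = y * X.dot(w)
    # Gradient of the smooth logistic part of the loss.
    coeff = -y / (1.0 + np.exp(margins))
    grad = X.T.dot(coeff) / X.shape[0]
    # Subgradient of the non-smooth L1 term (0 is a valid choice at w_i = 0).
    grad += l1 * np.sign(w)
    return grad

# Toy usage: 1,000 sparse examples with 10,000 features.
rng = np.random.default_rng(0)
X = sparse_random(1000, 10_000, density=0.001, format="csr", random_state=0)
y = rng.choice([-1.0, 1.0], size=1000)
w = np.zeros(10_000)
w -= 0.1 * logistic_subgradient(w, X, y)  # one (sub)gradient step
```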

References

  1. Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng. Large Scale Distributed Deep Networks. NIPS, 2012.
  2. Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. NIPS Workshop on Machine Learning Systems, 2016.
  3. Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su. Scaling Distributed Machine Learning with the Parameter Server. OSDI, 2014.
  4. Mu Li, David G. Andersen, Alexander Smola, Kai Yu. Communication Efficient Distributed Machine Learning with the Parameter Server. NIPS, 2014.
  5. Ilan Lobel and Asuman Ozdaglar. Distributed Subgradient Methods for Convex Optimization Over Random Networks. IEEE Transactions on Automatic Control, 2011.
  6. Tim Dettmers. Deep Learning in a Nutshell: History and Training. NVIDIA Parallel Forall blog, 2015.
  7. Ignacio Cano, Markus Weimer, Dhruv Mahajan, Carlo Curino, Giovanni Matteo Fumarola. Towards Geo-Distributed Machine Learning. arXiv, 2016.