An example of data parallelism and asynchronous parameter updates in TensorFlow.

Distributed Tensorflow 1.2 Example (DEPRECATED)

Uses data parallelism with shared model parameters, updating the parameters asynchronously. See the comment for the changes needed to make the parameter updates synchronous (though I am not sure the synchronous part is implemented correctly).
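To make the difference between the two update modes concrete, here is a minimal, deterministic toy simulation in plain Python. It is not the code from this repository and does not use TensorFlow; the function names, the quadratic loss, and the three "workers" are all assumptions chosen for illustration. In the asynchronous variant each worker's gradient is applied as soon as it arrives, even though it was computed from a parameter value that may already be stale. In the synchronous variant the gradients of all workers are averaged and applied in a single step.

```python
# Toy sketch (hypothetical, not the repository's script): one shared
# parameter `w` plays the role of the parameter server state. Each worker
# holds one target value and minimises 0.5 * (w - target) ** 2.

def grad(w, target):
    """Gradient of 0.5 * (w - target) ** 2 with respect to w."""
    return w - target


def train_async(targets, lr=0.1, steps=200):
    """Apply each worker's gradient immediately, one after another."""
    w = 0.0
    for _ in range(steps):
        # Every worker reads the shared parameter at the start of the round.
        stale_w = w
        for target in targets:
            # The gradient is based on the value read earlier, so it may be
            # stale by the time it is applied to the shared parameter.
            w -= lr * grad(stale_w, target)
    return w


def train_sync(targets, lr=0.1, steps=200):
    """Wait for all workers, average their gradients, apply one update."""
    w = 0.0
    for _ in range(steps):
        grads = [grad(w, target) for target in targets]
        w -= lr * sum(grads) / len(grads)
    return w


if __name__ == "__main__":
    worker_targets = [1.0, 2.0, 3.0]  # one hypothetical data shard per worker
    print("async:", train_async(worker_targets))
    print("sync: ", train_sync(worker_targets))
```

In this toy setting both variants settle at the mean of the targets. Real asynchronous training is not deterministic, because the order and staleness of the updates depend on how fast each machine runs.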

Trains a simple sigmoid neural network on MNIST for 20 epochs on three worker machines using one parameter server. The goal was not to achieve high accuracy but to get to know TensorFlow.

Run it like this:

First, replace the hardcoded host names with your own, then run the following commands on the respective machines.

pc-01$ python --job_name="ps" --task_index=0 
pc-02$ python --job_name="worker" --task_index=0 
pc-03$ python --job_name="worker" --task_index=1 
pc-04$ python --job_name="worker" --task_index=2 
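The commands above assign each machine a job name and a task index. The following sketch shows one way such flags could be mapped onto a cluster definition. It is an illustration under assumptions, not the repository's script: the port `2222`, the `CLUSTER` dictionary, and the helpers `parse_task` and `address_of` are hypothetical, and the host names are simply taken from the prompts above. In TensorFlow 1.x a dictionary of this shape is typically what gets passed to `tf.train.ClusterSpec`.

```python
import argparse

# Hypothetical cluster layout: replace host names and ports with your own.
CLUSTER = {
    "ps": ["pc-01:2222"],
    "worker": ["pc-02:2222", "pc-03:2222", "pc-04:2222"],
}


def parse_task(argv):
    """Parse --job_name and --task_index and check them against CLUSTER."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--job_name", choices=sorted(CLUSTER), required=True)
    parser.add_argument("--task_index", type=int, default=0)
    args = parser.parse_args(argv)
    # Reject indices that do not correspond to a host in the chosen job.
    if not 0 <= args.task_index < len(CLUSTER[args.job_name]):
        parser.error("task_index out of range for job %r" % args.job_name)
    return args.job_name, args.task_index


def address_of(job_name, task_index):
    """Return the host:port string this task is expected to run on."""
    return CLUSTER[job_name][task_index]


if __name__ == "__main__":
    job, index = parse_task(["--job_name=worker", "--task_index=1"])
    print(job, index, address_of(job, index))
```

Keeping the layout in one dictionary means the same file can be started unchanged on every machine, with only the two flags differing.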

Thanks to snowsquizy for updating the script to TensorFlow 1.2.