- This is a distributed version of the TensorFlow Advanced Convolutional Neural Networks example (the TensorFlow CNN tutorial). The original tutorial builds a convolutional neural network (AlexNet with a few differences in the top few layers).
- The code of the original example can be found here.
- This distributed training example is based on the original code and makes a few modifications (e.g., adding some training hooks) to facilitate distributed training.
- cifar10_distributed.py -- contains the training entry point
- cifar10_input.py -- contains utility functions for handling data input
Please refer to the distributed TensorFlow documentation. You need to specify at least the following arguments:
--ps_hosts, --worker_hosts, --job_name, --task_index
- cifar10_distributed.py and cifar10_input.py should be placed in the same directory when launching each training task.
- Modify the default argument values and global constants in cifar10_distributed.py as needed.
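
The sketch below shows one way the four required arguments can be turned into a cluster description. It is an illustration only and is not taken from cifar10_distributed.py: the flag names come from this README, but the use of argparse, the helper name parse_cluster_args, and the host names are assumptions made for the example.

```python
import argparse


def parse_cluster_args(argv=None):
    """Parse the four required flags and build a cluster dictionary.

    Hypothetical helper for illustration; the actual script may parse
    its flags differently.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", required=True,
                        help="Comma-separated host:port list of parameter servers")
    parser.add_argument("--worker_hosts", required=True,
                        help="Comma-separated host:port list of workers")
    parser.add_argument("--job_name", required=True, choices=["ps", "worker"],
                        help="Role of this task")
    parser.add_argument("--task_index", type=int, required=True,
                        help="Index of this task within its job")
    args = parser.parse_args(argv)

    # Dictionary in the shape that tf.train.ClusterSpec accepts in
    # TensorFlow 1.x: job name -> list of host:port strings.
    cluster = {
        "ps": args.ps_hosts.split(","),
        "worker": args.worker_hosts.split(","),
    }

    # Reject a task index that does not match any host of the chosen job.
    if not 0 <= args.task_index < len(cluster[args.job_name]):
        parser.error("--task_index is out of range for job %r" % args.job_name)

    return cluster, args.job_name, args.task_index


# Example: arguments for the second worker in a cluster with one parameter
# server and two workers (host names are made up).
example_argv = [
    "--ps_hosts=ps0.example.com:2222",
    "--worker_hosts=worker0.example.com:2222,worker1.example.com:2222",
    "--job_name=worker",
    "--task_index=1",
]
cluster, job_name, task_index = parse_cluster_args(example_argv)
```

In TensorFlow 1.x, a dictionary of this shape is typically passed to tf.train.ClusterSpec, and the resulting object is passed with job_name and task_index to tf.train.Server. Every task in the cluster receives the same --ps_hosts and --worker_hosts values and differs only in --job_name and --task_index.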
- TensorFlow 1.5.0+
- Python 3
- Ubuntu 16