This repository has been archived by the owner on Aug 5, 2022. It is now read-only.
Multinode Training with PyCaffe
Feng Zou edited this page Apr 16, 2018 · 4 revisions
Intel Caffe (release 1.1.1) supports multinode training through the PyCaffe interface. To speed up training across multiple nodes on Intel CPUs, you only need to add two lines to your Python code. This page uses LeNet as an example. For a single node, the Python code is:
...
solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')
...
for it in range(niter):
    solver.step(1)  # SGD by Caffe
...
To enable multinode training, add two lines that create and initialize a MultiSync object:
...
solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')
sync = caffe.MultiSync(solver)  # added for multinode
sync.init()                     # added for multinode
...
for it in range(niter):
    solver.step(1)  # SGD by Caffe
...
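One way to keep a single script usable on both single-node and multinode builds is to guard the two extra lines. The following is a minimal sketch, not part of PyCaffe: `make_solver` and its `caffe_module` parameter are hypothetical names, and the idea that a build without multinode support simply lacks `MultiSync` is an assumption. The only calls taken from this page are `SGDSolver`, `MultiSync(solver)`, and `init()`.

```python
def make_solver(caffe_module, solver_path):
    """Create an SGD solver and, when available, a MultiSync object.

    Hypothetical helper. `caffe_module` is the imported `caffe` package,
    passed in so the function can be exercised with a stand-in object.
    """
    solver = caffe_module.SGDSolver(solver_path)

    # Assumption: builds without multinode support do not expose MultiSync.
    multisync_cls = getattr(caffe_module, "MultiSync", None)
    sync = None
    if multisync_cls is not None:
        # The two lines this page adds for multinode training.
        sync = multisync_cls(solver)
        sync.init()
    return solver, sync


# Intended usage with a real installation (not executed here):
#   import caffe
#   solver, sync = make_solver(caffe, 'examples/mnist/lenet_auto_solver.prototxt')
```

When `sync` comes back as `None`, the training loop can fall back to plain `solver.step(1)`.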
For better performance, we recommend replacing the single step call with calls to update_and_forward, clear_param_diffs, and backward, so that gradient synchronization and the weight update overlap with the forward pass:
...
# call step once first, because the test net shares weights with the train net
solver.step(1)
for it in range(niter):
    sync.solver.update_and_forward()
    sync.solver.clear_param_diffs()
    sync.solver.backward()
    solver.increment_iter()
...
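The loop above can be wrapped in a function so that the call order is easy to check without a cluster. This is a sketch under stated assumptions: `train_overlapped`, `RecordingSolver`, and `RecordingSync` are hypothetical names introduced for illustration, and the recording classes are stand-ins that log method names rather than train anything. The method calls inside `train_overlapped` are copied from this page as written.

```python
def train_overlapped(solver, sync, niter):
    """Run `niter` iterations using the split calls shown on this page."""
    # One ordinary step first, because the test net shares weights
    # with the train net.
    solver.step(1)
    for _ in range(niter):
        sync.solver.update_and_forward()
        sync.solver.clear_param_diffs()
        sync.solver.backward()
        solver.increment_iter()


class RecordingSolver:
    """Stand-in (not PyCaffe) that logs method names instead of training."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Reached only for attributes not found normally, i.e. the
        # solver methods; each one just records that it was called.
        def record(*args):
            self.calls.append(name)
        return record


class RecordingSync:
    """Stand-in for a MultiSync object; exposes only `.solver`."""

    def __init__(self, solver):
        self.solver = solver


if __name__ == "__main__":
    fake = RecordingSolver()
    train_overlapped(fake, RecordingSync(fake), niter=2)
    print(fake.calls)
```

With a real installation, the same function would be called with the `solver` and `sync` objects created earlier on this page.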
Sample code for single node
...
solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')
...
# call step once first, because the test net shares weights with the train net
solver.step(1)
for it in range(niter):
    solver.net.clear_param_diffs()
    solver.net.forward()
    solver.net.backward()
    solver.apply_update()
    solver.increment_iter()
...
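To see why each of the four calls matters, it can help to run the same sequence on a toy model. The sketch below is not Caffe code: `ToyNet` and `ToySolver` are hypothetical stand-ins with one weight, a squared-error loss, and plain SGD (no momentum or weight decay). They reuse the method names from this page only to mirror the loop. The gradient accumulates into `diff` on every backward pass, which is why the diffs are cleared first.

```python
class ToyNet:
    """One-weight model y = w * x with squared-error loss (not a Caffe net)."""

    def __init__(self, x, target, w=0.0):
        self.x, self.target, self.w = x, target, w
        self.diff = 0.0  # accumulated gradient, analogous to a param diff
        self.pred = 0.0

    def clear_param_diffs(self):
        self.diff = 0.0

    def forward(self):
        self.pred = self.w * self.x
        return 0.5 * (self.pred - self.target) ** 2

    def backward(self):
        # d(loss)/dw is added to diff rather than overwriting it.
        self.diff += (self.pred - self.target) * self.x


class ToySolver:
    """Plain SGD over a ToyNet (assumed learning rate, no momentum)."""

    def __init__(self, net, lr=0.1):
        self.net, self.lr, self.iter = net, lr, 0

    def apply_update(self):
        self.net.w -= self.lr * self.net.diff

    def increment_iter(self):
        self.iter += 1


def run(niter, clear=True):
    """Train for `niter` iterations; optionally skip clearing the diffs."""
    solver = ToySolver(ToyNet(x=2.0, target=4.0))
    for _ in range(niter):
        if clear:
            solver.net.clear_param_diffs()
        solver.net.forward()
        solver.net.backward()
        solver.apply_update()
        solver.increment_iter()
    return solver


if __name__ == "__main__":
    print(run(20).net.w)               # with diffs cleared each iteration
    print(run(20, clear=False).net.w)  # stale gradients pile up
```

Comparing the two printed weights shows the effect of omitting `clear_param_diffs` in this toy setting.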