Makes AccumulateGrad high priority in backwards passes #7604
Conversation
@teng-li This should also fix the bad perf you've been seeing while scaling gloo.
I've attached a minimal working sample that exposes the symptoms and hopefully illustrates our point.
The failure on linux-trusty-py3.6-gcc7.2 seems unrelated:

```
00:58:43 gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/var/lib/jenkins/workspace -I/var/lib/jenkins/workspace/torch/csrc -I/var/lib/jenkins/workspace/third_party/pybind11/include -I/var/lib/jenkins/workspace/torch/lib/tmp_install/include -I/var/lib/jenkins/workspace/torch/lib/tmp_install/include/TH -I/var/lib/jenkins/workspace/torch/lib/tmp_install/include/THNN -I/var/lib/jenkins/workspace/torch/lib/tmp_install/include/ATen -I/opt/conda/lib/python3.6/site-packages/numpy/core/include -I/var/lib/jenkins/workspace/torch/lib/tmp_install/include/THD -I/opt/conda/include/python3.6m -c torch/csrc/jit/script/compiler.cpp -o build/temp.linux-x86_64-3.6/torch/csrc/jit/script/compiler.o -D_THP_CORE -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-zero-length-array -Wno-unknown-pragmas -Wno-deprecated-declarations -fno-strict-aliasing -Wno-missing-braces -Werror -DWITH_NUMPY -DWITH_DISTRIBUTED
```

@mruberry nice debugging to figure out the backward hooks issue. Your reputation is well-calibrated!
Nice catch, but I think the implementation could be greatly simplified.
torch/csrc/autograd/function.h (outdated)

```diff
@@ -303,6 +316,7 @@ struct Function : std::enable_shared_from_this<Function> {
   // Since `Function`s are neither copyable nor moveable, we can have const
   // fields.
   const uint64_t sequence_nr_;
+  const int backwards_priority_;
```
The errors seem unrelated. On Windows: On linux-trusty-py3.6-gcc7.2:
thanks @mruberry!
* upstream/master:
  * Makes AccumulateGrad high priority in backwards passes (pytorch#7604)
  * [C++ API] Implement builder style construction (pytorch#7597)
  * C10D: Added TCPStore to support C10D store interface (pytorch#7560)
  * [auto] Update onnx to ba86ec2 - Protobuf typing (onnx/onnx#982) onnx/onnx@ba86ec2
  * Add LBFGS optimization algorithm to C++ API (pytorch#7596)
* Makes accumulate_grad functions high priority in backwards passes
* Delegating constructor and comments
* Sequence_nr ain't pretty no more
* Sequence_nr ain't pretty no more
This change is a little tricky, and is best understood with a simple example.
Assume you have a simple model like ab = a + b, cd = c + d, final = ab + cd, you compute some loss, and you call backward(). The backward pass evaluates the final summation, then the cd summation, then the ab summation, then the AccumulateGrad functions for c and d (accumulating the gradient into both c and d), and finally the AccumulateGrad functions for a and b (accumulating the gradient into both a and b). So far so good.
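For concreteness, here is a minimal PyTorch sketch of that model; the leaf names mirror the description above, and the sum() loss is just an assumption for illustration:

```python
import torch

# Leaf variables; backward() accumulates gradients into their .grad fields
# via AccumulateGrad nodes.
a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
c = torch.randn(3, requires_grad=True)
d = torch.randn(3, requires_grad=True)

ab = a + b
cd = c + d
final = ab + cd

loss = final.sum()  # any scalar loss works; sum() is only for illustration
loss.backward()     # first pass: each AccumulateGrad runs close to the add that uses its leaf
```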
Now, whether or not you call zero_grad(), if you run the model again the order of the backward pass changes and the AccumulateGrad functions are called last.
This is because, when functions are created, they are assigned a (monotonically increasing, thread-local) sequence number, which is used to prioritize them in the backward pass. On the first pass through the graph, the AccumulateGrad functions have sequence numbers near (actually just higher than) those of the first functions where the variables are used. This means the AccumulateGrad functions are called as soon as possible on the first call to backward().
On the second pass, however, the functions other than AccumulateGrad are new functions, so they are all given higher sequence numbers than the AccumulateGrad functions (which are created once per leaf variable and reused across passes). This means the AccumulateGrad functions are all processed last, not as soon as possible.
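One way to observe this from Python is to register tensor hooks: a leaf's hook fires as its gradient is accumulated, and an intermediate's hook fires when the gradient with respect to it is computed. Below is a minimal sketch, restarting with fresh leaves so the first backward really is the first pass; the exact print order may vary by PyTorch version:

```python
import torch

events = []
def trace(name):
    return lambda grad: events.append(name)  # hooks may return None to leave the grad unchanged

a, b, c, d = (torch.randn(3, requires_grad=True) for _ in range(4))
for name, t in [("a", a), ("b", b), ("c", c), ("d", d)]:
    t.register_hook(trace(name))  # fires as each leaf's AccumulateGrad is reached

for label in ("first pass ", "second pass"):
    events.clear()
    ab, cd = a + b, c + d              # new backward nodes (new sequence numbers) every pass
    final = ab + cd
    for name, t in [("ab", ab), ("cd", cd), ("final", final)]:
        t.register_hook(trace(name))   # fires when the gradient w.r.t. each sum is computed
    final.sum().backward()
    print(label, events)               # on the second pass the leaves tend to appear at the end
```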
Processing all the AccumulateGrad functions at the end of the backward pass is unfortunate when trying to overlap communication with the backward pass, as @csarofeen and @mcarilli are doing. This PR adds a new "backwards_priority" to functions so that backward() processes AccumulateGrad functions as soon as possible. It should also make the order of AccumulateGrad calls consistent across passes through the same network.
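The overlap pattern that motivates this looks roughly like the sketch below; launch_async_allreduce is a hypothetical placeholder for whatever communication primitive (for example a Gloo allreduce) the training loop actually uses:

```python
import torch

def launch_async_allreduce(grad):
    # Hypothetical placeholder: in a real setup this would kick off an
    # asynchronous allreduce (e.g. via torch.distributed) and return a
    # handle that is waited on before the optimizer step.
    pass

model = torch.nn.Linear(10, 10)

# Start communication for each parameter as soon as its gradient is produced.
# The earlier AccumulateGrad runs during backward(), the more of this
# communication can overlap with the remaining backward computation.
for p in model.parameters():
    p.register_hook(launch_async_allreduce)
```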
Adding backwards_priority to functions was a little tricky because the existing constructor has a default first argument. To avoid changing the current constructor I added a new one, and the existing constructor now delegates to it to avoid code duplication. The implementation of backwards_priority is otherwise analogous to sequence_nr. Priorities during the backward pass are now determined by backwards_priority first and sequence_nr second.
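Conceptually, the engine's ready queue now orders nodes by backwards_priority first and sequence_nr second. A rough Python model of that ordering (not the actual C++ implementation; the numeric values are made up):

```python
import heapq

# Each entry is (backwards_priority, sequence_nr, name). heapq is a min-heap,
# so the keys are negated to pop the highest-priority, newest node first.
ready = []
def push(priority, seq_nr, name):
    heapq.heappush(ready, (-priority, -seq_nr, name))

push(0, 7, "AddBackward (pass 2)")
push(0, 8, "SumBackward (pass 2)")
push(1, 3, "AccumulateGrad(a)")   # high backwards_priority wins despite its old sequence number

while ready:
    _, _, name = heapq.heappop(ready)
    print(name)  # AccumulateGrad(a), then SumBackward, then AddBackward
```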