Fix final callbacks for reentrant backwards #35066

mrshenli · 2020-03-19T21:34:02Z

Stack from ghstack:

[WIP] Support delay allreduce in DDP #35083 [WIP] Support delay allreduce in DDP
Fix final callbacks for reentrant backwards #35066 Fix final callbacks for reentrant backwards

Prior to this commit, final_callbacks_ are cleared on exit of ANY
backward. When using reentrant backward, the last backward would
remove all callbacks from the engine. However, this might lead to
unexpected behavior. For example, the application could install
a final callback after forward, and expecting this callback to fire
when all gradients are ready. If there is a renentrant backward on
a subgraph, it would fire the callback and delete it on exit,
meaning that when fired, not all gradients are ready.

Failed Attempt

The 1st attempt was trying to move the callback to the GraphTask
in engine::execute(). However, this failed because more callbacks
could be installed during backward pass.

Current Solution

Final callbacks are stored as a member variable in the GraphTask.

Insertion: use the thread_local current_graph_task to find the
target GraphTask, and append final callback.
Deletion: final callbacks have the same lifetime as a GraphTask
Execution: Use the GraphTask provided in the argument to find
final callbacks.

Differential Revision: D20546474

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. Final callbacks are stored as part of the context for each reentrant backward. Thisi PR converts final_callbacks_ into a map that maps reentrant depth to the callbacks installed during the lifetime of the corresponding backward. More specifically: * Insertion: when inserting callbacks, queue_callback() uses the thread_local total_depth to find the vector to insert into. * Deletion: the ClearCallbacks guard remembers the depth of the current backward call, and only erases the corresponding vector from the map on exit. * Execution: at the end of backward with depth x, the engine runs all callbacks installed for depth smaller or equal to x. This design decision tries to stay consistent with the previous version where we only have a single global final_callbacks_ vector, which is executed and then cleared by the last reentrant backward. Note that: (1) Using a map instead of a vector to avoid the overhead of creating an empty vector for each depth, while the callbacks might be only installed in a small subset of those reentrant backward calls. (2) The execution order should still be the same as the installation order as the engine will not resume working on depth x before finishing depth x + 1 and all previously installed callbacks in depth x are run before callbacks installed in depth x + 1. [ghstack-poisoned]

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. Final callbacks are stored as part of the context for each reentrant backward. Thisi PR converts final_callbacks_ into a map that maps reentrant depth to the callbacks installed during the lifetime of the corresponding backward. More specifically: * Insertion: when inserting callbacks, queue_callback() uses the thread_local total_depth to find the vector to insert into. * Deletion: the ClearCallbacks guard remembers the depth of the current backward call, and only erases the corresponding vector from the map on exit. * Execution: at the end of backward with depth x, the engine runs all callbacks installed for depth smaller or equal to x. This design decision tries to stay consistent with the previous version where we only have a single global final_callbacks_ vector, which is executed and then cleared by the last reentrant backward. Note that: (1) Using a map instead of a vector to avoid the overhead of creating an empty vector for each depth, while the callbacks might be only installed in a small subset of those reentrant backward calls. (2) The execution order should still be the same as the installation order as the engine will not resume working on depth x before finishing depth x + 1 and all previously installed callbacks in depth x are run before callbacks installed in depth x + 1. ghstack-source-id: 0858bbd Pull Request resolved: #35066

dr-ci · 2020-03-20T00:35:22Z

💊 CircleCI build failures summary and remediations

As of commit 8cf2130 (more details on the Dr. CI page):

✅ None of the build failures appear to be your fault 💚

2/2 broken upstream at merge base 2623448 on Mar 24 from 9:16pm to 11:59pm (7 commits; fce6780 - a7f8655)
Please rebase on the viable/strict branch (expand for instructions)

Since your merge base is older than viable/strict, run these commands:
```
git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD
```
Check out the recency history of this "viable master" tracking branch.

🚧 2 upstream failures:

These were probably caused by upstream breakages:

pytorch_macos_10_13_py3_test on Mar 24 from 8:56pm to 11:24pm (8 commits; 6b5740c - ee7cd84)
pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test on Mar 24 from 9:16pm to 11:59pm (7 commits; fce6780 - a7f8655)

This comment was automatically generated by Dr. CI (expand for details).

Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 32 times.

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. ### Failed Attempt The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. ### Solution in This PR Final callbacks are stored as part of the context for each reentrant backward. Thisi PR converts final_callbacks_ into a map that maps reentrant depth to the callbacks installed during the lifetime of the corresponding backward. More specifically: * Insertion: when inserting callbacks, queue_callback() uses the thread_local total_depth to find the vector to insert into. * Deletion: the ClearCallbacks guard remembers the depth of the current backward call, and only erases the corresponding vector from the map on exit. * Execution: at the end of backward with depth x, the engine runs all callbacks installed for depth smaller or equal to x. This design decision tries to stay consistent with the previous version where we only have a single global final_callbacks_ vector, which is executed and then cleared by the last reentrant backward. Note that: (1) Using a map instead of a vector to avoid the overhead of creating an empty vector for each depth, while the callbacks might be only installed in a small subset of those reentrant backward calls. (2) The execution order should still be the same as the installation order as the engine will not resume working on depth x before finishing depth x + 1 and all previously installed callbacks in depth x are run before callbacks installed in depth x + 1. Differential Revision: [D20546474](https://our.internmc.facebook.com/intern/diff/D20546474) [ghstack-poisoned]

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. Final callbacks are stored as part of the context for each reentrant backward. Thisi PR converts final_callbacks_ into a map that maps reentrant depth to the callbacks installed during the lifetime of the corresponding backward. More specifically: * Insertion: when inserting callbacks, queue_callback() uses the thread_local total_depth to find the vector to insert into. * Deletion: the ClearCallbacks guard remembers the depth of the current backward call, and only erases the corresponding vector from the map on exit. * Execution: at the end of backward with depth x, the engine runs all callbacks installed for depth smaller or equal to x. This design decision tries to stay consistent with the previous version where we only have a single global final_callbacks_ vector, which is executed and then cleared by the last reentrant backward. Note that: (1) Using a map instead of a vector to avoid the overhead of creating an empty vector for each depth, while the callbacks might be only installed in a small subset of those reentrant backward calls. (2) The execution order should still be the same as the installation order as the engine will not resume working on depth x before finishing depth x + 1 and all previously installed callbacks in depth x are run before callbacks installed in depth x + 1. ghstack-source-id: 34c4715 Pull Request resolved: #35066

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. ### Failed Attempt The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. ### Solution in This PR Final callbacks are stored as part of the context for each reentrant backward. Thisi PR converts final_callbacks_ into a map that maps reentrant depth to the callbacks installed during the lifetime of the corresponding backward. More specifically: * Insertion: when inserting callbacks, queue_callback() uses the thread_local total_depth to find the vector to insert into. * Deletion: the ClearCallbacks guard remembers the depth of the current backward call, and only erases the corresponding vector from the map on exit. * Execution: at the end of backward with depth x, the engine runs all callbacks installed for depth smaller or equal to x. This design decision tries to stay consistent with the previous version where we only have a single global final_callbacks_ vector, which is executed and then cleared by the last reentrant backward. Note that: (1) Using a map instead of a vector to avoid the overhead of creating an empty vector for each depth, while the callbacks might be only installed in a small subset of those reentrant backward calls. (2) The execution order should still be the same as the installation order as the engine will not resume working on depth x before finishing depth x + 1 and all previously installed callbacks in depth x are run before callbacks installed in depth x + 1. Differential Revision: [D20546474](https://our.internmc.facebook.com/intern/diff/D20546474) [ghstack-poisoned]

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. ### Failed Attempt The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. ### Solution in This PR Final callbacks are stored as part of the context for each reentrant backward. This PR converts final_callbacks_ into a map that maps reentrant depth to the callbacks installed during the lifetime of the corresponding backward. More specifically: * Insertion: when inserting callbacks, queue_callback() uses the thread_local total_depth to find the vector to insert into. * Deletion: the ClearCallbacks guard remembers the depth of the current backward call, and only erases the corresponding vector from the map on exit. * Execution: at the end of backward with depth x, the engine only executes callbacks installed during the corresponding backward with depth x. Note that: Using a map instead of a vector to avoid the overhead of creating an empty vector for each depth, while the callbacks might be only installed in a small subset of those reentrant backward calls. Differential Revision: [D20546474](https://our.internmc.facebook.com/intern/diff/D20546474) [ghstack-poisoned]

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. **Failed Attempt** The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. **Current Solution** Final callbacks are stored as a member variable in the GraphTask. * Insertion: use the thread_local current_graph_task to find the target GraphTask, and append final callback. * Deletion: final callbacks have the same lifetime as a GraphTask * Execution: Use the GraphTask provided in the argument to find final callbacks. Differential Revision: [D20546474](https://our.internmc.facebook.com/intern/diff/D20546474) [ghstack-poisoned]

ezyang · 2020-03-25T14:35:46Z

torch/csrc/autograd/engine.cpp


+// The guard that sets and restores current_graph_task.
+struct GraphTaskGuard {
+  GraphTaskGuard(const std::shared_ptr<GraphTask>& graph_task) {


You're taking ownership of GraphTask, so you should accept graph_task as shared_ptr by value (not const ref)

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. **Failed Attempt** The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. **Current Solution** Final callbacks are stored as a member variable in the GraphTask. * Insertion: use the thread_local current_graph_task to find the target GraphTask, and append final callback. * Deletion: final callbacks have the same lifetime as a GraphTask * Execution: Use the GraphTask provided in the argument to find final callbacks. Differential Revision: [D20546474](https://our.internmc.facebook.com/intern/diff/D20546474) [ghstack-poisoned]

albanD

Looks good. Thanks for making all the changes :)

torch/csrc/autograd/engine.cpp

Closes #24965 Prior to this commit, final_callbacks_ are cleared on exit of ANY backward. When using reentrant backward, the last backward would remove all callbacks from the engine. However, this might lead to unexpected behavior. For example, the application could install a final callback after forward, and expecting this callback to fire when all gradients are ready. If there is a renentrant backward on a subgraph, it would fire the callback and delete it on exit, meaning that when fired, not all gradients are ready. **Failed Attempt** The 1st attempt was trying to move the callback to the GraphTask in engine::execute(). However, this failed because more callbacks could be installed during backward pass. **Current Solution** Final callbacks are stored as a member variable in the GraphTask. * Insertion: use the thread_local current_graph_task to find the target GraphTask, and append final callback. * Deletion: final callbacks have the same lifetime as a GraphTask * Execution: Use the GraphTask provided in the argument to find final callbacks. Differential Revision: [D20546474](https://our.internmc.facebook.com/intern/diff/D20546474) [ghstack-poisoned]

facebook-github-bot · 2020-03-25T22:18:11Z

@mrshenli merged this pull request in c9117f2.

mrshenli requested review from albanD and apaszke as code owners March 19, 2020 21:34

mrshenli mentioned this pull request Mar 19, 2020

Reduce the number of iterations in test_autograd_context #35037

Closed

mrshenli requested a review from ezyang March 19, 2020 21:37

AshkanAliabadi mentioned this pull request Mar 20, 2020

Update XNNPACK to Git revision 1b354636b5942826547055252f3b359b54acff95. #35081

Closed

mrshenli mentioned this pull request Mar 20, 2020

[WIP] Support delay allreduce in DDP #35083

Closed

mrshenli added 6 commits March 20, 2020 09:17

ezyang reviewed Mar 25, 2020

View reviewed changes

ezyang approved these changes Mar 25, 2020

View reviewed changes

albanD approved these changes Mar 25, 2020

View reviewed changes

torch/csrc/autograd/engine.cpp Outdated Show resolved Hide resolved

facebook-github-bot closed this in c9117f2 Mar 25, 2020

facebook-github-bot added the merged label Mar 25, 2020

facebook-github-bot deleted the gh/mrshenli/137/head branch March 29, 2020 14:18

mruberry added the Merged label Oct 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix final callbacks for reentrant backwards #35066

Fix final callbacks for reentrant backwards #35066

Uh oh!

mrshenli commented Mar 19, 2020 •

edited

Loading

Uh oh!

dr-ci bot commented Mar 20, 2020 •

edited

Loading

Uh oh!

ezyang Mar 25, 2020

Uh oh!

albanD left a comment

Uh oh!

Uh oh!

facebook-github-bot commented Mar 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Fix final callbacks for reentrant backwards #35066

Fix final callbacks for reentrant backwards #35066

Uh oh!

Conversation

mrshenli commented Mar 19, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dr-ci bot commented Mar 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💊 CircleCI build failures summary and remediations

🚧 2 upstream failures:

Uh oh!

ezyang Mar 25, 2020

Choose a reason for hiding this comment

Uh oh!

albanD left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

facebook-github-bot commented Mar 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

mrshenli commented Mar 19, 2020 •

edited

Loading

dr-ci bot commented Mar 20, 2020 •

edited

Loading