zhaojuanmao
Contributor

Summary: Move unused parameters to the end of the bucket order when rebuilding buckets for a static graph.

Test Plan: unit tests

Differential Revision: D28366689

@facebook-github-bot added the "cla signed" and "oncall: distributed" labels on May 11, 2021

facebook-github-bot commented May 11, 2021

💊 CI failures summary and remediations

As of commit 1baa975 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

2 failures not recognized by patterns:

Job | Step
GitHub Actions calculate-docker-image | Chown workspace
GitHub Actions render_test_results | Download PyTorch Test Reports

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D28366689

@zhaojuanmao changed the title from "[not for review, testing only]move unused parameters to end of bucket orders when rebuild buckets for static graph" to "move unused parameters to end of bucket orders when rebuild buckets for static graph" on May 18, 2021
…or static graph (pytorch#58097)

Summary:
Pull Request resolved: pytorch#58097

move unused parameters to end of bucket orders when rebuild buckets for static graph

Test Plan: unit tests

Differential Revision: D28366689

fbshipit-source-id: 76bf8938ef94e248c675a5dbfd88b9c7d640d0e0

@rohan-varma (Contributor) left a comment

LGTM, thanks!

// Finally mark variable for which this function was originally called.
mark_variable_ready(index);
}
} else {
if (should_rebuild_buckets()) {
Contributor

Can we just have it before mark_variable_ready to reduce the code duplication?

Contributor Author

The code needs to go in different places for the static-graph and non-static-graph cases. For example, when grad hooks are triggered more than once in a static graph, we want to push the parameter to 'rebuilt_bucket_indices' only once.
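
To illustrate the "only once" point, here is a minimal standalone sketch, not the actual Reducer code. The class RebuildOrderRecorder and its members are hypothetical names made up for this example; it only shows how a record of parameter order can ignore repeated hook firings for the same parameter.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the reducer's bookkeeping; not the real Reducer.
class RebuildOrderRecorder {
 public:
  // Called each time a gradient hook fires for `param_index`.
  // Returns true if the index was recorded, false if it was already seen.
  bool on_grad_hook(std::size_t param_index) {
    if (!seen_.insert(param_index).second) {
      return false;  // The hook fired again for this parameter; skip it.
    }
    order_.push_back(param_index);
    return true;
  }

  // Parameter indices in the order their hooks first fired.
  const std::vector<std::size_t>& order() const { return order_; }

 private:
  std::unordered_set<std::size_t> seen_;
  std::vector<std::size_t> order_;
};
```

In the non-static-graph case each hook is expected to fire once per backward pass, so a guard like this would be redundant there, which is one reading of why the two cases record the index at different points.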

@@ -767,6 +769,11 @@ void Reducer::mark_variable_ready(VariableIndex index) {
}
// Check that all buckets were completed and had their work kicked off.
TORCH_INTERNAL_ASSERT(next_bucket_ == buckets_.size());
if (static_graph_after_first_iteration() && should_rebuild_buckets()) {
Contributor

nit: Add a comment here explaining why unused parameters need to be pushed to the end, so that a developer doesn't accidentally change it later on?

Also curious, are there any workflows/use cases where this could actually negatively affect performance?

Contributor Author

I don't think this should hurt performance, since it does not block other allreduces.
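
For readers following the thread, the reordering under discussion can be sketched as below. This is an illustrative standalone function under stated assumptions, not the code in this PR: order_with_unused_last, ready_order, and num_params are hypothetical names. It keeps parameters in the order their gradients became ready and appends the ones that never became ready.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a parameter order for rebuilding buckets.
// `ready_order` lists parameter indices in the order their gradients became
// ready (duplicates and out-of-range indices are ignored); parameters that
// never appear are treated as unused and appended at the end.
std::vector<std::size_t> order_with_unused_last(
    const std::vector<std::size_t>& ready_order, std::size_t num_params) {
  std::vector<bool> used(num_params, false);
  std::vector<std::size_t> order;
  order.reserve(num_params);
  for (std::size_t idx : ready_order) {
    if (idx < num_params && !used[idx]) {
      used[idx] = true;
      order.push_back(idx);
    }
  }
  // Unused parameters go last, in ascending index order.
  for (std::size_t idx = 0; idx < num_params; ++idx) {
    if (!used[idx]) {
      order.push_back(idx);
    }
  }
  return order;
}
```

With this ordering, the used parameters keep their gradient-ready order at the front of the list, and the unused ones are grouped after them.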

codecov bot commented May 18, 2021

Codecov Report

Merging #58097 (1baa975) into master (af463d2) will increase coverage by 0.00%.
The diff coverage is 40.00%.

@@           Coverage Diff           @@
##           master   #58097   +/-   ##
=======================================
  Coverage   76.45%   76.46%           
=======================================
  Files        1992     1992           
  Lines      199913   199917    +4     
=======================================
+ Hits       152853   152859    +6     
+ Misses      47060    47058    -2     

@facebook-github-bot
Contributor

This pull request has been merged in ea0f7c4.

Labels: cla signed, fb-exported, Merged, oncall: distributed
3 participants