
Skip dummy node creation for autograd engine when there is a single input and place on correct queue #47592

Closed
soulitzer wants to merge 14 commits

Conversation


@soulitzer (Contributor) commented on Nov 9, 2020

Fixes #42890

  • Removes the dummy node (GraphRoot) when there is a single input
  • Places the graph root on the ready queue matching the input buffer's device, instead of defaulting to the CPU queue (see the sketch below)
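
For intuition, here is a minimal Python sketch of the queue-placement rule. This is a hypothetical illustration only: the actual change lives in C++ in torch/csrc/autograd/engine.cpp, and ready_queue_index below is an invented name that only mirrors the engine's convention of keying ready queues by device.

import torch

def ready_queue_index(input_buffer_device: torch.device) -> int:
    # The engine keeps one ready queue for the CPU thread plus one per
    # device worker thread. Previously the graph root task always landed
    # on the CPU queue; with this change, a single-input root is pushed
    # onto the queue matching its input buffer's device.
    if input_buffer_device.type == "cpu":
        return 0
    return input_buffer_device.index + 1

print(ready_queue_index(torch.device("cpu")))      # 0 (CPU queue)
print(ready_queue_index(torch.device("cuda", 0)))  # 1 (cuda:0 worker's queue)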

CPU: no significant change in wall-clock time (too noisy to measure), but up to a ~7% reduction in instruction count for small graphs.
CUDA: a reduction in wall-clock time (still very noisy) and up to a ~20% reduction in instruction count for small graphs.

CPU
Code:

import torch
from torch.utils.benchmark import Timer

# a*b produces a single output tensor, so the engine sees a single graph root.
setup = """
a = torch.rand((2, 2), requires_grad=True)
b = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2, 2)
"""

stmt = """
torch.autograd.grad(a*b, [a, b], gradient)
"""

timer = Timer(stmt, setup)

print(timer.timeit(10000))           # wall-clock time
print(timer.collect_callgrind(100))  # instruction counts via Valgrind

Before (when dummy node is not skipped):

torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

  26.62 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7efee44ad8e0>
torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

                           All          Noisy symbols removed
    Instructions:      9755488                    9659378
    Baseline:             4300                       3784
100 runs per measurement, 1 thread

After

<torch.utils.benchmark.utils.common.Measurement object at 0x7f56961a7730>
torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

  26.78 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f56961a78e0>
torch.autograd.grad(a*b, [a, b], gradient)
setup:
  a = torch.rand((2, 2), requires_grad=True)
  b = torch.rand((2, 2), requires_grad=True)
  gradient = torch.ones(2, 2)

                           All          Noisy symbols removed
    Instructions:      9045508                    8939872
    Baseline:             4280                       3784
100 runs per measurement, 1 thread
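
As a sanity check, the "up to ~7%" CPU figure can be recomputed from the "Noisy symbols removed" instruction counts above:

before, after = 9659378, 8939872
print(f"{100 * (before - after) / before:.1f}% fewer instructions")  # ~7.4% for this 2x2 case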

CUDA
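Code (the PR description does not include the exact CUDA script; this reconstruction is inferred from the setup echoed in the measurements below, as an analogue of the CPU benchmark):

import torch
from torch.utils.benchmark import Timer

# out is built once in setup; since adding saves no tensors for backward,
# grad() can be called on the same graph repeatedly without retain_graph.
setup = """
x = torch.rand((2,2), requires_grad=True, device="cuda")
y = torch.rand((2,2), requires_grad=True, device="cuda")
out = x + y
gradient = torch.ones(2, 2).cuda()
"""

stmt = """
torch.autograd.grad(out, [x, y], gradient)
"""

timer = Timer(stmt, setup)
print(timer.timeit(10000))
print(timer.collect_callgrind(100))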

Before

<torch.utils.benchmark.utils.common.Measurement object at 0x7f84cbaa1ee0>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

  70.49 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f84cbaa1e50>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

                           All          Noisy symbols removed
    Instructions:      5054581                    4951911
    Baseline:             4105                       3735
100 runs per measurement, 1 thread

Remove dummy node only

<torch.utils.benchmark.utils.common.Measurement object at 0x7fbf29c67eb0>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

  55.65 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fbf29c67e20>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

                           All          Noisy symbols removed
    Instructions:      5002105                    4900841
    Baseline:             4177                       3731
100 runs per measurement, 1 thread

Remove dummy node and put in correct queue

<torch.utils.benchmark.utils.common.Measurement object at 0x7fb64438ce80>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

  27.56 us
  1 measurement, 10000 runs , 1 thread
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fb64438cdf0>
torch.autograd.grad(out, [x, y], gradient)
setup:
  x = torch.rand((2,2), requires_grad=True, device="cuda")
  y = torch.rand((2,2), requires_grad=True, device="cuda")
  out = x + y
  gradient = torch.ones(2, 2).cuda()

                           All          Noisy symbols removed
    Instructions:      4104433                    4007555
    Baseline:             4159                       3735
100 runs per measurement, 1 thread
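
Likewise, the "~20%" CUDA figure follows from the "Noisy symbols removed" counts, comparing the baseline against the dummy-node removal plus correct-queue placement:

before, after = 4951911, 4007555
print(f"{100 * (before - after) / before:.1f}% fewer instructions")  # ~19.1%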

dr-ci bot commented Nov 9, 2020

💊 CI failures summary and remediations

As of commit 11f8e29 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@soulitzer marked this pull request as ready for review on November 10, 2020 18:38
[Four review threads on torch/csrc/autograd/engine.cpp, all resolved (three marked outdated)]
@albanD (Collaborator) left a comment

LGTM!

[Three review threads on torch/csrc/autograd/engine.cpp, all resolved (outdated)]
@facebook-github-bot (Contributor) left a comment

@soulitzer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@albanD self-requested a review on November 12, 2020 14:47

codecov bot commented Nov 12, 2020

Codecov Report

Merging #47592 (11f8e29) into master (b6cb2ca) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #47592   +/-   ##
=======================================
  Coverage   81.21%   81.21%           
=======================================
  Files        1837     1837           
  Lines      198086   198095    +9     
=======================================
+ Hits       160874   160884   +10     
+ Misses      37212    37211    -1     


@soulitzer changed the title from "Skip dummy node creation for autograd engine when there is a single input" to "Skip dummy node creation for autograd engine when there is a single input and place on correct queue" on Nov 12, 2020
@soulitzer merged this pull request in d20483a.
