
Rework requires_grad on DifferentiableGraphOp #57575

Closed
wants to merge 5 commits

Conversation

Collaborator

@jjsjann123 jjsjann123 commented May 4, 2021

Stack from ghstack:

This PR does two things:

  1. reverts "Manual revert of D27369251 (#56080)" in commit
     92a09fb.

  2. fixes DifferentiableGraph outputs carrying the wrong requires_grad flag.

To fix requires_grad on outputs from a DifferentiableGraph, the proper flag is
retrieved from profiling information. Previously we only read the profiling
information from the first profile node among all of an output's uses. However,
when control flow is present, we need to iterate over the uses and search for a
profile node that actually has profiling information available, since the first
use may sit in an inactive code path.

e.g.

```
  graph(%0 : Tensor,
        %1 : Bool):
  ..., %2 : Tensor = prim::DifferentiableGraph_0(%0)
  %3 : Tensor = prim::If(%1)
    block0():
      %4 : Tensor = prim::DifferentiableGraph_1(%2)
      -> (%4)
    block1():
      %5 : Tensor = prim::DifferentiableGraph_2(%2)
      -> (%5)
  -> (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Tensor):
  ...
  %out : Tensor = aten::operation(...)
  ...
  return (..., %out)
with prim::DifferentiableGraph_1 = graph(%0 : Tensor):
  %temp : Tensor = prim::profile[profiled_type=Tensor](%0)
  ...
with prim::DifferentiableGraph_2 = graph(%0 : Tensor):
  %temp : Tensor = prim::profile[profiled_type=Float(...)](%0)
  ...
```
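
In this example, the profile node inside prim::DifferentiableGraph_1 recorded no concrete type (profiled_type=Tensor) because block0 never executed; only prim::DifferentiableGraph_2 carries an informative Float(...) type, so the search has to skip past the uninformative first use. As a toy Python sketch of that search strategy (the real change lives in the C++ JIT executor; the Node/Use classes below are stand-ins, not PyTorch APIs):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    kind: str
    profiled_requires_grad: Optional[bool] = None  # None => never profiled

@dataclass
class Use:
    user: Node

def requires_grad_from_profiles(uses: List[Use]) -> Optional[bool]:
    """Scan all profiled uses instead of trusting the first one: a
    prim::profile node in a branch that never ran has no type info."""
    for use in uses:
        n = use.user
        if n.kind == "prim::profile" and n.profiled_requires_grad is not None:
            return n.profiled_requires_grad
    return None  # no informative profile; fall back to a conservative default

# Mirrors the graph above: the first use (block0) is uninformative.
uses = [Use(Node("prim::profile")), Use(Node("prim::profile", True))]
assert requires_grad_from_profiles(uses) is True
```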

Differential Revision: D29038773
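
For a concrete sense of when this pattern arises, a scripted function of roughly the following shape produces a differentiable subgraph whose output feeds both branches of a prim::If (illustrative sketch only, not a test from this PR; the ops and shapes are made up):

```python
import torch

@torch.jit.script
def fn(x: torch.Tensor, flag: bool) -> torch.Tensor:
    y = x * 2.0 + 1.0      # computed in one differentiable subgraph
    if flag:
        return y.relu()    # this branch's profile node may never record a type
    return y.sigmoid()

x = torch.randn(4, requires_grad=True)
for _ in range(3):         # warm-up runs profile only the executed branch
    out = fn(x, False)
# The output must still report requires_grad=True, taken from whichever
# profile node actually recorded type information.
assert out.requires_grad
```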

Contributor

facebook-github-bot commented May 4, 2021

💊 CI failures summary and remediations

As of commit 8f02ee2 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 10 20:23:41 [E request_callback_no_python.c...quest type 275: Unexpected end of pickler archive.
Jun 10 20:23:41   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/dist_utils.py", line 100, in new_test_method
Jun 10 20:23:41     return_value = old_test_method(self, *arg, **kwargs)
Jun 10 20:23:41   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/ddp_under_dist_autograd_test.py", line 654, in test_ddp_dist_autograd_local_vs_remote
Jun 10 20:23:41     args=(remote_layer1.module_rref, context_id),
Jun 10 20:23:41   File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/api.py", line 77, in wrapper
Jun 10 20:23:41     return func(*args, **kwargs)
Jun 10 20:23:41   File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/api.py", line 757, in rpc_sync
Jun 10 20:23:41     return fut.wait()
Jun 10 20:23:41 RuntimeError: RPCErr:1:RPC ran for more than set timeout (60000 ms) and will now be marked with an error
Jun 10 20:23:41  exiting process with exit code: {MultiProcessTestCase.TEST_ERROR_EXIT_CODE}
Jun 10 20:23:41 [E request_callback_no_python.cpp:552] Received error while processing request type 275: Unexpected end of pickler archive.
Jun 10 20:23:41 Exception raised from readSlowWithBuffer at /var/lib/jenkins/workspace/torch/csrc/jit/serialization/unpickler.cpp:756 (most recent call first):
Jun 10 20:23:41 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f2d1ba76ed9 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 20:23:41 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xc5 (0x7f2d1ba73155 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 20:23:41 frame #2: <unknown function> + 0x3e4b448 (0x7f2d14347448 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 20:23:41 frame #3: torch::jit::Unpickler::run() + 0xdf (0x7f2d1435197f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 20:23:41 frame #4: torch::jit::Unpickler::parse_ivalue() + 0x2e (0x7f2d14351aee in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 20:23:41 frame #5: torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 0x25c (0x7f2d143249cc in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 20:23:41 frame #6: torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 0xdd (0x7f2d14324edd in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 20:23:41 frame #7: torch::distributed::autograd::CleanupAutogradContextReq::fromMessage(torch::distributed::rpc::Message const&) + 0xcb (0x7f2d145b838b in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 20:23:41 frame #8: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 0x1ed (0x7f2d145fec1d in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

ci.pytorch.org: 1 failed



@jjsjann123
Collaborator Author

This is the second part in #56466 that restores the reverted code change, combined with the fixes in #55767

jjsjann123 added a commit that referenced this pull request May 4, 2021
ghstack-source-id: e4ee036f68b63966ea1566dab6377e4413c26679
Pull Request resolved: #57575
@Krovatkin Krovatkin self-requested a review May 4, 2021 23:44
@Krovatkin
Contributor

@jjsjann123 let me do some extra testing against this stack internally. Fingers crossed.

jjsjann123 added a commit that referenced this pull request May 13, 2021
ghstack-source-id: 781a52aee2e152a59a031e8aec687c78416b7acc
Pull Request resolved: #57575
Contributor

@Krovatkin Krovatkin left a comment

:shipit:

@Krovatkin
Contributor

@jjsjann123 could you please rebase?

jjsjann123 added a commit that referenced this pull request Jun 10, 2021
ghstack-source-id: 7d493ce5cf179a04b7a05723e2a18117e29313f8
Pull Request resolved: #57575
@Krovatkin
Contributor

@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@Krovatkin merged this pull request in 9ad0de3.

@facebook-github-bot facebook-github-bot deleted the gh/jjsjann123/6/head branch June 18, 2021 14:15
Labels
cla signed, Merged, oncall: jit, open source