Don't use RTLD_GLOBAL to load _C. #31162

Closed · wants to merge 26 commits

Conversation

@ezyang (Contributor) commented Dec 12, 2019

Stack from ghstack:

This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.

Based on apaszke's original work in #28536.

Fixes #3059.

Some of the subtleties in preparing this patch:

  • Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with RTLD_LOCAL, we may now load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I tried a few things to get this to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is to have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this.
  • Making some of our shared library dependencies load with RTLD_LOCAL breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols, which doesn't work when loaded locally, and if we load a library with RTLD_LOCAL we aren't able to subsequently see it with ctypes. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library torch_global_deps with dependencies on all of the libraries which need to be loaded globally, and then load that with RTLD_GLOBAL. As long as none of these libraries have C++ symbols, we can avoid confusion about the C++ standard library. (A minimal sketch of this loading scheme follows below.)
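Below is a minimal sketch of the loading scheme described in the second bullet, assuming a stub library named libtorch_global_deps.so installed under torch/lib; the actual loader in torch/__init__.py may differ in naming and error handling.

```
import ctypes
import os
import platform


def _load_global_deps_sketch():
    # Illustrative only; the library name and path below are assumptions.
    if platform.system() == 'Windows':
        return  # Windows resolves symbols per-DLL, so there is nothing to do.
    lib_name = 'libtorch_global_deps.so'
    lib_path = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                            'lib', lib_name)
    # Load the stub with RTLD_GLOBAL so that its dependencies (MKL, OpenMPI,
    # ...) become visible to every library loaded afterwards, while _C and
    # libtorch themselves are imported later without RTLD_GLOBAL, keeping
    # libtorch's C++ symbols out of the global namespace.
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
```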

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D19262579

@kostmo (Member) commented Dec 12, 2019

💊 CircleCI build failures summary and remediations

As of commit 2771d72:

None of the build failures appear to be your fault.

  • 1/1 broken upstream at merge base 3c7db5c since Jan 07

    You may want to rebase on the viable/strict branch:

    If your commit is newer than viable/strict, you can try basing on an older, stable commit:

    git fetch origin viable/strict
    git rebase --onto viable/strict $(git merge-base origin/master HEAD)
    

    If your commit is older than viable/strict:

    git fetch origin viable/strict
    git rebase viable/strict
    

    Check out the recency history of this "viable master" tracking branch.

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

1 failure not recognized by patterns:

Job: CircleCI binary_macos_libtorch_2_7_cpu_build | Step: Build | Status: 🛑 Broken upstream

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 77 times.

ezyang added a commit that referenced this pull request Dec 12, 2019
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 8f6e17418ca8503cb17b40ee5f5212221bacf2ff
Pull Request resolved: #31162
ezyang added a commit that referenced this pull request Dec 13, 2019
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 764d6138f7e3d113677e65cc06b6c103e3ec1c55
Pull Request resolved: #31162
ezyang added a commit that referenced this pull request Dec 13, 2019
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 948c142f04846261b8840060d9522112a45e8156
Pull Request resolved: #31162
# especially applies to type info, which is almost always weak. This
# has implications for RTTI (which UBSAN is rightly flagging won't
# work), but in our codebase, we don't use RTTI (because it doesn't
# work in mobile). However, UBSAN relies on UBSAN to detect vptr

Collaborator commented:

nit: UBSAN relies on UBSAN ?

ezyang added a commit that referenced this pull request Jan 7, 2020
Pull Request resolved: #31162
ghstack-source-id: 96370605
torch/__init__.py (outdated)

# See Note [Global dependencies]
def _load_global_deps():
    if platform.system() == 'Windows':

Collaborator commented:

What about defining a global variable IS_WINDOWS since it's used multiple times in this file?

Contributor Author (ezyang) replied:

I'll do this in a follow-up.
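For reference, the suggested refactor might look like the sketch below; IS_WINDOWS is the reviewer's proposed name and the function body is elided, so treat this as illustrative rather than the actual follow-up change.

```
import platform

# Module-level constant, as suggested in the comment above.
IS_WINDOWS = platform.system() == 'Windows'


def _load_global_deps():
    # See Note [Global dependencies]
    if IS_WINDOWS:
        return
    # ... load libtorch_global_deps with RTLD_GLOBAL, as sketched earlier ...
```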

ezyang added a commit that referenced this pull request Jan 8, 2020
Pull Request resolved: #31162
ghstack-source-id: 0f4abce4f757daadff64a567465c0cd9bdce83a3
ezyang added a commit that referenced this pull request Jan 8, 2020
Pull Request resolved: #31162
ghstack-source-id: c3991186af0f3ceca879e6d57b8c13431694b792
ezyang added a commit that referenced this pull request Jan 8, 2020
Pull Request resolved: #31162
ghstack-source-id: d112b7f68f612d8ada03ab99a7669333934987d1

@facebook-github-bot (Contributor) commented:
@ezyang merged this pull request in ddff4ef.

facebook-github-bot deleted the gh/ezyang/580/head branch January 13, 2020 15:39
facebook-github-bot pushed a commit that referenced this pull request Jan 22, 2020
Summary:
Fixes #31181 and #31162 (comment).
Pull Request resolved: #32215

Differential Revision: D19501869

Pulled By: ezyang

fbshipit-source-id: 363824e52d2592ad968ecf1df345aa4c0daff915
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
Summary:
Pull Request resolved: pytorch#31162

Differential Revision: D19262579

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
Summary:
Fixes pytorch#31181 and pytorch#31162 (comment).
Pull Request resolved: pytorch#32215

Differential Revision: D19501869

Pulled By: ezyang

fbshipit-source-id: 363824e52d2592ad968ecf1df345aa4c0daff915
lly-zero-one pushed a commit to lly-zero-one/pytorch that referenced this pull request Feb 8, 2020
Summary:
This is another implementation of the maximum bailout depth.
The first version was implemented in https://github.com/pytorch/pytorch/pull/31521
This one has the following advantages:
* the bailout depth only exists in `CodeImpl` which seems to be an appropriate place to keep it in.
* threading many objects is reduced to threading through CodeImpl and getPlanFor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32073

Differential Revision: D19443432

Pulled By: Krovatkin

fbshipit-source-id: 898384bb2308a1532a50a33d9e05cfca504711e6

use gtest asserts in ProcessGroupGlooTest instead of other checks (#32138)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32138

I personally prefer `throw std::runtime_error("BOOM")`, but we should
probably have asserts here now that it is gtest. Also ensures that the correct
exceptions are thrown by the `testSignal` tests.
ghstack-source-id: 96811000

Differential Revision: D19382905

fbshipit-source-id: 1b00dd70524d03c8bd6f48715baa5070a7985467

Don't dispatch to integral types in smooth_l1_kernel

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32333

Differential Revision: D19442787

Pulled By: ngimel

fbshipit-source-id: 9578483202614d7406eceb13cbf15b253c04f237

Added cummin

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32238

Differential Revision: D19416791

Pulled By: anjali411

fbshipit-source-id: 5aadc0a7a55af40d76f444ab7d7d47ec822f55a5

Use default scale/zero_point in fake_quantize module instead of None (#32318)

Summary:
Distributed data parallel cannot broadcast None, so when we prepare the model for QAT and try to save the model, it will error out.
fixes: https://github.com/pytorch/pytorch/issues/32082
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318

Differential Revision: D19434801

Pulled By: jerryzh168

fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469

Delete unused bernoulli_Tensor from THTensorRandom.h

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32328

Test Plan: Imported from OSS

Differential Revision: D19448736

Pulled By: pbelevich

fbshipit-source-id: 92380ca1e0c0ac88d100e6fba8d216a46d0b181e

Add a new job to support custom build (#32323)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32323

Since we released the custom build in 1.4.0, it's time to set up CI for it. This PR adds a new iOS job to the iOS builds. To save time, it only runs the arm64 build.

- Don't break any iOS jobs
- Custom Build works.

Test Plan: Imported from OSS

Differential Revision: D19451342

Pulled By: xta0

fbshipit-source-id: 9de305c004fc795710ecf01d436ef4792c07760c

Add 64bit atomic fetch add (#32354)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32354

adding an int64 version of AtomicFetchAdd

Reviewed By: bwasti

Differential Revision: D19434349

fbshipit-source-id: b2358e8c5c6b7cd7e7b21de974b4ee1b5258fcf4

Fix ASAN / potential segfault in quantized Tensor memory allocations.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29882

Differential Revision: D18522039

Pulled By: AshkanAliabadi

fbshipit-source-id: 1fdc68491aa2ac176633b9ecc3ee78c9175a97aa

C++ C2/Glow operator unittest

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32258

Test Plan:
```
 buck test glow/fb/test/numerics:fp16_op_test
```

Reviewed By: bddppq

Differential Revision: D19401786

fbshipit-source-id: 1382b5208be6172d3e6f768dedad7ebec31cffc9

fix unchecked cast alias analysis (#32309)

Summary:
Unchecked cast just refines the type of a value; the value stays the same, so the output should alias the input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32309

Differential Revision: D19439037

Pulled By: eellison

fbshipit-source-id: fe6902d0d9a5a9ef5e9c13e1dbd056576d8c327e

exposing CPU/GPU Copy ops (#32248)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32248

expose CPU/GPU copy ops

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:torch_integration_test

Reviewed By: houseroad

Differential Revision: D19405856

fbshipit-source-id: 1df4aa202e26647cb81e9fe7e4478e594a5f7f3e

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fb303/commit/29aba0a28715b89ef60c338ffa1db574e60fdf35
https://github.com/facebook/fbthrift/commit/37a97eb4de2596310339fcc1520c7e5dada37ab5
https://github.com/facebook/fbzmq/commit/0efdd5729236427074842bb91c9b4687e6721a69
https://github.com/facebook/folly/commit/6d886fc7ebe4a7cb55c7733f5d0ec2d85e7062bb
https://github.com/facebook/proxygen/commit/2e5854752afb8068fc0fbc6b736790260167d56d
https://github.com/facebook/wangle/commit/931d1c643bf4fa57fcdb3ca695ae643b39066476
https://github.com/facebookincubator/fizz/commit/781986ef716d85c66584612d2d1e261772f85699
https://github.com/facebookincubator/katran/commit/2e6d2903d7cfec77b7d2f878f2add87e354352f1
https://github.com/facebookincubator/mvfst/commit/e04348ff63f56ff791336ecfd037193f1bd9f822
https://github.com/pytorch/fbgemm/commit/e8650fd5601e28783f64f5a38541e6d562125375

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: abd7ee4aaec8401b2c885335940773a0655b4496

skip testExceptions in ProcessGroupGloo if built with TSAN (#32242)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32242

TSAN and fork don't play well together, so skip this test if we're
building under TSAN. It will still run in other modes.

Differential Revision: D19416113

fbshipit-source-id: 7e88d63a843356372160c2524c05e8fd1706553e

Renaming IValue List functions (#32093)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32093

toGenericListRef -> toListRef
isGenericList -> isList
toGenericList -> toList
toXListRef -> toXVector

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D19369767

Pulled By: zdevito

fbshipit-source-id: 4f0078f95b83e6586524c03f7bcf206722fdd9ae

Updating submodules

Summary:
GitHub commits:

https://github.com/facebookincubator/fizz/commit/54b290f00ff8a1e1bc12957f97d41b7f32b36268
https://github.com/facebookincubator/mvfst/commit/e8df50310d5d883660b409d2e484b6e05235ce3d
https://github.com/pytorch/fbgemm/commit/ef5c9efe120d1e8b5b263ebe37be8cb0c9583cc2

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 7b6dc88d40e8fd8c396d4d12846db43b0fb4258c

Fix typos, via a Levenshtein-type corrector (#31523)

Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea

TensorIterator unrolling and vectorized load - step 0, 1 (#31974)

Summary:
These are steps 0 and 1 of https://github.com/pytorch/pytorch/issues/31975:

- Old code is moved to namespace `legacy`
- New `elementwise_kernel` and `launch_kernel` added to namespace `modern`, they only support 1d contiguous case for now
- In `gpu_kernel_impl`, dispatch to the new code if the problem is trivial 1d contiguous.

In terms of performance, this PR affects elementwise operators on contiguous tensors. Performance is improved slightly (up to 8%) for medium-size tensors on Volta.

See https://github.com/zasdfgbnm/things/blob/master/2020Q1/disassembly-elementwise.ipynb

We can see that, previously, the add kernel compiles to
```
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 71
        /*0000*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R0, SR_TID.X ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 73
        /*0030*/                   S2R R3, SR_CTAID.X ;
        /*0040*/                   IMAD R0, R3, 0x200, R0 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 76
        /*0050*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x160], PT ;
        /*0060*/               P0 EXIT ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 110
        /*0070*/                   IMAD R3, R0.reuse, c[0x0][0x194], RZ ;
        /*0080*/                   IMAD R6, R0, c[0x0][0x198], RZ ;
        /*0090*/                   IADD3 R4, P0, R3.reuse, c[0x0][0x178], RZ ;
        /*00a0*/                   IADD3 R2, P1, R6.reuse, c[0x0][0x180], RZ ;
        /*00b0*/                   LEA.HI.X.SX32 R5, R3, c[0x0][0x17c], 0x1, P0 ;
        /*00c0*/                   LEA.HI.X.SX32 R3, R6, c[0x0][0x184], 0x1, P1 ;
        /*00d0*/                   LDG.E.SYS R5, [R4] ;
        /*00e0*/                   LDG.E.SYS R2, [R2] ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 77
        /*00f0*/                   IMAD R0, R0, c[0x0][0x190], RZ ;
        /*0100*/                   IADD3 R6, P0, R0, c[0x0][0x170], RZ ;
        /*0110*/                   LEA.HI.X.SX32 R7, R0, c[0x0][0x174], 0x1, P0 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 110
        /*0120*/                   FFMA R9, R2, c[0x0][0x1a0], R5 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 170
        /*0130*/                   STG.E.SYS [R6], R9 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 81
        /*0140*/                   EXIT ;
.L_16826:
        /*0150*/                   BRA `(.L_16826);
        /*0160*/                   NOP;
        /*0170*/                   NOP;
.L_29063:
```
Now it compiles to
```
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 210
        /*0000*/                   MOV R1, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R6, SR_CTAID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 217
        /*0030*/                   MOV R7, 0x4 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 208
        /*0040*/                   S2R R3, SR_TID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 210
        /*0050*/                   LEA R6, R6, R3, 0x8 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 225
        /*0060*/                   IADD3 R2, R6.reuse, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 217
        /*0070*/                   IMAD.WIDE R4, R6.reuse, R7.reuse, c[0x0][0x190] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 225
        /*0080*/                   IADD3 R3, R6, 0x80, RZ ;
        /*0090*/                   ISETP.GE.AND P1, PT, R2, c[0x0][0x160], PT ;
        /*00a0*/                   ISETP.GE.AND P0, PT, R6.reuse, c[0x0][0x160], PT ;
        /*00b0*/                   ISETP.GE.AND P2, PT, R3, c[0x0][0x160], PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 217
        /*00c0*/                   IMAD.WIDE R2, R6.reuse, R7, c[0x0][0x188] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 225
        /*00d0*/                   IADD3 R14, R6, 0xc0, RZ ;
        /*00e0*/                   ISETP.GE.AND P3, PT, R14, c[0x0][0x160], PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 228
        /*00f0*/              @!P1 LDG.E.SYS R11, [R4+0x100] ;
        /*0100*/              @!P0 LDG.E.SYS R0, [R2] ;
        /*0110*/              @!P0 LDG.E.SYS R9, [R4] ;
        /*0120*/              @!P1 LDG.E.SYS R8, [R2+0x100] ;
        /*0130*/              @!P2 LDG.E.SYS R10, [R2+0x200] ;
        /*0140*/              @!P2 LDG.E.SYS R13, [R4+0x200] ;
        /*0150*/              @!P3 LDG.E.SYS R12, [R2+0x300] ;
        /*0160*/              @!P3 LDG.E.SYS R15, [R4+0x300] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 245
        /*0170*/                   IMAD.WIDE R6, R6, R7, c[0x0][0x180] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 191
        /*0180*/                   FFMA R9, R9, c[0x0][0x168], R0 ;
        /*0190*/                   FFMA R11, R11, c[0x0][0x168], R8 ;
        /*01a0*/                   FFMA R13, R13, c[0x0][0x168], R10 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 245
        /*01b0*/              @!P0 STG.E.SYS [R6], R9 ;
        /*01c0*/              @!P1 STG.E.SYS [R6+0x100], R11 ;
        /*01d0*/              @!P2 STG.E.SYS [R6+0x200], R13 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 191
        /*01e0*/                   FFMA R15, R15, c[0x0][0x168], R12 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 244
        /*01f0*/               P3 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 245
        /*0200*/                   STG.E.SYS [R6+0x300], R15 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 248
        /*0210*/                   EXIT ;
.L_727:
        /*0220*/                   BRA `(.L_727);
        /*0230*/                   NOP;
        /*0240*/                   NOP;
        /*0250*/                   NOP;
        /*0260*/                   NOP;
        /*0270*/                   NOP;
.L_32233:
```

The benchmark is for add kernel on Volta.

See https://github.com/zasdfgbnm/things/blob/master/2020Q1/benchmark-unroll.ipynb

For tensors of size from 2^20 to 2^30, previously we had
```
1.5.0a0+dedd16b
dedd16b4181cae81e37e978cd3bf24c1ba35ca05
33 µs ± 31.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
48.7 µs ± 75 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78.9 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
140 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
261 µs ± 71.4 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
506 µs ± 159 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
993 µs ± 189 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.96 ms ± 139 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.9 ms ± 955 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.79 ms ± 187 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Now we have
```
1.5.0a0+b1a239b
b1a239be8d529e89875fe47cd09964ef3a9516ac
30.4 µs ± 18 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
45.2 µs ± 46.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
75 µs ± 476 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
134 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
253 µs ± 354 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
489 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
961 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.91 ms ± 578 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.8 ms ± 88.8 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.57 ms ± 763 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
It is slightly better.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31974

Differential Revision: D19450765

Pulled By: ngimel

fbshipit-source-id: 79601bfceb5da84ff87384ba8193793eb4095a2e

run code analysis against mobile interpreter (#32276)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32276

Include mobile interpreter in mobile code analysis pass, which has some
manually registered ops in temporary namespaces.

The mobile interpreter is still under development and these ops will be
removed in the future. This is a temporary step for internal build
experiment.

Test Plan: Imported from OSS

Differential Revision: D19426818

Pulled By: ljk53

fbshipit-source-id: 507453dc801e5f93208f1baea12400beccda9ca5

Specify requires_grad for Parameter replica so it's not always set to True by default (#32356)

Summary:
This is the proposed fix for issue https://github.com/pytorch/pytorch/issues/32018
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32356

Differential Revision: D19450648

Pulled By: mrshenli

fbshipit-source-id: c63eeb6e9f5a87ebe613dd7013907559f295a7ea
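A rough sketch of the idea behind the fix above; this is not the actual replication code, just an illustration that a replica should inherit the original parameter's requires_grad instead of the default True.

```
import torch

param = torch.nn.Parameter(torch.randn(3), requires_grad=False)

# A replica that blindly wraps the data gets requires_grad=True by default:
bad_replica = torch.nn.Parameter(param.detach().clone())
assert bad_replica.requires_grad  # not what we want

# Passing requires_grad through preserves the original setting:
good_replica = torch.nn.Parameter(param.detach().clone(),
                                  requires_grad=param.requires_grad)
assert not good_replica.requires_grad
```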

Fix cudnn channels_last descriptors problem (#31952)

Summary:
This is to append fixes to https://github.com/pytorch/pytorch/issues/31783 so we can pull the fixes in without breaking tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31952

Differential Revision: D19433839

Pulled By: ngimel

fbshipit-source-id: 5b3d2f0b2a86aacd1d100dd86996ee0d63e5ee92

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fbthrift/commit/9b13f58aa1b1a5a65f21cf9a80f8552f5c07ff60
https://github.com/facebook/folly/commit/044b292accb454838008f0fe88eea0c78c9af27e
https://github.com/pytorch/fbgemm/commit/e1f67bbf3da31ca8fc5f4f506d4791cd8883b448

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 21df26f60f436eb8c1766f66afac4a0d93dd33d1

Back out "Calling JITed 8 Bit Fused SLS in FBGEMM from C2" (#32381)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32381
Original commit changeset: 0dfa936eb503

"Facebook"
Temporary remedy for SEV :
https://our.intern.facebook.com/intern/sevmanager/view/s/193726

Test Plan: Run CI tests

Reviewed By: jspark1105

Differential Revision: D19458382

fbshipit-source-id: 731790f96b341ade5e70ff13e4b0b5fafad0fea6

Remove stray `@script` (#32235)

Summary:
This should be covered under recursive script now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32235

Pulled By: driazati

Differential Revision: D19414889

fbshipit-source-id: 85f8132401dbe44c9dbaef7c0350110f90eb9843

porting scatter_add to ATen (CPU) (#31662)

Summary:
Fixes [https://github.com/pytorch/pytorch/issues/24758](https://github.com/pytorch/pytorch/issues/24758).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31662

Differential Revision: D19440824

Pulled By: ngimel

fbshipit-source-id: b13443cfcc8bcb9ec21f1cddb5c6fbc0ef4bb0f2

Temporary workaround for BC test due to schema parser changes

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32324

Test Plan: Imported from OSS

Differential Revision: D19438085

Pulled By: jamesr66a

fbshipit-source-id: 3dd2586e73c890a7bdadd6cbb3df2c186f93199d

Remove __torch__ from custom class qualname

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32301

Test Plan: Imported from OSS

Differential Revision: D19431645

Pulled By: jamesr66a

fbshipit-source-id: 198522a1641cb9f90fa4c614da4ca4162fadf456

Fix returning instance of custom class from method

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32312

Test Plan: Imported from OSS

Differential Revision: D19433511

Pulled By: jamesr66a

fbshipit-source-id: f048d5f60eaba992ee42fea2d318a59b3a156578

Test passing custom class instance to bound method

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32320

Test Plan: Imported from OSS

Differential Revision: D19437335

Pulled By: jamesr66a

fbshipit-source-id: 8f5166dbe6fc5704b12b6224932460b12be0d39b

support torch script call over rpc (#32197)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32197

This is to reland https://github.com/pytorch/pytorch/pull/30063; the main change is to match a general exception and grep for the "pickle" error word in the "test_script_functions_not_supported" unit test, as Python 3.5 and Python 3.6 throw different types of errors with different error messages for the rpc call in the unit test.
[test all] This diff makes the following changes:
1. Provides a new set of private Python RPC APIs; they can accept an annotated TorchScript call, and this call can be serialized, deserialized and executed in C++ without the GIL. These private APIs will be bound to JIT in the future, and they differ from the public APIs in that the future JIT-bound private APIs will accept a qualified_name, not callables. These private APIs are subject to deprecation once JIT supports a torch script function being a JIT type.

Also, these APIs require the torch script function to be defined and annotated by users in Python land; it cannot be a script class/module constructor or a class/module method.

2. This diff also allows the public RPC APIs to accept an annotated TorchScript call and execute the same code path that the above private APIs run on. Therefore, if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without the GIL as well.

3. The above private APIs call a newly defined C++ function so that the RPC torch script call can be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future, so that in a follow-up diff it can be called when these private APIs are bound to JIT.

4. The script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script calls and builtin calls can share the same message type and code.

5. Refactored deserializeResponse() and added a new utility to deserialize a response to an IValue.

ghstack-source-id: 96879167

Test Plan: unit test

Differential Revision: D19402374

fbshipit-source-id: 04efcc7c167d08a6503f29efe55e76f2be4b2c5e
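As a hedged illustration of what point 2 above enables (worker names and setup are placeholders, and the function name is my own), a scripted function can be passed to the public RPC API and executed on the callee without holding the GIL:

```
import torch
import torch.distributed.rpc as rpc


@torch.jit.script
def scripted_add(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a + b

# After rpc.init_rpc(...) has set up the workers (omitted here), the scripted
# function can be invoked remotely just like a regular Python callable:
# result = rpc.rpc_sync("worker1", scripted_add,
#                       args=(torch.ones(2), torch.ones(2)))
```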

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fb303/commit/ea6039a6c98f089b7d5b4455715effbf492deb80
https://github.com/facebook/fbthrift/commit/0d30b8e0fc3191b18d16e1ebb1d7db74dc39b082
https://github.com/facebook/fbzmq/commit/7acedd4723f1997d51638f583bee061abff3b58b
https://github.com/facebook/folly/commit/4db6e3b78569d72dd2c11a13ba508daa02c97fac
https://github.com/facebook/proxygen/commit/cd898afb5e249266789f76951ca1e8ded5a09d5f
https://github.com/facebook/wangle/commit/cf5dd1120450ffe81be83f51396231907cfec325
https://github.com/facebookincubator/fizz/commit/08bdcfd87ed0b382956c6c1ee3ba01e2b48dab1d
https://github.com/facebookincubator/katran/commit/fc84c09b8f104bb3b1497ff97132d39789b37ed1
https://github.com/facebookincubator/mvfst/commit/454d37976b88605aa3ff7cfc7f8f735d385e0bea
https://github.com/pytorch/fbgemm/commit/a22e6b8cb480dadfdada25188c50d65acd39f649

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: b87550b26e69216be2a8e40870a6e7dab825261c

support empty batch in group normalization (#32401)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32401

https://github.com/pytorch/pytorch/issues/12013

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- 'test_GroupNorm_empty'

Differential Revision: D19463720

fbshipit-source-id: 8ae44590fc5eeb1adc69a2345d7cc2187d3307ac
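A small illustration of what "support empty batch" means for the commit above (the shapes are my own example, not taken from the test plan):

```
import torch

gn = torch.nn.GroupNorm(num_groups=2, num_channels=4)
out = gn(torch.empty(0, 4, 8, 8))   # zero-sized batch
assert out.shape == (0, 4, 8, 8)    # passes once empty batches are supported
```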

Removed unused weight update in prepack. Moved zero point update to qlinear/qconv to be consistent with data update. (#32254)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32254

Differential Revision: D19422929

Pulled By: kimishpatel

fbshipit-source-id: 595a4f7d6fde4978c94f3e720ec8645f3f2bdb7a

Build: Respect USE_CUDNN=0, even if cudnn is found (#32404)

Summary:
Currently, setting `USE_CUDNN=0` has no effect and any cudnn library found on your system will be used anyway. This is especially problematic when your system has multiple CUDA versions installed, and you are building with a version that lacks a matching cudnn. CMake will find any other cudnn versions and you end up with both CUDA versions added to your compiler include paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32404

Differential Revision: D19499425

Pulled By: ezyang

fbshipit-source-id: a9b3f6f9dc22033481c3c1c5999b1a7ef98468cb

Make type of `Tensor.type()` more specific (#32353)

Summary:
Fixes the following issue:

```
$ cat test.py
import torch

t = torch.tensor(1.5)
t.type(torch.float32)[None]

$ mypy test.py
test.py:4: error: Invalid index type "None" for "Union[str, Tensor]"; expected type "Union[int, slice]"
Found 1 error in 1 file (checked 1 source file)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32353

Differential Revision: D19499388

Pulled By: ezyang

fbshipit-source-id: 715111e934aea020b20f850d27e32c4f70b82572

.circleci: Only run macos libtorch on master (#32378)

Summary:
These jobs were taking forever to run, so we decided it's only really worth running them on master.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32378

Differential Revision: D19499301

Pulled By: seemethere

fbshipit-source-id: 22cac5b5baee84e44607a16daeb77048cb0f5974

F.normalize uses clamp_min_ inplace (#32360)

Summary:
We don't care about autograd when `out!=None` anyways
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32360

Differential Revision: D19452402

Pulled By: colesbury

fbshipit-source-id: c54775289f8a700019ca61e951d59ff4894ac980
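A sketch of the idea in the commit above (not the actual F.normalize source): when an explicit `out` tensor is given, autograd isn't needed, so the epsilon clamp on the denominator can be done in place.

```
import torch

def _denominator(x, p=2.0, dim=1, eps=1e-12, has_out=False):
    norm = x.norm(p, dim, keepdim=True)
    # In-place clamp is fine when we won't backprop through the result;
    # otherwise stay out-of-place to keep the autograd graph intact.
    return norm.clamp_min_(eps) if has_out else norm.clamp_min(eps)
```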

Synchronize with ShipIt.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

add an option to record time spent waiting for GIL (#30842)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30842

We'd like to profile the time spent on GIL acquisition to debug
performance issues.

Test Plan: Unit tests pass.

Differential Revision: D18837590

fbshipit-source-id: 925968f71c5fb96b8cd93f1eab4647602d2617d1

Fix cusparse version check (#32405)

Summary:
The current version check doesn't use proper lexicographic comparison and so will break for future versions of cuSPARSE with `CUSPARSE_VER_MAJOR > 10` and `CUSPARSE_VER_MINOR < 2`. Also, my cusparse headers for CUDA 9 don't seem to include version macros at all, so I added `if !defined` to be explicit about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32405

Differential Revision: D19499412

Pulled By: ezyang

fbshipit-source-id: 1593bf1e5a4aae8b75bb3b350d016cc6c3b9c009
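To illustrate the lexicographic-comparison point from the commit above in language-agnostic terms (the real check lives in C preprocessor macros), comparing (major, minor) as a tuple gives the intended ordering:

```
def cusparse_at_least(major, minor, required=(10, 2)):
    # Compare (major, minor) lexicographically; a naive "minor >= 2" check
    # would wrongly reject e.g. version 11.0.
    return (major, minor) >= required

assert cusparse_at_least(10, 2)
assert cusparse_at_least(11, 0)
assert not cusparse_at_least(10, 1)
```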

Remove dead includes in caffe2/test

Reviewed By: ezyang

Differential Revision: D19273220

fbshipit-source-id: 3dfc3388914e60611c84472e3fc529f5b5e40534

Set rpath for JNI library on Mac (#32247)

Summary:
Without this, dlopen won't look in the proper directory for dependencies
(like libtorch and fbjni).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32247

Test Plan:
Built libpytorch_jni.dylib on Mac, replaced the one from the libtorch
nightly, and was able to run the Java demo.

Differential Revision: D19501498

Pulled By: dreiss

fbshipit-source-id: 13ffdff9622aa610f905d039f951ee9a3fdc6b23

Fix BC test after TorchBind changes (#32429)

Summary:
It was broken by https://github.com/pytorch/pytorch/issues/32320. Let's be on the safe side and just whitelist all testing ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32429

Differential Revision: D19501016

Pulled By: dzhulgakov

fbshipit-source-id: 9cc1d363edb4579905bee1976a2b57255ce41738

Redundant condition (#32396)

Summary:
Optimize expression: 'A || (!A && B)' <=> 'A || B'

A: relErr <= maxRelErr
!A : relErr > maxRelErr
B: absErr <= absErrForRelErrFailure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32396

Differential Revision: D19499370

Pulled By: ezyang

fbshipit-source-id: c19bdcb2d4e7ff7806a8cd181c6e7e9e276b9979
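The simplification in the commit above is the boolean identity A || (!A && B) == A || B; rendered in Python with the names from the summary (the original code is C++ test logic):

```
def passes_old(rel_err, max_rel_err, abs_err, abs_err_for_rel_err_failure):
    return (rel_err <= max_rel_err
            or (rel_err > max_rel_err and abs_err <= abs_err_for_rel_err_failure))

def passes_new(rel_err, max_rel_err, abs_err, abs_err_for_rel_err_failure):
    # Equivalent: the "!A" term is redundant inside an "or" with A.
    return rel_err <= max_rel_err or abs_err <= abs_err_for_rel_err_failure
```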

Enhance NCCL watchdog to actively abort communicators for timed-out ops. (#32338)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32338

Timed-out ops could linger around if the user doesn't actually call
`wait()` on that op. As a result, to fix this I've introduced the following
functionality in this PR:

1. Keep track of all outstanding work in ProcessGroupNCCL.
2. Enhance NCCL watchdog to sweep through all outstanding work and perform the
following operations:
  i.   If the work has timed out, abort all communicators for that work and
       remove them from the cache.
  ii.  If the communicators for the work receive an error, abort the
       communicators and remove them from the cache.
  iii. If the work has completed (successfully/unsuccessfully), remove it from
       the list of outstanding work.
ghstack-source-id: 96895704

Test Plan: waitforbuildbot

Differential Revision: D19401625

fbshipit-source-id: 8f6f277ba2750a1e1aa03cdbc76e8c11862e7ce5

Revert "Temporary workaround for BC test due to schema parser changes" (#32441)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32441

This reverts commit ceffdbd2179e7dafdc6407909a00f4267db040de.

Test Plan: Imported from OSS

Reviewed By: houseroad

Differential Revision: D19500043

Pulled By: jamesr66a

fbshipit-source-id: 3bd22c55e4a81ff8b89d27f6e7438e3bdfc18606

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fbthrift/commit/47e0b9b97e19c34dc15a6abf0e8ed93063870ce8
https://github.com/facebook/folly/commit/6d225aaf95b58baf2420efec7f4c570a2d426395
https://github.com/pytorch/fbgemm/commit/ab4da8f60a0194f04c55aa4c9b74c5c175bd1172

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 27bcdf08b6f5e47a5c948e094aca26bf67a6fb66

QNNPACK: Add support for dynamic quantization.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31896

Test Plan: Added new tests to QNNPACK's test suite to cover the new use case.  All new tests are passing.

Reviewed By: supriyar

Differential Revision: D19443250

Pulled By: AshkanAliabadi

fbshipit-source-id: fa7b1cffed7266a3c198eb591d709f222141a152

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fbthrift/commit/40b08129cfd2aed6dba56d10d8cea4ac0ef6932e
https://github.com/facebook/proxygen/commit/8cd8d286e68a06968b80dd5a6d8e150392b87aea
https://github.com/facebook/rocksdb/commit/d305f13e2124132863267eb49b2a08ede679d2c4
https://github.com/pytorch/fbgemm/commit/2957bd45f19d8fa2d185e26b7ada5a394c5ba5b4

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 3b76eb7c8b6b5cf617aca7bd143e1ee404c4f0ed

Adagrad optimizer - updated step function, added param_groups, state to optimizers

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29335

Differential Revision: D19449382

Pulled By: anjali411

fbshipit-source-id: ee238801ed9cdf15a80f2ce31cc4aab8ba582aea

Enhance DispatchStub to be thread-safe from a TSAN point of view. (#32148)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32148

TSAN would complain about multiple threads reading and writing to the
`cpu_dispatch_ptr` without any sort of synchronization. Although this is a
valid issue from a TSAN point of view, there wasn't a correctness issue, since
both threads would compute the same value.

In order to fix this, I've used std::atomic for cpu_dispatch_ptr with relaxed
ordering guarantees.
ghstack-source-id: 96989435

Test Plan: Verify the TSAN tests pass.

Differential Revision: D19386082

fbshipit-source-id: 1ff0893e02529eddd06b2855d9565edf1bbf1196

Fix test_data_parallel name errors and add to run_test.py (#32428)

Summary:
While working on https://github.com/pytorch/pytorch/issues/31768 and trying to add tests for `DataParallel`, I discovered that:
- `test_data_parallel.py` can't be run through `run_test.py`
- running it with `pytest` fails with many name errors

`test_data_parallel.py` seems to have been split from `test_nn.py` in https://github.com/pytorch/pytorch/issues/28297 but not in a state where it can actually be run. Presumably `DataParallel` hasn't been tested by CI in the time since.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32428

Differential Revision: D19499345

Pulled By: ezyang

fbshipit-source-id: f9b748a99a5c85fc6675c22506cf10bbfd9c8a4d

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fbthrift/commit/d45f7b4f0972951c2548e918c0bc167f397815b3
https://github.com/facebook/rocksdb/commit/e6e8b9e8718698b334d18fa8f5ab6db30b147c53
https://github.com/facebookincubator/katran/commit/da618022d26b0786d4a090f38006db9ae584f2cb
https://github.com/pytorch/fbgemm/commit/2df47f519a6c896b7c418a8a94aae9c07ba7285c

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: c4af09e70a56d11e845150ba3d90a570a3758e51

Move log_normal to Aten(CPU) (#31854)

Summary:
Fix https://github.com/pytorch/pytorch/issues/24723.
Benchmark script :
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.log_normal_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.log_normal_()
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test Device: skx-8180.
Before:
```
input size(128, 1) forward time is 0.0114 (ms).
input size(128, 10) forward time is 0.1021 (ms).
input size(128, 100) forward time is 1.0081 (ms).
input size(128, 1000) forward time is 10.1831 (ms).
```
After:
```
input size(128, 1) forward time is 0.0108 (ms).
input size(128, 10) forward time is 0.0969 (ms).
input size(128, 100) forward time is 0.9804 (ms).
input size(128, 1000) forward time is 9.6131 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31854

Differential Revision: D19314586

Pulled By: pbelevich

fbshipit-source-id: 2ea1d9a2c505e36aca9e609b52ccb3e8caf2ba8f

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/proxygen/commit/d2ee8a1a3fc0bceee0dae34de37d1e23a8383977
https://github.com/pytorch/fbgemm/commit/a1543b168df44c4722fa545746aaaa7cf9660f6d

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: a1394f1c4a48920d3ce1403c70351e2c56eaecf0

`insert_quant_dequant` pass support shared class types (#31408)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31408

We'll error out when a graph is quantized with different QSchemes.
This only occurs when we have two modules of the same type (e.g. two Conv2d modules initialized with
the same arguments) that are quantized with two configs that would produce different quantized graphs, for example
per-tensor affine and per-channel affine. This is a rare case, so it should be OK to skip for now.
Actual support will come later.

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D19162366

fbshipit-source-id: 798f06d0ddef0c8458237ce88b62159cc77eec8b

Remove the support of build options like NO_*, WITH_* (#32447)

Summary:
We will now use USE_*, BUILD_* consistently. The backward compatibility
for NO_* and WITH_* is hereby removed in this commit, as promised in the
comment (next release is beyond Feb 20):

    # Before we run the setup_helpers, let's look for NO_* and WITH_* variables and hotpatch environment with the USE_*
    # equivalent The use of NO_* and WITH_* is deprecated and will be removed in Feb 20, 2020.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32447

Differential Revision: D19515536

Pulled By: ezyang

fbshipit-source-id: 2f2c51e6d4674af690b190a1f0397b8f596b6a15

Implement backend fallback fallthrough (#32439)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32439

This adds c10::fallthrough_kernel, which is a special boxed function that
can be used to implement fallthrough behavior at a dispatch key.  A fallthrough
kernel will redispatch to the next valid dispatch key.  It is implemented
in such a way that it costs no more to fall through than it does to go
straight to the actual implementation of the kernel.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D19503886

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 6ee05bd815c4ef444e612d19f62312dbb76f2787

fix torch.eq() doc entry (#32399)

Summary:
fix `torch.eq()` entry example to match the current output (boolean, instead of uint8)
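
For reference, a minimal snippet (my own) showing the current output the doc entry should match:

```python
import torch

# torch.eq now returns a bool tensor rather than uint8
print(torch.eq(torch.tensor([1, 2]), torch.tensor([1, 1])))
# tensor([ True, False])
```
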
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32399

Differential Revision: D19498104

Pulled By: ezyang

fbshipit-source-id: e7ec1263226766a5c549feed16d22f8f172aa1a3

Always return a new tensor from nn.functional.pad (#32350)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/31734
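
A minimal sketch of the new behavior, assuming the all-zero padding case from the linked issue:

```python
import torch
import torch.nn.functional as F

x = torch.randn(3, 3)
y = F.pad(x, (0, 0))   # an all-zero pad used to return the input itself
print(y is x)          # expected to print False after this change
```
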
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32350

Differential Revision: D19501845

Pulled By: ezyang

fbshipit-source-id: ea79496d23dc0016f3caa233c53d283b08f60371

Put sparse all reduce results to input tensors (#32226)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32226

Right now, if users call torch.dist.all_reduce() on dense tensors, outputs are put in the input tensors, but if users call torch.dist.all_reduce() on sparse tensors, outputs are neither returned explicitly to users nor put in the input tensors.

To make the torch.dist.all_reduce() API have the same behavior on both dense tensors and sparse tensors, this diff makes torch.dist.all_reduce() on sparse tensors put the output in the input tensors as well. This is achieved by simply calling input_sparse.copy_(output_sparse); see PR https://github.com/pytorch/pytorch/pull/9005, which implemented copy_ for sparse tensors.
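
An illustrative sketch of the resulting behavior (assumes an already-initialized process group, e.g. the gloo backend, which supports sparse all_reduce):

```python
import torch
import torch.distributed as dist

# after this change the reduced result is written back into the sparse input,
# matching the dense behavior
t = torch.sparse_coo_tensor([[0, 1], [1, 0]], [1.0, 2.0], (2, 2))
dist.all_reduce(t)   # t now holds the summed sparse tensor across ranks
```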

close #31413
ghstack-source-id: 96984228

Test Plan: unit test

Differential Revision: D19192952

fbshipit-source-id: 2dd31dc057f20cc42b44b9e55df864afa2918c33

Fix dll load logic for Python 3.8 on Windows (#32215)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/31181 and https://github.com/pytorch/pytorch/pull/31162#discussion_r362495611.
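
For context, a rough sketch of the Python 3.8+ Windows pattern this kind of fix relies on (the path below is hypothetical):

```python
import os
import sys

# since Python 3.8, Windows no longer searches PATH for the dependent DLLs of
# extension modules, so the directories have to be registered explicitly
if sys.platform == 'win32' and sys.version_info >= (3, 8):
    os.add_dll_directory(os.path.join(os.path.dirname(__file__), 'lib'))
```
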
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32215

Differential Revision: D19501869

Pulled By: ezyang

fbshipit-source-id: 363824e52d2592ad968ecf1df345aa4c0daff915

Migrate max and min (binary) from TH to ATen. (#30851)

Summary:
TH implementation will be removed after the unary max and min are
migrated.

Benchmark: (Debian 10, Release build, gcc 7.4, no turbo)

```python
import timeit

for device in ('cpu', 'cuda'):
    print(f'device: {device}')
    for op in ('max', 'min'):
        for dtype in ('torch.double', 'torch.float', 'torch.int16',
                      'torch.int32', 'torch.int64'):
            for n, t in [(10_000, 200000),
                         (100_000, 20000)]:
                print(f'torch.{op}(a, b), numel() == {n} for {t} times, dtype={dtype}')
                print(timeit.timeit(
                    f'torch.{op}(a, b)' + (';torch.cuda.synchronize()' if device == 'cuda' else ''),
                    setup=(f'import torch; '
                           f'a = torch.arange({n}, dtype={dtype}, device="{device}"); '
                           f'b = torch.ones({n}, dtype={dtype}, device="{device}") * ({n} // 2)'),
                    number=t))
    print()
```

Before:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.241763713000182
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.7138833169992722
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.2183356810000987
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7031846980007685
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7704679510006827
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.289198366999699
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7937613740014058
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2930124340000475
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8032857640009752
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.2908709189996443
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8829010000008566
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.2994690759987861
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
1.8037853410005482
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.2929310759991495
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.8075240359994496
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2932477679987642
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7868400779989315
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2885970789993735
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8389664830010588
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.29402057399966

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.787109836999662
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.842438002999188
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.429616614999759
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.835390076999829
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.940423873000327
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4108991760003846
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.9318018840003788
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4168134739993548
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9610764919998473
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4189234130008117
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.960172712999338
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4162539499993727
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.8985912560001452
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.4113489299998037
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.9160250799995993
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4128787690005993
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8806865219994506
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4086357010000938
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9362181240012433
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4151225870009512

```

After:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.2685823729998447
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.72004808300062
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.212242640000113
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7089235590001408
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7767087259999244
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2916517639996528
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8265984959998605
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.3002885240002797
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8084679720004715
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3012119999993956
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8800218449996464
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.3060645710002063
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.4905043950002437
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.9126290209997023
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7972335520007618
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2918074379995232
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8047651860006226
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2992197730000044
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8526509560006161
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3030709570002728

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.700986622000528
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.8415469050005413
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.3051693249999516
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.8321999460004008
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8086475109994353
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.405110773999695
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.913458047999484
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4236377289998927
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9386842409994642
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4230227469997772
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
3.0341797270002644
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4289592409995748
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.6091147850002017
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
2.036691903999781
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8256167649997224
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4078955400000268
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8631781489993955
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4210130069996012
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
3.0112479260005784
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4297719679998409

```

Partly solves https://github.com/pytorch/pytorch/issues/24594 and #24595

Closes https://github.com/pytorch/pytorch/issues/25016

Continues https://github.com/pytorch/pytorch/issues/27185
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30851

Differential Revision: D19515694

Pulled By: ezyang

fbshipit-source-id: 1764897f912d6ae24b0c361f19a1aacf96e0826e

add missing align_corners annotation (#32492)

Summary:
Adds the missing `align_corners` annotation in the `grid_sample` and `affine_grid` functionals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32492

Differential Revision: D19516550

Pulled By: ezyang

fbshipit-source-id: 064c8c99bf6eae6744237c0b151b3ce4c82ada96

Move some of the helper functions for public use (#32202)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32202

Move some helper functions in ModuleUseDeduper for public use

Test Plan:
.

Imported from OSS

Differential Revision: D19508034

fbshipit-source-id: 2e8e05eff6f3bbcfe6936598371e4afa72f9b11f

Fix comparisons for ConcreteModuleType (#32256)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32256

Previously two unrelated modules loaded from torch.jit.load
would compare equal because we only considered their data_ attributes which
are initialized blank in torch.jit.load. This changes ConcreteModuleType
to distinguish when the data_ attribute is blank vs when it is empty.

This replaces the poisoned logic.
ghstack-source-id: 96755797

Test Plan: oss

Differential Revision: D19423055

fbshipit-source-id: 79d6a50a3731c6eeb8466ba2a93702b49264bba0

Add str[] float[] constants resubmit

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31791

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D19439513

Pulled By: eellison

fbshipit-source-id: a04c7401687b051f0d4fb4794963931ebe004194

improve mayContainAlias (#31839)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31839

There are a number of improvements that can be made to `mayContainAlias`, which I would like to do in follow-ups. For now, this is an easy one.

Test Plan: Imported from OSS

Differential Revision: D19439516

Pulled By: eellison

fbshipit-source-id: 0042fb7eaae6cfb4916bf95dc38280517a4bd987

remove tuple logic in constant propagation (#31840)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31840

The next PR in this stack makes tuples insertable as constants, so we can remove special handling of tuples in constant propagation.

Test Plan: Imported from OSS

Differential Revision: D19439515

Pulled By: eellison

fbshipit-source-id: c58f153157f1d4eee4c1242decc4f36e41c1aa05

implement tuple constants (#31841)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31841

Add tuple constants to the JIT. The constraint here is that all elements of a tuple must themselves be insertable as a constant. Previously tuples were special-cased in constant propagation, but now that more passes insert constants, such as freezing, we should just have tuples be representable as constants.
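
A small TorchScript illustration (my own sketch, not taken from the PR):

```python
import torch

@torch.jit.script
def f():
    # every element is itself insertable as a constant, so the whole tuple
    # can now be represented as a single constant in the graph
    return (1, 2.0, 'three')

print(f.graph)
```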

Test Plan: Imported from OSS

Differential Revision: D19439514

Pulled By: eellison

fbshipit-source-id: 3810ba08ee349fa5598f4b53ea64525996637b1a

Adding QConfigTypePtrMap (#32203)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32203

The type is needed to allow multiple qconfig configurations for a shared
ClassType; see the next PR for more details.

Test Plan:
.

Imported from OSS

Differential Revision: D19508027

fbshipit-source-id: a3df29dab3038bfa88c55dda98a3e8a78e99e5a1

Remove mis-exposed abort API on ProcessGroup

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32292

Test Plan: Imported from OSS

Differential Revision: D19430252

Pulled By: mrshenli

fbshipit-source-id: 4ec594e1be54afe774bdcecc0f1c9bda2edf5e0d

Corrected logical boolean expression (#32249)

Summary:
Changed bitwise & to logical && in the boolean expression.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32249

Differential Revision: D19501586

Pulled By: eellison

fbshipit-source-id: afe374cfc9661182703cc82810d9cb735fbb8180

[caffe2] remove unnecessary np.set_printoptions and fix test errors (#32475)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32475

As title

Test Plan: CI

Reviewed By: houseroad

Differential Revision: D19508778

fbshipit-source-id: fd9ad63607535980505d155f3e3c3b7c6b95daf7

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fbthrift/commit/87b81e7cb2e17d6cb2289d678decd9311136ab28
https://github.com/facebook/folly/commit/3a9a0976f2537ed66a465bf30ec2038a7a92d636
https://github.com/facebook/litho/commit/9294f3b2faeded509b6fb0c2780b4bf4d4e6d763
https://github.com/facebook/proxygen/commit/c8addc5ad4ebf73a2dbb8a00e0d9e68dfdf12cd7
https://github.com/facebookincubator/profilo/commit/9a9f1a849a33248fa4d7f06a100cfa73257de233
https://github.com/pytorch/fbgemm/commit/27cb280170fbf530033c4d0123e063e2f8bb50f3

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 73beec64bf9c17fa6c42dd09ea85350e8c9c66ea

[jit] Enable IValue to hold a PyObject (#32491)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32491

This PR enables IValue to hold a pure PyObject by adding a
new enum tag and a new jit_type to denote PyObject existence in IValue and
the JIT type system. We don't, and don't plan to, expose this to users.

This is the basic piece that enables IValue to be adopted more broadly, like
making RRef always hold an IValue; it might also simplify some compiler
logic.
ghstack-source-id: 97039980

Test Plan: Imported from OSS

Differential Revision: D19502234

fbshipit-source-id: 90be001706d707d376cfbea25980fd82980df84a

Fix race condition for to() backward that spans devices (#31930)

Summary:
While putting finishing touches on the gradient scaling PR (https://github.com/pytorch/pytorch/pull/26512), I discovered my multi-GPU test (which uses `to()` to transfer tensors between devices) was intermittently failing with bad numerics.  I knew it was going to be [a weird case from the start](https://www.imdb.com/title/tt8946378/quotes/qt4868203) and spent a week descending into madness.  It turns out, for backward ops that create gradients on a different device from the device on whose stream the op is executed, the streaming backward synchronizations in [input_buffer.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L46-L83) do not properly tell later ops to wait on the population/creation of those gradients.  For example, a cross-device `to()` backward (CopyBackward Node) enqueues a cudaMemcpyAsync on the current stream of the source (incoming gradient's) device, then [syncs getCurrentCUDAStream on the destination device with the cudaMemcpyAsync](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/Copy.cu#L76).  However, `input_buffer.cpp` in such cases ([case (3)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L77-L81)) was not properly telling `opt_consumer_stream` to wait on the current stream of the destination device (`var`'s device).

Circumstances needed to repro in current master (see [my test](https://github.com/pytorch/pytorch/compare/master...mcarilli:backward_to_race_fix#diff-e68a7bc6ba14f212e5e7eb3727394b40R1901)):
- 2 devices, with non-default streams used for forward-pass ops on both devices (which is the default behavior in test_cuda.py)
- A `to()` that transfers a tensor requiring grad from one device to another
- A backward pass that routes back through to()'s backward (aka CopyBackward).

Under these circumstances, backward ops following CopyBackward on CopyBackward's destination device (aka the original forward-pass source device) race with the device-to-device transfer, and execute using partially-transferred data.
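
A minimal repro sketch under those circumstances (assumes two CUDA devices; heavily simplified from the linked test):

```python
import torch

# forward ops run on non-default streams on both devices, as test_cuda.py does
s0 = torch.cuda.Stream(device=0)
s1 = torch.cuda.Stream(device=1)
with torch.cuda.stream(s0), torch.cuda.stream(s1):
    a = torch.randn(1024, device='cuda:0', requires_grad=True)
    b = a.to('cuda:1')       # its backward (CopyBackward) spans devices
    loss = (b * 2.0).sum()
loss.backward()               # ops after CopyBackward on cuda:0 must wait for the copy
torch.cuda.synchronize()
print(a.grad[:4])
```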

The present PR fixes the race condition and ensures that later ops wait on the CopyBackward transfer.  This PR should also make streaming backward safe for other backward ops that span devices, as long as they play nice and populate any new gradients they create using the "current stream" of the device(s) on which they create those gradients.

There are a couple minor issues where I'm not sure of the best approach:
- Should we guard onto the var's device for the entire body of InputBuffer::add?
- I'm fairly sure we need to `recordStream` on `var` if the consumer stream is different from the stream on which (we expect) `var` was created, but calling `c10::cuda::CUDACachingAllocator::recordStream` in input_buffer.cpp might break CPU-only builds.  I couldn't find a different API call to record streams that seemed CPU-build-agnostic.  Could I wrap the call with a macro?

Thanks to mruberry for helpful suggestions and also the organization/naming of the stream pool and streaming backward code that allowed me to (just barely) wrap my head around the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31930

Differential Revision: D19517617

Pulled By: mruberry

fbshipit-source-id: 183d5460aefa5d27366b465b0473b80ec80fa044

[Rowwise Pruning][c2 op] Add Quantile Op (#32448)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32448

Using binary search to compute the value for the given quantile among the input tensors.
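
A rough sketch of the idea (not the actual Caffe2 operator): binary-search over the value range for the smallest value whose cumulative fraction of elements reaches the requested quantile.

```python
import torch

def quantile_by_bisection(values, q, iters=64):
    lo, hi = values.min().item(), values.max().item()
    n = values.numel()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (values <= mid).sum().item() / n >= q:
            hi = mid   # mid already covers at least a q fraction of the elements
        else:
            lo = mid
    return hi

print(quantile_by_bisection(torch.randn(10000), 0.95))
```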

Test Plan: Newly added unittests;

Reviewed By: jspark1105

Differential Revision: D19487604

fbshipit-source-id: 0dc6627b78d1310ac35b3f1d53b89cc89a697ece

[caffe2] use 2-stage EmbeddingSpMDM interface (#32271)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32271

Use the 2-stage EmbeddingSpMDM interface in D19425982 to reduce the overhead of code cache lookup and lock contention.
Fix an issue in sparse_lengths_sum_benchmarks where empty indices were generated when the average length is small (e.g. 1).

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D19425987

fbshipit-source-id: d5c5f0d46e0072403901809c31d516fa0f4b9b31

Move pytorch distributed tests to separate folder for contbuild. (#30445)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe/test for better management
of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606

[gloo] Skip registry warning (#31126)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31126

The Gloo device creator registry throws a warning that confuses users - https://fb.workplace.com/groups/1405155842844877/permalink/3217491788277931/
Create a C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING API to skip such warnings.

Test Plan:
{F224342749}

Tested both `C10_DEFINE_SHARED_REGISTRY` and `C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING`.
Make sure nothing breaks

Reviewed By: d4l3k

Differential Revision: D18904783

fbshipit-source-id: 0e0065d530956249a18325d4ed3cb58dec255d4c

Raise error for code that risks deadlock (#32295)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32295

Fix for https://github.com/pytorch/pytorch/issues/32045

Calling into the engine with the GIL can deadlock because:
- worker thread initialization acquires the GIL
- Any Node / hook can be a python function that will acquire the GIL

The choice was made here to raise an error, as one of the advantages of using cpp extensions with python is being able to release the GIL. So we prefer to educate users to do it themselves rather than doing it under the hood.

Test Plan: Imported from OSS

Differential Revision: D19430979

Pulled By: albanD

fbshipit-source-id: e43f57631885f12e573da0fc569c03a943cec519

[PyTorch BC] Clean up the whitelist for PyTorch Op BC check (#32523)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32523

remove stale items

Test Plan: cont build

Reviewed By: hl475

Differential Revision: D19526918

fbshipit-source-id: ee7392ae84e5ddf88284020775119e59c9b6533e

[quant][graphmode] Default to non-inplace in graph mode quantization API (#32204)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32204

att

Test Plan:
.

Imported from OSS

Differential Revision: D19508030

fbshipit-source-id: 94814c3c126a196f3938f944abfa5ae2a24d8dde

Fix nll_loss to support empty tensors on GPU (#31491)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31491

Fixes #31472

Test Plan: Imported from OSS

Differential Revision: D19537231

Pulled By: pbelevich

fbshipit-source-id: 20a43251a0f68a7a3557dd8234daee2d4814e5dd

Add unit test on export_opnames with interface. (#31531)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31531

As suggested by suo, add a unit test on torch.jit.export_opnames with an interface. A submodule is annotated as an interface and assigned to an instance, and then re-assigned to another instance. Make sure the operator names are also updated.

Test Plan: Imported from OSS

Differential Revision: D19539129

Pulled By: iseeyuan

fbshipit-source-id: 71a76ae7790cdd577618ca278afdb132727f08dc

Support 3D attention mask in MultiheadAttention. (#31996)

Summary:
Support a 3D attention mask for MultiheadAttention. If `attn_mask` has the batch dimension, it will not be unsqueezed. Fix https://github.com/pytorch/pytorch/issues/30678
Relevant issues/pr:
https://github.com/pytorch/pytorch/pull/25359
https://github.com/pytorch/pytorch/issues/29520
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31996
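
A usage sketch under my reading of the change; the 3D mask shape is assumed to be (batch_size * num_heads, target_len, source_len), so check the docs for the exact contract:

```python
import torch
import torch.nn as nn

embed_dim, num_heads, L, S, N = 16, 4, 5, 7, 2
mha = nn.MultiheadAttention(embed_dim, num_heads)
q = torch.randn(L, N, embed_dim)
k = v = torch.randn(S, N, embed_dim)

# additive float mask with a batch dimension; it is no longer unsqueezed
attn_mask = torch.zeros(N * num_heads, L, S)
out, weights = mha(q, k, v, attn_mask=attn_mask)
print(out.shape)   # torch.Size([5, 2, 16])
```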

Differential Revision: D19332816

Pulled By: zhangguanheng66

fbshipit-source-id: 3448af4b219607af60e02655affe59997ad212d9

[JIT] throw if no self arg on ignored methods (#32503)

Summary:
There was a user who did this and it would seg fault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32503

Differential Revision: D19538481

Pulled By: eellison

fbshipit-source-id: dc3752028b9eff6ac88c025e8a2b5f8fd44ce32f

[quant][graphmode] Support quantizing shared ClassType with different qconfigs (#32205)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32205

to be filled

Test Plan:
python test_jit.py

Imported from OSS

Differential Revision: D19508031

fbshipit-source-id: cbf03d34e52eae62595c34fde6ec645cb6744ad9

no more build_pytorch_libs.sh/.bat (#32319)

Summary:
https://github.com/pytorch/pytorch/issues/12918
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32319

Differential Revision: D19544272

Pulled By: soumith

fbshipit-source-id: dd32fa61efa78af908f21c7e54cb6484bf895e54

Only run test_conv_large and test_conv_transposed_large_cuda on 32GB device (#32473)

Summary:
For some reason, these two tests start to fail on 16GB Volta on Linux...

Also fixes https://github.com/pytorch/pytorch/issues/31650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32473

Differential Revision: D19538314

Pulled By: ngimel

fbshipit-source-id: 266195f19d8cf76b035795e0e318c152ae72adc2

[JIT] Passing custom class as arg (#32260)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32260

This makes it so you can actually pass the custom class as an arg to ScriptFunctions

Test Plan: Imported from OSS

Differential Revision: D19424252

Pulled By: jamesr66a

fbshipit-source-id: c3530186619655781dedbea03c2ad321aaff1cb8

[JIT] Test __getstate__ and __setstate__ for custom bound C++ classes

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32470

Test Plan: Imported from OSS

Differential Revision: D19508250

Pulled By: jamesr66a

fbshipit-source-id: 481299fb3c18fa874c2a1d2993984bb6b3193bac

[JIT] Fix custom class method binding for const methods

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32471

Test Plan: Imported from OSS

Differential Revision: D19508249

Pulled By: jamesr66a

fbshipit-source-id: 3a0bce6845072bb03567049a73b9982b54d8daf9

[JIT] Support returning tuple from custom bound C++ method

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32477

Test Plan: Imported from OSS

Differential Revision: D19509927

Pulled By: jamesr66a

fbshipit-source-id: 7d407150402cc19344c3ec3b4a27b3d7c464e8ac

[JIT] Add torch.classes.load_library

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32508

Test Plan: Imported from OSS

Differential Revision: D19525175

Pulled By: jamesr66a

fbshipit-source-id: b9f07113f551bdfb56d49d24d12989be2b8fc7e4

Revert "Remove __torch__ from custom class qualname" (#32514)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32514

This reverts commit c7fdf5b251c6fecd5d78b4f33d30bd77ca3f841c.

Test Plan: Imported from OSS

Differential Revision: D19525532

Pulled By: jamesr66a

fbshipit-source-id: 126f4e87250a2ac739bd7aa161a0f7b39f143d38

[quant] Re-enable test_nested that has different qconfig for shared ClassType (#32206)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32206

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D19508028

fbshipit-source-id: 5de3c2ef17de146feca03d7135a7e04f393de398

porting gather to ATen using TensorIterator with multithreading support. (#32425)

Summary:
Fixes [https://github.com/pytorch/pytorch/issues/24702](https://github.com/pytorch/pytorch/issues/24702).
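
A quick usage example of the op being ported (illustrative only):

```python
import torch

src = torch.tensor([[1, 2], [3, 4]])
index = torch.tensor([[0, 0], [1, 0]])
# out[i][j] = src[i][index[i][j]] when gathering along dim=1
print(torch.gather(src, 1, index))   # tensor([[1, 1], [4, 3]])
```
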
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32425

Differential Revision: D19538265

Pulled By: ngimel

fbshipit-source-id: 78821a16b6948916e956a04f984e0956f86cf582

[JIT] Remove capsule type handling of node hashing (#32540)

Summary:
Capsule Type doesn't appear in the IR; it is purely used in the runtime, so we should not have to handle it in node hashing... Let's see if this breaks anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32540

Differential Revision: D19541357

Pulled By: eellison

fbshipit-source-id: 905ed9f89cf6d03b45ddb4fde02adfa149b477f8

Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/fbthrift/commit/08e28edc08dea3b96bc5eab84c10efecee580133
https://github.com/facebook/folly/commit/6884ecfc6724b30f3f54899889f309f81650e125
https://github.com/facebook/mcrouter/commit/685144514fc59139189b75f7a1c3387a992670e2
https://github.com/pytorch/fbgemm/commit/ed665880aa9b017b04af40193a22bcc933ddabad

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 7b19dca06ad7e8751de21efc48f5eada37b446fb

[rpc] Remove template on RRef and add Type to RRef creation (#30630)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30630

This removes the template and all the specializations it had in rpc; we
universally use IValue as the inner value, since we support holding a python
object inside an IValue.

This will also ensure that we have the correct type information when
creating the RRef: we use the return type from the schema when creating
UserRRef and OwnerRRef, which will enable the IValue to always have the correct
type if the IValue is an RRef object (next PR).

Test Plan: Imported from OSS

Differential Revision: D19502235

fbshipit-source-id: 0d5decae8a9767e0893f3b8b6456b231653be3c5

[pytorch][embeddingbag] Parallelize the EmbeddingBag operator (#…
```python
here = os.path.abspath(__file__)
lib_path = os.path.join(os.path.dirname(here), 'lib', lib_name)

ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
```
Contributor:

I've been going through this out of curiosity and it got me wondering if this doesn't lead to an eventual dlclose? Don't we have to stash this library handle somewhere?

```
@@ -54,3 +54,4 @@ def get_source_lines_and_file(obj):

TEST_MASTER_ADDR = '127.0.0.1'
TEST_MASTER_PORT = 29500
USE_RTLD_GLOBAL_WITH_LIBTORCH = False
```
Contributor:

Why do we have this constant if it's always false? Is this so that you can patch it in fbcode?

Contributor Author:

Yep, fbcode shenanigans

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary:
Fixes pytorch#31181 and pytorch#31162 (comment).
Pull Request resolved: pytorch#32215

Differential Revision: D19501869

Pulled By: ezyang

fbshipit-source-id: 363824e52d2592ad968ecf1df345aa4c0daff915