Skip to content

Conversation

@jaliyae
Copy link
Owner

@jaliyae jaliyae commented Nov 12, 2018

Sampler reset method accepts a new size. This will be useful when the dataloader or dataset want to reset the sampler to use with a different epoch or chunk of the dataset.

struct TestIndexSampler : public samplers::Sampler<TestIndex> {
explicit TestIndexSampler(size_t size) : size_(size) {}
void reset() override {}
void reset(optional<size_t> new_size) override {}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this as with a default value? That way we don't need to change existing code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to change existing code.
I mean we don't need to pass nullopt for code that doesn't need a size change.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bit more complicated with templates. Having defaults at the Sampler does not mean the derived classes will have them when someone use it with the derived class. I did change the three samplers to take nullopt as default so that we don't have to change the tests.

jaliyae pushed a commit that referenced this pull request Nov 21, 2018
pytorch#14040)

Summary:
…2164)"

This reverts commit 4b7c615.
Pull Request resolved: pytorch#14040

Differential Revision: D13089531

Pulled By: yinghai

fbshipit-source-id: 2114b36111dab6f179c02921bbc9bd382ef461bf
gujinghui and others added 24 commits November 21, 2018 15:44
… reduce dim. (pytorch#12971)

Summary:
Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim.
Pull Request resolved: pytorch#12971

Reviewed By: bddppq

Differential Revision: D12850675

Pulled By: yinghai

fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019
Summary:
Pull Request resolved: pytorch#14210

We left `SparseLengthsWeightedSum` as benchmark is not testing it due to fp16 filler issue. It was flushed out by unit tests. Hence we add the support here.

Reviewed By: bddppq

Differential Revision: D13132320

fbshipit-source-id: b21c30c185c9e1fbf3980641bc3cdc39e85af2e1
Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script
Pull Request resolved: pytorch#14129

Differential Revision: D13123894

Pulled By: driazati

fbshipit-source-id: 8c1edf7c163217ec00eb653f95d196db3998613f
Summary:
This somehow is not cleaned up after the C++ migration. Unused and can be removed.
Pull Request resolved: pytorch#14208

Differential Revision: D13132492

Pulled By: teng-li

fbshipit-source-id: 0f05b6368174664ebb2560c037347c8eb45f7c38
Summary: Pull Request resolved: pytorch#14206

Reviewed By: dzhulgakov

Differential Revision: D13131318

fbshipit-source-id: 559b55b8d98cdf6b7d1d3e31237c5473edc5e462
Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.

2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node".

3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.

- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
Pull Request resolved: pytorch#14018

Differential Revision: D13144906

Pulled By: suo

fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a
…torch#14271)

Summary:
This covers the very edgy case when we run the same NCCL process group with multiple GPU combinations instead of the last GPU combination. We always keep track of what GPUs have been used previously in the NCCL process group and barrier() itself will synchronize on each GPU's NCCL stream.

Test covered as well. Tested on 8-GPU machine
Pull Request resolved: pytorch#14271

Differential Revision: D13164993

Pulled By: teng-li

fbshipit-source-id: 81e04352740ea50b5e943369e74cfcba40bb61c1
Reviewed By: yns88

fbshipit-source-id: 366c29d09bec53459e2a4890c7fe8d10f45ff5c3
Reviewed By: yns88

fbshipit-source-id: ee60b4dddf688608ef80043b1dc336d120a045d0
Summary:
Cuda headers include cuda version in form of major.minor. But when we do find_package(cuda). CUDA_VERSION variable includes patch number as well which fails following condition.

`
if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
`

**For example:**
I have cuda 10.0 installed. My nvcc output looks like this
`Cuda compilation tools, release 10.0, **V10.0.130**
`

If I compile my application with caffe2. It gives me following error:

```
CMake Error at /usr/share/cmake/Caffe2/public/cuda.cmake:59 (message):
  FindCUDA says CUDA version is (usually determined by nvcc), but the CUDA
  headers say the version is 10.0.  This often occurs when you set both
  CUDA_HOME and CUDA_NVCC_EXECUTABLE to non-standard locations, without also
  setting PATH to point to the correct nvcc.  Perhaps, try re-running this
  command again with PATH=/usr/local/cuda/bin:$PATH.  See above log messages
  for more diagnostics, and see
  pytorch#8092 for more details.
```

**In this case, it got failed because**
cuda_version_from_header = 10.0
CUDA_VERSION = 10.0.130 (Came from NVCC)

`if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
`

**Fix:**
We should compare header version with **major.minor format** which is given by CUDA_VERSION_STRING
Pull Request resolved: pytorch#14302

Differential Revision: D13166485

Pulled By: soumith

fbshipit-source-id: 1b74e756a76c4cc5aa09978f5850f763ed5469b6
Summary:
Pull Request resolved: pytorch#14301

This diff removes quantization utility functions copied to fbgemm

Reviewed By: Maratyszcza

Differential Revision: D13159299

fbshipit-source-id: a7f3cd2af0aa241a8578d532a70a157da70d9289
Reviewed By: yns88

fbshipit-source-id: 20976d595e68a08d746d8806fd0205d810656366
Summary: Pull Request resolved: pytorch#14309

Reviewed By: soumith

Differential Revision: D13166626

Pulled By: JoelMarcey

fbshipit-source-id: 4f11228d8b5da85cec222bf11282722a7319581b
Summary: Pull Request resolved: pytorch#13691

Reviewed By: ezyang

Differential Revision: D12937090

fbshipit-source-id: fe9d21d5f7ea4e78e7e38ac60db13814a9971ed9
Summary:
Pull Request resolved: pytorch#13692

This now lives in c10/util, not ATen/core anymore.

Reviewed By: ezyang

Differential Revision: D12937091

fbshipit-source-id: ea2d420a15e7941a38d0b4c75e20ca18437c73f8
Summary:
Pull Request resolved: pytorch#14021

I'm planning to move at::Scalar to c10, and there's a at::toString(Scalar) defined.
Unfortunately, we call it by specifying at::toString() instead of relying on ADL.
This diff changes that to prepare the actual move.

Reviewed By: ezyang

Differential Revision: D13015239

fbshipit-source-id: f2a09f43a96bc5ef20ec2c4c88f7790fd5a04870
…h#14232)

Summary:
1. Support `Optional[BroadcastingList1[int]]` like type annotation to accept a int or a list[int]
2. Convert gumbel_softmax, lp pooling weak functions and modules
Pull Request resolved: pytorch#14232

Differential Revision: D13164506

Pulled By: wanchaol

fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
)

Summary:
Pull Request resolved: pytorch#14214

This is to pick up the residual task of T36325466 to make sure that input/output binding of c2 Onnxifi op is positional.

Reviewed By: dzhulgakov

Differential Revision: D13134470

fbshipit-source-id: d1b916dade65c79133b86507cd54ea5166fa6810
Summary:
Pull Request resolved: pytorch#13168

We now have a "using namespace c10" in the at and caffe2 namespaces, we don't need the individual ones anymore

Reviewed By: ezyang

Differential Revision: D11669870

fbshipit-source-id: fc2bb1008e533906914188da4b6eb30e7db6acc1
Reviewed By: yns88

fbshipit-source-id: e92b0c24a56b588dcf30542692cb4bdc2d474825
…heckpointed dropout (pytorch#14253)

Summary:
This issue was noticed, and fix proposed, by raulpuric.

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward.  This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.

The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.**  The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.

Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive.  However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](pytorch#13070 (comment)), once merged, will make this option more or less free.

I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list).  Python 3 seems happy with it.
Edit:  It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification).  `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage.  I'm open to suggestions (a global flag perhaps)?

**Batchnorm may still be an issue, but that's a battle for another day.
Pull Request resolved: pytorch#14253

Differential Revision: D13166665

Pulled By: soumith

fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7
…14171)

Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.

This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.

The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: pytorch#14171

Differential Revision: D13166669

Pulled By: soumith

fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
Summary:
1. Fix execution failure when some of the paths are not defined
2. Users can now optionally override install dir by setting `CMAKE_INSTALL_PREFIX`
Pull Request resolved: pytorch#14218

Differential Revision: D13180350

Pulled By: soumith

fbshipit-source-id: 8c9680d1285dbf08b49380af1ebfa43ede99babc
Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline
Pull Request resolved: pytorch#14325

Differential Revision: D13183296

Pulled By: suo

fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298
zrphercule and others added 24 commits November 27, 2018 13:51
Summary:
In pytorch#14239 we fixed ONNX_ATEN.
In order to make sure its correctness in the future, we should add related test case.
We use torch.fmod() to test ONNX_ATEN.
Pull Request resolved: pytorch#14259

Differential Revision: D13204610

Pulled By: zrphercule

fbshipit-source-id: e4660c346e5edd201f1458b7d74d7dfac49b94c7
Summary:
Pull Request resolved: pytorch#14279

att

Reviewed By: smessmer

Differential Revision: D13144472

fbshipit-source-id: af4d920a3148c648d1a428a5bcd56da19ea8c38c
… environment (pytorch#14416)

Summary:
`call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x64` seems to change the working dir to `C:\Users\Administrator\source`, and we need to cd back to the PyTorch directory before running `git submodule update --init --recursive`
Pull Request resolved: pytorch#14416

Differential Revision: D13222269

Pulled By: yf225

fbshipit-source-id: a0eb3311fb11713b1bb8f52cd13e2c21d5ca9c7b
…hen fp16 is used for training

Summary: Pull Request resolved: pytorch#13768

Reviewed By: xianjiec

Differential Revision: D12996103

fbshipit-source-id: 5ca4cda4210f68ece2b5d6eced8cf52ee91fb36f
Summary:
This speeds-up "advanced" indexing (indexing a tensor by a tensor)
on CPU and GPU. There's still a bunch of work to do, including
speeding up indexing by a byte (boolean) mask and speeding up the derivative
calculation for advanced indexing.

Here's some speed comparisons to indexing on master using a little [benchmark script](https://gist.github.com/colesbury/c369db72aad594e5e032c8fda557d909) with 16 OpenMP threads and on a P100. The test cases are listed as (input shape -> output shape).

| Test case             | CPU (old vs. new)   | CUDA (old vs. new)     |
|-----------------------|---------------------|------------------------|
| 1024x1024 -> 512x1024 | 225 us vs. **57 us**  | 297 us vs. **47 us** |
| 1024x1024 -> 1024x512 | 208 us vs. **153 us** | 335 us vs. **54 us** |
| 50x50 -> 20000x50     | 617 us vs. **77 us**  | 239 us vs. **54 us** |
| 50x50 -> 50x20000     | 575 us vs. **236 us** | 262 us vs. **58 us** |
| 2x5x10 -> 10          | 65 us  vs. **18 us**  | 612 us vs. **93 us** |

See pytorch#11647
Pull Request resolved: pytorch#13420

Reviewed By: soumith

Differential Revision: D13088936

Pulled By: colesbury

fbshipit-source-id: 0a5c2ee9aa54e15f96d06692d1694c3b24b924e2
…pytorch#14430)

Summary:
pytorch#14431 tracks supporting this with CI
Pull Request resolved: pytorch#14430

Differential Revision: D13224079

Pulled By: anderspapitto

fbshipit-source-id: 47d7900d25910ed61585b93f9003acd1b2630a9f
Summary:
I'd like to NOT HIPify files that are not in a cuda/
directory, so hand-HIPify AccumulateType.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#14412

Differential Revision: D13221801

Pulled By: ezyang

fbshipit-source-id: d1927cfc956e50a6a5e67168ac0e1ce56ecd1e0b
Summary:
Stacked on pytorch#14176, review only the last commit.
* Print parameters to methods as self.weight rather than as extra inputs.
* Print entire set of methods out as a single string
* Update test code to test the module-at-a-time export/import
Pull Request resolved: pytorch#14378

Differential Revision: D13198463

Pulled By: zdevito

fbshipit-source-id: 3fab02e8239cfd6f40d6ab6399047bd02cf0a8c8
Summary: Pull Request resolved: pytorch#14427

Differential Revision: D13222381

Pulled By: wanchaol

fbshipit-source-id: d90d210a810e95bf0eb404f9c1c304f4e6a3f61e
…ytorch#14130)

Summary:
When using `setuptools` to build a Python extension, setuptools will automatically add an ABI suffix like `cpython-37m-x86_64-linux-gnu` to the shared library name when using Python 3. This is required for extensions meant to be imported as Python modules. When we use setuptools to build shared libraries not meant as Python modules, for example libraries that define and register TorchScript custom ops, having your library called `my_ops.cpython-37m-x86_64-linux-gnu.so` is a bit annoying compared to just `my_ops.so`, especially since you have to reference the library name when loading it with `torch.ops.load_library` in Python.

This PR fixes this by adding a `with_options` class method to the `torch.utils.cpp_extension.BuildExtension` which allows configuring the `BuildExtension`. In this case, the first option we add is `no_python_abi_suffix`, which we then use in `get_ext_filename` (override from `setuptools.build_ext`) to throw away the ABI suffix.

I've added a test `setup.py` in a `no_python_abi_suffix_test` folder.

Fixes pytorch#14188

t-vi fmassa soumith
Pull Request resolved: pytorch#14130

Differential Revision: D13216575

Pulled By: goldsborough

fbshipit-source-id: 67dc345c1278a1a4ee4ca907d848bc1fb4956cfa
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: pytorch#14392

Differential Revision: D13219698

Pulled By: eellison

fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
Summary: Pull Request resolved: pytorch#14440

Differential Revision: D13226354

Pulled By: zdevito

fbshipit-source-id: e4ed023eece8b5b670a4a27d24a8688907b36b90
Summary:
This PR allows to overload functions based on the value of a parameter (so long as it is a constant). See max_pool1d for an example usage.

This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.

Fixes pytorch#14081
Pull Request resolved: pytorch#14425

Differential Revision: D13222104

Pulled By: driazati

fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
Summary: Pull Request resolved: pytorch#14420

Differential Revision: D13220726

Pulled By: driazati

fbshipit-source-id: 6c08a0050075beafcc8ba413c9603b273870c70c
Summary: Pull Request resolved: pytorch#13874

Differential Revision: D13223669

Pulled By: nairbv

fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04
Summary: Pull Request resolved: pytorch#13915

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on pytorch#14379
Pull Request resolved: pytorch#14238

Differential Revision: D13192230

Pulled By: driazati

fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
…h#14452)

Summary:
Fixed: pytorch#14445

Also bumped up timeout to 30 seconds, since on 8-GPU machines, DDP test will take more than 15 seconds sometimes.

Tested on 8 GPU machines:
```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py --verbose
test_dist_broadcast_coalesced_gloo (__main__.DistributedDataParallelTest) ... ok
test_dist_broadcast_coalesced_nccl (__main__.DistributedDataParallelTest) ... skipped 'Test skipped due to known issues'
test_fp16 (__main__.DistributedDataParallelTest) ... ok
test_gloo_backend (__main__.DistributedDataParallelTest) ... ok
test_nccl_backend (__main__.DistributedDataParallelTest) ... ok
test_queue_reduction (__main__.DistributedDataParallelTest) ... ok
test_sync_params_no_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_params_with_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_reduction (__main__.DistributedDataParallelTest) ... ok
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allgather_basics (__main__.ProcessGroupGlooTest) ... ok
test_allgather_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_checks (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_gather_basics (__main__.ProcessGroupGlooTest) ... ok
test_gather_checks (__main__.ProcessGroupGlooTest) ... ok
test_reduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_reduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_scatter_basics (__main__.ProcessGroupGlooTest) ... ok
test_scatter_checks (__main__.ProcessGroupGlooTest) ... ok
test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... ok
test_timeout_kwarg (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_barrier (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousEnvTest) ... ok
test_nominal (__main__.RendezvousEnvTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_address_already_in_use (__main__.TCPStoreTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok

----------------------------------------------------------------------
Ran 46 tests in 162.980s

OK (skipped=1)
```
Pull Request resolved: pytorch#14452

Differential Revision: D13230652

Pulled By: teng-li

fbshipit-source-id: 88580fe55b3a4fbc7a499ca3b591958f11623bf8
Differential Revision:
D13192230

Original commit changeset: 36488960b6c9

fbshipit-source-id: 63b68bd909b9ef0548f52c986c84f549aecb8909
Summary:
The doc covers pretty much all we have had on distributed for PT1 stable release, tracked in pytorch#14080

Tested by previewing the sphinx generated webpages. All look good.
Pull Request resolved: pytorch#14444

Differential Revision: D13227675

Pulled By: teng-li

fbshipit-source-id: 752f00df096af38dd36e4a337ea2120ffea79f86
@jaliyae jaliyae closed this Nov 29, 2018
@jaliyae jaliyae deleted the jaliyae/samplers branch November 29, 2018 17:41
jaliyae pushed a commit that referenced this pull request Feb 22, 2019
Summary:
Currently there is a mismatch in naming between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes JIT model parameters loading to fail (pytorch/vision#728 (comment)):
```
terminate called after throwing an instance of 'c10::Error'
  what():  No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame pytorch#6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame pytorch#7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame pytorch#8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame pytorch#9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame pytorch#10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame pytorch#11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem.

This is a BC-breaking change, but it should be easy for end user to rename `running_variance` to `running_var` in their call sites.
Pull Request resolved: pytorch#17371

Reviewed By: goldsborough

Differential Revision: D14172775

Pulled By: yf225

fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
jaliyae pushed a commit that referenced this pull request Apr 22, 2019
Summary:
Tracing models which attempts to return this in-place value doesn't turn out well.

I haven't run any tests to confirm the results to be honest, but regardless of the outcome, the operation happens in-place, so it should work as before.

Sample output from traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame pytorch#6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame pytorch#7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame pytorch#8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame pytorch#9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame pytorch#10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame pytorch#39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame pytorch#40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame pytorch#41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame pytorch#42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame pytorch#43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame pytorch#44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame pytorch#45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame pytorch#46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame pytorch#47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame pytorch#48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame pytorch#49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
    input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```
Pull Request resolved: pytorch#18684

Differential Revision: D14782041

Pulled By: ezyang

fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d
jaliyae pushed a commit that referenced this pull request Jun 19, 2019
Summary:
We have encountered `std::bad_cast` error when running PyTorch binary built with cxx11 abi on CentOS7, stack trace:
```
#0  0x00007fec10160207 in raise () from /lib64/libc.so.6
#1  0x00007fec101618f8 in abort () from /lib64/libc.so.6
#2  0x00007fec015767d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007fec01574746 in ?? () from /lib64/libstdc++.so.6
#4  0x00007fec01574773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007fec01574993 in __cxa_throw () from /lib64/libstdc++.so.6
pytorch#6  0x00007fec015c94d2 in std::__throw_bad_cast() () from /lib64/libstdc++.so.6
pytorch#7  0x00007feb2ab3c2d7 in std::__cxx11::numpunct<char> const& std::use_facet<std::__cxx11::numpunct<char> >(std::locale const&) ()
   from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
pytorch#8  0x00007feb28643d62 in torch::jit::script::strtod_c(char const*, char**) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
```

We are suspecting this line will get compiled to gcc abi dependent symbol:
```
char decimal_point = std::use_facet<std::numpunct<char>>(std::locale()).decimal_point();
```
Pull Request resolved: pytorch#21293

Differential Revision: D15609910

Pulled By: bddppq

fbshipit-source-id: e247059729863868e4b36d6fec4fcbc36fbc4bb1
jaliyae pushed a commit that referenced this pull request Jun 19, 2019
Summary:
Turing GPUs (compute capability 7.5) require CUDA10 to work properly.
We've seen some issues for these GPUs using PyTorch binaries with CUDA9 or older:
[Discussion Board #1](https://discuss.pytorch.org/t/cudnn-status-execution-failed-error/38575)
[Discussion Board #2](https://discuss.pytorch.org/t/cublas-runtime-error-on-gpu-running-but-works-on-cpu/46545/6)

Tested on using CUDA9 with an RTX 2080Ti.
Pull Request resolved: pytorch#21468

Differential Revision: D15696170

Pulled By: ezyang

fbshipit-source-id: ed43f4e4948d3f97ec8e7d7952110cbbfeafef2a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.