
[ONNX] Add binary_cross_entropy_with_logits op to ONNX opset version 12 #49675

Merged · 232 commits · Jan 20, 2021
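
A minimal usage sketch of what this PR enables, assuming a build that includes the change: exporting `binary_cross_entropy_with_logits` at ONNX opset 12 (inputs are arbitrary example values).

```
import torch
import torch.nn.functional as F

class Model(torch.nn.Module):
    def forward(self, logits, target):
        return F.binary_cross_entropy_with_logits(logits, target)

# Export at opset 12, where this PR adds the symbolic for the op
torch.onnx.export(Model(), (torch.randn(4), torch.rand(4)),
                  "bce_logits.onnx", opset_version=12)
```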

Commits on Dec 17, 2020

  1. 170908b
  2. 203e181

Commits on Dec 22, 2020

  1. 29dd23f
  2. f7c63eb

Commits on Dec 23, 2020

  1. [te] Fix bugs with shift operators (pytorch#49396)

    Summary:
    Pull Request resolved: pytorch#49396
    
    Pull Request resolved: pytorch#49271
    
    Two things:
    
    1. These throw exceptions in their constructor, which causes a segfault (*), so
       move the exceptions to ::make.
    2. They technically support FP types but the rules are complicated so let's not
       bother.
    
    (*) The reason for the segfault: all Exprs including these inherit from
    KernelScopedObject, whose constructor adds the object to a list for destruction
    at the end of the containing KernelArena's lifetime.  But if the derived-class
    constructor throws, the object is deleted even though it's still in the
    KernelArena's list.  So when the KernelArena is itself deleted, it double-frees
    the pointer and dies.  I've also fixed And, Or, and Xor in this diff.
    ghstack-source-id: 118594998
    
    Test Plan: `buck test //caffe2/test:jit`
    
    Reviewed By: bwasti
    
    Differential Revision: D25512052
    
    fbshipit-source-id: 42670b3be0cc1600dc5cda6811f7f270a2c88bba
    bertmaher authored and hwangdeyu committed Dec 23, 2020
    Commit: 086fcf6
  2. [static runtime] refine fusion group (pytorch#49340)

    Summary:
    Pull Request resolved: pytorch#49340
    
    This refines the fusion group to include only certain types of operations. We cannot safely handle "canRunNatively" types, and the memonger pass causes regressions on some internal models, so it was disabled (to be revisited with proper memory optimization once Tensor pools are implemented)
    
    Test Plan:
    ```
    buck test mode/no-gpu caffe2/test:static_runtime
    buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
    ```
    
    Reviewed By: ZolotukhinM
    
    Differential Revision: D25520105
    
    fbshipit-source-id: add61d103e4f8b4615f5402e760893ef759a60a9
    bwasti authored and hwangdeyu committed Dec 23, 2020
    Commit: 43aa3be
  3. [JIT] Support multiple outputs in subgraph matcher. (pytorch#48992)

    Summary: Pull Request resolved: pytorch#48992
    
    Differential Revision: D25388100
    
    Test Plan: Imported from OSS
    
    Reviewed By: heitorschueroff
    
    Pulled By: ZolotukhinM
    
    fbshipit-source-id: d95713af2220cf4f99ac92f59f8e5b902f2f3822
    Mikhail Zolotukhin authored and hwangdeyu committed Dec 23, 2020
    Commit: 837ac43
  4. [numpy] torch.{all/any} : output dtype is always bool (pytorch#47878)

    Summary:
    BC-breaking note:
    
    This PR changes the behavior of the any and all functions to always return a bool tensor. Previously these functions were only defined on bool and uint8 tensors, and when called on uint8 tensors they would also return a uint8 tensor. (When called on a bool tensor they would return a bool tensor.)
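
    A minimal illustration of the new behavior, on a build that includes this PR:

    ```
    import torch

    u8 = torch.tensor([0, 1, 2], dtype=torch.uint8)
    print(torch.any(u8).dtype)  # torch.bool (previously torch.uint8)
    print(torch.all(u8).dtype)  # torch.bool
    ```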
    
    PR summary:
    
    pytorch#44790 (comment)
    
    Fixes items 2 and 3 from the linked comment
    
    Also Fixes pytorch#48352
    
    Changes
    * Output dtype is always `bool` (consistent with numpy). **BC-breaking** (previously matched the input dtype)
    * Uses vectorized version for all dtypes on CPU
    * Enables test for complex
    * Update doc for `torch.all` and `torch.any`
    
    TODO
    * [x] Update docs
    * [x] Benchmark
    * [x] Raise issue on XLA
    
    Pull Request resolved: pytorch#47878
    
    Reviewed By: H-Huang
    
    Differential Revision: D25421263
    
    Pulled By: mruberry
    
    fbshipit-source-id: c6c681ef94004d2bcc787be61a72aa059b333e69
    kshitij12345 authored and hwangdeyu committed Dec 23, 2020
    Commit: 4bdc202
  5. Replace THError() check in THCTensorMathReduce.cu with C10_CUDA_KERNEL_LAUNCH_CHECK() (pytorch#49424)
    
    Summary:
    Pull Request resolved: pytorch#49424
    
    As per conversation in this [comment](https://www.internalfb.com/intern/diff/D25541113/?dest_fbid=393026838623691&transaction_id=3818008671564312) on D25541113 (pytorch@e2510a0), although THError does more than just log errors associated with CUDA kernel launches, we're going to go ahead and replace it with C10_CUDA_KERNEL_LAUNCH_CHECK, so as to be consistent throughout the code base.
    Standardization FTW.
    
    This commit is purposefully sent in as a single file change so it can be easily reverted if it introduces a regression.
    
    Test Plan:
    Checked that the code still builds with
    ```
    buck build //caffe2/aten:ATen-cu
    ```
    Also ran basic aten tests
    ```
    buck test //caffe2/aten:atest
    ```
    
    Reviewed By: r-barnes
    
    Differential Revision: D25567863
    
    fbshipit-source-id: 1093bfe2b6ca6b9a3bfb79dcdc5d713f6025eb77
    Amogh Akshintala authored and hwangdeyu committed Dec 23, 2020
    Commit: 6428593
  6. Fix include files for out-of-tree compilation (pytorch#48827)

    Summary:
    Signed-off-by: caozhong <zhong.z.cao@intel.com>
    
    Pull Request resolved: pytorch#48827
    
    Reviewed By: agolynski
    
    Differential Revision: D25375988
    
    Pulled By: ailzhang
    
    fbshipit-source-id: a8d5ab4572d991d6d96dfe758011517651ff0a6b
    CaoZhongZ authored and hwangdeyu committed Dec 23, 2020
    Commit: 3e6bdd1
  7. Add flag torch_jit_disable_warning_prints to allow disabling all warnings.warn (pytorch#49313)
    
    Summary:
    Adding a flag torch_jit_disable_warning_prints to optimize interpreter performance by suppressing a (potentially large) number of warnings.warn prints.
    
    This works around TorchScript's warning-behavior mismatch with Python. Python by default triggers a warning once per location, but TorchScript doesn't support that, so the same warning triggers and prints once per inference run, hurting performance.
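
    For reference, the stock Python behavior the flag works around (pure Python, no PyTorch needed):

    ```
    import warnings

    def f():
        warnings.warn("deprecated")

    for _ in range(3):
        f()
    # Python's default filter prints the warning once per call site;
    # TorchScript lacked that dedup, so the warning printed on every run.
    ```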
    
    Pull Request resolved: pytorch#49313
    
    Reviewed By: SplitInfinity
    
    Differential Revision: D25534274
    
    Pulled By: gmagogsfm
    
    fbshipit-source-id: eaeb57a335c3e6c7eb259671645db05d781e80a2
    gmagogsfm authored and hwangdeyu committed Dec 23, 2020
    Commit: 9058e5f
  8. [DPER] Introduce barrier operation to force synchronization of threads in async execution (pytorch#49322)
    
    Summary:
    Pull Request resolved: pytorch#49322
    
    In some cases async execution might lose dependencies (alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter can happen in ModelParallel training, where a copy can get lower priority compared to the rest of the execution on the given GPU, which will cause other GPUs to starve.
    
    This operator allows us to address these issues by introducing extra explicit dependencies between ops.
    
    Test Plan:
    Unit-test/
    E2E testing in the future diffs.
    
    Reviewed By: xianjiec
    
    Differential Revision: D24933471
    
    fbshipit-source-id: 1668994c7856d73926cde022378a99e1e8db3567
    kennyhorror authored and hwangdeyu committed Dec 23, 2020
    Commit: f360b23
  9. [FX] Rename Node._uses and refactor Node.all_input_nodes (pytorch#49415)

    Summary: Pull Request resolved: pytorch#49415
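
    A quick sketch of the public accessor involved, using torch.fx's public tracing API:

    ```
    import torch.fx

    def add(x, y):
        return x + y

    gm = torch.fx.symbolic_trace(add)
    for node in gm.graph.nodes:
        # all_input_nodes lists the Nodes this node consumes
        print(node.op, node.name, node.all_input_nodes)
    ```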
    
    Test Plan: Imported from OSS
    
    Reviewed By: zdevito
    
    Differential Revision: D25565341
    
    Pulled By: jamesr66a
    
    fbshipit-source-id: 2290ab62572632788809ba16319578bf0c0260ee
    James Reed authored and hwangdeyu committed Dec 23, 2020
    Commit: 4c667a1
  10. [PyTorch] Use plain old function pointer for RecordFunctionCallback (reapply) (pytorch#49408)
    
    Summary:
    Pull Request resolved: pytorch#49408
    
    Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
    ghstack-source-id: 118665808
    
    Test Plan:
    Wait for GitHub CI since we had C++14-specific issues with
    this one in previous PR pytorch#48629
    
    Reviewed By: malfet
    
    Differential Revision: D25563207
    
    fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
    swolchok authored and hwangdeyu committed Dec 23, 2020
    Commit: 1c9a0bf
  11. [CMake] Use libtorch_cuda list defined in bzl file (pytorch#49429)

    Summary:
    Since NCCL is an optional CUDA dependency, remove nccl.cpp from the core filelist
    
    Pull Request resolved: pytorch#49429
    
    Reviewed By: nikithamalgifb
    
    Differential Revision: D25569883
    
    Pulled By: malfet
    
    fbshipit-source-id: 61371a4c6b0438e4e0a7f094975b9a9f9ffa4032
    malfet authored and hwangdeyu committed Dec 23, 2020
    Commit: 4558c13
  12. update breathe (pytorch#49407)

    Summary:
    Fixes pytorch#47462, but not completely.
    
    Update breathe to the latest version to get fixes for the "Unable to resolve..." issues. There are still some build errors, but far fewer than before.
    
    Pull Request resolved: pytorch#49407
    
    Reviewed By: izdeby
    
    Differential Revision: D25562163
    
    Pulled By: glaringlee
    
    fbshipit-source-id: 91bfd9e9ac70723816309f489022d72853f5fdc5
    mattip authored and hwangdeyu committed Dec 23, 2020
    Commit: 6275612
  13. [StaticRuntime] Permute_out (pytorch#49447)

    Summary:
    Pull Request resolved: pytorch#49447
    
    Adding an out variant for `permute`. It's better than fixing the copy inside contiguous because 1) we can leverage the c2 math library, and 2) contiguous creates a tensor inside the function which isn't managed by the MemoryPlanner in StaticRuntime
    
    Test Plan:
    Benchmark:
    ```
    After:
    I1214 12:35:32.218775 991920 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0902339. Iters per second: 11082.3
    
    Before:
    I1214 12:35:43.368770 992620 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0961521. Iters per second: 10400.2
    ```
    
    Reviewed By: yinghai
    
    Differential Revision: D25541666
    
    fbshipit-source-id: 013ed0d4080cd01de4d3e1b031ab51e5032e6651
    Hao Lu authored and hwangdeyu committed Dec 23, 2020
    Commit: 7439f10
  14. fix optimizer.pyi typo 'statue'->'state' (pytorch#49388)

    Summary: Pull Request resolved: pytorch#49388
    
    Test Plan: Imported from OSS
    
    Reviewed By: zou3519
    
    Differential Revision: D25553672
    
    Pulled By: glaringlee
    
    fbshipit-source-id: e9f2233bd678a90768844af2d8d5e2994d59e304
    lixinyu authored and hwangdeyu committed Dec 23, 2020
    Commit: edea937
  15. [StaticRuntime] Fusion pass for ClipRanges/GatherRanges/LengthsToOffsets (pytorch#49113)
    
    Summary: Pull Request resolved: pytorch#49113
    
    Reviewed By: ajyu
    
    Differential Revision: D25388512
    
    fbshipit-source-id: 3daa5b9387a3a10b6c220688df06540c4d844aea
    Hao Lu authored and hwangdeyu committed Dec 23, 2020
    Commit: 1f1c0f5
  16. quantized tensor: add preliminary support for advanced indexing, try 2 (pytorch#49346)
    
    Summary:
    Pull Request resolved: pytorch#49346
    
    This is a less ambitious redo of
    pytorch#49129.
    
    We make the
    
    ```
    xq_slice = xq[:, [0], :, :]
    ```
    
    indexing syntax work if `xq` is a quantized Tensor.  For now, we are
    making the code not crash, with an inefficient `dq -> index -> q`
    implementation.  A future PR can optimize performance by removing
    the unnecessary memory copies (which will require some non-trivial
    changes to TensorIterator).
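
    A small usage sketch, assuming a build with this change:

    ```
    import torch

    x = torch.randn(1, 3, 4, 4)
    xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
    xq_slice = xq[:, [0], :, :]  # advanced indexing on a quantized tensor
    print(xq_slice.shape, xq_slice.is_quantized)  # torch.Size([1, 1, 4, 4]) True
    ```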
    
    Test Plan:
    ```
    python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25539365
    
    fbshipit-source-id: 98485875aaaf5743e1a940e170258057691be4fa
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    Commit: b1547e4
  17. Unescape string in RPC error message (pytorch#49373)

    Summary:
    Pull Request resolved: pytorch#49373
    
    Unescaping the string in RPC error message to provide better error msg
    
    Test Plan: CI
    
    Reviewed By: xush6528
    
    Differential Revision: D25511730
    
    fbshipit-source-id: 054f46d5ffbcb1350012362a023fafb1fe57fca1
    rohan-varma authored and hwangdeyu committed Dec 23, 2020
    Commit: 28a5455
  18. [StaticRuntime][ATen] Add out variant for narrow_copy (pytorch#49449)

    Summary:
    Pull Request resolved: pytorch#49449
    
    Similar to permute_out, add the out variant of `aten::narrow` (slice in c2) which does an actual copy. `aten::narrow` creates a view; however, a copy is incurred when we call `input.contiguous` in the ops that follow `aten::narrow`, in `concat_add_mul_replacenan_clip`, `casted_batch_one_hot_lengths`, and `batch_box_cox`.
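
    A sketch of the view-vs-copy behavior described, using the public Python API:

    ```
    import torch

    x = torch.randn(4, 6)
    v = x.narrow(1, 0, 3)                # narrow returns a view, no copy
    print(v.data_ptr() == x.data_ptr())  # True: same storage
    c = v.contiguous()                   # the copy is incurred here
    print(c.data_ptr() == x.data_ptr())  # False
    ```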
    
    
    Test Plan:
    Unit test:
    
    ```
    buck test //caffe2/aten:native_test
    ```
    Benchmark with the adindexer model:
    ```
    bs = 1 is neutral
    
    Before:
    I1214 21:32:51.919239 3285258 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0886948. Iters per second: 11274.6
    After:
    I1214 21:32:52.492352 3285277 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0888019. Iters per second: 11261
    
    bs = 20 shows more gains probably because the tensors are bigger and therefore the cost of copying is higher
    
    Before:
    I1214 21:20:19.702445 3227229 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.527563. Iters per second: 1895.51
    After:
    I1214 21:20:20.370173 3227307 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.508734. Iters per second: 1965.67
    ```
    
    Reviewed By: bwasti
    
    Differential Revision: D25554109
    
    fbshipit-source-id: 6bae62e6ce3456ff71559b635cc012fdcd1fdd0e
    Hao Lu authored and hwangdeyu committed Dec 23, 2020
    Commit: bc97e02
  19. Revert D25554109: [StaticRuntime][ATen] Add out variant for narrow_copy

    Test Plan: revert-hammer
    
    Differential Revision:
    D25554109 (pytorch@ed04b71)
    
    Original commit changeset: 6bae62e6ce34
    
    fbshipit-source-id: bfa038e150166d0116bcae8f7a6415d98d4146de
    Hao Lu authored and hwangdeyu committed Dec 23, 2020
    Commit: 1479e05
  20. Making ops c10 full: optional out arguments (pytorch#49083)

    Summary:
    Pull Request resolved: pytorch#49083
    
    We have some (but very few) ops that take optional out arguments `Tensor(a!)? out`.
    This PR makes them non-optional mandatory arguments and enables c10-fullness for them.
    There is only a very small number of ops affected by this.
    
    Putting this up for discussion.
    
    Alternatives considered:
    If we keep them optional, we run into lots of issues in the dispatcher. We have to decide what the dispatcher calling convention for this argument type should be.
    1) If we keep passing them in as `Tensor&` arguments and return them as `tuple<Tensor&, Tensor&, Tensor&>`, so basically same as currently, then the schema inference check will say "Your kernel function got inferred to have a `Tensor` argument but your native_functions.yaml declaration says `Tensor?`. This is a mismatch, you made an error". We could potentially disable that check, but that would open the door for real mistakes to not be reported anymore in the future. This sounds bad.
    2) If we change them to a type that schema inference could differentiate from `Tensor`, say we pass them in as `const optional<Tensor>&` and return them as `tuple<const optional<Tensor>&, const optional<Tensor>&, const optional<Tensor>&>`, then our boxing logic fails because it can't recognize those as out overloads anymore and shortcut the return value as it is doing right now. We might be able to rewrite the boxing logic, but that could be difficult and could easily develop into a rabbit hole of having to clean up `Tensor&` references throughout the system where we use them.
    
    Furthermore, having optional out arguments in C++ doesn't really make sense: the C++ API puts them at the front of the argument list, so you can't omit them anyway when calling an op.
    You would be able to omit them when calling from Python with out kwargs, but it's not clear we want that discrepancy between the C++ and Python APIs.
    ghstack-source-id: 118660075
    
    Test Plan: waitforsandcastle
    
    Reviewed By: ezyang
    
    Differential Revision: D25422197
    
    fbshipit-source-id: 3cb25c5a3d93f9eb960d70ca014bae485be9f058
    smessmer authored and hwangdeyu committed Dec 23, 2020
    Commit: 00e3716
  21. Making ops c10-full: optional lists (pytorch#49088)

    Summary:
    Pull Request resolved: pytorch#49088
    
    We had special case logic to support `int[]?` and `double[]?` but nothing for `DimnameList[]?`.
    This PR generalizes the logic to support optional lists so it should now work with all types.
    It also enables c10-fullness for ops that were blocked by this.
    
    Note that using these arguments in a signature was always and still is expensive because the whole list needs to be copied.
    We should probably consider alternatives in the future, for example using `torch::List` instead of `ArrayRef`, which could work without copying the list.
    ghstack-source-id: 118660071
    
    Test Plan: waitforsandcastle
    
    Reviewed By: ezyang
    
    Differential Revision: D25423901
    
    fbshipit-source-id: dec58dc29f3bb4cbd89e2b95c42da204a9da2e0a
    smessmer authored and hwangdeyu committed Dec 23, 2020
    Commit: bdfa87e
  22. [PyTorch] Avoid move-constructing a List in listConstruct (pytorch#49355)
    
    Summary:
    Pull Request resolved: pytorch#49355
    
    List's move ctor is a little bit more expensive than you might expect, but we can easily avoid it.
    ghstack-source-id: 118624596
    
    Test Plan: Roughly 1% improvement on internal benchmark.
    
    Reviewed By: hlu1
    
    Differential Revision: D25542190
    
    fbshipit-source-id: 08532642c7d1f1604e16c8ebefd1ed3e56f7c919
    swolchok authored and hwangdeyu committed Dec 23, 2020
    Commit: 076d62f
  23. Enhanced generators with grad-mode decorators (pytorch#49017)

    Summary:
    This PR addresses the feature request outlined in pytorch#48713 for two-way communication with enhanced generators from [pep-342](https://www.python.org/dev/peps/pep-0342/).
    
    Briefly, the logic of the patch resembles `yield from` [pep-380](https://www.python.org/dev/peps/pep-0380/), which cannot be used, since the generator **must be interacted with from within the grad-mode context**, while yields from the decorator **must take place outside of the context**. Hence any interaction with the wrapped generator, be it via [.send](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.send), [.throw](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.throw), and even [.close](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.close) must be wrapped by a `with` clause. The patch is compatible with `for i in gen: pass` and `next(gen)` use cases and allows two-way communication with the generator via `.send <-> yield` points.
    
    ### Logic
    At lines [L37-L38](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L37-L38) we (the decorator) **start the wrapped generator** (coroutine) by issuing `None` into it (equivalently, we can use `next(gen)` here). Then we **dispatch responses of the generator** to our ultimate caller and **relay the latter's requests** into the generator in the loop on lines [L39-L52](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L39-L52).
    
    We yield the most recent response on [L40-L41](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L40-L41), at which point we become **paused**, waiting for the next ultimate caller's interaction with us. If the caller **sends us a request**, then we become unpaused and move to [L51-L52](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L51-L52) and **forward it into the generator**, at which point we pause, waiting for its response. The response might be a value, an exception or a `StopIteration`. In the case of an exception from the generator, we let it **bubble up** from the immediately surrounding [except clause](https://docs.python.org/3/reference/compound_stmts.html#the-try-statement) to the ultimate caller through the [outer try-except](https://github.com/ivannz/pytorch/blob/2dc287bba87fa6f05c49446c0239ffdcdb1e896e/torch/autograd/grad_mode.py#L36-L54). In the case of a `StopIteration`, we **take its payload and propagate it** to the caller via [return](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L54). In the case of a value, the flow and the loop continue.
    
    The caller **throwing an exception at us** is handled much like a proper request, except for the exception playing the role of the request. In this case we **forward it into the generator** on lines [L47-L49](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L47-L49) and await its response. We explicitly **advance** the traceback one frame up, in order to indicate the **source of the exception within the generator**.
    
    Finally the `GeneratorExit` is handled on lines [L42-L45](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L42-L45) and closes the generator.
    
    Updates: clarified exception propagation
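
    A usage sketch, assuming a build with this patch (`torch.no_grad` is one of the grad-mode decorators covered):

    ```
    import torch

    @torch.no_grad()
    def gen():
        # everything between yields runs with grad disabled
        received = yield torch.is_grad_enabled()
        yield received

    g = gen()
    print(next(g))         # False: the body runs under no_grad
    print(g.send("ping"))  # 'ping': two-way .send <-> yield works
    ```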
    
    Pull Request resolved: pytorch#49017
    
    Reviewed By: izdeby
    
    Differential Revision: D25567796
    
    Pulled By: albanD
    
    fbshipit-source-id: 801577cccfcb2b5e13a08e77faf407881343b7b0
    ivannz authored and hwangdeyu committed Dec 23, 2020
    Commit: 197266d
  24. webdataset prototype - ListDirFilesIterableDataset (pytorch#48944)

    Summary:
    Pull Request resolved: pytorch#48944
    
    This is a stacked PR for the webdataset prototype. I am trying to make each entry in the stack a separate dataset.
    To keep the implementation simple, each dataset will only support basic functionality (a hypothetical sketch follows the checklist below).
    
    - [x] ListDirFilesDataset
    - [x] LoadFilesFromDiskIterableDataset
    - [x] ReadFilesFromTarIterableDataset
    - [x] ReadFilesFromZipIterableDataset
    - [x] RoutedDecoderIterableDataset
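
    A hypothetical minimal sketch of the first dataset in the checklist above; the class name and constructor are illustrative, not the actual prototype API:

    ```
    import os
    from torch.utils.data import IterableDataset

    class ListDirFiles(IterableDataset):
        def __init__(self, root):
            self.root = root

        def __iter__(self):
            # yield the path of every file under root
            for name in sorted(os.listdir(self.root)):
                yield os.path.join(self.root, name)
    ```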
    
    Test Plan: Imported from OSS
    
    Reviewed By: izdeby
    
    Differential Revision: D25541277
    
    Pulled By: glaringlee
    
    fbshipit-source-id: 9e738f6973493f6be1d5cc1feb7a91513fa5807c
    lixinyu authored and hwangdeyu committed Dec 23, 2020
    Commit: 5165093
  25. webdataset prototype - LoadFilesFromDiskIterableDataset (pytorch#48955)

    Summary: Pull Request resolved: pytorch#48955
    
    Test Plan: Imported from OSS
    
    Reviewed By: izdeby
    
    Differential Revision: D25541393
    
    Pulled By: glaringlee
    
    fbshipit-source-id: dea6ad64a7ba40abe45612d99f078b14d1da8bbf
    lixinyu authored and hwangdeyu committed Dec 23, 2020
    Commit: 9745362
  26. CUDA BFloat embedding (pytorch#44848)

    Summary: Pull Request resolved: pytorch#44848
    
    Reviewed By: izdeby
    
    Differential Revision: D25574204
    
    Pulled By: ngimel
    
    fbshipit-source-id: b35f7253a6ad2b83f7b6b06862a5ab77295373e0
    zasdfgbnm authored and hwangdeyu committed Dec 23, 2020
    Commit: bf3d1b4
  27. Instantiate PackedConvWeight to avoid linking error (pytorch#49442)

    Summary:
    Pull Request resolved: pytorch#49442
    
    When moving ATen/native to the app level, symbols from native/quantized may sit in a target away from some of their call sites. As a result, there are linking errors from missing symbols for instantiations of PackedConvWeight::prepack. The solution is to instantiate PackedConvWeight in the same compilation unit. It's similar to D24941989 (pytorch@fe6bb2d).
    ghstack-source-id: 118676374
    
    Test Plan: CI
    
    Reviewed By: dhruvbird
    
    Differential Revision: D25576703
    
    fbshipit-source-id: d6e3d11d51d8172ab8487ce44ec8c042889f0f11
    iseeyuan authored and hwangdeyu committed Dec 23, 2020
    Commit: a213e48
  28. .circleci: downgrade conda-package-handling to 1.6.0 (pytorch#49434)

    Summary:
    Pull Request resolved: pytorch#49434
    
    There was a bug introduced in conda-package-handling >= 1.6.1 that makes archives
    above a certain size fail when attempting to extract;
    see: conda/conda-package-handling#71
    
    coincides with pytorch/builder#611
    
    Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
    
    Test Plan: Imported from OSS
    
    Reviewed By: xuzhao9, janeyx99, samestep
    
    Differential Revision: D25573390
    
    Pulled By: seemethere
    
    fbshipit-source-id: 82173804f1b30da6e4b401c4949e2ee52065e149
    seemethere authored and hwangdeyu committed Dec 23, 2020
    Commit: d73c1f4
  29. [Docs] Updating init_process_group docs to indicate correct rank range (pytorch#49131)
    
    Summary:
    Pull Request resolved: pytorch#49131
    
    Users frequently assume the correct range of ranks is 1 ...
    `world_size`. This PR updates the docs to indicate that the correct rank range
    users should specify is 0 ... `world_size` - 1.
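
    A minimal sketch of the documented range (world_size=1 so it runs standalone; backend and address are example values):

    ```
    import torch.distributed as dist

    # rank must be in [0, world_size - 1]
    dist.init_process_group(backend="gloo",
                            init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)
    print(dist.get_rank(), dist.get_world_size())  # 0 1
    ```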
    
    Test Plan: Rendering and Building Docs
    
    Reviewed By: mrshenli
    
    Differential Revision: D25410532
    
    fbshipit-source-id: fe0f17a4369b533dc98543204a38b8558e68497a
    osalpekar authored and hwangdeyu committed Dec 23, 2020
    Commit: 98c4a4d
  30. [c10d Store] Store Python Docs Fixes (pytorch#49130)

    Summary:
    Pull Request resolved: pytorch#49130
    
    The Python Store API docs had some typos where boolean values were
    lowercase, which is incorrect Python syntax. This diff fixes those typos.
    
    Test Plan: Built and Rendered Docs
    
    Reviewed By: mrshenli
    
    Differential Revision: D25411492
    
    fbshipit-source-id: fdbf1e6b8f81e9589e638286946cad68eb7c9252
    osalpekar authored and hwangdeyu committed Dec 23, 2020
    Commit: e9c93eb
  31. Add sinc operator (pytorch#48740)

    Summary:
    Implements the sinc operator.
    See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html
    
    ![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png)
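
    A usage sketch, assuming a build where the op is present (note it is reverted again later in this commit list):

    ```
    import torch

    x = torch.tensor([0.0, 0.5, 1.0])
    # sinc(x) = sin(pi*x) / (pi*x), with sinc(0) defined as 1
    print(torch.sinc(x))  # ~ tensor([1.0000, 0.6366, 0.0000])
    ```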
    
    Pull Request resolved: pytorch#48740
    
    Reviewed By: izdeby
    
    Differential Revision: D25564477
    
    Pulled By: soulitzer
    
    fbshipit-source-id: 13f36a2b84dadfb4fd1442a2a40a3a3246cbaecb
    soulitzer authored and hwangdeyu committed Dec 23, 2020
    Commit: 6cb4910
  32. Revert "Revert D24923679: Fixed einsum compatibility/performance issu…

    …es (pytorch#46398)" (pytorch#49189)
    
    Summary:
    Pull Request resolved: pytorch#49189
    
    This reverts commit d307601 and fixes the bug with diagonals and ellipsis combined.
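
    A sketch of the case being fixed, via the public einsum API:

    ```
    import torch

    x = torch.randn(3, 3, 4)
    # diagonal subscripts ('ii') combined with an ellipsis
    y = torch.einsum('ii...->i...', x)
    print(y.shape)  # torch.Size([3, 4])
    ```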
    
    Test Plan: Imported from OSS
    
    Reviewed By: glaringlee
    
    Differential Revision: D25540722
    
    Pulled By: heitorschueroff
    
    fbshipit-source-id: 86d0c9a7dcfda600b546457dad102af2ff33e353
    heitorschueroff authored and hwangdeyu committed Dec 23, 2020
    Commit: 7b4218c
  33. [caffe2][autograd] Avoid extensive -Wunused-variable warnings on _any_requires_grad (pytorch#49167)
    
    Summary:
    Pull Request resolved: pytorch#49167
    
    Building with clang and a fair warning level can result in hundreds of lines of compiler output of the form:
    ```
    caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2279,8): warning: unused variable '_any_requires_grad' [-Wunused-variable]
       auto _any_requires_grad = compute_requires_grad( self );
            ^
    caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2461,8): warning: unused variable '_any_requires_grad' [-Wunused-variable]
       auto _any_requires_grad = compute_requires_grad( grad_output, self );
            ^
    caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2677,8): warning: unused variable '_any_requires_grad' [-Wunused-variable]
       auto _any_requires_grad = compute_requires_grad( self );
            ^
    ...
    ```
    This happens when requires_derivative == False. Let's mark `_any_requires_grad` as potentially unused. If this were C++17 we would use `[[maybe_unused]]` but to retain compatibility with C++11 we just mark it with `(void)`.
    
    Test Plan: CI + locally built
    
    Reviewed By: ezyang
    
    Differential Revision: D25421548
    
    fbshipit-source-id: c56279a184b1c616e8717a19ee8fad60f36f37d1
    jdonald authored and hwangdeyu committed Dec 23, 2020
    Commit: 6a56da9
  34. Revert D25421263: [pytorch][PR] [numpy] torch.{all/any} : output dtype is always bool
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25421263 (pytorch@c508e5b)
    
    Original commit changeset: c6c681ef9400
    
    fbshipit-source-id: 4c0c9acf42b06a3ed0af8f757ea4512ca35b6c59
    ngimel authored and hwangdeyu committed Dec 23, 2020
    Commit: 5125131
  35. Reland "Add test for empty tensors for batch matmuls" (pytorch#48797)

    Summary:
    This reverts commit c7746ad.
    
    Fixes #{issue number}
    
    Pull Request resolved: pytorch#48797
    
    Reviewed By: mruberry
    
    Differential Revision: D25575264
    
    Pulled By: ngimel
    
    fbshipit-source-id: c7f3b384db833d727bb5bd8a51f1493a13016d09
    zasdfgbnm authored and hwangdeyu committed Dec 23, 2020
    Commit: c7ce84b
  36. Adding support for CuDNN-based LSTM with projections (pytorch#47725)

    Summary:
    Fixes pytorch#46213
    
    I didn't yet update the documentation, will add those change soon. A few other things that I didn't do, but want to clarify if I maybe should.
    
    1. I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes.
    2. I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
    3. I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places.
    4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that?
    5. Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
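
    A usage sketch of the projection support described above, assuming a build with this change:

    ```
    import torch

    # proj_size > 0 enables output projections (CuDNN-backed on GPU)
    lstm = torch.nn.LSTM(input_size=10, hidden_size=20, proj_size=5,
                         batch_first=True)
    out, (h, c) = lstm(torch.randn(2, 3, 10))
    print(out.shape)         # torch.Size([2, 3, 5]): projected to proj_size
    print(h.shape, c.shape)  # torch.Size([1, 2, 5]) torch.Size([1, 2, 20])
    ```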
    
    Pull Request resolved: pytorch#47725
    
    Reviewed By: zou3519
    
    Differential Revision: D25449794
    
    Pulled By: ngimel
    
    fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
    Igor Gitman authored and hwangdeyu committed Dec 23, 2020
    Commit: 1352101
  37. Move inplace_is_vmap_compatible to BatchedTensorImpl.h (pytorch#49118)

    Summary:
    Pull Request resolved: pytorch#49118
    
    I need this in the next stack up. It seems useful to have as a helper
    function.
    
    Test Plan: - run tests
    
    Reviewed By: izdeby
    
    Differential Revision: D25563546
    
    Pulled By: zou3519
    
    fbshipit-source-id: a4031fdc4b2373cc230ba3c66738d91dcade96e2
    zou3519 authored and hwangdeyu committed Dec 23, 2020
    Commit: 0991d63
  38. Update accumulate_grad to support vmap (pytorch#49119)

    Summary:
    Pull Request resolved: pytorch#49119
    
    I don't know how the accumulate_grad code gets hit via calling
    autograd.grad, so I went through all places in accumulate_grad
    that are definitely impossible to vmap through and changed them.
    
    To support this:
    - I added vmap support for Tensor::strides(). It returns the strides
    that correspond to the public dimensions of the tensor (not the ones
    being vmapped over).
    - Changed an instance of empty_strided to new_empty_strided.
    - Replaced an in-place operation in accumulate_grad.h
    
    Test Plan:
    - added a test for calling strides() inside of vmap
    - added tests that exercise all of the accumulate_grad code path.
    NB: I don't know why these tests exercise the code paths, but I've
    verified that they do via gdb.
    
    Suggestions for some saner test cases are very welcome.
    
    Reviewed By: izdeby
    
    Differential Revision: D25563543
    
    Pulled By: zou3519
    
    fbshipit-source-id: 05ac6c549ebd447416e6a07c263a16c90b2ef510
    zou3519 authored and hwangdeyu committed Dec 23, 2020
    Commit: b2acf95
  39. Update TensorPipe submodule (pytorch#49467)

    Summary:
    Pull Request resolved: pytorch#49467
    
    Credit to beauby for the Bazel fixes.
    
    Test Plan: Export and run on CI
    
    Reviewed By: beauby
    
    Differential Revision: D25588027
    
    fbshipit-source-id: efe1c543eb7438ca05254de67cf8b5cee625119a
    lw authored and hwangdeyu committed Dec 23, 2020
    Commit: da5c385
  40. Add docs/README.md to make existing doc build info more discoverable (pytorch#49286)
    
    Summary:
    Closes pytorchgh-42003
    
    Pull Request resolved: pytorch#49286
    
    Reviewed By: glaringlee
    
    Differential Revision: D25535250
    
    Pulled By: ezyang
    
    fbshipit-source-id: a7790bfe4528fa6a31698126cc687793fdf7ac3f
    rgommers authored and hwangdeyu committed Dec 23, 2020
    Commit: 94344a2
  41. Updated derivative rules for complex svd and pinverse (pytorch#47761)

    Summary:
    Updated `svd_backward` to work correctly for complex-valued inputs.
    Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
    Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
    Added `svd` and `pinverse` to list of complex tests.
    
    References for complex-valued SVD differentiation:
    
    - https://giggleliu.github.io/2019/04/02/einsumbp.html
    - https://arxiv.org/abs/1909.02659
    
    The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
    https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/
    
    The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).
    
    Ref. pytorch#33152
    
    Pull Request resolved: pytorch#47761
    
    Reviewed By: izdeby
    
    Differential Revision: D25574962
    
    Pulled By: mruberry
    
    fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
    IvanYashchuk authored and hwangdeyu committed Dec 23, 2020
    Commit: 6315a7e
  42. [quant][docs] Add fx graph mode quantization to quantization docs (pytorch#49211)
    
    Summary: Pull Request resolved: pytorch#49211
    
    Test Plan: Imported from OSS
    
    Reviewed By: raghuramank100
    
    Differential Revision: D25507480
    
    fbshipit-source-id: 9e9e4b5fef979f5621c1bbd1b49e9cc6830da617
    jerryzh168 authored and hwangdeyu committed Dec 23, 2020
    Commit: bbaa6bb
  43. stft: Change require_complex warning to an error (pytorch#49022)

    Summary: Pull Request resolved: pytorch#49022
    
    Test Plan: Imported from OSS
    
    Reviewed By: ngimel
    
    Differential Revision: D25569586
    
    Pulled By: mruberry
    
    fbshipit-source-id: 09608088f540c2c3fc70465f6a23f2aec5f24f85
    peterbell10 authored and hwangdeyu committed Dec 23, 2020
    Commit: 0d82603
  44. Revert D25564477: [pytorch][PR] Add sinc operator

    Test Plan: revert-hammer
    
    Differential Revision:
    D25564477 (pytorch@bbc7143)
    
    Original commit changeset: 13f36a2b84da
    
    fbshipit-source-id: 58cbe8109efaf499dd017531878b9fbbb27976bc
    soulitzer authored and hwangdeyu committed Dec 23, 2020
    Commit: 0176da6
  45. Making ops c10-full: Storage arguments (pytorch#49146)

    Summary:
    Pull Request resolved: pytorch#49146
    
    Add support for Storage arguments to IValue and the JIT typing system, and make ops that were blocked on that c10-full.
    ghstack-source-id: 118710665
    
    (Note: this ignores all push blocking failures!)
    
    Test Plan: waitforsandcastle
    
    Reviewed By: ezyang
    
    Differential Revision: D25456799
    
    fbshipit-source-id: da14f125af352de5fcf05a83a69ad5a69d5a3b45
    smessmer authored and hwangdeyu committed Dec 23, 2020
    Commit: 8dcd580
  46. Allow zero annealing epochs (pytorch#47579)

    Summary:
    Fixes pytorch#47578.
    
    Pull Request resolved: pytorch#47579
    
    Reviewed By: H-Huang
    
    Differential Revision: D25429403
    
    Pulled By: vincentqb
    
    fbshipit-source-id: c42fbcd71b46e07c672a1e9661468848ac16de38
    Daniil-Osokin authored and hwangdeyu committed Dec 23, 2020
    Commit: 6f50a18
  47. Revert D25507480: [quant][docs] Add fx graph mode quantization to quantization docs
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25507480 (pytorch@7729581)
    
    Original commit changeset: 9e9e4b5fef97
    
    fbshipit-source-id: fdb08d824209b97defaba2e207d1a914575a6ae7
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    Commit: 3bbc766
  48. Fix link in distributed contributing doc and add link (pytorch#49141)

    Summary:
    One of the links for ramp-up tasks wasn't showing any results and the other showed only RPC results. Instead, I changed it to one link that has `pt_distributed_rampup`, which seems reasonable as the developer will be able to see both RPC and distributed tasks.
    
    Also added test command for DDP tests.
    
    Pull Request resolved: pytorch#49141
    
    Reviewed By: ezyang
    
    Differential Revision: D25597560
    
    Pulled By: rohan-varma
    
    fbshipit-source-id: 85d7d2964a19ea69fe149c017cf88dff835b164a
    rohan-varma authored and hwangdeyu committed Dec 23, 2020
    Commit: e7b6a29
  49. Add note to torch docs for sinh/cosh (pytorch#49413)

    Summary:
    Address pytorch#48641
    
    Documents the behavior of sinh and cosh in the edge cases
    ```
    >>> b = torch.full((15,), 89, dtype=torch.float32)
    >>> torch.sinh(b)
    tensor([2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
            2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
            2.2448e+38, 2.2448e+38, 2.2448e+38])
    >>> b = torch.full((16,), 89, dtype=torch.float32)
    >>> torch.sinh(b)
    tensor([inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf])
    >>> b = torch.full((17,), 89, dtype=torch.float32)
    >>> torch.sinh(b)
    tensor([       inf,        inf,        inf,        inf,        inf,        inf,
                   inf,        inf,        inf,        inf,        inf,        inf,
                   inf,        inf,        inf,        inf, 2.2448e+38])
    >>> b = torch.full((32,), 89, dtype=torch.float32)[::2]
    >>> torch.sinh(b)
    tensor([2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
            2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38,
            2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38])
    ```
    
    See https://sleef.org/purec.xhtml
    
    Pull Request resolved: pytorch#49413
    
    Reviewed By: ezyang
    
    Differential Revision: D25587932
    
    Pulled By: soulitzer
    
    fbshipit-source-id: 6db75c45786f4b95f82459d0ce5efa37ec0774f0
    soulitzer authored and hwangdeyu committed Dec 23, 2020
    Commit: 470a9cf
  50. Refine ConvParams::use_nnpack() (pytorch#49464)

    Summary:
    NNPACK convolution algorithms can only be used for kernels up to 16x16
    
    Fixes pytorch#49462
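
    A sketch of a convolution that exceeds the NNPACK limit and must take the default path (shapes are example values):

    ```
    import torch

    conv = torch.nn.Conv2d(1, 1, kernel_size=17)  # > 16x16, so no NNPACK
    y = conv(torch.randn(1, 1, 32, 32))
    print(y.shape)  # torch.Size([1, 1, 16, 16])
    ```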
    
    Pull Request resolved: pytorch#49464
    
    Reviewed By: xuzhao9
    
    Differential Revision: D25587879
    
    Pulled By: malfet
    
    fbshipit-source-id: 658197f23c08cab97f0849213ecee3f91f96c932
    malfet authored and hwangdeyu committed Dec 23, 2020
    Commit: ce124c2
  51. T66557700 Support default argument values of a method (pytorch#48863)

    Summary:
    Pull Request resolved: pytorch#48863
    
    Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).
    
    Test Plan:
    buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation
    
    buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg
    
    Reviewed By: raziel, iseeyuan
    
    Differential Revision: D25152559
    
    fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
    frankseide authored and hwangdeyu committed Dec 23, 2020
    Commit: 0998854
  52. [PyTorch] Merge CoinflipTLS into RecordFunctionTLS (pytorch#49359)

    Summary:
    Pull Request resolved: pytorch#49359
    
    This should be both slightly more efficient (1 less TLS guard
    check in at::shouldRunRecordFunction) and definitely more correct
    (CoinflipTLS is now saved whenever RecordFunctionTLS is saved), fixing
    a bad merge that left RecordFunctionTLS::tries_left dead.
    ghstack-source-id: 118624402
    
    Test Plan: Review, CI
    
    Reviewed By: hlu1
    
    Differential Revision: D25542799
    
    fbshipit-source-id: 310f9fd157101f659cea13c331b2a0ee6db2db88
    swolchok authored and hwangdeyu committed Dec 23, 2020
    Commit: c971a62
  53. [PyTorch] Avoid extra Tensor refcounting in _cat_out_cpu (pytorch#49364)

    Summary:
    Pull Request resolved: pytorch#49364
    
    We had a local `Tensor` when we only needed a `const Tensor&`.
    ghstack-source-id: 118624595
    
    Test Plan: Internal benchmark.
    
    Reviewed By: hlu1
    
    Differential Revision: D25544731
    
    fbshipit-source-id: 7b9656d0371ab65a6313cb0ad4aa1df707884c1c
    swolchok authored and hwangdeyu committed Dec 23, 2020
    Commit: 4df68b3
  54. [PyTorch] Use .sizes() instead of .size() in _cat_out_cpu (pytorch#49368)
    
    Summary:
    Pull Request resolved: pytorch#49368
    
    The former is faster because it doesn't allow negative indexing (which we don't use).
    ghstack-source-id: 118624598
    
    Test Plan: internal benchmark
    
    Reviewed By: hlu1
    
    Differential Revision: D25545777
    
    fbshipit-source-id: b2714fac95c801fd735fac25b238b4a79b012993
    swolchok authored and hwangdeyu committed Dec 23, 2020
    Commit: bff610b
  55. [PyTorch] Use .sizes() instead of .size() in cat_serial_kernel_impl (pytorch#49371)
    
    Summary:
    Pull Request resolved: pytorch#49371
    
    As with the previous diff, .sizes() is strictly more efficient.
    ghstack-source-id: 118627223
    
    Test Plan: internal benchmark
    
    Differential Revision: D25546409
    
    fbshipit-source-id: 196034716b6e11efda1ec8cb1e0fce7732d73eb4
    swolchok authored and hwangdeyu committed Dec 23, 2020
    Commit: 51e4cc9
  56. [PyTorch] Make tls_local_dispatch_key_set inlineable (reapply) (pytorch#49412)
    
    Summary:
    Pull Request resolved: pytorch#49412
    
    FLAGS_disable_variable_dispatch had to go, but it looks like the only user was some benchmarks anyway.
    ghstack-source-id: 118669590
    
    Test Plan:
    Small (on the order of 0.1%) improvement on internal benchmarks. Wait for
    GitHub CI since this was reverted before due to a CI break
    
    Reviewed By: ezyang
    
    Differential Revision: D25547962
    
    fbshipit-source-id: 58424b1da230fdc5d27349af762126a5512fce43
    swolchok authored and hwangdeyu committed Dec 23, 2020
    Commit: e70d3f0
  57. BFloat16: add explicit dtype support for to_mkldnn and to_dense (pytorch#48881)
    
    Summary: Pull Request resolved: pytorch#48881
    
    Test Plan: Imported from OSS
    
    Reviewed By: ngimel
    
    Differential Revision: D25537190
    
    Pulled By: VitalyFedyunin
    
    fbshipit-source-id: a61a433c638e2e95576f88f081b64ff171b2316e
    XiaobingSuper authored and hwangdeyu committed Dec 23, 2020
    Commit: fb4da16
  58. Introduce tools.codegen.api.translate (pytorch#49122)

    Summary:
    Pull Request resolved: pytorch#49122
    
    cpparguments_exprs has induced a lot of head scratching in many recent PRs over how to structure the code well. This PR replaces the old algorithm with an entirely new algorithm inspired by logic programming. The net result is shorter, cleaner and should be more robust to future changes.
    
    This PR is a bit of a whopper.  Here is the order to review it.
    
    - tools/codegen/api/types.py
      - Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions.
      - Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the *semantic C++ type* of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the  binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CType have a little bit of structure around optional and references, because the translation code needs to work around these concepts.
      - I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator.
      - All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods are now calling translate, the new functionality which we haven't gotten to yet
    - tools/codegen/api/cpp.py
       - We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens).
      - `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation
    - tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator
    - tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works.
    - Everything else: just some small call site tweaks for places when I changed API.
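    To make the new scheme concrete, here is a minimal sketch of the Binding/CType/translate idea. The class and function names below are simplified stand-ins for illustration, not the actual tools/codegen code:
    
    ```python
    from dataclasses import dataclass
    from typing import List
    
    @dataclass(frozen=True)
    class CType:
        cpp_type: str  # the C++ type, e.g. "const Tensor&"
        name: str      # the denotation, e.g. "other"
    
    @dataclass(frozen=True)
    class Binding:
        ctype: CType
        def decl(self) -> str:
            return f"{self.ctype.cpp_type} {self.ctype.name}"
    
    @dataclass(frozen=True)
    class Expr:
        expr: str     # C++ expression text
        ctype: CType  # what that expression means
    
    def translate(bindings: List[Binding], goals: List[CType]) -> List[Expr]:
        # Backwards deduction: for each goal CType, find a binding in scope
        # whose semantic type matches, and emit the expression producing it.
        env = {b.ctype: Expr(b.ctype.name, b.ctype) for b in bindings}
        return [env[g] for g in goals]  # KeyError if a goal is underivable
    
    ctx = [Binding(CType("const Tensor&", "self")), Binding(CType("const Tensor&", "other"))]
    print([e.expr for e in translate(ctx, [CType("const Tensor&", "other")])])  # ['other']
    ```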
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    
    Test Plan: Imported from OSS
    
    Reviewed By: ljk53
    
    Differential Revision: D25455887
    
    Pulled By: ezyang
    
    fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6
    ezyang authored and hwangdeyu committed Dec 23, 2020
    15bc45f
  59. Revert D25569586: stft: Change require_complex warning to an error

    Test Plan: revert-hammer
    
    Differential Revision:
    D25569586 (pytorch@5874925)
    
    Original commit changeset: 09608088f540
    
    fbshipit-source-id: 6a5953b327a4a2465b046e29bb007a0c5f4cf14a
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    c482c5d
  60. [NNC] Dont inline outputs buffers on cpu (pytorch#49488)

    Summary:
    In pytorch#48967 we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but it is not needed for correctness on CPU and results in a perf slowdown there.
    
    The output buffer inlining solution for CUDA is intended to be an interim solution because it does not work with reductions.
    
    Pull Request resolved: pytorch#49488
    
    Reviewed By: ezyang
    
    Differential Revision: D25596071
    
    Pulled By: eellison
    
    fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd
    Elias Ellison authored and hwangdeyu committed Dec 23, 2020
    fb0a942
  61. Prevent accidentally writing old style ops (pytorch#49510)

    Summary:
    Pull Request resolved: pytorch#49510
    
    Adding old style operators with out arguments will break XLA. This prevents that. See for background: https://fb.workplace.com/groups/pytorch.dev/permalink/809934446251704/
    
    This is a temporary change that will prevent this breakage for the next couple of days until the problem is resolved for good. It will then be deleted in pytorch#49164.
    ghstack-source-id: 118756437
    
    (Note: this ignores all push blocking failures!)
    
    Test Plan: waitforsandcastle
    
    Reviewed By: bhosmer
    
    Differential Revision: D25599112
    
    fbshipit-source-id: 6b0ca4da4b55da8aab9d1b332cd9f68e7602301e
    smessmer authored and hwangdeyu committed Dec 23, 2020
    c694e7d
  62. .circleci: Only downgrade if we have conda (pytorch#49519)

    Summary:
    Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
    
    Fixes #{issue number}
    
    Pull Request resolved: pytorch#49519
    
    Reviewed By: robieta
    
    Differential Revision: D25603779
    
    Pulled By: seemethere
    
    fbshipit-source-id: ca8d811925762a5a413ca906d94c974a4ac5b132
    seemethere authored and hwangdeyu committed Dec 23, 2020
    b39b6cb
  63. Fix bad error message when int overflow (pytorch#48250)

    Summary:
    Fixes pytorch#48114
    
    Before:
    ```
    >>> torch.empty(2 * 10 ** 20)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: empty(): argument 'size' must be tuple of ints, but found element of type int at pos 1
    ```
    
    After fix:
    ```
    >>> torch.empty(2 * 10 ** 20)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: Overflow when unpacking long
    ```
    
    Unclear whether we need a separate test for this case; I can add one if it's necessary...
    
    Pull Request resolved: pytorch#48250
    
    Reviewed By: linbinyu
    
    Differential Revision: D25105217
    
    Pulled By: ezyang
    
    fbshipit-source-id: a5aa7c0266945c8125210a2fd34ce4b6ba940c92
    Kiyosora authored and hwangdeyu committed Dec 23, 2020
    3be7381
  64. Relax the atol/rtol of layernorm math kernel test. (pytorch#49507)

    Summary: Pull Request resolved: pytorch#49507
    
    Test Plan: Imported from OSS
    
    Reviewed By: mruberry
    
    Differential Revision: D25598424
    
    Pulled By: ailzhang
    
    fbshipit-source-id: b3f43e84f177cf7c14831b0b83a399b155c813c4
    Ailing Zhang authored and hwangdeyu committed Dec 23, 2020
    12c9616
  65. Fix CUDA extension ninja build (pytorch#49344)

    Summary:
    I am submitting this PR on behalf of Janne Hellsten (nurpax) from NVIDIA, for the convenience of the CLA. Thanks Janne a lot for the contribution!
    
    Currently, the ninja build decides whether to rebuild a .cu file more or less at random. There are actually two issues:
    
    First, the arch list in the build command is ordered randomly; when the order changes, ninja rebuilds unconditionally, regardless of timestamps.
    
    Second, header files are not included in the dependency list, so if a header changes, ninja may not rebuild.
    
    This PR fixes both issues. The fix for the second issue requires nvcc >= 10.2. nvcc < 10.2 can still build CUDA extensions as before, but it will not see changes in header files.
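    The gist of the first fix is making the arch flag list deterministic before it is baked into the ninja command line. A minimal sketch of the idea, assuming a `TORCH_CUDA_ARCH_LIST`-style input (the helper name and default list are illustrative, not the actual cpp_extension code):
    
    ```python
    import os
    
    def deterministic_arch_flags():
        # Sort and de-duplicate the arch list so the nvcc command line is
        # stable across runs; otherwise ninja sees a "changed" command and
        # rebuilds unconditionally.
        archs = os.environ.get("TORCH_CUDA_ARCH_LIST", "6.1;7.5").split(";")
        flags = []
        for arch in sorted({a.strip() for a in archs if a.strip()}):
            num = arch.replace(".", "")
            flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        return flags
    
    print(deterministic_arch_flags())
    ```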
    
    Pull Request resolved: pytorch#49344
    
    Reviewed By: glaringlee
    
    Differential Revision: D25540157
    
    Pulled By: ezyang
    
    fbshipit-source-id: 197541690d7f25e3ac5ebe3188beb1f131a4c51f
    zasdfgbnm authored and hwangdeyu committed Dec 23, 2020
    2aa0817
  66. [extensions] fix is_ninja_available during cuda extension building (p…

    …ytorch#49443)
    
    Summary:
    tl;dr: the current version of `is_ninja_available` in `torch/utils/cpp_extension.py` fails to run under recent incarnations of pip with the new build isolation feature, which is now the default. This PR fixes that.
    
    The full story follows:
    
    --------------------------
    
    Currently, trying to build https://github.com/facebookresearch/fairscale/ (which builds CUDA extensions) fails with recent pip versions. The build fails inside `is_ninja_available`, which runs `ninja --version` in a subprocess with stdout redirected to /dev/null; that redirection seems to break under new pip versions. Currently I have `pip==20.3.3`. Recent pip performs build isolation, which first fetches all dependencies to somewhere under /tmp/pip-install-xyz and then builds the package.
    
    If I build:
    
    ```
    pip install fairscale --no-build-isolation
    ```
    everything works.
    
    When building normally (i.e. without `--no-build-isolation`), the failure is a very long trace:
    <details>
    <summary>Full log</summary>
    <pre>
    pip install fairscale
    Collecting fairscale
      Downloading fairscale-0.1.1.tar.gz (83 kB)
         |████████████████████████████████| 83 kB 562 kB/s
      Installing build dependencies ... done
      Getting requirements to build wheel ... error
      ERROR: Command errored out with exit status 1:
       command: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v
           cwd: /tmp/pip-install-1wq9f8fp/fairscale_347f218384a64f24b8d5ce846641213e
      Complete output (55 lines):
      running egg_info
      writing fairscale.egg-info/PKG-INFO
      writing dependency_links to fairscale.egg-info/dependency_links.txt
      writing requirements to fairscale.egg-info/requires.txt
      writing top-level names to fairscale.egg-info/top_level.txt
      Traceback (most recent call last):
        File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
          from ninja import ninja
      ModuleNotFoundError: No module named 'ninja'
      Traceback (most recent call last):
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
          main()
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 145, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 56, in <module>
          setuptools.setup(
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 298, in run
          self.find_sources()
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 305, in find_sources
          mm.run()
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 536, in run
          self.add_defaults()
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 572, in add_defaults
          sdist.add_defaults(self)
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 228, in add_defaults
          self._add_defaults_ext()
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 311, in _add_defaults_ext
          build_ext = self.get_finalized_command('build_ext')
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/cmd.py", line 298, in get_finalized_command
          cmd_obj = self.distribution.get_command_obj(command, create)
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 858, in get_command_obj
          cmd_obj = self.command_obj[command] = klass(self)
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
          if not is_ninja_available():
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
          subprocess.check_call('ninja --version'.split(), stdout=devnull)
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
      ----------------------------------------
    ERROR: Command errored out with exit status 1: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v Check the logs for full command output.
    </pre>
    
    </details>
    
    and the middle of it is what we want:
    
    ```
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__
          if not is_ninja_available():
        File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available
          subprocess.check_call('ninja --version'.split(), stdout=devnull)
        File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1.
    ```
    
    For some reason pytorch fails to run this simple code:
    
    ```
    # torch/utils/cpp_extension.py
    def is_ninja_available():
        r'''
        Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
        available on the system, ``False`` otherwise.
        '''
        with open(os.devnull, 'wb') as devnull:
            try:
                subprocess.check_call('ninja --version'.split(), stdout=devnull)
            except OSError:
                return False
            else:
                return True
    ```
    
    I suspect that pip does something to `os.devnull` and that's why it fails.
    
    This PR proposes a simpler code which doesn't rely on anything but `subprocess.check_output`:
    
    ```
    def is_ninja_available():
        r'''
        Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is
        available on the system, ``False`` otherwise.
        '''
        try:
            subprocess.check_output('ninja --version'.split())
        except Exception:
            return False
        else:
            return True
    ```
    
    which doesn't use `os.devnull` and performs the same function. There could be a whole bunch of different exceptions there, I think, so I went for the generic one - we don't care why it failed, since this function's only purpose is to suggest whether ninja can be used or not.
    
    Let's check
    
    ```
    python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.is_ninja_available())"
    True
    ```
    
    Look ma - no std noise to take care of. (i.e. no need for /dev/null).
    
    I was editing the installed environment-wide `cpp_extension.py` file directly, so I didn't need to tweak `PYTHONPATH` - I made sure to replace `'ninja --version'` with something that should fail, and I did get `False` for the above command line.
    
    I next did a somewhat elaborate cheat to re-package an already existing binary wheel with this corrected version of `cpp_extension.py`, rather than building from source:
    ```
    mkdir /tmp/pytorch-local-channel
    cd /tmp/pytorch-local-channel
    
    # get the latest nightly wheel
    wget https://download.pytorch.org/whl/nightly/cu110/torch-1.8.0.dev20201215%2Bcu110-cp38-cp38-linux_x86_64.whl
    
    # unpack it
    unzip torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl
    
    # edit torch/utils/cpp_extension.py to fix the python code with the new version as in this PR
    emacs torch/utils/cpp_extension.py &
    
    # pack the files back
    zip -r torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl caffe2 torch torch-1.8.0.dev20201215+cu110.dist-info
    ```
    
    Now I tell pip to use my local channel, plus `--pre` for it to pick up the pre-release as an acceptable wheel
    ```
    # install using this local channel
    git clone https://github.com/facebookresearch/fairscale/
    cd fairscale
    pip install -v --disable-pip-version-check -e . -f file:///tmp/pytorch-local-channel --pre
    ```
    and voilà, everything works.
    
    ```
    [...]
    Successfully installed fairscale
    ```
    
    I noticed a whole bunch of ninja-not-found errors in the log. I think this is the same problem in other parts of the build system packages, which use this same old check (copied across various projects and build tools) that recent pip breaks.
    
    ```
        writing manifest file '/tmp/pip-modern-metadata-_nsdesbq/fairscale.egg-info/SOURCES.txt'
        Traceback (most recent call last):
          File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module>
            from ninja import ninja
        ModuleNotFoundError: No module named 'ninja'
        [...]
        /tmp/pip-build-env-fqflyevr/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:364: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
          warnings.warn(msg.format('we could not find ninja.'))
    ```
    
    but these don't prevent the build from completing and installing.
    
    I suppose these need to be identified and reported to various other projects, but that's another story.
    
    I think the new pip does something to `os.devnull` which breaks any code relying on it - I haven't tried to figure out what happens to that stream object, but this PR, which removes its usage, solves the problem.
    
    Also do notice that:
    
    ```
    git clone https://github.com/facebookresearch/fairscale/
    cd fairscale
    python setup.py bdist_wheel
    pip install dist/fairscale-0.1.1-cp38-cp38-linux_x86_64.whl
    ```
    works too. So it is really a pip issue.
    
    Apologies if the notes are too many; I tried to give the complete picture, and other projects will probably need these details as well.
    
    Thank you for reading.
    
    Pull Request resolved: pytorch#49443
    
    Reviewed By: mruberry
    
    Differential Revision: D25592109
    
    Pulled By: ezyang
    
    fbshipit-source-id: bfce4420c28b614ead48e9686f4153c6e0fbe8b7
    stas00 authored and hwangdeyu committed Dec 23, 2020
    dc052aa
  67. [NNC] Add Support For is_nan (pytorch#48973)

    Summary: Pull Request resolved: pytorch#48973
    
    Test Plan: Imported from OSS
    
    Reviewed By: bertmaher
    
    Differential Revision: D25413166
    
    Pulled By: eellison
    
    fbshipit-source-id: 0c79258345df18c60a862373fa16931228fb92ef
    Elias Ellison authored and hwangdeyu committed Dec 23, 2020
    6362b78
  68. [NNC] add support for masked_fill (pytorch#48974)

    Summary: Pull Request resolved: pytorch#48974
    
    Test Plan: Imported from OSS
    
    Reviewed By: bertmaher
    
    Differential Revision: D25413165
    
    Pulled By: eellison
    
    fbshipit-source-id: 8cece1dc3692389be90c0d77bd71b103254d5ad3
    Elias Ellison authored and hwangdeyu committed Dec 23, 2020
    a0d6342
  69. Add fusion support of aten::to (pytorch#48976)

    Summary: Pull Request resolved: pytorch#48976
    
    Test Plan: Imported from OSS
    
    Reviewed By: ZolotukhinM
    
    Differential Revision: D25413164
    
    Pulled By: eellison
    
    fbshipit-source-id: 0c31787e8b5e1368b0cba6e23660799b652389cd
    Elias Ellison authored and hwangdeyu committed Dec 23, 2020
    08fd21f
  70. eager quant: remove fake_quant after add/mul nodes during QAT (pytorc…

    …h#49213)
    
    Summary:
    Pull Request resolved: pytorch#49213
    
    Changes the behavior of Eager mode quantization to remove observation after add_scalar/mul_scalar nodes. This observation is not used, and removing it eliminates one difference between Eager and FX modes.
    
    Test Plan:
    ```
    python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat
    python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat
    python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams
    python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25486276
    
    fbshipit-source-id: 34a5d6ce0d08739319ec0f8b197cfc1309d71040
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    5ac65cb
  71. fx quant: move {input|output}_quantized_idxs cfg from convert to prep…

    …are (pytorch#49238)
    
    Summary:
    Pull Request resolved: pytorch#49238
    
    Moves the `input_quantized_idxs` and `output_quantized_idxs` options from the convert config to the prepare config. This is done because these options affect observer placement, which changes numerics during QAT.
    
    The next PR will adjust the behavior of `input_quantized_idxs` in
    prepare in QAT to prevent placing a fake_quant at the input if the
    input is marked quantized.  Placing a fake_quant there can lead to
    numerical inaccuracies during calibration, as it would start with
    scale=1 and zp=0, which may be different from the quantization
    parameters of the incoming quantized input.
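    As a rough usage sketch (assuming the FX graph mode quantization API of this era; the exact config plumbing may differ between versions), the option now lives in the prepare-time custom config:
    
    ```python
    import torch
    from torch.quantization import get_default_qconfig
    from torch.quantization.quantize_fx import prepare_fx
    
    class M(torch.nn.Module):
        def forward(self, x):
            return x + 1
    
    m = M().eval()
    qconfig_dict = {"": get_default_qconfig("fbgemm")}
    # Marking input 0 as already quantized at prepare time means no observer
    # (or fake_quant, in QAT) is placed at that graph input.
    prepared = prepare_fx(
        m,
        qconfig_dict,
        prepare_custom_config_dict={"input_quantized_idxs": [0]},
    )
    ```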
    
    Test Plan:
    ```
    python test/test_quantization.py TestQuantizeFx
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25498762
    
    fbshipit-source-id: 17ace8f803542155652b310e5539e1882ebaadc6
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    6c5a43d
  72. fx quant: do not insert observers at quantized inputs (pytorch#49239)

    Summary:
    Pull Request resolved: pytorch#49239
    
    Context: the existing implementation of `input_quantized_idxs` is convert-only. Therefore, observers are inserted between the input and the first quantized node. This is a problem during QAT, because the initial input is a fake_quant, and it starts with scale=1 and zp=0. This does not match the quantization parameters of the graph input, which can lead to incorrect numerics.
    
    Fix: do not insert observer for a quantized input.
    
    Test Plan:
    ```
    python test/test_quantization.py TestQuantizeFx
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25499486
    
    fbshipit-source-id: 303b49cc9d95a9fd06fef3b0859c08be34e19d8a
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    f7a7355
  73. fx quant: fix fq when input is quantized and node does not need fq (p…

    …ytorch#49382)
    
    Summary:
    Pull Request resolved: pytorch#49382
    
    Fixes an edge case. If the input to the graph is quantized and the first node does not need activation observation, this makes sure that no observer is inserted.
    
    Test Plan:
    ```
    python test/test_quantization.py TestQuantizeFxOps.test_int8_input_no_unnecessary_fq
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25551041
    
    fbshipit-source-id: a6cba235c63ca7f6856e4128af7c1dc7fa0085ea
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    f604f1b
  74. fx quant: make sure observer is inserted before a quantized output (p…

    …ytorch#49420)
    
    Summary:
    Pull Request resolved: pytorch#49420
    
    Before: if an output was marked as quantized, it could end up not actually being quantized if the previous node was not quantized.
    
    After: if an output was marked as quantized, it will be quantized
    regardless of the quantization status of the previous node.
    
    Test Plan:
    ```
    python test/test_quantization.py TestQuantizeFxOps.test_quant_output_always_observed
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25566834
    
    fbshipit-source-id: 84755a1605fd3847edd03a7887ab9f635498c05c
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    b7a36d0
  75. add files to SLOW_TESTS for target determinator (pytorch#49500)

    Summary:
    - test_torch was split into 6 files in pytorch#47356.
    - also, test_linalg has 10 slowtest markings.
    
    Pull Request resolved: pytorch#49500
    
    Reviewed By: ezyang, malfet
    
    Differential Revision: D25598085
    
    Pulled By: walterddr
    
    fbshipit-source-id: 74b0b433897721db86c00e236d1dd925d7a6d3d0
    Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020
    1aa640b
  76. [reland] Support torch.distributed.irecv(src=None, ...) (pytorch#49383)

    Summary:
    Pull Request resolved: pytorch#49383
    
    Reland of pytorch#47137
    ghstack-source-id: 118735407
    
    Test Plan: waitforbuildbot
    
    Reviewed By: osalpekar
    
    Differential Revision: D25551910
    
    fbshipit-source-id: 2e1f2f77e7c69204056dfe6ed178e8ad7650ab32
    pritamdamania authored and hwangdeyu committed Dec 23, 2020
    5aed6b3
  77. Set caffe2::pthreadpool() size in ParallelOpenMP (pytorch#45566)

    Summary:
    Addresses pytorch#45418.
    
    This is probably not the best solution, but it's a rebase of the solution we're considering until pytorch#45418 is solved. If you can outline a better one I'm willing to implement it (:
    
    Pull Request resolved: pytorch#45566
    
    Reviewed By: ezyang
    
    Differential Revision: D24621568
    
    Pulled By: glaringlee
    
    fbshipit-source-id: 89dad5c61d8b5c26984d401551a1fe29df1ead04
    dbalchev authored and hwangdeyu committed Dec 23, 2020
    46971a5
  78. Add torch._foreach_zero_ API (pytorch#47286)

    Summary:
    **In this PR**
    - Add `_foreach_zero_` API
    - Update all optimizers under /_multi_tensor/ to use `_foreach_zero_` in the `zero_grad` method
    
    Performance improvement (OP: zero_):
    - for-loop: 630.36 us
    - foreach: 90.84 us
    
    Benchmark script:
    
    ```
    import torch
    import torch.optim as optim
    import torch.nn as nn
    import torchvision
    import torch.utils.benchmark as benchmark_utils
    
    inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]
    
    def main():
        for op in [
                "zero_"
            ]:
            print("\n\n----------------- OP: ", op, " -----------------")
            stmt = "[torch.{op}(t) for t in inputs]"
            timer = benchmark_utils.Timer(
                stmt=stmt.format(op = op),
                globals=globals(),
                label="str(optimizer)",
            )
            print(f"autorange:\n{timer.blocked_autorange()}\n\n")
    
            stmt = "torch._foreach_{op}(inputs)"
            timer_mta = benchmark_utils.Timer(
                stmt=stmt.format(op = op),
                globals=globals(),
                label="str(optimizer_mta)",
            )
            print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")
    
    if __name__ == "__main__":
        main()
    
    ```
    **TODO**
    - Refactor zero_grad once foreach APIs are stable.
    
    **Tested** via unit tests
    
    Pull Request resolved: pytorch#47286
    
    Reviewed By: ngimel
    
    Differential Revision: D24706240
    
    Pulled By: izdeby
    
    fbshipit-source-id: aac69d6d134d65126ae8e5916f3627b73d8a94bf
    Iurii Zdebskyi authored and hwangdeyu committed Dec 23, 2020
    6a59ef2
  79. Bring back math_silu_backward which works for all backends. (pytorch#…

    …49439)
    
    Summary: Pull Request resolved: pytorch#49439
    
    Test Plan: Imported from OSS
    
    Reviewed By: nikithamalgifb, ngimel
    
    Differential Revision: D25594129
    
    Pulled By: ailzhang
    
    fbshipit-source-id: 627bbea9ba478ee3a8edcc6695abab6431900192
    Ailing Zhang authored and hwangdeyu committed Dec 23, 2020
    3b1186d
  80. [quant][be] Add typing for quantization_mappings.py (pytorch#49179)

    Summary: Pull Request resolved: pytorch#49179
    
    Test Plan: Imported from OSS
    
    Reviewed By: vkuzo, wat3rBro
    
    Differential Revision: D25470520
    
    fbshipit-source-id: 16e35fec9a5f3339860bd2305ae8ffdd8e2dfaf7
    jerryzh168 authored and hwangdeyu committed Dec 23, 2020
    99ba415
  81. Add BFloat16 support for isinf and isfinite (pytorch#49356)

    Summary:
    Also fix some tests.
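    For example, after this change the following works on bfloat16 tensors:
    
    ```python
    import torch
    
    x = torch.tensor([1.0, float("inf"), float("nan")], dtype=torch.bfloat16)
    print(torch.isinf(x))     # tensor([False,  True, False])
    print(torch.isfinite(x))  # tensor([ True, False, False])
    ```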
    
    Pull Request resolved: pytorch#49356
    
    Reviewed By: mruberry
    
    Differential Revision: D25604364
    
    Pulled By: ngimel
    
    fbshipit-source-id: 9efdd83aaa96cacc66e9689db9f9d8c24175a693
    zasdfgbnm authored and hwangdeyu committed Dec 23, 2020
    5494a81
  82. Change aten::native_layer_norm signature to match torch.layer_norm de…

    …finition (pytorch#48971)
    
    Summary:
    This PR changes the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signatures to match the `torch.layer_norm` definition. The current definition doesn't give the PyTorch JIT enough information to fuse layer_norm during training.
    
    `native_layer_norm(X, gamma, beta, M, N, eps)` =>
    `native_layer_norm(input, normalized_shape, weight, bias, eps)`
    
    `native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` =>
    `native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)`
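    For reference, the new native signature mirrors the Python-level call; a minimal sketch of the corresponding `torch.nn.functional.layer_norm` invocation:
    
    ```python
    import torch
    import torch.nn.functional as F
    
    x = torch.randn(20, 5, 10)
    normalized_shape = (5, 10)            # trailing dims to normalize over
    weight = torch.ones(normalized_shape)
    bias = torch.zeros(normalized_shape)
    y = F.layer_norm(x, normalized_shape, weight, bias, eps=1e-5)
    ```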
    
    Pull Request resolved: pytorch#48971
    
    Reviewed By: izdeby
    
    Differential Revision: D25574070
    
    Pulled By: ngimel
    
    fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4
    rdspring1 authored and hwangdeyu committed Dec 23, 2020
    276e68e
  83. Adding fix for invalid annotation types for dictionary (pytorch#49425)

    Summary:
    Fixes pytorch#49362
    
    **Summary:**
    This PR fixes the issue where invalid annotation types are used for a dictionary.
    An assertion message flagging the unsupported type is generated for all invalid annotations.
    
    **Test Case**:
    python test/test_jit.py TestJit.test_dict_invalid_annotations
    
    Pull Request resolved: pytorch#49425
    
    Reviewed By: navahgar
    
    Differential Revision: D25601578
    
    Pulled By: nikithamalgifb
    
    fbshipit-source-id: 91633e3d0891bdcb5402f044a74d02fe352ecd6f
    nikithamalgifb authored and hwangdeyu committed Dec 23, 2020
    54636e1
  84. [pt] fuse ClipRangesGatherSigridHash (pytorch#49181)

    Summary:
    Pull Request resolved: pytorch#49181
    
    Fuse ClipRangesGatherSigridHash
    
    Test Plan:
    ```
    MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adindexer/merge/traced_merge_dper_fixes.pt --pt_inputs=/data/users/ansha/tmp/adindexer/merge/container_precomputation_bs1.pt --iters=30000 --warmup_iters=10000  --num_threads=1 --pred_net=/data/users/ansha/tmp/adindexer/precomputation_merge_net.pb --c2_inputs=/data/users/ansha/tmp/adindexer/merge/c2_inputs_precomputation_bs1.pb --c2_sigrid_transforms_opt=1 --c2_use_memonger=1 --c2_weights=/data/users/ansha/tmp/adindexer/merge/c2_weights_precomputation.pb --pt_enable_static_runtime --pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile --compare_results
    ```
    
    Verify op fused:
    Node #3: 0.00104917 ms/iter, %173 : Tensor, %174 : Tensor = fb::clip_ranges_gather_sigrid_hash_offsets(%75, %76, %39, %40, %41, %38, %26)
    
    Before: 0.0919786
    After: 0.0911792
    
    Reviewed By: hlu1
    
    Differential Revision: D25468225
    
    fbshipit-source-id: 36bd91c140eaa57cb42cdaad46d878b94f162a9d
    ajyu authored and hwangdeyu committed Dec 23, 2020
    c18bc82
  85. Revert D25574962: [pytorch][PR] Updated derivative rules for complex …

    …svd and pinverse
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25574962 (pytorch@9955355)
    
    Original commit changeset: 832b61303e88
    
    fbshipit-source-id: d73f77f3e51b0f535dad6d21c5bebf8d41a6bfbd
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    2e3adbd
  86. Remove set_quantizer_ from native_functions.yaml (pytorch#49463)

    Summary:
    Pull Request resolved: pytorch#49463
    
    set_quantizer_ takes a ConstQuantizerPtr argument, which is neither supported by JIT nor by c10.
    Also, it doesn't get dispatched (CPU and CUDA have the same implementation) and it is excluded from python bindings generation.
    So there is no real reason for this to be in native_functions.yaml.
    
    Removing it unblocks the migration to c10-fullness since this is an op that would have been hard to migrate. See https://fb.quip.com/QRtJAin66lPN
    ghstack-source-id: 118710663
    
    Test Plan: waitforsandcastle
    
    Reviewed By: ezyang
    
    Differential Revision: D25587763
    
    fbshipit-source-id: 8fab921f4c256c128d48d82dac731f04ec9bad92
    smessmer authored and hwangdeyu committed Dec 23, 2020
    0a2ba5d
  87. [C2] Revive unsafe CoalesceOp (pytorch#49402)

    Summary:
    Pull Request resolved: pytorch#49402
    
    For NCCLAllReduce operations there can be non-trivial overhead for launching cooperative kernels (especially in the case of async execution of different parts of the model). This diff revives the operator to make it possible to fuse multiple operations into a single kernel.
    
    Test Plan:
    Unit-test.
    Used in a later diff.
    
    Reviewed By: xianjiec
    
    Differential Revision: D25531206
    
    fbshipit-source-id: 64b1c161233a726f9e2868f1059316e42a8ea1fc
    kennyhorror authored and hwangdeyu committed Dec 23, 2020
    87a4bc5
  88. [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --ta…

    …ke CLANGFORMAT`
    
    Reviewed By: zertosh
    
    Differential Revision: D25609974
    
    fbshipit-source-id: 4db8f8100336a2f0f2af8bc7b960d3711a5d1d7d
    generatedunixname89002005325676 authored and hwangdeyu committed Dec 23, 2020
    9df6183
  89. PyLong_{As/From}{Long/UnsignedLong} lint checks (pytorch#49280)

    Summary:
    Fixes pytorch#45581
    
    Pull Request resolved: pytorch#49280
    
    Reviewed By: mruberry
    
    Differential Revision: D25592330
    
    Pulled By: ezyang
    
    fbshipit-source-id: 5c16d6aed88ad1feaa7f129b4cd44c0561be2de2
    peterjc123 authored and hwangdeyu committed Dec 23, 2020
    728a912
  90. [reland][quant][docs] Add fx graph mode quantization to quantization …

    …docs (pytorch#49211) (pytorch#49515)
    
    Summary: Pull Request resolved: pytorch#49515
    
    Test Plan:
    Imported from OSS
    
    Imported from OSS
    
    Reviewed By: vkuzo
    
    Differential Revision: D25601061
    
    fbshipit-source-id: 74e917d57895e9b4131a01fdcea8df3e94322bec
    jerryzh168 authored and hwangdeyu committed Dec 23, 2020
    b8c8d33
  91. Refactor RPC matchBuiltInOp to get rid of exception swallowing (pytor…

    …ch#49009)
    
    Summary:
    Pull Request resolved: pytorch#49009
    
    As per the title, we should generally not have exception swallowing, and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard-to-debug issues, such as unexpected ops showing up in the profiler, and flaky tests which were fixed by pytorch#41287.
    
    Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
    ghstack-source-id: 118794661
    
    Test Plan: CI
    
    Reviewed By: mrshenli
    
    Differential Revision: D25392905
    
    fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
    rohan-varma authored and hwangdeyu committed Dec 23, 2020
    2853fa3
  92. Revert D25105217: [pytorch][PR] Fix bad error message when int overflow

    Test Plan: revert-hammer
    
    Differential Revision:
    D25105217 (pytorch@c675727)
    
    Original commit changeset: a5aa7c026694
    
    fbshipit-source-id: ddb4c93f9317e1747def8842a8072c84776cd487
    ezyang authored and hwangdeyu committed Dec 23, 2020
    0567619
  93. Set is_non_overlapping_and_dense_ flag in OpaqueTensorImpl constructor (

    pytorch#49470)
    
    Summary:
    Pull Request resolved: pytorch#49470
    
    pytorch#48625 changed the default contiguity settings for `TensorImpl`, causing the Vulkan backend to crash. Therefore, add an argument to the `OpaqueTensorImpl` constructor that can set `is_non_overlapping_and_dense_` back to false.
    
    Test Plan: Imported from OSS
    
    Reviewed By: AshkanAliabadi
    
    Differential Revision: D25592826
    
    Pulled By: SS-JIA
    
    fbshipit-source-id: e5d9de9a733875cb00c0546a3bc3271e5c6e23a3
    SS-JIA authored and hwangdeyu committed Dec 23, 2020
    83f6ad5
  94. Test distributed collectives profiling with Gloo on GPU (pytorch#49072)

    Summary:
    Pull Request resolved: pytorch#49072
    
    As per the title, we should enable these tests for Gloo when run on GPU with the profiler enabled via `use_cuda=True`. Enabling the ProcessGroupNCCL profiling tests to work with `use_cuda=True` is tracked in pytorch#48987.
    ghstack-source-id: 118789003
    
    Test Plan: CI
    
    Reviewed By: mrshenli
    
    Differential Revision: D25388986
    
    fbshipit-source-id: 664d922ac2e10c77299daebdc6d3c92bb70eb56e
    rohan-varma authored and hwangdeyu committed Dec 23, 2020
    0deecfc
  95. Revert D25152559: T66557700 Support default argument values of a method

    Test Plan: revert-hammer
    
    Differential Revision:
    D25152559 (pytorch@6bde0ca)
    
    Original commit changeset: bbf52f1fbdbf
    
    fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1
    iseeyuan authored and hwangdeyu committed Dec 23, 2020
    8d6bce8
  96. [te] Add fast log approximation based on sleef

    Summary:
    This is a fast log implementation.
    
    benchmark:
    ```
    buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
    ```
    
    Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat
    
    Reviewed By: bertmaher
    
    Differential Revision: D25445815
    
    fbshipit-source-id: 20696eacd12a55e797f606f4a6dbbd94c9652888
    bwasti authored and hwangdeyu committed Dec 23, 2020
    e8b6219
  97. [quant][eagermode][fix] Fix quantization for DeQuantStub (pytorch#49428)

    Summary:
    Pull Request resolved: pytorch#49428
    
    Previously, DeQuantStub would be swapped with nn.quantized.DeQuantize regardless of qconfig. The reason is that we skipped attaching a qconfig to DeQuantStub to avoid adding a fake quantize module to it, but the correct fix is to skip it in insert observers. This PR fixes the issue.
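    For context, a minimal eager-mode example of where DeQuantStub sits (a sketch; the module layout is illustrative):
    
    ```python
    import torch
    
    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = torch.quantization.QuantStub()
            self.conv = torch.nn.Conv2d(1, 1, 1)
            self.dequant = torch.quantization.DeQuantStub()
    
        def forward(self, x):
            return self.dequant(self.conv(self.quant(x)))
    
    m = M().eval()
    m.qconfig = torch.quantization.get_default_qconfig("fbgemm")
    torch.quantization.prepare(m, inplace=True)
    m(torch.randn(1, 1, 2, 2))                   # calibrate
    torch.quantization.convert(m, inplace=True)  # the DeQuantStub swap now respects qconfig
    ```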
    
    Test Plan: Imported from OSS
    
    Reviewed By: vkuzo
    
    Differential Revision: D25569991
    
    fbshipit-source-id: d44a08c6e64c7a49509687dc389b57de1cbb878c
    jerryzh168 authored and hwangdeyu committed Dec 23, 2020
    14bb5d0
  98. .github: Add action workflow to update S3 HTMLS (pytorch#49509)

    Summary:
    Successful run: https://github.com/pytorch/pytorch/runs/1572315901
    
    Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
    
    Pull Request resolved: pytorch#49509
    
    Reviewed By: walterddr
    
    Differential Revision: D25619133
    
    Pulled By: seemethere
    
    fbshipit-source-id: 092ab12535f3bf4fc85bbfc690d3f5b10a5f8791
    seemethere authored and hwangdeyu committed Dec 23, 2020
    cfd0951
  99. [FileStore] Implemented numKeys and Added Tests (pytorch#49556)

    Summary:
    Pull Request resolved: pytorch#49556
    
    Implemented the missing Store functionality (specifically numKeys) in the FileStore.
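    A minimal usage sketch of the new API (the file path here is illustrative):
    
    ```python
    import torch.distributed as dist
    
    store = dist.FileStore("/tmp/filestore_example", 1)  # (file path, world size)
    store.set("first_key", "first_value")
    print(store.num_keys())  # counts the keys written to the underlying file
    ```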
    
    Test Plan: Added both C++ and Python tests to verify functionality.
    
    Reviewed By: jiayisuse
    
    Differential Revision: D25619001
    
    fbshipit-source-id: 9146d0da9e0903622be3035880f619bbb2cc3891
    osalpekar authored and hwangdeyu committed Dec 23, 2020
    1c90741
  100. [FileStore] Updating Docs to Reflect FileStore changes (pytorch#49557)

    Summary:
    Pull Request resolved: pytorch#49557
    
    Updating the PyTorch docs to reflect that FileStore now supports the
    num_keys API. Also included a note describing the behavior of the API.
    
    Test Plan: built and rendered the docs.
    
    Reviewed By: jiayisuse
    
    Differential Revision: D25619000
    
    fbshipit-source-id: 6c660d7ceb32d1d61024df8394aff3fcd0b752c1
    osalpekar authored and hwangdeyu committed Dec 23, 2020
    9611cf3
  101. Revert D25445815: [te] Add fast log approximation based on sleef

    Test Plan: revert-hammer
    
    Differential Revision:
    D25445815 (pytorch@1329066)
    
    Original commit changeset: 20696eacd12a
    
    fbshipit-source-id: 38830a6abd16260d60e5dd9a5594e65736a9c782
    ezyang authored and hwangdeyu committed Dec 23, 2020
    ddddf93
  102. Add dict comprehension (pytorch#47774)

    Summary: Pull Request resolved: pytorch#47774
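    A small sketch of the newly supported construct in TorchScript:
    
    ```python
    from typing import Dict, List
    
    import torch
    
    @torch.jit.script
    def square_map(xs: List[int]) -> Dict[int, int]:
        # dict comprehensions now compile under TorchScript
        return {x: x * x for x in xs}
    
    print(square_map([1, 2, 3]))  # {1: 1, 2: 4, 3: 9}
    ```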
    
    Test Plan: Imported from OSS
    
    Reviewed By: pbelevich
    
    Differential Revision: D25615464
    
    Pulled By: ansley
    
    fbshipit-source-id: 10bba6f70e812fa580cbbbf097e93de7142484cc
    Ansley Ussery authored and hwangdeyu committed Dec 23, 2020
    0e10eb7
  103. Revert D25547962: [PyTorch] Make tls_local_dispatch_key_set inlineabl…

    …e (reapply)
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25547962 (pytorch@6f928a4)
    
    Original commit changeset: 58424b1da230
    
    fbshipit-source-id: 10ff9f45f6587f67e1c88886f977930b4f7e396a
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    4e1b7d2
  104. Revert D25546409: [PyTorch] Use .sizes() isntead of .size() in cat_se…

    …rial_kernel_impl
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25546409 (pytorch@953f992)
    
    Original commit changeset: 196034716b6e
    
    fbshipit-source-id: 0e80f06a98c2842d2f11db7057ffcdcaea85f3bf
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    7c49006
  105. Revert D25545777: [PyTorch] Use .sizes() instead of .size() in _cat_o…

    …ut_cpu
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25545777 (pytorch@c1879b5)
    
    Original commit changeset: b2714fac95c8
    
    fbshipit-source-id: f534f8fc312943f1e6ba3d4029d6cf69b006aca8
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    917cdeb
  106. Revert D25544731: [PyTorch] Avoid extra Tensor refcounting in _cat_ou…

    …t_cpu
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25544731 (pytorch@1a05104)
    
    Original commit changeset: 7b9656d0371a
    
    fbshipit-source-id: 0f7ea74eca282cadf269bbd284d59650a431ed65
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    c04718b
  107. Revert D25542799: [PyTorch] Merge CoinflipTLS into RecordFunctionTLS

    Test Plan: revert-hammer
    
    Differential Revision:
    D25542799 (pytorch@9ce1df0)
    
    Original commit changeset: 310f9fd15710
    
    fbshipit-source-id: 51777914422a560e94430a786c86f5de4007a00b
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
    0ec5fb3
  108. [te][reapply] Add fast log approximation based on sleef (pytorch#49575)

    Summary:
    Pull Request resolved: pytorch#49575
    
    This is a fast log implementation.
    
    benchmark:
    
    ```
    buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
    ```
    
    Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat
    
    Reviewed By: bertmaher
    
    Differential Revision: D25627157
    
    fbshipit-source-id: a4920f4f4005ce617d372b375e790ca966275cd9
    bwasti authored and hwangdeyu committed Dec 23, 2020
    309d517
  109. [ddp launch] solve zombie problem (pytorch#49305)

    Summary:
    I was exhausted by having to hunt down zombies when working with the ddp launcher, so this PR solves the various zombie issues.
    
    This PR addresses 2 distinct zombie scenarios caused by ddp launch.py:
    
    1. When the main process is killed, the child processes aren't killed and continue running
    2. When any of the child processes dies (e.g. OOM), the rest of the children and the parent keep running, but are really stuck
    
    To solve these problems this PR switches from `wait` to `poll` and uses signal handlers.
    
    The main problem with `wait()` was that it's not async: I had the 2nd process OOM, and the code was stuck waiting for the first process to finish, which will never happen since the first process is now blocked waiting for the 2nd process - a sort of deadlock. My 2nd card is smaller than the first one, so it occasionally OOMs.
    
    Using `asyncio` would probably be the cleanest solution, but as it's relatively new in python, perhaps polling is good enough.
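    The shape of the poll-based approach, heavily simplified (a sketch of the pattern, not the exact launch.py code):
    
    ```python
    import signal
    import subprocess
    import sys
    import time
    
    processes = []  # subprocess.Popen objects for the workers
    
    def sigkill_handler(signum, frame):
        for p in processes:
            print(f"Killing subprocess {p.pid}")
            p.kill()
        sys.exit(1)
    
    # Scenario 1: parent is killed -> handler kills all children before exiting.
    signal.signal(signal.SIGINT, sigkill_handler)
    signal.signal(signal.SIGTERM, sigkill_handler)
    
    # Scenario 2: a child dies (e.g. OOM) -> the non-blocking poll() notices,
    # and we take down the whole group instead of deadlocking in wait().
    alive = list(processes)
    while alive:
        for p in list(alive):
            if p.poll() is not None:  # returns None while still running
                if p.returncode != 0:
                    sigkill_handler(signal.SIGTERM, None)
                alive.remove(p)
        time.sleep(1)
    ```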
    
    I wrote this little script to reproduce the 2 problematic scenarios and a normal run; it does 3 different things according to the `--mode` arg:
    
    - `oom` - causes the 2nd process to exit prematurely, emulating OOM
    - `clean-finish` - just exit normally in both processes
    - `False` (no arg) - just keep on running, emulating multiple normally running processes
    
    ```
    # oom.py
    import argparse
    from time import sleep
    import sys
    
    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--local_rank", default=False, type=int)
        parser.add_argument("--mode", default=False, type=str)
        args, _ = parser.parse_known_args()
    
        print(f"{args.local_rank} is starting")
        sleep(3)
    
        if args.mode == "oom":
            # emulate OOM in 2nd card
            if args.local_rank == 1:
                raise RuntimeError("OOM")
    
        if args.mode == "clean-finish":
            sleep(1)
            print(f"{args.local_rank} is cleanly finishing")
            sys.exit(0)
    
        while (True):
            # emulate long running process
            print(f"{args.local_rank} is running")
            sleep(1)
    
    if __name__ == "__main__":
        main()
    ```
    
    Let's begin:
    
    ###  1. Normal execution
    
    ```
    python -m torch.distributed.launch --nproc_per_node=2 ./oom.py --mode=clean-finish
    ```
    
    All the processes exit upon completion - I won't bother pasting the log here - just testing that my code didn't break normal execution.
    
    ### 2. OOM
    
    ```
    python -m torch.distributed.launch --nproc_per_node=2 ./oom.py --mode=oom
    ```
    
    ```
    POLLING FOR 17547
    POLLING FOR 17548
    0
    0 is starting
    1
    1 is starting
    POLLING FOR 17547
    POLLING FOR 17548
    POLLING FOR 17548
    POLLING FOR 17547
    POLLING FOR 17547
    POLLING FOR 17548
    0 is running
    Traceback (most recent call last):
      File "./oom.py", line 33, in <module>
        main()
      File "./oom.py", line 20, in main
        raise RuntimeError("OOM")
    RuntimeError: OOM
    POLLING FOR 17548
    process 17548 is no more
    Killing subprocess 17547
    Killing subprocess 17548
    Traceback (most recent call last):
      File "/home/stas/anaconda3/envs/main-38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/stas/anaconda3/envs/main-38/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 341, in <module>
        main()
      File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 327, in main
        sigkill_handler(signal.SIGTERM, None) # not coming back
      File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
        raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/stas/anaconda3/envs/main-38/bin/python', '-u', './oom.py', '--local_rank=1', '--mode=oom']' returned non-zero exit status 1.
    ```
    
    All processes exited and the trace was printed
    
    ### 3. Exit on SIGINT/SIGTERM
    
    If I start a process and then realize I made a mistake, I want to be able to kill it cleanly, and if any sub-processes have already been spawned, I want them killed too. Here the signal handler takes care of trapping SIGTERM/SIGINT.
    
    ```
    python -m torch.distributed.launch --nproc_per_node=2 ./oom.py
    ```
    
    Here the processes emulate a long normal run.
    
    So let's Ctrl-C the process as soon as it started and see:
    
    ```
    POLLING FOR 18749
    POLLING FOR 18750
    0
    0 is starting
    1
    1 is starting
    POLLING FOR 18749
    POLLING FOR 18750
    POLLING FOR 18750
    POLLING FOR 18749
    POLLING FOR 18749
    POLLING FOR 18750
    0 is running
    1 is running
    POLLING FOR 18750
    POLLING FOR 18749
    0 is running
    1 is running
    ^CTraceback (most recent call last):
    Killing subprocess 18749
    Traceback (most recent call last):
      File "./oom.py", line 33, in <module>
      File "./oom.py", line 33, in <module>
    Killing subprocess 18750
    Parent got kill signal=SIGINT, exiting
    ```
    
    all processes got killed
    
    --------------------------------
    
    So this covered the 2 problematic cases and 1 normal case
    
    Notes:
    - we could probably switch to `sleep(3)` - `1` is probably too fast
    - all the debug prints will be removed once you are happy - I left them so that it's easier for you to test that my PR does the right thing.
    
    Thank you!
    
    Pull Request resolved: pytorch#49305
    
    Reviewed By: izdeby
    
    Differential Revision: D25565617
    
    Pulled By: rohan-varma
    
    fbshipit-source-id: 1ea864113f283d4daac5eef1131c8d745aae4c99
    stas00 authored and hwangdeyu committed Dec 23, 2020
    723010e
  110. Add more list peephole idioms (pytorch#48268)

    Summary: Pull Request resolved: pytorch#48268
    
    Test Plan: Imported from OSS
    
    Reviewed By: jamesr66a
    
    Differential Revision: D25104617
    
    Pulled By: eellison
    
    fbshipit-source-id: b41c03d5da6e9b88acf21a859f61c5c70608c150
    Elias Ellison authored and hwangdeyu committed Dec 23, 2020
    4c9c61e
  111. disable concat nested namespace check (pytorch#49571)

    Summary:
    Pull Request resolved: pytorch#49571
    
    Disable the nested namespace check, since the OSS standard is
    ```
    set(CMAKE_CXX_STANDARD 14)
    ```
    and it's currently causing confusion in clang-tidy internally, e.g. D25214452.
    
    Test Plan: clang-tidy
    
    Reviewed By: xuzhao9
    
    Differential Revision: D25626392
    
    fbshipit-source-id: 1fb472c89ebe9b83718ae27f2c1d77b8b2412b5e
    Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020
    29e296d
  112. Add type inference for dequantization.tensors (pytorch#49517)

    Summary:
    Pull Request resolved: pytorch#49517
    
    We should add concrete type info for Tensor List case as well.
    
    Test Plan: ci
    
    Reviewed By: qizzzh
    
    Differential Revision: D25599223
    
    fbshipit-source-id: 3614e9ec25fc963a8d6a0bd641735fcca6c87032
    houseroad authored and hwangdeyu committed Dec 23, 2020
  113. FLOPS Roofline Analysis Feature for PyTorch Profiler. (pytorch#46506)

    Summary:
    FLOPs Roofline Analysis Feature for PyTorch Profiler.
    
    Currently, PyTorch Profiler lacks the ability to measure the FLOPs of operators, such as mm and conv.
    FLOPs are helpful to estimate the computation complexity of the operators.
    For now, we use input shapes to estimate the number of floating point operations.
    In the future, we may compute this information by tracking hardware counters.
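    
    As a sketch of the shape-based estimate (the counting convention here is inferred from the aten::mm row in the Test Plan table below):
    
    ```
    # Count one multiply-accumulate per output element per inner-dimension
    # step for a [m, k] x [k, n] matmul, then divide by CPU time for MFLOPS.
    m, k, n = 1320, 243, 243          # the aten::mm input shapes below
    flop = m * k * n                  # 77,944,680 MACs
    cpu_time_s = 79.204e-3            # "CPU total" for aten::mm below
    print(flop / cpu_time_s / 1e6)    # ~984 MFLOPS, matching the table
    ```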
    
    Pull Request resolved: pytorch#46506
    
    Test Plan:
    Run `python test/test_profiler_flops.py -k test_flops`. The test will print a profiler table with "FLOPS" column, like the following:
    ```
    ----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
                            Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls                                   Input Shapes        MFLOPS
    ----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
                    aten::matmul         0.06%      57.653us        82.97%      79.310ms      79.310ms             1                 [[40, 33, 1, 243], [243, 243]]            --
                        aten::mm        82.84%      79.186ms        82.86%      79.204ms      79.204ms             1                      [[1320, 243], [243, 243]]       984.323
                    aten::conv2d         0.04%      36.345us        16.06%      15.347ms      15.347ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [  44065010.318
               aten::convolution         0.02%      16.016us        16.02%      15.310ms      15.310ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
              aten::_convolution         0.07%      63.855us        16.00%      15.294ms      15.294ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
        aten::mkldnn_convolution        15.89%      15.188ms        15.93%      15.225ms      15.225ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
                      aten::relu         0.10%      98.223us         0.64%     612.157us     306.079us             2                             [[40, 33, 1, 243]]            --
                 aten::threshold         0.49%     465.416us         0.54%     513.934us     256.967us             2                     [[40, 33, 1, 243], [], []]            --
                      aten::add_         0.29%     279.301us         0.29%     279.301us     279.301us             1                  [[40, 33, 1, 243], [243], []]            --
                     aten::empty         0.10%      99.113us         0.10%      99.113us      24.778us             4                       [[], [], [], [], [], []]            --
    ----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
    Self CPU time total: 95.584ms
    
    .
    ----------------------------------------------------------------------
    Ran 1 test in 0.176s
    ```
    
    For now, we only provide FLOPs calculation for aten::conv2d and aten::mm operators.
    
    Reviewed By: ezyang
    
    Differential Revision: D25214452
    
    Pulled By: xuzhao9
    
    fbshipit-source-id: 0ae841bd8dbdeb032346dc3d9d38e19875aa1da3
    xuzhao9 authored and hwangdeyu committed Dec 23, 2020
  114. Disables method variant grad and grad grad checks (pytorch#49576)

    Summary:
    These are redundant with the functional variant checks and can be very costly, as some grad and gradgrad testing takes minutes to run per variant. Maybe in the future we'll add them back for operations with divergent method implementations.
    
    Pull Request resolved: pytorch#49576
    
    Reviewed By: albanD, ngimel
    
    Differential Revision: D25631691
    
    Pulled By: mruberry
    
    fbshipit-source-id: 247f750979d9dafab2454cdbfa992a2aa6da724a
    Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
  115. Use store based barrier in init_process_group. (pytorch#49419)

    Summary:
    Pull Request resolved: pytorch#49419
    
    As described in pytorch#48110, the
    newly introduced `barrier()` in `init_process_group` messes up NCCL
    communicator state since it uses a bunch of default devices to perform an
    allreduce which simulates a barrier(). As a result, subsequent NCCL operations
    might not behave as expected.
    ghstack-source-id: 118861776
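    
    A minimal sketch of the store-based alternative (hypothetical key name; the real implementation adds timeouts and error handling). Each rank bumps a shared counter in the c10d store and spins until all ranks have checked in, so no collective, and hence no NCCL communicator, is involved:
    
    ```
    import time
    
    def store_based_barrier(store, world_size):
        # Each rank increments a shared counter...
        store.add("store_based_barrier_key", 1)
        # ...then waits until every rank has checked in.
        while int(store.get("store_based_barrier_key")) < world_size:
            time.sleep(0.01)
    ```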
    
    Test Plan:
    1) unit test added.
    2) waitforbuildbot
    
    Reviewed By: mrshenli
    
    Differential Revision: D25566550
    
    fbshipit-source-id: ab083b67b634d7c515f4945deb228f959b27c936
    pritamdamania authored and hwangdeyu committed Dec 23, 2020
  116. Fix CustomAutogradTest.ReentrantPriority rerun failures (pytorch#49581)

    Summary:
    Clear the static variable at the end of the test to ensure the test passes after re-runs
    
    Pull Request resolved: pytorch#49581
    
    Test Plan:
    `./bin/test_api "--gtest_filter=CustomAutogradTest.ReentrantPriority" --gtest_repeat=50`
    Before the change all subsequent runs of the test failed with
    ```
    ../test/cpp/api/autograd.cpp:681: Failure
    Expected equality of these values:
      order.size()
        Which is: 310
      10
    ```
    
    Reviewed By: mrshenli
    
    Differential Revision: D25632374
    
    Pulled By: malfet
    
    fbshipit-source-id: 4814d22b5dff15e1b38a0187e51070771fd58370
    malfet authored and hwangdeyu committed Dec 23, 2020
  117. Set USE_KINETO=1 (pytorch#49201)

    Summary:
    Pull Request resolved: pytorch#49201
    
    This unblocks kineto profiler for 1.8 release.
    This PR supercedes pytorch#48391
    Note: this will somewhat increase the size of Linux server binaries, because
    we add libkineto.a and libcupti_static.a:
    -rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a
    -rw-r--r-- 1 root root 13699658 Nov 13  2019 /usr/local/cuda/lib64/libcupti_static.a
    
    Test Plan:
    CI
    pytorch#48391
    
    Imported from OSS
    
    Reviewed By: ngimel
    
    Differential Revision: D25480770
    
    fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c
    Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020
  118. Revert D25480770: Set USE_KINETO=1

    Test Plan: revert-hammer
    
    Differential Revision:
    D25480770 (pytorch@1a92802)
    
    Original commit changeset: 037cd774f554
    
    fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1
    Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020
  119. Support integral types for kAbs in SimpleIREvaluator (pytorch#49357)

    Summary:
    Pull Request resolved: pytorch#49357
    
    This is a follow-up fix for PR pytorch#48679, where the previous PR
    added support for integer inputs to aten::abs by promoting integers to
    float and then demoting the result back to integers. This PR supports
    integer inputs to aten::abs more efficiently in the SimpleIREvaluator
    by implementing integer inputs for kAbs (renamed from kFabs).
    - Rename kFabs to kAbs
    - Add support for integer input to kAbs in SimpleIREvaluator (note that
    llvm_codegen and cuda_codegen already support integer inputs to kAbs)
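    
    From the Python side, the visible behavior is simply that abs on an integral tensor stays integral; this change lets the interpreter compute it directly instead of via a float round-trip:
    
    ```
    import torch
    
    x = torch.tensor([-3, -1, 2])      # int64
    print(torch.abs(x))                # tensor([3, 1, 2]), still int64
    ```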
    
    Test Plan:
    - `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py
    TestTEFuser.test_unary_ops`
    - `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops`
    
    Imported from OSS
    
    Reviewed By: eellison
    
    Differential Revision: D25545791
    
    fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
    Peng Wu authored and hwangdeyu committed Dec 23, 2020
  120. Add op bench for caffe2 quantile op (pytorch#49598)

    Summary:
    Pull Request resolved: pytorch#49598
    
    Add op bench for caffe2 quantile op
    
    Test Plan: `buck run mode/opt caffe2/benchmarks/operator_benchmark/c2:quantile_op_test -- --warmup_iterations=10000  --iterations=10000`
    
    Reviewed By: radkris-git
    
    Differential Revision: D25590085
    
    fbshipit-source-id: 0db58ac87c595b2bf2958f6299a1bf2ccea019db
    ShijunK authored and hwangdeyu committed Dec 23, 2020
  121. add checkout PR tip step for quick checks (pytorch#49590)

    Summary: Pull Request resolved: pytorch#49590
    
    Reviewed By: samestep
    
    Differential Revision: D25633341
    
    Pulled By: walterddr
    
    fbshipit-source-id: 6e8db1f628f562d7632390bdb7788437cb1bf63d
    Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020
  122. Refactor VmapPhysicalView::newLogicalToPhysical (pytorch#49482)

    Summary:
    Pull Request resolved: pytorch#49482
    
    Motivation
    ==========
    Batching rules always invoke newLogicalToPhysical at the very end to turn
    a physical tensor into a logical BatchedTensor (an example is below):
    ```
    Tensor select_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
      auto grad_physical = MultiBatchVmapTransform::logicalToPhysical(grad);
      auto grad_input = at::zeros(grad_physical.getPhysicalShape(input_sizes), grad.options());
      auto physical_dim = getGradInputPhysicalDim(dim, input_sizes, grad_physical.numBatchDims());
      grad_input.select(physical_dim, index).copy_(grad_physical.tensor());
      return grad_physical.newLogicalFromPhysical(grad_input);
    }
    ```
    However, albanD noted that this function is confusing and ambiguous
    because it's unclear which physical tensor is being turned into a logical one
    (in this case, grad_physical is a VmapPhysicalView, but we're really transforming
    grad_input and returning it).
    pytorch#44505 (comment)
    
    I didn't want to make too many changes to the batching rule API because
    I think we'll change it even more in the future, but this PR attempts to
    remove the ambiguity by applying one of the suggestions in
    pytorch#44505 (comment)
    
    This PR
    =======
    
    The diagnosis of the problem is that we were conflating
    "VmapPhysicalView", which maps logical attributes on a Tensor (like
    dimension and shape) to physical attributes, with the reverse
    physical-to-logical map. This PR creates a new VmapPhysicalToLogicalMap
    object that handles the latter.
    
    Instead of calling `grad_physical.newLogicalFromPhysical(grad_input)`,
    an author of batching rules should now retrieve the VmapPhysicalToLogicalMap
    object and apply it to their physical input. So the above code becomes:
    ```
    grad_physical.getPhysicalToLogicalMap().apply(grad_input)
    ```
    
    I've also moved VmapPhysicalView::makeLogicalFromPhysicalListInplace
    to VmapPhysicalToLogicalMap::applyInplace.
    
    Test Plan
    =========
    wait for tests
    
    Test Plan: Imported from OSS
    
    Reviewed By: mrshenli
    
    Differential Revision: D25592645
    
    Pulled By: zou3519
    
    fbshipit-source-id: 9c6ede9901ec6b70e5763193064658a8f91e6d48
    zou3519 authored and hwangdeyu committed Dec 23, 2020
  123. fixed the first line of torch.rst to match the __init__.py file's first line (pytorch#49584)
    
    Summary:
    Changed the first line of the torch.rst file to match that of the __init__.py file
    
    Fixes pytorch#49228
    
    Pull Request resolved: pytorch#49584
    
    Reviewed By: VitalyFedyunin
    
    Differential Revision: D25639260
    
    Pulled By: mrshenli
    
    fbshipit-source-id: a0bafd945ff92115eed932662feedc46d29dfaab
    jonykarki authored and hwangdeyu committed Dec 23, 2020
  124. Fix Module backward hooks for all Tensor inputs/outputs (pytorch#46163)

    Summary:
    Fixes pytorch#598
    
    This is BC-breaking, as we now explicitly don't call the hook when there are no Tensors at the top level of the output.
    This feature was not working anyway, as the returned grad_input/grad_output were wrong (not respecting the output structure, and wrong inputs for multi-Node Modules).
    
    This is also BC-breaking as we now report the correct gradients for `nn.Module`s that contain multiple autograd `Node`s, while we used to return bad results before.
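    
    For illustration, a toy example of the hook signature involved (not code from this PR): grad_input/grad_output correspond to the Tensors at the top level of the module's inputs/outputs.
    
    ```
    import torch
    
    def hook(module, grad_input, grad_output):
        # grad_output: gradients w.r.t. the module's Tensor outputs
        # grad_input: gradients w.r.t. its Tensor inputs
        print([None if g is None else g.shape for g in grad_input])
    
    lin = torch.nn.Linear(4, 2)
    lin.register_backward_hook(hook)
    lin(torch.randn(3, 4, requires_grad=True)).sum().backward()
    ```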
    
    Pull Request resolved: pytorch#46163
    
    Reviewed By: ailzhang, mruberry
    
    Differential Revision: D24894180
    
    Pulled By: albanD
    
    fbshipit-source-id: e1b5d193d2818eb2f51e2a2722c7405c8bd13c2b
    albanD authored and hwangdeyu committed Dec 23, 2020
  125. Remove deadlines for Caffe2 hypothesis_test when running on GPU. (pytorch#49591)
    
    Summary:
    Pull Request resolved: pytorch#49591
    
    A bunch of these tests are marked flaky, and have been since time immemorial. (Read: as far back as Buck will build.) However, closer inspection reveals that they fail if and only if run on a GPU worker. What seems to be going on is that there are more jobs than GPUs, so the contention causes waits which register as timeouts on the test.
    
    This diff is kind of hacky, but it basically just drops deadlines if a GPU is present. Because Caffe2 is going away I'm not too terribly concerned about a beautiful solution, but we may as well keep some test coverage if it's easy.
    
    CC Sebastian, Ilia, Min, and Hongzheng who also have tasks for what seems to be the same flakiness.
    
    Test Plan: Turn the tests back on and see if they fall over. (The failure repros reliably on an OnDemand GPU and is fixed by this change, so it's not really just a hail Mary.)
    
    Reviewed By: ngimel
    
    Differential Revision: D25632981
    
    fbshipit-source-id: 43dcce416fea916ba91f891e9e5b59b2c11cca1a
    Taylor Robie authored and hwangdeyu committed Dec 23, 2020
  126. [FX] Enforce args is tuple and kwargs is dict (pytorch#49526)

    Summary: Pull Request resolved: pytorch#49526
    
    Test Plan: Imported from OSS
    
    Reviewed By: Chillee
    
    Differential Revision: D25606115
    
    Pulled By: jamesr66a
    
    fbshipit-source-id: f2a21d02a2cf8c08cbd618efc5a6a28d34806851
    James Reed authored and hwangdeyu committed Dec 23, 2020
  127. Renaming CAFFE2_API to TORCH_API (pytorch#49496)

    Summary:
    Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.
    
    Manually edited some references of the removed `CAFFE2_API`:
    * `CONTRIBUTING.md`
    * `caffe2/proto/CMakeLists.txt`
    * `cmake/ProtoBuf.cmake`
    * `c10/macros/Export.h`
    * `torch/csrc/WindowsTorchApiMacro.h`
    
    Pull Request resolved: pytorch#49496
    
    Reviewed By: malfet, samestep
    
    Differential Revision: D25600726
    
    Pulled By: janeyx99
    
    fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
    janeyx99 authored and hwangdeyu committed Dec 23, 2020
  128. [PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (pytorch#49385)
    
    Summary:
    Pull Request resolved: pytorch#49385
    
    Currently, the API to export operator lists accepts a `torch::jit::Module` object, and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optimized and exported for mobile.
    
    What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that instead.
    
    Also updated the logic in `converter`.
    
    ### Before this change:
    1. Get operator List from Torch Script Model
    2. Convert to bytecode mobile model
    
    ### After this change:
    1. Convert to bytecode mobile model
    2. Use this converted mobile model to get the list of operators for each method on the model
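    
    A rough sketch of the before/after flow in Python terms (the new bytecode.pkl-based extraction itself lives on the C++ side, so this only gestures at it):
    
    ```
    import torch
    
    class M(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x) + 1
    
    m = torch.jit.script(M())
    # Before: op list derived from the TorchScript model, which may drift
    # from what the optimized mobile model actually calls.
    ops = torch.jit.export_opnames(m)
    # After: convert to the mobile format first, then read the root
    # operators out of the converted model itself.
    m._save_for_lite_interpreter("m.ptl")
    ```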
    
    ghstack-source-id: 118796752
    
    Test Plan:
    Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also make `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from.
    
    Verified that the list of operators produced before and after this change for an example model (segmentation) are the same.
    
    {P147863234}
    
    Also verified that the operator list for the BI-Xray model is different (we have been having problems with missing operators for this one): {P154903132}
    
    Reviewed By: iseeyuan
    
    Differential Revision: D24690094
    
    fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
    dhruvbird authored and hwangdeyu committed Dec 23, 2020
  129. New profiler API (pytorch#48280)

    Summary:
    Pull Request resolved: pytorch#48280
    
    Adding new API for the kineto profiler that supports enable predicate
    function
    
    Test Plan: unit test
    
    Reviewed By: ngimel
    
    Differential Revision: D25142220
    
    Pulled By: ilia-cher
    
    fbshipit-source-id: c57fa42855895075328733d7379eaf3dc1743d14
    Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020
  130. Adding support for bitwise augassignment operators (pytorch#44621)

    Summary:
    ========
    Fixes pytorch#42915
    
    This commit adds support for bitwise shorthands in TorchScript, i.e.: |=, &=, ^=, <<=, >>=, **=
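    
    For example, after this change a scripted function can use these shorthands directly:
    
    ```
    import torch
    
    @torch.jit.script
    def f(x: int, y: int) -> int:
        x |= y    # bitwise or
        x &= 7    # bitwise and
        x ^= 1    # bitwise xor
        x <<= 1   # left shift
        x >>= 1   # right shift
        return x
    
    print(f(4, 2))  # 7
    ```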
    
    Testing:
    ======
    This commit also adds test for the above fix in test_jit.py
    The test can be invoked by
    pytest -k augassign test/test_jit.py
    
    Here is a snapshot of the testing:
    <img width="1238" alt="image" src="https://user-images.githubusercontent.com/70345919/93105141-8f9f5300-f663-11ea-836b-3b52da6d2be5.png">
    
    Pull Request resolved: pytorch#44621
    
    Reviewed By: mrshenli
    
    Differential Revision: D23906344
    
    Pulled By: nikithamalgifb
    
    fbshipit-source-id: 4c93a7430a625f698b163609ccec15e51417d564
    nikithamalgifb authored and hwangdeyu committed Dec 23, 2020
  131. Test pipeline parallelism works with DDP. (pytorch#48470)

    Summary:
    Pull Request resolved: pytorch#48470
    
    Adding a unit test to verify this works as expected. Note that this
    doesn't work with the other checkpointing modes of the pipe; checkpoint=never
    needs to be set for it to work.
    ghstack-source-id: 118820806
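    
    A rough sketch of the combination being tested (assumes the process group and RPC framework are already initialized; not the test's exact code):
    
    ```
    import torch.nn as nn
    from torch.distributed.pipeline.sync import Pipe
    
    fc1 = nn.Linear(16, 8).cuda(0)
    fc2 = nn.Linear(8, 4).cuda(1)
    # checkpoint="never" is required for the DDP combination to work.
    model = Pipe(nn.Sequential(fc1, fc2), chunks=4, checkpoint="never")
    model = nn.parallel.DistributedDataParallel(model)
    ```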
    
    Test Plan: waitforbuildbot
    
    Reviewed By: mrshenli
    
    Differential Revision: D25182668
    
    fbshipit-source-id: 85e69e338bf388c132a303ad93e29ec2cc4a0ed8
    pritamdamania authored and hwangdeyu committed Dec 23, 2020
  132. [FX] Emit named tuple construction node when NamedTuple appears as an arg (pytorch#49553)
    
    Summary: Pull Request resolved: pytorch#49553
    
    Test Plan: Imported from OSS
    
    Reviewed By: zdevito
    
    Differential Revision: D25618577
    
    Pulled By: jamesr66a
    
    fbshipit-source-id: 042f742f9ca02e59bbceda97bfcf47f9bac07873
    James Reed authored and hwangdeyu committed Dec 23, 2020
  133. [package] implicitly extern stdlib before mocking (pytorch#49306)

    Summary:
    Pull Request resolved: pytorch#49306
    
    This allows you to mock out everything except for specific patterns while
    still correctly externing the python standard library. This makes it less
    likely that you will need to override require_module.
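    
    A sketch of the intended usage (API details may differ from the version in this PR; `my_model` is hypothetical): a broad mock pattern no longer swallows the standard library, because stdlib modules are implicitly externed first.
    
    ```
    from torch.package import PackageExporter
    
    with PackageExporter("model_package.zip") as exporter:
        # stdlib modules are implicitly externed before this pattern applies
        exporter.mock("**")
        exporter.save_pickle("model", "model.pkl", my_model)
    ```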
    
    Test Plan: Imported from OSS
    
    Reviewed By: suo
    
    Differential Revision: D25526212
    
    Pulled By: zdevito
    
    fbshipit-source-id: 7339f4c7f12af883496f79de95e57d452bb32dc2
    zdevito authored and hwangdeyu committed Dec 23, 2020
  134. Upload test times to S3 (pytorch#49190)

    Summary:
    This PR currently just modifies the `test/print_test_stats.py` script (run in the `pytorch_linux_test` job) so that now it uploads test times to the new `ossci-metrics` S3 bucket (rather than just to Scribe) if passed the `--upload-to-s3` parameter.
    
    The next step is to add an additional step to that `pytorch_linux_test` job which checks if it's being run on a PR, and if so, finds the `master` commit to compare against (similar to what's done in the now-unused `.jenkins/pytorch/short-perf-test-{c,g}pu.sh` scripts) and adds test time info to the Dr CI comment if the PR is significantly different from the base revision.
    
    Pull Request resolved: pytorch#49190
    
    Test Plan:
    An "integration test" would be to just look in [the `ossci-metrics` S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/ossci-metrics) to confirm that the CI run(s) for this PR did indeed upload their test time data successfully.
    
    To test this locally, first make sure you have all the packages you need, such as these:
    ```
    $ conda install -c anaconda boto3
    $ conda install -c conda-forge unittest-xml-reporting
    ```
    Then run whatever tests you want; these are the ones I used for my local smoke test, for no particular reason:
    ```
    $ python test/test_spectral_ops.py --save-xml=/tmp/reports/spectral_ops
    ```
    Once the tests finish, run the script to upload their times to S3:
    ```
    $ CIRCLE_SHA1="$(git rev-parse HEAD)" CIRCLE_JOB=foo test/print_test_stats.py --upload-to-s3 /tmp/reports/spectral_ops
    ```
    Now check that they uploaded successfully:
    ```
    $ aws s3 cp "s3://ossci-metrics/test_time/$(git rev-parse HEAD)/foo/" /tmp/reports --recursive
    ```
    And that it's a valid `*.json.bz2` file:
    ```
    $ bzip2 -kdc /tmp/reports/*Z.json.bz2 | jq . | head -n21
    {
      "build_pr": null,
      "build_tag": null,
      "build_sha1": "e46f43621b910bc2f18dd33c08f5af18a542d5ed",
      "build_branch": null,
      "build_job": "foo",
      "build_workflow_id": null,
      "total_seconds": 0.9640000000000003,
      "suites": {
        "TestFFTCPU": {
          "total_seconds": 0.9640000000000003,
          "cases": [
            {
              "name": "test_fft_invalid_dtypes_cpu",
              "seconds": 0.022,
              "errored": false,
              "failed": false,
              "skipped": false
            },
            {
              "name": "test_istft_throws_cpu",
    ```
    
    Reviewed By: walterddr, malfet
    
    Differential Revision: D25618035
    
    Pulled By: samestep
    
    fbshipit-source-id: 4d8013859a38a49e5bba700c5134951ca1a9d8b7
    samestep authored and hwangdeyu committed Dec 23, 2020
  135. Cleanup APIs for pipeline parallelism. (pytorch#48630)

    Summary:
    Pull Request resolved: pytorch#48630
    
    1) Make torch.distributed.pipeline package public.
    2) Make several helper methods private.
    ghstack-source-id: 118820803
    
    Test Plan: waitforbuildbot
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25235688
    
    fbshipit-source-id: c32833ebf090ddbd4eaf06fcb5e3f9d421623a60
    pritamdamania authored and hwangdeyu committed Dec 23, 2020
  136. [torchscript] Fix constant propagation schemas (pytorch#49605)

    Summary: Pull Request resolved: pytorch#49605
    
    Test Plan: Imported from OSS
    
    Reviewed By: eellison
    
    Differential Revision: D25643157
    
    Pulled By: IvanKobzarev
    
    fbshipit-source-id: c5440622f6cf559afadca853e1eb7a9fbb8edf7f
    IvanKobzarev authored and hwangdeyu committed Dec 23, 2020
  137. Add sinc operator (pytorch#48740)

    Summary:
    Implements the sinc operator.
    See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html
    
    ![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png)
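    
    This computes the normalized sinc, sin(pi*x)/(pi*x), with the removable singularity at zero filled in as 1:
    
    ```
    import torch
    
    x = torch.tensor([0.0, 0.5, 1.0, 2.0])
    print(torch.sinc(x))  # ~ tensor([1.0000, 0.6366, 0.0000, 0.0000])
    ```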
    
    Pull Request resolved: pytorch#48740
    
    Reviewed By: ezyang
    
    Differential Revision: D25597565
    
    Pulled By: soulitzer
    
    fbshipit-source-id: 6dbcf282ee4eba34930bc9e5c85c0c5e79cf0322
    soulitzer authored and hwangdeyu committed Dec 23, 2020
  138. Output stacks (support for SVG visualization) (pytorch#48438)

    Summary:
    Pull Request resolved: pytorch#48438
    
    Outputting stacks in a format suitable for SVG visualization
    (e.g. with https://github.com/brendangregg/FlameGraph tool)
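    
    A sketch of the flow (model and file path are placeholders):
    
    ```
    import torch
    
    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
    inp = torch.randn(8, 64)
    with torch.autograd.profiler.profile(with_stack=True) as prof:
        model(inp)
    prof.export_stacks("/tmp/profiler_stacks.txt", "self_cpu_time_total")
    # then e.g.: flamegraph.pl /tmp/profiler_stacks.txt > perf.svg
    ```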
    
    Test Plan:
    python test/test_profiler.py -k test_export_stacks
    
    e.g. resnet18 (note: actual SVG is interactive):
    
    <img width="1193" alt="Screen Shot 2020-11-24 at 7 06 27 PM" src="https://user-images.githubusercontent.com/30845429/100178160-397f3500-2e88-11eb-81c4-34b19c5fcb87.png">
    
    Reviewed By: dzhulgakov
    
    Differential Revision: D25174270
    
    Pulled By: ilia-cher
    
    fbshipit-source-id: 6b60084071b209441805c468f5ff777318e42d1a
    Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020
  139. torch.reciprocal: promote integer inputs to float (pytorch#49102)

    Summary:
    Fixes pytorch#49091
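    
    For example, integral inputs now produce a floating point result:
    
    ```
    import torch
    
    x = torch.tensor([1, 2, 4])        # int64
    print(torch.reciprocal(x))         # tensor([1.0000, 0.5000, 0.2500])
    ```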
    
    Pull Request resolved: pytorch#49102
    
    Reviewed By: VitalyFedyunin
    
    Differential Revision: D25639541
    
    Pulled By: soulitzer
    
    fbshipit-source-id: 1dd360bd7b77f106d606143d8d3961610bac8cb7
    soulitzer authored and hwangdeyu committed Dec 23, 2020
  140. [NNC] Disable masked fill (pytorch#49622)

    Summary:
    There's a bug internally; disable as a quick fix before investigating.
    
    Pull Request resolved: pytorch#49622
    
    Test Plan:
    Imported from GitHub, without a `Test Plan:` line.
    build
    
    Reviewed By: zheng-xq, PursueHappinessDirectly
    
    Differential Revision: D25651897
    
    Pulled By: eellison
    
    fbshipit-source-id: dd1454f2ef7506d7844016128aa6320d7e69aa6e
    Elias Ellison authored and hwangdeyu committed Dec 23, 2020
  141. [Issue pytorch#46210] added torch.fx.len() to provide support for len(); added a test case for torch.fx.len() (pytorch#49532)
    
    Summary: Pull Request resolved: pytorch#49532
    
    Test Plan: Imported from OSS
    
    Reviewed By: jamesr66a
    
    Differential Revision: D25608804
    
    Pulled By: huiguoo
    
    fbshipit-source-id: 93ac02ab57db5d200d92443062286c34782ec0ef
    huiguoo authored and hwangdeyu committed Dec 23, 2020
  142. Inline coverage report combining/reporting (pytorch#49615)

    Summary:
    Instead of calling the coverage frontend, import the coverage module and call combine() and html_report() directly
    
    Fixes pytorch#49596 by not using a strict mode when combining those reports
    
    Pull Request resolved: pytorch#49615
    
    Reviewed By: seemethere
    
    Differential Revision: D25645196
    
    Pulled By: malfet
    
    fbshipit-source-id: be55b5c23a3569a331cbdf3f86d8c89bc27d5fe1
    malfet authored and hwangdeyu committed Dec 23, 2020
  143. [Gradient Compression] Implement the original layerwise PowerSGD (pytorch#49417)
    
    Summary:
    Pull Request resolved: pytorch#49417
    
    The existing implementation applies PowerSGD to a batch of flattened tensors, which is a coarse-grained compression. This hook is now renamed as "batched_powerSGD_hook".
    
    Now implement the original algorithm from the paper, which applies PowerSGD to each per-parameter tensor. This is a layerwise, fine-grained compression. Although this original implementation is slower, it is expected to achieve a higher accuracy, especially when the shapes of per-param tensors cannot be aligned.
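    
    At its core, the per-tensor compression is a low-rank factorization refined by power iteration; a minimal sketch of one step (not the hook's actual code, which also handles error feedback and the allreduce of the factors):
    
    ```
    import torch
    
    def powersgd_step(M, rank=2):
        # M: [m, n] gradient matrix; approximate M ~= P @ Q^T with thin
        # factors P: [m, rank], Q: [n, rank], so only P and Q need to be
        # communicated instead of the full matrix.
        Q = torch.randn(M.shape[1], rank)
        P = M @ Q
        P, _ = torch.qr(P)        # orthonormalize P
        Q = M.t() @ P
        return P, Q
    
    M = torch.randn(64, 128)
    P, Q = powersgd_step(M)
    approx = P @ Q.t()            # decompressed gradient
    ```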
    
    Also add a test in distributed_test.py.
    
    Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
    ghstack-source-id: 118921275
    
    Test Plan:
    buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
    
    buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25511543
    
    fbshipit-source-id: 19ef188bc2d4c7406443c8fa233c1f2c2f27d93c
    Yi Wang authored and hwangdeyu committed Dec 23, 2020
  144. Improve documentation for pipeline parallelism. (pytorch#48638)

    Summary:
    Pull Request resolved: pytorch#48638
    
    Polishing up some of the docs for the main `Pipe` class and its
    `forward` method.
    ghstack-source-id: 118820804
    
    Test Plan: waitforbuildbot
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25237705
    
    fbshipit-source-id: ba3d8737b90a80024c827c0887fc56f14bf678b7
    pritamdamania authored and hwangdeyu committed Dec 23, 2020
  145. Add benchmark for torch.distributed.pipeline.sync.Pipe (pytorch#49577)

    Summary:
    Pull Request resolved: pytorch#49577
    
    Repurposing the benchmarking from
    https://github.com/facebookresearch/fairscale/blob/master/benchmarks/pipe.py
    and pulling in a stripped down version of the benchmark into PyTorch.
    
    Sample output:
    ```
    Running benchmark with args: Namespace(batch_size=8, checkpoint='never', chunks=4, host='localhost', max_batch=10, num_decoder_layers=10, num_devices=4)
    Number of parameters for model: 292833040
    | batch     1 | wps 3593.07 | loss 25.98 | ppl 192556591553.37
    | batch     2 | wps 4405.16 | loss 19.36 | ppl 256201548.33
    | batch     3 | wps 4404.98 | loss 23.56 | ppl 17111244076.37
    | batch     4 | wps 4413.25 | loss 27.11 | ppl 594561327825.83
    | batch     5 | wps 4408.53 | loss 25.92 | ppl 181277705101.33
    | batch     6 | wps 4385.64 | loss 24.92 | ppl 66592883598.50
    | batch     7 | wps 4434.11 | loss 24.75 | ppl 56113635884.68
    | batch     8 | wps 4441.25 | loss 24.88 | ppl 63666024212.82
    | batch     9 | wps 4425.49 | loss 25.35 | ppl 101959669008.98
    | batch    10 | wps 4421.05 | loss 25.34 | ppl 101597621863.94
    Peak memory usage for GPUs: cuda:0: 2.38GiB, cuda:1: 3.04GiB, cuda:2: 3.04GiB, cuda:3: 3.67GiB,
    ```
    ghstack-source-id: 118939686
    
    Test Plan: sentinel
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25628721
    
    fbshipit-source-id: 41c788eed4f852aef019aec18a84cb25ad254f3a
    pritamdamania authored and hwangdeyu committed Dec 23, 2020
  146. Bump tensorpipe version (pytorch#49599)

    Summary: Pull Request resolved: pytorch#49599
    
    Reviewed By: lw
    
    Differential Revision: D25639036
    
    Pulled By: mrshenli
    
    fbshipit-source-id: 595b396a01d7fa9049d88447ab9079e286637afe
    Lucas Hosseini authored and hwangdeyu committed Dec 23, 2020
  147. Fix lint (pytorch#49629)

    Summary:
    Fix lint on master
    
    Pull Request resolved: pytorch#49629
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25654199
    
    Pulled By: mrshenli
    
    fbshipit-source-id: 2ab5669ad47996c0ca0f9b6611855767d5af0506
    mrshenli authored and hwangdeyu committed Dec 23, 2020
  148. [quant][graphmode][fx] Allow user to specify qconfig for call_method (pytorch#49621)
    
    Summary:
    Pull Request resolved: pytorch#49621
    
    This adds support to configure qconfig for a call_method, e.g. x.chunk; this will help work around
    a problem in our internal model.
    
    TODO: since call_method is also a string and we flatten the qconfig, might need to resolve namespace conflict between
    call_method and module_name
    TODO: Add scope support to set the qconfig for call_method correctly with original qconfig
    
    Test Plan: Imported from OSS
    
    Reviewed By: vkuzo
    
    Differential Revision: D25651828
    
    fbshipit-source-id: 82d66b121d37c8274fd481b6a2e9f9b54c5ca73d
    jerryzh168 authored and hwangdeyu committed Dec 23, 2020
  149. Revert D25511543: [Gradient Compression] Implement the original layerwise PowerSGD
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D25511543 (pytorch@71f3399)
    
    Original commit changeset: 19ef188bc2d4
    
    fbshipit-source-id: a363641a059aeacc57684884998cf8fb7363d748
    mrshenli authored and hwangdeyu committed Dec 23, 2020
  150. [PyTorch Mobile] Preserve bundled input related methods when calling optimize_for_mobile (pytorch#49170)
    
    Summary:
    Pull Request resolved: pytorch#49170
    
    Added an extra step to **always** preserve the bundled inputs methods if they are present in the input module.
    
    Also added a check to see if all the methods in `preserved_methods` exist. If not, we will now throw an exception. This can hopefully stop hard-to-debug inputs from getting into downstream functions.
    
    ~~Add an optional argument `preserve_bundled_inputs_methods=False` to the `optimize_for_mobile` function. If set to be True, the function will now add three additional functions related with bundled inputs to be preserved: `get_all_bundled_inputs`, `get_num_bundled_inputs` and `run_on_bundled_input`.~~
    
    Test Plan:
    `buck test mode/dev //caffe2/test:mobile -- 'test_preserve_bundled_inputs_methods \(test_mobile_optimizer\.TestOptimizer\)'`
    
    or
    
    `buck test caffe2/test:mobile` to run some other related tests as well.
    
    Reviewed By: dhruvbird
    
    Differential Revision: D25463719
    
    fbshipit-source-id: 6670dfd59bcaf54b56019c1a43db04b288481b6a
    bearzx authored and hwangdeyu committed Dec 23, 2020
  151. Disable test on windows (pytorch#49636)

    Summary:
    Pull Request resolved: pytorch#49636
    
    test_export_stacks fails with permission errors
    
    Test Plan:
    CI
    
    Imported from OSS
    
    Reviewed By: robieta
    
    Differential Revision: D25654680
    
    fbshipit-source-id: 5689289e06eebc0686030f90ed56483a072b6850
    Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020
  152. Remove DataPtr extractor from CUDAFuture (pytorch#48840)

    Summary:
    Pull Request resolved: pytorch#48840
    
    The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams.
    
    This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor.
    
    In pytorch#48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to.
    
    In my opinion, this approach is just brilliant! Thanks wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus enormously simplifying the CUDAFuture and PythonFutureWrapper classes.
    ghstack-source-id: 118704935
    
    Test Plan: Unit tests
    
    Reviewed By: wanchaol
    
    Differential Revision: D25334355
    
    fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3
    lw authored and hwangdeyu committed Dec 23, 2020
  153. disable kthvalue overlap (pytorch#48254)

    Summary:
    Fixes pytorch#47934
    
    Pull Request resolved: pytorch#48254
    
    Reviewed By: bdhirsh
    
    Differential Revision: D25276689
    
    Pulled By: VitalyFedyunin
    
    fbshipit-source-id: a70774e31c269b41786170e99ec1ede42596ba7b
    guol-fnst authored and hwangdeyu committed Dec 23, 2020
  154. Resubmit: [Gradient Compression] Implement the original layerwise PowerSGD (pytorch#49639)
    
    Summary:
    Pull Request resolved: pytorch#49639
    
    Resubmit pytorch#49417 with a fix for distributed_test.
    
    The previous submission broke a multi-GPU test that runs on 4 GPUs. Since this test only runs on master, I couldn't detect it before the submission.
    
    The real diff is:
    pytorch@4ca1014
    
    This time I have verified that the previous failed test `pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test` could pass after creating a PR (pytorch#49651) from a separate branch:
    https://app.circleci.com/pipelines/github/pytorch/pytorch/253644/workflows/c1c02b70-0877-40e6-8b4c-61f60f6b70ed/jobs/9768079
    
    ghstack-source-id: 118969912
    
    Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
    
    Reviewed By: mrshenli
    
    Differential Revision: D25654961
    
    fbshipit-source-id: 2a45c8ceb9bdb54ff7309a8b66ec87e913e0150e
    Yi Wang authored and hwangdeyu committed Dec 23, 2020
  155. Updated derivative rules for complex svd and pinverse (pytorch#47761)

    Summary:
    Updated `svd_backward` to work correctly for complex-valued inputs.
    Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
    Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
    Added `svd` and `pinverse` to list of complex tests.
    
    References for complex-valued SVD differentiation:
    
    - https://giggleliu.github.io/2019/04/02/einsumbp.html
    - https://arxiv.org/abs/1909.02659
    
    The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
    https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/
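    
    For instance, a loss built from the singular values alone is gauge invariant, so its gradient is well defined:
    
    ```
    import torch
    
    a = torch.randn(3, 3, dtype=torch.complex128, requires_grad=True)
    u, s, v = torch.svd(a)
    s.sum().backward()            # singular values are gauge invariant
    print(a.grad.dtype)           # torch.complex128
    ```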
    
    The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).
    
    Ref. pytorch#33152
    
    Pull Request resolved: pytorch#47761
    
    Reviewed By: ngimel
    
    Differential Revision: D25658897
    
    Pulled By: mruberry
    
    fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
    IvanYashchuk authored and hwangdeyu committed Dec 23, 2020
  156. [Gradient Compression] Add error feedback to layerwise PowerSGD (pytorch#49418)
    
    Summary:
    Pull Request resolved: pytorch#49418
    
    Add error feedback to the original implementation of PowerSGD.
    
    Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
    ghstack-source-id: 118670930
    
    Test Plan:
    buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
    
    buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25555538
    
    fbshipit-source-id: c01145cc9acf574a4c6aa337dbbba0ba7d9350b2
    Yi Wang authored and hwangdeyu committed Dec 23, 2020
  157. [Gradient Compression] Replace the assertions in PowerSGD comm hook by stream synchronization (pytorch#49435)
    
    Summary:
    Pull Request resolved: pytorch#49435
    
    Previously, illegal memory access was prevented only as a side effect: the torch.any call returns a boolean value, which initiates a data transfer from the device to the host and thereby forces a synchronization.
    
    An explicit synchronization is more to the point.
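    
    That is, instead of relying on a data-dependent host-side read to synchronize, synchronize the stream directly; sketched:
    
    ```
    import torch
    
    # implicit, accidental sync: device-to-host read of a boolean
    # if torch.any(tensor != tensor): ...
    # explicit, intentional sync:
    torch.cuda.current_stream().synchronize()
    ```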
    
    Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
    ghstack-source-id: 118664204
    
    Test Plan:
    buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
    
    buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25573484
    
    fbshipit-source-id: 516d0d502da2863b516c15332702335ee662f072
    Yi Wang authored and hwangdeyu committed Dec 23, 2020
  158. Add support for torch.tensor_split to accept a tensor for indices argument (pytorch#49169)
    
    Summary:
    Pull Request resolved: pytorch#49169
    
    Trying to address feature request pytorch#47479.
    This diff tries to overload the method `torch.tensor_split` to also accept a tensor for the argument `split_size_or_sections`, which currently accepts a Python list or int. The motivation is to avoid converting a tensor to a list, so that when tracing a model/module the tensor operations can be recorded.
    
    Implementation is following the diff that originally added the `tensor_split` method D24166164 (pytorch@ef4817f).
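    
    For example:
    
    ```
    import torch
    
    x = torch.arange(8)
    # indices passed as a tensor instead of a list, so tracing can record it
    print(torch.tensor_split(x, torch.tensor([2, 5])))
    # (tensor([0, 1]), tensor([2, 3, 4]), tensor([5, 6, 7]))
    ```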
    
    Test Plan:
    ```
    buck test caffe2/test:torch -- tensor_split
    ```
    https://www.internalfb.com/intern/testinfra/testconsole/testrun/5910974550563805/
    
    ```
    buck test caffe2/test:others -- tensor_split
    ```
    https://www.internalfb.com/intern/testinfra/testconsole/testrun/1688849905082678/
    
    Reviewed By: mruberry
    
    Differential Revision: D25440885
    
    fbshipit-source-id: 6705dc551279e3a5eb1e5ec1ede2728eab85ffb1
    Edson Romero authored and hwangdeyu committed Dec 23, 2020
  159. [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT`
    
    Reviewed By: zertosh
    
    Differential Revision: D25662961
    
    fbshipit-source-id: f5811a5797fd6dc8733fdf86f35c93d12a08d53a
    generatedunixname89002005325676 authored and hwangdeyu committed Dec 23, 2020
  160. [WIP][DataLoader] CollateIterableDataset prototype (pytorch#48933)

    Summary:
    Pull Request resolved: pytorch#48933
    
    Prototype for CollateIterableDataset.
    Move `collate_batch_fn` to BatchIterableDataset
    
    - CollateIterableDataset
      - [x] Prototype
      - [x] Tests
    - BatchIterableDataset
      - [x] Prototype
      - [x] Tests
    - SamplerIterableDataset
      - [x] Prototype
      - [x] Tests
    
    Test Plan: Imported from OSS
    
    Reviewed By: mrshenli
    
    Differential Revision: D25623635
    
    Pulled By: ejguan
    
    fbshipit-source-id: 99ba077619f672551ac15367baaba985db35a9c2
    ejguan authored and hwangdeyu committed Dec 23, 2020
  161. [WIP][DataLoader] Prototype of BatchIterableDataset (pytorch#49186)

    Summary: Pull Request resolved: pytorch#49186
    
    Test Plan: Imported from OSS
    
    Reviewed By: mrshenli
    
    Differential Revision: D25623636
    
    Pulled By: ejguan
    
    fbshipit-source-id: 01a08cccb69301481c55b46358203354b9b4f5fa
    ejguan authored and hwangdeyu committed Dec 23, 2020
    Configuration menu
    Copy the full SHA
    cf9ad1f View commit details
    Browse the repository at this point in the history
  162. [WIP][DataLoader] Prototype of SamplerIterableDataset (pytorch#49363)

    Summary: Pull Request resolved: pytorch#49363
    
    Test Plan: Imported from OSS
    
    Reviewed By: mrshenli
    
    Differential Revision: D25623637
    
    Pulled By: ejguan
    
    fbshipit-source-id: 9155d27d1fc91996b74110795cc73f1da0eedd44
    ejguan authored and hwangdeyu committed Dec 23, 2020
  163. [Mask R-CNN] Add Int8 AABB Generate proposals Op (pytorch#49574)

    Summary:
    Pull Request resolved: pytorch#49574
    
    Adds support for additional Eigen Utils for custom type defs.
    
    Reviewed By: linbinyu
    
    Differential Revision: D25624556
    
    fbshipit-source-id: 0ffa90aaf8cbf1d08825e95156fb40d966ca7042
    anshuljain1 authored and hwangdeyu committed Dec 23, 2020
  164. Fix sinc docs typo (pytorch#49667)

    Summary:
    Fix small typo in sinc docs
    
    Pull Request resolved: pytorch#49667
    
    Reviewed By: ngimel
    
    Differential Revision: D25665721
    
    Pulled By: soulitzer
    
    fbshipit-source-id: 5f78b9e34bb0084e51ae79d1afc450bcb0ae3d75
    soulitzer authored and hwangdeyu committed Dec 23, 2020
  165. Added linalg.solve (pytorch#48456)

    Summary:
    This PR adds `torch.linalg.solve`.
    
    `linalg_solve_out` uses in-place operations on the provided result tensor.
    
    I modified `apply_solve` to accept a tensor of Int instead of std::vector; that way we can write a function similar to `linalg_solve_out` but removing the error checks and device memory synchronization.
    
    In comparison to `torch.solve`, this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term; `torch.solve` requires it to be at least 2-dimensional.
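    
    For example, with a batch of matrices and 1-D right-hand sides:
    
    ```
    import torch
    
    A = torch.randn(2, 3, 3)               # batch of matrices
    b = torch.randn(2, 3)                  # batch of 1-D right-hand sides
    x = torch.linalg.solve(A, b)
    print(torch.allclose(A @ x.unsqueeze(-1), b.unsqueeze(-1), atol=1e-5))
    ```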
    
    Ref. pytorch#42666
    
    Pull Request resolved: pytorch#48456
    
    Reviewed By: izdeby
    
    Differential Revision: D25562222
    
    Pulled By: mruberry
    
    fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
    IvanYashchuk authored and hwangdeyu committed Dec 23, 2020
  166. Fix return type Any for Ternary ops (pytorch#49165)

    Summary: Pull Request resolved: pytorch#49165
    
    Test Plan: Imported from OSS
    
    Reviewed By: eellison
    
    Differential Revision: D25463694
    
    Pulled By: ejguan
    
    fbshipit-source-id: 5cf907e8de6eeb0171d61175a60fac9812b76c6c
    ejguan authored and hwangdeyu committed Dec 23, 2020
  167. Fix typo in add_pr_curve docstrings. (pytorch#49648)

    Summary:
    Very small PR to fix a typo.
    
    ### Description
    Fixed 1 typo in the documentation of `torch/utils/tensorboard/writer.py` (replaced "_should in_" by "_should be in_")
    
    Pull Request resolved: pytorch#49648
    
    Reviewed By: ngimel
    
    Differential Revision: D25665831
    
    Pulled By: mrshenli
    
    fbshipit-source-id: a4e733515603bb9313c1267fdf2cfcc2bc2773c6
    theodumont authored and hwangdeyu committed Dec 23, 2020
  168. Fixed a typo in dataloader.py. (pytorch#49437)

    Summary:
    This small PR fixes a one character typo in the docstring for `DataLoader`.
    
    Pull Request resolved: pytorch#49437
    
    Reviewed By: ngimel
    
    Differential Revision: D25665971
    
    Pulled By: mrshenli
    
    fbshipit-source-id: b60f975f1e3bf0bb8f88e39f490f716c602f087e
    tmcclintock authored and hwangdeyu committed Dec 23, 2020
  169. [NNC] Intermediate allocs flattened and dependency support (pytorch#49554)
    
    Summary:
    Makes two changes in NNC for intermediate buffer allocations:
    1. Flattens dimensions of buffers allocated in LoopNest::prepareForCodegen() to match their flattened usages.
    2. Adds support for tracking memory dependencies of Alloc/Free to the MemDependencyChecker, which will allow us to check safety of accesses to intermediate buffers (coming in a future diff).
    
    I didn't add any new tests as the mem dependency checker tests already cover it pretty well, particularly the GEMM test.
    
    Pull Request resolved: pytorch#49554
    
    Reviewed By: VitalyFedyunin
    
    Differential Revision: D25643133
    
    Pulled By: nickgg
    
    fbshipit-source-id: 66be3054eb36f0a4279d0c36562e63aa2dae371c
    nickgg authored and hwangdeyu committed Dec 23, 2020
  170. Implementing NumPy-like function torch.broadcast_to (pytorch#48997)

    Summary:
    Related pytorch#38349
    
    Implement NumPy-like function `torch.broadcast_to` to broadcast the input tensor to a new shape.
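    
    Usage mirrors numpy.broadcast_to:
    
    ```
    import torch
    
    x = torch.tensor([1, 2, 3])
    print(torch.broadcast_to(x, (3, 3)))
    # tensor([[1, 2, 3],
    #         [1, 2, 3],
    #         [1, 2, 3]])
    ```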
    
    Pull Request resolved: pytorch#48997
    
    Reviewed By: anjali411, ngimel
    
    Differential Revision: D25663937
    
    Pulled By: mruberry
    
    fbshipit-source-id: 0415c03f92f02684983f412666d0a44515b99373
    RockingJavaBean authored and hwangdeyu committed Dec 23, 2020
  171. Sparse-sparse matrix multiplication (CPU/CUDA) (pytorch#39526)

    Summary:
    This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.
    
    The current implementation of `torch.sparse.mm` supports this configuration,
    `torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this can consume a lot of memory when sparse_matrix2's shape is large.
    
    This implementation extends `torch.sparse.mm` function to support  `torch.sparse.mm(sparse_matrix1, sparse_matrix2)`
    
    Resolves pytorch#20988 for CPU/CUDA.
    
    - [x] sparse matmul
      - [x] CPU/CUDA C++ implementation
      - [x] unittests
      - [x] update torch.sparse.mm documentation
      - [x] autograd support
    
    The CPU sparse-sparse matmul was implemented taking as a reference the work "Sparse Matrix Multiplication Package (SMMP)". The GPU sparse-sparse matmul is based on cuSPARSE; there is specific code for CUSPARSE_VERSION >= 11 as well as for older versions of cuSPARSE. Both the CPU and CUDA paths rely on a sparse-sparse matmul algorithm using the CSR indices format, as it is one of the fastest algorithms.
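    
    Usage after this change (small illustrative matrices):
    
    ```
    import torch
    
    a = torch.sparse_coo_tensor([[0, 1, 1], [2, 0, 2]], [3.0, 4.0, 5.0], (2, 3))
    b = torch.sparse_coo_tensor([[0, 2], [1, 0]], [1.0, 2.0], (3, 2))
    c = torch.sparse.mm(a, b)   # sparse @ sparse -> sparse, no to_dense() needed
    print(c.to_dense())         # tensor([[ 6.,  0.], [10.,  4.]])
    ```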
    
    Here are the latest benchmark results (script is here) for torch.sparse.mm (CUDA), torch.sparse.mm (CPU), and scipy; values are float32 scalars:
    
    size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
    -- | -- | -- | -- | --
    (32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
    (32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
    (32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
    (32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
    (512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
    (512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
    (512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
    (512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
    (1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
    (1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
    (1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
    (1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7
    
    A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking:
    
    ```
    [------------------------- sparse.mm-backward -------------------------]
                                |   sparse.backward   |  dense.backward
     -----------------------------------------------------------------------
          (32, 10000) | 0.01    |            13.5          |         2.4
          (32, 10000) | 0.05    |            52.3          |         2.4
          (512, 10000) | 0.01   |          1016.8          |       491.5
          (512, 10000) | 0.05   |          1604.3          |       492.3
          (1024, 10000) | 0.01  |          2384.1          |      1963.7
          (1024, 10000) | 0.05  |          3965.8          |      1951.9
    ```
    
    I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels.
    
    ```
    [---------------------------------- matmul ---------------------------------]
                            |   0.5   |  0.7   |  0.8   |  0.9   |  0.95  |  0.98
    1 threads: ------------------------------------------------------------------
      (cpu)   torch         |    5.4  |   5.4  |   5.2  |   5.3  |   5.3  |   5.4
              torch.sparse  |  122.2  |  51.9  |  27.5  |  11.4  |   4.9  |   1.8
              scipy         |  150.1  |  87.4  |  69.2  |  56.8  |  38.4  |  17.1
      (cuda)  torch         |    1.3  |   1.1  |   1.1  |   1.1  |   1.1  |   1.1
              torch.sparse  |   20.0  |   8.4  |   5.1  |   2.5  |   1.5  |   1.1
    
    [----------------------------------- backward -----------------------------------]
                            |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
    1 threads: -----------------------------------------------------------------------
      (cpu)   torch         |   17.7  |   17.9  |   17.7  |   17.7  |   17.6  |   17.9
              torch.sparse  |  672.9  |  432.6  |  327.5  |  230.8  |  176.7  |  116.7
      (cuda)  torch         |    3.8  |    3.6  |    3.5  |    3.5  |    3.6  |    3.5
              torch.sparse  |   68.8  |   46.2  |   35.6  |   24.2  |   17.8  |   11.9
    
    Times are in milliseconds (ms).
    ```
    
    In summary, the new `sparse @ sparse` backward algorithm is the better choice: it is more about saving memory than raw performance, and it outperformed the other options tested before.
    
    ## **References**
    
    1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.**  Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk)
    2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity)
    
    Pull Request resolved: pytorch#39526
    
    Reviewed By: mruberry
    
    Differential Revision: D25661239
    
    Pulled By: ngimel
    
    fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
    aocsa authored and hwangdeyu committed Dec 23, 2020
    Commit: 38ff78f
  172. [BE] Introduce set_cwd context manager (pytorch#49657)

    Summary:
    Introduces a context manager that temporarily changes the working directory and restores it even if an exception is raised.
    It is used in test_type_hints and during code coverage collection.
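    A minimal sketch of such a helper (assuming a contextlib-based implementation; the actual code may differ):
    
    ```python
    import os
    from contextlib import contextmanager
    
    @contextmanager
    def set_cwd(path):
        # remember the current directory, switch, and always switch back
        old_cwd = os.getcwd()
        os.chdir(path)
        try:
            yield
        finally:
            os.chdir(old_cwd)
    
    # the original directory is restored even if the body raises
    with set_cwd("/tmp"):
        pass
    ```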
    
    Pull Request resolved: pytorch#49657
    
    Reviewed By: walterddr
    
    Differential Revision: D25660543
    
    Pulled By: malfet
    
    fbshipit-source-id: 77f08d57e4b60b95daa4068d0dacf7c25f978526
    malfet authored and hwangdeyu committed Dec 23, 2020
    Commit: 209bddb
  173. add close() method to tqdm mock (pytorch#46040)

    Summary:
    In `torchvision` we use [`torch.hub.tqdm`](https://github.com/pytorch/vision/blob/2cc20d7485458a6368e8995e3f79799589b632bd/torchvision/datasets/utils.py#L11) to display the dataset download. One of our methods uses [`tqdm().close()`](https://github.com/pytorch/vision/blob/2cc20d7485458a6368e8995e3f79799589b632bd/torchvision/datasets/utils.py#L188), which is [not included in the mock](https://github.com/pmeier/pytorch/blob/283ae1998cd6920b588907adfb88909afb522ae2/torch/hub.py#L22-L49). This PR adds a `close()` method to the mock.
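    For context, the mock is a no-op progress bar; a simplified sketch of the shape of the fix (hypothetical, not the exact hub.py code):
    
    ```python
    class tqdm:  # simplified stand-in for the torch.hub tqdm mock
        def __init__(self, total=None, disable=False, unit=None, *args, **kwargs):
            self.total = total
            self.n = 0
    
        def update(self, n):
            self.n += n  # silently accumulate progress
    
        def close(self):
            pass  # the method this PR adds: a no-op, matching real tqdm's API
    
        def __enter__(self):
            return self
    
        def __exit__(self, exc_type, exc_val, exc_tb):
            pass
    ```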
    
    Cc fmassa
    
    Pull Request resolved: pytorch#46040
    
    Reviewed By: mrshenli
    
    Differential Revision: D25619429
    
    Pulled By: fmassa
    
    fbshipit-source-id: a137f2417d8a47923ccb1ec6b7d5298c1545245c
    pmeier authored and hwangdeyu committed Dec 23, 2020
    Commit: 2e52d1d
  174. Dynamic GRU quantization support (pytorch#49448)

    Summary:
    Pull Request resolved: pytorch#49448
    
    ghstack-source-id: 118982171
    
    Test Plan:
    buck test caffe2/test:quantization --  'test_qlstmGRU \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --print-passing-details
    buck test caffe2/test:quantization --  'test_quantized_rnn \(quantization\.test_quantize\.TestPostTrainingDynamic\)' --print-passing-details
    buck test caffe2/test:quantization --  'test_qrnncell \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --run-disabled --print-passing-details
    
    Reviewed By: vkuzo
    
    Differential Revision: D25579815
    
    fbshipit-source-id: 413cc8888eb8058230b94c9576d2fa54b0ed1416
    raghuramank100 authored and hwangdeyu committed Dec 23, 2020
    Commit: 3dafed5
  175. converted current debugging statements in LLVM codegen to jit-logging statements pytorch#48771 (pytorch#49040)
    
    Summary: Pull Request resolved: pytorch#49040
    
    Test Plan: Imported from OSS
    
    Reviewed By: ZolotukhinM
    
    Differential Revision: D25407356
    
    Pulled By: huiguoo
    
    fbshipit-source-id: 1c1f893ed8d0877bee27e9a673a5dce2203c2bad
    huiguoo authored and hwangdeyu committed Dec 23, 2020
    Commit: 6f66ee4
  176. added macros in jit logging to check whether logging is enabled; replaced similar checks in LLVM codegen with such macros (pytorch#49121)
    
    Summary: Pull Request resolved: pytorch#49121
    
    Test Plan: Imported from OSS
    
    Reviewed By: ZolotukhinM
    
    Differential Revision: D25445971
    
    Pulled By: huiguoo
    
    fbshipit-source-id: 980775a94159aa0b3b66fae938962761b38703d5
    huiguoo authored and hwangdeyu committed Dec 23, 2020
    Commit: a20a1f9
  177. change block codegen to handle new inlining in NNC (pytorch#47687)

    Summary:
    Minor changes to Block codegen to handle the new inlining in NNC.
    For Block code generation we need to collect dimension data about the tensors before they are flattened.
    This information is not available after the inlining pass, so for Block we run inlining only after we have collected this data using the `CreateBufferMap` analysis.
    
    Pull Request resolved: pytorch#47687
    
    Reviewed By: ZolotukhinM
    
    Differential Revision: D24864869
    
    Pulled By: protonu
    
    fbshipit-source-id: 9574c0599f7d959a1cf0eb49d4e3e541cbe9b1d3
    Protonu Basu authored and hwangdeyu committed Dec 23, 2020
    Commit: 8cb4a36
  178. Clean up backward compatibility skip list (pytorch#49691)

    Summary:
    Pull Request resolved: pytorch#49691
    
    Quite a few stale items, let's make the list short.
    
    Test Plan: oss ci
    
    Reviewed By: hl475
    
    Differential Revision: D25667464
    
    fbshipit-source-id: cff1be8b5e0068470b3f621acf6bf4fbd414233e
    houseroad authored and hwangdeyu committed Dec 23, 2020
    Commit: 2af5914
  179. Enable product for bool tensor (pytorch#48637)

    Summary:
    Fixes pytorch#48351
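    An illustration of the behavior this enables (editor's sketch):
    
    ```python
    import torch
    
    t = torch.tensor([True, True, False])
    print(t.prod())      # tensor(False): a single False zeroes the product
    print(t[:2].prod())  # tensor(True)
    ```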
    
    Pull Request resolved: pytorch#48637
    
    Reviewed By: mrshenli
    
    Differential Revision: D25658596
    
    Pulled By: mruberry
    
    fbshipit-source-id: ff3ada74b6d281c8e4753ed38339a1c036f722ee
    Kiyosora authored and hwangdeyu committed Dec 23, 2020
    Commit: 83c91f9
  180. Fix test_cuda_init_race skip rules (pytorch#49693)

    Summary:
    Fixes pytorch#49432
    
    Pull Request resolved: pytorch#49693
    
    Reviewed By: walterddr, janeyx99
    
    Differential Revision: D25668027
    
    Pulled By: malfet
    
    fbshipit-source-id: 802cbd39e4ebe585709179f332b680f5f7978814
    malfet authored and hwangdeyu committed Dec 23, 2020
    Commit: 56115b7
  181. Add base forward grad logic (pytorch#49097)

    Summary:
    Pull Request resolved: pytorch#49097
    
    RFC: pytorch/rfcs#11
    
    This PR add the basic logic to handle forward grad as dual Tensors.
    It contains the following:
    - Mechanism to save dual state on a Tensor and clear it up when the dual level ends
    - C++ and python user facing API
    - Updated view system that is able to track both forward and backward views
    
    The current PR has the following limitations:
    - Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
    - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
    - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
    - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
    - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.
    
    Reading guide:
    - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view.
    - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
    - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
    - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
    - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
    - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
    - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
    - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
    - Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
    - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)
    
    Test Plan: Imported from OSS
    
    Reviewed By: mrshenli
    
    Differential Revision: D25607503
    
    Pulled By: albanD
    
    fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099
    albanD authored and hwangdeyu committed Dec 23, 2020
    Commit: 97d64bc
  182. Do not use negative values in GCD computation. (pytorch#49379)

    Summary:
    GCD should always return positive integers. When negative values are used, we hit a corner case that results in an infinite recursion during simplification.
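    The invariant being restored is the usual mathematical one; in Python terms (an illustrative sketch, not the NNC C++ code):
    
    ```python
    import math
    
    # math.gcd already normalizes signs; a hand-rolled version must do the same
    def gcd(a, b):
        a, b = abs(a), abs(b)
        while b:
            a, b = b, a % b
        return a
    
    assert gcd(-4, 6) == 2 == math.gcd(-4, 6)
    ```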
    
    Pull Request resolved: pytorch#49379
    
    Reviewed By: ezyang
    
    Differential Revision: D25597115
    
    Pulled By: navahgar
    
    fbshipit-source-id: b0e8ac07ee50a5eb775c032628d4840df7424927
    navahgar authored and hwangdeyu committed Dec 23, 2020
    Commit: b77390b
  183. [jit][tracer] allow traced modules to return dicts with tuple values when strict=False (pytorch#49568)
    
    Summary:
    Pull Request resolved: pytorch#49568
    
    We have some inference use cases where the expected output of a module is of the form `{"key": (t1, t1)}`, and we are currently jit tracing these modules until we can reach jit script compatibility.
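    A small sketch of the kind of module and tracing call involved (illustrative):
    
    ```python
    import torch
    
    class M(torch.nn.Module):
        def forward(self, x):
            # dict output whose value is a tuple of tensors
            return {"key": (x + 1, x * 2)}
    
    m = torch.jit.trace(M(), torch.randn(3), strict=False)
    out = m(torch.randn(3))
    print(type(out["key"]))  # a tuple of two tensors
    ```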
    
    Test Plan: buck test mode/dev caffe2/test:jit -- 'test_trace_returning_complex_dict'
    
    Reviewed By: houseroad
    
    Differential Revision: D25624152
    
    fbshipit-source-id: 5adef0e3c9d54cd31ad5fece4ac6530d541fd673
    bradleyhd authored and hwangdeyu committed Dec 23, 2020
    Commit: 4ab6172
  184. Move device guard from MultiTensorApply.cuh (pytorch#46664)

    Summary: Pull Request resolved: pytorch#46664
    
    Test Plan: Imported from OSS
    
    Reviewed By: anjali411
    
    Differential Revision: D24453343
    
    Pulled By: izdeby
    
    fbshipit-source-id: b82a658af50ededc985195ed02dbf60e792c7a13
    Iurii Zdebskyi authored and hwangdeyu committed Dec 23, 2020
    Commit: eb6a2ab
  185. Use store based barrier only for certain store types. (pytorch#49694)

    Summary:
    Pull Request resolved: pytorch#49694
    
    The store based barrier introduced in
    pytorch#49419 broke for certain store types.
    This is a quick fix to resolve the issues for other store types.
    ghstack-source-id: 119006874
    
    Test Plan: 1) waitforbuildbot
    
    Reviewed By: ppwwyyxx, rohan-varma
    
    Differential Revision: D25668404
    
    fbshipit-source-id: 751fb8b229ad6f50ee9c50f63a70de5a91c9eda5
    pritamdamania authored and hwangdeyu committed Dec 23, 2020
    Commit: 4164cb2
  186. Fix TCPStore type coercion (pytorch#49685)

    Summary:
    Fixes pytorch#49052
    
    The TCPStore example with 4 arguments was working because the datetime value was being implicitly converted to a bool. Modified the pybind definition and updated documentation.
    
    Pull Request resolved: pytorch#49685
    
    Test Plan:
    ```
    import torch.distributed as dist
    from datetime import timedelta
    
    dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
    ```
    
    Now fails with
    ```
    TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
        1. torch._C._distributed_c10d.TCPStore(host_name: str, port: int, world_size: int, is_master: bool, timeout: datetime.timedelta = datetime.timedelta(seconds=300))
    
    Invoked with: '127.0.0.1', 0, True, datetime.timedelta(seconds=30)
    ```
    
    Reviewed By: mrshenli, ngimel
    
    Differential Revision: D25668021
    
    Pulled By: H-Huang
    
    fbshipit-source-id: ce40b8648d0a414f0255666fbc680f1a66fae090
    H-Huang authored and hwangdeyu committed Dec 23, 2020
    Commit: ccde23b
  187. replacing THC_CLASS and THC_API with TORCH_CUDA_API (pytorch#49690)

    Summary:
    THC_API and THC_CLASS were leftover macros from before the consolidation of caffe2, aten, and torch. Now that they're combined, these are misleading and should just be TORCH_CUDA_API. The only file I manually edited was `THCGeneral.h.in`.
    
    Pull Request resolved: pytorch#49690
    
    Reviewed By: malfet
    
    Differential Revision: D25667982
    
    Pulled By: janeyx99
    
    fbshipit-source-id: 2fdf7912b2a0537b7c25e1fed21cc301fa59d57f
    janeyx99 authored and hwangdeyu committed Dec 23, 2020
    Commit: 1e9a97f
  188. Revert D25607503: Add base forward grad logic

    Test Plan: revert-hammer
    
    Differential Revision:
    D25607503 (pytorch@fdf02ef)
    
    Original commit changeset: f1396290de1d
    
    fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f
    Walter Shen authored and hwangdeyu committed Dec 23, 2020
    Commit: 220afd2
  189. [TensorExpr] Change LoopNest::vectorize to accept `For*` instead of `Stmt*`. (pytorch#49696)
    
    Summary:
    Pull Request resolved: pytorch#49696
    
    And make it static.
    
    Test Plan: Imported from OSS
    
    Reviewed By: navahgar, nickgg
    
    Differential Revision: D25668695
    
    Pulled By: ZolotukhinM
    
    fbshipit-source-id: 8d7fb507d6f3beca70e868d9e0f4c46247311a99
    Mikhail Zolotukhin authored and hwangdeyu committed Dec 23, 2020
    Commit: 1b63e24
  190. [TensorExpr] Move SimpleIREval implementation from .h to .cpp. (pytorch#49697)
    
    Summary:
    Pull Request resolved: pytorch#49697
    
    Mostly mechanical move. This refactoring helps to hide unnecessary
    details from the SimpleIREval interface and make it more similar to a
    pure 'codegen'.
    
    Test Plan: Imported from OSS
    
    Reviewed By: nickgg
    
    Differential Revision: D25668696
    
    Pulled By: ZolotukhinM
    
    fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c
    Mikhail Zolotukhin authored and hwangdeyu committed Dec 23, 2020
    Commit: d1fac89
  191. unbreak mypy torch/quantization (pytorch#49549)

    Summary:
    Pull Request resolved: pytorch#49549
    
    Somehow `mypy torch/quantization` got broken in the past couple of days:
    https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d
    .  I didn't see any relevant PRs other than
    pytorch#47725, which doesn't seem
    related. The error doesn't seem real, as the arguments to
    `_cudnn_rnn_flatten_weight` seem correct. For now,
    ignoring the failure so we have a clean `mypy` run on
    `torch/quantization`.
    
    Test Plan:
    ```
    mypy torch/quantization
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25616972
    
    fbshipit-source-id: 46c207fe1565ec949c0b1f57d6cd0c93f627e6bd
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    Commit: ca537cd
  192. fx quant: types for fusion_patterns.py (pytorch#49606)

    Summary:
    Pull Request resolved: pytorch#49606
    
    Adds more types, for readability.
    
    Test Plan:
    ```
    mypy torch/quantization
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25643894
    
    fbshipit-source-id: 4aad52fe4e59ad74b6e0e3acd0f98fba91561a29
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    Commit: 6d8e9d3
  193. fx quant: add types to observed_module.py (pytorch#49607)

    Summary:
    Pull Request resolved: pytorch#49607
    
    Readability
    
    Test Plan:
    ```
    mypy torch/quantization
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25643895
    
    fbshipit-source-id: b4b8741b07ac4827c3bacd2084df81fbfdd0c2d5
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    Commit: 1de10d5
  194. fx quant: fix types on _find_quants (pytorch#49616)

    Summary:
    Pull Request resolved: pytorch#49616
    
    Add types to `_find_quants` I/O and fix resulting errors,
    needed for an upcoming bug fix.
    
    Test Plan:
    ```
    mypy torch/quantization
    python test/test_quantization.py TestQuantizeFx
    ```
    
    Imported from OSS
    
    Reviewed By: jerryzh168
    
    Differential Revision: D25645719
    
    fbshipit-source-id: 4bf788b55fd4fd086c83a4438b9c2df22b9cff49
    vkuzo authored and hwangdeyu committed Dec 23, 2020
    Commit: 1869bd7
  195. [FX] Fix python code having spurious newlines from placeholders (pytorch#49720)
    
    Summary: Pull Request resolved: pytorch#49720
    
    Test Plan: Imported from OSS
    
    Reviewed By: zdevito
    
    Differential Revision: D25675825
    
    Pulled By: jamesr66a
    
    fbshipit-source-id: a9028acad9c8feb877fff5cd09aedabed52a3f4b
    James Reed authored and hwangdeyu committed Dec 23, 2020
    Commit: c8aefec
  196. [pt][ATen] Optimize bmm (pytorch#49506)

    Summary:
    Pull Request resolved: pytorch#49506
    
    - Get rid of expensive stuff like `TensorArg`, `checkBackend`, `checkSize`, and `TensorAccessor`.
    - Add `checkDim` that does not require creating a `TensorArg` which incurs a refcount bump
    - Avoid unnecessary calls to `torch.select`, which goes through the dispatcher, in the cases we care about: mat1 and mat2 either not permuted or permuted with dims = [0, 2, 1]. The pt version of bmm supports exotic cases such as inputs permuted with dims = [1, 2, 0], which are uncommon in SparseNNs.
    
    Test Plan:
    Unit test:
    ```
    buck test //caffe2/test:linalg
    ```
    
    Benchmark with the adindexer model:
    ```
    Before:
    I1216 14:02:24.155516 2595800 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0847197. Iters per second: 11803.6
    After:
    I1216 14:02:26.583878 2595939 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.082051. Iters per second: 12187.5
    ```
    
    Reviewed By: bwasti
    
    Differential Revision: D25577574
    
    fbshipit-source-id: 8aba69b950e7b4d9d1b14ba837931695a908c068
    Hao Lu authored and hwangdeyu committed Dec 23, 2020
    Commit: f41aa50
  197. [PyTorch] Remove direct reference to native symbols in sparse-related non-native code (pytorch#49721)
    
    Summary:
    Pull Request resolved: pytorch#49721
    
    As a refactor effort of per-app selective build, we are decoupling ATen/native from the rest of aten (D25413998).
    All symbols of ATen/native could only be referenced through dispatcher (pytorch#48684).
    
    This diff is to decouple the native reference recently introduced for sparse tensors.
    ghstack-source-id: 119028080
    
    Test Plan: CI
    
    Reviewed By: dhruvbird, ngimel
    
    Differential Revision: D25675711
    
    fbshipit-source-id: 381cbb3b361ee41b002055399d4996a9ca21377c
    iseeyuan authored and hwangdeyu committed Dec 23, 2020
    Commit: db3f718
  198. [Gradient Compression] Warm-start of PowerSGD (pytorch#49451)

    Summary:
    Pull Request resolved: pytorch#49451
    
    Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible.
    
    This can give a better compression performance in terms of both accuracy and speed.
    
    Also add a unit test for batched PowerSGD to test_c10d.py.
    
    Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
    ghstack-source-id: 119014132
    
    Test Plan:
    buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
    buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
    
    Reviewed By: rohan-varma
    
    Differential Revision: D25583086
    
    fbshipit-source-id: a757df3c4cfcc0ead4647f7de2f43198f1e063ee
    Yi Wang authored and hwangdeyu committed Dec 23, 2020
    Commit: 4cebcbd
  199. NewModuleTest: Don't call both check_jacobian and gradcheck (pytorch#49566)
    
    Summary:
    Pull Request resolved: pytorch#49566
    
    Fixes pytorch#49422.
    
    check_jacobian and gradcheck do roughly the same thing: they both
    compute an analytic jacobian and a numeric jacobian and check that
    they are equivalent. Furthermore, NewModuleTest will (by default) call
    both check_jacobian and gradcheck, leading to some redundant checks that
    waste CI resources.
    
    However, there is one subtle difference: `check_jacobian` can handle the
    special case where a Module takes in dense inputs and dense parameters
    but returns sparse gradients, but that is not something gradcheck can
    handle. This is only used in the tests for nn.Embedding and
    nn.EmbeddingBag.
    
    This PR does the following:
    - have NewModuleTest call gradcheck instead of check_jacobian by default
    - add a new "has_sparse_gradients" flag to NewModuleTest. These are True
    for the nn.Embedding and nn.EmbeddingBag sparse gradient tests. If
    `has_sparse_gradients` is True, then we call check_jacobian, otherwise,
    we call gradcheck.
    - Kills the "jacobian_input" flag. This flag was used to tell
    NewModuleTest not to attempt to compute the jacobian for the inputs to
    the module. This is only desirable if the input to the module isn't
    differentiable, and it was only set for nn.Embedding /
    nn.EmbeddingBag, which take a LongTensor input. `gradcheck` handles these
    cases automatically by not checking gradients for non-differentiable
    inputs (a minimal gradcheck sketch follows this list).
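    For reference, a minimal example of the kind of `gradcheck` call that now backs most module tests (editor's sketch):
    
    ```python
    import torch
    
    # gradcheck compares analytic and numeric jacobians; double-precision
    # inputs with requires_grad=True are needed for tight tolerances
    x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
    mod = torch.nn.Linear(4, 2).double()
    assert torch.autograd.gradcheck(mod, (x,))
    ```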
    
    Test Plan:
    - Code reading
    - run test_nn.py
    
    Reviewed By: albanD
    
    Differential Revision: D25622929
    
    Pulled By: zou3519
    
    fbshipit-source-id: 8d831ada98b6a95d63f087ea9bce1b574c996a22
    zou3519 authored and hwangdeyu committed Dec 23, 2020
    Commit: 10b5558
  200. [fix] inplace remainder/% (pytorch#49390)

    Summary:
    Fixes pytorch#49214
    
    **BC-Breaking**
    Before this PR, `%=` didn't actually perform the operation in place and returned a new tensor.
    After this PR, the `%=` operation is actually in place and the modified input tensor is returned.
    
    Before PR,
    ```python
    >>> import torch
    >>> a = torch.tensor([11,12,13])
    >>> id(a)
    139627966219328
    >>> a %= 10
    >>> id(a)
    139627966219264
    ```
    
    After PR,
    ```python
    >>> import torch
    >>> a = torch.tensor([11,12,13])
    >>> id(a)
    139804702425280
    >>> a %= 10
    >>> id(a)
    139804702425280
    ```
    
    Pull Request resolved: pytorch#49390
    
    Reviewed By: izdeby
    
    Differential Revision: D25560423
    
    Pulled By: zou3519
    
    fbshipit-source-id: 2b92bfda260582aa4ac22c4025376295e51f854e
    kshitij12345 authored and hwangdeyu committed Dec 23, 2020
    Commit: 25c852b
  201. Complex backward for torch.sqrt (pytorch#49461)

    Summary:
    Pull Request resolved: pytorch#49461
    
    resolves pytorch#48398
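    A small sketch of the check this enables (assuming complex gradcheck support in the same release line):
    
    ```python
    import torch
    
    z = torch.tensor([1.0 + 2.0j, 0.5 - 1.0j],
                     dtype=torch.complex128, requires_grad=True)
    # gradcheck numerically verifies the analytic (Wirtinger) gradient of sqrt
    assert torch.autograd.gradcheck(torch.sqrt, (z,))
    ```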
    
    Test Plan: Imported from OSS
    
    Reviewed By: navahgar
    
    Differential Revision: D25589454
    
    Pulled By: anjali411
    
    fbshipit-source-id: 46e9f913c8ab3e18c98d6f623b2394044b6fe079
    anjali411 authored and hwangdeyu committed Dec 23, 2020
    Commit: bc28081
  202. [ROCm] add 4.0 to nightly builds (pytorch#49632)

    Summary:
    Depends on pytorch/builder#614.
    
    Pull Request resolved: pytorch#49632
    
    Reviewed By: ngimel
    
    Differential Revision: D25665880
    
    Pulled By: walterddr
    
    fbshipit-source-id: b37a55b7e3028648453b422683fa4a72e0ee04a4
    jeffdaily authored and hwangdeyu committed Dec 23, 2020
    Commit: 03214d5
  203. Make PyTorch partially cross-compilable for Apple M1 (pytorch#49701)

    Summary:
    Update CPUINFO to include pytorch/cpuinfo#51
    Update sleef to include shibatch/sleef#376
    Modify aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt to recognize CMAKE_OSX_ARCHITECTURES
    
    Pull Request resolved: pytorch#49701
    
    Test Plan: `cmake -DCMAKE_OSX_ARCHITECTURES=x86_64 -DPYTHON_EXECUTABLE=/usr/bin/python3  -DUSE_XNNPACK=NO -DBUILD_TEST=YES .. -G Ninja; ninja basic` finishes successfully on Apple M1
    
    Reviewed By: janeyx99
    
    Differential Revision: D25669219
    
    Pulled By: malfet
    
    fbshipit-source-id: 5ee36b64e3a7ac76448f2a300ac4993375a26de5
    malfet authored and hwangdeyu committed Dec 23, 2020
    Commit: a813673
  204. [onnxifi] Get rid of class member (pytorch#49380)

    Summary:
    Pull Request resolved: pytorch#49380
    
    Couldn't resist removing a class member that is only used in one function.
    
    Reviewed By: yinghai
    
    Differential Revision: D25547366
    
    fbshipit-source-id: 74e61c6a0068566fb7956380862999163e7e94bf
    khabinov authored and hwangdeyu committed Dec 23, 2020
    Commit: 2d2a1f6
  205. Reland: Add base forward grad logic (pytorch#49734)

    Summary:
    Pull Request resolved: pytorch#49734
    
    RFC: pytorch/rfcs#11
    
    This PR add the basic logic to handle forward grad as dual Tensors.
    It contains the following:
    - Mechanism to save dual state on a Tensor and clear it up when the dual level ends
    - C++ and python user facing API
    - Updated view system that is able to track both forward and backward views
    
    The current PR has the following limitations:
    - Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
    - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
    - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
    - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
    - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.
    
    Reading guide:
    - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view.
    - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
    - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
    - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
    - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
    - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
    - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
    - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
    - Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
    - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)
    
    Test Plan: Imported from OSS
    
    Reviewed By: gchanan
    
    Differential Revision: D25678797
    
    Pulled By: albanD
    
    fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
    albanD authored and hwangdeyu committed Dec 23, 2020
    Commit: e241e1a
  206. Fix get_overlap_status for tensors without storage (pytorch#49638)

    Summary: Pull Request resolved: pytorch#49638
    
    Reviewed By: ngimel
    
    Differential Revision: D25681908
    
    Pulled By: asuhan
    
    fbshipit-source-id: 2ea8623614f2f0027f6437cf2819ba1657464f54
    asuhan authored and hwangdeyu committed Dec 23, 2020
    Commit: 1c39e42
  207. Minor doc fix: change truncating to rounding in TF32 docs (pytorch#49625)
    
    Summary:
    Minor doc fix clarifying that the input data is rounded, not truncated.
    
    CC zasdfgbnm ngimel
    
    Pull Request resolved: pytorch#49625
    
    Reviewed By: mruberry
    
    Differential Revision: D25668244
    
    Pulled By: ngimel
    
    fbshipit-source-id: ac97e41e0ca296276544f9e9f85b2cf1790d9985
    pbialecki authored and hwangdeyu committed Dec 23, 2020
    Commit: 4406379
  208. remove unused THCBlas (pytorch#49725)

    Summary:
    Removes the unused THCBlas and calls `at::cuda::blas::gemm` directly where needed.
    
    Pull Request resolved: pytorch#49725
    
    Reviewed By: mruberry
    
    Differential Revision: D25680831
    
    Pulled By: ngimel
    
    fbshipit-source-id: d826f3f558b156f45f2a4864daf3f6d086bda78c
    ngimel authored and hwangdeyu committed Dec 23, 2020
    Commit: 40e15e5
  209. only upload s3 stats on master, nightly, and release branch (pytorch#49645)
    
    Summary: Pull Request resolved: pytorch#49645
    
    Reviewed By: malfet
    
    Differential Revision: D25665851
    
    Pulled By: walterddr
    
    fbshipit-source-id: 1cf50f6e3657f70776aaf3c5d3823c8a586bf22d
    Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020
    Commit: 5e176cb
  210. Merge pull request #1 from pytorch/onnx_ms_1

    merge code
    hwangdeyu committed Dec 23, 2020
    Commit: 73985d9
  211. Commit: 2bfe745

Commits on Jan 4, 2021

  1. Commit: 525ac26
  2. Commit: 9259b03
  3. [ONNX] Add checks in ONNXSetDynamicInputShape (pytorch#49783)

    * [ONNX] Add checks in ONNXSetDynamicInputShape
    
    * [ONNX] Add checks in ONNXSetDynamicInputShape
    jiafatom authored and BowenBao committed Jan 4, 2021
    Commit: 1baebbb

Commits on Jan 5, 2021

  1. [ONNX] Enable export of aten::__derive_index (pytorch#49514)

    * Add derive_index
    
    * Add derive_index test
    
    * Adding more tests
    
    * Update symbolic_opset9.py
    neginraoof committed Jan 5, 2021
    Commit: 4898616
  2. [ONNX] Update symbolic for unfold (pytorch#49378)

    * update symbolic for unfold
    
    * update symbolic_opse12 file
    
    * update symbolic_opse12 file
    
    * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270)
    
    * Symbolic function for torch.square (pytorch#49446)
    
    * instead of a pass use a helper function
    
    * update ort version
    
    * Revert "instead of a pass use a helper function"
    
    This reverts commit 723b446.
    
    * update symbolics
    
    * update symbolic
    
    * update symbolics
    
    * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270)
    
    * Symbolic function for torch.square (pytorch#49446)
    
    * empty commit
    
    * fix clang-tidy
    
    * fix clang-tidy
    
    Co-authored-by: Bowen Bao <bowbao@microsoft.com>
    Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
    3 people committed Jan 5, 2021
    Commit: eef5191
  3. [ONNX] Update the sequence of initializers in the exported graph so that it is the same as the inputs. (pytorch#49798)
    
    * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270)
    
    * Symbolic function for torch.square (pytorch#49446)
    
    * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270)
    
    * Symbolic function for torch.square (pytorch#49446)
    
    * Update code so that the initializers' sequence is the same as the inputs.
    
    * Correct the format according to flake8.
    
    * Correct the format by clang-format.
    
    * Add a new test for script model.
    
    * Update expect files for Test_Operators tests.
    
    Co-authored-by: Bowen Bao <bowbao@microsoft.com>
    Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
    3 people committed Jan 5, 2021
    Commit: 97a8af1

Commits on Jan 6, 2021

  1. [ONNX] Enable opset 13 ops (pytorch#49612)

    * Enable opset 13 ORT tests
    
    * Update test.sh
    
    * Set environ var
    
    * Update test.sh
    
    * Enabling more ops for opset 13
    
    * change master to main
    
    * Update symbolic_opset13.py
    
    * Flake 8 fix
    
    * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270)
    
    * Symbolic function for torch.square (pytorch#49446)
    
    * Clean up tests
    
    * Exclude more tests
    
    * Trigger build
    
    * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270)
    
    * Symbolic function for torch.square (pytorch#49446)
    
    * update ORT version
    
    * disable more tests
    
    * clean up
    
    * flake8
    
    * Disable TV tests
    
    * Update test_pytorch_onnx_onnxruntime.py
    
    Co-authored-by: Bowen Bao <bowbao@microsoft.com>
    Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
    3 people committed Jan 6, 2021
    Commit: 616da7c
  2. Merge branch 'onnx_ms_1' of https://github.com/pytorch/pytorch into pytorch-onnx_ms_1
    hwangdeyu committed Jan 6, 2021
    Commit: b3ae16c
  3. Commit: 2c69cf3
  4. Reland: Add base forward grad logic (pytorch#49734)

    Summary:
    Pull Request resolved: pytorch#49734
    
    RFC: pytorch/rfcs#11
    
    This PR add the basic logic to handle forward grad as dual Tensors.
    It contains the following:
    - Mechanism to save dual state on a Tensor and clear it up when the dual level ends
    - C++ and python user facing API
    - Updated view system that is able to track both forward and backward views
    
    The current PR has the following limitations:
    - Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
    - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
    - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
    - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
    - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.
    
    Reading guide:
    - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view.
    - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
    - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
    - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
    - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
    - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
    - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
    - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
    - Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
    - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)
    
    Test Plan: Imported from OSS
    
    Reviewed By: gchanan
    
    Differential Revision: D25678797
    
    Pulled By: albanD
    
    fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
    albanD authored and hwangdeyu committed Jan 6, 2021
    Commit: c92808f
  5. Commit: d144220
  6. fix format

    hwangdeyu committed Jan 6, 2021
    Commit: e6dd64a
  7. add comprehensive tests

    hwangdeyu committed Jan 6, 2021
    Commit: 0992510
  8. Commit: cdc08ce

Commits on Jan 14, 2021

  1. Merge remote-tracking branch 'origin1/onnx_ms_1' into deyu/bce_with_logits_sy12
    hwangdeyu committed Jan 14, 2021
    Commit: 0e09ee9

Commits on Jan 15, 2021

  1. Merge remote-tracking branch 'origin1/onnx_ms_1' into deyu/bce_with_logits_sy12
    hwangdeyu committed Jan 15, 2021
    Commit: d2ebe7e

Commits on Jan 18, 2021

  1. Commit: 5275cc5