[ONNX] Add binary_cross_entropy_with_logits op to ONNX opset version 12 #49675
Commits on Dec 17, 2020
Commit: 170908b
Commit: 203e181
Commits on Dec 22, 2020
Commit: 29dd23f
Commit: f7c63eb
Commits on Dec 23, 2020
[te] Fix bugs with shift operators (pytorch#49396)
Summary: Pull Request resolved: pytorch#49396 Pull Request resolved: pytorch#49271 Two things: 1. These throw exceptions in their constructor, which causes a segfault (*), so move the exceptions to ::make. 2. They technically support FP types but the rules are complicated so let's not bother. (*) The reason for the segfault: all Exprs including these inherit from KernelScopedObject, whose constructor adds the object to a list for destruction at the end of the containing KernelArena's lifetime. But if the derived-class constructor throws, the object is deleted even though it's still in the KernelArena's list. So when the KernelArena is itself deleted, it double-frees the pointer and dies. I've also fixed And, Or, and Xor in this diff. ghstack-source-id: 118594998 Test Plan: `buck test //caffe2/test:jit` Reviewed By: bwasti Differential Revision: D25512052 fbshipit-source-id: 42670b3be0cc1600dc5cda6811f7f270a2c88bba
Commit: 086fcf6
[static runtime] refine fusion group (pytorch#49340)
Summary: Pull Request resolved: pytorch#49340 This refines the fusion group to include only certain types of operations. We cannot safely handle "canRunNatively" types, and the memonger pass causes regressions on some internal models, so it was disabled (to be revisited with proper memory optimization once Tensor pools are implemented). Test Plan: ``` buck test mode/no-gpu caffe2/test:static_runtime buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest ``` Reviewed By: ZolotukhinM Differential Revision: D25520105 fbshipit-source-id: add61d103e4f8b4615f5402e760893ef759a60a9
Commit: 43aa3be
[JIT] Support multiple outputs in subgraph matcher. (pytorch#48992)
Summary: Pull Request resolved: pytorch#48992 Differential Revision: D25388100 Test Plan: Imported from OSS Reviewed By: heitorschueroff Pulled By: ZolotukhinM fbshipit-source-id: d95713af2220cf4f99ac92f59f8e5b902f2f3822
Authored by Mikhail Zolotukhin; committed by hwangdeyu on Dec 23, 2020
Commit: 837ac43
[numpy] torch.{all/any} : output dtype is always bool (pytorch#47878)
Summary: BC-breaking note: This PR changes the behavior of the any and all functions to always return a bool tensor. Previously these functions were only defined on bool and uint8 tensors, and when called on uint8 tensors they would also return a uint8 tensor. (When called on a bool tensor they would return a bool tensor.) PR summary: pytorch#44790 (comment) Fixes 2 and 3 Also fixes pytorch#48352 Changes * Output dtype is always `bool` (consistent with numpy); **BC-breaking** (previously matched the input dtype) * Uses vectorized version for all dtypes on CPU * Enables test for complex * Update doc for `torch.all` and `torch.any` TODO * [x] Update docs * [x] Benchmark * [x] Raise issue on XLA Pull Request resolved: pytorch#47878 Reviewed By: H-Huang Differential Revision: D25421263 Pulled By: mruberry fbshipit-source-id: c6c681ef94004d2bcc787be61a72aa059b333e69
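The numpy convention this change adopts can be checked directly; numpy is shown here as a stand-in, and the analogous `torch.any`/`torch.all` calls behave the same way after this PR:

```python
import numpy as np

# numpy's convention, which this change adopts: any/all always reduce to bool,
# regardless of the input dtype.
x = np.array([0, 1, 2], dtype=np.uint8)
print(np.any(x).dtype, np.all(x).dtype)  # bool bool
```

Before this PR, the equivalent `torch.any` call on a uint8 tensor returned a uint8 tensor, which is the BC break being flagged.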
Commit: 4bdc202
Replace THError() check in THCTensorMathReduce.cu with C10_CUDA_KERNEL_LAUNCH_CHECK() (pytorch#49424)
Summary: Pull Request resolved: pytorch#49424 As per conversation in this [comment](https://www.internalfb.com/intern/diff/D25541113 (https://github.com/pytorch/pytorch/commit/e2510a0b60232aba5160ceb18b6ece8c59a9b79d)/?dest_fbid=393026838623691&transaction_id=3818008671564312) on D25541113 (pytorch@e2510a0), although THError does more than just log errors associated with CUDA kernel launches, we're going to replace it with C10_CUDA_KERNEL_LAUNCH_CHECK so as to be consistent throughout the code base. Standardization FTW. This commit is purposefully sent in as a single-file change so it can be easily reverted if it introduces a regression. Test Plan: Checked that the code still builds with ``` buck build //caffe2/aten:ATen-cu ``` Also ran basic aten tests ``` buck test //caffe2/aten:atest ``` Reviewed By: r-barnes Differential Revision: D25567863 fbshipit-source-id: 1093bfe2b6ca6b9a3bfb79dcdc5d713f6025eb77
Authored by Amogh Akshintala; committed by hwangdeyu on Dec 23, 2020
Commit: 6428593
Fix include files for out-of-tree compilation (pytorch#48827)
Summary: Signed-off-by: caozhong <zhong.z.cao@intel.com> Pull Request resolved: pytorch#48827 Reviewed By: agolynski Differential Revision: D25375988 Pulled By: ailzhang fbshipit-source-id: a8d5ab4572d991d6d96dfe758011517651ff0a6b
Commit: 3e6bdd1
Add flag torch_jit_disable_warning_prints to allow disabling all warnings.warn (pytorch#49313)
Summary: Adds a flag torch_jit_disable_warning_prints to optimize interpreter performance by suppressing a (potentially large) number of warnings.warn calls. This works around TorchScript's warning-behavior mismatch with Python: Python by default triggers a warning once per source location, but TorchScript doesn't support that, which causes the same warning to trigger and print once per inference run, hurting performance. Pull Request resolved: pytorch#49313 Reviewed By: SplitInfinity Differential Revision: D25534274 Pulled By: gmagogsfm fbshipit-source-id: eaeb57a335c3e6c7eb259671645db05d781e80a2
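The Python behavior TorchScript fails to match — deduplicating a warning per source location — can be seen with the standard `warnings` module:

```python
import warnings

def noisy_op():
    # same source location on every call
    warnings.warn("falling back to a slow path", UserWarning)

# Python's "default" filter reports a warning only once per location...
with warnings.catch_warnings(record=True) as once:
    warnings.simplefilter("default")
    for _ in range(100):
        noisy_op()

# ...whereas an "always" filter mimics TorchScript re-emitting the warning
# on every run, which is the overhead the new flag suppresses.
with warnings.catch_warnings(record=True) as every:
    warnings.simplefilter("always")
    for _ in range(100):
        noisy_op()

print(len(once), len(every))  # 1 100
```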
Commit: 9058e5f
[DPER] Introduce barrier operation to force synchronization of threads in async execution (pytorch#49322)
Summary: Pull Request resolved: pytorch#49322 In some cases async execution might lose dependencies (alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter can happen in ModelParallel training, where a copy can get lower priority than the rest of the execution on a given GPU, causing other GPUs to starve. This operator addresses these issues by introducing extra explicit dependencies between ops. Test Plan: Unit test / E2E testing in future diffs. Reviewed By: xianjiec Differential Revision: D24933471 fbshipit-source-id: 1668994c7856d73926cde022378a99e1e8db3567
Commit: f360b23
[FX] Rename Node._uses and refactor Node.all_input_nodes (pytorch#49415)
Summary: Pull Request resolved: pytorch#49415 Test Plan: Imported from OSS Reviewed By: zdevito Differential Revision: D25565341 Pulled By: jamesr66a fbshipit-source-id: 2290ab62572632788809ba16319578bf0c0260ee
Authored by James Reed; committed by hwangdeyu on Dec 23, 2020
Commit: 4c667a1
[PyTorch] Use plain old function pointer for RecordFunctionCallback (reapply) (pytorch#49408)
Summary: Pull Request resolved: pytorch#49408 Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback. ghstack-source-id: 118665808 Test Plan: Wait for GitHub CI since we had C++14-specific issues with this one in previous PR pytorch#48629 Reviewed By: malfet Differential Revision: D25563207 fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
Commit: 1c9a0bf
[CMake] Use libtorch_cuda list defined in bzl file (pytorch#49429)
Summary: Since NCCL is an optional CUDA dependency, remove nccl.cpp from the core filelist Pull Request resolved: pytorch#49429 Reviewed By: nikithamalgifb Differential Revision: D25569883 Pulled By: malfet fbshipit-source-id: 61371a4c6b0438e4e0a7f094975b9a9f9ffa4032
Commit: 4558c13
update breathe (pytorch#49407)
Summary: Fixes pytorch#47462, but not completely. Update breathe to the latest version to get fixes for the "Unable to resolve..." issues. There are still some build errors, but much fewer than before. Pull Request resolved: pytorch#49407 Reviewed By: izdeby Differential Revision: D25562163 Pulled By: glaringlee fbshipit-source-id: 91bfd9e9ac70723816309f489022d72853f5fdc5
Commit: 6275612
[StaticRuntime] Permute_out (pytorch#49447)
Summary: Pull Request resolved: pytorch#49447 Adding an out variant for `permute`. It's better than fixing the copy inside contiguous because 1) we can leverage the c2 math library, 2) contiguous creates a tensor inside the function which isn't managed by the MemoryPlanner in StaticRuntime Test Plan: Benchmark: ``` After: I1214 12:35:32.218775 991920 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0902339. Iters per second: 11082.3 Before: I1214 12:35:43.368770 992620 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0961521. Iters per second: 10400.2 ``` Reviewed By: yinghai Differential Revision: D25541666 fbshipit-source-id: 013ed0d4080cd01de4d3e1b031ab51e5032e6651
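The general out-variant idea — write into a caller-owned buffer that a memory planner can reuse — can be sketched with numpy. `permute_out` below is a hypothetical illustration of the pattern, not the actual ATen signature:

```python
import numpy as np

def permute_out(inp, dims, out):
    # Hypothetical out variant: instead of materializing a fresh contiguous
    # tensor, copy the permuted view into a caller-provided buffer that a
    # memory planner could reuse across runs.
    np.copyto(out, np.transpose(inp, dims))
    return out

x = np.arange(24).reshape(2, 3, 4)
buf = np.empty((4, 2, 3), dtype=x.dtype)   # preallocated once, reused per run
y = permute_out(x, (2, 0, 1), buf)
assert y is buf                            # no allocation for the result
```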
Authored by Hao Lu; committed by hwangdeyu on Dec 23, 2020
Commit: 7439f10
fix optimizer.pyi typo 'statue'->'state' (pytorch#49388)
Summary: Pull Request resolved: pytorch#49388 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D25553672 Pulled By: glaringlee fbshipit-source-id: e9f2233bd678a90768844af2d8d5e2994d59e304
Authored by lixinyu; committed by hwangdeyu on Dec 23, 2020
Commit: edea937
[StaticRuntime] Fusion pass for ClipRanges/GatherRanges/LengthsToOffsets (pytorch#49113)
Summary: Pull Request resolved: pytorch#49113 Reviewed By: ajyu Differential Revision: D25388512 fbshipit-source-id: 3daa5b9387a3a10b6c220688df06540c4d844aea
Authored by Hao Lu; committed by hwangdeyu on Dec 23, 2020
Commit: 1f1c0f5
quantized tensor: add preliminary support for advanced indexing, try 2 (pytorch#49346)
Summary: Pull Request resolved: pytorch#49346 This is a less ambitious redo of pytorch#49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an inefficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25539365 fbshipit-source-id: 98485875aaaf5743e1a940e170258057691be4fa
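A rough sketch of the `dq -> index -> q` fallback, using a toy affine quantization scheme (the `scale`/`zero_point` values and helper names are illustrative, not the torch API):

```python
import numpy as np

# Toy affine quantization scheme; scale and zero_point are illustrative.
scale, zero_point = 0.1, 128

def quantize(x):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q):
    return (q.astype(np.float32) - zero_point) * scale

xq = quantize(np.random.randn(2, 3, 4, 4).astype(np.float32))

# Advanced indexing via the dq -> index -> q fallback: correct, but it pays
# for two full copies that a fused quantized kernel could avoid.
xq_slice = quantize(dequantize(xq)[:, [0], :, :])
print(xq_slice.shape)  # (2, 1, 4, 4)
```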
Commit: b1547e4
Unescape string in RPC error message (pytorch#49373)
Summary: Pull Request resolved: pytorch#49373 Unescaping the string in RPC error message to provide better error msg Test Plan: CI Reviewed By: xush6528 Differential Revision: D25511730 fbshipit-source-id: 054f46d5ffbcb1350012362a023fafb1fe57fca1
Commit: 28a5455
[StaticRuntime][ATen] Add out variant for narrow_copy (pytorch#49449)
Summary: Pull Request resolved: pytorch#49449 Similar to permute_out, add the out variant of `aten::narrow` (slice in c2) which does an actual copy. `aten::narrow` creates a view; however, a copy is incurred when we call `input.contiguous` in the ops that follow `aten::narrow`, in `concat_add_mul_replacenan_clip`, `casted_batch_one_hot_lengths`, and `batch_box_cox`. Test Plan: Unit test: ``` buck test //caffe2/aten:native_test ``` Benchmark with the adindexer model: ``` bs = 1 is neutral Before: I1214 21:32:51.919239 3285258 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0886948. Iters per second: 11274.6 After: I1214 21:32:52.492352 3285277 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0888019. Iters per second: 11261 bs = 20 shows more gains probably because the tensors are bigger and therefore the cost of copying is higher Before: I1214 21:20:19.702445 3227229 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.527563. Iters per second: 1895.51 After: I1214 21:20:20.370173 3227307 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.508734. Iters per second: 1965.67 ``` Reviewed By: bwasti Differential Revision: D25554109 fbshipit-source-id: 6bae62e6ce3456ff71559b635cc012fdcd1fdd0e
Authored by Hao Lu; committed by hwangdeyu on Dec 23, 2020
Commit: bc97e02
Revert D25554109: [StaticRuntime][ATen] Add out variant for narrow_copy
Test Plan: revert-hammer Differential Revision: D25554109 (pytorch@ed04b71) Original commit changeset: 6bae62e6ce34 fbshipit-source-id: bfa038e150166d0116bcae8f7a6415d98d4146de
Authored by Hao Lu; committed by hwangdeyu on Dec 23, 2020
Commit: 1479e05
Making ops c10 full: optional out arguments (pytorch#49083)
Summary: Pull Request resolved: pytorch#49083 We have some (but very few) ops that take optional out arguments `Tensor(a!)? out`. This PR makes them non-optional mandatory arguments and enables c10-fullness for them. There is only a very small number of ops affected by this. Putting this up for discussion. Alternatives considered: If we keep them optional, we run into lots of issues in the dispatcher. We have to decide what the dispatcher calling convention for this argument type should be. 1) If we keep passing them in as `Tensor&` arguments and return them as `tuple<Tensor&, Tensor&, Tensor&>`, so basically same as currently, then the schema inference check will say "Your kernel function got inferred to have a `Tensor` argument but your native_functions.yaml declaration says `Tensor?`. This is a mismatch, you made an error". We could potentially disable that check, but that would open the door for real mistakes to not be reported anymore in the future. This sounds bad. 2) If we change them to a type that schema inference could differentiate from `Tensor`, say we pass them in as `const optional<Tensor>&` and return them as `tuple<const optional<Tensor>&, const optional<Tensor>&, const optional<Tensor>&>`, then our boxing logic fails because it can't recognize those as out overloads anymore and shortcut the return value as it is doing right now. We might be able to rewrite the boxing logic, but that could be difficult and could easily develop into a rabbit hole of having to clean up `Tensor&` references throughout the system where we use them. Furthermore, having optional out arguments in C++ doesn't really make sense. the C++ API puts them to the front of the argument list, so you can't omit them anyways when calling an op. You would be able to omit them when calling from Python with out kwargs, but not sure if we want that discrepancy between the c++ and python API. 
ghstack-source-id: 118660075 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D25422197 fbshipit-source-id: 3cb25c5a3d93f9eb960d70ca014bae485be9f058
Commit: 00e3716
Making ops c10-full: optional lists (pytorch#49088)
Summary: Pull Request resolved: pytorch#49088 We had special case logic to support `int[]?` and `double[]?` but nothing for `DimnameList[]?`. This PR generalizes the logic to support optional lists so it should now work with all types. It also enables c10-fullness for ops that were blocked by this. Note that using these arguments in a signature was always and still is expensive because the whole list needs to be copied. We should probably consider alternatives in the future like for example using `torch::List` instead of `ArrayRef`, that could work without copying the list. ghstack-source-id: 118660071 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D25423901 fbshipit-source-id: dec58dc29f3bb4cbd89e2b95c42da204a9da2e0a
Commit: bdfa87e
[PyTorch] Avoid move-constructing a List in listConstruct (pytorch#49355)
Summary: Pull Request resolved: pytorch#49355 List's move ctor is a little bit more expensive than you might expect, but we can easily avoid it. ghstack-source-id: 118624596 Test Plan: Roughly 1% improvement on internal benchmark. Reviewed By: hlu1 Differential Revision: D25542190 fbshipit-source-id: 08532642c7d1f1604e16c8ebefd1ed3e56f7c919
Commit: 076d62f
Enhanced generators with grad-mode decorators (pytorch#49017)
Summary: This PR addresses the feature request outlined in pytorch#48713 for two-way communication with enhanced generators from [pep-342](https://www.python.org/dev/peps/pep-0342/). Briefly, the logic of the patch resembles `yield from` [pep-380](https://www.python.org/dev/peps/pep-0380/), which cannot be used, since the generator **must be interacted with from within the grad-mode context**, while yields from the decorator **must take place outside of the context**. Hence any interaction with the wrapped generator, be it via [.send](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.send), [.throw](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.throw), and even [.close](https://docs.python.org/3/reference/expressions.html?highlight=throw#generator.close) must be wrapped by a `with` clause. The patch is compatible with `for i in gen: pass` and `next(gen)` use cases and allows two-way communication with the generator via `.send <-> yield` points. ### Logic At lines [L37-L38](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L37-L38) we (the decorator) **start the wrapped generator** (coroutine) by issuing `None` into it (equivalently, we can use `next(get)` here). Then we **dispatch responses of the generator** to our ultimate caller and **relay the latter's requests** into the generator in the loop on lines [L39-L52](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L39-L52). We yield the most recent response on [L40-L41](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L40-L41), at which point we become **paused**, waiting for the next ultimate caller's interaction with us. 
If the caller **sends us a request**, then we become unpaused and move to [L51-L52](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L51-L52) and **forward it into the generator**, at which point we pause, waiting for its response. The response might be a value, an exception or a `StopIteration`. In the case of an exception from the generator, we let it **bubble up** from the immediately surrounding [except clause](https://docs.python.org/3/reference/compound_stmts.html#the-try-statement) to the ultimate caller through the [outer try-except](https://github.com/ivannz/pytorch/blob/2dc287bba87fa6f05c49446c0239ffdcdb1e896e/torch/autograd/grad_mode.py#L36-L54). In the case of a `StopIteration`, we **take its payload and propagate it** to the caller via [return](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L54). In the case of a value, the flow and the loop continue. The caller **throwing an exception at us** is handled much like a proper request, except for the exception playing the role of the request. In this case we **forward it into the generator** on lines [L47-L49](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L47-L49) and await its response. We explicitly **advance** the traceback one frame up, in order to indicate the **source of the exception within the generator**. Finally the `GeneratorExit` is handled on lines [L42-L45](https://github.com/ivannz/pytorch/blob/2d40296c0c6617b3980c86762be466c995aa7f8e/torch/autograd/grad_mode.py#L42-L45) and closes the generator. Updates: clarified exception propagation Pull Request resolved: pytorch#49017 Reviewed By: izdeby Differential Revision: D25567796 Pulled By: albanD fbshipit-source-id: 801577cccfcb2b5e13a08e77faf407881343b7b0
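The relay logic described above can be sketched as a standalone decorator. All names here are hypothetical stand-ins (the real implementation lives in torch/autograd/grad_mode.py, and `fake_grad_mode` stands in for a grad-mode context such as `no_grad`):

```python
import functools
from contextlib import contextmanager

@contextmanager
def fake_grad_mode():
    yield  # illustrative stand-in for a grad-mode context

def generator_context(ctx_factory):
    """Sketch of the relay: every interaction with the wrapped generator
    happens inside a fresh context, while the relay itself yields to the
    caller outside of it."""
    def decorator(gen_fn):
        @functools.wraps(gen_fn)
        def wrapper(*args, **kwargs):
            gen = gen_fn(*args, **kwargs)
            with ctx_factory():
                response = gen.send(None)      # start the generator in-context
            while True:
                try:
                    request = yield response   # paused outside the context
                except GeneratorExit:
                    with ctx_factory():
                        gen.close()            # close the generator in-context
                    raise
                except BaseException as exc:   # caller used .throw()
                    with ctx_factory():
                        try:
                            response = gen.throw(exc)
                        except StopIteration as stop:
                            return stop.value
                else:
                    with ctx_factory():
                        try:
                            response = gen.send(request)
                        except StopIteration as stop:
                            return stop.value
        return wrapper
    return decorator

@generator_context(fake_grad_mode)
def accumulate():
    total = 0
    while True:
        total += yield total

g = accumulate()
assert next(g) == 0 and g.send(5) == 5 and g.send(3) == 8
```

`return stop.value` inside the relay re-raises as a `StopIteration` carrying the inner generator's return value, which is the payload propagation the description covers.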
Commit: 197266d
webdataset prototype - ListDirFilesIterableDataset (pytorch#48944)
Summary: Pull Request resolved: pytorch#48944 This is a stack PR for webdataset prototype. I am trying to make each stack a separate dataset. To make the implementation simple, each dataset will only support the basic functionality. - [x] ListDirFilesDataset - [x] LoadFilesFromDiskIterableDataset - [x] ReadFilesFromTarIterableDataset - [x] ReadFilesFromZipIterableDataset - [x] RoutedDecoderIterableDataset Test Plan: Imported from OSS Reviewed By: izdeby Differential Revision: D25541277 Pulled By: glaringlee fbshipit-source-id: 9e738f6973493f6be1d5cc1feb7a91513fa5807c
Authored by lixinyu; committed by hwangdeyu on Dec 23, 2020
Commit: 5165093
webdataset prototype - LoadFilesFromDiskIterableDataset (pytorch#48955)
Summary: Pull Request resolved: pytorch#48955 Test Plan: Imported from OSS Reviewed By: izdeby Differential Revision: D25541393 Pulled By: glaringlee fbshipit-source-id: dea6ad64a7ba40abe45612d99f078b14d1da8bbf
Authored by lixinyu; committed by hwangdeyu on Dec 23, 2020
Commit: 9745362
CUDA BFloat embedding (pytorch#44848)
Summary: Pull Request resolved: pytorch#44848 Reviewed By: izdeby Differential Revision: D25574204 Pulled By: ngimel fbshipit-source-id: b35f7253a6ad2b83f7b6b06862a5ab77295373e0
Commit: bf3d1b4
Instantiate PackedConvWeight to avoid linking error (pytorch#49442)
Summary: Pull Request resolved: pytorch#49442 When moving Aten/native to app level, symbols from native/quantized may sit in a target away from some of its call sites. As a result, there are linking errors of missing symbols of instantiations of PackedConvWeight::prepack. The solution is to instantiate PackedConvWeight in the same compilation unit. It's similar to D24941989 (pytorch@fe6bb2d). ghstack-source-id: 118676374 Test Plan: CI Reviewed By: dhruvbird Differential Revision: D25576703 fbshipit-source-id: d6e3d11d51d8172ab8487ce44ec8c042889f0f11
Commit: a213e48
.circleci: downgrade conda-package-handling to 1.6.0 (pytorch#49434)
Summary: Pull Request resolved: pytorch#49434 There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives above a certain size fail out when attempting to extract see: conda/conda-package-handling#71 coincides with pytorch/builder#611 Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: xuzhao9, janeyx99, samestep Differential Revision: D25573390 Pulled By: seemethere fbshipit-source-id: 82173804f1b30da6e4b401c4949e2ee52065e149
Commit: d73c1f4
[Docs] Updating init_process_group docs to indicate correct rank range (pytorch#49131)
Summary: Pull Request resolved: pytorch#49131 Users frequently assume the correct range of ranks is 1 ... `world_size`. This PR updates the docs to indicate that the correct rank range users should specify is 0 ... `world_size` - 1. Test Plan: Rendering and Building Docs Reviewed By: mrshenli Differential Revision: D25410532 fbshipit-source-id: fe0f17a4369b533dc98543204a38b8558e68497a
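The corrected range in one line:

```python
# Ranks form the half-open range [0, world_size), not 1 ... world_size.
world_size = 4
valid_ranks = list(range(world_size))
print(valid_ranks)  # [0, 1, 2, 3]
```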
Commit: 98c4a4d
[c10d Store] Store Python Docs Fixes (pytorch#49130)
Summary: Pull Request resolved: pytorch#49130 The Python Store API docs had some typos, where boolean values were lowercase, which is incorrect Python syntax. This diff fixes those typos. Test Plan: Built and Rendered Docs Reviewed By: mrshenli Differential Revision: D25411492 fbshipit-source-id: fdbf1e6b8f81e9589e638286946cad68eb7c9252
Commit: e9c93eb
Add sinc operator (pytorch#48740)
Summary: Implements the sinc operator. See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html ![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png) Pull Request resolved: pytorch#48740 Reviewed By: izdeby Differential Revision: D25564477 Pulled By: soulitzer fbshipit-source-id: 13f36a2b84dadfb4fd1442a2a40a3a3246cbaecb
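The operator follows numpy's normalized sinc, which makes the definition easy to spot-check:

```python
import numpy as np

# Normalized sinc, as in numpy: sinc(x) = sin(pi*x) / (pi*x), with sinc(0) = 1.
print(np.sinc(0.0))                    # 1.0
print(round(float(np.sinc(0.5)), 4))   # 0.6366  (= 2/pi)
print(abs(np.sinc(1.0)) < 1e-12)       # True   (zeros at nonzero integers)
```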
Commit: 6cb4910
Revert "Revert D24923679: Fixed einsum compatibility/performance issues (pytorch#46398)" (pytorch#49189)
Summary: Pull Request resolved: pytorch#49189 This reverts commit d307601 and fixes the bug with diagonals and ellipsis combined. Test Plan: Imported from OSS Reviewed By: glaringlee Differential Revision: D25540722 Pulled By: heitorschueroff fbshipit-source-id: 86d0c9a7dcfda600b546457dad102af2ff33e353
Commit: 7b4218c
[caffe2][autograd] Avoid extensive -Wunused-variable warnings on _any_requires_grad (pytorch#49167)
Summary: Pull Request resolved: pytorch#49167 Building with clang and a fair warning level can result in hundreds of lines of compiler output of the form: ``` caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2279,8): warning: unused variable '_any_requires_grad' [-Wunused-variable] auto _any_requires_grad = compute_requires_grad( self ); ^ caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2461,8): warning: unused variable '_any_requires_grad' [-Wunused-variable] auto _any_requires_grad = compute_requires_grad( grad_output, self ); ^ caffe2\gen_aten_libtorch\autograd\generated\VariableType_1.cpp(2677,8): warning: unused variable '_any_requires_grad' [-Wunused-variable] auto _any_requires_grad = compute_requires_grad( self ); ^ ... ``` This happens when requires_derivative == False. Let's mark `_any_requires_grad` as potentially unused. If this were C++17 we would use `[[maybe_unused]]` but to retain compatibility with C++11 we just mark it with `(void)`. Test Plan: CI + locally built Reviewed By: ezyang Differential Revision: D25421548 fbshipit-source-id: c56279a184b1c616e8717a19ee8fad60f36f37d1
Commit: 6a56da9
Revert D25421263: [pytorch][PR] [numpy] torch.{all/any} : output dtype is always bool
Test Plan: revert-hammer Differential Revision: D25421263 (pytorch@c508e5b) Original commit changeset: c6c681ef9400 fbshipit-source-id: 4c0c9acf42b06a3ed0af8f757ea4512ca35b6c59
Commit: 5125131
Reland "Add test for empty tensors for batch matmuls" (pytorch#48797)
Summary: This reverts commit c7746ad. Fixes #{issue number} Pull Request resolved: pytorch#48797 Reviewed By: mruberry Differential Revision: D25575264 Pulled By: ngimel fbshipit-source-id: c7f3b384db833d727bb5bd8a51f1493a13016d09
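The empty-tensor semantics this kind of test pins down can be illustrated with numpy as a stand-in (the torch batch matmuls follow the same shape rules):

```python
import numpy as np

# An empty batch flows through batched matmul with the usual shape rules.
a = np.zeros((0, 3, 4))
b = np.zeros((0, 4, 5))
print(np.matmul(a, b).shape)  # (0, 3, 5)

# A zero-sized contraction dimension yields an all-zero result, not an error.
c = np.ones((2, 3, 0)) @ np.ones((2, 0, 5))
print(c.shape, c.sum())  # (2, 3, 5) 0.0
```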
Commit: c7ce84b
Adding support for CuDNN-based LSTM with projections (pytorch#47725)
Summary: Fixes pytorch#46213 I didn't yet update the documentation, will add those change soon. A few other things that I didn't do, but want to clarify if I maybe should. 1. I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes. 2. I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that. 3. I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places. 4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that? 5. Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. 
And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout. Pull Request resolved: pytorch#47725 Reviewed By: zou3519 Differential Revision: D25449794 Pulled By: ngimel fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
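The projection feature described above is exposed in Python as the `proj_size` argument of `nn.LSTM` (available in PyTorch >= 1.8). A minimal sketch of the resulting behavior, assuming a build that includes this change:

```python
import torch
import torch.nn as nn

# LSTM with projections (LSTMP): the hidden state is projected from
# hidden_size down to proj_size by the extra weight w_hr described above.
lstm = nn.LSTM(input_size=10, hidden_size=20, proj_size=5, num_layers=1)

x = torch.randn(3, 2, 10)        # (seq_len, batch, input_size)
out, (h, c) = lstm(x)

print(out.shape)  # outputs carry the projected size: (3, 2, 5)
print(h.shape)    # hidden state is projected too: (1, 2, 5)
print(c.shape)    # cell state keeps the full hidden_size: (1, 2, 20)
```

The parameter layout matches the commit's description: `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`, then the projection weight `weight_hr_l0` last.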
Igor Gitman authored and hwangdeyu committed Dec 23, 2020
Commit: 1352101
Move inplace_is_vmap_compatible to BatchedTensorImpl.h (pytorch#49118)
Summary: Pull Request resolved: pytorch#49118 I need this in the next stack up. It seems useful to have as a helper function. Test Plan: - run tests Reviewed By: izdeby Differential Revision: D25563546 Pulled By: zou3519 fbshipit-source-id: a4031fdc4b2373cc230ba3c66738d91dcade96e2
Commit: 0991d63
Update accumulate_grad to support vmap (pytorch#49119)
Summary: Pull Request resolved: pytorch#49119 I don't know how the accumulate_grad code gets hit via calling autograd.grad, so I went through all places in accumulate_grad that are definitely impossible to vmap through and changed them. To support this: - I added vmap support for Tensor::strides(). It returns the strides that correspond to the public dimensions of the tensor (not the ones being vmapped over). - Changed an instance of empty_strided to new_empty_strided. - Replaced an in-place operation in accumulate_grad.h Test Plan: - added a test for calling strides() inside of vmap - added tests that exercise all of the accumulate_grad code path. NB: I don't know why these tests exercise the code paths, but I've verified that they do via gdb. Suggestions for some saner test cases are very welcome. Reviewed By: izdeby Differential Revision: D25563543 Pulled By: zou3519 fbshipit-source-id: 05ac6c549ebd447416e6a07c263a16c90b2ef510
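The accumulate_grad changes above exist so that `vmap` can trace through gradient accumulation. As background, a small sketch of the vmap batching model itself, using the public `torch.vmap` of recent PyTorch (the API was internal at the time of this commit):

```python
import torch

def per_sample(x, w):
    # A function written for a single (unbatched) sample.
    return torch.dot(x, w)

xs = torch.randn(8, 5)   # a batch of 8 samples
w = torch.randn(5)       # shared, unbatched argument

# vmap pushes the batch dimension into the function for us:
# map over dim 0 of xs, broadcast w.
batched = torch.vmap(per_sample, in_dims=(0, None))(xs, w)

# Equivalent explicit loop.
looped = torch.stack([per_sample(x, w) for x in xs])
print(torch.allclose(batched, looped))
```

Inside vmap, calls like `Tensor::strides()` must report only the public (non-batched) dimensions, which is what this commit wires up for the gradient-accumulation path.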
Commit: b2acf95
Update TensorPipe submodule (pytorch#49467)
Summary: Pull Request resolved: pytorch#49467 Credit to beauby for the Bazel fixes. Test Plan: Export and run on CI Reviewed By: beauby Differential Revision: D25588027 fbshipit-source-id: efe1c543eb7438ca05254de67cf8b5cee625119a
Commit: da5c385
Add docs/README.md to make existing doc build info more discoverable (pytorch#49286)
Summary: Closes pytorchgh-42003 Pull Request resolved: pytorch#49286 Reviewed By: glaringlee Differential Revision: D25535250 Pulled By: ezyang fbshipit-source-id: a7790bfe4528fa6a31698126cc687793fdf7ac3f
Commit: 94344a2
Updated derivative rules for complex svd and pinverse (pytorch#47761)
Summary: Updated `svd_backward` to work correctly for complex-valued inputs. Updated `common_methods_invocations.py` to take dtype, device arguments for input construction. Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`. Added `svd` and `pinverse` to list of complex tests. References for complex-valued SVD differentiation: - https://giggleliu.github.io/2019/04/02/einsumbp.html - https://arxiv.org/abs/1909.02659 The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant. https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/ The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl). Ref. pytorch#33152 Pull Request resolved: pytorch#47761 Reviewed By: izdeby Differential Revision: D25574962 Pulled By: mruberry fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
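A small sketch of what the updated rule enables: differentiating a gauge-invariant loss (here, the sum of singular values) through the SVD of a complex matrix. This uses the modern `torch.linalg.svd` spelling; at the time of the commit the equivalent entry point was `torch.svd`:

```python
import torch

# Complex double-precision matrix with gradients enabled.
a = torch.randn(4, 3, dtype=torch.complex128, requires_grad=True)

u, s, vh = torch.linalg.svd(a, full_matrices=False)

# Singular values are invariant under the U/V phase (gauge) freedom,
# so this loss has a well-defined gradient per the commit's derivation.
loss = s.sum()
loss.backward()

print(a.grad.shape)  # same shape as a
```

As the commit notes, the derived rule assumes gauge invariance, so a loss that depends on the phases of U or V would not get a meaningful gradient here.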
Commit: 6315a7e
[quant][docs] Add fx graph mode quantization to quantization docs (pytorch#49211)
Summary: Pull Request resolved: pytorch#49211 Test Plan: Imported from OSS Reviewed By: raghuramank100 Differential Revision: D25507480 fbshipit-source-id: 9e9e4b5fef979f5621c1bbd1b49e9cc6830da617
Commit: bbaa6bb
stft: Change require_complex warning to an error (pytorch#49022)
Summary: Pull Request resolved: pytorch#49022 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D25569586 Pulled By: mruberry fbshipit-source-id: 09608088f540c2c3fc70465f6a23f2aec5f24f85
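With the warning promoted to an error, callers must pass `return_complex` explicitly. A hedged sketch of the intended call pattern on a recent PyTorch:

```python
import torch

x = torch.randn(1024)

# Explicitly request a complex-valued spectrogram; omitting
# return_complex on this call is what now raises instead of warning.
spec = torch.stft(x, n_fft=256, return_complex=True)

print(spec.dtype)     # complex (torch.complex64 for float32 input)
print(spec.shape[0])  # onesided output: n_fft // 2 + 1 = 129 frequency bins
```

Code that needs the old real-valued `(..., 2)` layout can recover it with `torch.view_as_real(spec)`.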
Commit: 0d82603
Revert D25564477: [pytorch][PR] Add sinc operator
Test Plan: revert-hammer Differential Revision: D25564477 (pytorch@bbc7143) Original commit changeset: 13f36a2b84da fbshipit-source-id: 58cbe8109efaf499dd017531878b9fbbb27976bc
Commit: 0176da6
Making ops c10-full: Storage arguments (pytorch#49146)
Summary: Pull Request resolved: pytorch#49146 Add support for Storage arguments to IValue and the JIT typing system, and make ops that were blocked on that c10-full. ghstack-source-id: 118710665 (Note: this ignores all push blocking failures!) Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D25456799 fbshipit-source-id: da14f125af352de5fcf05a83a69ad5a69d5a3b45
Commit: 8dcd580
Allow zero annealing epochs (pytorch#47579)
Summary: Fixes pytorch#47578. Pull Request resolved: pytorch#47579 Reviewed By: H-Huang Differential Revision: D25429403 Pulled By: vincentqb fbshipit-source-id: c42fbcd71b46e07c672a1e9661468848ac16de38
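The fix allows `SWALR`'s annealing phase to be skipped entirely. A sketch, assuming a PyTorch version that includes this change (where `anneal_epochs=0` is accepted and the learning rate jumps straight to `swa_lr`):

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# anneal_epochs=0: no gradual annealing phase; previously this
# raised a ValueError.
swa_sched = torch.optim.swa_utils.SWALR(opt, swa_lr=0.01, anneal_epochs=0)

opt.step()
swa_sched.step()
print(opt.param_groups[0]["lr"])  # now at swa_lr
```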
Commit: 6f50a18
Revert D25507480: [quant][docs] Add fx graph mode quantization to quantization docs
Test Plan: revert-hammer Differential Revision: D25507480 (pytorch@7729581) Original commit changeset: 9e9e4b5fef97 fbshipit-source-id: fdb08d824209b97defaba2e207d1a914575a6ae7
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit: 3bbc766
Fix link in distributed contributing doc and add link (pytorch#49141)
Summary: One of the links for ramp up tasks wasn't showing any results and the other was only RPC results. Instead of this, I just changed it to one link that has `pt_distributed_rampup` which seems reasonable as the developer will be able to see both RPC and distributed tasks. Also added test command for DDP tests. Pull Request resolved: pytorch#49141 Reviewed By: ezyang Differential Revision: D25597560 Pulled By: rohan-varma fbshipit-source-id: 85d7d2964a19ea69fe149c017cf88dff835b164a
Commit: e7b6a29
Add note to torch docs for sinh/cosh (pytorch#49413)
Summary: Address pytorch#48641 Documents the behavior of sinh and cosh in the edge cases ``` >>> b = torch.full((15,), 89, dtype=torch.float32) >>> torch.sinh(b) tensor([2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38]) >>> b = torch.full((16,), 89, dtype=torch.float32) >>> torch.sinh(b) tensor([inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]) >>> b = torch.full((17,), 89, dtype=torch.float32) >>> torch.sinh(b) tensor([ inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, 2.2448e+38]) >>> b = torch.full((32,), 89, dtype=torch.float32)[::2] >>> torch.sinh(b) tensor([2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38, 2.2448e+38]) ``` See https://sleef.org/purec.xhtml Pull Request resolved: pytorch#49413 Reviewed By: ezyang Differential Revision: D25587932 Pulled By: soulitzer fbshipit-source-id: 6db75c45786f4b95f82459d0ce5efa37ec0774f0
Commit: 470a9cf
Refine ConvParams::use_nnpack() (pytorch#49464)
Summary: NNPACK convolution algorithms can only be used for kernels up to 16x16. Fixes pytorch#49462 Pull Request resolved: pytorch#49464 Reviewed By: xuzhao9 Differential Revision: D25587879 Pulled By: malfet fbshipit-source-id: 658197f23c08cab97f0849213ecee3f91f96c932
Commit: ce124c2
T66557700 Support default argument values of a method (pytorch#48863)
Summary: Pull Request resolved: pytorch#48863 Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`). Test Plan: buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg Reviewed By: raziel, iseeyuan Differential Revision: D25152559 fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
Commit: 0998854
[PyTorch] Merge CoinflipTLS into RecordFunctionTLS (pytorch#49359)
Summary: Pull Request resolved: pytorch#49359 This should be both slightly more efficient (1 less TLS guard check in at::shouldRunRecordFunction) and definitely more correct (CoinflipTLS is now saved whenever RecordFunctionTLS is saved), fixing a bad merge that left RecordFunctionTLS::tries_left dead. ghstack-source-id: 118624402 Test Plan: Review, CI Reviewed By: hlu1 Differential Revision: D25542799 fbshipit-source-id: 310f9fd157101f659cea13c331b2a0ee6db2db88
Commit: c971a62
[PyTorch] Avoid extra Tensor refcounting in _cat_out_cpu (pytorch#49364)
Summary: Pull Request resolved: pytorch#49364 We had a local `Tensor` when we only needed a `const Tensor&`. ghstack-source-id: 118624595 Test Plan: Internal benchmark. Reviewed By: hlu1 Differential Revision: D25544731 fbshipit-source-id: 7b9656d0371ab65a6313cb0ad4aa1df707884c1c
Commit: 4df68b3
[PyTorch] Use .sizes() instead of .size() in _cat_out_cpu (pytorch#49368)
Summary: Pull Request resolved: pytorch#49368 The former is faster because it doesn't allow negative indexing (which we don't use). ghstack-source-id: 118624598 Test Plan: internal benchmark Reviewed By: hlu1 Differential Revision: D25545777 fbshipit-source-id: b2714fac95c801fd735fac25b238b4a79b012993
Commit: bff610b
[PyTorch] Use .sizes() instead of .size() in cat_serial_kernel_impl (pytorch#49371)
Summary: Pull Request resolved: pytorch#49371 As with the previous diff, .sizes() is strictly more efficient. ghstack-source-id: 118627223 Test Plan: internal benchmark Differential Revision: D25546409 fbshipit-source-id: 196034716b6e11efda1ec8cb1e0fce7732d73eb4
Commit: 51e4cc9
[PyTorch] Make tls_local_dispatch_key_set inlineable (reapply) (pytorch#49412)
Summary: Pull Request resolved: pytorch#49412 FLAGS_disable_variable_dispatch had to go, but it looks like the only user was some benchmarks anyway. ghstack-source-id: 118669590 Test Plan: Small (order of 0.1%) improvement on internal benchmarks. Wait for GitHub CI since this was reverted before due to a CI break Reviewed By: ezyang Differential Revision: D25547962 fbshipit-source-id: 58424b1da230fdc5d27349af762126a5512fce43
Commit: e70d3f0
BFloat16: add explicit dtype support for to_mkldnn and to_dense (pytorch#48881)
Summary: Pull Request resolved: pytorch#48881 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D25537190 Pulled By: VitalyFedyunin fbshipit-source-id: a61a433c638e2e95576f88f081b64ff171b2316e
Commit: fb4da16
Introduce tools.codegen.api.translate (pytorch#49122)
Summary: Pull Request resolved: pytorch#49122 cpparguments_exprs has induced a lot of head scratching in many recent PRs for how to structure the code in a good way. This PR eliminates the old algorithm for an entirely new algorithm inspired by logic programming. The net result is shorter, cleaner and should be more robust to future changes. This PR is a bit of a whopper. Here is the order to review it. - tools/codegen/api/types.py - Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions. - Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the *semantic C++ type* of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CType have a little bit of structure around optional and references, because the translation code needs to work around these concepts. 
- I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator. - All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods are now calling translate, the new functionality which we haven't gotten to yet - tools/codegen/api/cpp.py - We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens). - `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation - tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator - tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works. - Everything else: just some small call site tweaks for places when I changed API. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D25455887 Pulled By: ezyang fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6
Commit: 15bc45f
Revert D25569586: stft: Change require_complex warning to an error
Test Plan: revert-hammer Differential Revision: D25569586 (pytorch@5874925) Original commit changeset: 09608088f540 fbshipit-source-id: 6a5953b327a4a2465b046e29bb007a0c5f4cf14a
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit: c482c5d
[NNC] Don't inline output buffers on CPU (pytorch#49488)
Summary: In pytorch#48967 we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but is not needed for correctness for CPU and results in perf slowdown. The output buffer inlining solution for CUDA is intended to be an interim solution because it does not work with reductions. Pull Request resolved: pytorch#49488 Reviewed By: ezyang Differential Revision: D25596071 Pulled By: eellison fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd
Elias Ellison authored and hwangdeyu committed Dec 23, 2020
Commit: fb0a942
Prevent accidentally writing old style ops (pytorch#49510)
Summary: Pull Request resolved: pytorch#49510 Adding old style operators with out arguments will break XLA. This prevents that. See for background: https://fb.workplace.com/groups/pytorch.dev/permalink/809934446251704/ This is a temporary change that will prevent this breakage for the next couple of days until the problem is resolved for good. It will be deleted in pytorch#49164 then. ghstack-source-id: 118756437 (Note: this ignores all push blocking failures!) Test Plan: waitforsandcastle Reviewed By: bhosmer Differential Revision: D25599112 fbshipit-source-id: 6b0ca4da4b55da8aab9d1b332cd9f68e7602301e
Commit: c694e7d
.circleci: Only downgrade if we have conda (pytorch#49519)
Summary: Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Fixes #{issue number} Pull Request resolved: pytorch#49519 Reviewed By: robieta Differential Revision: D25603779 Pulled By: seemethere fbshipit-source-id: ca8d811925762a5a413ca906d94c974a4ac5b132
Commit: b39b6cb
Fix bad error message when int overflow (pytorch#48250)
Summary: Fixes pytorch#48114 Before: ``` >>> torch.empty(2 * 10 ** 20) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: empty(): argument 'size' must be tuple of ints, but found element of type int at pos 1 ``` After fix: ``` >>> torch.empty(2 * 10 ** 20) Traceback (most recent call last): File "<stdin>", line 1, in <module> RuntimeError: Overflow when unpacking long ``` Unclear whether we need a separate test for this case, I can add one if it's necessary... Pull Request resolved: pytorch#48250 Reviewed By: linbinyu Differential Revision: D25105217 Pulled By: ezyang fbshipit-source-id: a5aa7c0266945c8125210a2fd34ce4b6ba940c92
Commit: 3be7381
Relax the atol/rtol of layernorm math kernel test. (pytorch#49507)
Summary: Pull Request resolved: pytorch#49507 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D25598424 Pulled By: ailzhang fbshipit-source-id: b3f43e84f177cf7c14831b0b83a399b155c813c4
Ailing Zhang authored and hwangdeyu committed Dec 23, 2020
Commit: 12c9616
Fix CUDA extension ninja build (pytorch#49344)
Summary: I am submitting this PR on behalf of Janne Hellsten(nurpax) from NVIDIA, for the convenience of CLA. Thanks Janne a lot for the contribution! Currently, the ninja build decides whether to rebuild a .cu file or not pretty randomly. And there are actually two issues: First, the arch list in the building command is ordered randomly. When the order changes, it will unconditionally rebuild regardless of the timestamp. Second, the header files are not included in the dependency list, so if the header file changes, it is possible that ninja will not rebuild. This PR fixes both issues. The fix for the second issue requires nvcc >= 10.2. nvcc < 10.2 can still build CUDA extension as it used to be, but it will be unable to see the changes in header files. Pull Request resolved: pytorch#49344 Reviewed By: glaringlee Differential Revision: D25540157 Pulled By: ezyang fbshipit-source-id: 197541690d7f25e3ac5ebe3188beb1f131a4c51f
Commit: 2aa0817
[extensions] fix is_ninja_available during cuda extension building (pytorch#49443)
Summary: tldr: the current version of `is_ninja_available` in `torch/utils/cpp_extension.py` fails to run under recent pip versions, whose new build-isolation feature is now the default. This PR fixes this problem. The full story follows: -------------------------- Currently, trying to build https://github.com/facebookresearch/fairscale/ which builds cuda extensions fails with the recent pip versions. The build fails in `is_ninja_available`, which runs a simple subprocess (`ninja --version`) but does it with a /dev/null stream override that seems to break with the new pip versions. Currently I have `pip==20.3.3`. Recent pip performs build isolation, which first fetches all dependencies to somewhere under /tmp/pip-install-xyz and then builds the package. If I build: ``` pip install fairscale --no-build-isolation ``` everything works. When building normally (i.e. without `--no-build-isolation`), the failure is a long trace, <details> <summary>Full log</summary> <pre> pip install fairscale Collecting fairscale Downloading fairscale-0.1.1.tar.gz (83 kB) |████████████████████████████████| 83 kB 562 kB/s Installing build dependencies ... done Getting requirements to build wheel ...
error ERROR: Command errored out with exit status 1: command: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v cwd: /tmp/pip-install-1wq9f8fp/fairscale_347f218384a64f24b8d5ce846641213e Complete output (55 lines): running egg_info writing fairscale.egg-info/PKG-INFO writing dependency_links to fairscale.egg-info/dependency_links.txt writing requirements to fairscale.egg-info/requires.txt writing top-level names to fairscale.egg-info/top_level.txt Traceback (most recent call last): File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module> from ninja import ninja ModuleNotFoundError: No module named 'ninja' Traceback (most recent call last): File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module> main() File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main json_out['return_val'] = hook(**hook_input['kwargs']) File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel return hook(config_settings) File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel return self._get_build_requires( File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires self.run_setup() File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 145, in run_setup exec(compile(code, __file__, 'exec'), locals()) File "setup.py", line 56, in <module> setuptools.setup( File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup return distutils.core.setup(**attrs) File 
"/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/core.py", line 148, in setup dist.run_commands() File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 298, in run self.find_sources() File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 305, in find_sources mm.run() File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 536, in run self.add_defaults() File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 572, in add_defaults sdist.add_defaults(self) File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 228, in add_defaults self._add_defaults_ext() File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/command/sdist.py", line 311, in _add_defaults_ext build_ext = self.get_finalized_command('build_ext') File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/cmd.py", line 298, in get_finalized_command cmd_obj = self.distribution.get_command_obj(command, create) File "/home/stas/anaconda3/envs/main-38/lib/python3.8/distutils/dist.py", line 858, in get_command_obj cmd_obj = self.command_obj[command] = klass(self) File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__ if not is_ninja_available(): File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available subprocess.check_call('ninja --version'.split(), stdout=devnull) File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call raise 
CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1. ---------------------------------------- ERROR: Command errored out with exit status 1: /home/stas/anaconda3/envs/main-38/bin/python /home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjvw00c7v Check the logs for full command output. </pre> </details> and the middle of it is what we want: ``` File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 351, in __init__ if not is_ninja_available(): File "/tmp/pip-build-env-a5x2icen/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1310, in is_ninja_available subprocess.check_call('ninja --version'.split(), stdout=devnull) File "/home/stas/anaconda3/envs/main-38/lib/python3.8/subprocess.py", line 364, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['ninja', '--version']' returned non-zero exit status 1. ``` For some reason pytorch fails to run this simple code: ``` # torch/utils/cpp_extension.py def is_ninja_available(): r''' Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is available on the system, ``False`` otherwise. ''' with open(os.devnull, 'wb') as devnull: try: subprocess.check_call('ninja --version'.split(), stdout=devnull) except OSError: return False else: return True ``` I suspect that pip does something to `os.devnull` and that's why it fails. This PR proposes a simpler code which doesn't rely on anything but `subprocess.check_output`: ``` def is_ninja_available(): r''' Returns ``True`` if the `ninja <https://ninja-build.org/>`_ build system is available on the system, ``False`` otherwise. ''' try: subprocess.check_output('ninja --version'.split()) except Exception: return False else: return True ``` which doesn't use `os.devnull` and performs the same function. 
There could be a whole bunch of different exceptions there I think, so I went for the generic one - we don't care why it failed, since this function's only purpose is to suggest whether ninja can be used or not. Let's check ``` python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.is_ninja_available())" True ``` Look ma - no std noise to take care of. (i.e. no need for /dev/null). I was editing the installed environment-wide `cpp_extension.py` file directly, so didn't need to tweak `PYTHONPATH` - I made sure to replace `'ninja --version'.` with something that should fail and I did get `False` for the above command line. I next did a somewhat elaborate cheat to re-package an already existing binary wheel with this corrected version of `cpp_extension.py`, rather than building from source: ``` mkdir /tmp/pytorch-local-channel cd /tmp/pytorch-local-channel # get the latest nightly wheel wget https://download.pytorch.org/whl/nightly/cu110/torch-1.8.0.dev20201215%2Bcu110-cp38-cp38-linux_x86_64.whl # unpack it unzip torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl # edit torch/utils/cpp_extension.py to fix the python code with the new version as in this PR emacs torch/utils/cpp_extension.py & # pack the files back zip -r torch-1.8.0.dev20201215+cu110-cp38-cp38-linux_x86_64.whl caffe2 torch torch-1.8.0.dev20201215+cu110.dist-info ``` Now I tell pip to use my local channel, plus `--pre` for it to pick up the pre-release as an acceptable wheel ``` # install using this local channel git clone https://github.com/facebookresearch/fairscale/ cd fairscale pip install -v --disable-pip-version-check -e . -f file:///tmp/pytorch-local-channel --pre ``` and voila all works. ``` [...] 
Successfully installed fairscale ``` I noticed a whole bunch of ninja not found errors in the log, which I think is the same problem with other parts of the build system packages which also use this old check copied all over various projects and build tools, and which the recent pip breaks. ``` writing manifest file '/tmp/pip-modern-metadata-_nsdesbq/fairscale.egg-info/SOURCES.txt' Traceback (most recent call last): File "/home/stas/anaconda3/envs/main-38/bin/ninja", line 5, in <module> from ninja import ninja ModuleNotFoundError: No module named 'ninja' [...] /tmp/pip-build-env-fqflyevr/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:364: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend. warnings.warn(msg.format('we could not find ninja.')) ``` but these don't prevent from the build completing and installing. I suppose these need to be identified and reported to various other projects, but that's another story. The new pip does something to `os.devnull` I think which breaks any code relying on it - I haven't tried to figure out what happens to that stream object, but this PR which removes its usage solves the problem. Also do notice that: ``` git clone https://github.com/facebookresearch/fairscale/ cd fairscale python setup.py bdist_wheel pip install dist/fairscale-0.1.1-cp38-cp38-linux_x86_64.whl ``` works too. So it is really a pip issue. Apologies if the notes are too many, I tried to give the complete picture and probably other projects will need those details as well. Thank you for reading. Pull Request resolved: pytorch#49443 Reviewed By: mruberry Differential Revision: D25592109 Pulled By: ezyang fbshipit-source-id: bfce4420c28b614ead48e9686f4153c6e0fbe8b7
Commit dc052aa
-
[NNC] Add Support For is_nan (pytorch#48973)
Summary: Pull Request resolved: pytorch#48973 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D25413166 Pulled By: eellison fbshipit-source-id: 0c79258345df18c60a862373fa16931228fb92ef
Elias Ellison authored and hwangdeyu committed Dec 23, 2020
Commit 6362b78
-
[NNC] add support for masked_fill (pytorch#48974)
Summary: Pull Request resolved: pytorch#48974 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D25413165 Pulled By: eellison fbshipit-source-id: 8cece1dc3692389be90c0d77bd71b103254d5ad3
Elias Ellison authored and hwangdeyu committed Dec 23, 2020
Commit a0d6342
-
Add fusion support of aten::to (pytorch#48976)
Summary: Pull Request resolved: pytorch#48976 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D25413164 Pulled By: eellison fbshipit-source-id: 0c31787e8b5e1368b0cba6e23660799b652389cd
Elias Ellison authored and hwangdeyu committed Dec 23, 2020
Commit 08fd21f
-
eager quant: remove fake_quant after add/mul nodes during QAT (pytorc…
…h#49213) Summary: Pull Request resolved: pytorch#49213 Changes behavior of Eager mode quantization to remove observation after add_scalar/mul_scalar. This is not used, and it removes one difference between Eager and FX modes. Test Plan: ``` python test/test_quantization.py TestQuantizeFxOps.test_quantized_add_qat python test/test_quantization.py TestQuantizeFxOps.test_quantized_mul_qat python test/test_quantization.py TestQuantizationAwareTraining.test_add_scalar_uses_input_qparams python test/test_quantization.py TestQuantizationAwareTraining.test_mul_scalar_uses_input_qparams ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25486276 fbshipit-source-id: 34a5d6ce0d08739319ec0f8b197cfc1309d71040
Commit 5ac65cb
-
fx quant: move {input|output}_quantized_idxs cfg from convert to prep…
…are (pytorch#49238) Summary: Pull Request resolved: pytorch#49238 Moves the `input_quantized_idxs` and `output_quantized_idxs` options from the convert config to the prepare config. This is done because these operations are related to placing observers, which is numerics changing during QAT. The next PR will adjust the behavior of `input_quantized_idxs` in prepare in QAT to prevent placing a fake_quant at the input if the input is marked quantized. Placing a fake_quant there can lead to numerical inaccuracies during calibration, as it would start with scale=1 and zp=0, which may be different from the quantization parameters of the incoming quantized input. Test Plan: ``` python test/test_quantization.py TestQuantizeFx ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25498762 fbshipit-source-id: 17ace8f803542155652b310e5539e1882ebaadc6
Commit 6c5a43d
-
fx quant: do not insert observers at quantized inputs (pytorch#49239)
Summary: Pull Request resolved: pytorch#49239 Context: the existing implementation of `quantized_input_idxs` is convert-only. Therefore, observers are inserted between the input and the first quantized node. This is a problem during QAT, because the initial input is a fake_quant, and it starts with scale=1 and zp=0. This does not match the quantization parameters of the graph input, which can lead to incorrect numerics. Fix: do not insert observer for a quantized input. Test Plan: ``` python test/test_quantization.py TestQuantizeFx ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25499486 fbshipit-source-id: 303b49cc9d95a9fd06fef3b0859c08be34e19d8a
Commit f7a7355
-
fx quant: fix fq when input is quantized and node does not need fq (p…
…ytorch#49382) Summary: Pull Request resolved: pytorch#49382 Fixes an edge case. If the input to the graph is quantized and the first node does not need activation observation, makes sure that the observer is not inserted. Test Plan: ``` python test/test_quantization.py TestQuantizeFxOps.test_int8_input_no_unnecessary_fq ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25551041 fbshipit-source-id: a6cba235c63ca7f6856e4128af7c1dc7fa0085ea
Commit f604f1b
-
fx quant: make sure observer is inserted before a quantized output (p…
…ytorch#49420) Summary: Pull Request resolved: pytorch#49420 Before: if an output was marked as quantized, it could actually not be quantized, if the previous node was not quantized. After: if an output was marked as quantized, it will be quantized regardless of the quantization status of the previous node. Test Plan: ``` python test/test_quantization.py TestQuantizeFxOps.test_quant_output_always_observed ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25566834 fbshipit-source-id: 84755a1605fd3847edd03a7887ab9f635498c05c
Commit b7a36d0
-
add files to SLOW_TESTS for target determinator (pytorch#49500)
Summary: - test_torch was split into 6 in pytorch#47356. - also test_linalg has 10 slowtest marking. Pull Request resolved: pytorch#49500 Reviewed By: ezyang, malfet Differential Revision: D25598085 Pulled By: walterddr fbshipit-source-id: 74b0b433897721db86c00e236d1dd925d7a6d3d0
Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020
Commit 1aa640b
-
[reland] Support torch.distributed.irecv(src=None, ...) (pytorch#49383)
Summary: Pull Request resolved: pytorch#49383 Reland of pytorch#47137 ghstack-source-id: 118735407 Test Plan: waitforbuildbot Reviewed By: osalpekar Differential Revision: D25551910 fbshipit-source-id: 2e1f2f77e7c69204056dfe6ed178e8ad7650ab32
Commit 5aed6b3
-
Set caffe2::pthreadpool() size in ParallelOpenMP (pytorch#45566)
Summary: Addresses pytorch#45418. This is probably not the best solution, but it's a rebase of the solution we're considering until pytorch#45418 is solved. If you can outline a better one I'm willing to implement it (: Pull Request resolved: pytorch#45566 Reviewed By: ezyang Differential Revision: D24621568 Pulled By: glaringlee fbshipit-source-id: 89dad5c61d8b5c26984d401551a1fe29df1ead04
Commit 46971a5
-
Add torch._foreach_zero_ API (pytorch#47286)
Summary:
**In this PR**
- add `_foreach_zero_` API
- update all optimizers under `/_multi_tensor/` to use `_foreach_zero_` in the `zero_grad` method

Performance improvement:

```
----------------- OP: zero_ -----------------
for-loop: 630.36 us
foreach:   90.84 us
```

script:

```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]

def main():
    for op in ["zero_"]:
        print("\n\n----------------- OP: ", op, " -----------------")
        stmt = "[torch.{op}(t) for t in inputs]"
        timer = benchmark_utils.Timer(
            stmt=stmt.format(op=op),
            globals=globals(),
            label="str(optimizer)",
        )
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

        stmt = "torch._foreach_{op}(inputs)"
        timer_mta = benchmark_utils.Timer(
            stmt=stmt.format(op=op),
            globals=globals(),
            label="str(optimizer_mta)",
        )
        print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

**TODO**
- Refactor zero_grad once foreach APIs are stable.

**Tested** via unit tests

Pull Request resolved: pytorch#47286 Reviewed By: ngimel Differential Revision: D24706240 Pulled By: izdeby fbshipit-source-id: aac69d6d134d65126ae8e5916f3627b73d8a94bf
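The `zero_grad` pattern this commit moves the optimizers to can be sketched torch-free: gather every non-None gradient, then clear them all with one batched call instead of a Python-level loop of per-tensor `zero_()` kernels. Here `fake_foreach_zero` is a plain-Python stand-in for `torch._foreach_zero_`, and params are dicts rather than real tensors:

```python
def fake_foreach_zero(tensors):
    # stand-in for torch._foreach_zero_: zeroes every "tensor" in one call;
    # the real API dispatches a single fused kernel over the whole list
    for t in tensors:
        for i in range(len(t)):
            t[i] = 0.0

def zero_grad_batched(params, foreach_zero=fake_foreach_zero):
    # collect all gradients first, then zero them with ONE batched call -
    # this is where the ~7x win in the benchmark above comes from
    grads = [p["grad"] for p in params if p["grad"] is not None]
    if grads:
        foreach_zero(grads)
```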
Iurii Zdebskyi authored and hwangdeyu committed Dec 23, 2020
Commit 6a59ef2
-
Bring back math_silu_backward which works for all backends. (pytorch#…
…49439) Summary: Pull Request resolved: pytorch#49439 Test Plan: Imported from OSS Reviewed By: nikithamalgifb, ngimel Differential Revision: D25594129 Pulled By: ailzhang fbshipit-source-id: 627bbea9ba478ee3a8edcc6695abab6431900192
Ailing Zhang authored and hwangdeyu committed Dec 23, 2020
Commit 3b1186d
-
[quant][be] Add typing for quantization_mappings.py (pytorch#49179)
Summary: Pull Request resolved: pytorch#49179 Test Plan: Imported from OSS Reviewed By: vkuzo, wat3rBro Differential Revision: D25470520 fbshipit-source-id: 16e35fec9a5f3339860bd2305ae8ffdd8e2dfaf7
Commit 99ba415
-
Add BFloat16 support for isinf and isfinite (pytorch#49356)
Summary: Also fix some tests. Pull Request resolved: pytorch#49356 Reviewed By: mruberry Differential Revision: D25604364 Pulled By: ngimel fbshipit-source-id: 9efdd83aaa96cacc66e9689db9f9d8c24175a693
Commit 5494a81
-
Change aten::native_layer_norm signature to match torch.layer_norm de…
…finition (pytorch#48971) Summary: This PR is to change the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signature to match `torch.layer_norm` definition. The current definition doesn't provide enough information to the PyTorch JIT to fuse layer_norm during training. `native_layer_norm(X, gamma, beta, M, N, eps)` => `native_layer_norm(input, normalized_shape, weight, bias, eps)` `native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` => `native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)` Pull Request resolved: pytorch#48971 Reviewed By: izdeby Differential Revision: D25574070 Pulled By: ngimel fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4
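The new signature carries enough information because the old `M`/`N` arguments are recoverable from the shapes: `N` is the product of `normalized_shape` and `M` is the product of the remaining leading dims. For intuition, a pure-Python reference for the common case where each inner row is one normalization group (illustrative only, not PyTorch's implementation):

```python
import math

def layer_norm_rows(x, weight=None, bias=None, eps=1e-5):
    # Each row of x is one normalization group: in the old signature's
    # terms, N = len(row) (product of normalized_shape) and M = len(x)
    # (product of the leading dims).
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        inv = 1.0 / math.sqrt(var + eps)
        normed = [(v - mean) * inv for v in row]
        if weight is not None:
            normed = [n * w for n, w in zip(normed, weight)]
        if bias is not None:
            normed = [n + b for n, b in zip(normed, bias)]
        out.append(normed)
    return out
```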
Commit 276e68e
-
Adding fix for invalid annotation types for dictionary (pytorch#49425)
Summary: Fixes pytorch#49362 **Summary:** This PR fixes the issue where invalid annotation types are used for a dictionary. Unsupported assertion message is generated for all invalid annotations **Test Case**: python test/test_jit.py TestJit.test_dict_invalid_annotations Pull Request resolved: pytorch#49425 Reviewed By: navahgar Differential Revision: D25601578 Pulled By: nikithamalgifb fbshipit-source-id: 91633e3d0891bdcb5402f044a74d02fe352ecd6f
Commit 54636e1
-
[pt] fuse ClipRangesGatherSigridHash (pytorch#49181)
Summary: Pull Request resolved: pytorch#49181 Fuse ClipRangesGatherSigridHash Test Plan: ``` MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adindexer/merge/traced_merge_dper_fixes.pt --pt_inputs=/data/users/ansha/tmp/adindexer/merge/container_precomputation_bs1.pt --iters=30000 --warmup_iters=10000 --num_threads=1 --pred_net=/data/users/ansha/tmp/adindexer/precomputation_merge_net.pb --c2_inputs=/data/users/ansha/tmp/adindexer/merge/c2_inputs_precomputation_bs1.pb --c2_sigrid_transforms_opt=1 --c2_use_memonger=1 --c2_weights=/data/users/ansha/tmp/adindexer/merge/c2_weights_precomputation.pb --pt_enable_static_runtime --pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile --compare_results ``` Verify op fused: Node #3: 0.00104917 ms/iter, %173 : Tensor, %174 : Tensor = fb::clip_ranges_gather_sigrid_hash_offsets(%75, %76, %39, %40, %41, %38, %26) Before: 0.0919786 After: 0.0911792 Reviewed By: hlu1 Differential Revision: D25468225 fbshipit-source-id: 36bd91c140eaa57cb42cdaad46d878b94f162a9d
Commit c18bc82
-
Revert D25574962: [pytorch][PR] Updated derivative rules for complex …
…svd and pinverse Test Plan: revert-hammer Differential Revision: D25574962 (pytorch@9955355) Original commit changeset: 832b61303e88 fbshipit-source-id: d73f77f3e51b0f535dad6d21c5bebf8d41a6bfbd
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit 2e3adbd
-
Remove set_quantizer_ from native_functions.yaml (pytorch#49463)
Summary: Pull Request resolved: pytorch#49463 set_quantizer_ takes a ConstQuantizerPtr argument, which is neither supported by JIT nor by c10. Also, it doesn't get dispatched (CPU and CUDA have the same implementation) and it is excluded from python bindings generation. So there is no real reason why this needs to be in native_functions.yaml Removing it unblocks the migration to c10-fullness since this is an op that would have been hard to migrate. See https://fb.quip.com/QRtJAin66lPN ghstack-source-id: 118710663 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D25587763 fbshipit-source-id: 8fab921f4c256c128d48d82dac731f04ec9bad92
Commit 0a2ba5d
-
[C2] Revive unsafe CoalesceOp (pytorch#49402)
Summary: Pull Request resolved: pytorch#49402 In cases of NCCLAllReduce operations there could be non-trivial overhead for launching cooperative kernels (especially in case of async execution of different parts of the model). This diff is reviving this operator to make it possible to fuse multiple operations into a single kernel. Test Plan: Unit-test. Used in a later diff. Reviewed By: xianjiec Differential Revision: D25531206 fbshipit-source-id: 64b1c161233a726f9e2868f1059316e42a8ea1fc
Commit 87a4bc5
-
[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --ta…
…ke CLANGFORMAT` Reviewed By: zertosh Differential Revision: D25609974 fbshipit-source-id: 4db8f8100336a2f0f2af8bc7b960d3711a5d1d7d
generatedunixname89002005325676 authored and hwangdeyu committed Dec 23, 2020
Commit 9df6183
-
PyLong_{As/From}{Long/UnsignedLong} lint checks (pytorch#49280)
Summary: Fixes pytorch#45581 Pull Request resolved: pytorch#49280 Reviewed By: mruberry Differential Revision: D25592330 Pulled By: ezyang fbshipit-source-id: 5c16d6aed88ad1feaa7f129b4cd44c0561be2de2
Commit 728a912
-
[reland][quant][docs] Add fx graph mode quantization to quantization …
…docs (pytorch#49211) (pytorch#49515) Summary: Pull Request resolved: pytorch#49515 Test Plan: Imported from OSS Imported from OSS Reviewed By: vkuzo Differential Revision: D25601061 fbshipit-source-id: 74e917d57895e9b4131a01fdcea8df3e94322bec
Commit b8c8d33
-
Refactor RPC matchBuiltInOp to get rid of exception swallowing (pytor…
…ch#49009) Summary: Pull Request resolved: pytorch#49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by pytorch#41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118794661 Test Plan: CI Reviewed By: mrshenli Differential Revision: D25392905 fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
Commit 2853fa3
-
Revert D25105217: [pytorch][PR] Fix bad error message when int overflow
Test Plan: revert-hammer Differential Revision: D25105217 (pytorch@c675727) Original commit changeset: a5aa7c026694 fbshipit-source-id: ddb4c93f9317e1747def8842a8072c84776cd487
Commit 0567619
-
Set is_non_overlapping_and_dense_ flag in OpaqueTensorImpl constructor (
pytorch#49470) Summary: Pull Request resolved: pytorch#49470 pytorch#48625 changes the default contiguous settings for `TensorImpl` causing the Vulkan backend to crash. Therefore, add argument that can set `is_non_overlapping_and_dense_` back to false for `OpaqueTensorImpl` constructor. Test Plan: Imported from OSS Reviewed By: AshkanAliabadi Differential Revision: D25592826 Pulled By: SS-JIA fbshipit-source-id: e5d9de9a733875cb00c0546a3bc3271e5c6e23a3
Commit 83f6ad5
-
Test distributed collectives profiling with Gloo on GPU (pytorch#49072)
Summary: Pull Request resolved: pytorch#49072 As per the title, we should enable these tests for Gloo when run on GPU and the profiler is enabled with `use_cuda=True`. Enabling ProcessGroupNCCL profiling test to work with `use_cuda=True` is being tracked in pytorch#48987. ghstack-source-id: 118789003 Test Plan: CI Reviewed By: mrshenli Differential Revision: D25388986 fbshipit-source-id: 664d922ac2e10c77299daebdc6d3c92bb70eb56e
Commit 0deecfc
-
Revert D25152559: T66557700 Support default argument values of a method
Test Plan: revert-hammer Differential Revision: D25152559 (pytorch@6bde0ca) Original commit changeset: bbf52f1fbdbf fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1
Commit 8d6bce8
-
[te] Add fast log approximation based on sleef
Summary: This is a fast log implementations benchmark: ``` buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none' ``` Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat Reviewed By: bertmaher Differential Revision: D25445815 fbshipit-source-id: 20696eacd12a55e797f606f4a6dbbd94c9652888
Commit e8b6219
-
[quant][eagermode][fix] Fix quantization for DeQuantStub (pytorch#49428)
Summary: Pull Request resolved: pytorch#49428 Previously dequantstub will be swapped with nn.quantized.DeQuantize regardless of qconfig reason is we skipped attaching qconfig for DeQuantStub to avoid adding fake quantize module to it but the correct fix is to skip it in insert observers, this PR fixes the issue. Test Plan: Imported from OSS Reviewed By: vkuzo Differential Revision: D25569991 fbshipit-source-id: d44a08c6e64c7a49509687dc389b57de1cbb878c
Commit 14bb5d0
-
.github: Add action workflow to update S3 HTMLS (pytorch#49509)
Summary: Successful run: https://github.com/pytorch/pytorch/runs/1572315901 Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: pytorch#49509 Reviewed By: walterddr Differential Revision: D25619133 Pulled By: seemethere fbshipit-source-id: 092ab12535f3bf4fc85bbfc690d3f5b10a5f8791
Commit cfd0951
-
[FileStore] Implemented numKeys and Added Tests (pytorch#49556)
Summary: Pull Request resolved: pytorch#49556 Implemented the missing Store functionality (specifically numKeys) in the FileStore. Test Plan: Added both C++ and Python tests to verify functionality. Reviewed By: jiayisuse Differential Revision: D25619001 fbshipit-source-id: 9146d0da9e0903622be3035880f619bbb2cc3891
Commit 1c90741
-
[FileStore] Updating Docs to Reflect FileStore changes (pytorch#49557)
Summary: Pull Request resolved: pytorch#49557 Updating the PyTorch docs to reflect that FileStore now supported the num_keys API. Also included a note to describe the behavior of the API. Test Plan: build and rendered docs. Reviewed By: jiayisuse Differential Revision: D25619000 fbshipit-source-id: 6c660d7ceb32d1d61024df8394aff3fcd0b752c1
Commit 9611cf3
-
Revert D25445815: [te] Add fast log approximation based on sleef
Test Plan: revert-hammer Differential Revision: D25445815 (pytorch@1329066) Original commit changeset: 20696eacd12a fbshipit-source-id: 38830a6abd16260d60e5dd9a5594e65736a9c782
Commit ddddf93
-
Add dict comprehension (pytorch#47774)
Summary: Pull Request resolved: pytorch#47774 Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D25615464 Pulled By: ansley fbshipit-source-id: 10bba6f70e812fa580cbbbf097e93de7142484cc
Ansley Ussery authored and hwangdeyu committed Dec 23, 2020
Commit 0e10eb7
-
Revert D25547962: [PyTorch] Make tls_local_dispatch_key_set inlineabl…
…e (reapply) Test Plan: revert-hammer Differential Revision: D25547962 (pytorch@6f928a4) Original commit changeset: 58424b1da230 fbshipit-source-id: 10ff9f45f6587f67e1c88886f977930b4f7e396a
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit 4e1b7d2
-
Revert D25546409: [PyTorch] Use .sizes() isntead of .size() in cat_se…
…rial_kernel_impl Test Plan: revert-hammer Differential Revision: D25546409 (pytorch@953f992) Original commit changeset: 196034716b6e fbshipit-source-id: 0e80f06a98c2842d2f11db7057ffcdcaea85f3bf
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit 7c49006
-
Revert D25545777: [PyTorch] Use .sizes() instead of .size() in _cat_o…
…ut_cpu Test Plan: revert-hammer Differential Revision: D25545777 (pytorch@c1879b5) Original commit changeset: b2714fac95c8 fbshipit-source-id: f534f8fc312943f1e6ba3d4029d6cf69b006aca8
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit 917cdeb
-
Revert D25544731: [PyTorch] Avoid extra Tensor refcounting in _cat_ou…
…t_cpu Test Plan: revert-hammer Differential Revision: D25544731 (pytorch@1a05104) Original commit changeset: 7b9656d0371a fbshipit-source-id: 0f7ea74eca282cadf269bbd284d59650a431ed65
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit c04718b
-
Revert D25542799: [PyTorch] Merge CoinflipTLS into RecordFunctionTLS
Test Plan: revert-hammer Differential Revision: D25542799 (pytorch@9ce1df0) Original commit changeset: 310f9fd15710 fbshipit-source-id: 51777914422a560e94430a786c86f5de4007a00b
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020
Commit 0ec5fb3
-
[te][reapply] Add fast log approximation based on sleef (pytorch#49575)
Summary: Pull Request resolved: pytorch#49575 This is a fast log implementations benchmark: ``` buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none' ``` Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat Reviewed By: bertmaher Differential Revision: D25627157 fbshipit-source-id: a4920f4f4005ce617d372b375e790ca966275cd9
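For intuition, here is a scalar sketch of the range-reduction + polynomial structure such fast-log kernels use. The actual sleef-derived code is vectorized and tuned differently; this is only illustrative:

```python
import math

def fast_log(x):
    # Range reduction: x = m * 2**e with m in [0.5, 1), so
    #   log(x) = e*log(2) + log(m)
    # log(m) via the atanh series, which converges quickly because
    # s = (m-1)/(m+1) stays within [-1/3, 0).
    m, e = math.frexp(x)
    s = (m - 1.0) / (m + 1.0)
    s2 = s * s
    # 2 * (s + s^3/3 + s^5/5 + s^7/7 + s^9/9), evaluated in Horner form
    log_m = 2.0 * s * (1.0 + s2 * (1.0/3 + s2 * (1.0/5 + s2 * (1.0/7 + s2 / 9))))
    return e * math.log(2.0) + log_m
```

Truncating the series after the s^9 term bounds the error by roughly 2*(1/3)^11/11, i.e. about 1e-6 in relative terms, which is the kind of accuracy/speed trade-off these approximations make.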
Commit 309d517
-
[ddp launch] solve zombie problem (pytorch#49305)
Summary: I was exhausted with needing to hunt down zombies when working with the ddp launcher, so this PR solves the various zombie issues. This PR addresses 2 distinct zombie scenarios caused by ddp launch.py:

1. When the main process is killed, the child processes aren't killed and continue running.
2. When any of the child processes dies (e.g. OOM), the rest of the children and the parent remain running, but are really stuck.

To solve these problems this PR switches from `wait` to `poll` and uses signal handlers. The main problem with `wait()` was that it's not async: I was having a 2nd process OOM, and the code was stuck waiting for the first process to finish, which will not happen since the first process is now blocking, waiting for the 2nd process - a sort of deadlock. (My 2nd card is smaller than the first one, so it occasionally OOMs.) Using `asyncio` would probably be the cleanest solution, but as it's relatively new in python, perhaps polling is good enough.

I wrote this little script to reproduce the 2 problematic scenarios and a normal running setup; it does 3 different things according to the `--mode` arg:

- `oom` - causes the 2nd process to exit prematurely, emulating OOM
- `clean-finish` - just exit normally in both processes
- `False` (lack of arg) - just keep on running, emulating multiple normally running processes

```
# oom.py
import argparse
from time import sleep
import sys

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", default=False, type=int)
    parser.add_argument("--mode", default=False, type=str)
    args, _ = parser.parse_known_args()

    print(f"{args.local_rank} is starting")
    sleep(3)

    if args.mode == "oom":
        # emulate OOM in 2nd card
        if args.local_rank == 1:
            raise RuntimeError("OOM")

    if args.mode == "clean-finish":
        sleep(1)
        print(f"{args.local_rank} is cleanly finishing")
        sys.exit(0)

    while (True):
        # emulate long running process
        print(f"{args.local_rank} is running")
        sleep(1)

if __name__ == "__main__":
    main()
```

Let's begin:

### 1. Normal execution

```
python -m torch.distributed.launch --nproc_per_node=2 ./oom.py --mode=clean-finish
```

All the processes exit upon completion - I won't bother pasting the log here - just testing that my code didn't break the normal running.

### 2. OOM

```
python -m torch.distributed.launch --nproc_per_node=2 ./oom.py --mode=oom
```

```
POLLING FOR 17547
POLLING FOR 17548
0 0 is starting
1 1 is starting
POLLING FOR 17547
POLLING FOR 17548
POLLING FOR 17548
POLLING FOR 17547
POLLING FOR 17547
POLLING FOR 17548
0 is running
Traceback (most recent call last):
  File "./oom.py", line 33, in <module>
    main()
  File "./oom.py", line 20, in main
    raise RuntimeError("OOM")
RuntimeError: OOM
POLLING FOR 17548
process 17548 is no more
Killing subprocess 17547
Killing subprocess 17548
Traceback (most recent call last):
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 341, in <module>
    main()
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 327, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stas/anaconda3/envs/main-38/bin/python', '-u', './oom.py', '--local_rank=1', '--mode=oom']' returned non-zero exit status 1.
```

All processes exited and the trace was printed.

### 3. Exit on SIGINT/SIGTERM

If I started a process and then realized I made a mistake, I want to be able to kill it cleanly, and if any sub-processes have already been spawned I want them to be killed too. Here the sighandler takes care of trapping the SIGTERM/SIGINT.

```
python -m torch.distributed.launch --nproc_per_node=2 ./oom.py
```

Here the processes emulate a long normal run. So let's Ctrl-C the process as soon as it started and see:

```
POLLING FOR 18749
POLLING FOR 18750
0 0 is starting
1 1 is starting
POLLING FOR 18749
POLLING FOR 18750
POLLING FOR 18750
POLLING FOR 18749
POLLING FOR 18749
POLLING FOR 18750
0 is running
1 is running
POLLING FOR 18750
POLLING FOR 18749
0 is running
1 is running
^CTraceback (most recent call last):
Killing subprocess 18749
Traceback (most recent call last):
  File "./oom.py", line 33, in <module>
  File "./oom.py", line 33, in <module>
Killing subprocess 18750
Parent got kill signal=SIGINT, exiting
```

all processes got killed

--------------------------------

So this covered the 2 problematic cases and 1 normal case.

Notes:
- we could probably switch to `sleep(3)` - `1` is probably too fast
- all the debug prints will be removed once you are happy - I left them so that it's easier for you to test that my PR does the right thing

Thank you!

Pull Request resolved: pytorch#49305 Reviewed By: izdeby Differential Revision: D25565617 Pulled By: rohan-varma fbshipit-source-id: 1ea864113f283d4daac5eef1131c8d745aae4c99
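The poll-based supervision described above can be sketched as a stand-alone helper. This is a simplified illustration, not the launcher's actual code; the real launch.py additionally installs SIGINT/SIGTERM handlers so the same cleanup runs when the parent itself is killed:

```python
import subprocess
import time

def supervise(cmds, poll_interval=0.05):
    # Launch all workers, then poll() instead of wait(): a failure in ANY
    # child is noticed promptly rather than blocking on the first child.
    # On failure, every survivor is killed so no zombies are left behind.
    # Returns the first non-zero exit code seen, or 0 if all exit cleanly.
    procs = [subprocess.Popen(c) for c in cmds]
    try:
        alive = list(procs)
        while alive:
            for p in list(alive):
                ret = p.poll()
                if ret is None:
                    continue  # still running
                alive.remove(p)
                if ret != 0:
                    return ret  # finally-block reaps the survivors
            time.sleep(poll_interval)
        return 0
    finally:
        for p in procs:
            if p.poll() is None:
                p.kill()
                p.wait()
```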
Commit 723010e
-
Add more list peephole idioms (pytorch#48268)
Summary: Pull Request resolved: pytorch#48268 Test Plan: Imported from OSS Reviewed By: jamesr66a Differential Revision: D25104617 Pulled By: eellison fbshipit-source-id: b41c03d5da6e9b88acf21a859f61c5c70608c150
Elias Ellison authored and hwangdeyu committed Dec 23, 2020 (commit 4c9c61e)
disable concat nested namespace check (pytorch#49571)
Summary: Pull Request resolved: pytorch#49571 Disable nested namespace check since OSS standard is ``` set(CMAKE_CXX_STANDARD 14) ``` and it's currently causing confusion on clang-tidy internally such as D25214452 Test Plan: clang-tidy Reviewed By: xuzhao9 Differential Revision: D25626392 fbshipit-source-id: 1fb472c89ebe9b83718ae27f2c1d77b8b2412b5e
Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020 (commit 29e296d)
Add type inference for dequantization.tensors (pytorch#49517)
Summary: Pull Request resolved: pytorch#49517 We should add concrete type info for Tensor List case as well. Test Plan: ci Reviewed By: qizzzh Differential Revision: D25599223 fbshipit-source-id: 3614e9ec25fc963a8d6a0bd641735fcca6c87032
Commit f7ed11e
FLOPS Roofline Analysis Feature for PyTorch Profiler. (pytorch#46506)
Summary: FLOPs Roofline Analysis Feature for PyTorch Profiler. Currently, PyTorch Profiler lacks the ability to measure the FLOPs of operators, such as mm and conv. FLOPs are helpful to estimate the computation complexity of the operators. For now, we use input shapes to estimate the number of floating point operations. In the future, we may compute this information by tracking hardware counters. Pull Request resolved: pytorch#46506 Test Plan: Run `python test/test_profiler_flops.py -k test_flops`. The test will print a profiler table with a "FLOPS" column, like the following:

```
Name                      Self CPU %  Self CPU   CPU total %  CPU total  CPU time avg  # of Calls  Input Shapes                                   MFLOPS
------------------------  ----------  ---------  -----------  ---------  ------------  ----------  ---------------------------------------------  ------------
aten::matmul              0.06%       57.653us   82.97%       79.310ms   79.310ms      1           [[40, 33, 1, 243], [243, 243]]                 --
aten::mm                  82.84%      79.186ms   82.86%       79.204ms   79.204ms      1           [[1320, 243], [243, 243]]                      984.323
aten::conv2d              0.04%       36.345us   16.06%       15.347ms   15.347ms      1           [[40, 16, 18, 260], [33, 16, 18, 18], [33], [  44065010.318
aten::convolution         0.02%       16.016us   16.02%       15.310ms   15.310ms      1           [[40, 16, 18, 260], [33, 16, 18, 18], [33], [  --
aten::_convolution        0.07%       63.855us   16.00%       15.294ms   15.294ms      1           [[40, 16, 18, 260], [33, 16, 18, 18], [33], [  --
aten::mkldnn_convolution  15.89%      15.188ms   15.93%       15.225ms   15.225ms      1           [[40, 16, 18, 260], [33, 16, 18, 18], [33], [  --
aten::relu                0.10%       98.223us   0.64%        612.157us  306.079us     2           [[40, 33, 1, 243]]                             --
aten::threshold           0.49%       465.416us  0.54%        513.934us  256.967us     2           [[40, 33, 1, 243], [], []]                     --
aten::add_                0.29%       279.301us  0.29%        279.301us  279.301us     1           [[40, 33, 1, 243], [243], []]                  --
aten::empty               0.10%       99.113us   0.10%        99.113us   24.778us      4           [[], [], [], [], [], []]                       --
------------------------  ----------  ---------  -----------  ---------  ------------  ----------  ---------------------------------------------  ------------
Self CPU time total: 95.584ms

Ran 1 test in 0.176s
```

For now, we only provide FLOPs calculation for aten::conv2d and aten::mm operators. Reviewed By: ezyang Differential Revision: D25214452 Pulled By: xuzhao9 fbshipit-source-id: 0ae841bd8dbdeb032346dc3d9d38e19875aa1da3
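As a rough illustration of the shape-based estimate: for `aten::mm` with inputs `[M, K]` and `[K, N]`, the sample table's numbers are consistent with counting one multiply-accumulate per output contribution (`M * K * N`) and dividing by self CPU time. This is my reading of the table above, not a documented formula:

```python
def matmul_macs(m, k, n):
    # One multiply-accumulate per (i, j, l) triple for [M, K] @ [K, N].
    return m * k * n


def mflops(macs, self_cpu_seconds):
    # Throughput in MFLOPS, counting a fused multiply-add as a single op.
    return macs / self_cpu_seconds / 1e6


# aten::mm row above: [[1320, 243], [243, 243]] with 79.186ms self CPU time.
# This works out to roughly the 984.323 MFLOPS shown in the table.
rate = mflops(matmul_macs(1320, 243, 243), 79.186e-3)
```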
Commit 1f570a0
Disables method variant grad and grad grad checks (pytorch#49576)
Summary: These are redundant with the functional variant checks and can be very costly, as some grad and gradgrad testing takes minutes to run per variant. Maybe in the future we'll add them back for operations with divergent method implementations. Pull Request resolved: pytorch#49576 Reviewed By: albanD, ngimel Differential Revision: D25631691 Pulled By: mruberry fbshipit-source-id: 247f750979d9dafab2454cdbfa992a2aa6da724a
Mike Ruberry authored and hwangdeyu committed Dec 23, 2020 (commit 8c28731)
Use store based barrier in init_process_group. (pytorch#49419)
Summary: Pull Request resolved: pytorch#49419 As described in pytorch#48110, the newly introduced `barrier()` in `init_process_group` messes up NCCL communicator state since it uses a bunch of default devices to perform an allreduce which simulates a barrier(). As a result, subsequent NCCL operations might not behave as expected. ghstack-source-id: 118861776 Test Plan: 1) unit test added. 2) waitforbuildbot Reviewed By: mrshenli Differential Revision: D25566550 fbshipit-source-id: ab083b67b634d7c515f4945deb228f959b27c936
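The idea of a store-based barrier - every rank atomically increments a shared key, then waits until the count reaches the world size - can be simulated without touching NCCL at all. A toy sketch with a thread-safe in-memory store standing in for the c10d `Store` (class and function names here are invented for illustration, not the c10d API):

```python
import threading
import time


class InMemoryStore:
    """Toy stand-in for a c10d-style key/value store with an atomic add()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def store_barrier(store, world_size, key="barrier", timeout=5.0):
    """Each rank increments the key, then spins until all ranks have arrived."""
    store.add(key, 1)
    deadline = time.monotonic() + timeout
    while store.get(key) < world_size:
        if time.monotonic() > deadline:
            raise RuntimeError("store_barrier timed out")
        time.sleep(0.001)
```

Because the barrier only touches the store, it cannot perturb any collective-communication state - which is the point of the fix.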
Commit 14c3255
Fix CustomAutogradTest.ReentrantPriority rerun failures (pytorch#49581)
Summary: Clear static variable at the end of the test to ensure test passes after re-runs Pull Request resolved: pytorch#49581 Test Plan: `./bin/test_api "--gtest_filter=CustomAutogradTest.ReentrantPriority" --gtest_repeat=50` Before the change all subsequent runs of the test failed with ``` ../test/cpp/api/autograd.cpp:681: Failure Expected equality of these values: order.size() Which is: 310 10 ``` Reviewed By: mrshenli Differential Revision: D25632374 Pulled By: malfet fbshipit-source-id: 4814d22b5dff15e1b38a0187e51070771fd58370
Commit ce7608a
Set USE_KINETO=1 (pytorch#49201)
Summary: Pull Request resolved: pytorch#49201 This unblocks kineto profiler for 1.8 release. This PR supercedes pytorch#48391 Note: this will somewhat increase the size of Linux server binaries, because we add libkineto.a and libcupti_static.a: -rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a -rw-r--r-- 1 root root 13699658 Nov 13 2019 /usr/local/cuda/lib64/libcupti_static.a Test Plan: CI pytorch#48391 Imported from OSS Reviewed By: ngimel Differential Revision: D25480770 fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c
Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020 (commit a90a450)
Revert D25480770: Set USE_KINETO=1
Test Plan: revert-hammer Differential Revision: D25480770 (pytorch@1a92802) Original commit changeset: 037cd774f554 fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1
Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020 (commit 0f3059d)
Support integral types for kAbs in SimpleIREvaluator (pytorch#49357)
Summary: Pull Request resolved: pytorch#49357 This is a follow-up fix for PR pytorch#48679, where the previous PR added support for integer inputs to aten::abs by promoting integers to float and then demoting the result back to integers. This PR supports integer inputs to aten::abs more efficiently in the SimpleIREvaluator by implementing integer inputs for kAbs (renamed from kFabs) directly. - Rename kFabs to kAbs - Add support for integer input to kAbs in SimpleIREvaluator (note that llvm_codegen and cuda_codegen already support integer inputs to kAbs) Test Plan: - `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops` - `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops` Imported from OSS Reviewed By: eellison Differential Revision: D25545791 fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
Peng Wu authored and hwangdeyu committed Dec 23, 2020 (commit 4fc5d14)
Add op bench for caffe2 quantile op (pytorch#49598)
Summary: Pull Request resolved: pytorch#49598 Add op bench for caffe2 quantile op Test Plan: `buck run mode/opt caffe2/benchmarks/operator_benchmark/c2:quantile_op_test -- --wramup_iterations=10000 --iterations=10000` Reviewed By: radkris-git Differential Revision: D25590085 fbshipit-source-id: 0db58ac87c595b2bf2958f6299a1bf2ccea019db
Commit f6d0b3c
add checkout PR tip step for quick checks (pytorch#49590)
Summary: Pull Request resolved: pytorch#49590 Reviewed By: samestep Differential Revision: D25633341 Pulled By: walterddr fbshipit-source-id: 6e8db1f628f562d7632390bdb7788437cb1bf63d
Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020 (commit 25d77c8)
Refactor VmapPhysicalView::newLogicalToPhysical (pytorch#49482)
Summary: Pull Request resolved: pytorch#49482

Motivation
==========
Batching rules always invoke newLogicalToPhysical at the very end to turn a physical tensor into a logical BatchedTensor (an example is below):

```
Tensor select_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_physical = MultiBatchVmapTransform::logicalToPhysical(grad);
  auto grad_input = at::zeros(grad_physical.getPhysicalShape(input_sizes), grad.options());
  auto physical_dim = getGradInputPhysicalDim(dim, input_sizes, grad_physical.numBatchDims());
  grad_input.select(physical_dim, index).copy_(grad_physical.tensor());
  return grad_physical.newLogicalFromPhysical(grad_input);
}
```

However, albanD noted that this function is confusing and ambiguous because it's unclear which physical tensor is being turned into the logical one (in this case, grad_physical is a VmapPhysicalView, but we're really transforming grad_input and returning it). pytorch#44505 (comment)

I didn't want to make too many changes to the batching rule API because I think we'll change it even more in the future, but this PR attempts to remove the ambiguity by applying one of the suggestions in pytorch#44505 (comment)

This PR
=======
The diagnosis of the problem is that we were conflating "VmapPhysicalView", which maps logical attributes on a Tensor (like dimension and shape) to physical attributes, with the reverse physical-to-logical map. This PR creates a new VmapPhysicalToLogicalMap object that handles the latter. Instead of calling `grad_physical.newLogicalFromPhysical(grad_input)`, an author of batching rules should now retrieve the VmapPhysicalToLogicalMap object and apply it to their physical input. So the above code becomes:

```
grad_physical.getPhysicalToLogicalMap().apply(grad_input)
```

I've also moved VmapPhysicalView::makeLogicalFromPhysicalListInplace to VmapPhysicalToLogicalMap::applyInplace.

Test Plan
=========
wait for tests

Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D25592645 Pulled By: zou3519 fbshipit-source-id: 9c6ede9901ec6b70e5763193064658a8f91e6d48
Commit 41dbb0e
fixed the first line of torch.rst to match the __init__.py file's first line (pytorch#49584)
Summary: Changed the first line of the torch.rst file to match that of the __init__.py file Fixes pytorch#49228 Pull Request resolved: pytorch#49584 Reviewed By: VitalyFedyunin Differential Revision: D25639260 Pulled By: mrshenli fbshipit-source-id: a0bafd945ff92115eed932662feedc46d29dfaab
Commit 2aa7bd0
Fix Module backward hooks for all Tensor inputs/outputs (pytorch#46163)
Summary: Fixes pytorch#598 This is BC-breaking as we now explicitly don't call the hook when there are no Tensors at the top level of the output. This feature was not working anyway, as the returned grad_input/grad_output were wrong (not respecting the output structure, and wrong inputs for multi-Node Modules). This is also BC-breaking as we now report the correct gradients for `nn.Module`s that contain multiple autograd `Node`s, while we used to return bad results before. Pull Request resolved: pytorch#46163 Reviewed By: ailzhang, mruberry Differential Revision: D24894180 Pulled By: albanD fbshipit-source-id: e1b5d193d2818eb2f51e2a2722c7405c8bd13c2b
Commit e2bc618
Remove deadlines for Caffe2 hypothesis_test when running on GPU. (pytorch#49591)
Summary: Pull Request resolved: pytorch#49591 A bunch of these tests are marked flaky, and have been since time immemorial. (Read: as far back as Buck will build.) However closer inspection reveals that they fail if and only if run on a GPU worker. What seems to be going on is that there are more jobs than GPUs, so the contention causes waits which register as timeouts on the test. This diff is kind of hacky, but it basically just drops deadlines if a GPU is present. Because Caffe2 is going away I'm not too terribly concerned about a beautiful solution, but we may as well keep some test coverage if it's easy. CC Sebastian, Ilia, Min, and Hongzheng who also have tasks for what seems to be the same flakiness. Test Plan: Turn the tests back on and see if they fall over. (The failure repros reliably on an OnDemand GPU and is fixed by this change, so it's not really just a hail Mary.) Reviewed By: ngimel Differential Revision: D25632981 fbshipit-source-id: 43dcce416fea916ba91f891e9e5b59b2c11cca1a
Taylor Robie authored and hwangdeyu committed Dec 23, 2020 (commit e253a31)
[FX] Enforce args is tuple and kwargs is dict (pytorch#49526)
Summary: Pull Request resolved: pytorch#49526 Test Plan: Imported from OSS Reviewed By: Chillee Differential Revision: D25606115 Pulled By: jamesr66a fbshipit-source-id: f2a21d02a2cf8c08cbd618efc5a6a28d34806851
James Reed authored and hwangdeyu committed Dec 23, 2020 (commit a7d4333)
Renaming CAFFE2_API to TORCH_API (pytorch#49496)
Summary: Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO. Manually edited some references of the removed `CAFFE2_API`: * `CONTRIBUTING.md` * `caffe2/proto/CMakeLists.txt` * `cmake/ProtoBuf.cmake` * `c10/macros/Export.h` * `torch/csrc/WindowsTorchApiMacro.h` Pull Request resolved: pytorch#49496 Reviewed By: malfet, samestep Differential Revision: D25600726 Pulled By: janeyx99 fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
Commit efb851b
[PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (pytorch#49385)
Summary: Pull Request resolved: pytorch#49385 Currently, the API to export operator lists accepts a `torch::jit::Module` object, and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optimized and exported for mobile. What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that instead. Also updated the logic in `converter`. ### Before this change: 1. Get operator list from TorchScript model 2. Convert to bytecode mobile model ### After this change: 1. Convert to bytecode mobile model 2. Use this converted mobile model to get the list of operators for each method on the model ghstack-source-id: 118796752 Test Plan: Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also make `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from. Verified that the list of operators produced before and after this change for an example model (segmentation) are the same. {P147863234} Also verified that the operator lists for BI-Xray model is different (we have been having problems with missing operators for this one): {P154903132} Reviewed By: iseeyuan Differential Revision: D24690094 fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
Commit 2e88a18
New profiler API (pytorch#48280)
Summary: Pull Request resolved: pytorch#48280 Adding new API for the kineto profiler that supports enable predicate function Test Plan: unit test Reviewed By: ngimel Differential Revision: D25142220 Pulled By: ilia-cher fbshipit-source-id: c57fa42855895075328733d7379eaf3dc1743d14
Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020 (commit 422e2d1)
Adding support for bitwise augassignment operators (pytorch#44621)
Summary: ======== Fixes #42915 This commit adds support for bitwise shorthands in TorchScript, i.e.: |=, &=, ^=, <<=, >>=, **= Testing: ====== This commit also adds a test for the above fix in test_jit.py. The test can be invoked by pytest -k augassign test/test_jit.py Here is a snapshot of the testing: <img width="1238" alt="image" src="https://user-images.githubusercontent.com/70345919/93105141-8f9f5300-f663-11ea-836b-3b52da6d2be5.png"> Pull Request resolved: pytorch#44621 Reviewed By: mrshenli Differential Revision: D23906344 Pulled By: nikithamalgifb fbshipit-source-id: 4c93a7430a625f698b163609ccec15e51417d564
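In plain Python these shorthands simply desugar to their binary counterparts, which is the behavior TorchScript now mirrors when such a function is scripted. A sketch with ordinary ints (the scripting call itself is omitted; `demo_augassign` is a made-up name):

```python
def demo_augassign(x, y):
    # Each in-place shorthand is equivalent to the binary op plus assignment.
    a = x
    a |= y     # a = a | y
    b = x
    b &= y     # b = b & y
    c = x
    c ^= y     # c = c ^ y
    d = x
    d <<= 2    # d = d << 2
    e = x
    e >>= 1    # e = e >> 1
    f = x
    f **= 2    # f = f ** 2
    return a, b, c, d, e, f
```

For example, `demo_augassign(12, 10)` returns `(14, 8, 6, 48, 6, 144)`.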
Commit d770127
Test pipeline parallelism works with DDP. (pytorch#48470)
Summary: Pull Request resolved: pytorch#48470 Adding a unit test to test this works as expected. Although, this doesn't work with other checkpointing modes of the pipe and checkpoint=never needs to be set for this to work. ghstack-source-id: 118820806 Test Plan: waitforbuildbot Reviewed By: mrshenli Differential Revision: D25182668 fbshipit-source-id: 85e69e338bf388c132a303ad93e29ec2cc4a0ed8
Commit 3f2a6c5
[FX] Emit named tuple construction node when NamedTuple appears as an arg (pytorch#49553)
Summary: Pull Request resolved: pytorch#49553 Test Plan: Imported from OSS Reviewed By: zdevito Differential Revision: D25618577 Pulled By: jamesr66a fbshipit-source-id: 042f742f9ca02e59bbceda97bfcf47f9bac07873
James Reed authored and hwangdeyu committed Dec 23, 2020 (commit 7e483a7)
[package] implicitly extern stdlib before mocking (pytorch#49306)
Summary: Pull Request resolved: pytorch#49306 This allows you to mock out everything except for specific patterns while still correctly externing the python standard library. This makes it less likely that you will need to override require_module. Test Plan: Imported from OSS Reviewed By: suo Differential Revision: D25526212 Pulled By: zdevito fbshipit-source-id: 7339f4c7f12af883496f79de95e57d452bb32dc2
Commit e530504
Upload test times to S3 (pytorch#49190)
Summary: This PR currently just modifies the `test/print_test_stats.py` script (run in the `pytorch_linux_test` job) so that now it uploads test times to the new `ossci-metrics` S3 bucket (rather than just to Scribe) if passed the `--upload-to-s3` parameter. The next step is to add an additional step to that `pytorch_linux_test` job which checks if it's being run on a PR, and if so, finds the `master` commit to compare against (similar to what's done in the now-unused `.jenkins/pytorch/short-perf-test-{c,g}pu.sh` scripts) and adds test time info to the Dr CI comment if the PR is significantly different from the base revision.

Pull Request resolved: pytorch#49190

Test Plan: An "integration test" would be to just look in [the `ossci-metrics` S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/ossci-metrics) to confirm that the CI run(s) for this PR did indeed upload their test time data successfully. To test this locally, first make sure you have all the packages you need, such as these:

```
$ conda install -c anaconda boto3
$ conda install -c conda-forge unittest-xml-reporting
```

Then run whatever tests you want; these are the ones I used for my local smoke test, for no particular reason:

```
$ python test/test_spectral_ops.py --save-xml=/tmp/reports/spectral_ops
```

Once the tests finish, run the script to upload their times to S3:

```
$ CIRCLE_SHA1="$(git rev-parse HEAD)" CIRCLE_JOB=foo test/print_test_stats.py --upload-to-s3 /tmp/reports/spectral_ops
```

Now check that they uploaded successfully:

```
$ aws s3 cp "s3://ossci-metrics/test_time/$(git rev-parse HEAD)/foo/" /tmp/reports --recursive
```

And that it's a valid `*.json.bz2` file:

```
$ bzip2 -kdc /tmp/reports/*Z.json.bz2 | jq . | head -n21
{
  "build_pr": null,
  "build_tag": null,
  "build_sha1": "e46f43621b910bc2f18dd33c08f5af18a542d5ed",
  "build_branch": null,
  "build_job": "foo",
  "build_workflow_id": null,
  "total_seconds": 0.9640000000000003,
  "suites": {
    "TestFFTCPU": {
      "total_seconds": 0.9640000000000003,
      "cases": [
        {
          "name": "test_fft_invalid_dtypes_cpu",
          "seconds": 0.022,
          "errored": false,
          "failed": false,
          "skipped": false
        },
        {
          "name": "test_istft_throws_cpu",
```

Reviewed By: walterddr, malfet Differential Revision: D25618035 Pulled By: samestep fbshipit-source-id: 4d8013859a38a49e5bba700c5134951ca1a9d8b7
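A report shaped like the sample above can be produced with nothing but the standard library (`json` + `bz2`). This is a hedged sketch, trimmed to the fields visible in the sample; `write_report` is a made-up helper, not the script's actual API:

```python
import bz2
import json


def write_report(path, build_job, sha1, suites):
    """Write a bz2-compressed JSON test-time report.

    `suites` maps suite name -> list of (case_name, seconds) pairs.
    """
    report = {
        "build_job": build_job,
        "build_sha1": sha1,
        "total_seconds": sum(s for cases in suites.values() for _, s in cases),
        "suites": {
            name: {
                "total_seconds": sum(s for _, s in cases),
                "cases": [
                    {"name": n, "seconds": s,
                     "errored": False, "failed": False, "skipped": False}
                    for n, s in cases
                ],
            }
            for name, cases in suites.items()
        },
    }
    # "wt" opens the bz2 stream in text mode so json.dump can write to it.
    with bz2.open(path, "wt") as f:
        json.dump(report, f)
    return report
```

Decompressing the file with `bz2.open(path, "rt")` and `json.load` round-trips the same dictionary.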
Commit 113ca4d
Cleanup APIs for pipeline parallelism. (pytorch#48630)
Summary: Pull Request resolved: pytorch#48630 1) Make torch.distributed.pipeline package public. 2) Make several helper methods private. ghstack-source-id: 118820803 Test Plan: waitforbuildbot Reviewed By: rohan-varma Differential Revision: D25235688 fbshipit-source-id: c32833ebf090ddbd4eaf06fcb5e3f9d421623a60
Commit c4d42b4
[torchscript] Fix constant propagation schemas (pytorch#49605)
Summary: Pull Request resolved: pytorch#49605 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D25643157 Pulled By: IvanKobzarev fbshipit-source-id: c5440622f6cf559afadca853e1eb7a9fbb8edf7f
Commit 8bef7b7
Add sinc operator (pytorch#48740)
Summary: Implements the sinc operator. See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html ![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png) Pull Request resolved: pytorch#48740 Reviewed By: ezyang Differential Revision: D25597565 Pulled By: soulitzer fbshipit-source-id: 6dbcf282ee4eba34930bc9e5c85c0c5e79cf0322
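The normalized sinc being added is simple enough to state as a scalar reference implementation, matching the `numpy.sinc` definition linked above (sin(πx)/(πx), with the removable singularity at 0 defined as 1):

```python
import math


def sinc(x):
    # Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) defined as 1.
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)
```

For example, `sinc(0.5)` equals `2/pi`, and `sinc(n)` vanishes (up to floating-point error) for every nonzero integer `n`.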
Commit 0839efa
Output stacks (support for SVG visualization) (pytorch#48438)
Summary: Pull Request resolved: pytorch#48438 Outputting stacks in a format suitable for SVG vizualization (e.g. with https://github.com/brendangregg/FlameGraph tool) Test Plan: python test/test_profiler.py -k test_export_stacks e.g. resnet18 (note: actual SVG is interactive): <img width="1193" alt="Screen Shot 2020-11-24 at 7 06 27 PM" src="https://user-images.githubusercontent.com/30845429/100178160-397f3500-2e88-11eb-81c4-34b19c5fcb87.png"> Reviewed By: dzhulgakov Differential Revision: D25174270 Pulled By: ilia-cher fbshipit-source-id: 6b60084071b209441805c468f5ff777318e42d1a
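For context, FlameGraph's collapsed-stack input format is just one `frame;frame;frame <count>` line per unique stack. A sketch of folding profiler samples into that shape (the sample tuples here are invented; the real exporter reads profiler events):

```python
from collections import defaultdict


def collapse_stacks(samples):
    """Fold (call_stack, time_us) samples into FlameGraph's collapsed format:
    one 'frameA;frameB;frameC <total_us>' line per unique stack."""
    totals = defaultdict(int)
    for stack, time_us in samples:
        totals[";".join(stack)] += time_us
    return [f"{stack} {us}" for stack, us in sorted(totals.items())]
```

The resulting lines can be fed directly to `flamegraph.pl` to render an interactive SVG like the one in the screenshot.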
Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020 (commit 3086f7f)
`torch.reciprocal`: promote integer inputs to float (pytorch#49102)
Summary: Fixes pytorch#49091 Pull Request resolved: pytorch#49102 Reviewed By: VitalyFedyunin Differential Revision: D25639541 Pulled By: soulitzer fbshipit-source-id: 1dd360bd7b77f106d606143d8d3961610bac8cb7
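The promotion rule is easy to state for a scalar: cast an integral input to floating point before dividing, so the result is never truncated to 0. A sketch of the semantics only (not torch's kernel):

```python
def reciprocal(x):
    """Scalar sketch of integer-to-float promotion: integral inputs are
    promoted before taking the reciprocal, so reciprocal(2) is 0.5 rather
    than the truncated integer 0."""
    return 1.0 / float(x)
```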
Commit c5e477a
[NNC] Disable masked fill (pytorch#49622)
Summary: There's a bug internally, disable as quick fix before investigation Pull Request resolved: pytorch#49622 Test Plan: Imported from GitHub, without a `Test Plan:` line. build Reviewed By: zheng-xq, PursueHappinessDirectly Differential Revision: D25651897 Pulled By: eellison fbshipit-source-id: dd1454f2ef7506d7844016128aa6320d7e69aa6e
Elias Ellison authored and hwangdeyu committed Dec 23, 2020 (commit 55296c4)
[Issue pytorch#46210] added torch.fx.len() to provide support for len(); added a test case for torch.fx.len() (pytorch#49532)
Summary: Pull Request resolved: pytorch#49532 Test Plan: Imported from OSS Reviewed By: jamesr66a Differential Revision: D25608804 Pulled By: huiguoo fbshipit-source-id: 93ac02ab57db5d200d92443062286c34782ec0ef
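Why a wrapper is needed at all: the builtin `len()` must return an `int`, so symbolic tracing cannot intercept it directly; a function like `torch.fx.len()` can instead emit a graph node when handed a `Proxy`. A toy model of that dispatch (the class and function here are invented stand-ins, not the FX internals):

```python
class Proxy:
    """Toy stand-in for an FX Proxy: records ops instead of executing them."""

    def __init__(self, graph, name):
        self.graph = graph
        self.name = name


def fx_len(obj):
    # On a Proxy, record a call node and return a new Proxy; on anything
    # else, fall back to the ordinary builtin len().
    if isinstance(obj, Proxy):
        obj.graph.append(("call_function", "len", obj.name))
        return Proxy(obj.graph, f"len_{obj.name}")
    return len(obj)
```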
Commit 65b8aa3
Inline coverage report combining/reporting (pytorch#49615)
Summary: Instead of calling the coverage frontend, import the coverage module and call combine() and html_report() directly. Fixes pytorch#49596 by not using strict mode when combining those reports Pull Request resolved: pytorch#49615 Reviewed By: seemethere Differential Revision: D25645196 Pulled By: malfet fbshipit-source-id: be55b5c23a3569a331cbdf3f86d8c89bc27d5fe1
Commit c8968bf
[Gradient Compression] Implement the original layerwise PowerSGD (pyt…
…orch#49417) Summary: Pull Request resolved: pytorch#49417 The existing implementation applies PowerSGD to a batch of flattened tensors, which is a coarse-grained compression. This hook now is renamed as "batched_powerSGD_hook". Now implement the original implementation in the paper, which applies PowerSGD to each per-parameter tensor. This is a layerwise fine-grained compression. Although this original implementation is slower, it is expected to achieve a higher accuracy, especially when the shapes of per-param tensors cannot be aligned. Also add a test in distributed_test.py. Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202 ghstack-source-id: 118921275 Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook Reviewed By: rohan-varma Differential Revision: D25511543 fbshipit-source-id: 19ef188bc2d4c7406443c8fa233c1f2c2f27d93c
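The rank-1 core of PowerSGD can be shown in miniature with plain Python: approximate a gradient matrix by one power-iteration pass and reconstruct it from the two factor vectors. This toy omits orthogonalization, error feedback, and the all-reduce; it is only the compression idea, with invented helper names:

```python
import random


def matvec(M, v):
    # M @ v for a row-major list-of-lists matrix.
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


def matvec_t(M, u):
    # M^T @ u
    return [sum(M[i][j] * u[i] for i in range(len(M))) for j in range(len(M[0]))]


def norm(v):
    return sum(x * x for x in v) ** 0.5


def rank1_compress(M, iters=10):
    """Power iteration: factor an n x m matrix M into vectors p (n) and q (m)
    with M ~= outer(p, q). Exact when M itself has rank 1."""
    rng = random.Random(0)
    q = [rng.gauss(0, 1) for _ in range(len(M[0]))]
    for _ in range(iters):
        p = matvec(M, q)
        p_norm = norm(p)
        p = [x / p_norm for x in p]
        q = matvec_t(M, p)
    return p, q  # the "compressed" gradient is just these two vectors


def decompress(p, q):
    # outer(p, q) reconstructs the (approximate) gradient matrix.
    return [[pi * qj for qj in q] for pi in p]
```

Sending two vectors of sizes n and m instead of an n x m matrix is where the bandwidth saving comes from; the layerwise variant in this commit applies this per-parameter rather than to one big flattened buffer.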
Yi Wang authored and hwangdeyu committed Dec 23, 2020 (commit a17f9e6)
Improve documentation for pipeline parallelism. (pytorch#48638)
Summary: Pull Request resolved: pytorch#48638 Polishing up some of the docs for the main `Pipe` class and its `forward` method. ghstack-source-id: 118820804 Test Plan: waitforbuildbot Reviewed By: rohan-varma Differential Revision: D25237705 fbshipit-source-id: ba3d8737b90a80024c827c0887fc56f14bf678b7
Commit b375c45
Add benchmark for torch.distributed.pipeline.sync.Pipe (pytorch#49577)
Summary: Pull Request resolved: pytorch#49577 Repurposing the benchmarking from https://github.com/facebookresearch/fairscale/blob/master/benchmarks/pipe.py and pulling in a stripped down version of the benchmark into PyTorch. Sample output: ``` Running benchmark with args: Namespace(batch_size=8, checkpoint='never', chunks=4, host='localhost', max_batch=10, num_decoder_layers=10, num_devices=4) Number of parameters for model: 292833040 | batch 1 | wps 3593.07 | loss 25.98 | ppl 192556591553.37 | batch 2 | wps 4405.16 | loss 19.36 | ppl 256201548.33 | batch 3 | wps 4404.98 | loss 23.56 | ppl 17111244076.37 | batch 4 | wps 4413.25 | loss 27.11 | ppl 594561327825.83 | batch 5 | wps 4408.53 | loss 25.92 | ppl 181277705101.33 | batch 6 | wps 4385.64 | loss 24.92 | ppl 66592883598.50 | batch 7 | wps 4434.11 | loss 24.75 | ppl 56113635884.68 | batch 8 | wps 4441.25 | loss 24.88 | ppl 63666024212.82 | batch 9 | wps 4425.49 | loss 25.35 | ppl 101959669008.98 | batch 10 | wps 4421.05 | loss 25.34 | ppl 101597621863.94 Peak memory usage for GPUs: cuda:0: 2.38GiB, cuda:1: 3.04GiB, cuda:2: 3.04GiB, cuda:3: 3.67GiB, ``` ghstack-source-id: 118939686 Test Plan: sentinel Reviewed By: rohan-varma Differential Revision: D25628721 fbshipit-source-id: 41c788eed4f852aef019aec18a84cb25ad254f3a
Commit 8c75384
Bump tensorpipe version (pytorch#49599)
Summary: Pull Request resolved: pytorch#49599 Reviewed By: lw Differential Revision: D25639036 Pulled By: mrshenli fbshipit-source-id: 595b396a01d7fa9049d88447ab9079e286637afe
Lucas Hosseini authored and hwangdeyu committed Dec 23, 2020 (commit 4dd4d0c)
Summary: Fix lint on master Pull Request resolved: pytorch#49629 Reviewed By: rohan-varma Differential Revision: D25654199 Pulled By: mrshenli fbshipit-source-id: 2ab5669ad47996c0ca0f9b6611855767d5af0506
Commit 32073ec
[quant][graphmode][fx] Allow user to specify qconfig for call_method (pytorch#49621)
Summary: Pull Request resolved: pytorch#49621 This adds support to configure qconfig for a call_method, e.g. x.chunk, this will help workaround a problem in our internal model. TODO: since call_method is also a string and we flatten the qconfig, might need to resolve namespace conflict between call_method and module_name TODO: Add scope support to set the qconfig for call_method correctly with original qconfig Test Plan: Imported from OSS Reviewed By: vkuzo Differential Revision: D25651828 fbshipit-source-id: 82d66b121d37c8274fd481b6a2e9f9b54c5ca73d
Commit cd8ef1a
Revert D25511543: [Gradient Compression] Implement the original layerwise PowerSGD
Test Plan: revert-hammer Differential Revision: D25511543 (pytorch@71f3399) Original commit changeset: 19ef188bc2d4 fbshipit-source-id: a363641a059aeacc57684884998cf8fb7363d748
Commit 9b0a4c6
[PyTorch Mobile] Preserve bundled input related methods when calling optimize_for_mobile (pytorch#49170)
Summary: Pull Request resolved: pytorch#49170 Added an extra step to **always** preserve the bundled-inputs methods if they are present in the input module. Also added a check that all the methods in `preserved_methods` exist; if not, we now throw an exception. This can hopefully stop hard-to-debug inputs from getting into downstream functions. ~~Add an optional argument `preserve_bundled_inputs_methods=False` to the `optimize_for_mobile` function. If set to True, the function will now preserve three additional functions related to bundled inputs: `get_all_bundled_inputs`, `get_num_bundled_inputs` and `run_on_bundled_input`.~~ Test Plan: `buck test mode/dev //caffe2/test:mobile -- 'test_preserve_bundled_inputs_methods \(test_mobile_optimizer\.TestOptimizer\)'` or `buck test caffe2/test:mobile` to run some other related tests as well. Reviewed By: dhruvbird Differential Revision: D25463719 fbshipit-source-id: 6670dfd59bcaf54b56019c1a43db04b288481b6a
Commit 8d2580f
Disable test on windows (pytorch#49636)
Summary: Pull Request resolved: pytorch#49636 test_export_stacks fails with permission errors Test Plan: CI Imported from OSS Reviewed By: robieta Differential Revision: D25654680 fbshipit-source-id: 5689289e06eebc0686030f90ed56483a072b6850
Ilia Cherniavskii authored and hwangdeyu committed Dec 23, 2020
Commit c6210c3
Remove DataPtr extractor from CUDAFuture (pytorch#48840)
Summary: Pull Request resolved: pytorch#48840 The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams. This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor. In pytorch#48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to. In my opinion, this approach is just brilliant! Thank wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes. ghstack-source-id: 118704935 Test Plan: Unit tests Reviewed By: wanchaol Differential Revision: D25334355 fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3
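The getSubValues() idea can be illustrated with a small Python sketch (hypothetical helper names; the real logic lives in `IValue::getSubValues` with a virtual hook on `PyObjectHolder`):

```python
def get_sub_values(value, out):
    # Recursively collect the leaf values contained in an arbitrary nested
    # structure -- analogous to IValue::getSubValues(), where a virtual
    # method lets Python objects participate without the C++-only build
    # needing to know about the GIL.
    if isinstance(value, (list, tuple)):
        for v in value:
            get_sub_values(v, out)
    elif isinstance(value, dict):
        for v in value.values():
            get_sub_values(v, out)
    else:
        out.append(value)

leaves = []
get_sub_values({"a": (1, [2, 3]), "b": 4}, leaves)
# leaves == [1, 2, 3, 4]
```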
Commit 20c0038
disable kthvalue overlap (pytorch#48254)
Summary: Fixes pytorch#47934 Pull Request resolved: pytorch#48254 Reviewed By: bdhirsh Differential Revision: D25276689 Pulled By: VitalyFedyunin fbshipit-source-id: a70774e31c269b41786170e99ec1ede42596ba7b
Commit 0eecd3d
Resubmit: [Gradient Compression] Implement the original layerwise PowerSGD (pytorch#49639)
Summary: Pull Request resolved: pytorch#49639 Resubmit pytorch#49417 with a fix for distributed_test. The previous submission broke a multi-GPU test that runs on 4 GPUs; since this test only runs on master, it couldn't be detected before the submission. The real diff is: pytorch@4ca1014 This time I have verified that the previously failed test `pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test` passes after creating a PR (pytorch#49651) from a separate branch: https://app.circleci.com/pipelines/github/pytorch/pytorch/253644/workflows/c1c02b70-0877-40e6-8b4c-61f60f6b70ed/jobs/9768079 ghstack-source-id: 118969912 Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook Reviewed By: mrshenli Differential Revision: D25654961 fbshipit-source-id: 2a45c8ceb9bdb54ff7309a8b66ec87e913e0150e
Yi Wang authored and hwangdeyu committed Dec 23, 2020
Commit 5ea5c01
Updated derivative rules for complex svd and pinverse (pytorch#47761)
Summary: Updated `svd_backward` to work correctly for complex-valued inputs. Updated `common_methods_invocations.py` to take dtype, device arguments for input construction. Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`. Added `svd` and `pinverse` to list of complex tests. References for complex-valued SVD differentiation: - https://giggleliu.github.io/2019/04/02/einsumbp.html - https://arxiv.org/abs/1909.02659 The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant. https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/ The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl). Ref. pytorch#33152 Pull Request resolved: pytorch#47761 Reviewed By: ngimel Differential Revision: D25658897 Pulled By: mruberry fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
Commit 8e25d99
[Gradient Compression] Add error feedback to layerwise PowerSGD (pytorch#49418)
Summary: Pull Request resolved: pytorch#49418 Add error feedback to the original implementation of PowerSGD. Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202 ghstack-source-id: 118670930 Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook Reviewed By: rohan-varma Differential Revision: D25555538 fbshipit-source-id: c01145cc9acf574a4c6aa337dbbba0ba7d9350b2
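The error-feedback mechanism can be sketched in NumPy, with a simplified one-step rank-1 compressor standing in for the real PowerSGD power iteration (a didactic sketch, not the shipped hook):

```python
import numpy as np

def rank1_compress(m, rng):
    # One power-iteration step: approximate M with the outer product p q^T
    q = rng.standard_normal(m.shape[1])
    p = m @ q
    p /= np.linalg.norm(p) + 1e-12
    q = m.T @ p
    return p, q

def powersgd_step(grad, error, rng):
    compensated = grad + error               # error feedback: re-add the residual
    p, q = rank1_compress(compensated, rng)  # the low-rank factors that get allreduced
    approx = np.outer(p, q)
    new_error = compensated - approx         # residual, kept locally for the next step
    return approx, new_error

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 5))
approx, error = powersgd_step(grad, np.zeros_like(grad), rng)
# approx + error reconstructs the compensated gradient exactly
```

Because the residual is re-added each step, information dropped by the low-rank approximation is not lost, only delayed.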
Yi Wang authored and hwangdeyu committed Dec 23, 2020
Commit 50e3afc
[Gradient Compression] Replace the assertions in PowerSGD comm hook by stream synchronization (pytorch#49435)
Summary: Pull Request resolved: pytorch#49435 Previously the assertion prevented illegal memory access only because torch.any returns a boolean value, which initiates a data transfer from the device to the host and forces a synchronization. An explicit synchronization is more to the point. Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202 ghstack-source-id: 118664204 Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook Reviewed By: rohan-varma Differential Revision: D25573484 fbshipit-source-id: 516d0d502da2863b516c15332702335ee662f072
Yi Wang authored and hwangdeyu committed Dec 23, 2020
Commit 53750d2
Add support for torch.tensor_split to accept a tensor for indices argument (pytorch#49169)
Summary: Pull Request resolved: pytorch#49169 Trying to solve PR request pytorch#47479. This diff overloads the method `torch.tensor_split` to also accept a tensor for the argument `indices_or_sections`, which currently accepts a Python list or int. The motivation is to avoid converting a tensor to a list, so that when tracing a model/module the tensor operations are recorded. The implementation follows the diff that originally added the `tensor_split` method, D24166164 (pytorch@ef4817f). Test Plan: ``` buck test caffe2/test:torch -- tensor_split ``` https://www.internalfb.com/intern/testinfra/testconsole/testrun/5910974550563805/ ``` buck test caffe2/test:others -- tensor_split ``` https://www.internalfb.com/intern/testinfra/testconsole/testrun/1688849905082678/ Reviewed By: mruberry Differential Revision: D25440885 fbshipit-source-id: 6705dc551279e3a5eb1e5ec1ede2728eab85ffb1
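The splitting semantics of a tensor-valued indices argument can be sketched with NumPy standing in for torch (a hypothetical helper, not the real implementation):

```python
import numpy as np

def tensor_split_like(x, indices):
    # indices is a 1-D integer array of split points, mirroring the new
    # tensor overload of torch.tensor_split
    bounds = [0, *indices.tolist(), len(x)]
    return [x[a:b] for a, b in zip(bounds, bounds[1:])]

parts = tensor_split_like(np.arange(8), np.array([2, 5]))
# parts -> [array([0, 1]), array([2, 3, 4]), array([5, 6, 7])]
```

Because the split points stay a tensor end to end, a tracer can record the op instead of baking in a Python list of constants.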
Edson Romero authored and hwangdeyu committed Dec 23, 2020
Commit 9098cc7
[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT`
Reviewed By: zertosh Differential Revision: D25662961 fbshipit-source-id: f5811a5797fd6dc8733fdf86f35c93d12a08d53a
generatedunixname89002005325676 authored and hwangdeyu committed Dec 23, 2020
Commit c7f5af6
[WIP][DataLoader] CollateIterableDataset prototype (pytorch#48933)
Summary: Pull Request resolved: pytorch#48933 Prototype for CollateIterableDataset. Move `collate_batch_fn` to BatchIterableDataset - CollateIterableDataset - [x] Prototype - [x] Tests - BatchIterableDataset - [x] Prototype - [x] Tests - SamplerIterableDataset - [x] Prototype - [x] Tests Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D25623635 Pulled By: ejguan fbshipit-source-id: 99ba077619f672551ac15367baaba985db35a9c2
Commit b410315
[WIP][DataLoader] Prototype of BatchIterableDataset (pytorch#49186)
Summary: Pull Request resolved: pytorch#49186 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D25623636 Pulled By: ejguan fbshipit-source-id: 01a08cccb69301481c55b46358203354b9b4f5fa
Commit cf9ad1f
[WIP][DataLoader] Prototype of SamplerIterableDataset (pytorch#49363)
Summary: Pull Request resolved: pytorch#49363 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D25623637 Pulled By: ejguan fbshipit-source-id: 9155d27d1fc91996b74110795cc73f1da0eedd44
Commit dc3bbaa
[Mask R-CNN] Add Int8 AABB Generate proposals Op (pytorch#49574)
Summary: Pull Request resolved: pytorch#49574 Adds support for additional Eigen Utils for custom type defs. Reviewed By: linbinyu Differential Revision: D25624556 fbshipit-source-id: 0ffa90aaf8cbf1d08825e95156fb40d966ca7042
Commit b3355fd
Fix sinc docs typo (pytorch#49667)
Summary: Fix small typo in sinc docs Pull Request resolved: pytorch#49667 Reviewed By: ngimel Differential Revision: D25665721 Pulled By: soulitzer fbshipit-source-id: 5f78b9e34bb0084e51ae79d1afc450bcb0ae3d75
Commit 6da4c09
Added linalg.solve (pytorch#48456)
Summary: This PR adds `torch.linalg.solve`. `linalg_solve_out` uses in-place operations on the provided result tensor. I modified `apply_solve` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_solve_out` but removing the error checks and device memory synchronization. In comparison to `torch.solve` this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term. `torch.solve` requires it to be at least 2-dimensional. Ref. pytorch#42666 Pull Request resolved: pytorch#48456 Reviewed By: izdeby Differential Revision: D25562222 Pulled By: mruberry fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
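The 1-D right-hand-side support described above can be illustrated with NumPy's equivalent routine, which accepts the same shapes:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])   # 1-D right-hand side; torch.solve required at least 2-D
x = np.linalg.solve(A, b)  # solution keeps b's 1-D shape
# x == [2.0, 3.0]: 3*2 + 1*3 = 9 and 1*2 + 2*3 = 8
```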
Commit 0edf70b
Fix return type Any for Ternary ops (pytorch#49165)
Summary: Pull Request resolved: pytorch#49165 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D25463694 Pulled By: ejguan fbshipit-source-id: 5cf907e8de6eeb0171d61175a60fac9812b76c6c
Commit 0c19f79
Fix typo in add_pr_curve docstrings. (pytorch#49648)
Summary: Very small PR to fix a typo. ### Description Fixed 1 typo in the documentation of `torch/utils/tensorboard/writer.py` (replaced "_should in_" by "_should be in_") Pull Request resolved: pytorch#49648 Reviewed By: ngimel Differential Revision: D25665831 Pulled By: mrshenli fbshipit-source-id: a4e733515603bb9313c1267fdf2cfcc2bc2773c6
Commit ffc1c0c
Fixed a typo in dataloader.py. (pytorch#49437)
Summary: This small PR fixes a one character typo in the docstring for `DataLoader`. Pull Request resolved: pytorch#49437 Reviewed By: ngimel Differential Revision: D25665971 Pulled By: mrshenli fbshipit-source-id: b60f975f1e3bf0bb8f88e39f490f716c602f087e
Commit 0b652f9
[NNC] Intermediate allocs flattened and dependency support (pytorch#49554)
Summary: Makes two changes in NNC for intermediate buffer allocations: 1. Flattens dimensions of buffers allocated in LoopNest::prepareForCodegen() to match their flattened usages. 2. Adds support for tracking memory dependencies of Alloc/Free to the MemDependencyChecker, which will allow us to check the safety of accesses to intermediate buffers (coming in a future diff). I didn't add any new tests as the mem dependency checker tests already cover it pretty well, particularly the GEMM test. Pull Request resolved: pytorch#49554 Reviewed By: VitalyFedyunin Differential Revision: D25643133 Pulled By: nickgg fbshipit-source-id: 66be3054eb36f0a4279d0c36562e63aa2dae371c
Commit 0a6a102
Implementing NumPy-like function torch.broadcast_to (pytorch#48997)
Summary: Related pytorch#38349 Implement NumPy-like function `torch.broadcast_to` to broadcast the input tensor to a new shape. Pull Request resolved: pytorch#48997 Reviewed By: anjali411, ngimel Differential Revision: D25663937 Pulled By: mruberry fbshipit-source-id: 0415c03f92f02684983f412666d0a44515b99373
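The broadcasting behavior mirrors NumPy's function of the same name, which can serve as a quick sketch of the semantics:

```python
import numpy as np

row = np.array([1, 2, 3])
tiled = np.broadcast_to(row, (4, 3))  # read-only view; no data is copied
# tiled behaves like a (4, 3) array whose rows are all [1, 2, 3]
```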
Commit 11d2494
Sparse-sparse matrix multiplication (CPU/CUDA) (pytorch#39526)
Summary: This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format. The current implementation of `torch.sparse.mm` support this configuration, `torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this could spend a lot of memory when sparse_matrix2's shape is large. This implementation extends `torch.sparse.mm` function to support `torch.sparse.mm(sparse_matrix1, sparse_matrix2)` Resolves #[20988](pytorch#20988) for CPU/CUDA. - [x] sparse matmul - [x] CPU/CUDA C++ implementation - [x] unittests - [x] update torch.sparse.mm documentation - [x] autograd support The CPU sparse-sparse matmul was implemented taking as a reference this work "Sparse Matrix Multiplication Package (SMMP)". The GPU sparse-sparse matmul is based on cuSparse, there is specific code for CUSPARSE when CUSPARSE_VERSION >= 11 and old version of CUSPARSE. Both CPU/CUDA rely on the sparse-sparse matmul algorithm using the CSR indices format as it is one of the fastest algorithm. Here it is the latest benchmark (script is here) results for torch.sparse.mm (CUDA) and torch.sparse.mm (CPU) and scipy, values are float32 scalars: size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul -- | -- | -- | -- | -- (32, 10000) | 0.01 | 822.7 | 79.4 | 704.1 (32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3 (32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4 (32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2 (512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7 (512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7 (512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0 (512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4 (1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2 (1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8 (1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4 (1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7 A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. 
Here is some benchmarking: ``` [------------------------- sparse.mm-backward -------------------------] | sparse.backward | dense.backward ----------------------------------------------------------------------- (32, 10000) | 0.01 | 13.5 | 2.4 (32, 10000) | 0.05 | 52.3 | 2.4 (512, 10000) | 0.01 | 1016.8 | 491.5 (512, 10000) | 0.05 | 1604.3 | 492.3 (1024, 10000) | 0.01 | 2384.1 | 1963.7 (1024, 10000) | 0.05 | 3965.8 | 1951.9 ``` I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels. ``` [---------------------------------- matmul ---------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ------------------------------------------------------------------ (cpu) torch | 5.4 | 5.4 | 5.2 | 5.3 | 5.3 | 5.4 torch.sparse | 122.2 | 51.9 | 27.5 | 11.4 | 4.9 | 1.8 scipy | 150.1 | 87.4 | 69.2 | 56.8 | 38.4 | 17.1 (cuda) torch | 1.3 | 1.1 | 1.1 | 1.1 | 1.1 | 1.1 torch.sparse | 20.0 | 8.4 | 5.1 | 2.5 | 1.5 | 1.1 [----------------------------------- backward -----------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ----------------------------------------------------------------------- (cpu) torch | 17.7 | 17.9 | 17.7 | 17.7 | 17.6 | 17.9 torch.sparse | 672.9 | 432.6 | 327.5 | 230.8 | 176.7 | 116.7 (cuda) torch | 3.8 | 3.6 | 3.5 | 3.5 | 3.6 | 3.5 torch.sparse | 68.8 | 46.2 | 35.6 | 24.2 | 17.8 | 11.9 Times are in milliseconds (ms). ``` In summary, I can say that the new `sparse @ sparse` backward algorithm is better as it is more about saving space than performance. Moreover, it is better than other options tested before. ## **References** 1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.** Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk) 2. 
Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity) Pull Request resolved: pytorch#39526 Reviewed By: mruberry Differential Revision: D25661239 Pulled By: ngimel fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
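The CSR-based sparse-sparse product at the heart of both backends can be sketched in pure Python; this is a didactic row-accumulation version in the spirit of SMMP, not the shipped kernel:

```python
def csr_matmul(a_vals, a_cols, a_ptr, b_vals, b_cols, b_ptr):
    # C = A @ B with both operands in CSR form: for each row of A,
    # accumulate scaled rows of B into a sparse accumulator.
    c_vals, c_cols, c_ptr = [], [], [0]
    for i in range(len(a_ptr) - 1):
        acc = {}
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            j, v = a_cols[jj], a_vals[jj]
            for kk in range(b_ptr[j], b_ptr[j + 1]):
                k = b_cols[kk]
                acc[k] = acc.get(k, 0.0) + v * b_vals[kk]
        for k in sorted(acc):
            c_cols.append(k)
            c_vals.append(acc[k])
        c_ptr.append(len(c_vals))
    return c_vals, c_cols, c_ptr

# A = [[1, 0], [2, 3]], B = [[0, 4], [5, 0]] in CSR form
c = csr_matmul([1.0, 2.0, 3.0], [0, 0, 1], [0, 1, 3],
               [4.0, 5.0], [1, 0], [0, 1, 2])
# c encodes C = A @ B = [[0, 4], [15, 8]]
```

This is why both paths convert COO indices to CSR first: row-wise accumulation only needs the row-pointer array, not a sorted list of coordinate pairs.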
Commit 38ff78f
[BE] Introduce `set_cwd` context manager (pytorch#49657)
Summary: Used to temporarily change the working directory, restoring it even if an exception is raised. Use it in test_type_hints and during code coverage collection. Pull Request resolved: pytorch#49657 Reviewed By: walterddr Differential Revision: D25660543 Pulled By: malfet fbshipit-source-id: 77f08d57e4b60b95daa4068d0dacf7c25f978526
Commit 209bddb
add close() method to tqdm mock (pytorch#46040)
Summary: In `torchvision` we use [`torch.hub.tqdm`](https://github.com/pytorch/vision/blob/2cc20d7485458a6368e8995e3f79799589b632bd/torchvision/datasets/utils.py#L11) to display the dataset download. One of our methods uses [`tqdm().close()`](https://github.com/pytorch/vision/blob/2cc20d7485458a6368e8995e3f79799589b632bd/torchvision/datasets/utils.py#L188), which is [not included in the mock](https://github.com/pmeier/pytorch/blob/283ae1998cd6920b588907adfb88909afb522ae2/torch/hub.py#L22-L49). This PR adds a `close()` method to the mock. Cc fmassa Pull Request resolved: pytorch#46040 Reviewed By: mrshenli Differential Revision: D25619429 Pulled By: fmassa fbshipit-source-id: a137f2417d8a47923ccb1ec6b7d5298c1545245c
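A minimal stand-in showing the shape of the fix (hypothetical class name; the real mock lives in torch/hub.py and is only used when tqdm isn't installed):

```python
class FakeTqdm:
    # Bare-bones progress-bar mock in the spirit of torch.hub's tqdm
    # fallback; close() is the method this PR adds.
    def __init__(self, total=None, **kwargs):
        self.total = total
        self.n = 0

    def update(self, n):
        self.n += n

    def close(self):
        pass  # nothing to tear down, but callers may invoke it

bar = FakeTqdm(total=10)
bar.update(4)
bar.close()  # previously raised AttributeError on the mock
```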
Commit 2e52d1d
Dynamic GRU quantization support (pytorch#49448)
Summary: Pull Request resolved: pytorch#49448 ghstack-source-id: 118982171 Test Plan: buck test caffe2/test:quantization -- 'test_qlstmGRU \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --print-passing-details buck test caffe2/test:quantization -- 'test_quantized_rnn \(quantization\.test_quantize\.TestPostTrainingDynamic\)' --print-passing-details buck test caffe2/test:quantization -- 'test_qrnncell \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --run-disabled --print-passing-details Reviewed By: vkuzo Differential Revision: D25579815 fbshipit-source-id: 413cc8888eb8058230b94c9576d2fa54b0ed1416
Commit 3dafed5
converted current debugging statements in LLVM codegen to jit-logging statements pytorch#48771 (pytorch#49040)
Summary: Pull Request resolved: pytorch#49040 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D25407356 Pulled By: huiguoo fbshipit-source-id: 1c1f893ed8d0877bee27e9a673a5dce2203c2bad
Commit 6f66ee4
added macros in jit logging to check whether logging is enabled; replaced similar checks in LLVM codegen with such macros (pytorch#49121)
Summary: Pull Request resolved: pytorch#49121 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D25445971 Pulled By: huiguoo fbshipit-source-id: 980775a94159aa0b3b66fae938962761b38703d5
Commit a20a1f9
change block codegen to handle new inlining in NNC (pytorch#47687)
Summary: Minor changes to block codegen to handle new inlining in NNC. For Block code generation we need to delay inlining until after collecting dimension data about the tensors, i.e. their dimensions before flattening. We don't have this information after the inlining pass, so for Block we run inlining only after collecting this data with the `CreateBufferMap` analysis. Pull Request resolved: pytorch#47687 Reviewed By: ZolotukhinM Differential Revision: D24864869 Pulled By: protonu fbshipit-source-id: 9574c0599f7d959a1cf0eb49d4e3e541cbe9b1d3
Protonu Basu authored and hwangdeyu committed Dec 23, 2020
Commit 8cb4a36
Clean up backward compatibility skip list (pytorch#49691)
Summary: Pull Request resolved: pytorch#49691 Quite a few stale items, let's make the list short. Test Plan: oss ci Reviewed By: hl475 Differential Revision: D25667464 fbshipit-source-id: cff1be8b5e0068470b3f621acf6bf4fbd414233e
Commit 2af5914
Enable product for bool tensor (pytorch#48637)
Summary: Fixes pytorch#48351 Pull Request resolved: pytorch#48637 Reviewed By: mrshenli Differential Revision: D25658596 Pulled By: mruberry fbshipit-source-id: ff3ada74b6d281c8e4753ed38339a1c036f722ee
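In plain Python terms, a product reduction over booleans is a logical AND, which is what this enables for bool tensors:

```python
def bool_prod(xs):
    # torch.prod over a bool tensor reduces like "all": a single False
    # factor zeroes out the product.
    result = True
    for x in xs:
        result = result and x
    return result

# bool_prod([True, True]) is True; bool_prod([True, False]) is False
```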
Commit 83c91f9
Fix test_cuda_init_race skip rules (pytorch#49693)
Summary: Fixes pytorch#49432 Pull Request resolved: pytorch#49693 Reviewed By: walterddr, janeyx99 Differential Revision: D25668027 Pulled By: malfet fbshipit-source-id: 802cbd39e4ebe585709179f332b680f5f7978814
Commit 56115b7
Add base forward grad logic (pytorch#49097)
Summary: Pull Request resolved: pytorch#49097 RFC: pytorch/rfcs#11 This PR add the basic logic to handle forward grad as dual Tensors. It contains the following: - Mechanism to save dual state on a Tensor and clear it up when the dual level ends - C++ and python user facing API - Updated view system that is able to track both forward and backward views The current PR has the following limitations: - Extensive tests are in the next PR in the stack as formulas are needed to write full tests. - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack) - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR. - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise. - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise. Reading guide: - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). 
This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view. - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development. - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677) - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), 
[VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243) - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9) - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d) - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8) - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3) - Utility for formulas and updated manual functions to 
respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433) - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030) Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D25607503 Pulled By: albanD fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099
Commit: 97d64bc
-
Do not use negative values in GCD computation. (pytorch#49379)
Summary: GCD should always return positive integers. When negative values are used, we hit a corner case that results in an infinite recursion during simplification. Pull Request resolved: pytorch#49379 Reviewed By: ezyang Differential Revision: D25597115 Pulled By: navahgar fbshipit-source-id: b0e8ac07ee50a5eb775c032628d4840df7424927
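The sign issue behind this fix can be reproduced in plain Python (an illustrative sketch of Euclid's algorithm under C-style truncated remainder, not the TensorExpr simplifier itself): with negative operands the naive version can return a negative "gcd", exactly the kind of value that keeps a simplifier rewriting without making progress.

```python
import math

def c_mod(a, b):
    # C-style truncated remainder: the sign follows the dividend,
    # unlike Python's %, whose sign follows the divisor.
    return int(math.fmod(a, b))

def gcd_unsafe(a, b):
    # Naive Euclid: with truncated remainder and negative operands
    # the result can come out negative.
    while b != 0:
        a, b = b, c_mod(a, b)
    return a

def gcd_safe(a, b):
    # The fix in spirit: normalize signs so the result is always >= 0.
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b
    return a

assert gcd_unsafe(-4, -6) == -2   # a negative "gcd"
assert gcd_safe(-4, -6) == 2
```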
Commit: b77390b
-
[jit][tracer] allow traced modules to return dicts with tuple values when strict=False (pytorch#49568)
Summary: Pull Request resolved: pytorch#49568 We have some inference use cases where the expected output of a module is of the form `{"key": (t1, t1)}` and are currently jit tracing the modules until we can reach jit script compatibility. Test Plan: buck test mode/dev caffe2/test:jit -- 'test_trace_returning_complex_dict' Reviewed By: houseroad Differential Revision: D25624152 fbshipit-source-id: 5adef0e3c9d54cd31ad5fece4ac6530d541fd673
Commit: 4ab6172
-
Move device guard from MultiTensorApply.cuh (pytorch#46664)
Summary: Pull Request resolved: pytorch#46664 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D24453343 Pulled By: izdeby fbshipit-source-id: b82a658af50ededc985195ed02dbf60e792c7a13
Iurii Zdebskyi authored and hwangdeyu committed Dec 23, 2020
Commit: eb6a2ab
-
Use store based barrier only for certain store types. (pytorch#49694)
Summary: Pull Request resolved: pytorch#49694 The store based barrier introduced in pytorch#49419 broke for certain store types. This is a quick fix to resolve the issues for other store types. ghstack-source-id: 119006874 Test Plan: 1) waitforbuildbot Reviewed By: ppwwyyxx, rohan-varma Differential Revision: D25668404 fbshipit-source-id: 751fb8b229ad6f50ee9c50f63a70de5a91c9eda5
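The idea behind a store-based barrier can be sketched with a toy in-memory store. The names here (`Store.add`, `Store.wait_for`, `store_barrier`) are illustrative stand-ins for the c10d Store API, not its real signatures: every rank increments a shared counter, then blocks until the counter reaches `world_size`.

```python
import threading

class Store:
    """A toy in-memory key/value store standing in for c10d's Store."""
    def __init__(self):
        self._data = {}
        self._cv = threading.Condition()

    def add(self, key, amount):
        # Atomically increment the counter and wake any waiters.
        with self._cv:
            self._data[key] = self._data.get(key, 0) + amount
            self._cv.notify_all()
            return self._data[key]

    def wait_for(self, key, value):
        # Block until the counter reaches `value`.
        with self._cv:
            self._cv.wait_for(lambda: self._data.get(key, 0) >= value)

def store_barrier(store, world_size, key="barrier"):
    # Each rank announces arrival, then waits for everyone else.
    store.add(key, 1)
    store.wait_for(key, world_size)

store = Store()
workers = [threading.Thread(target=store_barrier, args=(store, 4)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert store._data["barrier"] == 4
```

The quick fix in the PR restricts this pattern to store types that actually support the counter operations.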
Commit: 4164cb2
-
Fix TCPStore type coercion (pytorch#49685)
Summary: Fixes pytorch#49052 The TCPStore example with 4 arguments was working because the datetime value was being implicitly converted to a bool. Modified the pybind definition and updated documentation. Pull Request resolved: pytorch#49685 Test Plan: ``` import torch.distributed as dist from datetime import timedelta dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30)) ``` Now fails with ``` TypeError: __init__(): incompatible constructor arguments. The following argument types are supported: 1. torch._C._distributed_c10d.TCPStore(host_name: str, port: int, world_size: int, is_master: bool, timeout: datetime.timedelta = datetime.timedelta(seconds=300)) Invoked with: '127.0.0.1', 0, True, datetime.timedelta(seconds=30) ``` Reviewed By: mrshenli, ngimel Differential Revision: D25668021 Pulled By: H-Huang fbshipit-source-id: ce40b8648d0a414f0255666fbc680f1a66fae090
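The implicit conversion at the heart of this bug is ordinary truthiness: a non-zero timedelta is truthy, so a loosely typed signature silently accepts it where a bool was expected. A minimal Python sketch (the `connect`/`connect_strict` functions are hypothetical stand-ins for the pybind-bound constructor):

```python
from datetime import timedelta

def connect(world_size, is_master, timeout=timedelta(seconds=300)):
    # Loosely typed: a timedelta passed in the is_master slot "works"
    # because bool(non-zero timedelta) is True.
    return bool(is_master), timeout

def connect_strict(world_size, is_master, timeout=timedelta(seconds=300)):
    # The fix in spirit: reject anything that is not literally a bool,
    # analogous to pybind11's noconvert() on the is_master argument.
    if not isinstance(is_master, bool):
        raise TypeError("is_master must be a bool")
    return is_master, timeout

# The buggy call shape from the issue: the timeout lands in the
# is_master slot and is coerced to True.
flag, _ = connect(0, timedelta(seconds=30))
assert flag is True
```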
Commit: ccde23b
-
replacing THC_CLASS and THC_API with TORCH_CUDA_API (pytorch#49690)
Summary: THC_API and THC_CLASS were leftover macros from before the consolidation of caffe2, aten, and torch. Now that they're combined, these are misleading and should just be TORCH_CUDA_API. The only file I manually edited was `THCGeneral.h.in`. Pull Request resolved: pytorch#49690 Reviewed By: malfet Differential Revision: D25667982 Pulled By: janeyx99 fbshipit-source-id: 2fdf7912b2a0537b7c25e1fed21cc301fa59d57f
Commit: 1e9a97f
-
Revert D25607503: Add base forward grad logic
Test Plan: revert-hammer Differential Revision: D25607503 (pytorch@fdf02ef) Original commit changeset: f1396290de1d fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f
Walter Shen authored and hwangdeyu committed Dec 23, 2020
Commit: 220afd2
-
[TensorExpr] Change `LoopNest::vectorize` to accept `For*` instead of `Stmt*`. (pytorch#49696)
Summary: Pull Request resolved: pytorch#49696 And make it static. Test Plan: Imported from OSS Reviewed By: navahgar, nickgg Differential Revision: D25668695 Pulled By: ZolotukhinM fbshipit-source-id: 8d7fb507d6f3beca70e868d9e0f4c46247311a99
Mikhail Zolotukhin authored and hwangdeyu committed Dec 23, 2020
Commit: 1b63e24
-
[TensorExpr] Move `SimpleIREval` implementation from .h to .cpp. (pytorch#49697)
Summary: Pull Request resolved: pytorch#49697 Mostly mechanical move. This refactoring helps to hide unnecessary details from the SimpleIREval interface and make it more similar to a pure 'codegen'. Test Plan: Imported from OSS Reviewed By: nickgg Differential Revision: D25668696 Pulled By: ZolotukhinM fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c
Mikhail Zolotukhin authored and hwangdeyu committed Dec 23, 2020
Commit: d1fac89
-
unbreak mypy torch/quantization (pytorch#49549)
Summary: Pull Request resolved: pytorch#49549 Somehow `mypy torch/quantization` got broken in the past couple of days: https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d . I didn't see any relevant PRs other than pytorch#47725, which doesn't seem related. The error doesn't seem real, as the arguments to `_cudnn_rnn_flatten_weight` seem correct. For now, ignoring the failure so we have a clean `mypy` run on `torch/quantization`. Test Plan: ``` mypy torch/quantization ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25616972 fbshipit-source-id: 46c207fe1565ec949c0b1f57d6cd0c93f627e6bd
Commit: ca537cd
-
fx quant: types for fusion_patterns.py (pytorch#49606)
Summary: Pull Request resolved: pytorch#49606 Adds more types, for readability. Test Plan: ``` mypy torch/quantization ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25643894 fbshipit-source-id: 4aad52fe4e59ad74b6e0e3acd0f98fba91561a29
Commit: 6d8e9d3
-
fx quant: add types to observed_module.py (pytorch#49607)
Summary: Pull Request resolved: pytorch#49607 Readability Test Plan: ``` mypy torch/quantization ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25643895 fbshipit-source-id: b4b8741b07ac4827c3bacd2084df81fbfdd0c2d5
Commit: 1de10d5
-
fx quant: fix types on _find_quants (pytorch#49616)
Summary: Pull Request resolved: pytorch#49616 Add types to `_find_quants` I/O and fix resulting errors, needed for an upcoming bug fix. Test Plan: ``` mypy torch/quantization python test/test_quantization.py TestQuantizeFx ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25645719 fbshipit-source-id: 4bf788b55fd4fd086c83a4438b9c2df22b9cff49
Commit: 1869bd7
-
[FX] Fix python code having spurious newlines from placeholders (pytorch#49720)
Summary: Pull Request resolved: pytorch#49720 Test Plan: Imported from OSS Reviewed By: zdevito Differential Revision: D25675825 Pulled By: jamesr66a fbshipit-source-id: a9028acad9c8feb877fff5cd09aedabed52a3f4b
James Reed authored and hwangdeyu committed Dec 23, 2020
Commit: c8aefec
-
[pt][ATen] Optimize bmm (pytorch#49506)
Summary: Pull Request resolved: pytorch#49506 - Get rid of expensive stuff like `TensorArg`, `checkBackend`, `checkSize`, and `TensorAccessor`. - Add `checkDim` that does not require creating a `TensorArg` which incurs a refcount bump - Avoid unnecessary calls to `torch.select`, which goes through the dispatcher in the cases we care about, with mat1 and mat2 not permuted or permuted with dims = [0, 2, 1]. The pt version of bmm supports crazy cases like when the inputs are permuted with dims = [1, 2, 0], which is uncommon in SparseNNs. Test Plan: Unit test: ``` buck test //caffe2/test:linalg ``` Benchmark with the adindexer model: ``` Before: I1216 14:02:24.155516 2595800 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0847197. Iters per second: 11803.6 After: I1216 14:02:26.583878 2595939 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.082051. Iters per second: 12187.5 ``` Reviewed By: bwasti Differential Revision: D25577574 fbshipit-source-id: 8aba69b950e7b4d9d1b14ba837931695a908c068
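For reference, the semantics being optimized are plain batched matrix multiplication; a minimal pure-Python sketch over nested lists (illustrative only — the PR changes the C++ fast path for the common non-permuted layouts, not these semantics):

```python
def bmm(a, b):
    # a has shape (B, N, M) and b has shape (B, M, P) as nested lists;
    # multiply matching batch entries pairwise.
    assert len(a) == len(b)
    out = []
    for lhs, rhs in zip(a, b):
        inner, cols = len(rhs), len(rhs[0])
        assert len(lhs[0]) == inner
        out.append([[sum(lhs[i][k] * rhs[k][j] for k in range(inner))
                     for j in range(cols)] for i in range(len(lhs))])
    return out

x = [[[1, 2], [3, 4]]]   # shape (1, 2, 2)
y = [[[5, 6], [7, 8]]]
assert bmm(x, y) == [[[19, 22], [43, 50]]]
```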
Hao Lu authored and hwangdeyu committed Dec 23, 2020
Commit: f41aa50
-
[PyTorch] Remove direct reference to native symbols in sparse related non-native codes (pytorch#49721)
Summary: Pull Request resolved: pytorch#49721 As a refactor effort of per-app selective build, we are decoupling ATen/native from the rest of aten (D25413998). All symbols of ATen/native could only be referenced through dispatcher (pytorch#48684). This diff is to decouple the native reference recently introduced for sparse tensors. ghstack-source-id: 119028080 Test Plan: CI Reviewed By: dhruvbird, ngimel Differential Revision: D25675711 fbshipit-source-id: 381cbb3b361ee41b002055399d4996a9ca21377c
Commit: db3f718
-
[Gradient Compression] Warm-start of PowerSGD (pytorch#49451)
Summary: Pull Request resolved: pytorch#49451 Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible. This can give a better compression performance in terms of both accuracy and speed. Also add a unit test for batched PowerSGD to test_c10d.py. Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202 ghstack-source-id: 119014132 Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook Reviewed By: rohan-varma Differential Revision: D25583086 fbshipit-source-id: a757df3c4cfcc0ead4647f7de2f43198f1e063ee
Yi Wang authored and hwangdeyu committed Dec 23, 2020
Commit: 4cebcbd
-
NewModuleTest: Don't call both check_jacobian and gradcheck (pytorch#49566)
Summary: Pull Request resolved: pytorch#49566 Fixes pytorch#49422. check_jacobian and gradcheck do roughly the same thing: they both compute an analytic jacobian and a numeric jacobian and check that they are equivalent. Furthermore, NewModuleTest will (by default) call both check_jacobian and gradcheck, leading to some redundant checks that waste CI resources. However, there is one subtle difference: `check_jacobian` can handle the special case where a Module takes in dense inputs and dense parameters but returns sparse gradients, but that is not something gradcheck can handle. This is only used in the tests for nn.Embedding and nn.EmbeddingBag. This PR does the following: - have NewModuleTest call gradcheck instead of check_jacobian by default - add a new "has_sparse_gradients" flag to NewModuleTest. These are True for the nn.Embedding and nn.EmbeddingBag sparse gradient tests. If `has_sparse_gradients` is True, then we call check_jacobian, otherwise, we call gradcheck. - Kills the "jacobian_input" flag. This flag was used to tell NewModuleTest to not attempt to compute the jacobian for the inputs to the module. This is only desirable if the input to the module isn't differentiable and was only set in the case of nn.Embedding / nn.EmbeddingBag that take a LongTensor input. `gradcheck` handles these automatically by not checking gradients for non-differentiable inputs. Test Plan: - Code reading - run test_nn.py Reviewed By: albanD Differential Revision: D25622929 Pulled By: zou3519 fbshipit-source-id: 8d831ada98b6a95d63f087ea9bce1b574c996a22
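What check_jacobian and gradcheck share reduces to one line of numerical analysis: compare a central-difference estimate against the analytic derivative. A scalar sketch of the idea (the real gradcheck builds full Jacobians over tensor inputs):

```python
def gradcheck_scalar(f, df, x, eps=1e-6, atol=1e-4):
    # Central difference: (f(x+eps) - f(x-eps)) / (2*eps)
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    analytic = df(x)
    return abs(numeric - analytic) <= atol

# Correct derivative of x**2 passes; a wrong formula is caught.
assert gradcheck_scalar(lambda x: x * x, lambda x: 2 * x, 3.0)
assert not gradcheck_scalar(lambda x: x * x, lambda x: 3 * x, 3.0)
```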
Commit: 10b5558
-
[fix] inplace remainder/% (pytorch#49390)
Summary: Fixes pytorch#49214 **BC-Breaking** Before this PR, `%=` didn't actually do the operation inplace and returned a new tensor. After this PR, `%=` operation is actually inplace and the modified input tensor is returned. Before PR, ```python >>> import torch >>> a = torch.tensor([11,12,13]) >>> id(a) 139627966219328 >>> a %= 10 >>> id(a) 139627966219264 ``` After PR, ```python >>> import torch >>> a = torch.tensor([11,12,13]) >>> id(a) 139804702425280 >>> a %= 10 >>> id(a) 139804702425280 ``` Pull Request resolved: pytorch#49390 Reviewed By: izdeby Differential Revision: D25560423 Pulled By: zou3519 fbshipit-source-id: 2b92bfda260582aa4ac22c4025376295e51f854e
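The contract being restored is ordinary Python augmented-assignment semantics, which built-in lists already obey: a true in-place `__iadd__`/`__imod__` mutates the receiver and keeps its identity, while an out-of-place fallback binds a brand new object. A plain-Python illustration (not a torch snippet):

```python
a = [11, 12, 13]
before = id(a)
a += [14]           # list.__iadd__ mutates in place: identity unchanged
assert id(a) == before

b = [11, 12, 13]
before = id(b)
b = b + [14]        # out-of-place: a new list is bound to the name b
assert id(b) != before
```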
Commit: 25c852b
-
Complex backward for torch.sqrt (pytorch#49461)
Summary: Pull Request resolved: pytorch#49461 resolves pytorch#48398 Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D25589454 Pulled By: anjali411 fbshipit-source-id: 46e9f913c8ab3e18c98d6f623b2394044b6fe079
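The formula behind this backward is the holomorphic derivative d/dz sqrt(z) = 1/(2*sqrt(z)); away from the branch cut a finite-difference check confirms it. A numeric sketch with cmath (note that the actual autograd backward additionally conjugates per the Wirtinger-calculus convention):

```python
import cmath

def dsqrt(z):
    # Analytic derivative of the principal square root.
    return 1 / (2 * cmath.sqrt(z))

def numeric_derivative(f, z, eps=1e-6):
    # A central difference along the real axis suffices for a
    # holomorphic function.
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = 1 + 2j   # safely away from the branch cut on the negative real axis
assert abs(numeric_derivative(cmath.sqrt, z) - dsqrt(z)) < 1e-6
```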
Commit: bc28081
-
[ROCm] add 4.0 to nightly builds (pytorch#49632)
Summary: Depends on pytorch/builder#614. Pull Request resolved: pytorch#49632 Reviewed By: ngimel Differential Revision: D25665880 Pulled By: walterddr fbshipit-source-id: b37a55b7e3028648453b422683fa4a72e0ee04a4
Commit: 03214d5
-
Make PyTorch partially cross-compilable for Apple M1 (pytorch#49701)
Summary: Update CPUINFO to include pytorch/cpuinfo#51 Update sleef to include shibatch/sleef#376 Modify aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt to recognize CMAKE_OSX_ARCHITECTURES Pull Request resolved: pytorch#49701 Test Plan: `cmake -DCMAKE_OSX_ARCHITECTURES=x86_64 -DPYTHON_EXECUTABLE=/usr/bin/python3 -DUSE_XNNPACK=NO -DBUILD_TEST=YES .. -G Ninja; ninja basic` finishes successfully on Apple M1 Reviewed By: janeyx99 Differential Revision: D25669219 Pulled By: malfet fbshipit-source-id: 5ee36b64e3a7ac76448f2a300ac4993375a26de5
Commit: a813673
-
[onnxifi] Get rid of class member (pytorch#49380)
Summary: Pull Request resolved: pytorch#49380 Couldn't resist removing a class member that is only used in one function. Reviewed By: yinghai Differential Revision: D25547366 fbshipit-source-id: 74e61c6a0068566fb7956380862999163e7e94bf
Commit: 2d2a1f6
-
Reland: Add base forward grad logic (pytorch#49734)
Summary: Pull Request resolved: pytorch#49734 RFC: pytorch/rfcs#11 This PR add the basic logic to handle forward grad as dual Tensors. It contains the following: - Mechanism to save dual state on a Tensor and clear it up when the dual level ends - C++ and python user facing API - Updated view system that is able to track both forward and backward views The current PR has the following limitations: - Extensive tests are in the next PR in the stack as formulas are needed to write full tests. - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack) - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR. - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise. - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise. Reading guide: - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). 
This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view. - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development. - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677) - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), 
[VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243) - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9) - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d) - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8) - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3) - Utility for formulas and updated manual functions to 
respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433) - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030) Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D25678797 Pulled By: albanD fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
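The "dual Tensor" machinery this PR lands is forward-mode AD; its essence fits in a scalar dual-number class, where each value carries a tangent and every op applies the chain rule forward (an illustrative sketch, not the autograd implementation):

```python
class Dual:
    """A scalar dual number: a primal value plus its forward-mode tangent."""
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        # Sum rule: tangents add.
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule propagates the tangent forward.
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

# d/dx (x*x + x) at x = 3.0: seed the input tangent with 1.0.
x = Dual(3.0, 1.0)
y = x * x + x
assert y.primal == 12.0
assert y.tangent == 7.0   # 2*x + 1 at x = 3
```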
Commit: e241e1a
-
Fix get_overlap_status for tensors without storage (pytorch#49638)
Summary: Pull Request resolved: pytorch#49638 Reviewed By: ngimel Differential Revision: D25681908 Pulled By: asuhan fbshipit-source-id: 2ea8623614f2f0027f6437cf2819ba1657464f54
Commit: 1c39e42
-
Minor doc fix: change truncating to rounding in TF32 docs (pytorch#49625)
Summary: Minor doc fix in clarifying that the input data is rounded not truncated. CC zasdfgbnm ngimel Pull Request resolved: pytorch#49625 Reviewed By: mruberry Differential Revision: D25668244 Pulled By: ngimel fbshipit-source-id: ac97e41e0ca296276544f9e9f85b2cf1790d9985
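The rounding-vs-truncation distinction can be demonstrated at the bit level: TF32 keeps float32's 8-bit exponent but only 10 mantissa bits, so the low 13 mantissa bits are rounded away rather than simply dropped. A sketch using round-half-up for simplicity (real hardware rounds to nearest-even):

```python
import struct

def to_tf32(x):
    # View the float32 bits, add half of the dropped range (0x1000) so
    # values round to the nearest representable TF32, then clear the
    # low 13 mantissa bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x1000) & ~0x1FFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

assert to_tf32(1.0) == 1.0
assert to_tf32(1.0 + 2**-12) == 1.0           # rounds down
assert to_tf32(1.0 + 2**-11) == 1.0 + 2**-10  # rounds up (half-up tie)
assert to_tf32(1.0 + 2**-10) == 1.0 + 2**-10  # already representable
```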
pbialecki authored and hwangdeyu committed Dec 23, 2020
Commit: 4406379
-
remove unused THCBlas (pytorch#49725)
Summary: removes unused THCBlas, call `at::cuda::blas::gemm` directly where needed. Pull Request resolved: pytorch#49725 Reviewed By: mruberry Differential Revision: D25680831 Pulled By: ngimel fbshipit-source-id: d826f3f558b156f45f2a4864daf3f6d086bda78c
Commit: 40e15e5
-
only upload s3 stats on master, nightly, and release branch (pytorch#49645)
Summary: Pull Request resolved: pytorch#49645 Reviewed By: malfet Differential Revision: D25665851 Pulled By: walterddr fbshipit-source-id: 1cf50f6e3657f70776aaf3c5d3823c8a586bf22d
Rong Rong (AI Infra) authored and hwangdeyu committed Dec 23, 2020
Commit: 5e176cb
-
Commit: 73985d9
-
Merge branch 'onnx_ms_1' of github.com:hwangdeyu/pytorch into onnx_ms_1
hwangdeyu committed Dec 23, 2020
Commit: 2bfe745
Commits on Jan 4, 2021
-
Commit: 525ac26
-
Commit: 9259b03
-
[ONNX] Add checks in ONNXSetDynamicInputShape (pytorch#49783)
* [ONNX] Add checks in ONNXSetDynamicInputShape * [ONNX] Add checks in ONNXSetDynamicInputShape
Commit: 1baebbb
Commits on Jan 5, 2021
-
[ONNX] Enable export of aten::__derive_index (pytorch#49514)
* Add derive_index * Add derive_index test * Adding more tests * Update symbolic_opset9.py
Commit: 4898616
-
[ONNX] Update symbolic for unfold (pytorch#49378)
* update symbolic for unfold * update symbolic_opse12 file * update symbolic_opse12 file * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270) * Symbolic function for torch.square (pytorch#49446) * instead of a pass use a helper function * update ort version * Revert "instead of a pass use a helper function" This reverts commit 723b446. * update symbolics * update symbolic * update symbolics * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270) * Symbolic function for torch.square (pytorch#49446) * empty commit * fix clang-tidy * fix clang-tidy Co-authored-by: Bowen Bao <bowbao@microsoft.com> Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
Commit: eef5191
-
[ONNX] Update the sequence of initializers in exported graph so that it is as same as inputs. (pytorch#49798)
* [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270) * Symbolic function for torch.square (pytorch#49446) * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270) * Symbolic function for torch.square (pytorch#49446) * Update code so that initializers' sequence is as same as inputs. * Correct the format according to flake8. * Correct the format by clang-format. * Add a new test for script model. * Update expect files for Test_Operators tests. Co-authored-by: Bowen Bao <bowbao@microsoft.com> Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
Commit: 97a8af1
Commits on Jan 6, 2021
-
[ONNX] Enable opset 13 ops (pytorch#49612)
* Enable opset 13 ORT tests * Update test.sh * Set environ var * Update test.sh * Enabling more ops for opset 13 * change master to main * Update symbolic_opset13.py * Flake 8 fix * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270) * Symbolic function for torch.square (pytorch#49446) * Clean up tests * Exclude more tests * Trigge build * [ONNX] Support onnx if/loop sequence output in opset 13 - (pytorch#49270) * Symbolic function for torch.square (pytorch#49446) * update ORT version * disable more tests * clean up * flake8 * Disable TV tests * Update test_pytorch_onnx_onnxruntime.py Co-authored-by: Bowen Bao <bowbao@microsoft.com> Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
Commit: 616da7c
-
Merge branch 'onnx_ms_1' of https://github.com/pytorch/pytorch into pytorch-onnx_ms_1
hwangdeyu committed Jan 6, 2021
Commit: b3ae16c
-
t push origin onnx_ms_1:Merge branch 'pytorch-onnx_ms_1' into onnx_ms_1
hwangdeyu committed Jan 6, 2021
Commit: 2c69cf3
-
Reland: Add base forward grad logic (pytorch#49734)
Summary: Pull Request resolved: pytorch#49734 RFC: pytorch/rfcs#11

This PR adds the basic logic to handle forward grad as dual Tensors. It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and Python user-facing API
- Updated view system that is able to track both forward and backward views

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack, as formulas are needed to write full tests.
- Only the manual formulas have been audited; no other formula is actually implemented here (they are in the next PR in the stack).
- Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
- We could save one ViewInfo creation when both the forward and backward views have the same base, by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We could skip tracking forward views if the base has a forward grad, by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold the view information shared between forward and backward. It also updates the differentiable view meta to use this, and updates the `as_view` function to handle both forward and backward views.
- New forward grad class that handles storing gradients and tracking at each level: [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD, which allows us to reduce performance issues while this is in development.
- Lowest-level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal, which needs to be a differentiable function (and so appears in native_functions.yaml): [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991), [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- C++ API: [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- Python binding: [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- Python API: [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- C++ and Python printing: [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utilities for formulas and updated manual functions to respect the new view system as well as forward grad: [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) and the [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable saves the forward grad properly: [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D25678797
Pulled By: albanD
fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
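The "forward grad as dual Tensors" idea above can be illustrated with plain dual numbers: a value carries its tangent (forward gradient) alongside the primal, and every operation propagates both. This is only a minimal pure-Python sketch of the concept, not the PR's implementation (the PR stores the tangent on the Tensor's autograd metadata and scopes it to a dual level):

```python
class Dual:
    """A value paired with its forward (tangent) gradient."""

    def __init__(self, primal, tangent=0.0):
        self.primal = primal    # the value itself
        self.tangent = tangent  # the forward gradient carried with it

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)


# f(x) = x*x + x; seeding tangent=1.0 differentiates w.r.t. x,
# so at x=3 the tangent of the output is f'(3) = 2*3 + 1 = 7.
x = Dual(3.0, 1.0)
y = x * x + x
print(y.primal, y.tangent)  # 12.0 7.0
```

Unlike backward-mode autograd, no graph is recorded: the derivative rides along with the value in a single forward pass, which is what makes clearing the dual state at the end of a level sufficient cleanup.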
(SHA c92808f)
-
add binary_cross_entropy_with_logits op to ONNX opset version 12
hwangdeyu committed Jan 6, 2021 (SHA d144220)
-
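For context on the op being exported: `binary_cross_entropy_with_logits` fuses the sigmoid into the loss so it can use the numerically stable form `max(x, 0) - x*z + log(1 + exp(-|x|))`. The sketch below shows that identity in plain Python; it is illustrative only, and the PR's actual symbolic composes ONNX operators rather than Python code:

```python
import math

def bce_with_logits(x, z):
    """Stable binary cross-entropy for a single logit x and target z."""
    # max(x,0) - x*z + log1p(exp(-|x|)) never overflows, unlike
    # computing sigmoid(x) first and taking its log.
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

def naive(x, z):
    """Textbook form via an explicit sigmoid (unstable for large |x|)."""
    p = 1.0 / (1.0 + math.exp(-x))
    return -(z * math.log(p) + (1 - z) * math.log(1 - p))

# Both forms agree on a moderate logit...
print(abs(bce_with_logits(2.0, 1.0) - naive(2.0, 1.0)) < 1e-9)  # True
# ...but only the stable form survives an extreme one.
print(bce_with_logits(1000.0, 1.0))  # 0.0 (naive would overflow/NaN)
```

Exporting this op to ONNX means emitting a graph of ONNX primitives equivalent to the stable form (plus the `weight`, `pos_weight`, and `reduction` handling the ATen op supports), which is why it needed opset-specific work.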
hwangdeyu committed Jan 6, 2021 (SHA e6dd64a)
-
hwangdeyu committed Jan 6, 2021 (SHA 0992510)
-
fix comments: fix reduction message, delete duplicate test
hwangdeyu committed Jan 6, 2021 (SHA cdc08ce)
Commits on Jan 14, 2021
-
Merge remote-tracking branch 'origin1/onnx_ms_1' into deyu/bce_with_logits_sy12
hwangdeyu committed Jan 14, 2021 (SHA 0e09ee9)
Commits on Jan 15, 2021
-
Merge remote-tracking branch 'origin1/onnx_ms_1' into deyu/bce_with_logits_sy12
hwangdeyu committed Jan 15, 2021 (SHA d2ebe7e)
Commits on Jan 18, 2021
-
replace mustBeNone() with the symbolic_helper function _is_none()
hwangdeyu committed Jan 18, 2021 (SHA 5275cc5)