Don't support kwargs at runtime in aot_module_simplified #89664

ezyang · 2022-11-24T23:46:00Z

Stack from ghstack (oldest at bottom):

The preexisting logic here added in
pytorch/functorch#970 was very peculiar: if top_kwargs
was non-empty, then the inner compiled function supports kwargs. Naively, this
would leave you to expect that there is some sort of correlation between
top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs
is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but
kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
But (1) we don't support this (the function to be compiled only takes a list
of tensors) and (2) even if we did support it, conditioning on whether or not
you had passed AOTAutograd configuration kwargs to support kwargs at runtime
is bonkers.

So delete it.

Signed-off-by: Edward Z. Yang ezyang@fb.com

The preexisting logic here added in pytorch/functorch#970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]

pytorch-bot · 2022-11-24T23:46:02Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89664

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 58f1e5f:

The following jobs have failed:

cuda11.6-py3.10-gcc7-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

The preexisting logic here added in pytorch/functorch#970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: bb1423e0795b52ef8f907af58058a964949c26cd Pull Request resolved: #89664

The preexisting logic here added in pytorch/functorch#970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]

The preexisting logic here added in pytorch/functorch#970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: c691d0de2c471c4503b4048c995a86c93f8b6101 Pull Request resolved: #89664

The preexisting logic here added in pytorch/functorch#970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]

commit 63ebc8d6a000199e963d29b6c8a0f54d3150872b Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Thu Dec 1 13:32:03 2022 +0100 rm print commit 2c8ffeaf1b2168ed9ad4ca6b192a1231fb036760 Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Thu Dec 1 11:35:02 2022 +0100 pytorch_sparse.matmul to torch.sparse.matmul commit ee0e184a1ce5dc6ad7005a67621fac19d6fdbb0b Merge: 4562359b9f 3a858ba8e3 Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Mon Nov 28 14:09:42 2022 +0100 Merge branch 'gh/mingfeima/85/head' of https://github.com/pytorch/pytorch into pyg-36 commit 4562359b9fb3de301690334a892d44911eda45c8 Merge: deba083400 b5616cd5f4 Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Mon Nov 28 12:22:11 2022 +0000 Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36 commit deba0834008ad95af7e3a6603223a0f8a5555967 Merge: 0e1a8522bb a97d0508cb Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Mon Nov 28 12:19:25 2022 +0000 Merge branch 'pyg-36' of https://github.com/JakubPietrakIntel/pytorch into pyg-36 commit 0e1a8522bb695387816a29bbfcf182962429b3ab Merge: 059a238619 75bfbc35ca Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Mon Nov 28 12:16:35 2022 +0000 Merge remote-tracking branch 'origin/gh/mingfeima/85/head' into pyg-36 commit b5616cd5f4fc150138b79d3396a603eda6a7a8a8 Author: Michael Voznesensky <voznesenskym@gmail.com> Date: Mon Nov 28 05:12:37 2022 +0000 Add simple assert to detect fake tensors on modules (#89723) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723 Approved by: https://github.com/ezyang commit db1f1144f1303db45e0b9d96e4bb6bdd87c80e5a Author: Edward Z. Yang <ezyang@fb.com> Date: Sat Nov 26 13:52:28 2022 -0800 Beef up AOTAutograd logging with aot_id and input descriptions (#89710) A few things in this PR, that I found useful while debugging some recent issues: - We now allocate an aot_id to each aot_function/aot_module invocation, and print it whenever we report error messages and graph output logging. Check the comment for why this sort of thing is useful, and also why it's different from nth_graph. This number is now incorporated into aot_graph_name - I noticed that nth_graph only gets incremented when backwards is compiled. Because backwards is compiled lazily, this means that multiple forward graphs would have gotten the same ID! I change nth_graph to always increment to avoid confusion here. - I added a simple describe_input function, which makes use of num_params_buffers to tell the user if the input index they're looking at is a param/buffer or an input. With the help of https://github.com/pytorch/pytorch/pull/89709 we could give even more detailed information about inputs (we could also easily give detailed information about parameters if we stored a mapping of index to parameter name, but I didn't need this when debugging so I'll let someone else add it if they need it.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89710 Approved by: https://github.com/bdhirsh commit 5f8848f32901e35cead64d520885f718679c2bbe Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 15:26:55 2022 -0500 Don't suppress log messages for dynamo CI config (#89653) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89653 Approved by: https://github.com/albanD, https://github.com/kit1980 commit 1a2dd6b15e0089a9e45ba4feb90c2d0dfac19238 Author: Edward Z. Yang <ezyang@fb.com> Date: Sun Nov 27 19:27:45 2022 -0500 Add single process version of dynamo distributed hf_Bert tests (#89721) It's a lot easier to debug problems in the Dynamo optimization pass if you aren't actually triggering a multiprocessing run. Keep these tests around. I think the other tests can probably get this treatment too, leaving this to future work. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721 Approved by: https://github.com/voznesenskym commit 0e7c100c9b7417efb1a8f65778a1e3c9ad10ef3e Author: Edward Z. Yang <ezyang@fb.com> Date: Sat Nov 26 11:25:24 2022 -0800 Add debug asserts to AOTAutograd for input consistency with compilation (#89702) Fixes https://github.com/pytorch/torchdynamo/issues/1927 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89702 Approved by: https://github.com/bdhirsh commit 1f95f24d3003a35568a00b5e5e18439846089b0f Author: Edward Z. Yang <ezyang@fb.com> Date: Sat Nov 26 11:25:24 2022 -0800 Factor input deduplication into a separate function (#89701) It turns out that instead of having a giant blobby aot_dispatch_autograd function, we can factor it into a series of wrapper functions, each of which successively guarantees more invariants on the inner compilation function until the final inner function is quite trivial. How exactly you have to wrap the input user functions and the output compiled functions can be expressed concisely in Haskell, so I've included the Haskell formulation in code comments. This PR shows how to do this for input deduplication. Dealing with the rest of the view handling is left to future work. This PR should also be a slight performance improvement as deduplicating is skipped entirely when there are no duplicate inputs. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89701 Approved by: https://github.com/bdhirsh commit dcefc8f90fbc86041a7abcce4f227d15c59bd96c Author: Edward Z. Yang <ezyang@fb.com> Date: Sat Nov 26 14:28:56 2022 -0500 Implement guard_source on RandomValueSource (#89711) I audited the pattern matches on the enum and it didn't look like this one should apply there. Sorry, no test, I know this matters on symbolic-shapes branch but I haven't had time to extract out a minimal reproducer. Take my word for it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89711 Approved by: https://github.com/jansel commit 1da633f98a5da000083c0c47d9e192b2689f867b Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 13:57:17 2022 +0000 Access named parameters/buffers/etc via getattr rather than index (#89625) I'm not sure why this never caused problems before. The error manifests as `TypeError: 'MyModule' object is not subscriptable` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89625 Approved by: https://github.com/albanD commit e36d68af8885f27d8c0b4727ab078bf53e55e7a0 Author: Horace He <chilli@fb.com> Date: Thu Nov 24 02:17:37 2022 +0000 Don't allow recomputing a node that *must* be materialized in the backwards pass (#89171) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89171 Approved by: https://github.com/ngimel commit b709078dc673cbd5025a1df3eae7f5c60acc2698 Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:21 2022 -0800 [Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926) There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse grained view of the world.) Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926 Approved by: https://github.com/chaekit commit 143d2881a844934c95c4ada63b38179d97e65af3 Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:19 2022 -0800 [Profiler] Memory profiler part 10: Mark optimizer state (#88925) This is also a fairly simple pass, since we're simply collecting values from the python tracer. Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925 Approved by: https://github.com/chaekit commit ae725d501e33ed6f823997bea03d99cdc8dae5ff Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:18 2022 -0800 [Profiler] Memory profiler part 9: Mark activations (#88924) This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass. Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924 Approved by: https://github.com/chaekit commit 56e40fe054ecb7700142ea9ae7fe37e77800a2da Author: Yuxin Wu <ppwwyyxx@users.noreply.github.com> Date: Sun Nov 27 05:55:24 2022 +0000 Let SyncBatchNorm fallback to BN if not using distributed training (#89706) Fixes #63662 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706 Approved by: https://github.com/soumith commit 39449ea61d9a6644731687219282f610cbf7cf54 Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com> Date: Sun Nov 27 02:59:04 2022 +0000 [vision hash update] update the pinned vision hash (#89692) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89692 Approved by: https://github.com/pytorchbot commit 483d3a3d07e6694757c5158bc21f7f757f8c82c3 Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:16 2022 -0800 [Profiler] E2E expecttests for category assignment (#88653) Up until now the unit tests for category assignment have been narrowly scoped to specific checks on specific Tensors. However as we start to reach reasonable levels of category assignment it's useful to supplement those tests with higher level summary tests to inspect the larger graph and confirm that it makes sense. (It will also be necessary for some categories like activations where it is tedious to record all relevant Tensors.) The general structure of these tests is to capture a model invocation with `__torch_dispatch__` and then cross reference those inputs and outputs with the categories assigned by the memory profiler. Differential Revision: [D40868659](https://our.internmc.facebook.com/intern/diff/D40868659/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88653 Approved by: https://github.com/chaekit commit 0435894bb3b2d60e5da9f993c2a56d95fb03a971 Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:14 2022 -0800 [Profiler] Memory profiler part 8: Mark parameters. (#87568) Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates. Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.) Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568 Approved by: https://github.com/chaekit commit 17fa6bf1f57cbbe84a14566efcf00f21e1abe489 Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:13 2022 -0800 [Profiler] Memory profiler part 7: Mark inputs (#87567) It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training since Autograd lets us differentiate between them however I still want the tool to do something reasonable in inference. Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements. Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567 Approved by: https://github.com/chaekit commit 64c5c77cd47212da719eb29c3b0a2b07cebb3705 Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:11 2022 -0800 [Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566) Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories. We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method relies requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined these two approaches provide very high coverage. Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node. Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.) Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566 Approved by: https://github.com/chaekit commit 5f09a6d573a2a07c00c76c3cbdbffe0fafe2436d Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:09 2022 -0800 [Profiler] Memory profiler part 5: Data flow graph (#87006) The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs. It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node. Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006 Approved by: https://github.com/chaekit commit c3116dd78b294f1bd3f6424dc1bfb7ff86bb0a66 Author: Taylor Robie <taylorrobie@fb.com> Date: Sat Nov 26 10:33:08 2022 -0800 [Profiler] Memory profiler part 4: Select top level torch ops (#86880) In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is. Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880 Approved by: https://github.com/chaekit commit bb77accb4c996e3aab9ae4b665fb8464400c8194 Author: Jiong Gong <jiong.gong@intel.com> Date: Sat Nov 26 14:06:44 2022 +0000 [Inductor] Record cpp kernel in PyTorch Profiler (#89367) Add an option `config.cpp.enable_kernel_profile` to record individual cpp kernel time in PyTorch Profiler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89367 Approved by: https://github.com/jansel commit 36018a6ee63f140b95ad644d09920798b0c624f8 Author: Edward Z. Yang <ezyang@fb.com> Date: Fri Nov 25 13:48:35 2022 -0800 Don't suppress exceptions from backends (#89656) Taken from voz's https://github.com/pytorch/pytorch/pull/89392 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89656 Approved by: https://github.com/voznesenskym commit 3e20d023b1f442ebe59e76604395cd8d4abed52a Author: Natalia Gimelshein <ngimel@fb.com> Date: Sat Nov 26 03:08:23 2022 +0000 put descriptive kernel names behind config (#89697) Per title, generated kernel names are often long and confusing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89697 Approved by: https://github.com/Chillee commit 591dfffa38848de54b7f5f4e49260847024c9281 Author: jlukehubbard <58089207+jlukehubbard@users.noreply.github.com> Date: Fri Nov 25 21:31:53 2022 +0000 update docstring for torch.linalg.lstsq (#89383) Previous documentation lacked details about the handling of over- and underdetermined systems, and made incorrect mention of MAGMA. Fixes #85021 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89383 Approved by: https://github.com/lezcano commit c9a0cc86407d7ec20524b0e26305109d0cf2b5c2 Author: Edward Z. Yang <ezyang@fb.com> Date: Fri Nov 25 03:31:20 2022 +0000 Simplify aot_module_simplified by removing top_args/top_kwargs (#89666) This makes good on Chillee's CR comment at https://github.com/pytorch/functorch/pull/660/files/af30d351cc93dfafb5a94dbcb32983c5ef65fd6a#r843315222 which was never done in the original PR. There is no logic change, just unpack the args/kwargs at the top level and remove the inner function indirection. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89666 Approved by: https://github.com/voznesenskym commit 6168f22fae66da5703e087bcd10076921ca157e7 Author: Edward Z. Yang <ezyang@fb.com> Date: Fri Nov 25 03:31:19 2022 +0000 Don't support kwargs at runtime in aot_module_simplified (#89664) The preexisting logic here added in https://github.com/pytorch/functorch/pull/970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89664 Approved by: https://github.com/voznesenskym commit b04dda4291f1d30b064572e4521e82fa2573af77 Author: Edward Z. Yang <ezyang@fb.com> Date: Fri Nov 25 03:31:19 2022 +0000 Delay verify correctness wrapping to call site. (#89662) There is only one call site for compiler_fn, so we can safely delay wrapping verify correctness to here. This will help later when we change the backend compiler calling convention to pass fake tensors (but I need to pass real tensors here.) This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392 but with less changes to the substantive logic. I only moved the relevant inner implementation; there are no changes otherwise. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662 Approved by: https://github.com/voznesenskym commit 61a3fe4b6409965223273c1098f9a77ff071efe1 Author: Natalia Gimelshein <ngimel@fb.com> Date: Fri Nov 25 19:42:38 2022 +0000 make inductor correctly propagate nans for maximum and minimum (#89612) Partially fixes https://github.com/pytorch/torchdynamo/issues/594 Also, small cleanup for `where` codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/89612 Approved by: https://github.com/soumith, https://github.com/jansel commit 70c0a3006ee96b3db1f531109fc383f8159e2d2f Author: Ikko Ashimine <eltociear@gmail.com> Date: Fri Nov 25 19:26:18 2022 +0000 Fix typo in segment_reduction_op_gpu.cu (#89647) menber -> member Pull Request resolved: https://github.com/pytorch/pytorch/pull/89647 Approved by: https://github.com/kit1980 commit 2c0bd85c755043d696452ddab354f3ff6775738b Author: kshitij12345 <kshitijkalambarkar@gmail.com> Date: Fri Nov 25 14:53:57 2022 +0000 complex: register c10::complex with py::cast (#89680) Fixes #77134 TODO: * [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?) ```c++ namespace py = pybind11; int main() { py::scoped_interpreter guard{}; // start the interpreter auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0)); assert( (c10::complex<double>(1.0, 2.0) == py::cast<c10::complex<double>>(casted_cdouble))); auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0)); assert( (c10::complex<double>(1.0, 2.0) == py::cast<c10::complex<double>>(casted_cfloat))); auto casted_chalf = py::cast(c10::complex<at::Half>(1.0, 2.0)); assert( (c10::complex<double>(1.0, 2.0) == py::cast<c10::complex<double>>(casted_chalf))); } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680 Approved by: https://github.com/ezyang commit a97d0508cb5259951bc48300fb914cebdf322bb9 Merge: 849be586e6 abb446af8c Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Fri Nov 25 15:24:54 2022 +0100 Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36 commit 849be586e649421ba58182feb9067a4ac65479e3 Merge: 059a238619 75bfbc35ca Author: Jakub Pietrak <jakub.pietrak@intel.com> Date: Fri Nov 25 14:25:40 2022 +0100 Merge branch 'gh/mingfeima/85/head' into pyg-36 commit abb446af8c65a49bbc3767e14605a73d244c176b Author: Alvaro Gaona <alvgaona@gmail.com> Date: Fri Nov 25 11:09:28 2022 +0000 Implement old windows in Python (#87082) Relates to #85366 - Bartlett, Blackman, Hamming, Hann. - Except Kaiser which will be in a different PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/87082 Approved by: https://github.com/mruberry, https://github.com/lezcano commit 059a238619b122f922c569c618919a277420e483 Merge: 26ba2e9751 95ea47ef0c Author: Jakub Pietrak <97102979+JakubPietrakIntel@users.noreply.github.com> Date: Fri Nov 25 10:00:53 2022 +0100 Merge branch 'pytorch:master' into jpietrak/pyg-36 commit 95ea47ef0c1cffe1fe05cc36bdc47c26cc72f13e Author: Jason Ansel <jansel@meta.com> Date: Fri Nov 25 04:28:36 2022 +0000 torchdynamo to torch._dynamo in aot_autograd.py (#89385) Test Plan: Run torchbench models Differential Revision: D41429573 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89385 Approved by: https://github.com/soumith, https://github.com/malfet commit 69043247819042db18ac9526c2d747fa61fe8880 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 12:00:13 2022 -0800 Remove fake_tensor_propagation (#89646) You always have to run dynamo with fake tensors. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646 Approved by: https://github.com/soumith commit 1aa1014b262b75d4269d9a4d8b562c6ee43a0991 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 12:00:12 2022 -0800 xfail maml test, instead of running it without fake tensor prop (#89645) A previous version of this patch graph breaks when torch.tensor fails, but that causes ``` PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32 ``` to start failing. Probably another latent bug that needs investigating. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645 Approved by: https://github.com/albanD commit a048913e2530442360c36a48420079ca9ebca149 Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com> Date: Fri Nov 25 03:03:41 2022 +0000 [vision hash update] update the pinned vision hash (#89667) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89667 Approved by: https://github.com/pytorchbot commit 3b3ebcd031b68762938806f541d7247a1521bb11 Author: XiaobingSuper <xiaobing.zhang@intel.com> Date: Thu Nov 24 02:33:01 2022 -0500 TorchDynamo: weight prepack for single conv (#89209) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89209 Approved by: https://github.com/jgong5, https://github.com/jansel commit 0c4f3db7bf24e94125c6802718a1105ee548c953 Author: XiaobingSuper <xiaobing.zhang@intel.com> Date: Thu Nov 24 02:32:59 2022 -0500 TorchDynamo: weight prepack for mkl linear (#89109) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89109 Approved by: https://github.com/jgong5, https://github.com/jansel commit 07151a6bd62e308b6b32e2e0edfc4d5f0563576e Author: XiaobingSuper <xiaobing.zhang@intel.com> Date: Thu Nov 24 02:32:55 2022 -0500 TorchDynamo: weight prepack for onednn convolution external call (#88988) This PR is about enabled weight prepack using the MKLDNN tensor: 1. enable fake tensor mode for MKLDNN tensor input. 2. make convolution fusion kernel support MKLDNN tensor input. 3. do the weight prepack at FX fusion step. For better performance, we always use channels_last for CPU convolution path. because we test that the channels_last path can get a better performance than block input path, and also avoid the activation's layout conversion(plain to block, block to plain), currently, there only need plain to plain format conversion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988 Approved by: https://github.com/jgong5, https://github.com/jansel commit 0884fdaba0280e3f3ad2abc34c0940587f744886 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 14:31:00 2022 -0500 Revert "Dont clone unmutated args in triton autotuning (#89519)" (#89652) This reverts commit f18f0c70ab10c400947e71be30794e04dcc22acf. Testing to see if this fixes gmixer_24_224 mixer_b16_224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89652 Approved by: https://github.com/eellison commit 4a16f8cdb26be3561742e86f184e59f65418fe63 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 09:00:09 2022 -0800 Reenable fake_tensor_propagation on test_cudnn_rnn (#89644) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89644 Approved by: https://github.com/anjali411 commit fc7dcb684aa38da5b1534fc701657ee63af8909c Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 09:00:09 2022 -0800 Run optimizer tests with fake tensors (#89643) This is a slight regression: RAdam and Adagrad don't appear to trace at all under fake tensors. But I think this is a more accurate reflection of the current state of affairs. Along the way fix some problems on the fake tensor path. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643 Approved by: https://github.com/anjali411 commit 9b13508ef3a4e858fbbbf068b3a825f1632e8daa Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 09:00:08 2022 -0800 Force test_rng_state to run with fake tensor prop (#89641) I'm not really sure what desertfire's intended follow up was on https://github.com/pytorch/pytorch/pull/87490 because when I remove the unsupported() call, dynamo tests pass. But the change here is conservative and I think strictly better than the current situation. The idea is to force fake tensor pop on for the test, and then just observe that we are doing a graph break. Clearly, export doesn't work, so I manually xfail it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641 Approved by: https://github.com/anjali411 commit c6be06d93ab911a3fbb185451c8cf42bcedad0c1 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 09:00:08 2022 -0800 Easy: These tests work with fake_tensor_propagation on (#89640) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89640 Approved by: https://github.com/anjali411, https://github.com/albanD commit 6fb6eb0a7498839e69302da7bf8c04205c64e0f3 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 08:11:48 2022 -0800 Support unspecialized integers with dynamic shapes (#89639) Previously, we hackily wrapped unspecialized integers into tensors and treated them as tensor inputs. Sometimes, downstream operations would not be able to deal with the tensor input. Now, we wrap them into SymInt, so more correct overload selection occurs. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639 Approved by: https://github.com/anjali411 commit 0c96841a20f0ae9380ef26657914276a42c9c9d7 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 08:11:47 2022 -0800 Cond capture with fake tensors actually works; don't raise in this case (#89638) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89638 Approved by: https://github.com/anjali411 commit d3c012f409a4e4d5a11070a90b5578da82778030 Author: kshitij12345 <kshitijkalambarkar@gmail.com> Date: Thu Nov 24 21:41:20 2022 +0000 [test_nn] split pruning tests from test_nn (#89590) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590 Approved by: https://github.com/albanD commit 83666f167dcf023d301f16fad82b9afb374ad836 Author: Aleksandar Samardžić <asamardzic@quansight.com> Date: Thu Nov 24 14:44:12 2022 +0000 Added vectorized CPU code for uint8_t datatype. (#89284) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89284 Approved by: https://github.com/lezcano, https://github.com/peterbell10 commit 9497552771ca59c68509398ab3094e590a3047c5 Author: Howard Huang <howardhuang@meta.com> Date: Thu Nov 24 19:41:17 2022 +0000 Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521) Summary: Fixes https://github.com/pytorch/pytorch/issues/88568 `_all_gather_base` is deprecated. So replacing its usage with `all_gather_into_tensor` Test Plan: CI Differential Revision: D41479983 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521 Approved by: https://github.com/wz337 commit 94a88b53ed37854379813abf9641d1637fe2688b Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 08:11:46 2022 -0800 Remove fake_tensors_available (#89637) As we are one repo now, they are always available. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637 Approved by: https://github.com/anjali411 commit 1c8b0779de76d0c76d34835047106ab37b41790b Author: Emilio Castillo <ecastill@preferred.jp> Date: Thu Nov 24 18:25:26 2022 +0000 Fix segfault when swapping custom allocator (#89613) Just screwed it before merging ... Pull Request resolved: https://github.com/pytorch/pytorch/pull/89613 Approved by: https://github.com/albanD commit fd279fe85b8f5a8e74c615436f0b180621b6ef52 Author: Edward Z. Yang <ezyang@fb.com> Date: Thu Nov 24 09:23:05 2022 -0500 Make pytest work again on test/dynamo (#89631) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89631 Approved by: https://github.com/anjali411 commit c3e85d879cdbd3973754760c6767c75276b1dca8 Author: albanD <desmaison.alban@gmail.com> Date: Thu Nov 24 17:11:42 2022 +0000 Mention discrepency between original impl and our impl of RAdam (#89575) Fixes https://github.com/pytorch/pytorch/issues/88836 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575 Approved by: https://github.com/mruberry commit 860bae49e4925868a0221ec4345d08407280bac7 Author: Edward Z. Yang <ezyang@fb.com> Date: Wed Nov 23 08:04:31 2022 -0800 Suppress guards on as_strided call only. (#89569) See comment in meta_utils.py for the whole story. This doesn't have a substantive impact yet, but will in the next PR on the stack. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569 Approved by: https://github.com/albanD commit 1588ea0dbf16f37ce14cfc8764666985c16ccbf9 Author: mfkasim1 <firman.kasim@gmail.com> Date: Thu Nov 24 11:11:51 2022 +0000 Added log1p for complex in c10 (#89214) One PR towards #89205. The content is mostly from PR #38465, but slightly changed the expression to make it faster. Here are some benchmarking code: ```c++ // main.cc template<typename T> inline std::complex<T> log1p_v0(const std::complex<T> &z) { // this PR T x = z.real(); T y = z.imag(); T theta = std::atan2(y, x + T(1)); T r = x * (x + T(2)) + y * y; return {T(0.5) * std::log1p(r), theta}; } template<typename T> inline std::complex<T> log1p_v1(const std::complex<T> &z) { // PR #38465 T x = z.real(); T y = z.imag(); std::complex<T> p1 = z + T(1); T r = std::abs(p1); T a = std::arg(p1); T rm1 = (x * x + y * y + x * T(2)) / (r + 1); return {std::log1p(rm1), a}; } template<typename T> inline std::complex<T> log1p_v2(const std::complex<T> &z) { // naive, but numerically inaccurate return std::log(T(1) + z); } int main() { int n = 1000000; std::complex<float> res(0.0, 0.0); std::complex<float> input(0.5, 2.0); auto start = std::chrono::system_clock::now(); for (int i = 0; i < n; i++) { res += log1p_v0(input); } auto end = std::chrono::system_clock::now(); auto elapsed = end - start; std::cout << "time for v0: " << elapsed.count() << '\n'; start = std::chrono::system_clock::now(); for (int i = 0; i < n; i++) { res += log1p_v1(input); } end = std::chrono::system_clock::now(); elapsed = end - start; std::cout << "time for v1: " << elapsed.count() << '\n'; start = std::chrono::system_clock::now(); for (int i = 0; i < n; i++) { res += log1p_v2(input); } end = std::chrono::system_clock::now(); elapsed = end - start; std::cout << "time for v2: " << elapsed.count() << '\n'; std::cout << res << '\n'; } ``` Compiling the script with command `g++ main.cc` produces the following results: ``` time for v0: 237812271 time for v1: 414524941 time for v2: 360585994 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214 Approved by: https://github.com/lezcano commit 4f5c4c022a8365d06ac401582958bbf0fd3f8337 Author: Jiewen Tan <jwtan@google.com> Date: Thu Nov 24 10:57:01 2022 +0000 [LTC] Refine MetricsArena::Reset (#89608) Summary: After counters are reset, getters' behaviors are inconsistent. To improve that, here I 1) move the validation of CounterData into CounterData::IsValid such that it's better encapsulated, 2) divide getters into two groups: a) MetricsArena::GetCounter() and b) MetricsArena::ForEachCounter(), and route MetricsArena::GetCounterNames() and CreateMetricReport() to use b. This is paired with pytorch/xla#4217. Test Plan: PJRT_DEVICE=CPU python xla/test/test_metrics.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/89608 Approved by: https://github.com/JackCaoG commit a8629a1c18fd13300ce69c1d6042004038885cf0 Author: Jithun Nair <jithun.nair@amd.com> Date: Thu Nov 24 10:53:20 2022 +0000 Upgrade nightly wheels to ROCm5.3 (#89101) Dependent on PR https://github.com/pytorch/builder/pull/1193 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89101 Approved by: https://github.com/kit1980 commit c0d81aa70ce45a0c2e7ced6c9f42a92d15523188 Author: Ivan Yashchuk <ivan.yashchuk@aalto.fi> Date: Thu Nov 24 09:37:10 2022 +0000 Use fx.replace_pattern for removing empty_like+fill in nvFuser+PrimTorch execution (#89132) I learned about `torch.fx.replace_pattern` and it's a cleaner way of removing unnecessary tensor materialization from the graph coming from tracing C++ code `1 - tensor`. Test: ``` python -m pytest test/test_prims.py -k "test_silu_backward_no_filled_tensor" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89132 Approved by: https://github.com/mruberry, https://github.com/jjsjann123 commit b515c1d96082214e81cc57ce2a1de9164b50206f Author: Hao Guan <10684225+hguandl@users.noreply.github.com> Date: Thu Nov 24 08:14:24 2022 +0000 [QAT] Check the value of numel to avoid segfault (#81547) Fixes #78123 Segmentation fault RuntimeError: numel is out of the bound of input tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/81547 Approved by: https://github.com/kit1980 commit 22a1b5e243e852e1c423c697e51975d1545d2a1b Author: Vasiliy Kuznetsov <vasiliy@fb.com> Date: Wed Nov 23 13:01:15 2022 -0800 quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431) Summary: This PR deprecates the `compute_dtype` field on observers, and replaces it with the `is_dynamic` field on observers. This is better aligned with the reference model spec. Test plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431 Approved by: https://github.com/jerryzh168 commit e4ccec6ecab9b48e804d58f60135f0950fca864f Author: Yanbo Liang <ybliang8@gmail.com> Date: Thu Nov 24 05:28:58 2022 +0000 [Dynamo] Fix bug of using customized torch.autograd.Function (#89397) Fixes https://github.com/pytorch/torchdynamo/issues/1899 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89397 Approved by: https://github.com/jansel commit 903ae4570e401e5c4e42dc4a44cae37f805044a4 Author: Michael Lazos <mlazos@fb.com> Date: Thu Nov 24 04:15:34 2022 +0000 Disable optimizer tracing, enable for tests only (#89500) Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500 Approved by: https://github.com/anijain2305 commit c79489c8e69f965f3e5af8f3f39df78e7d4732ba Author: albanD <desmaison.alban@gmail.com> Date: Thu Nov 24 03:39:55 2022 +0000 Expose to python the backward AD view_func (#89586) This will be useful for other systems (AOTAutograd) that want to replay autograd views. FYI @bdhirsh Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586 Approved by: https://github.com/soulitzer commit 4cb6bbbe27162c7b0835879131991d2155329718 Author: Nikita Karetnikov <nikita@karetnikov.org> Date: Thu Nov 24 01:02:28 2022 +0100 Symintify `embedding` (#89327) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327 Approved by: https://github.com/ezyang commit 9c867eae1a7fffb6f893717073150cff04a923a4 Author: Wu, Chunyuan <chunyuan.wu@intel.com> Date: Wed Nov 23 20:10:41 2022 +0000 nnc: fix Store if value is fp32 while buf is bf16 (#86788) Fixes https://github.com/pytorch/pytorch/issues/86533. For the below graph: ```bash [DUMP kernel.cpp:1690] TensorExprKernel graph: [DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)): [DUMP kernel.cpp:1690] %1 : int = prim::Constant[value=0]() [DUMP kernel.cpp:1690] %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29 [DUMP kernel.cpp:1690] %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19 [DUMP kernel.cpp:1690] return (%3) ``` **Loop stmt before the fix:** The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16. ```bash [DEBUG llvm_codegen.cpp:489] After HalfRewriter { [DEBUG llvm_codegen.cpp:489] aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8); [DEBUG llvm_codegen.cpp:489] for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) { [DEBUG llvm_codegen.cpp:489] aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f; [DEBUG llvm_codegen.cpp:489] } [DEBUG llvm_codegen.cpp:489] } ``` **Loop stmt after the fix:** ```bash [DEBUG llvm_codegen.cpp:489] After HalfRewriter { [DEBUG llvm_codegen.cpp:489] aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8)); [DEBUG llvm_codegen.cpp:489] for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) { [DEBUG llvm_codegen.cpp:489] aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f); [DEBUG llvm_codegen.cpp:489] } [DEBUG llvm_codegen.cpp:489] } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788 Approved by: https://github.com/EikanWang, https://github.com/kit1980 commit f0e5bc4b9f231b438f76ddd13b2c21b7cb8a09ac Author: Zhijing Li (Accelerator Enablement) <tissue030@meta.com> Date: Thu Nov 24 02:18:32 2022 +0000 Symintified layer_norm (#89466) Summary: As titled. Test Plan: ``` buck2 run mode/opt scripts/wwei6:test_executorch ``` Differential Revision: D41451390 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89466 Approved by: https://github.com/frank-wei, https://github.com/ezyang commit fdb2dd113d3aec0acb2a473de6be49940ab6a115 Author: Alexander Grund <alexander.grund@tu-dresden.de> Date: Thu Nov 24 01:52:11 2022 +0000 Install missing VSX headers (POWER) (#85547) E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h` which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non MSVC compilers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547 Approved by: https://github.com/kit1980 commit e922bd4e523b0a30f6607f6497ac458571e00131 Author: Wei-Sheng Chin <wschin@outlook.com> Date: Thu Nov 24 01:30:09 2022 +0000 [ONNX] Move two headers from .h to .cc (#86852) As title. Header dependency should be as small as possible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852 Approved by: https://github.com/titaiwangms, https://github.com/BowenBao commit 23fe2ff910fd1577281a2210d1184aff705191b8 Author: Shunting Zhang <shunting@meta.com> Date: Thu Nov 24 01:28:10 2022 +0000 verify the number of outputs of xla graph (#89536) This PR add tests to verify the behavior of number of outputs returns by an XLA graph. The understanding from this PR will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and enable training for dynamo/torchxla integration eventually. Send this PR separately so Jack could help verify if the behavior is expected and play with it. List some code snippets here since their behavior is not straightforward at a first glance: ``` def forward(self, a, b, c): """ The XLA graph will only return the first 2 items """ return a + b, a + c, b ``` ``` def forward(self, a, b, c): """ Inplace update on b cause it to be returned in XLA graph """ b.zero_() return a + b, a + c, b ``` ``` def forward(self, a, b, c): """ Even if we return b twice, the XLA graph only return b once. """ b.zero_() return a + b, a + c, b, b ``` Here are what observed by the added tests: 1. XLA does not return outputs that are also inputs -- if the tensor is not inplace updated. At first glance people may feel curious why should we consider this kind of 'non-realistic' corner case. But this kind of graphs indeed shows up in AOTAutograd. The main reason is AOTAutograd lift all model parameters/buffers as graph input and may return some of them. Check ***test_direct_return*** 2. if a tensor is inplace updated, XLA will still return it as graph output even if it's also an input. The only difference compared to item 1 is, the inplace updating on the tensor cause it being returned. This happens for BatchNorm2d since the running_mean/variance tensors will be inplace updated during training. Check ***test_direct_return_with_inplace_update*** Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536 Approved by: https://github.com/jansel commit 0bde5149819e9854bca1363aa6c9f52f7db2496e Author: Nikita Shulga <nshulga@meta.com> Date: Thu Nov 24 00:57:17 2022 +0000 Add `c10::` namespace in front of `optional` (#89605) Prep change for moving the codebase to C++17 standard Was part of https://github.com/pytorch/pytorch/pull/85969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89605 Approved by: https://github.com/weiwangmeta, https://github.com/kit1980 commit e19a7165fd1a9a35fcac42706c20e658776c10ab Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com> Date: Thu Nov 24 00:34:26 2022 +0000 [nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905) Fixes #65909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905 Approved by: https://github.com/albanD, https://github.com/kit1980 commit a00bd6f686d7a485f7bea5f971b7e793118842b8 Author: clee2000 <44682903+clee2000@users.noreply.github.com> Date: Wed Nov 23 23:48:32 2022 +0000 Don't run auto request review on forked PRs (#89583) tested on https://github.com/pytorch/pytorch/pull/89581 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89583 Approved by: https://github.com/albanD, https://github.com/malfet commit 0a1a53083e331b3648ad4cb6f750d130e3530731 Author: Nikita Karetnikov <nikita@karetnikov.org> Date: Wed Nov 23 20:42:55 2022 +0000 [primTorch] Enable regex error testing for some refs (#87765) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765 Approved by: https://github.com/mruberry commit 3ad2a032f4924d58c556b80840f6d51aa8a4472b Author: Nikita Shulga <nshulga@meta.com> Date: Wed Nov 23 23:23:24 2022 +0000 Update default cmake to 3.18 (#89570) Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh ` Prep change for raising compiler standard to C++17: cmake-3.18 is the first one to support CUDA17 language Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570 Approved by: https://github.com/atalman commit 8695f0cced016d43298b43a4baf30315061fdacd Author: Jane Xu <janeyx@meta.com> Date: Wed Nov 23 23:23:17 2022 +0000 Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697) Using the same repro from the issue (but with BatchNorm2D) Rectifies native_batch_norm schema by splitting the schema into 2: 1. one will have NON-optional alias-able running_mean and running_var inputs 2. the other will just not have those parameters at all (no_stats variation) **Calling for name suggestions!** I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit` CI should pass. Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697 Approved by: https://github.com/albanD commit a00efe55c3790789b967facf10c3f426faa98155 Author: Everton Constantino <everton.constantino@linaro.org> Date: Wed Nov 23 22:46:29 2022 +0000 Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722) `JIT_LOG` checks if logging was enabled for that particular file and when it isn't it doesn't output anything. Since the test checks for the size of `test_stream` it fails. I believe forcing the file to have logging enabled to see if the stream is being correctly set during test makes no sense so this patches just forcibly outputs and checks if it worked. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722 Approved by: https://github.com/davidberard98 commit b8d3afd88665de5f01f696333d0ff291bd94a57b Author: Huy Do <huydhn@gmail.com> Date: Wed Nov 23 22:39:36 2022 +0000 Skip upload test stats for test reports from rerun disabled tests workflow (#89548) I have found the reason why uploading tests stats fails for rerun disabled workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699. The problem is that the pytest XML file is now too big to be processed quickly (x50 bigger). Unlike unittest, `pytest-flakefinder` used by rerun disabled tests for test_ops includes skipped messages multiple times (50 times by default, retrying and skipping). This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages. This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats. I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled test is only few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but will balloon up to a much bigger XML file after extracting from a dozen to a few hundred MB (text). The size of the zipped file is not a big immediate problem [3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check. The script can now finish when running locally: * `upload_test_stats` finishes around 3+ minutes ``` time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master ... Writing 8925 documents to S3 Done! Writing 1760 documents to S3 Done! Writing 1675249 documents to S3 Done! python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954 1 185.69s user 12.89s system 75% cpu 4:22.82 total ``` * `check_disabled_tests` finishes within 3 minutes ``` time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch ... python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 1 154.19s user 4.17s system 97% cpu 2:42.50 total ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548 Approved by: https://github.com/clee2000 commit f18f0c70ab10c400947e71be30794e04dcc22acf Author: Elias Ellison <elias.ellison@gmail.com> Date: Wed Nov 23 19:02:51 2022 +0000 Dont clone unmutated args in triton autotuning (#89519) Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning. Any other pointers on where the overhead is coming from in autotuning would be great. Edit: i think it's just the triton cache clearing https://github.com/openai/triton/blob/44f577984d28ee979f704e2c28a1dcbac9639840/python/triton/testing.py#L159 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519 Approved by: https://github.com/ngimel, https://github.com/jansel commit ac19c5be82febc2140d4601c98daf45646a399ab Author: Peter Bell <peterbell10@live.co.uk> Date: Tue Nov 22 22:26:21 2022 +0000 FFT: disable dimension wrapping for scalar tensors (#89234) Fixes #88985 By default, `maybe_wrap_dim` allows through `dim=0` or `dim=-1` for scalar tensors which leads to an invalid dimension being used to index into `tensor.sizes()` as in the code sample from the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89234 Approved by: https://github.com/mruberry commit 50e2e4faf38c6ebafacc43b72c40333f1f7b401e Author: Pearu Peterson <pearu.peterson@gmail.com> Date: Wed Nov 23 12:05:37 2022 +0200 Sparse CSC/BSR/BSC serialization and pickle support (#89553) Fixes https://github.com/pytorch/pytorch/issues/89497 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553 Approved by: https://github.com/cpuhrsch commit a8d6b82167ef417e21c807cb29d7eabea15014da Author: Elias Ellison <elias.ellison@gmail.com> Date: Wed Nov 23 16:47:43 2022 +0000 Fix norm decomp when dtype is passed in (#89508) Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508 Approved by: https://github.com/anijain2305 commit 72110d783344c4121730b032ca0d269896604dcf Author: Elias Ellison <elias.ellison@gmail.com> Date: Wed Nov 23 17:03:09 2022 +0000 Fix Upsample Decomp Striding For Small Channels (#89528) Fix for https://github.com/pytorch/torchdynamo/issues/623. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528 Approved by: https://github.com/ngimel, https://github.com/anijain2305 commit b7483be06afe8d4242adeb559cfbe6e0e89419d0 Author: Jerry Zhang <jerryzh168@gmail.com> Date: Wed Nov 23 11:03:45 2022 -0800 [quant][docs] Add docstrings for operators defined in torch.ops.quantized_decomposed namespace (#89547) Summary: no functionality changes Test Plan: NA Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89547 Approved by: https://github.com/vkuzo commit a188f05e8c1788d393c072868421991dfcb55b02 Author: Natalia Gimelshein <ngimel@fb.com> Date: Wed Nov 23 20:18:54 2022 +0000 Reland #89031 Added conv constraint that infers layouts (#89530) Relands #89031 Per title. We now set strides from fx graph only for convolutions and mm, which is a hack, but bmm in some cases caused extra copy, and there is no obvious way to fix that, we should rethink the strides anyway. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530 Approved by: https://github.com/Chillee commit e800d27b10137727c68cb71bccabe3a93cf38e9e Author: William Wen <williamwen@fb.com> Date: Wed Nov 23 20:11:39 2022 +0000 [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580) Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580 Approved by: https://github.com/davidberard98 commit 953f39578a7019c4c34bc1dbd6cb0facb554af79 Author: Charlie West-Taylor <charliew@graphcore.ai> Date: Wed Nov 23 19:51:50 2022 +0000 Mark IPU device as not supports_as_strided (#89130) Currently causes issues in calls to `.to`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89130 Approved by: https://github.com/albanD commit 37e46a503502cdeda791cf684522ef83b5655328 Author: Yanbo Liang <ybliang8@gmail.com> Date: Wed Nov 23 19:44:46 2022 +0000 [Dynamo] Fix several bugs & code refactor in RangeVariable (#89322) Fix bug in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py ``` E TypeError: 'list' object cannot be interpreted as an integer E E from user code: E File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward E idx = torch.LongTensor(range(y.size(0))) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322 Approved by: https://github.com/jansel commit 91dcef41ae96ede3f07375c2d38cb28d534e97f8 Author: Xilun Wu <12968408+XilunWu@users.noreply.github.com> Date: Wed Nov 23 19:43:28 2022 +0000 Thread PG: add allreduce to threaded pg (#89043) Summary: Goal Add `all_reduce` collective to multi-threaded ProcessGroup added in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a). Code Motion Added `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup). What's Next Add a DDP test utilizing the new allreduce op. Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`. Test Plan: cd fbcode/caffe2 buck2 test mode/dev //caffe2/test/distributed:multi_threaded Differential Revision: D41046606 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043 Approved by: https://github.com/wanchaol commit 27db806888c36b029f51197a40e5196cc10792db Author: Charlie West-Taylor <charliew@graphcore.ai> Date: Wed Nov 23 19:41:07 2022 +0000 Handle Tensor.__deepcopy__ via clone(), on IPU (#89129) Currently it falls through to a call to `storage()`, which the IPU doesn't support. I've made the minimal change here for ease of merging (this'd help us if it was in for 1.13.1), however... **QUESTION**: Is there any reason why `not torch._C._has_storage(self)` needs to *also* be guarded on `self.device.type == privateuseone`? in other words, could the condition for using `clone` not be this? ```python self.is_sparse or self.device.type in ["lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"] or not torch._C._has_storage(self) or (type(self) is not Tensor and self.data_ptr() == 0) ``` If the condition fails, the very next thing is a call to `self._typed_storage()` which will fail, so it feels to me like *any* case without storage shouldn't fall through to the `storage()` call. The original PR for adding the 'no storage and device is `PrivateUse1`' condition ([86557](https://github.com/pytorch/pytorch/pull/86557)) doesn't discuss whether this could be broadened. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89129 Approved by: https://github.com/albanD commit fa7a963f6536dd05c381fbf23270f4f009f9f113 Author: Sergii Dymchenko <sdym@fb.com> Date: Wed Nov 23 19:39:47 2022 +0000 Remove BaseException TODO (#89540) After discussion in https://github.com/pytorch/pytorch/pull/88461#issuecomment-1318965664 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89540 Approved by: https://github.com/H-Huang commit 9eed6b7f9aa4f5fc65075de3189acc9add221660 Author: Yanbo Liang <ybliang8@gmail.com> Date: Wed Nov 23 19:39:43 2022 +0000 [Dynamo] Several fixes on TensorVariable & TorchVariable (#89486) This is a group of bug fixes for [7k github models](https://github.com/pytorch/torchdynamo/issues/1884), it would fix 30+ model tests. * Support ```tensor.type()```. * Support ```tensor.get_device()```. * Support ```torch.nn.functional._Reduction.get_enum```. * Support ```torch._utils._get_device_index()```. * Fallback ```tensor.data_ptr()```. * ```FakeTensor``` always returns 0 * For no fake tensor propagation, we ```clone``` the input tensor, which makes no sense to track the original ```data_ptr```. And I don't think this is a very popular API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89486 Approved by: https://github.com/jansel commit f03e6672fb6a694d6f03980e3f34d8181c7cc663 Author: Iris <wz337@cornell.edu> Date: Wed Nov 23 19:39:01 2022 +0000 [Checkpoint][2D] Minor update for dedup_tensors.py (#89542) Rename variables for better readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89542 Approved by: https://github.com/H-Huang commit 74703eb50299b26082bc2a357770739a68460199 Author: Iris <wz337@cornell.edu> Date: Wed Nov 23 19:36:01 2022 +0000 [Checkpoint] Add a logger to dedup_tensors (#89503) Add a logger to dedup_tensors to log the duplicate keys to remove in global plan (List of SavePlan). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89503 Approved by: https://github.com/fduwjj commit 57353c9608263df98156a73aaa6ed35a2a2306ad Author: Brian Hirsh <hirsheybar@fb.com> Date: Wed Nov 23 08:29:08 2022 -0800 first draft of input mutation handling for aot autograd (#88817) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88817 Approved by: https://github.com/ezyang, https://github.com/wconstab commit 902e4e3926a9333178510f032580e4acd56c40da Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com> Date: Wed Nov 23 19:05:13 2022 +0000 Revert "Fix the kineto daemon build condition (#89174)" This reverts commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd. Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil. commit 049a0f2cd5916c8392c6bd1adc41c709de892f3a Author: Bin Bao <binbao@fb.com> Date: Wed Nov 23 02:00:44 2022 +0000 [inductor] Update CI model tests (#89499) Summary: 1) Add model inference test 2) Switch model training test to use AMP Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499 Approved by: https://github.com/bertmaher commit 95474e00a9477b1333e13fa95887a2ce05c4a6a6 Author: Jerry Zhang <jerryzh168@gmail.com> Date: Tue Nov 22 20:29:26 2022 -0800 [quant][be] Remove unused util code (#89272) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272 Approved by: https://github.com/andrewor14 commit 128faf2b69f62b55d3ae1b4cb3e24ec594af0009 Author: Jerry Zhang <jerryzh168@gmail.com> Date: Tue Nov 22 20:29:26 2022 -0800 [quant][be] Refactor the error checking code for quantize_per_channel op (#89271) Summary: at Test Plan: make sure it compiles Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89271 Approve…

The preexisting logic here added in pytorch/functorch#970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: pytorch#89664 Approved by: https://github.com/voznesenskym

ezyang mentioned this pull request Nov 24, 2022

xfail maml test, instead of running it without fake tensor prop #89645

Closed

This was referenced Nov 24, 2022

Remove fake_tensor_propagation #89646

Closed

Delay verify correctness wrapping to call site. #89662

Closed

github-actions bot requested review from albanD, anjali411, antoniojkim, bdhirsh, Chillee, miladm, SherlockNoMad, voznesenskym and wconstab November 24, 2022 23:46

ezyang requested review from jansel and anijain2305 November 24, 2022 23:46

ezyang mentioned this pull request Nov 24, 2022

lintfix #89665

Closed

ezyang added ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category labels Nov 25, 2022

ezyang added release notes: functorch release notes category; Pertaining to torch.func or pytorch/functorch ciflow/inductor labels Nov 25, 2022

voznesenskym approved these changes Nov 25, 2022

View reviewed changes

ezyang mentioned this pull request Nov 25, 2022

Don't suppress exceptions from backends #89656

Closed

pytorchmergebot closed this in 6168f22 Nov 25, 2022

This was referenced Nov 26, 2022

Unit test fixes #89703

Closed

Add simple assert to detect fake tensors on modules #89722

Closed

facebook-github-bot deleted the gh/ezyang/1584/head branch June 8, 2023 16:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't support kwargs at runtime in aot_module_simplified #89664

Don't support kwargs at runtime in aot_module_simplified #89664

ezyang commented Nov 24, 2022 •

edited

pytorch-bot bot commented Nov 24, 2022 •

edited

Don't support kwargs at runtime in aot_module_simplified #89664

Don't support kwargs at runtime in aot_module_simplified #89664

Conversation

ezyang commented Nov 24, 2022 • edited

pytorch-bot bot commented Nov 24, 2022 • edited

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89664

❌ 1 Failures

ezyang commented Nov 24, 2022 •

edited

pytorch-bot bot commented Nov 24, 2022 •

edited