
Don't support kwargs at runtime in aot_module_simplified #89664

Closed
wants to merge 3 commits

Conversation

ezyang
Contributor

@ezyang ezyang commented Nov 24, 2022

Stack from ghstack (oldest at bottom):

The preexisting logic here, added in
pytorch/functorch#970, was very peculiar: if top_kwargs
was non-empty, then the inner compiled function supported kwargs. Naively, this
would lead you to expect that there is some sort of correlation between
top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs
holds the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), while
kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
But (1) we don't support this (the function to be compiled only takes a list
of tensors) and (2) even if we did support it, conditioning runtime kwargs
support on whether you passed AOTAutograd configuration kwargs is bonkers.

So delete it.
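
For context, a toy sketch of the two different kinds of kwargs involved (illustrative only; `toy_aot_module_simplified` and the no-op compiler below are hypothetical stand-ins, not the real API):

```python
import torch

def toy_aot_module_simplified(mod: torch.nn.Module, **config_kwargs):
    # config_kwargs plays the role of top_kwargs: fw_compiler, bw_compiler, ...
    fw_compiler = config_kwargs.get("fw_compiler", lambda gm, inputs: gm.forward)
    gm = torch.fx.symbolic_trace(mod)

    def compiled_fn(*flat_tensor_args: torch.Tensor):
        # Runtime convention: a flat list of tensors, never **kwargs.
        fw = fw_compiler(gm, list(flat_tensor_args))
        return fw(*flat_tensor_args)

    return compiled_fn

mod = torch.nn.Linear(4, 4)
fn = toy_aot_module_simplified(mod, fw_compiler=lambda gm, inputs: gm.forward)
out = fn(torch.randn(2, 4))  # positional tensors only at call time
```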

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

@pytorch-bot

pytorch-bot bot commented Nov 24, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89664

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 58f1e5f:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Nov 24, 2022
ghstack-source-id: bb1423e0795b52ef8f907af58058a964949c26cd
Pull Request resolved: #89664
@ezyang ezyang mentioned this pull request Nov 24, 2022
ezyang added a commit that referenced this pull request Nov 24, 2022
ghstack-source-id: c691d0de2c471c4503b4048c995a86c93f8b6101
Pull Request resolved: #89664
@ezyang ezyang added the ciflow/trunk and topic: not user facing labels Nov 25, 2022
@ezyang ezyang added the release notes: functorch and ciflow/inductor labels Nov 25, 2022
JakubPietrakIntel added a commit to JakubPietrakIntel/pytorch that referenced this pull request Dec 7, 2022
commit 63ebc8d6a000199e963d29b6c8a0f54d3150872b
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Thu Dec 1 13:32:03 2022 +0100

    rm print

commit 2c8ffeaf1b2168ed9ad4ca6b192a1231fb036760
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Thu Dec 1 11:35:02 2022 +0100

    pytorch_sparse.matmul to torch.sparse.matmul

commit ee0e184a1ce5dc6ad7005a67621fac19d6fdbb0b
Merge: 4562359b9f 3a858ba8e3
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 14:09:42 2022 +0100

    Merge branch 'gh/mingfeima/85/head' of https://github.com/pytorch/pytorch into pyg-36

commit 4562359b9fb3de301690334a892d44911eda45c8
Merge: deba083400 b5616cd5f4
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 12:22:11 2022 +0000

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit deba0834008ad95af7e3a6603223a0f8a5555967
Merge: 0e1a8522bb a97d0508cb
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 12:19:25 2022 +0000

    Merge branch 'pyg-36' of https://github.com/JakubPietrakIntel/pytorch into pyg-36

commit 0e1a8522bb695387816a29bbfcf182962429b3ab
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 12:16:35 2022 +0000

    Merge remote-tracking branch 'origin/gh/mingfeima/85/head' into pyg-36

commit b5616cd5f4fc150138b79d3396a603eda6a7a8a8
Author: Michael Voznesensky <voznesenskym@gmail.com>
Date:   Mon Nov 28 05:12:37 2022 +0000

    Add simple assert to detect fake tensors on modules (#89723)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723
    Approved by: https://github.com/ezyang

commit db1f1144f1303db45e0b9d96e4bb6bdd87c80e5a
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 13:52:28 2022 -0800

    Beef up AOTAutograd logging with aot_id and input descriptions (#89710)

    A few things in this PR, that I found useful while debugging some
    recent issues:

    - We now allocate an aot_id to each aot_function/aot_module invocation,
      and print it whenever we report error messages and graph output
      logging.  Check the comment for why this sort of thing is useful,
      and also why it's different from nth_graph.  This number is now
      incorporated into aot_graph_name

    - I noticed that nth_graph only gets incremented when backwards is
      compiled.  Because backwards is compiled lazily, this means that
      multiple forward graphs would have gotten the same ID!  I change
      nth_graph to always increment to avoid confusion here.

    - I added a simple describe_input function, which makes use of
      num_params_buffers to tell the user if the input index they're
      looking at is a param/buffer or an input.  With the help of
      https://github.com/pytorch/pytorch/pull/89709 we could give
      even more detailed information about inputs  (we could also
      easily give detailed information about parameters if we stored
      a mapping of index to parameter name, but I didn't need this
      when debugging so I'll let someone else add it if they need
      it.)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89710
    Approved by: https://github.com/bdhirsh
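
    For illustration, a minimal sketch of the `describe_input` helper mentioned above (hypothetical code; it assumes the first `num_params_buffers` flattened inputs are the parameters/buffers):

    ```python
    def describe_input(i: int, num_params_buffers: int) -> str:
        # Flat inputs are laid out as [params/buffers..., user inputs...],
        # so the index alone tells us which kind of input this is.
        if i < num_params_buffers:
            return f"parameter/buffer {i}"
        return f"input {i - num_params_buffers}"
    ```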

commit 5f8848f32901e35cead64d520885f718679c2bbe
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 15:26:55 2022 -0500

    Don't suppress log messages for dynamo CI config (#89653)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89653
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit 1a2dd6b15e0089a9e45ba4feb90c2d0dfac19238
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sun Nov 27 19:27:45 2022 -0500

    Add single process version of dynamo distributed hf_Bert tests (#89721)

    It's a lot easier to debug problems in the Dynamo optimization pass if
    you aren't actually triggering a multiprocessing run.  Keep these tests
    around.

    I think the other tests can probably get this treatment too, leaving
    this to future work.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721
    Approved by: https://github.com/voznesenskym

commit 0e7c100c9b7417efb1a8f65778a1e3c9ad10ef3e
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Add debug asserts to AOTAutograd for input consistency with compilation (#89702)

    Fixes https://github.com/pytorch/torchdynamo/issues/1927

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89702
    Approved by: https://github.com/bdhirsh

commit 1f95f24d3003a35568a00b5e5e18439846089b0f
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Factor input deduplication into a separate function (#89701)

    It turns out that instead of having a giant blobby aot_dispatch_autograd
    function, we can factor it into a series of wrapper functions, each
    of which successively guarantees more invariants on the inner
    compilation function until the final inner function is quite trivial.
    How exactly you have to wrap the input user functions and the output
    compiled functions can be expressed concisely in Haskell, so I've
    included the Haskell formulation in code comments.

    This PR shows how to do this for input deduplication.  Dealing with the
    rest of the view handling is left to future work.

    This PR should also be a slight performance improvement as deduplicating
    is skipped entirely when there are no duplicate inputs.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89701
    Approved by: https://github.com/bdhirsh
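
    A minimal sketch of the deduplication wrapping described above (hypothetical code, not the actual AOTAutograd implementation): callers keep passing the full, possibly duplicated flat argument list, while the compiler only ever sees unique tensors.

    ```python
    def dedup_wrap(user_fn, flat_args):
        seen = {}    # id(tensor) -> position in the deduplicated list
        keep = []    # indices into flat_args that survive deduplication
        remap = []   # for every original arg, its position after dedup
        for i, a in enumerate(flat_args):
            if id(a) in seen:
                remap.append(seen[id(a)])
            else:
                seen[id(a)] = len(keep)
                remap.append(len(keep))
                keep.append(i)
        dedup_args = [flat_args[i] for i in keep]

        # The function handed to the compiler takes only unique args and
        # re-expands them to the signature the user function expects.
        def inner_fn(*unique_args):
            return user_fn(*[unique_args[j] for j in remap])

        # The wrapper returned to callers accepts the original (duplicated)
        # convention and strips duplicates before hitting the compiled fn.
        def wrap_compiled(compiled_fn):
            def runtime_fn(*args):
                return compiled_fn(*[args[i] for i in keep])
            return runtime_fn

        return inner_fn, dedup_args, wrap_compiled
    ```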

commit dcefc8f90fbc86041a7abcce4f227d15c59bd96c
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 14:28:56 2022 -0500

    Implement guard_source on RandomValueSource (#89711)

    I audited the pattern matches on the enum and it didn't
    look like this one should apply there.

    Sorry, no test, I know this matters on symbolic-shapes branch
    but I haven't had time to extract out a minimal reproducer.
    Take my word for it.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89711
    Approved by: https://github.com/jansel

commit 1da633f98a5da000083c0c47d9e192b2689f867b
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 13:57:17 2022 +0000

    Access named parameters/buffers/etc via getattr rather than index (#89625)

    I'm not sure why this never caused problems before.  The error
    manifests as `TypeError: 'MyModule' object is not subscriptable`

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89625
    Approved by: https://github.com/albanD

commit e36d68af8885f27d8c0b4727ab078bf53e55e7a0
Author: Horace He <chilli@fb.com>
Date:   Thu Nov 24 02:17:37 2022 +0000

    Don't allow recomputing a node that *must* be materialized in the backwards pass (#89171)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89171
    Approved by: https://github.com/ngimel

commit b709078dc673cbd5025a1df3eae7f5c60acc2698
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:21 2022 -0800

    [Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926)

    There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse grained view of the world.)

    Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926
    Approved by: https://github.com/chaekit

commit 143d2881a844934c95c4ada63b38179d97e65af3
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:19 2022 -0800

    [Profiler] Memory profiler part 10: Mark optimizer state (#88925)

    This is also a fairly simple pass, since we're simply collecting values from the python tracer.

    Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925
    Approved by: https://github.com/chaekit

commit ae725d501e33ed6f823997bea03d99cdc8dae5ff
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:18 2022 -0800

    [Profiler] Memory profiler part 9: Mark activations (#88924)

    This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass.

    Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924
    Approved by: https://github.com/chaekit

commit 56e40fe054ecb7700142ea9ae7fe37e77800a2da
Author: Yuxin Wu <ppwwyyxx@users.noreply.github.com>
Date:   Sun Nov 27 05:55:24 2022 +0000

    Let SyncBatchNorm fallback to BN if not using distributed training (#89706)

    Fixes #63662
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706
    Approved by: https://github.com/soumith

commit 39449ea61d9a6644731687219282f610cbf7cf54
Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Date:   Sun Nov 27 02:59:04 2022 +0000

    [vision hash update] update the pinned vision hash (#89692)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89692
    Approved by: https://github.com/pytorchbot

commit 483d3a3d07e6694757c5158bc21f7f757f8c82c3
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:16 2022 -0800

    [Profiler] E2E expecttests for category assignment (#88653)

    Up until now the unit tests for category assignment have been narrowly scoped to specific checks on specific Tensors. However as we start to reach reasonable levels of category assignment it's useful to supplement those tests with higher level summary tests to inspect the larger graph and confirm that it makes sense. (It will also be necessary for some categories like activations where it is tedious to record all relevant Tensors.)

    The general structure of these tests is to capture a model invocation with `__torch_dispatch__` and then cross reference those inputs and outputs with the categories assigned by the memory profiler.

    Differential Revision: [D40868659](https://our.internmc.facebook.com/intern/diff/D40868659/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88653
    Approved by: https://github.com/chaekit

commit 0435894bb3b2d60e5da9f993c2a56d95fb03a971
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:14 2022 -0800

    [Profiler] Memory profiler part 8: Mark parameters. (#87568)

    Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates.

    Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.)

    Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568
    Approved by: https://github.com/chaekit

commit 17fa6bf1f57cbbe84a14566efcf00f21e1abe489
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:13 2022 -0800

    [Profiler] Memory profiler part 7: Mark inputs (#87567)

    It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training since Autograd lets us differentiate between them; however, I still want the tool to do something reasonable in inference.

    Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements.

    Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567
    Approved by: https://github.com/chaekit

commit 64c5c77cd47212da719eb29c3b0a2b07cebb3705
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:11 2022 -0800

    [Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566)

    Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories.

    We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph, which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined, these two approaches provide very high coverage.

    Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node.

    Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.)

    Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566
    Approved by: https://github.com/chaekit

commit 5f09a6d573a2a07c00c76c3cbdbffe0fafe2436d
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:09 2022 -0800

    [Profiler] Memory profiler part 5: Data flow graph (#87006)

    The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs.

    It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node.

    Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
    Approved by: https://github.com/chaekit

commit c3116dd78b294f1bd3f6424dc1bfb7ff86bb0a66
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:08 2022 -0800

    [Profiler] Memory profiler part 4: Select top level torch ops (#86880)

    In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is.

    Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880
    Approved by: https://github.com/chaekit

commit bb77accb4c996e3aab9ae4b665fb8464400c8194
Author: Jiong Gong <jiong.gong@intel.com>
Date:   Sat Nov 26 14:06:44 2022 +0000

    [Inductor] Record cpp kernel in PyTorch Profiler (#89367)

    Add an option `config.cpp.enable_kernel_profile` to record individual cpp kernel time in PyTorch Profiler.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89367
    Approved by: https://github.com/jansel

commit 36018a6ee63f140b95ad644d09920798b0c624f8
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 13:48:35 2022 -0800

    Don't suppress exceptions from backends (#89656)

    Taken from voz's https://github.com/pytorch/pytorch/pull/89392

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89656
    Approved by: https://github.com/voznesenskym

commit 3e20d023b1f442ebe59e76604395cd8d4abed52a
Author: Natalia Gimelshein <ngimel@fb.com>
Date:   Sat Nov 26 03:08:23 2022 +0000

    put descriptive kernel names behind config (#89697)

    Per title, generated kernel names are often long and confusing.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89697
    Approved by: https://github.com/Chillee

commit 591dfffa38848de54b7f5f4e49260847024c9281
Author: jlukehubbard <58089207+jlukehubbard@users.noreply.github.com>
Date:   Fri Nov 25 21:31:53 2022 +0000

    update docstring for torch.linalg.lstsq (#89383)

    Previous documentation lacked details about the handling of over- and underdetermined systems, and made incorrect mention of MAGMA.

    Fixes #85021

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89383
    Approved by: https://github.com/lezcano

commit c9a0cc86407d7ec20524b0e26305109d0cf2b5c2
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 03:31:20 2022 +0000

    Simplify aot_module_simplified by removing top_args/top_kwargs (#89666)

    This makes good on Chillee's CR comment at
    https://github.com/pytorch/functorch/pull/660/files/af30d351cc93dfafb5a94dbcb32983c5ef65fd6a#r843315222
    which was never done in the original PR.

    There is no logic change, just unpack the args/kwargs at the top
    level and remove the inner function indirection.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89666
    Approved by: https://github.com/voznesenskym

commit 6168f22fae66da5703e087bcd10076921ca157e7
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Don't support kwargs at runtime in aot_module_simplified (#89664)

    The preexisting logic here, added in
    https://github.com/pytorch/functorch/pull/970, was very peculiar: if top_kwargs
    was non-empty, then the inner compiled function supported kwargs.  Naively, this
    would lead you to expect that there is some sort of correlation between
    top_kwargs and kwargs.  But in fact, they're completely unrelated!  top_kwargs
    holds the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), while
    kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
    But (1) we don't support this (the function to be compiled only takes a list
    of tensors) and (2) even if we did support it, conditioning runtime kwargs
    support on whether you passed AOTAutograd configuration kwargs is bonkers.

    So delete it.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89664
    Approved by: https://github.com/voznesenskym

commit b04dda4291f1d30b064572e4521e82fa2573af77
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Delay verify correctness wrapping to call site. (#89662)

    There is only one call site for compiler_fn, so we can safely delay
    wrapping verify correctness to here.  This will help later when we
    change the backend compiler calling convention to pass fake tensors
    (but I need to pass real tensors here.)

    This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392
    but with less changes to the substantive logic.  I only moved the relevant
    inner implementation; there are no changes otherwise.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662
    Approved by: https://github.com/voznesenskym

commit 61a3fe4b6409965223273c1098f9a77ff071efe1
Author: Natalia Gimelshein <ngimel@fb.com>
Date:   Fri Nov 25 19:42:38 2022 +0000

    make inductor correctly propagate nans for maximum and minimum (#89612)

    Partially fixes https://github.com/pytorch/torchdynamo/issues/594
    Also, small cleanup for `where` codegen

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89612
    Approved by: https://github.com/soumith, https://github.com/jansel
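
    The usual NaN-propagation trick for `where`-style codegen looks roughly like this (a hedged sketch in eager PyTorch, not the actual inductor codegen):

    ```python
    import torch

    def nan_propagating_maximum(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # A bare comparison-based max silently drops NaNs, because `a > b`
        # is False whenever either operand is NaN.
        out = torch.where(a > b, a, b)
        # Explicitly route NaNs from either input through to the output.
        out = torch.where(torch.isnan(a), a, out)
        out = torch.where(torch.isnan(b), b, out)
        return out
    ```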

commit 70c0a3006ee96b3db1f531109fc383f8159e2d2f
Author: Ikko Ashimine <eltociear@gmail.com>
Date:   Fri Nov 25 19:26:18 2022 +0000

    Fix typo in segment_reduction_op_gpu.cu (#89647)

    menber -> member

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89647
    Approved by: https://github.com/kit1980

commit 2c0bd85c755043d696452ddab354f3ff6775738b
Author: kshitij12345 <kshitijkalambarkar@gmail.com>
Date:   Fri Nov 25 14:53:57 2022 +0000

    complex: register c10::complex with py::cast (#89680)

    Fixes #77134

    TODO:
    * [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?)

    ```c++

    #include <cassert>
    #include <pybind11/embed.h>    // py::scoped_interpreter, py::cast
    #include <c10/util/complex.h>  // c10::complex
    // (plus whatever header registers the c10::complex <-> Python caster added here)

    namespace py = pybind11;

    int main() {
        py::scoped_interpreter guard{}; // start the interpreter
        auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cdouble)));

        auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cfloat)));

        auto casted_chalf = py::cast(c10::complex<at::Half>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_chalf)));
    }

    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680
    Approved by: https://github.com/ezyang

commit a97d0508cb5259951bc48300fb914cebdf322bb9
Merge: 849be586e6 abb446af8c
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Fri Nov 25 15:24:54 2022 +0100

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit 849be586e649421ba58182feb9067a4ac65479e3
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Fri Nov 25 14:25:40 2022 +0100

    Merge branch 'gh/mingfeima/85/head' into pyg-36

commit abb446af8c65a49bbc3767e14605a73d244c176b
Author: Alvaro Gaona <alvgaona@gmail.com>
Date:   Fri Nov 25 11:09:28 2022 +0000

    Implement old windows in Python (#87082)

    Relates to #85366

    - Bartlett, Blackman, Hamming, Hann.
    - Except Kaiser which will be in a different PR

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87082
    Approved by: https://github.com/mruberry, https://github.com/lezcano
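
    As an example of the kind of closed-form definition being ported, a hedged sketch of a periodic Hann window in plain PyTorch (illustrative only, not the torch.signal implementation):

    ```python
    import math
    import torch

    def hann_window_sketch(n: int, dtype=torch.float64) -> torch.Tensor:
        # Periodic Hann window: w[k] = 0.5 - 0.5 * cos(2*pi*k / n)
        k = torch.arange(n, dtype=dtype)
        return 0.5 - 0.5 * torch.cos(2 * math.pi * k / n)
    ```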

commit 059a238619b122f922c569c618919a277420e483
Merge: 26ba2e9751 95ea47ef0c
Author: Jakub Pietrak <97102979+JakubPietrakIntel@users.noreply.github.com>
Date:   Fri Nov 25 10:00:53 2022 +0100

    Merge branch 'pytorch:master' into jpietrak/pyg-36

commit 95ea47ef0c1cffe1fe05cc36bdc47c26cc72f13e
Author: Jason Ansel <jansel@meta.com>
Date:   Fri Nov 25 04:28:36 2022 +0000

    torchdynamo to torch._dynamo in aot_autograd.py (#89385)

    Test Plan: Run torchbench models

    Differential Revision: D41429573

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89385
    Approved by: https://github.com/soumith, https://github.com/malfet

commit 69043247819042db18ac9526c2d747fa61fe8880
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 12:00:13 2022 -0800

    Remove fake_tensor_propagation (#89646)

    You always have to run dynamo with fake tensors.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646
    Approved by: https://github.com/soumith

commit 1aa1014b262b75d4269d9a4d8b562c6ee43a0991
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 12:00:12 2022 -0800

    xfail maml test, instead of running it without fake tensor prop (#89645)

    A previous version of this patch graph breaks when torch.tensor fails, but that causes

    ```
    PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32
    ```

    to start failing. Probably another latent bug that needs investigating.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645
    Approved by: https://github.com/albanD

commit a048913e2530442360c36a48420079ca9ebca149
Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Date:   Fri Nov 25 03:03:41 2022 +0000

    [vision hash update] update the pinned vision hash (#89667)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89667
    Approved by: https://github.com/pytorchbot

commit 3b3ebcd031b68762938806f541d7247a1521bb11
Author: XiaobingSuper <xiaobing.zhang@intel.com>
Date:   Thu Nov 24 02:33:01 2022 -0500

     TorchDynamo: weight prepack for single conv (#89209)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89209
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 0c4f3db7bf24e94125c6802718a1105ee548c953
Author: XiaobingSuper <xiaobing.zhang@intel.com>
Date:   Thu Nov 24 02:32:59 2022 -0500

    TorchDynamo: weight prepack for mkl linear (#89109)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89109
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 07151a6bd62e308b6b32e2e0edfc4d5f0563576e
Author: XiaobingSuper <xiaobing.zhang@intel.com>
Date:   Thu Nov 24 02:32:55 2022 -0500

    TorchDynamo: weight prepack for onednn convolution external call (#88988)

    This PR enables weight prepack using the MKLDNN tensor:
    1. enable fake tensor mode for MKLDNN tensor input.
    2. make the convolution fusion kernel support MKLDNN tensor input.
    3. do the weight prepack at the FX fusion step.

    For better performance, we always use channels_last for the CPU convolution path, because our tests show that channels_last performs better than the blocked-input path and also avoids the activation's layout conversions (plain to block, block to plain); currently only a plain-to-plain format conversion is needed.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 0884fdaba0280e3f3ad2abc34c0940587f744886
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 14:31:00 2022 -0500

    Revert "Dont clone unmutated args in triton autotuning (#89519)" (#89652)

    This reverts commit f18f0c70ab10c400947e71be30794e04dcc22acf.

    Testing to see if this fixes gmixer_24_224 mixer_b16_224

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89652
    Approved by: https://github.com/eellison

commit 4a16f8cdb26be3561742e86f184e59f65418fe63
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Reenable fake_tensor_propagation on test_cudnn_rnn (#89644)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89644
    Approved by: https://github.com/anjali411

commit fc7dcb684aa38da5b1534fc701657ee63af8909c
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Run optimizer tests with fake tensors (#89643)

    This is a slight regression: RAdam and Adagrad don't appear to
    trace at all under fake tensors.  But I think this is a more accurate
    reflection of the current state of affairs.

    Along the way fix some problems on the fake tensor path.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643
    Approved by: https://github.com/anjali411

commit 9b13508ef3a4e858fbbbf068b3a825f1632e8daa
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Force test_rng_state to run with fake tensor prop (#89641)

    I'm not really sure what desertfire's intended follow up was
    on https://github.com/pytorch/pytorch/pull/87490 because when I remove
    the unsupported() call, dynamo tests pass.  But the change here is
    conservative and I think strictly better than the current situation.
    The idea is to force fake tensor prop on for the test, and then just
    observe that we are doing a graph break.  Clearly, export doesn't work,
    so I manually xfail it.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641
    Approved by: https://github.com/anjali411

commit c6be06d93ab911a3fbb185451c8cf42bcedad0c1
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Easy: These tests work with fake_tensor_propagation on (#89640)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89640
    Approved by: https://github.com/anjali411, https://github.com/albanD

commit 6fb6eb0a7498839e69302da7bf8c04205c64e0f3
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 08:11:48 2022 -0800

    Support unspecialized integers with dynamic shapes (#89639)

    Previously, we hackily wrapped unspecialized integers into
    tensors and treated them as tensor inputs.  Sometimes, downstream
    operations would not be able to deal with the tensor input.  Now,
    we wrap them into SymInt, so more correct overload selection occurs.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
    Approved by: https://github.com/anjali411

commit 0c96841a20f0ae9380ef26657914276a42c9c9d7
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 08:11:47 2022 -0800

    Cond capture with fake tensors actually works; don't raise in this case (#89638)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89638
    Approved by: https://github.com/anjali411

commit d3c012f409a4e4d5a11070a90b5578da82778030
Author: kshitij12345 <kshitijkalambarkar@gmail.com>
Date:   Thu Nov 24 21:41:20 2022 +0000

    [test_nn] split pruning tests from test_nn (#89590)

    Ref: https://github.com/pytorch/pytorch/issues/63085

    Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590
    Approved by: https://github.com/albanD

commit 83666f167dcf023d301f16fad82b9afb374ad836
Author: Aleksandar Samardžić <asamardzic@quansight.com>
Date:   Thu Nov 24 14:44:12 2022 +0000

    Added vectorized CPU code for uint8_t datatype. (#89284)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89284
    Approved by: https://github.com/lezcano, https://github.com/peterbell10

commit 9497552771ca59c68509398ab3094e590a3047c5
Author: Howard Huang <howardhuang@meta.com>
Date:   Thu Nov 24 19:41:17 2022 +0000

    Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521)

    Summary: Fixes https://github.com/pytorch/pytorch/issues/88568

    `_all_gather_base` is deprecated. So replacing its usage with `all_gather_into_tensor`

    Test Plan: CI

    Differential Revision: D41479983

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521
    Approved by: https://github.com/wz337
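
    A hedged usage sketch of the replacement API (assumes an initialized process group where every rank holds a same-shaped `local` tensor; `gather_example` is a hypothetical helper, not code from this PR):

    ```python
    import torch
    import torch.distributed as dist

    def gather_example(local: torch.Tensor, world_size: int) -> torch.Tensor:
        out = torch.empty(world_size * local.numel(),
                          dtype=local.dtype, device=local.device)
        # Replaces the deprecated `_all_gather_base(out, local)` call.
        dist.all_gather_into_tensor(out, local)
        return out.view(world_size, *local.shape)
    ```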

commit 94a88b53ed37854379813abf9641d1637fe2688b
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 08:11:46 2022 -0800

    Remove fake_tensors_available (#89637)

    As we are one repo now, they are always available.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637
    Approved by: https://github.com/anjali411

commit 1c8b0779de76d0c76d34835047106ab37b41790b
Author: Emilio Castillo <ecastill@preferred.jp>
Date:   Thu Nov 24 18:25:26 2022 +0000

    Fix segfault when swapping custom allocator (#89613)

    Just screwed it before merging ...

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89613
    Approved by: https://github.com/albanD

commit fd279fe85b8f5a8e74c615436f0b180621b6ef52
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:23:05 2022 -0500

    Make pytest work again on test/dynamo (#89631)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89631
    Approved by: https://github.com/anjali411

commit c3e85d879cdbd3973754760c6767c75276b1dca8
Author: albanD <desmaison.alban@gmail.com>
Date:   Thu Nov 24 17:11:42 2022 +0000

    Mention discrepency between original impl and our impl of RAdam (#89575)

    Fixes https://github.com/pytorch/pytorch/issues/88836

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575
    Approved by: https://github.com/mruberry

commit 860bae49e4925868a0221ec4345d08407280bac7
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Nov 23 08:04:31 2022 -0800

    Suppress guards on as_strided call only. (#89569)

    See comment in meta_utils.py for the whole story.

    This doesn't have a substantive impact yet, but will in the next
    PR on the stack.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569
    Approved by: https://github.com/albanD

commit 1588ea0dbf16f37ce14cfc8764666985c16ccbf9
Author: mfkasim1 <firman.kasim@gmail.com>
Date:   Thu Nov 24 11:11:51 2022 +0000

    Added log1p for complex in c10 (#89214)

    One PR towards #89205.
    The content is mostly from PR #38465, but slightly changed the expression to make it faster.

    Here are some benchmarking code:
    ```c++

    // main.cc
    #include <chrono>
    #include <cmath>
    #include <complex>
    #include <iostream>

    template<typename T> inline std::complex<T> log1p_v0(const std::complex<T> &z) {
        // this PR
        T x = z.real();
        T y = z.imag();
        T theta = std::atan2(y, x + T(1));
        T r = x * (x + T(2)) + y * y;
        return {T(0.5) * std::log1p(r), theta};
    }

    template<typename T> inline std::complex<T> log1p_v1(const std::complex<T> &z) {
        // PR #38465
        T x = z.real();
        T y = z.imag();
        std::complex<T> p1 = z + T(1);
        T r = std::abs(p1);
        T a = std::arg(p1);
        T rm1 = (x * x + y * y + x * T(2)) / (r + 1);
        return {std::log1p(rm1), a};
    }

    template<typename T>
    inline std::complex<T> log1p_v2(const std::complex<T> &z) {
        // naive, but numerically inaccurate
        return std::log(T(1) + z);
    }

    int main() {
        int n = 1000000;
        std::complex<float> res(0.0, 0.0);
        std::complex<float> input(0.5, 2.0);
        auto start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v0(input);
        }
        auto end = std::chrono::system_clock::now();
        auto elapsed = end - start;
        std::cout << "time for v0: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v1(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v1: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v2(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v2: " << elapsed.count() << '\n';
        std::cout << res << '\n';
    }
    ```

    Compiling the script with command `g++ main.cc` produces the following results:
    ```
    time for v0: 237812271
    time for v1: 414524941
    time for v2: 360585994
    ```
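
    The cheaper expression in `log1p_v0` follows from a short identity (sketched here for reference, with z = x + iy and r = x(x+2) + y^2):

    ```latex
    % |1+z|^2 = (1+x)^2 + y^2 = 1 + x(x+2) + y^2 = 1 + r
    \log(1+z) = \log|1+z| + i\,\arg(1+z)
              = \tfrac{1}{2}\,\mathrm{log1p}(r) + i\,\operatorname{atan2}(y,\ x+1)
    ```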

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214
    Approved by: https://github.com/lezcano

commit 4f5c4c022a8365d06ac401582958bbf0fd3f8337
Author: Jiewen Tan <jwtan@google.com>
Date:   Thu Nov 24 10:57:01 2022 +0000

    [LTC] Refine MetricsArena::Reset (#89608)

    Summary:
    After counters are reset, getters' behaviors are inconsistent. To improve that, I 1) move the validation of CounterData into CounterData::IsValid so that it's better encapsulated, and 2) divide getters into two groups: a) MetricsArena::GetCounter() and b) MetricsArena::ForEachCounter(), routing MetricsArena::GetCounterNames() and CreateMetricReport() through (b).

    This is paired with pytorch/xla#4217.

    Test Plan:
    PJRT_DEVICE=CPU python xla/test/test_metrics.py

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89608
    Approved by: https://github.com/JackCaoG

commit a8629a1c18fd13300ce69c1d6042004038885cf0
Author: Jithun Nair <jithun.nair@amd.com>
Date:   Thu Nov 24 10:53:20 2022 +0000

    Upgrade nightly wheels to ROCm5.3 (#89101)

    Dependent on PR https://github.com/pytorch/builder/pull/1193

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89101
    Approved by: https://github.com/kit1980

commit c0d81aa70ce45a0c2e7ced6c9f42a92d15523188
Author: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date:   Thu Nov 24 09:37:10 2022 +0000

    Use fx.replace_pattern for removing empty_like+fill in nvFuser+PrimTorch execution (#89132)

    I learned about `torch.fx.replace_pattern`, and it's a cleaner way of removing unnecessary tensor materialization from the graph that comes from tracing the C++ code `1 - tensor`.

    Test:
    ```
    python -m pytest test/test_prims.py -k "test_silu_backward_no_filled_tensor"
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89132
    Approved by: https://github.com/mruberry, https://github.com/jjsjann123
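
    A hedged sketch of the `fx.replace_pattern` mechanism (the concrete pattern/replacement in this PR targets the `empty_like` + `fill` materialization produced by tracing `1 - tensor`; the code below is illustrative):

    ```python
    import torch
    import torch.fx as fx

    def pattern(x):
        ones = torch.empty_like(x).fill_(1)
        return ones - x

    def replacement(x):
        return 1 - x

    def remove_materialized_ones(gm: fx.GraphModule) -> fx.GraphModule:
        fx.replace_pattern(gm, pattern, replacement)  # rewrite matches in the graph
        gm.recompile()
        return gm
    ```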

commit b515c1d96082214e81cc57ce2a1de9164b50206f
Author: Hao Guan <10684225+hguandl@users.noreply.github.com>
Date:   Thu Nov 24 08:14:24 2022 +0000

    [QAT] Check the value of numel to avoid segfault (#81547)

    Fixes #78123

    Previously this caused a segmentation fault; now it raises:

    RuntimeError: numel is out of the bound of input tensor
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/81547
    Approved by: https://github.com/kit1980

commit 22a1b5e243e852e1c423c697e51975d1545d2a1b
Author: Vasiliy Kuznetsov <vasiliy@fb.com>
Date:   Wed Nov 23 13:01:15 2022 -0800

    quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)

    Summary:

    This PR deprecates the `compute_dtype` field on observers, and replaces
    it with the `is_dynamic` field on observers.  This is better aligned
    with the reference model spec.

    Test plan:

    ```
    python test/test_quantization.py TestQuantizeFx
    python test/test_quantization.py TestQuantizeFxOps
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
    Approved by: https://github.com/jerryzh168

commit e4ccec6ecab9b48e804d58f60135f0950fca864f
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   Thu Nov 24 05:28:58 2022 +0000

    [Dynamo] Fix bug of using customized torch.autograd.Function (#89397)

    Fixes https://github.com/pytorch/torchdynamo/issues/1899

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89397
    Approved by: https://github.com/jansel

commit 903ae4570e401e5c4e42dc4a44cae37f805044a4
Author: Michael Lazos <mlazos@fb.com>
Date:   Thu Nov 24 04:15:34 2022 +0000

    Disable optimizer tracing, enable for tests only (#89500)

    Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500
    Approved by: https://github.com/anijain2305

commit c79489c8e69f965f3e5af8f3f39df78e7d4732ba
Author: albanD <desmaison.alban@gmail.com>
Date:   Thu Nov 24 03:39:55 2022 +0000

    Expose to python the backward AD view_func (#89586)

    This will be useful for other systems (AOTAutograd) that want to replay autograd views.

    FYI @bdhirsh
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586
    Approved by: https://github.com/soulitzer

commit 4cb6bbbe27162c7b0835879131991d2155329718
Author: Nikita Karetnikov <nikita@karetnikov.org>
Date:   Thu Nov 24 01:02:28 2022 +0100

    Symintify `embedding` (#89327)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
    Approved by: https://github.com/ezyang

commit 9c867eae1a7fffb6f893717073150cff04a923a4
Author: Wu, Chunyuan <chunyuan.wu@intel.com>
Date:   Wed Nov 23 20:10:41 2022 +0000

    nnc: fix Store if value is fp32 while buf is bf16 (#86788)

    Fixes https://github.com/pytorch/pytorch/issues/86533.
    For the below graph:
    ```bash
    [DUMP kernel.cpp:1690] TensorExprKernel graph:
    [DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)):
    [DUMP kernel.cpp:1690]   %1 : int = prim::Constant[value=0]()
    [DUMP kernel.cpp:1690]   %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29
    [DUMP kernel.cpp:1690]   %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19
    [DUMP kernel.cpp:1690]   return (%3)
    ```

    **Loop stmt before the fix:**
    The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16.
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8);
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f;
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```

    **Loop stmt after the fix:**
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8));
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f);
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788
    Approved by: https://github.com/EikanWang, https://github.com/kit1980

commit f0e5bc4b9f231b438f76ddd13b2c21b7cb8a09ac
Author: Zhijing Li (Accelerator Enablement) <tissue030@meta.com>
Date:   Thu Nov 24 02:18:32 2022 +0000

    Symintified layer_norm (#89466)

    Summary: As titled.

    Test Plan:
    ```
    buck2 run mode/opt scripts/wwei6:test_executorch
    ```

    Differential Revision: D41451390

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89466
    Approved by: https://github.com/frank-wei, https://github.com/ezyang

commit fdb2dd113d3aec0acb2a473de6be49940ab6a115
Author: Alexander Grund <alexander.grund@tu-dresden.de>
Date:   Thu Nov 24 01:52:11 2022 +0000

    Install missing VSX headers (POWER) (#85547)

    E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h` which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non MSVC compilers.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547
    Approved by: https://github.com/kit1980

commit e922bd4e523b0a30f6607f6497ac458571e00131
Author: Wei-Sheng Chin <wschin@outlook.com>
Date:   Thu Nov 24 01:30:09 2022 +0000

    [ONNX] Move two headers from .h to .cc (#86852)

    As title. Header dependency should be as small as possible.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852
    Approved by: https://github.com/titaiwangms, https://github.com/BowenBao

commit 23fe2ff910fd1577281a2210d1184aff705191b8
Author: Shunting Zhang <shunting@meta.com>
Date:   Thu Nov 24 01:28:10 2022 +0000

    verify the number of outputs of xla graph (#89536)

    This PR adds tests to verify the number of outputs returned by an XLA graph. The understanding from this PR will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and eventually enable training for the dynamo/torchxla integration. Sending this PR separately so Jack can help verify whether the behavior is expected and play with it.

    Some code snippets are listed here since their behavior is not straightforward at first glance:
    ```
        def forward(self, a, b, c):
            """
            The XLA graph will only return the first 2 items
            """
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            Inplace update on b cause it to be returned in XLA graph
            """
            b.zero_()
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            Even if we return b twice, the XLA graph only return b once.
            """
            b.zero_()
            return a + b, a + c, b, b
    ```

    Here are what observed by the added tests:

    1. XLA does not return outputs that are also inputs -- if the tensor is not inplace updated. At first glance people may wonder why we should consider this kind of 'non-realistic' corner case, but such graphs do show up in AOTAutograd. The main reason is that AOTAutograd lifts all model parameters/buffers as graph inputs and may return some of them.  Check ***test_direct_return***
    2. If a tensor is inplace updated, XLA will still return it as a graph output even if it's also an input.  The only difference compared to item 1 is that the inplace update on the tensor causes it to be returned. This happens for BatchNorm2d since the running_mean/variance tensors are inplace updated during training. Check ***test_direct_return_with_inplace_update***

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536
    Approved by: https://github.com/jansel

commit 0bde5149819e9854bca1363aa6c9f52f7db2496e
Author: Nikita Shulga <nshulga@meta.com>
Date:   Thu Nov 24 00:57:17 2022 +0000

    Add `c10::` namespace in front of `optional` (#89605)

    Prep change for moving the codebase to C++17 standard
    Was part of https://github.com/pytorch/pytorch/pull/85969

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89605
    Approved by: https://github.com/weiwangmeta, https://github.com/kit1980

commit e19a7165fd1a9a35fcac42706c20e658776c10ab
Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com>
Date:   Thu Nov 24 00:34:26 2022 +0000

    [nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905)

    Fixes #65909

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit a00bd6f686d7a485f7bea5f971b7e793118842b8
Author: clee2000 <44682903+clee2000@users.noreply.github.com>
Date:   Wed Nov 23 23:48:32 2022 +0000

    Don't run auto request review on forked PRs (#89583)

    tested on https://github.com/pytorch/pytorch/pull/89581
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89583
    Approved by: https://github.com/albanD, https://github.com/malfet

commit 0a1a53083e331b3648ad4cb6f750d130e3530731
Author: Nikita Karetnikov <nikita@karetnikov.org>
Date:   Wed Nov 23 20:42:55 2022 +0000

    [primTorch] Enable regex error testing for some refs (#87765)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765
    Approved by: https://github.com/mruberry

commit 3ad2a032f4924d58c556b80840f6d51aa8a4472b
Author: Nikita Shulga <nshulga@meta.com>
Date:   Wed Nov 23 23:23:24 2022 +0000

    Update default cmake to 3.18 (#89570)

    Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh `
    Prep change for raising compiler standard to C++17: cmake-3.18 is the first one to support CUDA17 language

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570
    Approved by: https://github.com/atalman

commit 8695f0cced016d43298b43a4baf30315061fdacd
Author: Jane Xu <janeyx@meta.com>
Date:   Wed Nov 23 23:23:17 2022 +0000

    Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697)

    Using the same repro from the issue (but with BatchNorm2D)

    Rectifies native_batch_norm schema by splitting the schema into 2:
    1. one will have NON-optional alias-able running_mean and running_var inputs
    2. the other will just not have those parameters at all (no_stats variation)

    **Calling for name suggestions!**
    I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
    CI should pass.
    Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
    Approved by: https://github.com/albanD

commit a00efe55c3790789b967facf10c3f426faa98155
Author: Everton Constantino <everton.constantino@linaro.org>
Date:   Wed Nov 23 22:46:29 2022 +0000

    Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722)

    `JIT_LOG` checks if logging was enabled for that particular file, and when it isn't, it doesn't output anything. Since the test checks the size of `test_stream`, it fails. I believe forcing the file to have logging enabled just to see if the stream is being correctly set during the test makes no sense, so this patch forcibly outputs and checks whether it worked.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722
    Approved by: https://github.com/davidberard98

commit b8d3afd88665de5f01f696333d0ff291bd94a57b
Author: Huy Do <huydhn@gmail.com>
Date:   Wed Nov 23 22:39:36 2022 +0000

    Skip upload test stats for test reports from rerun disabled tests workflow (#89548)

    I have found the reason why uploading test stats fails for the rerun disabled tests workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699.  The problem is that the pytest XML file is now too big to be processed quickly (50x bigger). Unlike unittest, `pytest-flakefinder`, used by rerun disabled tests for test_ops, includes skipped messages multiple times (50 times by default, retrying and skipping).  This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages.

    This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats.

    I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but it balloons into a much bigger XML file after extraction, from a dozen to a few hundred MB (text).  The size of the zipped file is not a big immediate problem.

    [3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check.  The script can now finish when running locally:

    * `upload_test_stats` finishes around 3+ minutes
    ```
    time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master
    ...
    Writing 8925 documents to S3
    Done!
    Writing 1760 documents to S3
    Done!
    Writing 1675249 documents to S3
    Done!
    python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954  1    185.69s user 12.89s system 75% cpu 4:22.82 total
    ```

    * `check_disabled_tests` finishes within 3 minutes
    ```
    time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch
    ...
    python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954  1    154.19s user 4.17s system 97% cpu 2:42.50 total
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548
    Approved by: https://github.com/clee2000

commit f18f0c70ab10c400947e71be30794e04dcc22acf
Author: Elias Ellison <elias.ellison@gmail.com>
Date:   Wed Nov 23 19:02:51 2022 +0000

    Dont clone unmutated args in triton autotuning (#89519)

    Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning. Any other pointers on where the overhead is coming from in autotuning would be great.

    Edit: I think it's just the triton cache clearing https://github.com/openai/triton/blob/44f577984d28ee979f704e2c28a1dcbac9639840/python/triton/testing.py#L159

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519
    Approved by: https://github.com/ngimel, https://github.com/jansel

commit ac19c5be82febc2140d4601c98daf45646a399ab
Author: Peter Bell <peterbell10@live.co.uk>
Date:   Tue Nov 22 22:26:21 2022 +0000

    FFT: disable dimension wrapping for scalar tensors (#89234)

    Fixes #88985

    By default, `maybe_wrap_dim` allows through `dim=0` or `dim=-1`
    for scalar tensors which leads to an invalid dimension being used to
    index into `tensor.sizes()` as in the code sample from the issue.
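
    A small Python mirror of the wrapping rule (a sketch, not the actual ATen helper), showing why a 0-dim tensor slips through:

    ```python
    def maybe_wrap_dim(dim: int, ndim: int, wrap_scalar: bool = True) -> int:
        # Sketch: with wrap_scalar=True, a 0-dim (scalar) tensor is treated as
        # if it had one dimension, so dim=0 and dim=-1 are accepted even though
        # tensor.sizes() is empty -- which is exactly the bad index in the issue.
        if ndim == 0 and wrap_scalar:
            ndim = 1
        if not (-ndim <= dim <= ndim - 1):
            raise IndexError(f"dim {dim} out of range for tensor of dimension {ndim}")
        return dim % ndim

    maybe_wrap_dim(0, ndim=0)  # returns 0, later used to index an empty sizes()
    # maybe_wrap_dim(0, ndim=0, wrap_scalar=False) would raise IndexError instead
    ```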

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89234
    Approved by: https://github.com/mruberry

commit 50e2e4faf38c6ebafacc43b72c40333f1f7b401e
Author: Pearu Peterson <pearu.peterson@gmail.com>
Date:   Wed Nov 23 12:05:37 2022 +0200

    Sparse CSC/BSR/BSC serialization and pickle support (#89553)

    Fixes https://github.com/pytorch/pytorch/issues/89497

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
    Approved by: https://github.com/cpuhrsch

commit a8d6b82167ef417e21c807cb29d7eabea15014da
Author: Elias Ellison <elias.ellison@gmail.com>
Date:   Wed Nov 23 16:47:43 2022 +0000

    Fix norm decomp when dtype is passed in (#89508)

    Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508
    Approved by: https://github.com/anijain2305

commit 72110d783344c4121730b032ca0d269896604dcf
Author: Elias Ellison <elias.ellison@gmail.com>
Date:   Wed Nov 23 17:03:09 2022 +0000

    Fix Upsample Decomp Striding For Small Channels (#89528)

    Fix for https://github.com/pytorch/torchdynamo/issues/623.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528
    Approved by: https://github.com/ngimel, https://github.com/anijain2305

commit b7483be06afe8d4242adeb559cfbe6e0e89419d0
Author: Jerry Zhang <jerryzh168@gmail.com>
Date:   Wed Nov 23 11:03:45 2022 -0800

    [quant][docs] Add docstrings for operators defined in torch.ops.quantized_decomposed namespace (#89547)

    Summary:
    no functionality changes

    Test Plan:
    NA

    Reviewers:

    Subscribers:

    Tasks:

    Tags:

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89547
    Approved by: https://github.com/vkuzo

commit a188f05e8c1788d393c072868421991dfcb55b02
Author: Natalia Gimelshein <ngimel@fb.com>
Date:   Wed Nov 23 20:18:54 2022 +0000

    Reland #89031 Added conv constraint that infers layouts (#89530)

    Relands #89031
    Per title. We now set strides from the fx graph only for convolutions and mm, which is a hack, but bmm in some cases caused an extra copy and there is no obvious way to fix that; we should rethink the strides anyway.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530
    Approved by: https://github.com/Chillee

commit e800d27b10137727c68cb71bccabe3a93cf38e9e
Author: William Wen <williamwen@fb.com>
Date:   Wed Nov 23 20:11:39 2022 +0000

    [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580)

    Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580
    Approved by: https://github.com/davidberard98

commit 953f39578a7019c4c34bc1dbd6cb0facb554af79
Author: Charlie West-Taylor <charliew@graphcore.ai>
Date:   Wed Nov 23 19:51:50 2022 +0000

    Mark IPU device as not supports_as_strided (#89130)

    Currently causes issues in calls to `.to`.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89130
    Approved by: https://github.com/albanD

commit 37e46a503502cdeda791cf684522ef83b5655328
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   Wed Nov 23 19:44:46 2022 +0000

    [Dynamo] Fix several bugs & code refactor in RangeVariable (#89322)

    Fix bug in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py
    ```
    E       TypeError: 'list' object cannot be interpreted as an integer
    E
    E       from user code:
    E          File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward
    E           idx = torch.LongTensor(range(y.size(0)))
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322
    Approved by: https://github.com/jansel

commit 91dcef41ae96ede3f07375c2d38cb28d534e97f8
Author: Xilun Wu <12968408+XilunWu@users.noreply.github.com>
Date:   Wed Nov 23 19:43:28 2022 +0000

    Thread PG: add allreduce to threaded pg (#89043)

    Summary:
    Goal
    Add `all_reduce` collective  to multi-threaded ProcessGroup added in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a).

    Code Motion
    Added `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup).

    What's Next
    Add a DDP test utilizing the new allreduce op.
    Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`.

    Test Plan:
    cd fbcode/caffe2
    buck2 test mode/dev //caffe2/test/distributed:multi_threaded

    Differential Revision: D41046606

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043
    Approved by: https://github.com/wanchaol

commit 27db806888c36b029f51197a40e5196cc10792db
Author: Charlie West-Taylor <charliew@graphcore.ai>
Date:   Wed Nov 23 19:41:07 2022 +0000

    Handle Tensor.__deepcopy__ via clone(), on IPU (#89129)

    Currently it falls through to a call to `storage()`, which the IPU doesn't support.

    I've made the minimal change here for ease of merging (this'd help us if it were in for 1.13.1), however...

    **QUESTION**: Is there any reason why `not torch._C._has_storage(self)` needs to *also* be guarded on `self.device.type == privateuseone`? In other words, could the condition for using `clone` not be this?

    ```python
    self.is_sparse
    or self.device.type
    in ["lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"]
    or not torch._C._has_storage(self)
    or (type(self) is not Tensor and self.data_ptr() == 0)
    ```

    If the condition fails, the very next thing is a call to `self._typed_storage()` which will fail, so it feels to me like *any* case without storage shouldn't fall through to the `storage()` call.

    The original PR for adding the 'no storage and device is `PrivateUse1`' condition ([86557](https://github.com/pytorch/pytorch/pull/86557)) doesn't discuss whether this could be broadened.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89129
    Approved by: https://github.com/albanD

commit fa7a963f6536dd05c381fbf23270f4f009f9f113
Author: Sergii Dymchenko <sdym@fb.com>
Date:   Wed Nov 23 19:39:47 2022 +0000

    Remove BaseException TODO (#89540)

    After discussion in https://github.com/pytorch/pytorch/pull/88461#issuecomment-1318965664
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89540
    Approved by: https://github.com/H-Huang

commit 9eed6b7f9aa4f5fc65075de3189acc9add221660
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   Wed Nov 23 19:39:43 2022 +0000

    [Dynamo] Several fixes on TensorVariable & TorchVariable (#89486)

    This is a group of bug fixes for [7k github models](https://github.com/pytorch/torchdynamo/issues/1884), it would fix 30+ model tests.
    * Support ```tensor.type()```.
    * Support ```tensor.get_device()```.
    * Support ```torch.nn.functional._Reduction.get_enum```.
    * Support ```torch._utils._get_device_index()```.
    * Fallback ```tensor.data_ptr()```.
      * ```FakeTensor``` always returns 0
      * When fake tensor propagation is off, we ```clone``` the input tensor, so tracking the original ```data_ptr``` makes no sense. And I don't think this is a very popular API.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89486
    Approved by: https://github.com/jansel

commit f03e6672fb6a694d6f03980e3f34d8181c7cc663
Author: Iris <wz337@cornell.edu>
Date:   Wed Nov 23 19:39:01 2022 +0000

    [Checkpoint][2D] Minor update for dedup_tensors.py (#89542)

    Rename variables for better readability.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89542
    Approved by: https://github.com/H-Huang

commit 74703eb50299b26082bc2a357770739a68460199
Author: Iris <wz337@cornell.edu>
Date:   Wed Nov 23 19:36:01 2022 +0000

    [Checkpoint] Add a logger to dedup_tensors (#89503)

    Add a logger to dedup_tensors to log the duplicate keys to remove in global plan (List of SavePlan).

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89503
    Approved by: https://github.com/fduwjj

commit 57353c9608263df98156a73aaa6ed35a2a2306ad
Author: Brian Hirsh <hirsheybar@fb.com>
Date:   Wed Nov 23 08:29:08 2022 -0800

    first draft of input mutation handling for aot autograd (#88817)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88817
    Approved by: https://github.com/ezyang, https://github.com/wconstab

commit 902e4e3926a9333178510f032580e4acd56c40da
Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Date:   Wed Nov 23 19:05:13 2022 +0000

    Revert "Fix the kineto daemon build condition (#89174)"

    This reverts commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd.

    Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil.

commit 049a0f2cd5916c8392c6bd1adc41c709de892f3a
Author: Bin Bao <binbao@fb.com>
Date:   Wed Nov 23 02:00:44 2022 +0000

    [inductor] Update CI model tests (#89499)

    Summary:
    1) Add model inference test
    2) Switch model training test to use AMP

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499
    Approved by: https://github.com/bertmaher

commit 95474e00a9477b1333e13fa95887a2ce05c4a6a6
Author: Jerry Zhang <jerryzh168@gmail.com>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Remove unused util code (#89272)

    Summary:
    As titled.

    Test Plan:
    python test/test_quantization.py TestQuantizeFx

    Reviewers:

    Subscribers:

    Tasks:

    Tags:

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272
    Approved by: https://github.com/andrewor14

commit 128faf2b69f62b55d3ae1b4cb3e24ec594af0009
Author: Jerry Zhang <jerryzh168@gmail.com>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Refactor the error checking code for quantize_per_channel op (#89271)

    Summary:
    As titled.

    Test Plan:
    make sure it compiles

    Reviewers:

    Subscribers:

    Tasks:

    Tags:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89271
    Approve…
JakubPietrakIntel added a commit to JakubPietrakIntel/pytorch that referenced this pull request Dec 7, 2022
commit 63ebc8d6a000199e963d29b6c8a0f54d3150872b
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Thu Dec 1 13:32:03 2022 +0100

    rm print

commit 2c8ffeaf1b2168ed9ad4ca6b192a1231fb036760
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Thu Dec 1 11:35:02 2022 +0100

    pytorch_sparse.matmul to torch.sparse.matmul

commit ee0e184a1ce5dc6ad7005a67621fac19d6fdbb0b
Merge: 4562359b9f 3a858ba8e3
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 14:09:42 2022 +0100

    Merge branch 'gh/mingfeima/85/head' of https://github.com/pytorch/pytorch into pyg-36

commit 4562359b9fb3de301690334a892d44911eda45c8
Merge: deba083400 b5616cd5f4
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 12:22:11 2022 +0000

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit deba0834008ad95af7e3a6603223a0f8a5555967
Merge: 0e1a8522bb a97d0508cb
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 12:19:25 2022 +0000

    Merge branch 'pyg-36' of https://github.com/JakubPietrakIntel/pytorch into pyg-36

commit 0e1a8522bb695387816a29bbfcf182962429b3ab
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Mon Nov 28 12:16:35 2022 +0000

    Merge remote-tracking branch 'origin/gh/mingfeima/85/head' into pyg-36

commit b5616cd5f4fc150138b79d3396a603eda6a7a8a8
Author: Michael Voznesensky <voznesenskym@gmail.com>
Date:   Mon Nov 28 05:12:37 2022 +0000

    Add simple assert to detect fake tensors on modules (#89723)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723
    Approved by: https://github.com/ezyang

commit db1f1144f1303db45e0b9d96e4bb6bdd87c80e5a
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 13:52:28 2022 -0800

    Beef up AOTAutograd logging with aot_id and input descriptions (#89710)

    A few things in this PR, that I found useful while debugging some
    recent issues:

    - We now allocate an aot_id to each aot_function/aot_module invocation,
      and print it whenever we report error messages and graph output
      logging.  Check the comment for why this sort of thing is useful,
      and also why it's different from nth_graph.  This number is now
      incorporated into aot_graph_name

    - I noticed that nth_graph only gets incremented when backwards is
      compiled.  Because backwards is compiled lazily, this means that
      multiple forward graphs would have gotten the same ID!  I change
      nth_graph to always increment to avoid confusion here.

    - I added a simple describe_input function, which makes use of
      num_params_buffers to tell the user if the input index they're
      looking at is a param/buffer or an input.  With the help of
      https://github.com/pytorch/pytorch/pull/89709 we could give
      even more detailed information about inputs  (we could also
      easily give detailed information about parameters if we stored
      a mapping of index to parameter name, but I didn't need this
      when debugging so I'll let someone else add it if they need
      it.)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89710
    Approved by: https://github.com/bdhirsh

commit 5f8848f32901e35cead64d520885f718679c2bbe
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 15:26:55 2022 -0500

    Don't suppress log messages for dynamo CI config (#89653)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89653
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit 1a2dd6b15e0089a9e45ba4feb90c2d0dfac19238
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sun Nov 27 19:27:45 2022 -0500

    Add single process version of dynamo distributed hf_Bert tests (#89721)

    It's a lot easier to debug problems in the Dynamo optimization pass if
    you aren't actually triggering a multiprocessing run.  Keep these tests
    around.

    I think the other tests can probably get this treatment too, leaving
    this to future work.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721
    Approved by: https://github.com/voznesenskym

commit 0e7c100c9b7417efb1a8f65778a1e3c9ad10ef3e
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Add debug asserts to AOTAutograd for input consistency with compilation (#89702)

    Fixes https://github.com/pytorch/torchdynamo/issues/1927

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89702
    Approved by: https://github.com/bdhirsh

commit 1f95f24d3003a35568a00b5e5e18439846089b0f
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 11:25:24 2022 -0800

    Factor input deduplication into a separate function (#89701)

    It turns out that instead of having a giant blobby aot_dispatch_autograd
    function, we can factor it into a series of wrapper functions, each
    of which successively guarantees more invariants on the inner
    compilation function until the final inner function is quite trivial.
    How exactly you have to wrap the input user functions and the output
    compiled functions can be expressed concisely in Haskell, so I've
    included the Haskell formulation in code comments.

    This PR shows how to do this for input deduplication.  Dealing with the
    rest of the view handling is left to future work.

    This PR should also be a slight performance improvement as deduplicating
    is skipped entirely when there are no duplicate inputs.
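
    A toy sketch of the wrapper idea (illustrative only, not the actual AOTAutograd code): collapse duplicate flat inputs before compiling, and re-expand at call time.

    ```python
    def with_deduped_inputs(flat_args, compile_inner):
        # Map each distinct argument (by object identity) to its first occurrence.
        seen, keep_idx, remap = {}, [], []
        for i, a in enumerate(flat_args):
            if id(a) not in seen:
                seen[id(a)] = len(keep_idx)
                keep_idx.append(i)
            remap.append(seen[id(a)])

        if len(keep_idx) == len(flat_args):
            # No duplicates: skip the wrapper entirely (the perf note above).
            return compile_inner(flat_args)

        compiled = compile_inner([flat_args[i] for i in keep_idx])

        def wrapper(args):
            unique = [None] * len(keep_idx)
            for j, a in zip(remap, args):
                unique[j] = a
            return compiled(unique)

        return wrapper
    ```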

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89701
    Approved by: https://github.com/bdhirsh

commit dcefc8f90fbc86041a7abcce4f227d15c59bd96c
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Sat Nov 26 14:28:56 2022 -0500

    Implement guard_source on RandomValueSource (#89711)

    I audited the pattern matches on the enum and it didn't
    look like this one should apply there.

    Sorry, no test, I know this matters on symbolic-shapes branch
    but I haven't had time to extract out a minimal reproducer.
    Take my word for it.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89711
    Approved by: https://github.com/jansel

commit 1da633f98a5da000083c0c47d9e192b2689f867b
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 13:57:17 2022 +0000

    Access named parameters/buffers/etc via getattr rather than index (#89625)

    I'm not sure why this never caused problems before.  The error
    manifests as `TypeError: 'MyModule' object is not subscriptable`

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89625
    Approved by: https://github.com/albanD

commit e36d68af8885f27d8c0b4727ab078bf53e55e7a0
Author: Horace He <chilli@fb.com>
Date:   Thu Nov 24 02:17:37 2022 +0000

    Don't allow recomputing a node that *must* be materialized in the backwards pass (#89171)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89171
    Approved by: https://github.com/ngimel

commit b709078dc673cbd5025a1df3eae7f5c60acc2698
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:21 2022 -0800

    [Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926)

    There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse grained view of the world.)

    Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926
    Approved by: https://github.com/chaekit

commit 143d2881a844934c95c4ada63b38179d97e65af3
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:19 2022 -0800

    [Profiler] Memory profiler part 10: Mark optimizer state (#88925)

    This is also a fairly simple pass, since we're simply collecting values from the python tracer.

    Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925
    Approved by: https://github.com/chaekit

commit ae725d501e33ed6f823997bea03d99cdc8dae5ff
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:18 2022 -0800

    [Profiler] Memory profiler part 9: Mark activations (#88924)

    This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass.

    Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924
    Approved by: https://github.com/chaekit

commit 56e40fe054ecb7700142ea9ae7fe37e77800a2da
Author: Yuxin Wu <ppwwyyxx@users.noreply.github.com>
Date:   Sun Nov 27 05:55:24 2022 +0000

    Let SyncBatchNorm fallback to BN if not using distributed training (#89706)

    Fixes #63662
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706
    Approved by: https://github.com/soumith

commit 39449ea61d9a6644731687219282f610cbf7cf54
Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Date:   Sun Nov 27 02:59:04 2022 +0000

    [vision hash update] update the pinned vision hash (#89692)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89692
    Approved by: https://github.com/pytorchbot

commit 483d3a3d07e6694757c5158bc21f7f757f8c82c3
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:16 2022 -0800

    [Profiler] E2E expecttests for category assignment (#88653)

    Up until now the unit tests for category assignment have been narrowly scoped to specific checks on specific Tensors. However as we start to reach reasonable levels of category assignment it's useful to supplement those tests with higher level summary tests to inspect the larger graph and confirm that it makes sense. (It will also be necessary for some categories like activations where it is tedious to record all relevant Tensors.)

    The general structure of these tests is to capture a model invocation with `__torch_dispatch__` and then cross reference those inputs and outputs with the categories assigned by the memory profiler.

    Differential Revision: [D40868659](https://our.internmc.facebook.com/intern/diff/D40868659/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88653
    Approved by: https://github.com/chaekit

commit 0435894bb3b2d60e5da9f993c2a56d95fb03a971
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:14 2022 -0800

    [Profiler] Memory profiler part 8: Mark parameters. (#87568)

    Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates.

    Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.)

    Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568
    Approved by: https://github.com/chaekit

commit 17fa6bf1f57cbbe84a14566efcf00f21e1abe489
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:13 2022 -0800

    [Profiler] Memory profiler part 7: Mark inputs (#87567)

    It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training, since Autograd lets us differentiate between them; however, I still want the tool to do something reasonable in inference.

    Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements.

    Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567
    Approved by: https://github.com/chaekit

commit 64c5c77cd47212da719eb29c3b0a2b07cebb3705
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:11 2022 -0800

    [Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566)

    Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories.

    We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph, which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined, these two approaches provide very high coverage.

    Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node.

    Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.)

    Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566
    Approved by: https://github.com/chaekit

commit 5f09a6d573a2a07c00c76c3cbdbffe0fafe2436d
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:09 2022 -0800

    [Profiler] Memory profiler part 5: Data flow graph (#87006)

    The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs.

    It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node.
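
    A toy sketch (not the profiler's actual data structures) of why versioning matters: an in-place op bumps the version of its target, so later uses of the same storage point at a different node in the data flow graph.

    ```python
    from collections import defaultdict

    class DataFlowGraph:
        """Toy graph keyed by (tensor_id, version)."""
        def __init__(self):
            self.version = defaultdict(int)   # tensor_id -> current version
            self.edges = []                   # (producer_key, consumer_key)

        def _key(self, tid):
            return (tid, self.version[tid])

        def record_op(self, inputs, outputs, mutates=()):
            in_keys = [self._key(t) for t in inputs]
            for t in mutates:                 # in-place: same storage, new semantic value
                self.version[t] += 1
            out_keys = [self._key(t) for t in outputs]
            self.edges += [(i, o) for i in in_keys for o in out_keys]

    g = DataFlowGraph()
    g.record_op(inputs=[1, 2], outputs=[3])            # t3 = f(t1, t2)
    g.record_op(inputs=[3], outputs=[1], mutates=[1])  # t1.add_(t3): t1 is now (1, 1)
    ```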

    Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
    Approved by: https://github.com/chaekit

commit c3116dd78b294f1bd3f6424dc1bfb7ff86bb0a66
Author: Taylor Robie <taylorrobie@fb.com>
Date:   Sat Nov 26 10:33:08 2022 -0800

    [Profiler] Memory profiler part 4: Select top level torch ops (#86880)

    In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is.

    Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/)
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880
    Approved by: https://github.com/chaekit

commit bb77accb4c996e3aab9ae4b665fb8464400c8194
Author: Jiong Gong <jiong.gong@intel.com>
Date:   Sat Nov 26 14:06:44 2022 +0000

    [Inductor] Record cpp kernel in PyTorch Profiler (#89367)

    Add an option `config.cpp.enable_kernel_profile` to record individual cpp kernel time in PyTorch Profiler.
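
    A usage sketch, assuming the knob is reachable as `torch._inductor.config.cpp.enable_kernel_profile` (names follow the description above; exact spelling may differ):

    ```python
    import torch
    import torch._dynamo as dynamo
    import torch._inductor.config as inductor_config

    inductor_config.cpp.enable_kernel_profile = True  # record individual cpp kernels

    @dynamo.optimize("inductor")
    def f(x):
        return torch.relu(x) + 1

    x = torch.randn(1024)  # CPU tensor, so the C++ backend is exercised
    with torch.profiler.profile() as prof:
        f(x)
    print(prof.key_averages().table(sort_by="cpu_time_total"))
    ```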

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89367
    Approved by: https://github.com/jansel

commit 36018a6ee63f140b95ad644d09920798b0c624f8
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 13:48:35 2022 -0800

    Don't suppress exceptions from backends (#89656)

    Taken from voz's https://github.com/pytorch/pytorch/pull/89392

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89656
    Approved by: https://github.com/voznesenskym

commit 3e20d023b1f442ebe59e76604395cd8d4abed52a
Author: Natalia Gimelshein <ngimel@fb.com>
Date:   Sat Nov 26 03:08:23 2022 +0000

    put descriptive kernel names behind config (#89697)

    Per title, generated kernel names are often long and confusing.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89697
    Approved by: https://github.com/Chillee

commit 591dfffa38848de54b7f5f4e49260847024c9281
Author: jlukehubbard <58089207+jlukehubbard@users.noreply.github.com>
Date:   Fri Nov 25 21:31:53 2022 +0000

    update docstring for torch.linalg.lstsq (#89383)

    Previous documentation lacked details about the handling of over- and underdetermined systems, and made incorrect mention of MAGMA.
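
    For context, a small example of the two regimes the docstring now covers (plain `torch.linalg.lstsq` usage):

    ```python
    import torch

    A_over = torch.randn(6, 3)   # overdetermined: more equations than unknowns
    A_under = torch.randn(3, 6)  # underdetermined: fewer equations than unknowns

    sol = torch.linalg.lstsq(A_over, torch.randn(6, 1))
    print(sol.solution.shape)    # (3, 1): least-squares solution

    # CUDA only supports the full-rank `gels` driver; on CPU, rank-deficient or
    # underdetermined problems can use `gelsy`/`gelsd`/`gelss`.
    sol2 = torch.linalg.lstsq(A_under, torch.randn(3, 1), driver="gelsd")
    print(sol2.solution.shape)   # (6, 1): minimum-norm solution
    ```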

    Fixes #85021

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89383
    Approved by: https://github.com/lezcano

commit c9a0cc86407d7ec20524b0e26305109d0cf2b5c2
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 03:31:20 2022 +0000

    Simplify aot_module_simplified by removing top_args/top_kwargs (#89666)

    This makes good on Chillee's CR comment at
    https://github.com/pytorch/functorch/pull/660/files/af30d351cc93dfafb5a94dbcb32983c5ef65fd6a#r843315222
    which was never done in the original PR.

    There is no logic change, just unpack the args/kwargs at the top
    level and remove the inner function indirection.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89666
    Approved by: https://github.com/voznesenskym

commit 6168f22fae66da5703e087bcd10076921ca157e7
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Don't support kwargs at runtime in aot_module_simplified (#89664)

    The preexisting logic here added in
    https://github.com/pytorch/functorch/pull/970 was very peculiar: if top_kwargs
    was non-empty, then the inner compiled function supports kwargs.  Naively, this
    would leave you to expect that there is some sort of correlation between
    top_kwargs and kwargs.  But in fact, they're completely unrelated!  top_kwargs
    is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but
    kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
    But (1) we don't support this (the function to be compiled only takes a list
    of tensors) and (2) even if we did support it, conditioning on whether or not
    you had passed AOTAutograd configuration kwargs to support kwargs at runtime
    is bonkers.

    So delete it.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89664
    Approved by: https://github.com/voznesenskym

commit b04dda4291f1d30b064572e4521e82fa2573af77
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Fri Nov 25 03:31:19 2022 +0000

    Delay verify correctness wrapping to call site. (#89662)

    There is only one call site for compiler_fn, so we can safely delay
    wrapping verify correctness to here.  This will help later when we
    change the backend compiler calling convention to pass fake tensors
    (but I need to pass real tensors here.)

    This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392
    but with less changes to the substantive logic.  I only moved the relevant
    inner implementation; there are no changes otherwise.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662
    Approved by: https://github.com/voznesenskym

commit 61a3fe4b6409965223273c1098f9a77ff071efe1
Author: Natalia Gimelshein <ngimel@fb.com>
Date:   Fri Nov 25 19:42:38 2022 +0000

    make inductor correctly propagate nans for maximum and minimum (#89612)

    Partially fixes https://github.com/pytorch/torchdynamo/issues/594
    Also, small cleanup for `where` codegen
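
    For reference, the eager semantics being matched (standard `torch.maximum`/`torch.minimum` behavior):

    ```python
    import torch

    nan = torch.tensor(float("nan"))
    one = torch.tensor(1.0)

    print(torch.maximum(nan, one))  # tensor(nan) -- NaN propagates in eager mode
    print(torch.minimum(one, nan))  # tensor(nan)
    # The compiled (inductor) versions should now agree with eager instead of
    # silently returning the non-NaN operand.
    ```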

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89612
    Approved by: https://github.com/soumith, https://github.com/jansel

commit 70c0a3006ee96b3db1f531109fc383f8159e2d2f
Author: Ikko Ashimine <eltociear@gmail.com>
Date:   Fri Nov 25 19:26:18 2022 +0000

    Fix typo in segment_reduction_op_gpu.cu (#89647)

    menber -> member

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89647
    Approved by: https://github.com/kit1980

commit 2c0bd85c755043d696452ddab354f3ff6775738b
Author: kshitij12345 <kshitijkalambarkar@gmail.com>
Date:   Fri Nov 25 14:53:57 2022 +0000

    complex: register c10::complex with py::cast (#89680)

    Fixes #77134

    TODO:
    * [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?)

    ```c++

    // Includes assumed for a standalone build of this snippet (plus whatever
    // header ends up carrying the c10::complex type caster from this PR):
    #include <cassert>
    #include <pybind11/embed.h>
    #include <c10/util/complex.h>

    namespace py = pybind11;

    int main() {
        py::scoped_interpreter guard{}; // start the interpreter
        auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cdouble)));

        auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_cfloat)));

        auto casted_chalf = py::cast(c10::complex<at::Half>(1.0, 2.0));
        assert(
            (c10::complex<double>(1.0, 2.0) ==
             py::cast<c10::complex<double>>(casted_chalf)));
    }

    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680
    Approved by: https://github.com/ezyang

commit a97d0508cb5259951bc48300fb914cebdf322bb9
Merge: 849be586e6 abb446af8c
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Fri Nov 25 15:24:54 2022 +0100

    Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36

commit 849be586e649421ba58182feb9067a4ac65479e3
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <jakub.pietrak@intel.com>
Date:   Fri Nov 25 14:25:40 2022 +0100

    Merge branch 'gh/mingfeima/85/head' into pyg-36

commit abb446af8c65a49bbc3767e14605a73d244c176b
Author: Alvaro Gaona <alvgaona@gmail.com>
Date:   Fri Nov 25 11:09:28 2022 +0000

    Implement old windows in Python (#87082)

    Relates to #85366

    - Bartlett, Blackman, Hamming, Hann.
    - Except Kaiser which will be in a different PR

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87082
    Approved by: https://github.com/mruberry, https://github.com/lezcano

commit 059a238619b122f922c569c618919a277420e483
Merge: 26ba2e9751 95ea47ef0c
Author: Jakub Pietrak <97102979+JakubPietrakIntel@users.noreply.github.com>
Date:   Fri Nov 25 10:00:53 2022 +0100

    Merge branch 'pytorch:master' into jpietrak/pyg-36

commit 95ea47ef0c1cffe1fe05cc36bdc47c26cc72f13e
Author: Jason Ansel <jansel@meta.com>
Date:   Fri Nov 25 04:28:36 2022 +0000

    torchdynamo to torch._dynamo in aot_autograd.py (#89385)

    Test Plan: Run torchbench models

    Differential Revision: D41429573

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89385
    Approved by: https://github.com/soumith, https://github.com/malfet

commit 69043247819042db18ac9526c2d747fa61fe8880
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 12:00:13 2022 -0800

    Remove fake_tensor_propagation (#89646)

    You always have to run dynamo with fake tensors.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646
    Approved by: https://github.com/soumith

commit 1aa1014b262b75d4269d9a4d8b562c6ee43a0991
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 12:00:12 2022 -0800

    xfail maml test, instead of running it without fake tensor prop (#89645)

    A previous version of this patch inserted a graph break when torch.tensor fails, but that causes

    ```
    PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32
    ```

    to start failing. Probably another latent bug that needs investigating.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645
    Approved by: https://github.com/albanD

commit a048913e2530442360c36a48420079ca9ebca149
Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Date:   Fri Nov 25 03:03:41 2022 +0000

    [vision hash update] update the pinned vision hash (#89667)

    This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
    Update the pinned vision hash.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89667
    Approved by: https://github.com/pytorchbot

commit 3b3ebcd031b68762938806f541d7247a1521bb11
Author: XiaobingSuper <xiaobing.zhang@intel.com>
Date:   Thu Nov 24 02:33:01 2022 -0500

     TorchDynamo: weight prepack for single conv (#89209)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89209
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 0c4f3db7bf24e94125c6802718a1105ee548c953
Author: XiaobingSuper <xiaobing.zhang@intel.com>
Date:   Thu Nov 24 02:32:59 2022 -0500

    TorchDynamo: weight prepack for mkl linear (#89109)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89109
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 07151a6bd62e308b6b32e2e0edfc4d5f0563576e
Author: XiaobingSuper <xiaobing.zhang@intel.com>
Date:   Thu Nov 24 02:32:55 2022 -0500

    TorchDynamo: weight prepack for onednn convolution external call (#88988)

    This PR enables weight prepack using the MKLDNN tensor:
    1.  enable fake tensor mode for MKLDNN tensor input.
    2.  make the convolution fusion kernel support MKLDNN tensor input.
    3.  do the weight prepack at the FX fusion step.

    For better performance, we always use channels_last for the CPU convolution path, because in our testing the channels_last path gets better performance than the blocked-input path and also avoids the activation's layout conversions (plain to block, block to plain); currently only a plain-to-plain format conversion is needed.
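
    A small sketch of the channels_last handling being referred to (standard memory-format API, not the prepack code itself):

    ```python
    import torch

    conv = torch.nn.Conv2d(3, 8, kernel_size=3)
    x = torch.randn(1, 3, 32, 32)

    # Keep both the weight and the activation in channels_last so no
    # plain<->blocked layout conversion is needed around the conv kernel.
    conv = conv.to(memory_format=torch.channels_last)
    x = x.contiguous(memory_format=torch.channels_last)

    y = conv(x)
    print(y.is_contiguous(memory_format=torch.channels_last))  # expected: True
    ```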

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988
    Approved by: https://github.com/jgong5, https://github.com/jansel

commit 0884fdaba0280e3f3ad2abc34c0940587f744886
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 14:31:00 2022 -0500

    Revert "Dont clone unmutated args in triton autotuning (#89519)" (#89652)

    This reverts commit f18f0c70ab10c400947e71be30794e04dcc22acf.

    Testing to see if this fixes gmixer_24_224 mixer_b16_224

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89652
    Approved by: https://github.com/eellison

commit 4a16f8cdb26be3561742e86f184e59f65418fe63
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Reenable fake_tensor_propagation on test_cudnn_rnn (#89644)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89644
    Approved by: https://github.com/anjali411

commit fc7dcb684aa38da5b1534fc701657ee63af8909c
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:09 2022 -0800

    Run optimizer tests with fake tensors (#89643)

    This is a slight regression: RAdam and Adagrad don't appear to
    trace at all under fake tensors.  But I think this is a more accurate
    reflection of the current state of affairs.

    Along the way fix some problems on the fake tensor path.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643
    Approved by: https://github.com/anjali411

commit 9b13508ef3a4e858fbbbf068b3a825f1632e8daa
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Force test_rng_state to run with fake tensor prop (#89641)

    I'm not really sure what desertfire's intended follow up was
    on https://github.com/pytorch/pytorch/pull/87490 because when I remove
    the unsupported() call, dynamo tests pass.  But the change here is
    conservative and I think strictly better than the current situation.
    The idea is to force fake tensor prop on for the test, and then just
    observe that we are doing a graph break.  Clearly, export doesn't work,
    so I manually xfail it.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641
    Approved by: https://github.com/anjali411

commit c6be06d93ab911a3fbb185451c8cf42bcedad0c1
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:00:08 2022 -0800

    Easy: These tests work with fake_tensor_propagation on (#89640)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89640
    Approved by: https://github.com/anjali411, https://github.com/albanD

commit 6fb6eb0a7498839e69302da7bf8c04205c64e0f3
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 08:11:48 2022 -0800

    Support unspecialized integers with dynamic shapes (#89639)

    Previously, we hackily wrapped unspecialized integers into
    tensors and treated them as tensor inputs.  Sometimes, downstream
    operations would not be able to deal with the tensor input.  Now,
    we wrap them into SymInt, so more correct overload selection occurs.
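
    A hedged illustration of what an "unspecialized integer" input looks like (sketch only; whether an int is unspecialized depends on dynamo's config):

    ```python
    import torch
    import torch._dynamo as dynamo

    def f(x, n):                 # n is a plain Python int, not a Tensor
        return x.narrow(0, 0, n) + n

    opt_f = dynamo.optimize("eager")(f)
    opt_f(torch.randn(8), 3)
    # Previously an unspecialized n could be wrapped into a 0-dim tensor input,
    # confusing ops that expect an integer argument; wrapping it as a SymInt
    # lets integer overloads (e.g. narrow's `length`) be selected correctly.
    ```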

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
    Approved by: https://github.com/anjali411

commit 0c96841a20f0ae9380ef26657914276a42c9c9d7
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 08:11:47 2022 -0800

    Cond capture with fake tensors actually works; don't raise in this case (#89638)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89638
    Approved by: https://github.com/anjali411

commit d3c012f409a4e4d5a11070a90b5578da82778030
Author: kshitij12345 <kshitijkalambarkar@gmail.com>
Date:   Thu Nov 24 21:41:20 2022 +0000

    [test_nn] split pruning tests from test_nn (#89590)

    Ref: https://github.com/pytorch/pytorch/issues/63085

    Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590
    Approved by: https://github.com/albanD

commit 83666f167dcf023d301f16fad82b9afb374ad836
Author: Aleksandar Samardžić <asamardzic@quansight.com>
Date:   Thu Nov 24 14:44:12 2022 +0000

    Added vectorized CPU code for uint8_t datatype. (#89284)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89284
    Approved by: https://github.com/lezcano, https://github.com/peterbell10

commit 9497552771ca59c68509398ab3094e590a3047c5
Author: Howard Huang <howardhuang@meta.com>
Date:   Thu Nov 24 19:41:17 2022 +0000

    Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521)

    Summary: Fixes https://github.com/pytorch/pytorch/issues/88568

    `_all_gather_base` is deprecated, so this replaces its usage with `all_gather_into_tensor`.
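
    For reference, the replacement call shape (standard `torch.distributed` API; assumes an already-initialized process group):

    ```python
    import torch
    import torch.distributed as dist

    # assumes dist.init_process_group(...) has been called
    world_size = dist.get_world_size()
    local = torch.randn(4, device="cuda")
    gathered = torch.empty(world_size * 4, device="cuda")

    # old (deprecated): dist._all_gather_base(gathered, local)
    dist.all_gather_into_tensor(gathered, local)
    ```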

    Test Plan: CI

    Differential Revision: D41479983

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521
    Approved by: https://github.com/wz337

commit 94a88b53ed37854379813abf9641d1637fe2688b
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 08:11:46 2022 -0800

    Remove fake_tensors_available (#89637)

    As we are one repo now, they are always available.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637
    Approved by: https://github.com/anjali411

commit 1c8b0779de76d0c76d34835047106ab37b41790b
Author: Emilio Castillo <ecastill@preferred.jp>
Date:   Thu Nov 24 18:25:26 2022 +0000

    Fix segfault when swapping custom allocator (#89613)

    Just screwed it before merging ...

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89613
    Approved by: https://github.com/albanD

commit fd279fe85b8f5a8e74c615436f0b180621b6ef52
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Thu Nov 24 09:23:05 2022 -0500

    Make pytest work again on test/dynamo (#89631)

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89631
    Approved by: https://github.com/anjali411

commit c3e85d879cdbd3973754760c6767c75276b1dca8
Author: albanD <desmaison.alban@gmail.com>
Date:   Thu Nov 24 17:11:42 2022 +0000

    Mention discrepancy between original impl and our impl of RAdam (#89575)

    Fixes https://github.com/pytorch/pytorch/issues/88836

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575
    Approved by: https://github.com/mruberry

commit 860bae49e4925868a0221ec4345d08407280bac7
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Nov 23 08:04:31 2022 -0800

    Suppress guards on as_strided call only. (#89569)

    See comment in meta_utils.py for the whole story.

    This doesn't have a substantive impact yet, but will in the next
    PR on the stack.

    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569
    Approved by: https://github.com/albanD

commit 1588ea0dbf16f37ce14cfc8764666985c16ccbf9
Author: mfkasim1 <firman.kasim@gmail.com>
Date:   Thu Nov 24 11:11:51 2022 +0000

    Added log1p for complex in c10 (#89214)

    One PR towards #89205.
    The content is mostly from PR #38465, but I slightly changed the expression to make it faster.
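
    The identity behind the rewritten expression (`log1p_v0` in the benchmark below), for reference: with z = x + iy,

    ```latex
    \log(1+z) = \log\lvert 1+z\rvert + i\,\operatorname{atan2}(y,\, x+1), \qquad
    \lvert 1+z\rvert^2 = (x+1)^2 + y^2 = x(x+2) + y^2 + 1,
    ```

    so the real part is `0.5 * log1p(x*(x+2) + y*y)`, computed without forming `1 + z` and losing precision for small `|z|`.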

    Here is some benchmarking code:
    ```c++

    // main.cc
    #include <chrono>
    #include <cmath>
    #include <complex>
    #include <iostream>

    template<typename T> inline std::complex<T> log1p_v0(const std::complex<T> &z) {
        // this PR
        T x = z.real();
        T y = z.imag();
        T theta = std::atan2(y, x + T(1));
        T r = x * (x + T(2)) + y * y;
        return {T(0.5) * std::log1p(r), theta};
    }

    template<typename T> inline std::complex<T> log1p_v1(const std::complex<T> &z) {
        // PR #38465
        T x = z.real();
        T y = z.imag();
        std::complex<T> p1 = z + T(1);
        T r = std::abs(p1);
        T a = std::arg(p1);
        T rm1 = (x * x + y * y + x * T(2)) / (r + 1);
        return {std::log1p(rm1), a};
    }

    template<typename T>
    inline std::complex<T> log1p_v2(const std::complex<T> &z) {
        // naive, but numerically inaccurate
        return std::log(T(1) + z);
    }

    int main() {
        int n = 1000000;
        std::complex<float> res(0.0, 0.0);
        std::complex<float> input(0.5, 2.0);
        auto start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v0(input);
        }
        auto end = std::chrono::system_clock::now();
        auto elapsed = end - start;
        std::cout << "time for v0: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v1(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v1: " << elapsed.count() << '\n';

        start = std::chrono::system_clock::now();
        for (int i = 0; i < n; i++) {
            res += log1p_v2(input);
        }
        end = std::chrono::system_clock::now();
        elapsed = end - start;
        std::cout << "time for v2: " << elapsed.count() << '\n';
        std::cout << res << '\n';
    }
    ```

    Compiling the script with command `g++ main.cc` produces the following results:
    ```
    time for v0: 237812271
    time for v1: 414524941
    time for v2: 360585994
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214
    Approved by: https://github.com/lezcano

commit 4f5c4c022a8365d06ac401582958bbf0fd3f8337
Author: Jiewen Tan <jwtan@google.com>
Date:   Thu Nov 24 10:57:01 2022 +0000

    [LTC] Refine MetricsArena::Reset (#89608)

    Summary:
    After counters are reset, getters' behaviors are inconsistent. To improve that, here I 1) move the validation of CounterData into CounterData::IsValid such that it's better encapsulated, 2) divide getters into two groups: a) MetricsArena::GetCounter() and b) MetricsArena::ForEachCounter(), and route MetricsArena::GetCounterNames() and CreateMetricReport() to use b.

    This is paired with pytorch/xla#4217.

    Test Plan:
    PJRT_DEVICE=CPU python xla/test/test_metrics.py

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89608
    Approved by: https://github.com/JackCaoG

commit a8629a1c18fd13300ce69c1d6042004038885cf0
Author: Jithun Nair <jithun.nair@amd.com>
Date:   Thu Nov 24 10:53:20 2022 +0000

    Upgrade nightly wheels to ROCm5.3 (#89101)

    Dependent on PR https://github.com/pytorch/builder/pull/1193

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89101
    Approved by: https://github.com/kit1980

commit c0d81aa70ce45a0c2e7ced6c9f42a92d15523188
Author: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date:   Thu Nov 24 09:37:10 2022 +0000

    Use fx.replace_pattern for removing empty_like+fill in nvFuser+PrimTorch execution (#89132)

    I learned about `torch.fx.replace_pattern` and it's a cleaner way of removing unnecessary tensor materialization from the graph coming from tracing  C++ code `1 - tensor`.

    Test:
    ```
    python -m pytest test/test_prims.py -k "test_silu_backward_no_filled_tensor"
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89132
    Approved by: https://github.com/mruberry, https://github.com/jjsjann123

commit b515c1d96082214e81cc57ce2a1de9164b50206f
Author: Hao Guan <10684225+hguandl@users.noreply.github.com>
Date:   Thu Nov 24 08:14:24 2022 +0000

    [QAT] Check the value of numel to avoid segfault (#81547)

    Fixes #78123

    Segmentation fault

    RuntimeError: numel is out of the bound of input tensor
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/81547
    Approved by: https://github.com/kit1980

commit 22a1b5e243e852e1c423c697e51975d1545d2a1b
Author: Vasiliy Kuznetsov <vasiliy@fb.com>
Date:   Wed Nov 23 13:01:15 2022 -0800

    quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)

    Summary:

    This PR deprecates the `compute_dtype` field on observers, and replaces
    it with the `is_dynamic` field on observers.  This is better aligned
    with the reference model spec.

    Test plan:

    ```
    python test/test_quantization.py TestQuantizeFx
    python test/test_quantization.py TestQuantizeFxOps
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
    Approved by: https://github.com/jerryzh168

commit e4ccec6ecab9b48e804d58f60135f0950fca864f
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   Thu Nov 24 05:28:58 2022 +0000

    [Dynamo] Fix bug of using customized torch.autograd.Function (#89397)

    Fixes https://github.com/pytorch/torchdynamo/issues/1899

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89397
    Approved by: https://github.com/jansel

commit 903ae4570e401e5c4e42dc4a44cae37f805044a4
Author: Michael Lazos <mlazos@fb.com>
Date:   Thu Nov 24 04:15:34 2022 +0000

    Disable optimizer tracing, enable for tests only (#89500)

    Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500
    Approved by: https://github.com/anijain2305

commit c79489c8e69f965f3e5af8f3f39df78e7d4732ba
Author: albanD <desmaison.alban@gmail.com>
Date:   Thu Nov 24 03:39:55 2022 +0000

    Expose to python the backward AD view_func (#89586)

    This will be useful for other systems (AOTAutograd) that want to replay autograd views.

    FYI @bdhirsh
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586
    Approved by: https://github.com/soulitzer

commit 4cb6bbbe27162c7b0835879131991d2155329718
Author: Nikita Karetnikov <nikita@karetnikov.org>
Date:   Thu Nov 24 01:02:28 2022 +0100

    Symintify `embedding` (#89327)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
    Approved by: https://github.com/ezyang

commit 9c867eae1a7fffb6f893717073150cff04a923a4
Author: Wu, Chunyuan <chunyuan.wu@intel.com>
Date:   Wed Nov 23 20:10:41 2022 +0000

    nnc: fix Store if value is fp32 while buf is bf16 (#86788)

    Fixes https://github.com/pytorch/pytorch/issues/86533.
    For the below graph:
    ```bash
    [DUMP kernel.cpp:1690] TensorExprKernel graph:
    [DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)):
    [DUMP kernel.cpp:1690]   %1 : int = prim::Constant[value=0]()
    [DUMP kernel.cpp:1690]   %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29
    [DUMP kernel.cpp:1690]   %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19
    [DUMP kernel.cpp:1690]   return (%3)
    ```

    **Loop stmt before the fix:**
    The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16.
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8);
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f;
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```

    **Loop stmt after the fix:**
    ```bash
    [DEBUG llvm_codegen.cpp:489] After HalfRewriter {
    [DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8));
    [DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
    [DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f);
    [DEBUG llvm_codegen.cpp:489]   }
    [DEBUG llvm_codegen.cpp:489] }
    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788
    Approved by: https://github.com/EikanWang, https://github.com/kit1980

commit f0e5bc4b9f231b438f76ddd13b2c21b7cb8a09ac
Author: Zhijing Li (Accelerator Enablement) <tissue030@meta.com>
Date:   Thu Nov 24 02:18:32 2022 +0000

    Symintified layer_norm (#89466)

    Summary: As titled.

    Test Plan:
    ```
    buck2 run mode/opt scripts/wwei6:test_executorch
    ```

    Differential Revision: D41451390

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89466
    Approved by: https://github.com/frank-wei, https://github.com/ezyang

commit fdb2dd113d3aec0acb2a473de6be49940ab6a115
Author: Alexander Grund <alexander.grund@tu-dresden.de>
Date:   Thu Nov 24 01:52:11 2022 +0000

    Install missing VSX headers (POWER) (#85547)

    E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h`, which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non-MSVC compilers.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547
    Approved by: https://github.com/kit1980

commit e922bd4e523b0a30f6607f6497ac458571e00131
Author: Wei-Sheng Chin <wschin@outlook.com>
Date:   Thu Nov 24 01:30:09 2022 +0000

    [ONNX] Move two headers from .h to .cc (#86852)

    As titled. Header dependencies should be kept as small as possible.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852
    Approved by: https://github.com/titaiwangms, https://github.com/BowenBao

commit 23fe2ff910fd1577281a2210d1184aff705191b8
Author: Shunting Zhang <shunting@meta.com>
Date:   Thu Nov 24 01:28:10 2022 +0000

    verify the number of outputs of xla graph (#89536)

    This PR adds tests to verify the number of outputs returned by an XLA graph. The understanding from this PR will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and eventually enable training for the dynamo/torchxla integration. Sending this PR separately so Jack can help verify whether the behavior is expected and play with it.

    Listing some code snippets here since their behavior is not straightforward at first glance:
    ```
        def forward(self, a, b, c):
            """
            The XLA graph will only return the first 2 items
            """
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            The in-place update on b causes it to be returned in the XLA graph
            """
            b.zero_()
            return a + b, a + c, b
    ```

    ```
        def forward(self, a, b, c):
            """
            Even if we return b twice, the XLA graph only returns b once.
            """
            b.zero_()
            return a + b, a + c, b, b
    ```

    Here is what the added tests observe:

    1. XLA does not return outputs that are also inputs -- if the tensor is not updated in place. At first glance one may wonder why we should consider this kind of 'non-realistic' corner case, but such graphs do show up in AOTAutograd. The main reason is that AOTAutograd lifts all model parameters/buffers to graph inputs and may return some of them.  Check ***test_direct_return***
    2. If a tensor is updated in place, XLA will still return it as a graph output even if it's also an input.  The only difference compared to item 1 is that the in-place update on the tensor causes it to be returned. This happens for BatchNorm2d since the running_mean/running_var tensors are updated in place during training. Check ***test_direct_return_with_inplace_update***

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536
    Approved by: https://github.com/jansel

commit 0bde5149819e9854bca1363aa6c9f52f7db2496e
Author: Nikita Shulga <nshulga@meta.com>
Date:   Thu Nov 24 00:57:17 2022 +0000

    Add `c10::` namespace in front of `optional` (#89605)

    Prep change for moving the codebase to C++17 standard
    Was part of https://github.com/pytorch/pytorch/pull/85969

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89605
    Approved by: https://github.com/weiwangmeta, https://github.com/kit1980

commit e19a7165fd1a9a35fcac42706c20e658776c10ab
Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com>
Date:   Thu Nov 24 00:34:26 2022 +0000

    [nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905)

    Fixes #65909
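
    A quick usage sketch: with this change, the functional variants should no longer emit a deprecation warning (assuming a build that contains this commit):

    ```python
    import warnings
    import torch
    import torch.nn.functional as F

    x = torch.randn(3)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        F.tanh(x)
        F.sigmoid(x)
    print([str(w.message) for w in caught])  # expected: no deprecation warnings
    ```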

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905
    Approved by: https://github.com/albanD, https://github.com/kit1980

commit a00bd6f686d7a485f7bea5f971b7e793118842b8
Author: clee2000 <44682903+clee2000@users.noreply.github.com>
Date:   Wed Nov 23 23:48:32 2022 +0000

    Don't run auto request review on forked PRs (#89583)

    tested on https://github.com/pytorch/pytorch/pull/89581
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89583
    Approved by: https://github.com/albanD, https://github.com/malfet

commit 0a1a53083e331b3648ad4cb6f750d130e3530731
Author: Nikita Karetnikov <nikita@karetnikov.org>
Date:   Wed Nov 23 20:42:55 2022 +0000

    [primTorch] Enable regex error testing for some refs (#87765)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765
    Approved by: https://github.com/mruberry

commit 3ad2a032f4924d58c556b80840f6d51aa8a4472b
Author: Nikita Shulga <nshulga@meta.com>
Date:   Wed Nov 23 23:23:24 2022 +0000

    Update default cmake to 3.18 (#89570)

    Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh `
    Prep change for raising the compiler standard to C++17: cmake-3.18 is the first release to support the C++17 language standard for CUDA

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570
    Approved by: https://github.com/atalman

commit 8695f0cced016d43298b43a4baf30315061fdacd
Author: Jane Xu <janeyx@meta.com>
Date:   Wed Nov 23 23:23:17 2022 +0000

    Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697)

    Using the same repro from the issue (but with BatchNorm2D)

    Rectifies the native_batch_norm schema by splitting it into two:
    1. one will have NON-optional alias-able running_mean and running_var inputs
    2. the other will just not have those parameters at all (no_stats variation)

    **Calling for name suggestions!**
    I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
    CI should pass.
    For BC/FC reasons, we reroute native_batch_norm to call our new schemas ONLY through the Python dispatcher, but in 2 weeks or so we should make `native_batch_norm_legit` the official batch_norm.
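
    A sketch of the two behaviors the split captures, expressed through the public `F.batch_norm` API (the new ATen schema names/signatures are as described above, not shown here):

    ```python
    import torch
    import torch.nn.functional as F

    x = torch.randn(8, 3, 4, 4)
    weight, bias = torch.ones(3), torch.zeros(3)

    # Variant 1: running stats are real tensors and are mutated in place (alias-able inputs)
    running_mean, running_var = torch.zeros(3), torch.ones(3)
    F.batch_norm(x, running_mean, running_var, weight, bias, training=True, momentum=0.1)
    print(running_mean)  # updated in place

    # Variant 2 ("no_stats"): no running stats at all, batch statistics only
    out = F.batch_norm(x, None, None, weight, bias, training=True)
    print(out.shape)
    ```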

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
    Approved by: https://github.com/albanD

commit a00efe55c3790789b967facf10c3f426faa98155
Author: Everton Constantino <everton.constantino@linaro.org>
Date:   Wed Nov 23 22:46:29 2022 +0000

    Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722)

    `JIT_LOG` checks whether logging is enabled for that particular file, and when it isn't, it doesn't output anything. Since the test checks the size of `test_stream`, it fails. I believe forcing the file to have logging enabled just to see whether the stream is set correctly during the test makes no sense, so this patch simply forces output and checks that it worked.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722
    Approved by: https://github.com/davidberard98

commit b8d3afd88665de5f01f696333d0ff291bd94a57b
Author: Huy Do <huydhn@gmail.com>
Date:   Wed Nov 23 22:39:36 2022 +0000

    Skip upload test stats for test reports from rerun disabled tests workflow (#89548)

    I have found the reason why uploading test stats fails for the rerun disabled tests workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699.  The problem is that the pytest XML file is now too big to be processed quickly (x50 bigger). Unlike unittest, `pytest-flakefinder`, used by rerun disabled tests for test_ops, includes skipped messages multiple times (50 times by default, retrying and skipping).  This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages.

    This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats.

    I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but it balloons into a much bigger XML file after extraction (from a dozen to a few hundred MB of text).  The size of the zipped file is not a big immediate problem.

    [3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check.  The script can now finish when running locally:

    * `upload_test_stats` finishes around 3+ minutes
    ```
    time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master
    ...
    Writing 8925 documents to S3
    Done!
    Writing 1760 documents to S3
    Done!
    Writing 1675249 documents to S3
    Done!
    python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954  1    185.69s user 12.89s system 75% cpu 4:22.82 total
    ```

    * `check_disabled_tests` finishes within 3 minutes
    ```
    time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch
    ...
    python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954  1    154.19s user 4.17s system 97% cpu 2:42.50 total
    ```

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548
    Approved by: https://github.com/clee2000

commit f18f0c70ab10c400947e71be30794e04dcc22acf
Author: Elias Ellison <elias.ellison@gmail.com>
Date:   Wed Nov 23 19:02:51 2022 +0000

    Dont clone unmutated args in triton autotuning (#89519)

    Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning. Any other pointers on where the overhead is coming from in autotuning would be great.

    Edit: I think it's just the triton cache clearing https://github.com/openai/triton/blob/44f577984d28ee979f704e2c28a1dcbac9639840/python/triton/testing.py#L159

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519
    Approved by: https://github.com/ngimel, https://github.com/jansel

commit ac19c5be82febc2140d4601c98daf45646a399ab
Author: Peter Bell <peterbell10@live.co.uk>
Date:   Tue Nov 22 22:26:21 2022 +0000

    FFT: disable dimension wrapping for scalar tensors (#89234)

    Fixes #88985

    By default, `maybe_wrap_dim` lets `dim=0` or `dim=-1` through for scalar
    tensors, which leads to an invalid dimension being used to index into
    `tensor.sizes()`, as in the code sample from the issue.
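
    A minimal repro sketch (assuming the fixed behavior is to reject explicit dims on 0-d tensors rather than index out of bounds):

    ```python
    import torch

    t = torch.tensor(1.0)  # 0-d tensor: t.dim() == 0, so there is no valid dim to wrap
    try:
        torch.fft.fftshift(t, dim=0)  # dim=0/-1 used to be wrapped through
    except (IndexError, RuntimeError) as e:
        print("rejected:", e)
    ```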

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89234
    Approved by: https://github.com/mruberry

commit 50e2e4faf38c6ebafacc43b72c40333f1f7b401e
Author: Pearu Peterson <pearu.peterson@gmail.com>
Date:   Wed Nov 23 12:05:37 2022 +0200

    Sparse CSC/BSR/BSC serialization and pickle support (#89553)

    Fixes https://github.com/pytorch/pytorch/issues/89497
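
    A quick sketch of what this enables, round-tripping a CSC tensor through `torch.save`/`torch.load` and `pickle` (the same should hold for the BSR/BSC layouts):

    ```python
    import io
    import pickle
    import torch

    dense = torch.tensor([[0., 1.], [2., 0.]])
    csc = dense.to_sparse_csc()

    buf = io.BytesIO()
    torch.save(csc, buf)
    buf.seek(0)
    assert torch.equal(torch.load(buf).to_dense(), dense)

    # pickle round-trip
    assert torch.equal(pickle.loads(pickle.dumps(csc)).to_dense(), dense)
    ```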

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
    Approved by: https://github.com/cpuhrsch

commit a8d6b82167ef417e21c807cb29d7eabea15014da
Author: Elias Ellison <elias.ellison@gmail.com>
Date:   Wed Nov 23 16:47:43 2022 +0000

    Fix norm decomp when dtype is passed in (#89508)

    Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in.
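
    A small sketch of the intended behavior (a hand-written check against the public `torch.norm` API, not the decomposition code itself): when `dtype` is passed explicitly, the result should keep that dtype instead of being downcast to the input dtype.

    ```python
    import torch

    x = torch.randn(16, dtype=torch.float16)
    out = torch.norm(x, p=2, dtype=torch.float32)
    print(out.dtype)  # expected: torch.float32, even though the input is fp16
    ```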

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508
    Approved by: https://github.com/anijain2305

commit 72110d783344c4121730b032ca0d269896604dcf
Author: Elias Ellison <elias.ellison@gmail.com>
Date:   Wed Nov 23 17:03:09 2022 +0000

    Fix Upsample Decomp Striding For Small Channels (#89528)

    Fix for https://github.com/pytorch/torchdynamo/issues/623.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528
    Approved by: https://github.com/ngimel, https://github.com/anijain2305

commit b7483be06afe8d4242adeb559cfbe6e0e89419d0
Author: Jerry Zhang <jerryzh168@gmail.com>
Date:   Wed Nov 23 11:03:45 2022 -0800

    [quant][docs] Add docstrings for operators defined in torch.ops.quantized_decomposed namespace (#89547)

    Summary:
    no functionality changes

    Test Plan:
    NA

    Reviewers:

    Subscribers:

    Tasks:

    Tags:

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89547
    Approved by: https://github.com/vkuzo

commit a188f05e8c1788d393c072868421991dfcb55b02
Author: Natalia Gimelshein <ngimel@fb.com>
Date:   Wed Nov 23 20:18:54 2022 +0000

    Reland #89031 Added conv constraint that infers layouts (#89530)

    Relands #89031
    Per title. We now set strides from the fx graph only for convolutions and mm, which is a hack, but bmm in some cases caused an extra copy and there is no obvious way to fix that; we should rethink the strides anyway.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530
    Approved by: https://github.com/Chillee

commit e800d27b10137727c68cb71bccabe3a93cf38e9e
Author: William Wen <williamwen@fb.com>
Date:   Wed Nov 23 20:11:39 2022 +0000

    [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580)

    Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580
    Approved by: https://github.com/davidberard98

commit 953f39578a7019c4c34bc1dbd6cb0facb554af79
Author: Charlie West-Taylor <charliew@graphcore.ai>
Date:   Wed Nov 23 19:51:50 2022 +0000

    Mark IPU device as not supports_as_strided (#89130)

    Currently causes issues in calls to `.to`.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89130
    Approved by: https://github.com/albanD

commit 37e46a503502cdeda791cf684522ef83b5655328
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   Wed Nov 23 19:44:46 2022 +0000

    [Dynamo] Fix several bugs & code refactor in RangeVariable (#89322)

    Fix bug in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py
    ```
    E       TypeError: 'list' object cannot be interpreted as an integer
    E
    E       from user code:
    E          File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward
    E           idx = torch.LongTensor(range(y.size(0)))
    ```
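
    A hedged sketch of the failing pattern under dynamo (paraphrased from the parity-bench model; `torch._dynamo` is the API location in this timeframe):

    ```python
    import torch
    import torch._dynamo as dynamo

    def forward(y):
        idx = torch.LongTensor(range(y.size(0)))  # range(...) over a size is what tripped RangeVariable
        return y[idx]

    compiled = dynamo.optimize("eager")(forward)
    print(compiled(torch.randn(4, 3)).shape)  # expected: torch.Size([4, 3])
    ```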

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322
    Approved by: https://github.com/jansel

commit 91dcef41ae96ede3f07375c2d38cb28d534e97f8
Author: Xilun Wu <12968408+XilunWu@users.noreply.github.com>
Date:   Wed Nov 23 19:43:28 2022 +0000

    Thread PG: add allreduce to threaded pg (#89043)

    Summary:
    Goal
    Add the `all_reduce` collective to the multi-threaded ProcessGroup added in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a).

    Code Motion
    Added `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup).
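
    A minimal single-process sketch of the `all_reduce` semantics being added, using the public gloo backend rather than the threaded ProcessGroup itself (which is internal test infrastructure):

    ```python
    import torch
    import torch.distributed as dist

    dist.init_process_group(
        backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
    )
    t = torch.ones(4)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # in place; with world_size=1 the value is unchanged
    print(t)
    dist.destroy_process_group()
    ```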

    What's Next
    Add a DDP test utilizing the new allreduce op.
    Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`.

    Test Plan:
    cd fbcode/caffe2
    buck2 test mode/dev //caffe2/test/distributed:multi_threaded

    Differential Revision: D41046606

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043
    Approved by: https://github.com/wanchaol

commit 27db806888c36b029f51197a40e5196cc10792db
Author: Charlie West-Taylor <charliew@graphcore.ai>
Date:   Wed Nov 23 19:41:07 2022 +0000

    Handle Tensor.__deepcopy__ via clone(), on IPU (#89129)

    Currently it falls through to a call to `storage()`, which the IPU doesn't support.

    I've made the minimal change here for ease of merging (this'd help us if it was in for 1.13.1), however...

    **QUESTION**: Is there any reason why `not torch._C._has_storage(self)` needs to *also* be guarded on `self.device.type == privateuseone`? In other words, could the condition for using `clone` not simply be this?

    ```python
    self.is_sparse
    or self.device.type
    in ["lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"]
    or not torch._C._has_storage(self)
    or (type(self) is not Tensor and self.data_ptr() == 0)
    ```

    If the condition fails, the very next thing is a call to `self._typed_storage()` which will fail, so it feels to me like *any* case without storage shouldn't fall through to the `storage()` call.

    The original PR for adding the 'no storage and device is `PrivateUse1`' condition ([86557](https://github.com/pytorch/pytorch/pull/86557)) doesn't discuss whether this could be broadened.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89129
    Approved by: https://github.com/albanD

commit fa7a963f6536dd05c381fbf23270f4f009f9f113
Author: Sergii Dymchenko <sdym@fb.com>
Date:   Wed Nov 23 19:39:47 2022 +0000

    Remove BaseException TODO (#89540)

    After discussion in https://github.com/pytorch/pytorch/pull/88461#issuecomment-1318965664
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89540
    Approved by: https://github.com/H-Huang

commit 9eed6b7f9aa4f5fc65075de3189acc9add221660
Author: Yanbo Liang <ybliang8@gmail.com>
Date:   Wed Nov 23 19:39:43 2022 +0000

    [Dynamo] Several fixes on TensorVariable & TorchVariable (#89486)

    This is a group of bug fixes for [7k github models](https://github.com/pytorch/torchdynamo/issues/1884); it fixes 30+ model tests.
    * Support ```tensor.type()```.
    * Support ```tensor.get_device()```.
    * Support ```torch.nn.functional._Reduction.get_enum```.
    * Support ```torch._utils._get_device_index()```.
    * Fallback ```tensor.data_ptr()```.
      * ```FakeTensor``` always returns 0
      * When fake tensor propagation is off, we ```clone``` the input tensor, so it makes no sense to track the original ```data_ptr```. And I don't think this is a very popular API.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89486
    Approved by: https://github.com/jansel

commit f03e6672fb6a694d6f03980e3f34d8181c7cc663
Author: Iris <wz337@cornell.edu>
Date:   Wed Nov 23 19:39:01 2022 +0000

    [Checkpoint][2D] Minor update for dedup_tensors.py (#89542)

    Rename variables for better readability.

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89542
    Approved by: https://github.com/H-Huang

commit 74703eb50299b26082bc2a357770739a68460199
Author: Iris <wz337@cornell.edu>
Date:   Wed Nov 23 19:36:01 2022 +0000

    [Checkpoint] Add a logger to dedup_tensors (#89503)

    Add a logger to dedup_tensors to log the duplicate keys to remove in global plan (List of SavePlan).

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89503
    Approved by: https://github.com/fduwjj

commit 57353c9608263df98156a73aaa6ed35a2a2306ad
Author: Brian Hirsh <hirsheybar@fb.com>
Date:   Wed Nov 23 08:29:08 2022 -0800

    first draft of input mutation handling for aot autograd (#88817)

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/88817
    Approved by: https://github.com/ezyang, https://github.com/wconstab

commit 902e4e3926a9333178510f032580e4acd56c40da
Author: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Date:   Wed Nov 23 19:05:13 2022 +0000

    Revert "Fix the kineto daemon build condition (#89174)"

    This reverts commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd.

    Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil.

commit 049a0f2cd5916c8392c6bd1adc41c709de892f3a
Author: Bin Bao <binbao@fb.com>
Date:   Wed Nov 23 02:00:44 2022 +0000

    [inductor] Update CI model tests (#89499)

    Summary:
    1) Add model inference test
    2) Switch model training test to use AMP

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499
    Approved by: https://github.com/bertmaher

commit 95474e00a9477b1333e13fa95887a2ce05c4a6a6
Author: Jerry Zhang <jerryzh168@gmail.com>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Remove unused util code (#89272)

    Summary:
    att

    Test Plan:
    python test/test_quantization.py TestQuantizeFx

    Reviewers:

    Subscribers:

    Tasks:

    Tags:

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272
    Approved by: https://github.com/andrewor14

commit 128faf2b69f62b55d3ae1b4cb3e24ec594af0009
Author: Jerry Zhang <jerryzh168@gmail.com>
Date:   Tue Nov 22 20:29:26 2022 -0800

    [quant][be] Refactor the error checking code for quantize_per_channel op (#89271)

    Summary:
    at

    Test Plan:
    make sure it compiles

    Reviewers:

    Subscribers:

    Tasks:

    Tags:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/89271
    Approve…
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
The preexisting logic here added in
pytorch/functorch#970 was very peculiar: if top_kwargs
was non-empty, then the inner compiled function supports kwargs.  Naively, this
would leave you to expect that there is some sort of correlation between
top_kwargs and kwargs.  But in fact, they're completely unrelated!  top_kwargs
is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but
kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
But (1) we don't support this (the function to be compiled only takes a list
of tensors) and (2) even if we did support it, conditioning on whether or not
you had passed AOTAutograd configuration kwargs to support kwargs at runtime
is bonkers.

So delete it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#89664
Approved by: https://github.com/voznesenskym
facebook-github-bot deleted the gh/ezyang/1584/head branch June 8, 2023 16:34
Labels
ciflow/inductor, ciflow/trunk, release notes: functorch, topic: not user facing