
ONNX Export Support for CrossEntropyLoss #34830

Closed
wants to merge 92 commits into from

Conversation

@KsenijaS
Collaborator

KsenijaS commented Mar 16, 2020

Add ONNX export support for torch.nn.CrossEntropyLoss.

This PR makes the following changes:

  1. Updates nll_loss export
  2. Makes a post pass for SoftmaxCrossEntropy
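For context, the identity the export relies on, CrossEntropyLoss as NLLLoss applied to LogSoftmax, can be sketched in plain Python (a hedged sketch of the math only, independent of the actual ONNX symbolic implementation):

```python
import math

def log_softmax(logits):
    # numerically stable log-softmax over one row of logits
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def nll_loss(log_probs, target):
    # negative log-likelihood of the target class index
    return -log_probs[target]

def cross_entropy(logits, target):
    # CrossEntropyLoss(x, t) == NLLLoss(LogSoftmax(x), t)
    return nll_loss(log_softmax(logits), target)

loss = cross_entropy([0.0, 0.0], 0)  # two equally likely classes
```

With two equal logits, each class has probability 0.5, so the loss is ln 2.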
KsenijaS added 30 commits Feb 14, 2020
@KsenijaS
Collaborator Author

KsenijaS commented Apr 10, 2020

Could you rebase again?

@houseroad I did rebase, but now I have even more failed builds

@BowenBao
Contributor

BowenBao commented Apr 13, 2020

@houseroad the errors look unrelated.

Cloning into '.'...
ssh: connect to host github.com port 22: Operation timed out

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

exit status 128
panic: runtime error: comparing uncomparable type tar.headerError

goroutine 11 [running]:
net/http.(*Request).write(0xc42056e700, 0x55dd6f4e3660, 0xc4203a1080, 0x0, 0x0, 0x0, 0x55dd6f4e6260, 0xc420d85a60)
	/usr/local/go/src/net/http/request.go:624 +0x71d
net/http.(*persistConn).writeLoop(0xc4201ed7a0)
	/usr/local/go/src/net/http/transport.go:1825 +0x1ec
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1238 +0x981

Exited with code exit status 2
@KsenijaS
Collaborator Author

KsenijaS commented Apr 14, 2020

@houseroad I noticed this PR #33328 had the same type of failures; the common thing between the two PRs is that they both update the onnx submodule.

Member

houseroad left a comment

Please don't update the submodule; I will find some time to update it from internal.

KsenijaS added 4 commits Apr 15, 2020
This reverts commit 388372a.
This reverts commit d937bd5.
This reverts commit 71bb6cf.
This reverts commit 6544f69.
@KsenijaS
Collaborator Author

KsenijaS commented Apr 15, 2020

@houseroad some tests are failing because of the missing update to the ONNX submodule. Does this PR have to wait for the internal update to the ONNX submodule before being merged?

@KsenijaS
Collaborator Author

KsenijaS commented Apr 22, 2020

@houseroad It seems that CI failures are not caused by this PR.
I disabled the tests that were failing because of missing onnx update.
LGTM

@BowenBao
Contributor

BowenBao commented Apr 22, 2020

@houseroad please have a look. @KsenijaS is separating this into two PRs. The second PR will enable CrossEntropy tests, once ONNX submodule is updated.

Contributor

facebook-github-bot left a comment

@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -96,6 +96,22 @@
'|test_.*negative_ind.*' # negative axis is not supported yet
'|test_argmax_.*select_last_index.*' # unsupported case
'|test_argmin_.*select_last_index_.*' # unsupported case
'|test_celu.*' # cannot translate Celu op

@houseroad

houseroad Apr 24, 2020

Member

Since we don't update the submodule, we don't need to update this file.

@@ -877,6 +879,41 @@ def test_det(self):
x = torch.randn(2, 3, 5, 5, device=torch.device('cpu'))
self.assertONNX(lambda x: torch.det(x), x, opset_version=11)

@unittest.skip("disable test until onnx submodule is updated")

@houseroad

houseroad Apr 24, 2020

Member

Shall we enable the tests in test_operator.py by setting enable_onnx_checker to False?

@BowenBao

BowenBao Apr 24, 2020

Contributor

assertONNX also runs the ONNX checker.

@houseroad

houseroad Apr 24, 2020

Member

Shall we add a flag to disable that as well in assertONNX?

@houseroad

houseroad Apr 24, 2020

Member

Or we should remove the expect files :-)

@KsenijaS

KsenijaS Apr 25, 2020

Author Collaborator

@houseroad there is no clean way to disable just the ONNX checker, because export_to_pretty_string doesn't take an enable_onnx_checker flag:

def export_to_pretty_string(model, args, f, export_params=True, verbose=False, training=None,

As of now it's only possible to disable the ONNX checker for all tests or none.
Do you want the expect files removed?
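The kind of change being discussed, plumbing an enable_onnx_checker flag through the export wrapper so individual tests can opt out of validation, might look like the following sketch. The names mirror the discussion, but the bodies are stand-ins, not the real torch.onnx API:

```python
def check_model(proto):
    # stand-in for onnx.checker.check_model; the real checker validates
    # the serialized ModelProto against the ONNX spec
    if "Invalid" in proto:
        raise ValueError("model failed ONNX check")

def export_to_pretty_string(model, args, f=None, export_params=True,
                            verbose=False, training=None,
                            enable_onnx_checker=True):
    # stand-in for the real export; only the flag plumbing is the point
    proto = f"graph {model}({args})"
    if enable_onnx_checker:
        check_model(proto)
    return proto

checked = export_to_pretty_string("Net", "x")
# a test for an op the checker rejects could opt out per-call:
unchecked = export_to_pretty_string("Invalid", "x", enable_onnx_checker=False)

try:
    export_to_pretty_string("Invalid", "x")  # default: checker runs and raises
    checker_raised = False
except ValueError:
    checker_raised = True
```

This keeps the checker on by default while letting individual tests (such as the skipped CrossEntropy ones) export without validation until the submodule is updated.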

Contributor

facebook-github-bot left a comment

@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

facebook-github-bot commented Apr 26, 2020

@houseroad merged this pull request in 92e91ce.

zasdfgbnm added a commit that referenced this pull request Apr 28, 2020
commit 4cf03b29b7d39374ec2c424fd85c1815b4b27eb6
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Tue Apr 28 10:16:53 2020 -0700

    replace hypot with sqrt

commit 4cc4837dae4d14efd133b51fcc796a1f2d14771f
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 13:00:03 2020 -0700

    revert white space change

commit 0683bf1b743babd8507d985b7a1096962301a184
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 12:57:14 2020 -0700

    fix copy

commit b9e08bd42580d21acc2263e0212eb599612d7521
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 12:39:27 2020 -0700

    remove include of complex

commit 13d3df4816bec7a23469e088df79b717c209373a
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 12:38:32 2020 -0700

    fix scalar constructor

commit 2f5293d4bee8f1af7513adda922e8fe6961a2774
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 12:27:18 2020 -0700

    resolve review nits

commit 8bf035de3430a8af7837f43c392bd2e48136d2a6
Merge: a15f4d546c 201ba13911
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 12:23:15 2020 -0700

    Merge branch 'master' of github.com:pytorch/pytorch into hacking-dispatch

commit a15f4d546c5c3e5633fbad9c1d6d83a321ba86d0
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 12:16:44 2020 -0700

    revert all wrap changes

commit aac470d29e00f3bceedd2ec36c224ab4e75b34ab
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 12:06:54 2020 -0700

    fix

commit 3c5dd3130ef249222eb78f526c6ac59fb237de9d
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 11:35:38 2020 -0700

    revert white space change

commit 285d7c7d63f19d4a476511555505e734c0d47223
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 11:19:57 2020 -0700

    fix warning

commit 38fe795e80097bd7ce5638e2203978db4a8fefdb
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 11:17:27 2020 -0700

    USE_C10_COMPLEX

commit 201ba139115adeecae4f094a9c9790200e53ff99
Author: Parth Agarwal <iparthagarwal@gmail.com>
Date:   Mon Apr 27 11:11:35 2020 -0700

    Correct $ANDROID_HOME string empty check (#37064)

    Summary:
    Updated the file to correct the shell code that tests whether the $ANDROID_HOME env variable is empty.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37064

    Differential Revision: D21181787

    Pulled By: IvanKobzarev

    fbshipit-source-id: 40c1d79d0fb730c7f68aa7472ce9b2398e91f2a2

commit 805c417ec94ad13b3d974a6f23d85bf69e9ffdb5
Author: Xiao Wang <24860335+xwang233@users.noreply.github.com>
Date:   Mon Apr 27 10:59:32 2020 -0700

    Implement avg_pool2d kernel for channels_last (#35855)

    Summary:
    Implement avg_pool2d for channels_last. This will close https://github.com/pytorch/pytorch/issues/34996.

    Performance compared with **avg_pool2d** contiguous can be found at https://github.com/xwang233/code-snippet/blob/ed6617c6bc48dac5757d9a1ca6f5db5a68e5d01b/avg-pool2d-channels-last/avg-pool2d-naive.ipynb

    cc csarofeen ptrblck
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/35855

    Differential Revision: D21187360

    Pulled By: VitalyFedyunin

    fbshipit-source-id: b654b56168bc3982be306b634c7ed2f92018a9e5

commit ec8006cc1635a088aae36aa9263bc85140d9aa6e
Author: mattip <matti.picus@gmail.com>
Date:   Mon Apr 27 10:58:01 2020 -0700

    [ONNX] fix provider_version and add consistency test (#36797)

    Summary:
    forward port the test from pr gh-36795, xref issue gh-32561
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/36797

    Differential Revision: D21257034

    Pulled By: ezyang

    fbshipit-source-id: d217da0e74f00a433c904defc0bf3eb5f594fd5e

commit 0048243f70f37a3ae74725fb21c88704d3ab62bb
Author: Lukas Koestler <lkskstlr@gmail.com>
Date:   Mon Apr 27 10:46:07 2020 -0700

    Check compiler -v to determine compiler (fix #33701) (#37293)

    Summary:
    As described in the issue (https://github.com/pytorch/pytorch/issues/33701), the compiler check
    for building cpp extensions does not work with ccache. In this case we
    check `compiler -v` to determine which compiler is actually used, and
    check it.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37293

    Differential Revision: D21256913

    Pulled By: ezyang

    fbshipit-source-id: 5483a10cc2dbcff98a7f069ea9dbc0c12b6502dc

commit 6d409481b38692926930278002b50d2075396557
Author: Gao, Xiang <qasdfgtyuiop@gmail.com>
Date:   Mon Apr 27 10:29:07 2020 -0700

    Add overloads of std:: math functions for c10::complex (#35725)

    Summary:
    Issue: https://github.com/pytorch/pytorch/issues/35284

    ~This depends on and contains https://github.com/pytorch/pytorch/pull/35524. Please review after the dependency gets merged and I will rebase to get a clean diff.~

    The implementation of most functions follow the pattern

    ```C++
    template<typename T>
    C10_HOST_DEVICE c10::complex<T> some_function(c10::complex<T> x) {
    #if defined(__CUDACC__) || defined(__HIPCC__)
      return static_cast<c10::complex<T>>(thrust::some_function(static_cast<thrust::complex<T>>(x)));
    #else
      return static_cast<c10::complex<T>>(std::some_function(static_cast<std::complex<T>>(x)));
    #endif
    }
    ```
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/35725

    Differential Revision: D21256854

    Pulled By: ezyang

    fbshipit-source-id: 2112ba6b79923450feafd7ebdc7184a3eaecadb6

commit a08a9f3b8222bc438ebdac86ecc44c1793d83c6b
Author: Ryad ZENINE <r.zenine@gmail.com>
Date:   Mon Apr 27 10:19:23 2020 -0700

    Enable uint8 upsampling 2 (#35029)

    Summary:
    Hi everyone,

    This is a super small PR to enable `uint8` support for `nearest` up-sampling in `cpu` and `cuda`.
    This work enables us to move forward with the support of `uint8` images in `torchvision`.

    See impacted issues :
    https://github.com/pytorch/vision/issues/1375
    https://github.com/pytorch/vision/issues/1179#issuecomment-558197607

    Note: I wanted to add a unit test to ensure we have the expected behavior. I could not locate the `upsampling` unit tests for `nearest`. I can add the test if you point me to the right location.

    Thanks
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/35029

    Reviewed By: cpuhrsch

    Differential Revision: D21227144

    Pulled By: fmassa

    fbshipit-source-id: 33c4b5188dedd8f7f872e9d797e2a9b58ee7315c

commit 5c9d1e48242587a9b1958df2d2efea3472072f4f
Author: Xingying Cheng <xcheng16@fb.com>
Date:   Mon Apr 27 10:16:59 2020 -0700

    Propagate module lints for mobile scripted module. (#37046)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37046
    ghstack-source-id: 102669259

    Creating a python api entry to generate mobile model lints which takes a scripted module as argument and returns a map of module lints.

    The initial version is to create placeholder which included module bundled input as the first lint instance. More lints will be added in the future.

    Test Plan: python test/test_optimizer.py

    Reviewed By: dreiss

    Differential Revision: D21164648

    fbshipit-source-id: 9e8f4e19d74b5464a55cc73b9dc18f358c5947d6

commit 5b9f7f7b0e205a6d8d5f2e61f558eee378f0ce40
Author: Mo Zhou <cdluminate@gmail.com>
Date:   Mon Apr 27 09:34:52 2020 -0700

    [cmake] Add USE_SYSTEM_{GLOO,FP16,PTHREADPOOL,PSIMD,FXDIV,BENCHMARK} options (#14699) (#37277)

    Summary:
    These options are disabled by default, and are supposed to be used by
    linux distro developers. With the existing shortcut option
    USE_SYSTEM_LIBS toggled, these new options will be enabled as well.

    Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should
    no longer check the existence of git submodules.

    ezyang
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277

    Differential Revision: D21256999

    Pulled By: ezyang

    fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf

commit 3a0ff3cd2f04fcf3d4f6d152ab0772f048375cb6
Author: peterjc123 <peterghost86@gmail.com>
Date:   Mon Apr 27 08:28:56 2020 -0700

    Generate environment restore script for Windows build jobs (#37319)

    Summary:
    for better debugging purposes
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37319

    Differential Revision: D21257011

    Pulled By: ezyang

    fbshipit-source-id: 41c7f1aa440f3ea626536b64392cca32f7c32dd3

commit 007163407cd68a5131c159c2944bfb772ec913d4
Author: Mo Zhou <cdluminate@gmail.com>
Date:   Mon Apr 27 08:14:39 2020 -0700

    [cmake] Support "Generic" BLAS (#14699) (#37276)

    Summary:
    The "Generic" BLAS refers to the Netlib BLAS. This option is meaningful
    to the Debian family due to the "update-alternatives" mechanism, which
    enables the user to switch the libblas.so providers between different
    implementations at runtime, such as ATLAS, OpenBLAS, and Intel MKL.
    As such, building against the generic BLAS provides much flexibility.

    This new option is not documented in setup.py because it's only supposed
    to be used by Linux distro (especially Debian family) developers.

    ezyang
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37276

    Differential Revision: D21256877

    Pulled By: ezyang

    fbshipit-source-id: 55a5356653a1cfc763a5699b04afe5938f2007ec

commit 22ac071d9a173ea2358dd7c88a3a47fb1e2a2fe1
Author: Pavel Izmailov <izmailovpavel@gmail.com>
Date:   Mon Apr 27 07:39:50 2020 -0700

    Add SWA to PyTorch mainline (#35032)

    Summary:
    This PR is based on the issue https://github.com/pytorch/pytorch/issues/29994#issue-524418771 and the discussion in the previous version of the PR https://github.com/pytorch/pytorch/pull/30559. Specifically, I followed the interface outlined in this [comment](https://github.com/pytorch/pytorch/pull/30559#issuecomment-574864768).

    ## Structure
    - `torch/optim/swa_utils.py` contains the implementation of  `AveragedModel` class, `SWALR` learning rate scheduler and `update_bn` utility
    - `test/test_optim.py` contains unit tests for the three components of SWA
    - `torch/optim/swa_utils.pyi` describes the interface of `torch/optim/swa_utils.py`

    The new implementation consists of
    - `AveragedModel` class; this class creates a copy of a given model and allows to compute running averages of the parameters.
    - `SWALR` learning rate scheduler; after a certain number of epochs switches to a constant learning rate; this scheduler is supposed to be chained with other schedulers.
    - `update_bn` utility; updates the Batch Normalization activation statistics for a given model and dataloader; this utility is meant to be applied to `AveragedModel` instances.

    For `update_bn` I simplified the implementation compared to the [original PR](https://github.com/pytorch/pytorch/pull/30559) according to the suggestions by vadimkantorov.

    ## Example
    ```python
    loader, optimizer, model = ...
    swa_model = torch.optim.swa_utils.AveragedModel(model)
    # You can use custom averaging functions with `avg_fun` parameter
    ema_avg = lambda p_avg, p, n_avg: 0.1 * p_avg + 0.9 * p
    ema_model = torch.optim.swa_utils.AveragedModel(model,
                                        avg_function=ema_avg)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                        T_max=300)
    swa_start = 160
    swa_scheduler = SWALR(optimizer, start_epoch=swa_start, swa_lr=0.05)

    for i in range(300):
         for input, target in loader:
             optimizer.zero_grad()
             loss_fn(model(input), target).backward()
             optimizer.step()
             scheduler.step()
             swa_scheduler.step()

         if i > swa_start:
             swa_model.update_parameters(model)

    # Update bn statistics for the swa_model at the end
    torch.optim.swa_utils.update_bn(loader, swa_model)
    ```

    UPDATED:
    ```python3
    loader, optimizer, model, loss_fn = ...
    swa_model = torch.optim.swa_utils.AveragedModel(model)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
    swa_start = 160
    swa_scheduler = SWALR(optimizer, swa_lr=0.05)
    for i in range(300):
         for input, target in loader:
             optimizer.zero_grad()
             loss_fn(model(input), target).backward()
             optimizer.step()
         if i > swa_start:
             swa_model.update_parameters(model)
             swa_scheduler.step()
         else:
             scheduler.step()

    # Update bn statistics for the swa_model at the end
    torch.optim.swa_utils.update_bn(loader, swa_model)
    ```

    Fixes https://github.com/pytorch/pytorch/issues/29994
    cc soumith vincentqb andrewgordonwilson vadimkantorov
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/35032

    Differential Revision: D21079606

    Pulled By: vincentqb

    fbshipit-source-id: e07f5e821f72ada63789814c2dcbdc31f0160c37
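The parameter-averaging arithmetic behind the SWA commit above can be checked in plain Python. The sketch below assumes the default `avg_fn` of `AveragedModel` is an equal-weight running mean, avg <- avg + (p - avg) / (n + 1), and reproduces the custom EMA variant from the commit's example; it is a sketch of the formulas only, not the torch implementation:

```python
def running_average(params):
    # equal-weight running mean: avg <- avg + (p - avg) / (n + 1)
    avg, n = None, 0
    for p in params:
        avg = p if avg is None else avg + (p - avg) / (n + 1)
        n += 1
    return avg

def ema(params, decay=0.9):
    # the custom averaging function from the example:
    # new_avg = 0.1 * avg + 0.9 * p
    avg = params[0]
    for p in params[1:]:
        avg = (1 - decay) * avg + decay * p
    return avg
```

For parameters snapshotted at three epochs, the running mean is simply their arithmetic mean, which is what makes the averaged model a flat-minima ensemble of the SGD iterates.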

commit 828d590b06109f1ed1ab5d5e7fc6601aae4af198
Author: Jeff Daily <jeff.daily@amd.com>
Date:   Mon Apr 27 06:48:05 2020 -0700

    [ROCm] Update to ROCm 3.3 (#37247)

    Summary:
    CC ezyang .

    ROCm 3.3 packages went live on 2020-04-01.  Tag 376 was pushed on 2020-04-15, so it should be based on ROCm 3.3.

    The upgrade to ROCm 3.3 is required as part of the effort to stabilize ROCm CI.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37247

    Differential Revision: D21256198

    Pulled By: ezyang

    fbshipit-source-id: 92ac21c0122eda360ec279d2c3d462c3e6bf4646

commit f41742ff2fd5c9507c037dc120d75f6f191a87b1
Author: Wanchao Liang <wanchaol@users.noreply.github.com>
Date:   Sun Apr 26 22:18:55 2020 -0700

    [autograd] remove spinning for dist engine (#36606)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/36606

    This PR refactors the continuation logic of the async mode on the autograd
    engine, to avoid launching spinning work. To achieve that, it:
    1. removes the continuation logic in
    execute_graph_task_with_continuiation
    2. separates the usage of execute_graph_task between dist_engine and
    the local engine; now dist_engine universally uses
    `execute_graph_task_until_ready_queue_empty` (a better name appreciated
    here)
    3. removes enqueue_blocked_task_on_cpu
    4. removes the async mode in `execute_with_graph_task`, as we don't need
    to use it in dist_engine

    Test Plan: Imported from OSS

    Differential Revision: D21032731

    Pulled By: wanchaol

    fbshipit-source-id: 708ea3bc14815bdc151b56afa15eb85b4ac0f4b1

commit ed9ec3c96fdc9656c5bac144887c312a0168469e
Author: Wanchao Liang <wanchaol@users.noreply.github.com>
Date:   Sun Apr 26 22:18:55 2020 -0700

    [autograd] refactor some functions (#37061)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37061

    This PR refactors:
    1. moves `set_device` out of Engine
    2. puts `graph_task_completed` into GraphTask
    3. puts `mark_graph_task_completed` into GraphTask

    This also makes it easy for the distributed engine to call those functions.

    Test Plan: Imported from OSS

    Differential Revision: D21188688

    Pulled By: wanchaol

    fbshipit-source-id: f56106e6ed7d966cfa4d962781c7865cc3c5321d

commit 47fec01c45c696e247aff9e910f29a9586ae0869
Author: lixinyu <lixinyu@devgpu175.prn2.facebook.com>
Date:   Sun Apr 26 10:57:53 2020 -0700

    Fix cpp extension compile failure on some envs (#37221)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37221

    Test Plan: Imported from OSS

    Differential Revision: D21226873

    Pulled By: glaringlee

    fbshipit-source-id: 0a390bbeaf153ee5ec355943f92c2dbcc5e04b59

commit b428f454e13f6e8055124ea19c32b554017137d0
Author: Mike Ruberry <mruberry@fb.com>
Date:   Sun Apr 26 04:25:28 2020 -0700

    Revert D18927220: if_constexpr for C++14

    Test Plan: revert-hammer

    Differential Revision:
    D18927220

    Original commit changeset: 19a135e00af6

    fbshipit-source-id: a1b8755a27903b98b742881b3ecce4f5e99543b2

commit b64fc3c4b5d927928770f9b343eb845123367084
Author: Mike Ruberry <38511765+mruberry@users.noreply.github.com>
Date:   Sat Apr 25 21:16:50 2020 -0700

    Changes warnings generated in cpp to show point of Python origination (#36052)

    Summary:
    Today in PyTorch, warnings triggered in C++ are printed to Python users like this:

    `../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.`

    This may be unhelpful to Python users, who have complained it's difficult to relate these messages back to their programs. After this PR, warnings that go through the PyWarningHandler and allow it to add context are printed like this:

    ```
    test/test_torch.py:16463: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:81.)
      cpu_result = getattr(cpu_tensor, op_str)(*cpu_args)
    ```

    This relates the warning back to the user's program. The information about the cpp file and line number is preserved in the body of the warning message.

    Some warnings, like those generated in the JIT, already account for a user's Python context, and so they specify that they should be printed verbatim and are unaffected by this change. Warnings originating in Python and warnings that go through c10's warning handler, which prints to cerr, are also unaffected.

    A test is added to test_torch.py for this behavior. The test relies on uint8 indexing being deprecated and its warning originating from its current header file, which is an unfortunate dependency. We could implement a `torch.warn` function, instead.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/36052

    Differential Revision: D20887740

    Pulled By: mruberry

    fbshipit-source-id: d3515c6658a387acb7fccaf83f23dbb452f02847
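The behavior change in the commit above, attributing a warning to the user's Python frame rather than the C++ call site, is analogous to the `stacklevel` mechanism in Python's own `warnings` module, sketched here (illustrative only; the actual change lives in the C++ PyWarningHandler):

```python
import warnings

def deprecated_div(a, b):
    # stacklevel=2 attributes the warning to the *caller's* line,
    # the way the patched handler points at user code instead of
    # the internal frame that triggered the warning
    warnings.warn("integer division is deprecated", UserWarning, stacklevel=2)
    return a // b

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = deprecated_div(7, 2)
```

The recorded warning's filename and line number point at the `deprecated_div(7, 2)` call, which is the property the PR restores for warnings originating in C++.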

commit f8ec51bd865bb488dc0c30f1e970c5dc49ce4727
Author: Peter Bell <peterbell10@live.co.uk>
Date:   Sat Apr 25 20:55:28 2020 -0700

    Ensure DataParallel replicas can be saved (#37307)

    Summary:
    Fixes https://github.com/pytorch/pytorch/issues/37182

    The `zero_grad` wrapper from `_replicate_for_data_parallel` can't be pickled. So instead, I set an attribute `_is_replica = True` and check for this in `Module.zero_grad`.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37307

    Differential Revision: D21246119

    Pulled By: mrshenli

    fbshipit-source-id: 4755786d48a20bc247570ba672de9dd526914ce1
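The fix described above, a plain `_is_replica` attribute checked inside the method instead of an instance-level wrapper function, sidesteps a general pickling limitation that can be reproduced in isolation (a sketch with a stand-in class, not the real `nn.Module`):

```python
import pickle

class Module:
    def zero_grad(self):
        # the flag approach: replicas skip zeroing inside the method itself
        if getattr(self, "_is_replica", False):
            return "skipped (replica)"
        return "zeroed"

def replicate_with_wrapper(m):
    r = Module()
    r.zero_grad = lambda: "skipped"   # instance-level closure: not picklable
    return r

def replicate_with_flag(m):
    r = Module()
    r._is_replica = True              # plain attribute: pickles fine
    return r

# compare only the instance state, which is what pickle must serialize
wrapper_state = replicate_with_wrapper(Module()).__dict__
flag_state = replicate_with_flag(Module()).__dict__

try:
    pickle.dumps(wrapper_state)       # contains a lambda -> fails
    wrapper_pickles = True
except (pickle.PicklingError, AttributeError, TypeError):
    wrapper_pickles = False

flag_bytes = pickle.dumps(flag_state)
status = replicate_with_flag(Module()).zero_grad()
```

The flag keeps the replica's `__dict__` free of function objects, so `pickle.dumps` succeeds while `zero_grad` still behaves differently on replicas.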

commit 2b050371b4cecd9c12b5f763e6867ff1c1019aab
Author: Omkar Salpekar <osalpekar@fb.com>
Date:   Sat Apr 25 20:11:33 2020 -0700

    Make listenLoopInternal non-virtual (#37265)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37265

    In PGA, `listenLoopInternal` should not be virtual - PGA doesn't have any child classes that override this. Re-arranged some comments for `listenLoop` as well.
    ghstack-source-id: 102880792

    Test Plan: Sandcastle/CI

    Differential Revision: D21238761

    fbshipit-source-id: 5ec5058bc462182cf970faca9a734c11c7be2a32

commit d98ea604f4c31f86b2afe1afd96f283ef77c4da2
Author: Omkar Salpekar <osalpekar@fb.com>
Date:   Sat Apr 25 19:22:51 2020 -0700

    Improve Error Message for Dist Autograd Context Cleanup Failure (#37255)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37255

    Improved the error message logged when Distributed Autograd Context cleanup fails - added node information and the underlying error. The previous error message also assumed that the cause of the error was too many RPCs failing, but this is not necessarily the case.
    ghstack-source-id: 102867620

    Test Plan: Ensuring Sandcastle/CI tests pass. Verified the correct message is logged when this code path is executed in `test_backward_node_failure` and `test_backward_node_failure_python_udf` .

    Differential Revision: D20950664

    fbshipit-source-id: 267318187b7ef386930753c9679a5dfab6d87018

commit b198796a2810ebd7fdefec3389c17be47ba6a6ce
Author: Zafar <cc.rafaz@zafar.cc>
Date:   Sat Apr 25 18:19:03 2020 -0700

    [quant] quantized reflection_pad1d (#36450)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36450

    Test Plan: Imported from OSS

    Differential Revision: D20984967

    Pulled By: z-a-f

    fbshipit-source-id: 4731f16ba05a6aa57636d9ab85f12dfdeebcf08d

commit 7604f470ed083d55c6a25bee3f995c7e71ea488f
Author: Yinghai Lu <yinghai@fb.com>
Date:   Sat Apr 25 18:05:21 2020 -0700

    Add weight info in debug_ssa_net (#37262)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37262

    It's convenient to have weight info in the debug_ssa_net so that we can tell which are weights and which are primary inputs. We can get their shape and size info easily with some post-processing script.

    Reviewed By: ChunliF

    Differential Revision: D21237537

    fbshipit-source-id: 1fadc605283ef2eed78c44494e062a16ccf135ab

commit 92e91cee8dc9d78314308ace125022835fcbc0c9
Author: Ksenija Stanojevic <ksenija.stanojevic@gmail.com>
Date:   Sat Apr 25 17:54:57 2020 -0700

    ONNX Export Support for CrossEntropyLoss (#34830)

    Summary:
    Add ONNX export support for torch.nn.CrossEntropyLoss.

    This PR makes following changes:
    1. Updates nll_loss export
    2. Makes a post pass for SoftmaxCrossEntropy
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/34830

    Reviewed By: hl475

    Differential Revision: D21230712

    Pulled By: houseroad

    fbshipit-source-id: c81911a41968e23813ba10274340ce4d8ba1ed78

commit 205c6ffbc5febd27b810c37e1bfae50b9655f8e4
Author: Zafar <cc.rafaz@zafar.cc>
Date:   Sat Apr 25 17:04:23 2020 -0700

    [quant] Generalizing _calculate_dynamic_qparams in quantized test (#36449)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36449

    Test Plan: Imported from OSS

    Differential Revision: D20984966

    Pulled By: z-a-f

    fbshipit-source-id: 17437297adae813bc5c6fa43c6c7514f72ce2f6c

commit ca39f99d48a6fc43384a86ecf745df40f038d21f
Author: Haixin Liu <haixin@fb.com>
Date:   Sat Apr 25 16:44:13 2020 -0700

    [Pytorch Numeric Suite] Add module level comparison (#37242)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37242

    Add module level comparison API.
    ghstack-source-id: 102853727

    Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub'

    Reviewed By: raghuramank100

    Differential Revision: D21232277

    fbshipit-source-id: de707eea101a66a37869129460274c56e4e07db2

commit a04022c656516c08e3719628f39ac47a9328155a
Author: Nikita Shulga <nshulga@fb.com>
Date:   Sat Apr 25 15:53:00 2020 -0700

    Use `std::chrono::high_resolution_clock` for profiling on Mac (#37280)

    Summary:
    According to Darwin man-page:
        `CLOCK_REALTIME` the system's real time (i.e. wall time) clock, expressed as the amount of time since the Epoch.  This is the same as the value returned by `gettimeofday`(2).

    I.e., it returns a timestamp with microsecond resolution, as can be observed by running the following small program:
    ```
    #include <sys/time.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    bool conseq_time(clockid_t c) {
      struct timespec t1, t2;
      clock_gettime(c, &t1);
      clock_gettime(c, &t2);
      printf("t1={.tv_sec=%ld, .tv_nsec=%ld}\n", t1.tv_sec, t1.tv_nsec);
      printf("t2={.tv_sec=%ld, .tv_nsec=%ld}\n", t2.tv_sec, t2.tv_nsec);
      bool rc = t1.tv_sec == t2.tv_sec && t1.tv_nsec == t2.tv_nsec;
      printf("Two timestamps are %sequal\n", rc ? "" : "not ");
      return rc;
    }

    int main(void) {
      printf("using CLOCK_REALTIME\n");
      conseq_time(CLOCK_REALTIME);
      printf("using CLOCK_MONOTONIC_RAW\n");
      conseq_time(CLOCK_MONOTONIC_RAW);
      return 0;
    }
    ```
    which if compiled outputs something like:
    ```
    using CLOCK_REALTIME
    t1={.tv_sec=107519, .tv_nsec=860315000}
    t2={.tv_sec=107519, .tv_nsec=860315000}
    Two timestamps are equal
    using CLOCK_MONOTONIC_RAW
    t1={.tv_sec=107520, .tv_nsec=954297363}
    t2={.tv_sec=107520, .tv_nsec=954297426}
    Two timestamps are not equal
    ```

    But why do it, if all this platform specific logic is already nicely abstracted in `std::chrono::`:
    https://github.com/llvm/llvm-project/blob/master/libcxx/src/chrono.cpp#L117
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37280

    Differential Revision: D21246608

    Pulled By: malfet

    fbshipit-source-id: 6beada30657a2720000e34214b1348112e55be50

commit 59052e39b8daa12a7243ac9e0bbd6714a4fdb861
Author: Zafar <cc.rafaz@zafar.cc>
Date:   Sat Apr 25 15:50:38 2020 -0700

    [quant] qtensor resize (#36442)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36442

    Test Plan: Imported from OSS

    Differential Revision: D20984080

    Pulled By: z-a-f

    fbshipit-source-id: 7fcf24bd2f92f038b670f510118b012d8c7acc74

commit bf860a4ebafbfcb75e61a8603bded72f6d0b0970
Author: Mike Ruberry <38511765+mruberry@users.noreply.github.com>
Date:   Sat Apr 25 15:34:39 2020 -0700

    Adds missing documentation . (#37295)

    Summary:
    Fixes torch.isclose documentation missing a `.`.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37295

    Differential Revision: D21245426

    Pulled By: mruberry

    fbshipit-source-id: 88ce57ed68c2eac6aa83932780a6ba30e9fa69ea

commit 34284c127930dc12d612c47cab44cf09b432b522
Author: Raghuraman Krishnamoorthi <raghuraman@fb.com>
Date:   Sat Apr 25 14:50:40 2020 -0700

    Fix NaN error in dynamic quantization in qLinear, re-enable test_quantized_rnn (#36009)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/36009

    When scale is very small (less than float eps, but greater than minimum double precision value), computation of reciprocal of scale in floating point precision within FBGEMM returns inf, while QuantUtils does not. Changed computation in QuantUtils to occur with floating point precision to re-enable tests.
    ghstack-source-id: 102896302

    Test Plan:
    buck test caffe2/test:quantization -- 'test_quantized_rnn \(quantization\.test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details --run-disabled
    Summary (total time 59.91s):
      PASS: 1
      FAIL: 0
      SKIP: 0
      FATAL: 0
      TIMEOUT: 0
      OMIT: 0

    Differential Revision: D20853000

    fbshipit-source-id: 948a888f5516b3ba9c6efb7de31ef2cc9d431991
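
    The float-vs-double reciprocal mismatch described above can be reproduced with a
    short numpy sketch (illustrative only; this is not the FBGEMM or QuantUtils code):

```python
import numpy as np

# A scale this small is representable in float64 and as a float32
# subnormal, but its reciprocal (~1e39) exceeds float32's maximum
# finite value (~3.4e38).
scale = 1e-39

# Reciprocal computed in double precision stays finite.
recip_double = np.float64(1.0) / np.float64(scale)

# The same reciprocal computed in single precision overflows to inf.
with np.errstate(over="ignore"):
    recip_single = np.float32(1.0) / np.float32(scale)
```

    Two code paths computing the reciprocal at different precisions therefore
    disagree (finite vs. inf) for scales in this range, which is the inconsistency
    the commit removes.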

commit 84a31fb4e7fb1b5dbe9e42f5e1e30be4a0440189
Author: Mike Ruberry <mruberry@fb.com>
Date:   Sat Apr 25 14:20:33 2020 -0700

    Revert D18927221: Boxing uses if_constexpr instead of SFINAE

    Test Plan: revert-hammer

    Differential Revision:
    D18927221

    Original commit changeset: 70d99025b45e

    fbshipit-source-id: a4b650bbb6d76dda6086d88eb554f3c3077b0f76

commit c90955e3d12391bb7ad22fb9a22eba8f768267a4
Author: James Reed <jamesreed@fb.com>
Date:   Sat Apr 25 13:53:12 2020 -0700

    [profiler] Sort by end interval as well when parsing CPU trace (#37297)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37297

    Test Plan: Imported from OSS

    Reviewed By: ngimel

    Differential Revision: D21245463

    Pulled By: jamesr66a

    fbshipit-source-id: 8d307eaa32fa960b93dfd9a3b0b4c767fd903094

commit ea741f829e825e8ff87ed67cd80a71d65fbb9c73
Author: Nikita Shulga <nshulga@fb.com>
Date:   Sat Apr 25 13:51:10 2020 -0700

    Add `--repeat` option to python unit-test (#37281)

    Summary:
    This would run same testsuite (or individual test) multiple time
    Useful for detecting flaky tests

    Example usage: `python test_autograd.py TestAutograd.test_profiler -v --repeat=100`
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37281

    Differential Revision: D21244442

    Pulled By: malfet

    fbshipit-source-id: 3ecafec7ae87bc1e418aa28151bbc472ef37a713
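
    A flag like this can be approximated with stock `unittest` by loading the same
    test case into a suite multiple times (a minimal sketch with a hypothetical
    `repeated_suite` helper, not the actual PR implementation):

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_profiler_like(self):
        self.assertTrue(True)

def repeated_suite(case, repeat):
    # Load the same TestCase `repeat` times into one suite, so a flaky
    # test gets many chances to fail in a single run.
    suite = unittest.TestSuite()
    for _ in range(repeat):
        suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(case))
    return suite

suite = repeated_suite(ExampleTest, 100)
```

    Running `unittest.TextTestRunner().run(suite)` then executes the test 100 times,
    similar in spirit to `--repeat=100`.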

commit 44345ad08c0aefcae400b948635f980c907f0f49
Author: Nikita Shulga <nshulga@fb.com>
Date:   Sat Apr 25 13:50:50 2020 -0700

    Do not define C10_IOS on Mac (#37283)

    Summary:
    Because MacOS is not iOS
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37283

    Test Plan: CI

    Differential Revision: D21244398

    Pulled By: malfet

    fbshipit-source-id: b822e216e83887e2f2961b5c5384eaf749629f61

commit cb27067b321dacbc8fd94d9a4b85c62d4244edbf
Author: Negin Raoof <neginmr@utexas.edu>
Date:   Sat Apr 25 12:21:03 2020 -0700

    [ONNX] Remove inverse op (#37005)

    Summary:
    ONNX inverse op is being removed.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37005

    Reviewed By: hl475

    Differential Revision: D21230728

    Pulled By: houseroad

    fbshipit-source-id: 7e10414918c57938cda4ca03875c070319d429fb

commit b18f57e5480ce4461c7583d66188357c635e2cbc
Author: Sebastian Messmer <messmer@fb.com>
Date:   Sat Apr 25 11:29:38 2020 -0700

    Boxing uses if_constexpr instead of SFINAE (#31092)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/31092

    -
    ghstack-source-id: 102878439

    Test Plan: unit tests

    Reviewed By: ezyang

    Differential Revision: D18927221

    fbshipit-source-id: 70d99025b45edfaef11a0d587cf8bf8e749df6b8

commit f5e6f1f333b98a596daef9f277cb7f915de91c75
Author: Sebastian Messmer <messmer@fb.com>
Date:   Sat Apr 25 11:29:38 2020 -0700

    if_constexpr for C++14 (#31091)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/31091

    This implements a C++17 "if constexpr" like feature for C++14.
    This can be used, for example, to replace SFINAE or to force the compiler to remove some parts of a function in the assembly based on a condition.
    PRs stacked on top will use this to simplify some of our template metaprogramming.
    ghstack-source-id: 102867141

    Test Plan: unit tests

    Differential Revision: D18927220

    fbshipit-source-id: 19a135e00af6ebb0139ce3730353762d4512158f

commit 04b36fc264c63d31e481166c675935b1d99afc5e
Author: Bram Wasti <bwasti@fb.com>
Date:   Sat Apr 25 09:59:06 2020 -0700

    [TensorExpr] rfactor implementation (#36237)

    Summary:
    A similar interface to Halide's rfactor: https://halide-lang.org/tutorials/tutorial_lesson_18_parallel_associative_reductions.html
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/36237

    Reviewed By: zheng-xq

    Differential Revision: D21233309

    Pulled By: bwasti

    fbshipit-source-id: d2706a9e90b707ee195e339f834ff4a54b63a256

commit c52deb694ed9a5e18520a81be07a249fd9a70567
Author: Shen Li <cs.shenli@gmail.com>
Date:   Sat Apr 25 09:33:11 2020 -0700

    Consolidate usage on torch::jit::toPyObject in RPC request_callback (#37249)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37249

    Test Plan: Imported from OSS

    Differential Revision: D21234990

    Pulled By: mrshenli

    fbshipit-source-id: d07210151342bd2ad12d1364d9f22817ee59b0c2

commit 3d934c3d36f8967d79016b36e3cc7b9c2ffa6821
Author: Shen Li <cs.shenli@gmail.com>
Date:   Sat Apr 25 09:33:11 2020 -0700

    Add using torch::utils::Future to simplify code in RRefContext (#36811)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36811

    Test Plan: Imported from OSS

    Differential Revision: D21093846

    Pulled By: mrshenli

    fbshipit-source-id: 61a6b1483ef1533803a18bec216ebe82aa187458

commit 269ec9a139d381605fa898539670163a92d0d107
Author: Shen Li <cs.shenli@gmail.com>
Date:   Sat Apr 25 09:33:11 2020 -0700

    Prevent RRef.to_here() to block an RPC thread on the callee using Future callbacks (#36805)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36805

    Test Plan: Imported from OSS

    Differential Revision: D21093847

    Pulled By: mrshenli

    fbshipit-source-id: 81b0934874af36e03329fe6176628e3aca12811f

commit 6e1e55c1344400f1a38b3e2a2a40f96816cf81d3
Author: Shen Li <shenli@devfair017.maas>
Date:   Sat Apr 25 09:33:11 2020 -0700

    Prevent RRef unpickle to block waiting for OwnerRRef creation (#36785)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/36785

    Currently, RRef unpickle (both Python and TorchScript) will block
    until the OwnerRRef has been created by the original `rpc.remote`
    call, if it is an OwnerRRef. This is not ideal as the correctness
    would then depend on the number-of-threads configuration. This
    commit changed that behavior. Both `rpc.remote` and the unpickle
    can create OwnerRRefs. More specifically, whichever one arrives
    first will create the OwnerRRef and the subsequent ones will
    retrieve the same OwnerRRef, so that no one is blocking.

    Test Plan: Imported from OSS

    Differential Revision: D21083089

    Pulled By: mrshenli

    fbshipit-source-id: 34ef063d50549b01c968b47815c4fe9fac179d3d
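
    The non-blocking "whichever arrives first creates the OwnerRRef" behavior is
    essentially a locked get-or-create; a minimal sketch of the pattern (hypothetical
    names, not the RPC implementation):

```python
import threading

class OwnerRegistry:
    """Both creation paths call get_or_create; the first caller creates
    the entry and later callers retrieve the same object, so neither
    path blocks waiting for the other."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}

    def get_or_create(self, rref_id, factory):
        with self._lock:
            if rref_id not in self._owners:
                self._owners[rref_id] = factory()
            return self._owners[rref_id]

reg = OwnerRegistry()
a = reg.get_or_create("rref0", lambda: object())  # creates
b = reg.get_or_create("rref0", lambda: object())  # retrieves the same object
```

    Both calls return the identical object regardless of which path ran first.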

commit 8872e00e11926833c1d7d5a0578524f808ebd631
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Sat Apr 25 09:08:22 2020 -0700

    fix type meta

commit d7f7c290e3d76a1e3019166644baf78de0d95a31
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Sat Apr 25 07:40:50 2020 -0700

    addmv migration [resubmit] (#37236)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37236

    Differential Revision: D21232988

    Pulled By: anjali411

    fbshipit-source-id: ac6c0ee018aef3c841b039d76e6e1fbb3cd0292d

commit b9a2c35fdf203680a98d30f9201dac2b50548157
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Sat Apr 25 01:31:32 2020 -0700

    remove debug print

commit cfd70207b1753648ee5c474bee9905b0543d4db4
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Sat Apr 25 01:30:24 2020 -0700

    fix copy kernel

commit b6eb2a5f73640518b40708d6356ccb06b243d8ba
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Sat Apr 25 01:18:44 2020 -0700

    fix type in dispatch

commit 856e8cf0288fe3c1701d11fae61b214c08635b9d
Author: Ilia Cherniavskii <iliacher@fb.com>
Date:   Sat Apr 25 00:57:06 2020 -0700

    Revert D21213786: Enable global observers API

    Test Plan: revert-hammer

    Differential Revision:
    D21213786

    Original commit changeset: e618254da74a

    fbshipit-source-id: 425ea5d44fa55655ec0dd586c5075996b926177b

commit e6231c9e24c05e435eeb9dfcd66247e4520c559a
Author: Nikita Shulga <nshulga@fb.com>
Date:   Sat Apr 25 00:09:09 2020 -0700

    Do not run valgrind on the Aten unit tests compiled with clang (#37152)

    Summary:
    Valgrind detects some uninitialized variables when torch_cpu is compiled with clang, which are not reproducible when the same code is compiled with gcc, nor with the address sanitizer tool
    See https://github.com/pytorch/pytorch/issues/37117
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37152

    Differential Revision: D21241577

    Pulled By: malfet

    fbshipit-source-id: 4a5dddf2a4fc4238dc9117cb92ee4e34af9e6064

commit c9b3d94a4dc571b9c711e7e8e6e378dbd78a2e0a
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 23:59:12 2020 -0700

    fix to

commit cd4688138bb52f36978ed6c9daab19ee864429b9
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 23:55:57 2020 -0700

    fix to

commit 6e659e928ba48afa8a6f5d734c37ab187734927b
Author: Ilia Cherniavskii <iliacher@fb.com>
Date:   Fri Apr 24 23:47:33 2020 -0700

    Enable global observers API (#37195)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37195

    After adding c10::DispatchKey::Profiler the behavior of RecordFunction
    observers is also controlled by the dispatch key,
    this PR moves the logic outside of the profiler into the record function

    Reviewed By: ngimel

    Differential Revision: D21213786

    fbshipit-source-id: e618254da74a4f1ce16c51a3869bbd75a4f561ad

commit a0f1c2c97249b8c3e5fc9d537ebaee169cdab88a
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 23:25:53 2020 -0700

    fix item

commit 4e976b9334acbcaa015a27d56540cd2115c2639b
Author: Sebastian Messmer <messmer@fb.com>
Date:   Fri Apr 24 23:08:18 2020 -0700

    Remove callBoxedWorkaround (#36850)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/36850

    Since now all unboxing happens after dispatch, which means that all c10 ops support unboxing, we can now use op.callBoxed() for all ops and don't need callBoxedWorkaround (which was going through the JIT registry) anymore.
    ghstack-source-id: 102879558

    Test Plan: waitforsandcastle

    Differential Revision: D21102375

    fbshipit-source-id: d1e041116563a9650d5a86b07eb96d217d8756f3

commit 6efca91edcab0f258293467bc962c3fd1332f79a
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 22:48:46 2020 -0700

    enable data_ptr for std::complex

commit 6ea2aedab9da83a2dcf421880436c471ab40f0ec
Author: Hong Xu <hong@topbug.net>
Date:   Fri Apr 24 22:35:24 2020 -0700

    Cast shape_.size() to int64_t before comparing with squash_dim (#37109)

    Summary:
    This is generating a considerable amount of warning messages since TensorIterator.h is included from a lot of files:

        /home/hong/xusrc/pytorch/aten/src/ATen/native/TensorIterator.h:372:47:
        warning: comparison of integers of different signs: 'const int64_t' (aka 'const long') and 'c10::SmallVectorTemplateCommon::size_type' (aka 'unsigned long') [-Wsign-compare]
            TORCH_CHECK(squash_dim >= 0 && squash_dim < shape_.size(),
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37109

    Differential Revision: D21242163

    Pulled By: ngimel

    fbshipit-source-id: aec2978ee76750676a449eb6671142a782658de3

commit 35decf020b31c7ca25e5c65492046077ceb9f2cc
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 22:23:12 2020 -0700

    type meta

commit 30eb0bdf3257a62df303ab59991ad6eb784dd177
Author: Nikita Shulga <nshulga@fb.com>
Date:   Fri Apr 24 21:30:25 2020 -0700

    Do not define list "0" in torch/CMakeLists.txt (#37275)

    Summary:
    Per https://cmake.org/cmake/help/latest/command/list.html list insert arguments order is
    `list(INSERT <list> <index> [<element>...])`

    That is first argument is list name not the index it gets inserted into
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37275

    Differential Revision: D21243539

    Pulled By: malfet

    fbshipit-source-id: b947ad64f1a3549df68083383537899b19abd9ca

commit 904949382e36c282c547db545d98bde23553695f
Author: Raghuraman Krishnamoorthi <raghuraman@fb.com>
Date:   Fri Apr 24 20:55:32 2020 -0700

    Ensure that histogram observers have zero-point of zero for post ReLU activations (#37107)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37107

    Currently histogram observers relax both the min and max values of the activations for performance reasons. This causes an issue for Glow, where there is a slowdown if the zero-point is not zero for post-ReLU activations.
    ghstack-source-id: 102768017

    Test Plan: buck test caffe2/test:quantization -- 'test_histogram_observer_one_sided \(quantization\.test_quantization\.RecordHistogramObserverTest\)' --print-passing-details

    Differential Revision: D21187636

    fbshipit-source-id: 8d616b9e9caf2979a26a215e99434f71025e3d8b
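
    For an affine quantizer, pinning the minimum of a post-ReLU range at zero forces
    the zero-point to zero; a hedged sketch of the arithmetic (hypothetical helper,
    not the observer code itself):

```python
def affine_qparams(min_val, max_val, qmin=0, qmax=255):
    # Standard affine quantization: real value r maps to
    # q = round(r / scale) + zero_point.
    min_val = min(min_val, 0.0)  # the representable range must include zero
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = qmin - round(min_val / scale)
    return scale, zero_point

# Post-ReLU activations are non-negative, so min_val == 0.0 and the
# zero-point lands exactly on qmin (zero here).
scale_relu, zp_relu = affine_qparams(0.0, 6.0)

# A two-sided range, by contrast, generally gives a nonzero zero-point.
scale_both, zp_both = affine_qparams(-1.0, 3.0)
```

    Keeping the observer from relaxing the minimum below zero for one-sided
    distributions is what preserves the `zero_point == 0` property.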

commit ef9ec03e770d36b7138189ac5a96515487902a2f
Author: Xiaodong Wang <xdwang@fb.com>
Date:   Fri Apr 24 20:27:05 2020 -0700

    [CUDA11] Pytorch change (#37187)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37187

    Adding CUDACC guard for gcc9+

    Reviewed By: ngimel

    Differential Revision: D21209798

    fbshipit-source-id: 5cc4efc7108577d74bee4c12c942ed1e5bf9bbac

commit 984f1ef2d6d967f3aca9b23f7b763f484260212b
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 20:25:48 2020 -0700

    ident

commit ea25b50901a5c56e8845ee364a95d82fc41f95e6
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 20:20:28 2020 -0700

    save

commit a80a438e3752d0f4b1820492e9d0051760b926bb
Author: Nikolay Korovaiko <korovaikon@gmail.com>
Date:   Fri Apr 24 20:10:21 2020 -0700

    correctly set and restore states in te tests (#37210)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37210

    Differential Revision: D21238634

    Pulled By: Krovatkin

    fbshipit-source-id: 6462239753399c10c871baa5d5fdff5465cf2544

commit 686b521784a869cd48a75a16fce38bc25560a2ef
Author: Xiao Wang <24860335+xwang233@users.noreply.github.com>
Date:   Fri Apr 24 20:10:10 2020 -0700

    Update cusparse deprecated Xcsrmm2 call (#37202)

    Summary:
    Reland of https://github.com/pytorch/pytorch/issues/36845 due to Windows CI failure.

    binary_windows_wheel_3_7_cu102_build passed, so the Windows guard should be fine this time.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37202

    Differential Revision: D21233358

    Pulled By: xw285cornell

    fbshipit-source-id: 707de0ff21d178686354ffaea7625f1d68b3e8d3

commit 22e79aaaa4cd395eee9409d13dc64f7ce1f85b1e
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 20:01:02 2020 -0700

    save

commit 4a72ddedcd2c645bb8fd507b375a0a42483ad1e1
Author: Gao, Xiang <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 19:47:36 2020 -0700

    Show cpu info for macos jobs (#37220)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37220

    Differential Revision: D21243205

    Pulled By: ezyang

    fbshipit-source-id: 77a4d904e80c59b6d4d39b1a1a0fb441d8a35f0c

commit 1d0334dd62ae18c7fd0c9fa5d048bf4a796e0c16
Author: Yang Gu <yangu@microsoft.com>
Date:   Fri Apr 24 19:46:53 2020 -0700

    Add cpu build and test to Windows CI (#37135)

    Summary:
    Add windows build and test for cpu
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37135

    Differential Revision: D21243189

    Pulled By: ezyang

    fbshipit-source-id: dd804ac258940e608facaf375d80ff5a0c59a7ae

commit 4a05558bd9de93fd90b85099b9606a03f53cd163
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 19:37:14 2020 -0700

    fix distribution

commit 5b7d9817c35fa4fb6adf31929b41e019cbdf958e
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 19:23:37 2020 -0700

    fix

commit f71593ee31f2366dafd660b4bbcfb3086dea0e81
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 19:11:47 2020 -0700

    fix

commit ff19d415d769d8e12dbd06ba5aae5e9ea951179d
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 19:01:42 2020 -0700

    fix comment

commit 3ada82d2364ca43e77eb7f77ff8e43fb858f296e
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 18:56:29 2020 -0700

    Automatically include c10/util/dont_wrap_complex.h

commit 398608de9a4bb2ab17dbfa826f435091fc75c8d3
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 18:53:12 2020 -0700

    fix

commit 093564d6918242aff70cd281554ab0142e01751e
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 18:38:16 2020 -0700

    fix

commit f71f97e17a4f3a4abf75ccd9d2f32a6312d8ebc5
Merge: 626473f5fe 1d8012a624
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 18:22:23 2020 -0700

    Merge branch 'master' of github.com:pytorch/pytorch into hacking-dispatch

commit 626473f5fe717d106e4888b9afdb49cd38782d81
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 18:20:26 2020 -0700

    Make c10::complex the C++ type for complex tensors

commit 1d8012a624e4dbc9f66c7942e82e168707796855
Author: Sebastian Messmer <messmer@fb.com>
Date:   Fri Apr 24 18:05:47 2020 -0700

    Delete dead code (#37254)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37254

    This code is leftover from the KernelFactory deletion.
    ghstack-source-id: 102866045

    Test Plan: waitforsandcastle

    Differential Revision: D21235480

    fbshipit-source-id: 739ba677d2139ba9934d103f75a609638f1a3856

commit 1f08ff12ecd27cf18fe21cf1fcf90a1c824b3ff7
Author: Michael Suo <suo@fb.com>
Date:   Fri Apr 24 17:40:48 2020 -0700

    [jit] fix named tuples as attributes (#37251)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37251

    This was broken by recent changes to how we serialize with type tags. We
    save a name (like `Dict[str, MyNamedTuple]`) and then relied on the
    mobile type parser to resolve that name back into a set of types.

    This doesn't work for any NamedTypes as the mobile type parser doesn't
    know how to resolve those. The unpickler allows the caller to inject a
    type resolver for this purpose; we use that so that importing in a
    non-mobile environment yields the right results.

    A second problem also had to be fixed: the SourceImporter type loader
    would only load named types directly (e.g. `MyNamedTuple`) and choked if
    it was a general type that contained a named tuple (e.g.
    `List[MyNamedTuple]`). Fixed that and renamed `loadNamedType` to
    `loadType` for clarity.

    Test Plan: Imported from OSS

    Differential Revision: D21235213

    Pulled By: suo

    fbshipit-source-id: 16db0f4c5e91a890d67a8687cc8ababa6b94b0f4
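
    Python's own `pickle` exposes an analogous injection point,
    `Unpickler.find_class`; a small sketch of resolving a named type through a
    caller-supplied resolver (illustrative only, unrelated to the TorchScript code
    paths):

```python
import collections
import io
import pickle

# A named type that a default lookup might not find (imagine it lives in
# a module the unpickling environment cannot import directly).
MyNamedTuple = collections.namedtuple("MyNamedTuple", ["x", "y"])

class ResolvingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Caller-injected resolver: map the serialized name back to the
        # real type instead of relying on a module import.
        if name == "MyNamedTuple":
            return MyNamedTuple
        return super().find_class(module, name)

data = pickle.dumps(MyNamedTuple(1, 2))
obj = ResolvingUnpickler(io.BytesIO(data)).load()
```

    The round-tripped object compares equal to the original named tuple even though
    the lookup went through the injected resolver rather than the default path.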

commit 47c4dca1ab3fedfde7b1ce383e779454e7903e86
Author: Nikita Shulga <nshulga@fb.com>
Date:   Fri Apr 24 17:39:53 2020 -0700

    Remove python-2 or python<3.5 checks from unit tests (#37252)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37252

    Test Plan: CI

    Differential Revision: D21241083

    Pulled By: malfet

    fbshipit-source-id: 44164b822f7905288abb2beda0175d2162d86143

commit 521910e0e97f6014c976cdab7dff024a038a0a76
Author: Michael Suo <suo@fb.com>
Date:   Fri Apr 24 17:17:39 2020 -0700

    Update clang_format_ci.sh (#37268)

    Summary:
    shellcheck led me astray!
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37268

    Differential Revision: D21241361

    Pulled By: suo

    fbshipit-source-id: 68244bb889e784ccd36d714209c2c15e2d6f04f8

commit b60c3dfdd963cd5b0879d9fae5130fac3ed79bbf
Author: James Reed <jamesreed@fb.com>
Date:   Fri Apr 24 16:22:25 2020 -0700

    Add fallback wrapper for profiler (#37194)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37194

    Test Plan: Imported from OSS

    Reviewed By: ilia-cher, ngimel

    Differential Revision: D21217886

    Pulled By: jamesr66a

    fbshipit-source-id: b06195e9ac110979d128391e067d5c9f416c1873

commit 047488a7ffb42a4dad5c12992663738bd6c96004
Author: Basil Hosmer <bhosmer@fb.com>
Date:   Fri Apr 24 16:06:08 2020 -0700

    Mask all high dispatch keys in BackendSelect kernels (#37257)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37257

    Previously, we were relying on fragile invariants to avoid collecting
    and feeding high precedence, non-backend dispatch keys to backend
    initialization machinery, which would assert on them. (These same
    keys are then used for redispatch, so a second latent problem lurks
    behind the first.) Here we mask off the BackendDispatch key and all
    keys to its left.

    Followup: move backend init code to backend-specific wrappers
    (`CPUType` etc.). This will let us remove the backend init code from
    both BackendSelect and STATIC_DISPATCH wrappers. (Though BackendSelect
    will still need to compute a dispatch key, so the logic introduced
    here will still be necessary.)

    Test Plan: Imported from OSS

    Differential Revision: D21235856

    Pulled By: bhosmer

    fbshipit-source-id: 1b8bd7897ed4b41a95718f3cfceddf4ee094744a

commit b6bb644e41b3928b5a515330ad35c8b447fcb876
Author: Zachary DeVito <zdevito@fb.com>
Date:   Fri Apr 24 15:12:12 2020 -0700

    Fix long line splitting issue in python_print (#37088)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37088

    For an inlined expression tree like `(e_0, (e_1, e_long))` the previous
    algorithm only scanned the same statement as `e_long`, splitting the
    inlined expressions across lines. Because it did not scan `e_0`, `e_0`
    would still get emitted inline, causing it to reverse order with `e_1` and
    `e_long`. The new algorithm scans starting at `e_long` and going all
    the way back up the expression until it reaches the end of the inlined
    statement. Caching of what has already been scanned has been added so that
    if there were a second long expression `e_long2` after `e_long`, it would not
    rescan and re-inline the statements that were already split.

    Test Plan: Imported from OSS

    Differential Revision: D21180394

    Pulled By: zdevito

    fbshipit-source-id: 4d142c83a04c89a47d04282f67a513f82cf153c0

commit d6ce6570f96e8edbf450728a5bfa080f181bcba0
Author: Hong Xu <hong@topbug.net>
Date:   Fri Apr 24 15:08:39 2020 -0700

    Remove unused imports in aten/src/ATen/function_wrapper.py (#37245)

    Summary:
    typing is available since Python 3.5, no need to try-import.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37245

    Differential Revision: D21236650

    Pulled By: albanD

    fbshipit-source-id: daf150103835d0c6cd3c39300044e548bb6d311d

commit 4f3946a89b639f3b87c37b4190e2bc3dc22ee608
Author: anjali411 <chourdiaanjali123@gmail.com>
Date:   Fri Apr 24 15:03:38 2020 -0700

    Added complex dtypes to get_all_math_dtypes, complex acc type for cpu, fixed rdiv and pow for complex (#37193)

    Summary:
    Resolves https://github.com/pytorch/pytorch/issues/36730 https://github.com/pytorch/pytorch/issues/36057
    Partially resolves: https://github.com/pytorch/pytorch/issues/36671
    ```
    >>> 2j / torch.tensor([4], dtype = torch.complex64)
    tensor([(0.0000+0.5000j)], dtype=torch.complex64)
    >>> 1 / torch.tensor(3+4j)
    tensor((0.1200-0.1600j), dtype=torch.complex64)
    ```
    rdiv is more generally broken for all dtypes because it doesn't promote the types properly
    eg.
    ```
    >>> 1 / torch.tensor(2)
    tensor(0)
    >>> 2j / torch.tensor(4)
    tensor(0)
    ```
    so that issue should be fixed in a separate PR

    Adding CPU acc types for complex
    Added cumsum, cumprod for complex dtypes

    Added complex dtypes to get_all_math_dtypes to expand testing for complex dtypes

    Old PR - https://github.com/pytorch/pytorch/pull/36747
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37193

    Differential Revision: D21229373

    Pulled By: anjali411

    fbshipit-source-id: 8a086136d8c10dabe62358d276331e3f22bb2342

commit c38dcd45d70b2850047d9956e45ff3312966a078
Author: Wanchao Liang <wanchaol@users.noreply.github.com>
Date:   Fri Apr 24 14:45:11 2020 -0700

    [jit] fix return different types bug in tracing module calls (#37190)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37190

    if module calls return different types, we need to record them correctly

    Test Plan: Imported from OSS

    Differential Revision: D21214871

    Pulled By: wanchaol

    fbshipit-source-id: 46ba98f08ed4ade22f9740cb3fca84b29557e125

commit 5362a0b948450e2d2ba5f8ce2157a65b2f06b392
Author: Wanchao Liang <wanchaol@users.noreply.github.com>
Date:   Fri Apr 24 14:45:11 2020 -0700

    [jit] fix lifting bug in tracing module calls (#37189)

    Summary:
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/37189

    This fixes a bug in tracing module calls so that values are lifted with
    their corresponding value type, rather than the default tensor type.

    Test Plan: Imported from OSS

    Differential Revision: D21214872

    Pulled By: wanchaol

    fbshipit-source-id: f635154851365e2d7b88186d6e47634123eac42f

commit a13b5b0ae85ea6b9ba6038f99658a88039e23782
Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Date:   Fri Apr 24 14:16:54 2020 -0700

    Split reduction compile units (#37205)

    Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37205

    Test Plan: Imported from OSS

    Differential Revision: D21233254

    Pulled By: ngimel

    fbshipit-source-id: 68b37ebbdd715a30c616e425a39b6b21c01b37e2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants
You can’t perform that action at this time.