MPS: Fix handling of 1D tensors in linear backward #80759

kulinseth · 2022-07-01T05:51:37Z

Fixes #79784
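For context, here is a plain-Python sketch of the gradient shapes involved when a linear layer receives a 1D (unbatched) input. The linked issue reports an `IndexError: Dimension out of range` from `.backward()`. This is not the MPS kernel code from this PR. The function names are hypothetical, and treating the 1D input as a batch of one is an assumption made here for illustration, one common way to reuse a batched backward formula.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def linear_backward_2d(
    x: Matrix, weight: Matrix, grad_out: Matrix
) -> Tuple[Matrix, Matrix, List[float]]:
    """Gradients of y = x @ weight.T + bias for a batched x of shape [N, in]."""
    n, in_f, out_f = len(x), len(weight[0]), len(weight)
    # grad_input[N, in] = grad_out[N, out] @ weight[out, in]
    grad_input = [
        [sum(grad_out[i][o] * weight[o][k] for o in range(out_f)) for k in range(in_f)]
        for i in range(n)
    ]
    # grad_weight[out, in] = grad_out.T @ x
    grad_weight = [
        [sum(grad_out[i][o] * x[i][k] for i in range(n)) for k in range(in_f)]
        for o in range(out_f)
    ]
    # grad_bias[out] = grad_out summed over the batch dimension
    grad_bias = [sum(grad_out[i][o] for i in range(n)) for o in range(out_f)]
    return grad_input, grad_weight, grad_bias


def linear_backward_1d(
    x: List[float], weight: Matrix, grad_out: List[float]
) -> Tuple[List[float], Matrix, List[float]]:
    """Handle an unbatched x of shape [in] by viewing it as a batch of one."""
    grad_input, grad_weight, grad_bias = linear_backward_2d([x], weight, [grad_out])
    # Drop the temporary batch dimension from the input gradient only.
    return grad_input[0], grad_weight, grad_bias


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # out_features=3, in_features=2
x = [1.0, -1.0]                                # shape [2], no batch dimension
grad_out = [1.0, 0.5, 2.0]                     # upstream gradient, shape [3]

grad_x, grad_w, grad_b = linear_backward_1d(x, weight, grad_out)
print(grad_x)  # [12.5, 16.0]
print(grad_w)  # [[1.0, -1.0], [0.5, -0.5], [2.0, -2.0]]
print(grad_b)  # [1.0, 0.5, 2.0]
```

The batched formula indexes a batch dimension that a 1D input does not have, which is why the unbatched case needs explicit handling in a backward implementation.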

facebook-github-bot · 2022-07-01T05:51:44Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/80759
📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓Need help or want to give feedback on the CI? Visit our office hours

❌ 2 New Failures

As of commit 4953dc1 (more details on the Dr. CI page):

2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

trunk / linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu) (1/2)

Step: "Save test results" (full log | diagnosis details | 🔁 rerun)

2022-07-01T23:53:34.6532294Z ##[error]Process completed with exit code 1.

2022-07-01T23:53:34.5953382Z ##[error]The operation was canceled.
2022-07-01T23:53:34.6092711Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct
2022-07-01T23:53:34.6093395Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct
2022-07-01T23:53:34.6093994Z docker exec -t "" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test"
2022-07-01T23:53:34.6117036Z shell: /bin/bash -e {0}
2022-07-01T23:53:34.6117456Z env:
2022-07-01T23:53:34.6117724Z   GIT_DEFAULT_BRANCH: master
2022-07-01T23:53:34.6118097Z   DOCKER_HOST: unix:///run/user/1123/docker.sock
2022-07-01T23:53:34.6118442Z ##[endgroup]
2022-07-01T23:53:34.6494370Z Error: No such container: 
2022-07-01T23:53:34.6532294Z ##[error]Process completed with exit code 1.
2022-07-01T23:53:34.6599796Z Prepare all required actions
2022-07-01T23:53:34.6600693Z Getting action download info
2022-07-01T23:53:34.9369588Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-07-01T23:53:35.4110487Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-07-01T23:53:35.4110785Z with:
2022-07-01T23:53:35.4111404Z   github-token: ***
2022-07-01T23:53:35.4111638Z env:
2022-07-01T23:53:35.4111858Z   GIT_DEFAULT_BRANCH: master
2022-07-01T23:53:35.4112155Z   DOCKER_HOST: unix:///run/user/1123/docker.sock
2022-07-01T23:53:35.4112425Z ##[endgroup]

pull / linux-xenial-py3-clang5-mobile-custom-build-static / build (2/2)

Step: "Setup Linux" (full log | diagnosis details | 🔁 rerun)

2022-07-01T18:07:08.6813838Z ##[error]Process completed with exit code 1.

2022-07-01T18:06:52.8871318Z retry () { "$@"  || (sleep 1 && "$@") || (sleep 2 && "$@") }
2022-07-01T18:06:52.8871738Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \
2022-07-01T18:06:52.8872070Z     --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
2022-07-01T18:06:52.8882504Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-07-01T18:06:52.8882735Z env:
2022-07-01T18:06:52.8882908Z   AWS_RETRY_MODE: standard
2022-07-01T18:06:52.8883084Z   AWS_MAX_ATTEMPTS: 5
2022-07-01T18:06:52.8883288Z   AWS_DEFAULT_REGION: us-east-1
2022-07-01T18:06:52.8883477Z ##[endgroup]
2022-07-01T18:07:08.6787997Z Error response from daemon: Get "https://308535385114.dkr.ecr.us-east-1.amazonaws.com/v2/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2022-07-01T18:07:08.6813838Z ##[error]Process completed with exit code 1.
2022-07-01T18:07:08.6859514Z Prepare all required actions
2022-07-01T18:07:08.6859752Z Getting action download info
2022-07-01T18:07:08.8240726Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-07-01T18:07:08.9279251Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-07-01T18:07:08.9279468Z with:
2022-07-01T18:07:08.9279813Z   github-token: ***
2022-07-01T18:07:08.9279998Z ##[endgroup]
2022-07-01T18:07:08.9304692Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a
2022-07-01T18:07:08.9304919Z with:
2022-07-01T18:07:08.9305065Z   shell: bash

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

kulinseth · 2022-07-01T18:02:56Z

@pytorchbot rebase

pytorchmergebot · 2022-07-01T18:04:23Z

@pytorchbot successfully started a rebase job. Check the current status here

Fix handling of 1D tensors in linear backward

pytorchmergebot · 2022-07-01T18:04:27Z

Successfully rebased linear_1D onto refs/remotes/origin/master, please pull locally before adding more changes (for example, via git checkout linear_1D && git pull --rebase)

ezyang · 2022-07-04T01:59:06Z

test/test_mps.py

@@ -368,6 +368,12 @@ def _linear_helper(self, in_features, out_features, shape, bias=True, backward_p
                self.assertEqual(cpu_linear.bias.grad.size(), mps_linear.bias.grad.size())
                self.assertEqual(cpu_linear.bias.grad, mps_linear.bias.grad.to("cpu"), atol=8e-04, rtol=10.4e-05)

+    def test_linear1D(self):
+        self._linear_helper(in_features=2, out_features=3, shape=([2]), bias=True, backward_pass=False)


This test case would have been automatically applied for you if MPS was using OpInfo testing:

    def sample_inputs_linear(self, device, dtype, requires_grad, **kwargs):
        features_options = [[3, 4], [8, 8]]
        batch_options: List[List[int]] = [
            [],  # no batch
            [0],
            [8],
            [2, 3],
        ]

Please consider making the MPS tests use OpInfos!
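To illustrate why the OpInfo sample inputs would have covered the 1D case, the sketch below enumerates the tensor shapes implied by the `features_options` and `batch_options` lists quoted above. The quoted snippet is truncated, so two things here are assumptions: that each `features_options` entry is an `[in_features, out_features]` pair, and that the options are combined with a plain product. `linear_sample_shapes` is a hypothetical helper, not a PyTorch function.

```python
from itertools import product
from typing import List, Tuple

features_options = [[3, 4], [8, 8]]  # assumed: [in_features, out_features] pairs
batch_options: List[List[int]] = [
    [],      # no batch: the input tensor is 1D
    [0],
    [8],
    [2, 3],
]


def linear_sample_shapes() -> List[Tuple[List[int], List[int], List[int]]]:
    """Return (input_shape, weight_shape, bias_shape) for every option pair."""
    shapes = []
    for (in_feat, out_feat), batch in product(features_options, batch_options):
        shapes.append((batch + [in_feat], [out_feat, in_feat], [out_feat]))
    return shapes


for input_shape, weight_shape, bias_shape in linear_sample_shapes():
    print(input_shape, weight_shape, bias_shape)
```

Under these assumptions the empty batch entry yields the input shapes `[3]` and `[8]`, which are 1D inputs of the same kind that `test_linear1D` adds by hand.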

cc @mruberry

albanD · 2022-07-04

Scaffolding for this just landed: #79532
There is still some work to do to reduce flakiness in MPS ops, though.

ezyang · 2022-07-04T02:01:35Z

@pytorchbot merge

pytorchmergebot · 2022-07-04T02:02:54Z

@pytorchbot successfully started a merge job. Check the current status here

pytorchmergebot · 2022-07-04T02:02:58Z

Merge failed due to Refusing to merge as mandatory check(s) pull failed for rule superuser
Raised by https://github.com/pytorch/pytorch/actions/runs/2607232182

ezyang · 2022-07-04T02:04:50Z

@pytorchbot merge -f

pytorchmergebot · 2022-07-04T02:06:10Z

@pytorchbot successfully started a merge job. Check the current status here

github-actions · 2022-07-04T02:06:51Z

Hey @kulinseth.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

Summary:
Fixes #79784

Pull Request resolved: #80759
Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/0e3953fc52ecab510c17f6d6ecb46655174ab8f5

Reviewed By: mehtanirav

Differential Revision: D37604743

Pulled By: mehtanirav

fbshipit-source-id: 2b0771449666512a8b935d854879f08479fdf5fc

Fixes pytorch#79784
Pull Request resolved: pytorch#80759
Approved by: https://github.com/ezyang

* MPS: Fixes (#78930)
  Cast integer to float in UnaryOps
  Add tensor dtype in key generation
  Enable FP16 scalars and use placeholder for alpha tensor in add/sum ops
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #78930
  Approved by: https://github.com/albanD

* MPS: Binary cast fix by proper type promotion and remove spurious copy warning (#79185)
  Fixes #78019, #78020
  Fixes #79185
  Pull Request resolved: #79185
  Approved by: https://github.com/albanD, https://github.com/razarmehr

* MPS: add exponential op (#79188)
  Add exponential distribution
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79188
  Approved by: https://github.com/razarmehr, https://github.com/albanD

* [MPS] Delete unused vars from OperationUtils.mm
  Pull Request resolved: #79514
  Approved by: https://github.com/kulinseth, https://github.com/albanD

* [MPS] Fix getDefaultGenerator and copy_kernel_mps
  Returning reference to stack memory is really bad
  Pull Request resolved: #79515
  Approved by: https://github.com/albanD

* [MPS][BE]Do not use `new/delete[]` in `chainViewOperation`
  `std::array` will do just fine
  Pull Request resolved: #79516
  Approved by: https://github.com/albanD

* [MPS] Support stride of stride
  Fixes #79181
  Pull Request resolved: #79521
  Approved by: https://github.com/kulinseth

* MPS: TopK raise an error if K>16 (#79677)
  * Error out in TopK when k>16.
  * Add a test case too.
  Fixes #78915
  Pull Request resolved: #79677
  Approved by: https://github.com/albanD

* [MPS]: Add fix for squeezed input axes handling in BCE loss (#79676)
  Fixes #79527
  Pull Request resolved: #79676
  Approved by: https://github.com/razarmehr, https://github.com/albanD

* MPS: Add amax and amin Ops with tests (#79682)
  * Add amax and amin with tests
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79682
  Approved by: https://github.com/albanD

* [MPS] Fix torch.uint8 support (#80049)
  `ScalarType.Byte` should be cast to `MPSDataTypeUInt8`
  And support for `torch.int8` as well as test those conversions in `TestMPS.test_to`
  Fixes #80006
  Pull Request resolved: #80049
  Approved by: https://github.com/albanD

* [MPS] Fix binary ops between int32 tensor with int64 scalar (#80220)
  For some reason, tensor *op* scalar does not follow the normal binary promotion rules
  So cast output tensor to expected type if needed
  It seems that one should have casted input tensors to expected output tensor type, but it does not really work for boolean binary ops, so...
  Add output tensor type/shape to cached graph key
  Extend `TestMPS.test_add_scalars` to test for this regression
  Fixes #79835
  Pull Request resolved: #80220
  Approved by: https://github.com/albanD

* [MPS] Add equal operator (#80195)
  Which is, in essence is composite of `eq`->`all`->`item`
  `native/mps/operators/Equal.cpp` is an almost verbatim copy of `native/cuda/Equal.cpp`
  Fix codegen by generating MPSFunctions headers
  Pull Request resolved: #80195
  Approved by: https://github.com/albanD

* [MPS] add `aten::normal.Tensor_float` `aten::normal.float_Tensor` `aten::normal.Tensor_Tensor` (#80297)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #80297
  Approved by: https://github.com/albanD, https://github.com/kulinseth

* [MPS] Add flip (#80214)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #80214
  Approved by: https://github.com/DenisVieriu97, https://github.com/albanD

* [MPS] Add logical ops (#80216)
  This PR adds `logical_not`, `logical_and`, `logical_or`, `logical_xor`.
  Pull Request resolved: #80216
  Approved by: https://github.com/albanD, https://github.com/kulinseth

* [MPS] Add glu (#79866)
  Adds mps op for `aten::glu.out`.
  Pull Request resolved: #79866
  Approved by: https://github.com/kulinseth, https://github.com/albanD

* [MPS] Fix std/var cache issue (#80502)
  Use `getTensorsStringKey` which has tensor shape info added as part of the key to prevent cache lookup issue when the shape of input tensor is changed.
  Fixes #80499
  Pull Request resolved: #80502
  Approved by: https://github.com/malfet, https://github.com/kulinseth

* Add scatter support for view operations (#79939)
  * Add scatter support for view operations; #78074, #78886, #79672
  * Update test_slicing_replace_column to properly test different sizes
  * Handle in-place changes for binary ops; add new testcase
  * Add new view ops testing scatter; add MPSDebugConfig.h config file for debugging purposes
  * Merge gatherViewTensor and scatterViewTensor into a generic function
  * Add scatter on demand in scatterViewOperation instead of caching it into a generic graph
  * Create separate graphs for scatter and gather;
  * Create scatter graph at scatter time
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79939
  Approved by: https://github.com/razarmehr

* MPS: Fix handling of 1D tensors in linear backward (#80759)
  Fixes #79784
  Pull Request resolved: #80759
  Approved by: https://github.com/ezyang

* [MPS] Move the View ops to a separate file and reduce the number of graphs created (#80491)
  This is dependent on the PR to go in first: #79939
  Remove the data_ptr from the View Graph key which reduces the number of graphs created significantly.
  Don't wait when copying from MPS to MPS tensors
  Pull Request resolved: #80491
  Approved by: https://github.com/malfet

* [MPS] Add softplus backward (#79873)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79873
  Approved by: https://github.com/malfet

* [MPS] Add argmin (#80828)
  This PR
  1. adds argmin
  2. refactors `reduction_type` in `ReduceOps.mm` with enum.
  Co-authored by Kulin Seth <kulinseth@gmail.com>
  Pull Request resolved: #80828
  Approved by: https://github.com/malfet

* [MPS] Fix LSTM batch_first output transposed (#80597)
  The output of LSTM with `batch_first` should be transposed back to batch first format.
  Fixes #80306
  Pull Request resolved: #80597
  Approved by: https://github.com/kulinseth

* [MPS][BE] Introduce MPSUnaryCachedGraph (#81033)
  I.e. CachedGraph that has input and output tensors
  Also, add `MPSGraphCache::LookUpAs` template, which combines LookUp with static_cast to target type
  Pull Request resolved: #81033
  Approved by: https://github.com/kulinseth

* [MPS] Add test consistency from OpInfo based tests from PR 78504 (#79532)
  Pull Request resolved: #79532
  Approved by: https://github.com/albanD, https://github.com/malfet

* [MPS] Add huber loss (#80163)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #80163
  Approved by: https://github.com/kulinseth, https://github.com/malfet

* Remove two tests dependent on the MPS serialization checkin.

* Fix lint error (FLAKE8) F401

* Remove the serialization test from test_mps as its support is not there in 1.12.1.

Co-authored-by: Kulin Seth <kulinseth@gmail.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Kulin Seth <kulin_seth@apple.com>
Co-authored-by: Abhishek Pathak <abhipathak97@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: qqaatw <qqaatw@gmail.com>
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>

kulinseth requested a review from albanD July 1, 2022 05:51

facebook-github-bot added the cla signed label Jul 1, 2022

kulinseth requested a review from razarmehr July 1, 2022 05:51

kulinseth added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 1, 2022

pytorchbot added the open source label Jul 1, 2022

Merge pull request #39 from kulinseth/mse_backward

4953dc1

Fix handling of 1D tensors in linear backward

pytorchmergebot force-pushed the linear_1D branch from 93404d1 to 4953dc1 Compare July 1, 2022 18:04

kulinseth requested a review from malfet July 1, 2022 22:30

ezyang reviewed Jul 4, 2022

ezyang approved these changes Jul 4, 2022

pytorchmergebot added the Merged label Jul 4, 2022

pytorchmergebot closed this in 0e3953f Jul 4, 2022

qqaatw mentioned this pull request Jul 6, 2022

.backward() on MSELoss fails with IndexError: Dimension out of range on MPS #79784

Closed

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Jul 9, 2022

MPS: Fix handling of 1D tensors in linear backward (pytorch#80759)

87dc222

Fixes pytorch#79784
Pull Request resolved: pytorch#80759
Approved by: https://github.com/ezyang

atalman pushed a commit to atalman/pytorch that referenced this pull request Jul 22, 2022

MPS: Fix handling of 1D tensors in linear backward (pytorch#80759)

feaac11

Fixes pytorch#79784
Pull Request resolved: pytorch#80759
Approved by: https://github.com/ezyang
