Fake: fix conv_transpose2d striding #82846


Closed
eellison wants to merge 5 commits

Conversation

@eellison (Contributor) commented Aug 4, 2022

Stack from ghstack (oldest at bottom):

The channels-last preservation logic for output striding differs between CUDA and CPU. For the meta kernel, we can peek at the fake tensor's device and use that to determine whether to follow the CPU or the CUDA behavior.

You could argue there's a leaking of abstraction here, but this seems like a pretty minimal leak, and I'm not sure there's a much cleaner way forward for device-specific striding logic during tracing.
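To make the idea concrete, here is a minimal sketch of what device sniffing in a meta kernel can look like. This is illustrative only: the function name and the exact per-device channels-last rules are assumptions, not the code in this PR.

```python
import torch

def conv_output_memory_format(input_t: torch.Tensor, weight: torch.Tensor) -> torch.memory_format:
    # Illustrative sketch: a fake tensor records the device of the tensor it
    # models, so a meta kernel can branch on it to mirror the right backend.
    def is_channels_last(t: torch.Tensor) -> bool:
        return t.is_contiguous(memory_format=torch.channels_last)

    if input_t.device.type == "cuda":
        # Assumed CUDA-style rule: preserve channels-last if either the
        # input or the weight is channels-last.
        if is_channels_last(input_t) or is_channels_last(weight):
            return torch.channels_last
    else:
        # Assumed CPU-style rule: preserve channels-last only when the
        # input itself is channels-last.
        if is_channels_last(input_t):
            return torch.channels_last
    return torch.contiguous_format
```

The point is only that the branch keys off the device recorded on the fake tensor rather than where the traced computation actually runs.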

@facebook-github-bot (Contributor) commented Aug 4, 2022

❌ 2 New Failures

As of commit 658636d (more details on the Dr. CI page):

  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu) (1/2)

Step: "Test" (full log | diagnosis details)

2022-08-05T02:07:19.3798736Z RuntimeError: test_ops failed!
2022-08-05T02:07:15.7747154Z Generated XML report: test-reports/python-unittest/test_ops/TEST-TestCommonCUDA-20220805004020.xml
2022-08-05T02:07:15.9923374Z Generated XML report: test-reports/python-unittest/test_ops/TEST-TestCompositeComplianceCUDA-20220805004020.xml
2022-08-05T02:07:16.1054624Z Generated XML report: test-reports/python-unittest/test_ops/TEST-TestFakeTensorNonErroringCUDA-20220805004020.xml
2022-08-05T02:07:16.2623200Z Generated XML report: test-reports/python-unittest/test_ops/TEST-TestMathBitsCUDA-20220805004020.xml
2022-08-05T02:07:16.3574312Z Generated XML report: test-reports/python-unittest/test_ops/TEST-TestTagsCUDA-20220805004020.xml
2022-08-05T02:07:19.3794521Z Traceback (most recent call last):
2022-08-05T02:07:19.3795058Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 974, in <module>
2022-08-05T02:07:19.3795415Z     main()
2022-08-05T02:07:19.3795727Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 952, in main
2022-08-05T02:07:19.3798364Z     raise RuntimeError(err_message)
2022-08-05T02:07:19.3798736Z RuntimeError: test_ops failed!
2022-08-05T02:07:19.6723309Z 
2022-08-05T02:07:19.6724293Z real	87m6.068s
2022-08-05T02:07:19.6724607Z user	85m50.130s
2022-08-05T02:07:19.6724858Z sys	1m23.460s
2022-08-05T02:07:19.6778667Z ##[error]Process completed with exit code 1.
2022-08-05T02:07:19.6819575Z Prepare all required actions
2022-08-05T02:07:19.6819931Z Getting action download info
2022-08-05T02:07:19.9511978Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-05T02:07:19.9512304Z with:
2022-08-05T02:07:19.9512984Z   github-token: ***

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge) (2/2)

Step: "Test" (full log | diagnosis details)

2022-08-05T01:54:35.7565935Z FAIL [1.138s]: test_load_standalone (__main__.TestStandaloneCPPJIT)
2022-08-05T01:54:34.6181571Z AssertionError: String comparison failed: '/tmp/tmp6f_a5_ho/standalone_load_test_v2' != '/tmp/tmp6f_a5_ho/standalone_load_test'
2022-08-05T01:54:34.6181910Z - /tmp/tmp6f_a5_ho/standalone_load_test_v2
2022-08-05T01:54:34.6182158Z ?                                      ---
2022-08-05T01:54:34.6182358Z + /tmp/tmp6f_a5_ho/standalone_load_test
2022-08-05T01:54:34.6182491Z 
2022-08-05T01:54:34.6182496Z 
2022-08-05T01:54:35.7564375Z   test_load_standalone (__main__.TestStandaloneCPPJIT) ... FAIL (1.138s)
2022-08-05T01:54:35.7565105Z     test_load_standalone failed - num_retries_left: 0
2022-08-05T01:54:35.7565375Z 
2022-08-05T01:54:35.7565529Z ======================================================================
2022-08-05T01:54:35.7565935Z FAIL [1.138s]: test_load_standalone (__main__.TestStandaloneCPPJIT)
2022-08-05T01:54:35.7566524Z ----------------------------------------------------------------------
2022-08-05T01:54:35.7566925Z Traceback (most recent call last):
2022-08-05T01:54:35.7567351Z   File "test_utils.py", line 626, in test_load_standalone
2022-08-05T01:54:35.7567719Z     def test_load_standalone(self):
2022-08-05T01:54:35.7568275Z   File "test_utils.py", line 652, in test_load_standalone
2022-08-05T01:54:35.7568715Z     os.path.join(build_dir, f"standalone_load_test{ext}")
2022-08-05T01:54:35.7569458Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2393, in assertEqual
2022-08-05T01:54:35.7570096Z     msg=(lambda generated_msg: f"{generated_msg} : {msg}") if isinstance(msg, str) and self.longMessage else msg,
2022-08-05T01:54:35.7570909Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_comparison.py", line 1093, in assert_equal
2022-08-05T01:54:35.7571416Z     raise error_metas[0].to_error(msg)

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@eellison requested a review from ezyang August 4, 2022 21:43
@ezyang (Contributor) left a comment

input sniffing is cool, we do it in other places too.

```diff
-        return torch.channels_last
-    elif input_tensor.is_contiguous(memory_format=torch.contiguous_format):
+    def is_channels_last(ten):
+        return ten.is_contiguous(memory_format=torch.channels_last)
```
Collaborator
This is too permissive; e.g., weight tensors with a 1x1 kernel will be channels-last contiguous, but they won't be channels-last for the purposes of memory-format propagation.
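A quick demonstration of the ambiguity (not code from the PR): with singleton spatial dimensions, the same strides satisfy both layout checks.

```python
import torch

# Conv weight with a 1x1 kernel: (out_channels, in_channels, kH, kW).
w = torch.randn(8, 4, 1, 1)

# Size-1 dims are skipped by the stride check, so this tensor passes BOTH
# tests even though it was allocated plain-contiguous.
print(w.is_contiguous(memory_format=torch.contiguous_format))  # True
print(w.is_contiguous(memory_format=torch.channels_last))      # True
```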

@ngimel (Collaborator) left a comment

The right thing would be to expose suggest_memory_format to Python, because that is what's used internally to decide whether something is channels-last. But for now a TODO will probably suffice.
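For reference, a rough Python-side approximation of the decision at::suggest_memory_format makes is sketched below. This is an assumption-laden stand-in, not the actual ATen logic, which works from the tensor's computed stride properties in C++.

```python
import torch

def suggest_memory_format_py(t: torch.Tensor) -> torch.memory_format:
    # Rough stand-in for at::suggest_memory_format: prefer channels-last only
    # when the strides are channels-last and NOT also plain-contiguous, which
    # sidesteps the ambiguous 1x1-kernel case above.
    if (
        t.dim() == 4
        and t.is_contiguous(memory_format=torch.channels_last)
        and not t.is_contiguous(memory_format=torch.contiguous_format)
    ):
        return torch.channels_last
    return torch.contiguous_format
```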

eellison added a commit that referenced this pull request Aug 5, 2022
ghstack-source-id: 1d86fdd
Pull Request resolved: #82846
eellison added a commit that referenced this pull request Sep 20, 2022
ghstack-source-id: 892e86e
Pull Request resolved: #82846
@pytorch-bot bot commented Sep 20, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/82846

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 3 Pending

As of commit 20f8d2d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ezyang (Contributor) commented Sep 20, 2022

This might be the root cause of pytorch/torchdynamo#1274; it would be good to land it.

@eellison (Contributor, Author)

Yup, that's the intention. But also setting up cross-ref testing now...

@eellison (Contributor, Author)

@pytorchbot help

@pytorch-bot bot commented Sep 20, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase', 'label')

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

Try @pytorchbot --help for more info.

@eellison (Contributor, Author)

@pytorchbot --help

@pytorch-bot bot commented Sep 20, 2022

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. 

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR

Merge

usage: @pytorchbot merge [-g | -f MESSAGE | -l | -r [REBASE]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -g, --green           Merge when all status checks running on the PR pass. To add status checks, use labels like `ciflow/trunk`.
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
  -l, --land-checks     Merge with land time checks. This will create a new branch with your changes rebased on viable/strict and run a majority of trunk tests _before_ landing to increase trunk reliability and decrease risk of revert. The tests added are: pull, Lint and trunk. Note that periodic is excluded. (EXPERIMENTAL)
  -r [REBASE], --rebase [REBASE]
                        Rebase the PR to re-run checks before merging. It will accept a branch name and will default to master if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the default branch of pytorch (master).
You, along with any member of the pytorch organization, can rebase your PR.

optional arguments:
  -s, --stable          Rebase to viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR

positional arguments:
  labels  Labels to add to given Pull Request

@eellison (Contributor, Author)
@pytorchbot -h merge

@pytorch-bot bot commented Sep 20, 2022

PyTorchBot Help (same help output as above)

@eellison (Contributor, Author)
@pytorchbot merge --green

@pytorchmergebot (Collaborator)
@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@eellison (Contributor, Author)
@pytorchbot merge

@pytorchmergebot (Collaborator)
The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.

@pytorchmergebot (Collaborator)
@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions (Contributor)
Hey @eellison.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@facebook-github-bot facebook-github-bot deleted the gh/eellison/305/head branch September 24, 2022 14:20
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
Pull Request resolved: #82846
Approved by: https://github.com/ezyang