
improve memory footprint of torch.testing.assert_close #96131

Closed
wants to merge 11 commits

Conversation


@pmeier pmeier commented Mar 6, 2023

Stack from ghstack (oldest at bottom):

Redo of #90172 out of stack.


pytorch-bot bot commented Mar 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96131

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 8e1f3aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pmeier pmeier added module: testing Issues related to the torch.testing module (not tests) topic: not user facing topic category labels Mar 6, 2023
@@ -1008,28 +1013,10 @@ def _compare_regular_values_close(
)
else:
msg = make_tensor_mismatch_msg(
actual, expected, ~matches, rtol=rtol, atol=atol, identifier=identifier
pmeier (Collaborator, Author):

In the error case, we created the `mismatches = ~matches` tensor here and turned it back into a `matches` inside `make_tensor_mismatch_msg`. With a minor refactoring, we no longer need to invert `matches` and can use it directly.
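A minimal plain-Python sketch of the idea (illustrative names, not the actual PyTorch implementation): consuming the `matches` mask directly avoids materializing a second, inverted mask of the same size.

```python
def count_mismatches(matches):
    # consume the matches mask directly instead of allocating an
    # inverted copy first (the list analogue of `~matches`)
    return sum(1 for m in matches if not m)

matches = [True, False, True, False, False]
print(count_mismatches(matches))  # 3
```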

@@ -991,7 +997,6 @@ def _compare_regular_values_close(
identifier: Optional[Union[str, Callable[[str], str]]] = None,
) -> None:
"""Checks if the values of two tensors are close up to a desired tolerance."""
actual, expected = self._promote_for_comparison(actual, expected)
pmeier (Collaborator, Author):

We used to upcast unconditionally here, since that was needed for `isclose`. This is no longer the case, so we can just drop it.

Comment on lines +285 to +289
if not actual.dtype.is_floating_point and not actual.dtype.is_complex:
# TODO: Instead of always upcasting to int64, it would be sufficient to cast to the next higher dtype to avoid
# overflow
actual_flat = actual_flat.to(torch.int64)
expected_flat = expected_flat.to(torch.int64)
pmeier (Collaborator, Author):

However, we still need to upcast in the error case, since we want to display the absolute diff, and that is not supported for `torch.bool` and might overflow for other integer dtypes.
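To illustrate why the upcast matters, here is a plain-Python simulation of wrapping 8-bit subtraction (a hypothetical helper, not PyTorch code): the true absolute difference of the extreme `int8` values cannot be represented in `int8` itself.

```python
def int8_sub(a, b):
    # simulate two's-complement 8-bit subtraction, which wraps on overflow
    return ((a - b + 128) % 256) - 128

# the true absolute difference between the int8 extremes is 255 ...
print(abs(127 - (-128)))         # 255
# ... but 8-bit arithmetic wraps it around
print(abs(int8_sub(127, -128)))  # 1
```

Upcasting to a wider integer dtype (int64 here, though any strictly wider type would do, as the TODO notes) makes the displayed diff exact.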

Comment on lines +281 to +282
actual_flat = actual.flatten()
expected_flat = expected.flatten()
pmeier (Collaborator, Author):

Drive-by renaming: `a` and `b` were only used in the beginning and should be `actual` and `expected` now.

torch/testing/_comparison.py (outdated; thread resolved)
@pmeier pmeier marked this pull request as ready for review March 7, 2023 08:32
@pmeier pmeier requested review from mruberry and pearu March 7, 2023 08:32
pearu (Collaborator) left a comment:

LGTM! Thanks, @pmeier!

I have an OT feature request, so feel free to ignore it.

  # Ensure that only mismatches are used for the max_abs_diff computation
  abs_diff[matches_flat] = 0
  max_abs_diff, max_abs_diff_flat_idx = torch.max(abs_diff, 0)

- rel_diff = abs_diff / torch.abs(b_flat)
+ rel_diff = abs_diff / torch.abs(expected_flat)
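For intuition, a plain-Python sketch of what the snippet above computes, with lists standing in for the flattened tensors (names mirror the diff, but this is not the library code):

```python
def max_abs_diff_over_mismatches(actual_flat, expected_flat, matches_flat):
    abs_diff = [abs(a, ) if False else abs(a - e) for a, e in zip(actual_flat, expected_flat)]
    # zero out matching positions so only mismatches contribute to the max
    abs_diff = [0 if m else d for d, m in zip(abs_diff, matches_flat)]
    max_abs_diff = max(abs_diff)
    return max_abs_diff, abs_diff.index(max_abs_diff)

print(max_abs_diff_over_mismatches([1, 0, 5], [1, 1, 2], [True, False, False]))
# (3, 2): greatest absolute difference of 3 at flat index 2
```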
pearu (Collaborator):

A slight OT suggestion: could we have a better normalization factor here (say, `(torch.abs(actual) + torch.abs(expected)) / 2`) for the case where `expected` contains zeros? Having zeros is typical, say, when comparing the indices of sparse tensors. At the moment, the mismatch messages from `assert_close` depend on the order of the inputs, for example:

>>> torch.testing.assert_close(torch.tensor([1, 0]), torch.tensor([1, 1]))
<snip>
Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index (1,)
Greatest relative difference: 1.0 at index (1,)
>>> torch.testing.assert_close(torch.tensor([1, 1]), torch.tensor([1, 0]))
<snip>
Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index (1,)
Greatest relative difference: inf at index (1,)

(btw, reporting relative differences for non-float tensors is often pointless as well).
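The suggested normalization could look like this sketch (plain Python, a hypothetical helper, not an existing API): with the average magnitude in the denominator, the reported relative difference no longer depends on argument order.

```python
def rel_diff_symmetric(actual, expected):
    # average-magnitude denominator, as suggested above; 0.0 when both are zero
    denom = (abs(actual) + abs(expected)) / 2
    return abs(actual - expected) / denom if denom else 0.0

print(rel_diff_symmetric(0, 1))  # 2.0
print(rel_diff_symmetric(1, 0))  # 2.0 -- identical in both orders
```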

pmeier (Collaborator, Author):

> Atm, the mismatch messages from assert_close depends on the order of inputs, for example:

It's not just the messages, it is the actual op. Internally, we rely on `torch.isclose`, and that is already asymmetric: it defines closeness as `abs(actual - expected) <= atol + rtol * abs(expected)`. Believe me when I say, we (the torch.testing team) wanted to change that, but there is just too much inertia. PyTorch is not an outlier here; numpy (and virtually every other array library) does the same.

Python's `math` module does the more sensible thing, defining closeness as `abs(actual - expected) <= max(atol, rtol * max(abs(actual), abs(expected)))`. You can read more about this whole issue in PEP 485.
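The two definitions can be contrasted in a small sketch (the helper names are mine, not library APIs; the formulas follow the definitions quoted above):

```python
def isclose_asymmetric(actual, expected, rtol=1e-5, atol=1e-8):
    # torch.isclose / numpy.isclose style: tolerance scales with |expected| only
    return abs(actual - expected) <= atol + rtol * abs(expected)

def isclose_symmetric(actual, expected, rtol=1e-5, atol=1e-8):
    # math.isclose / PEP 485 style: tolerance scales with the larger magnitude
    return abs(actual - expected) <= max(atol, rtol * max(abs(actual), abs(expected)))

# the asymmetric variant depends on argument order ...
print(isclose_asymmetric(1.0, 2.0, rtol=0.6, atol=0.0))  # True
print(isclose_asymmetric(2.0, 1.0, rtol=0.6, atol=0.0))  # False
# ... the symmetric one does not
print(isclose_symmetric(2.0, 1.0, rtol=0.6, atol=0.0))   # True
```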

At some point we tried to get this behavior specified by the Array API, but couldn't gain enough traction. See data-apis/array-api#170.

> (btw, reporting relative differences for non-float tensors is often pointless as well).

Doesn't that somewhat contradict the use case you gave earlier?

> for the case where expected contains zeros (having zeros is typical, say when comparing the indices of sparse tensors)

pytorchmergebot (Collaborator):

Rebased gh/pmeier/55/orig onto refs/remotes/origin/viable/strict because #96132 was rebased, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/96131)

mruberry (Collaborator) left a comment.


pmeier commented Mar 17, 2023

@pytorchbot merge -r

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 17, 2023
@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot (Collaborator):

Successfully rebased gh/pmeier/54/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/96131)

pytorchmergebot pushed a commit that referenced this pull request Mar 17, 2023
ghstack-source-id: c42b920568ee9fe210a5f8ec42961ca1272c65ca
Pull Request resolved: #96131
pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot (Collaborator):

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-arm64-mps / test (default, 1, 1)

Details for Dev Infra team Raised by workflow job

test/test_mps.py Outdated
@@ -10070,7 +10070,7 @@ def test_mps_compat(self):
# If this test is successful, that means that all operations in the comparison logic are supported natively on
# the MPS backend. Please remove this test as well as the compatibility logic in
# torch.testing._comparison.TensorLikePair._equalize_attributes
- actual = torch.tensor(1.0, device="mps")
+ actual = torch.zeros(2, 3, 4, 5, device="mps")
pmeier (Collaborator, Author) commented Mar 17, 2023:

@kulinseth I've increased the shape to 4 dimensions here, because otherwise this test would pass although torch.testing.assert_close is not ready. See #95538 for details.

A collaborator replied:

Sounds good

@pmeier pmeier requested a review from kulinseth March 17, 2023 09:03

pmeier commented Mar 20, 2023

@kulinseth It seems the test is still passing: https://hud.pytorch.org/pr/96131#12075596190. Does that mean the behavior was fixed? Otherwise, could you send me a patch that consistently makes this test fail? I don't have access to an MPS machine and don't want to waste CI resources by pushing multiple times just for this one test.


pmeier commented Mar 28, 2023

@kulinseth any update on this?

kulinseth (Collaborator):

> @kulinseth any update on this?

Sorry for the delay, @pmeier. I think we have support up to 4 dims; if we increase the dimensions to 5, then the test starts failing.


pmeier commented Mar 28, 2023

Argh, my bad. Let me fix that.


pmeier commented Mar 29, 2023

@kulinseth The test fails now, but unfortunately the error is not recoverable:

test_mps.py::TestNoRegression::test_mps_compat Assertion failed: (0 <= mpsAxis && mpsAxis < 4 && "Runtime canonicalization must simplify reduction axes to minor 4 dimensions."), function encodeNDArrayOp, file GPUReductionOps.mm, line 76.
Fatal Python error: Aborted

and thus we never hit the xfail. Not sure what to do with this. I'll remove the test to unblock and leave a comment in #95538. LMK if you want to handle it differently.

@pytorch-bot pytorch-bot bot added the ciflow/mps Run MPS tests (subset of trunk) label Mar 29, 2023
pmeier added a commit that referenced this pull request Mar 29, 2023
ghstack-source-id: 1b796fd7695e8ba2673eb05cccf2f7d9174b21bd
Pull Request resolved: #96131

pmeier commented Mar 29, 2023

@pytorchbot merge -r viable/strict

pytorchmergebot (Collaborator):

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot (Collaborator):

Successfully rebased gh/pmeier/54/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/96131)

pytorchmergebot pushed a commit that referenced this pull request Mar 29, 2023
ghstack-source-id: ec7cd022806cea09dfd1cd4e1e91477d4d5dedf4
Pull Request resolved: #96131
pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot (Collaborator):

Merge failed

Reason: a GraphQL query (fetching the PR's reviews, check suites, commit authors, changed files, comments, and labels) failed: [{'message': 'Something went wrong while executing your query. Please include 0402:1535:7C69A:100353:6424630D when reporting this issue.'}]

Details for Dev Infra team Raised by workflow job

pmeier added a commit that referenced this pull request Mar 29, 2023
ghstack-source-id: 11844b06eccc59a5eca1d577c2d6538427e74461
Pull Request resolved: #96131

pmeier commented Mar 29, 2023

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

Labels
ciflow/mps Run MPS tests (subset of trunk) ciflow/trunk Trigger trunk jobs on your pull request Merged module: testing Issues related to the torch.testing module (not tests) open source topic: not user facing topic category
6 participants