
move TypedStorage handling to assertEqual #89557

Closed
wants to merge 17 commits

Conversation

pmeier
Collaborator

@pmeier pmeier commented Nov 23, 2022

#85303 added a patch to torch.testing.assert_close to handle torch.storage.TypedStorage instances. This change is not reflected in the docs and is not intended for the public API. This PR removes the patch once again and moves the behavior to TestCase.assertEqual instead. Meaning, TypedStorage instances are again not supported by the public API, but the behavior is unchanged for all internal use cases.
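To illustrate the direction of the change, here is a minimal sketch of what handling TypedStorage on the assertEqual side amounts to: view both storages as tensors and compare those, dtypes included. The helper name is hypothetical and not the actual implementation.

```python
import torch

def compare_typed_storages(actual, expected):
    # Hypothetical helper sketching what TestCase.assertEqual can do internally:
    # materialize each TypedStorage as a tensor of its own dtype and delegate to
    # assert_close, which also compares the dtypes.
    a = torch.tensor(actual.tolist(), dtype=actual.dtype)
    b = torch.tensor(expected.tolist(), dtype=expected.dtype)
    torch.testing.assert_close(a, b)

# Equal storages compare fine; a value or dtype mismatch raises AssertionError.
s1 = torch.tensor([1.0, 2.0]).storage()
s2 = torch.tensor([1.0, 2.0]).storage()
compare_typed_storages(s1, s2)
```

With this approach the public assert_close stays storage-free while internal tests keep their storage comparisons.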

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen

@pmeier pmeier added the module: testing (Issues related to the torch.testing module (not tests)) and topic: not user facing (topic category) labels on Nov 23, 2022
@pytorch-bot

pytorch-bot bot commented Nov 23, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89557


✅ No Failures

As of commit 254491c:
💚 Looks good so far! There are no failures yet. 💚


Collaborator Author

@pmeier pmeier left a comment


There are two distinct errors left:

  1. https://github.com/pytorch/pytorch/actions/runs/3539134271/jobs/5941041682#step:10:99929

    def test_save_different_dtype_unallocated(self):

    Unless I'm mistaken we should simply add exact_dtype=False to the assertEqual calls.

  2. https://github.com/pytorch/pytorch/actions/runs/3539134271/jobs/5940935728#step:10:29047

    def test_storage_setitem(self, device, dtype):

    The problem only occurs for the quantized dtypes. It happens because you can't instantiate a new tensor with such a dtype directly, but our logic currently tries to.
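For reference, the exact_dtype=False suggestion from the first point has a public counterpart in assert_close's check_dtype flag; a minimal sketch of the relaxed comparison:

```python
import torch

# With check_dtype=False, assert_close promotes both inputs to a common dtype
# and only compares values, mirroring what exact_dtype=False does for assertEqual.
a = torch.tensor([1, 2], dtype=torch.int32)
b = torch.tensor([1.0, 2.0], dtype=torch.float32)
torch.testing.assert_close(a, b, check_dtype=False)  # passes: values match
```

With the default check_dtype=True, the same call fails on the int32 vs. float32 mismatch.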

Inline review comment on torch/testing/_internal/common_utils.py (outdated, resolved)
@kurtamohler
Collaborator

kurtamohler commented Dec 6, 2022

As for your errors:

Unless I'm mistaken we should simply add exact_dtype=False to the assertEqual calls.

I think we want the dtypes to match when we perform the comparison; otherwise it wouldn't be possible to ensure an exact match, right? I'm not sure why the dtypes disagree. The loaded storage's dtype at least matches the saved storage's dtype in test/test_serialization.py in save_load_check, right? We definitely need that to be true.

The problem only occurs for the quantized dtypes. It happens because you can't instantiate a new tensor with such a dtype directly, but our logic currently tries to.

You may be able to use something like this trick to interpret the quantized type as the appropriate nonquantized type:

pytorch/torch/storage.py

Lines 530 to 543 in 7322f73

if self.dtype in [torch.quint8, torch.quint4x2, torch.quint2x4, torch.qint32, torch.qint8]:
    interpret_dtypes = {
        torch.quint8: torch.uint8,
        torch.quint4x2: torch.uint8,
        torch.quint2x4: torch.uint8,
        torch.qint32: torch.int32,
        torch.qint8: torch.int8,
    }
    tmp_dtype = interpret_dtypes[self.dtype]
    tmp_tensor = torch.tensor([], dtype=tmp_dtype, device=self._untyped_storage.device)
    tmp_tensor.set_(TypedStorage(
        wrap_storage=self._untyped_storage,
        dtype=tmp_dtype,
        _internal=True))
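A runnable sketch of that trick, under the assumption that the private _untyped_storage attribute and the _internal flag behave as in the snippet above; the quint8 storage here is built from raw uint8 bytes purely for demonstration:

```python
import torch

interpret_dtypes = {
    torch.quint8: torch.uint8,
    torch.qint32: torch.int32,
    torch.qint8: torch.int8,
}

# Demo setup: wrap raw bytes in a quint8 TypedStorage, since a tensor with a
# quantized dtype cannot be created directly.
raw = torch.tensor([0, 10, 20], dtype=torch.uint8).untyped_storage()
qs = torch.TypedStorage(wrap_storage=raw, dtype=torch.quint8)

# The trick: reinterpret the storage as the matching non-quantized dtype so it
# can back a regular tensor and be compared like any other storage.
tmp_dtype = interpret_dtypes[qs.dtype]
tmp_tensor = torch.tensor([], dtype=tmp_dtype, device=qs.device)
tmp_tensor.set_(torch.TypedStorage(
    wrap_storage=qs._untyped_storage,
    dtype=tmp_dtype,
    _internal=True))
print(tmp_tensor.tolist())
```

The resulting uint8 tensor exposes the raw values backing the quantized storage, which is enough for an element-wise comparison.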

@pmeier
Collaborator Author

pmeier commented Dec 7, 2022

I think we want the dtypes to match when we perform the comparison--otherwise it wouldn't be possible to ensure an exact match, right? I'm not sure why the dtypes disagree. The loaded storage's dtype at least matches the saved storage's dtype in test/test_serialization.py in save_load_check, right? We definitely need that to be true

I've looked more closely into the test, and as-is it makes little sense. Since we operate without any data

a = torch.tensor([], dtype=dtype, device=device)

the only thing we could be checking with

self.assertEqual(a, a_loaded)
self.assertEqual(b, b_loaded)

is the dtype.

However, due to some quirks in assertEqual, we aren't doing that either: for #67796 we agreed offline (I'm too lazy now to find the specific commit in the PR that added this) that storages should be compared as if they were a list of Python scalars. This is why, before this PR, we had them listed in sequence_types. Meaning, an empty tensor is turned into an empty list and no check happens at all:

>>> a = torch.tensor([], dtype=torch.int)
>>> b = a.float()
>>> torch.testing.assert_close(a, b)
AssertionError: The values for attribute 'dtype' do not match: torch.int32 != torch.float32.
>>> torch.testing.assert_close(a.tolist(), b.tolist())

Even if we had some data before, the check wouldn't have been doing a dtype check due to some legacy quirks in assertEqual:

class RelaxedNumberPair(NumberPair):
    """Pair for number-like inputs.

    In contrast to the builtin :class:`NumberPair`, this class also supports one input being a single element
    tensor-like or a :class:`enum.Enum`. (D)Type checks are disabled, meaning comparing 1 to 1.0 succeeds even when
    ``check_dtype=True`` is passed.
    """

In conclusion, before this PR assertEqual will only check the values, but never the dtypes of typed storages. Since there are no values, the test is doing nothing for typed storages.


Now for why this is failing after this PR. A minimal reproduction is

import io
import torch

dtype = torch.float64
other_dtype = torch.float32

t = torch.tensor([], dtype=dtype)
s = torch.TypedStorage(wrap_storage=t.storage().untyped(), dtype=other_dtype)

with io.BytesIO() as buffer:
    torch.save([t, s], buffer)
    buffer.seek(0)
    *_, s_loaded = torch.load(buffer)

assert s_loaded.dtype == s.dtype

Is the mismatch between s.dtype and t.dtype intentional? We create s from t.storage(), which has dtype, but we explicitly set dtype=other_dtype. If yes, the failing test in this PR actually is a valid failure, since s_loaded has dtype rather than other_dtype like s.

Interestingly, the behavior is correct when only serializing s, i.e. torch.save([s], buffer). Plus, if we actually try to do the same with some data, i.e. t = torch.tensor([0], dtype=dtype), we are not even allowed to serialize both at the same time:

RuntimeError: Cannot save multiple tensors or storages that view the same data as different types
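That error is easy to trigger; a minimal sketch, using Tensor.untyped_storage() as the modern spelling of storage().untyped(), and assuming the save-time check behaves as described above:

```python
import io
import torch

# With allocated data, a tensor and a differently-typed storage viewing the
# same bytes cannot be saved together: torch.save rejects two objects sharing
# a data_ptr under different dtypes.
t = torch.tensor([0], dtype=torch.float64)
s = torch.TypedStorage(wrap_storage=t.untyped_storage(), dtype=torch.float32)

buffer = io.BytesIO()
try:
    torch.save([t, s], buffer)
except RuntimeError as e:
    print(e)  # the "Cannot save multiple tensors or storages ..." error
```

For unallocated inputs the data_ptrs are both 0, so this check never fires, which is why the empty-tensor test can save both.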

@kurtamohler
Collaborator

kurtamohler commented Dec 8, 2022

Ah sorry, I forgot about this until now. Yes, the mismatch between s.dtype and a.dtype is intentional. When I wrote test_save_different_dtype_unallocated, I should have added a comment to explain it.

Historically, it has not been possible to save/load multiple tensors/storages that point to the same location (though I think it may be fairly straightforward to add that feature now that all the storage refactoring is complete). In torch.save we check the data_ptr of each tensor and storage and throw an error if two have the same data_ptr but different dtypes. However, if a tensor and storage are unallocated (as in tensor([])), their data_ptrs are both 0, and the tensor and storage are not actually pointing to the same memory location, since they're not pointing at any memory location. This means that doing this

a = torch.tensor([])
s = a.storage()

and this

a = torch.tensor([])
s = torch.Storage()

produce exactly the same a and s.
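This can be checked directly; a small sketch asserting the data_ptr() == 0 claim above, using UntypedStorage for the standalone case:

```python
import torch

# An empty tensor's storage and a freshly constructed empty storage are both
# unallocated: they hold no elements and their data_ptrs are 0, so they are
# indistinguishable to torch.save's shared-data check.
a = torch.tensor([])
s_from_tensor = a.untyped_storage()
s_standalone = torch.UntypedStorage()

print(s_from_tensor.data_ptr(), s_standalone.data_ptr())  # 0 0
print(len(s_from_tensor), len(s_standalone))  # 0 0
```

Since neither points at any memory, "viewing the same data" is vacuous here, which is why differently-typed unallocated storages may be saved together.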

We do allow saving and loading unallocated tensors/storages of different dtypes. The only reason why the test for this exists is because I broke this behavior at one point and had to fix it. The history of this is here: #58970

It probably would have been better if I had not used a.storage() in this test, and instead created the storage separately, so that it's clear that a and s aren't actually linked at all.

@kurtamohler
Collaborator

kurtamohler commented Dec 8, 2022

And it appears that the dtype mismatch between the saved and loaded storage is an upstream bug. If I change the test to do this:

self.assertEqual(a.dtype, a_loaded.dtype)
self.assertEqual(b.dtype, b_loaded.dtype)

it fails. I had assumed assertEqual checks the dtype. I will look into why this failure happens.

@pmeier
Collaborator Author

pmeier commented Dec 8, 2022

I had assumed assertEqual checks the dtype.

It will after this PR. Should we maybe xfail this test in this PR and merge it so you can have a testing ground?
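In plain unittest terms, the xfail idea is just expectedFailure; a minimal sketch with a stand-in test body (the real test is test_save_different_dtype_unallocated in test/test_serialization.py, and PyTorch's internal test harness may use its own decorator):

```python
import unittest

class TestSerialization(unittest.TestCase):
    # Stand-in body: mark the test as an expected failure until the upstream
    # dtype bug in save/load is fixed.
    @unittest.expectedFailure
    def test_save_different_dtype_unallocated(self):
        raise AssertionError("loaded storage dtype does not match saved storage dtype")

# Running the case records one expected failure and an overall success.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestSerialization).run(result)
print(len(result.expectedFailures))  # 1
```

The suite stays green while the known failure remains visible, giving a testing ground for the upstream fix.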

@pmeier
Collaborator Author

pmeier commented Dec 8, 2022

Should we maybe xfail this test

I took the liberty of doing that in 822f6b2. In case you want to go another round, I'll revert it later.

@pmeier pmeier marked this pull request as ready for review December 8, 2022 09:46
@pmeier pmeier requested a review from a team as a code owner December 8, 2022 09:46
@kurtamohler
Collaborator

Expected failure seems like a good idea to me

@github-actions github-actions bot added the module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration label Dec 9, 2022
Collaborator

@mruberry mruberry left a comment


Stamped, but be sure to revert the fbgemm changes

@pmeier
Collaborator Author

pmeier commented Dec 12, 2022

@pytorchbot merge -g

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 12, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), module: testing (Issues related to the torch.testing module (not tests)), open source, topic: not user facing (topic category)