Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

IValue::deepcopy(GenericDict) returns incorrect result when dictionary values are Tensor views #125635

Open
mbasmanova opened this issue May 6, 2024 · 3 comments
Assignees
Labels
high priority module: correctness (silent) issue that returns an incorrect result silently module: cpp Related to C++ API triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@mbasmanova
Copy link

mbasmanova commented May 6, 2024

馃悰 Describe the bug

I'm seeing that IValue::deepcopy returns incorrect result for GenericDict of Tensor views.

When calling deepcopy on a GenericDict with 5 entries, the result is a dictionary with 5 entries, where keys match the source, but all values are the same and match the value of the 'first' key in source dictionary.

Let's say original dictionary is

{
  k1: v1,
  k2: v2,
  k3: v3
}

, where v1, v2, v3 share storage, but have different storage_offsets. Then, deepcopy returns

{
  k1: v1,
  k2: v1,
  k3: v1
}

Notice that all values are now the same.

I believe this is because of the use of Tensor::is_alias_of, which return true if tensors share storage even if their storage offsets are not the same. At the end of deepcopy, there is this code:

  if (!isAliasOf(copy)) {
    memo[*this] = copy;
  }

isAliasOf contains:

    // Tensors should be compared based on internal storage
    if (this->isTensor()) {
      return isAliasOf(this->toTensor(), rhs.toTensor());
    }

which is

  static bool isAliasOf(const at::Tensor& a, const at::Tensor& b) {
    if (a.is_sparse()) {
      return isAliasOf(a._values(), b) || isAliasOf(a._indices(), b);
    }
    if (b.is_sparse()) {
      return isAliasOf(a, b._values()) || isAliasOf(a, b._indices());
    }
    if (a.is_sparse_csr()) {
      return isAliasOf(a.values(), b) || isAliasOf(a.crow_indices(), b) ||
          isAliasOf(a.col_indices(), b);
    }
    if (b.is_sparse_csr()) {
      return isAliasOf(a, b.values()) || isAliasOf(a, b.crow_indices()) ||
          isAliasOf(a, b.col_indices());
    }

    // Opaque tensors such as the ones constructed by the MKL-DNN backend
    // don't have storage so we just compare their TensorImpls.
    // TODO: Find way to expose alias info for opaque tensors.
    if (!a.has_storage() || !b.has_storage()) {
      return a.unsafeGetTensorImpl() == b.unsafeGetTensorImpl();
    }

    return a.is_alias_of(b);
  }

Versions

latest

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @jbschlosser

@ezyang ezyang assigned ezyang and jerryzh168 and unassigned ezyang May 7, 2024
@ezyang ezyang added module: cpp Related to C++ API module: correctness (silent) issue that returns an incorrect result silently and removed triage review labels May 13, 2024
@ezyang
Copy link
Contributor

ezyang commented May 13, 2024

NB: this is an internal request

@ezyang
Copy link
Contributor

ezyang commented May 13, 2024

The internal users have fixed this, but this is still egregiously wrong and we should fix it.

@ezyang
Copy link
Contributor

ezyang commented May 14, 2024

#126089 and @davidberard98 is going to finish it

@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 15, 2024
davidberard98 added a commit that referenced this issue May 15, 2024
Copy of #126089, with some additional fixes & tests

Partial fix for #125635: previously, the deepcopy implementation would group together any tensors with any aliasing relationship and assign them to the same tensor. This was sort of good if you have two tensors `b = a.detach()`, because then if you deepcopy `list = [a, b]` to `list2 = list.deepcopy()`, then writes to `list2[0]` will also modify `list2[1]`. But for the most part, it's bad; (1) if you have `b = a.as_strided((4, 4), (16, 1), 16)`, then it'll make `b == a` in the deepcopied implementation, which is completely wrong; and (2) even if you have `b = a.detach()`, these are still initially two different tensors which become the same tensor after the old deepcopy implementation.

The new implementation only groups together tensors that have the same identity. This is a partial fix, but it's more reasonable. What changes:
* (becomes more correct): different views of the same base tensor will no longer all become equal after deepcopying
* (still kind of wrong): views won't actually alias each other after deepcopying.
* (arguably a minor regression): equivalent views of the same tensor will no longer be copied to the same tensor - so they won't alias.

BC breaking: C++ deepcopy interface changes from accepting `IValue::HashAliasedIValueMap memo` to accepting `IValue::HashIdentityIValueMap memo`. If there are objections, we can keep the old API. However, it seems likely that users generally won't try to deepcopy from C++.

Differential Revision: [D57340612](https://our.internmc.facebook.com/intern/diff/D57340612)

cc ezyang gchanan

[ghstack-poisoned]
davidberard98 added a commit that referenced this issue May 15, 2024
Copy of #126089, with some additional fixes & tests

Partial fix for #125635: previously, the deepcopy implementation would group together any tensors with any aliasing relationship and assign them to the same tensor. This was sort of good if you have two tensors `b = a.detach()`, because then if you deepcopy `list = [a, b]` to `list2 = list.deepcopy()`, then writes to `list2[0]` will also modify `list2[1]`. But for the most part, it's bad; (1) if you have `b = a.as_strided((4, 4), (16, 1), 16)`, then it'll make `b == a` in the deepcopied implementation, which is completely wrong; and (2) even if you have `b = a.detach()`, these are still initially two different tensors which become the same tensor after the old deepcopy implementation.

The new implementation only groups together tensors that have the same identity. This is a partial fix, but it's more reasonable. What changes:
* (becomes more correct): different views of the same base tensor will no longer all become equal after deepcopying
* (still kind of wrong): views won't actually alias each other after deepcopying.
* (arguably a minor regression): equivalent views of the same tensor will no longer be copied to the same tensor - so they won't alias.

BC breaking: C++ deepcopy interface changes from accepting `IValue::HashAliasedIValueMap memo` to accepting `IValue::HashIdentityIValueMap memo`. If there are objections, we can keep the old API. However, it seems likely that users generally won't try to deepcopy from C++.

Differential Revision: [D57340612](https://our.internmc.facebook.com/intern/diff/D57340612)

cc ezyang gchanan

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this issue May 17, 2024
Copy of #126089, with some additional fixes & tests

Partial fix for #125635: previously, the deepcopy implementation would group together any tensors with any aliasing relationship and assign them to the same tensor. This was sort of good if you have two tensors `b = a.detach()`, because then if you deepcopy `list = [a, b]` to `list2 = list.deepcopy()`, then writes to `list2[0]` will also modify `list2[1]`. But for the most part, it's bad; (1) if you have `b = a.as_strided((4, 4), (16, 1), 16)`, then it'll make `b == a` in the deepcopied implementation, which is completely wrong; and (2) even if you have `b = a.detach()`, these are still initially two different tensors which become the same tensor after the old deepcopy implementation.

The new implementation only groups together tensors that have the same identity. This is a partial fix, but it's more reasonable. What changes:
* (becomes more correct): different views of the same base tensor will no longer all become equal after deepcopying
* (still kind of wrong): views won't actually alias each other after deepcopying.
* (arguably a minor regression): equivalent views of the same tensor will no longer be copied to the same tensor - so they won't alias.

BC breaking: C++ deepcopy interface changes from accepting `IValue::HashAliasedIValueMap memo` to accepting `IValue::HashIdentityIValueMap memo`. If there are objections, we can keep the old API. However, it seems likely that users generally won't try to deepcopy from C++.

Differential Revision: [D57406306](https://our.internmc.facebook.com/intern/diff/D57406306)
Pull Request resolved: #126126
Approved by: https://github.com/ezyang
ZelboK pushed a commit to ZelboK/pytorch that referenced this issue May 19, 2024
Copy of pytorch#126089, with some additional fixes & tests

Partial fix for pytorch#125635: previously, the deepcopy implementation would group together any tensors with any aliasing relationship and assign them to the same tensor. This was sort of good if you have two tensors `b = a.detach()`, because then if you deepcopy `list = [a, b]` to `list2 = list.deepcopy()`, then writes to `list2[0]` will also modify `list2[1]`. But for the most part, it's bad; (1) if you have `b = a.as_strided((4, 4), (16, 1), 16)`, then it'll make `b == a` in the deepcopied implementation, which is completely wrong; and (2) even if you have `b = a.detach()`, these are still initially two different tensors which become the same tensor after the old deepcopy implementation.

The new implementation only groups together tensors that have the same identity. This is a partial fix, but it's more reasonable. What changes:
* (becomes more correct): different views of the same base tensor will no longer all become equal after deepcopying
* (still kind of wrong): views won't actually alias each other after deepcopying.
* (arguably a minor regression): equivalent views of the same tensor will no longer be copied to the same tensor - so they won't alias.

BC breaking: C++ deepcopy interface changes from accepting `IValue::HashAliasedIValueMap memo` to accepting `IValue::HashIdentityIValueMap memo`. If there are objections, we can keep the old API. However, it seems likely that users generally won't try to deepcopy from C++.

Differential Revision: [D57406306](https://our.internmc.facebook.com/intern/diff/D57406306)
Pull Request resolved: pytorch#126126
Approved by: https://github.com/ezyang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority module: correctness (silent) issue that returns an incorrect result silently module: cpp Related to C++ API triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

5 participants