Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize conj and neg bits #88182

Closed

Conversation

kshitij12345
Copy link
Collaborator

@kshitij12345 kshitij12345 commented Nov 1, 2022

Fixes #81690

TODO:

  • C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
  • C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
  • Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this)
  • Add Comments
  • How to make sure C++ and Python are in sync? (Functions in pickler.h help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)

Notes:
Quant Tensor don't support complex dtypes and for float they segfault with _neg_view : #88484

Sparse Tensor:

>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 1, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88182

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5152d8c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Nov 3, 2022
if len(data.args) == 6:
storage, offset, size, stride, requires_grad, hooks = data.args
else:
storage, offset, size, stride, requires_grad, hooks, math_bits = data.args
storage_info = get_storage_info(storage)
return {"__tensor_v2__": [storage_info, offset, size, stride, requires_grad]}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ties up to some javascript rendering helper. We ignore MathBits in that javascript rendering of this information.

if (data.__tensor_v2__) {
const [storage, offset, size, stride, grad] = data.__tensor_v2__;
const [dtype, key, device, numel] = storage;
return this.renderTensor(
"tensor", dtype, key, device, numel, offset, size, stride, grad, []);
}

// set MathBits on Tensor from map
inline void setTensorMathBits(
const at::Tensor& t,
c10::Dict<c10::IValue, c10::IValue> math_bits_idict) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload is required for C++ unpickler.cpp which returns a c10::Dict<IValue, IValue>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment?

@kshitij12345 kshitij12345 marked this pull request as ready for review November 4, 2022 09:02
@kshitij12345
Copy link
Collaborator Author

It would be great to know if this makes sense and looks in the right direction. Thanks!

@kshitij12345 kshitij12345 changed the title [WIP] [fix] MathBits: serialization [fix] MathBits: serialization Nov 4, 2022
torch/_tensor.py Outdated
@@ -358,6 +358,7 @@ def _reduce_ex_internal(self, proto):
self.stride(),
self.requires_grad,
backward_hooks,
torch._utils.get_math_bits(self),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do this. This will break serialization FC.

What you need to do is test if get_math_bits is the default value (all not set). If none of it is set, then you must omit it from the argument list, so that it is compatible with old rebuild_tensor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Thanks!

py_module.def(
"_set_tensor_mathbits",
static_cast<void (*)(
const at::Tensor&, std::unordered_map<int8_t, bool>)>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of this being a map from ints to bools. If someone is inspecting the Pickle data by hand, they will see an opaque numeric constant that they will have to grovel in the C++ enum definition to resolve the name of. Also, you don't even say on the enum that these numbers are serialized and must be kept consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more forward looking format would be to have a dictionary from string to bool. This can be conceptualized as arbitrary extra metadata attached to the tensor, and lets us add more properties to it in the future if needed. Additionally, you should NOT add entries to the dictionary if they are the default values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did start with map of <string, bool> but string seemed like an overkill for key. But I think what you have mentioned makes sense. Have updated the type of map.

Thanks!

@ezyang
Copy link
Contributor

ezyang commented Nov 4, 2022

Thanks a lot for fixing this, it is long overdue. I have some comments about the serialization format.

Copy link
Contributor

@anjali411 anjali411 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I think we should add a test for C++ serialization API https://github.com/pytorch/pytorch/blob/master/test/cpp/api/serialize.cpp

@kshitij12345
Copy link
Collaborator Author

Looks good overall. I think we should add a test for C++ serialization API https://github.com/pytorch/pytorch/blob/master/test/cpp/api/serialize.cpp

Thanks for sharing that link. Will update the name and also add a test for C++!

}

{
auto expected = torch::conj(torch::_neg_view(x));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto expected = torch::conj(torch::_neg_view(x));
auto expected = torch::imag(torch::conj(x));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are trying to set both neg and conj bits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I meant to post this for the only neg bit case

@anjali411
Copy link
Contributor

Looks great. Just occurred to me that it might be nice to preemptively add serialization support for ZeroTensor bit. We can't currently get a ZeroTensor through a public API, so adding an assert will be fine too. Don't have to be in this PR, i.e., can be done in a follow up PR. thank you!

@kshitij12345 kshitij12345 added release notes: complex release notes category topic: bug fixes topic category labels Nov 9, 2022
@kshitij12345
Copy link
Collaborator Author

Sure. Will have a follow-up PR for ZeroTensor. Thanks!

@kshitij12345
Copy link
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 9, 2022
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: HTTP Error 502: Bad Gateway

Details for Dev Infra team Raised by workflow job

@anjali411
Copy link
Contributor

@pytorchbot merge

@anjali411
Copy link
Contributor

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Nov 11, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Fixes pytorch#81690

TODO:

* [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
* [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
* [x] Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this)
* [x] Add Comments
* [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)

Notes:
Quant Tensor don't support complex dtypes and for float they segfault with `_neg_view` : pytorch#88484

Sparse Tensor:
```python
>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl
```

Pull Request resolved: pytorch#88182
Approved by: https://github.com/ezyang, https://github.com/anjali411
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
@lezcano lezcano changed the title [fix] MathBits: serialization Serialize conj and neg bits Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: complex release notes category release notes: jit release notes category topic: bug fixes topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conjugate resetting after loading a model
5 participants