[pytorch] Fix serialization memory lifetime issue. #30603
Pickler object needs to be kept in scope until data is written out to the final serialized string. tensorData in particular is a reference to memory owned by the descoped Pickler object. Noticed this by inspection, though it's not as bad as it looks: for device==cpu, there is no actual issue since the memory is the same as the input args. Differential Revision: [D18760463](https://our.internmc.facebook.com/intern/diff/D18760463/) [ghstack-poisoned]
Sorry about this one... FWIW, it's not clear to me that this is related to any of the test issues (especially because the rollback failed to make those failures go away), but I spent some more time this evening staring at the code and noticed this memory issue.
Pickler object needs to be kept in scope until data is written out to the final serialized string. tensorData in particular is a reference to memory owned by the descoped Pickler object. Noticed this by inspection. In practice, this potential read-after-free is limited to non-cpu tensors, and any such use was very soon after free. Differential Revision: [D18760463](https://our.internmc.facebook.com/intern/diff/D18760463/) [ghstack-poisoned]
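The lifetime problem described above can be sketched in isolation. This is a minimal illustration, not the PyTorch code: `FakePickler`, `Handle`, and `serialize` are hypothetical names, and a `std::shared_ptr<std::vector<char>>` stands in for a tensor with reference-counted storage. The pickler here owns the only handles to the buffers it creates, so anything obtained through `tensorData()` has to be copied into a longer-lived scope before the pickler is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a reference-counted tensor buffer.
using Handle = std::shared_ptr<std::vector<char>>;

// Hypothetical stand-in for the Pickler: it owns the buffers it creates.
class FakePickler {
 public:
  void push(const std::string& payload) {
    owned_.push_back(
        std::make_shared<std::vector<char>>(payload.begin(), payload.end()));
  }
  // Returns a reference into memory owned by this object.
  const std::vector<Handle>& tensorData() const { return owned_; }

 private:
  std::vector<Handle> owned_;
};

std::string serialize(const std::vector<std::string>& payloads) {
  // Declared in function scope so the handles outlive the pickler below.
  std::vector<Handle> tensorData;
  {
    FakePickler pickler;
    for (const auto& p : payloads) {
      pickler.push(p);
    }
    // Copy by value: each element copy only bumps a reference count.
    tensorData = pickler.tensorData();
  }  // pickler is destroyed here; tensorData keeps the buffers alive.

  std::string out;
  for (const auto& h : tensorData) {
    assert(h != nullptr);
    out.append(h->data(), h->size());
  }
  return out;
}
```

Had `tensorData` been bound as a reference to `pickler.tensorData()` inside the inner scope and read after it, the loop would touch freed memory, which is the read-after-free this change removes.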
The reason for keeping the pickler alive is not immediately clear from looking at the code. I can imagine this breaking again in the future. I recommend leaving the pickler lifetime as-is, but either 1) capturing the vector of tensors from `pickler.tensorData()` by value, or 2) adding an `at::Tensor` field to the tuple you put in `entries` so the data pointer is colocated with the object that extends its lifetime.
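Option 2 can be illustrated with a short sketch. It is written under assumptions and is not the actual `wireSerialize` code: `Owner` is a hypothetical stand-in for `at::Tensor` (modeled as a `std::shared_ptr` to a byte buffer), and the `Entry` layout and the `makeEntry` and `writeEntries` helpers are invented for illustration. The point is that each entry carries the object that keeps its raw pointer valid.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for at::Tensor: shared ownership of a byte buffer.
using Owner = std::shared_ptr<std::vector<char>>;

// (name, raw data pointer, byte length, owner that keeps the pointer valid)
using Entry = std::tuple<std::string, const char*, std::size_t, Owner>;

Entry makeEntry(const std::string& name, const std::string& bytes) {
  Owner owner = std::make_shared<std::vector<char>>(bytes.begin(), bytes.end());
  // Read the pointer and length before the owner is moved into the tuple.
  const char* ptr = owner->data();
  std::size_t len = owner->size();
  return Entry{name, ptr, len, std::move(owner)};
}

std::string writeEntries(const std::vector<Entry>& entries) {
  std::string out;
  for (const auto& e : entries) {
    // The pointer is only trusted while its owner sits in the same entry.
    assert(std::get<3>(e) != nullptr);
    out += std::get<0>(e);
    out += ':';
    out.append(std::get<1>(e), std::get<2>(e));
    out += ';';
  }
  return out;
}
```

With this layout, the validity of the pointer no longer depends on some other object elsewhere in the function staying in scope, which is the regression risk the comment raises.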
Thanks for the suggestion; agreed that it would be good to avoid future regressions.
@@ -201,12 +203,13 @@ std::string wireSerialize(
   pickler.protocol();
   pickler.pushIValue(tensors);
   pickler.stop();
-  auto writeable_tensors = pickler.tensorData();
+  // tensorData is in function scope so that the data() pointers stay valid.
+  tensorData = pickler.tensorData();
How expensive is it to capture by value here? I would assume `WriteableTensorData` is just a thin wrapper. If not, shall we add a microbenchmark to see how much perf we lose?
Fortunately, we do have a benchmark for it checked in. In practice, the results end up being about the same either way, mostly because the copy-by-value is largely a refcount operation. If I run it several times in a row, the variation between runs is larger than the variation between the two implementations. I'm completely happy using the other version as well (the one that just extends the life of the Pickler); it should be slightly more efficient, though not measurably so. I simply want to get a fix for this read-after-free checked in. :)
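The remark that copy-by-value is largely a refcount operation can be illustrated with a small sketch. It assumes a tensor handle behaves like a `std::shared_ptr` to its storage; `Storage`, `Handle`, and `copyHandles` are hypothetical names, not PyTorch types. Copying the vector of handles increments reference counts and leaves the underlying bytes shared rather than duplicated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins: a byte buffer and a reference-counted handle to it.
using Storage = std::vector<char>;
using Handle = std::shared_ptr<Storage>;

// Copies the handles by value. Only the reference counts change;
// the storage each handle points at is not reallocated or copied.
std::vector<Handle> copyHandles(const std::vector<Handle>& src) {
  std::vector<Handle> dst = src;
  for (std::size_t i = 0; i < src.size(); ++i) {
    // Both vectors refer to the same storage object.
    assert(dst[i].get() == src[i].get());
  }
  return dst;
}
```

Under this model the cost of the copy scales with the number of tensors, not with their size, which is consistent with the two variants being indistinguishable in the benchmark.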
LGTM! Thanks!
Summary:
Pull Request resolved: pytorch#30603

Pickler object needs to be kept in scope until data is written out to the final serialized string. tensorData in particular is a reference to memory owned by the descoped Pickler object.

Noticed this by inspection. In practice, this potential read-after-free is limited to non-cpu tensors, and any such use was very soon after free.

ghstack-source-id: 94756036

Test Plan: existing test suite at buck test mode/dev-nosan caffe2/test:rpc_fork

Differential Revision: D18760463

fbshipit-source-id: 9de890d66626aa48f13ca376dd9bd50b92e0cb00