-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix freeing of loaned chunks #365
Conversation
388590b
to
a7d3439
Compare
@eboasson @MatthiasKillat Can you please review the changes? Currently, I am depending on iceoryx directly to free the chunk in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic appears correct, there is some potential for minor efficiency gains (to be discussed).
@@ -164,7 +164,14 @@ static uint32_t serdata_rmw_size(const struct ddsi_serdata * dcmn) | |||
|
|||
static void serdata_rmw_free(struct ddsi_serdata * dcmn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unhelpful parameter name, can we change this here (even just data
or serdata
would be less confusing ...). I suppose it is for something like common but this does also not really match with later conversions.
rmw_cyclonedds_cpp/src/serdata.cpp
Outdated
auto * d = static_cast<serdata_rmw *>(dcmn); | ||
|
||
#ifdef DDS_HAS_SHM | ||
if(d->iox_chunk && d->iox_subscriber) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second check for d->iox_subscriber
should not be necessary. There would be no chunk in this case, could only be non-null due to a bug or misuse (like changing d->iox_subscriber
directly as it is public).
Up to you, I favor performance in low level APIs and avoid checks like this which just cause overhead if everything works correctly (and only leads to less severe errors if not ... i.e. no crash here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to have this check to be sure that the loaned chunk is returned back to iceoryx only in the subscriber case
#ifdef DDS_HAS_SHM | ||
if(d->iox_chunk && d->iox_subscriber) { | ||
iox_sub_release_chunk(*static_cast<iox_sub_t *>(d->iox_subscriber), d->iox_chunk); | ||
d->iox_chunk = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the destructor tries to access d->iox_chunk
(in the delete d
call) it this should also not be needed.
rmw_cyclonedds_cpp/src/serdata.cpp
Outdated
const struct sertype_rmw * type = static_cast<const struct sertype_rmw *>(typecmn); | ||
auto d = std::make_unique<serdata_rmw>(type, kind); | ||
d->iox_chunk = iox_buffer; | ||
d->iox_subscriber = sub; | ||
return d.release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why create a unique pointer if we release it to return a raw pointer after two instructions that cannot throw (i.e. there is no benefit from RAII here)? We could create a raw pointer instead which would be slightly more efficient.
If we are free to do so we could return the unique_ptr
, but I am not sure whether we can change the API here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MatthiasKillat these functions are called from C, so returning a unique_ptr
isn't a possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is used in multiple places in the code base, I think it is better to change this in a follow-up rather than in the scope of this MR, but I have wrapped the creation of new serdata in a try-catch block to catch std::bad_alloc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @MatthiasKillat's and @clalancette's comments, but also have one of my own that doesn't fit in a comment attached to a line in the diff. The issue is that I run into trouble mentally tracking lifetimes of things.
We construct an rmw_serdata
with a chunk attached in the sending and in the receiving path, and because of how Iceoryx is integrated, the two are entirely separate.
For the publishing, the serdata is constructed by publish_loaned_int
or rmw_publish_serialized_message
. This case obviously has no iox_subscriber
attached and ownership is transferred to Iceoryx. This eventually ends up in serdata_rmw_free
(because all serdata's end up there) but it should leave any iox_chunk
alone.
As a precautionary measure, dds_write
and friends clear iox_chunk
on handing it over to Iceoryx. serdata_rmw_free
is invoked when its refcount has dropped to 0, so iox_chunk
will be a null pointer by the time it ends up in serdata_rmw_free
.
In short, the publishing case is fine, and with without this PR and @MatthiasKillat is correct that the iox_subscriber
test in there is, strictly speaking, superfluous if that "precautionary measure" is elevated to a "contractual behaviour". Checking iox_subscriber
avoids that and also makes it quite clear that this doesn't apply to the publishing case.
For the subscribing case, the serdata is always constructed by serdata_rmw_from_iox
. It is dropped when:
-
It is pushed out of the RHC (triggering this PR). The only function called in this case is
serdata_rmw_free
and it follows that in this case, this PR is correct in releasing the Iceoryx chunk there. -
It is
dds_take
n out of the RHC, in which case it is preceded byserdata_rmw_to_sample
which in turn always deserializes. This doesn't affect the chunk.dds_take
is called by thermw_take
family exceptrmw_take_serialized
andrmw_take_loaned_message
. I suspect this PR also fixes a leak there, but I may have overlooked something. -
When
rmw_take_serialized
drops the last reference. This one frees theiox_chunk
but with this PR I think it should no longer do so. -
When
rmw_take_loaned_message
takes ownership of the chunk. There are two cases supported:- Where the serdata contains a sample. This case just returns a pointer and it is
rmw_return_loaned_message_from_subscription
that is responsible for freeing it; - Where the serdata contains a serialized sample. In this case the "loaned" sample in reality is a newly allocated thing. In this case the Iceoryx chunk could be freed much earlier.
In both cases (and like above, I may have overlooked something) I don't see a call toserdata_unref
, which suggests there may be a memory leak there. In the second case, it may involve shared memory as well.
In either case, with this PR, I would expect a double-free if
serdata_rmw_free
were called, becausereturn_loaned_message
frees the Iceoryx chunk but I see no reason whyserdata_rmw_free
would not try to free it again. - Where the serdata contains a sample. This case just returns a pointer and it is
The changes in this PR are good, I just think it is incomplete and that completing it will reduce the complexity in rmw_node.cpp
, because it will no longer need to deal with actually releasing chunks of shared memory except in one case: that where take_loaned_message
actually returned a pointer to shared memory and return_loaned_message
must free that memory.
Ideally, one would track the serdata and not do any shared memory management in return_loaned_message
, but that is impossible because of the RMW interface (unless one caches a set of outstanding loans in the subscriber). What should work is clearing iox_chunk
and iox_subscriber
in rmw_take_loaned_message
when it returns a pointer to shared memory.
rmw_cyclonedds_cpp/src/serdata.cpp
Outdated
const struct sertype_rmw * type = static_cast<const struct sertype_rmw *>(typecmn); | ||
auto d = std::make_unique<serdata_rmw>(type, kind); | ||
d->iox_chunk = iox_buffer; | ||
d->iox_subscriber = sub; | ||
return d.release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MatthiasKillat these functions are called from C, so returning a unique_ptr
isn't a possibility.
df1b655
to
f4ed794
Compare
@eboasson Thanks for the comment, I did a code walkthrough for memory audit in the shared memory case and fixed a couple of memory leaks. I tried to summarize below the data path with allocations to identify if we missed anything Publishing
|
f4ed794
to
1accbe8
Compare
f6c0670
to
d295843
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… iox chunk Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
…l available Currently we depend on iceoryx_binding_c directlyd for the API to release the chunk, but this will be fixed when RMW is updated to the latest Cyclone DDS version
Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
- Decrement the refcount when serdata is not needed in the take path. When the refcount drops to 0, the serdata will be freed in serdata_rmw_free - In rmw_take_loan scenario, reset the iox_chunk pointer to null before returning the loaned pointer to user, so rmw_serdata_free only frees the serdata with out actually releasing the iox_chunk. The iox_chunk will be released when rmw_return_loaned_message_from_subscription is called Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
- In error, release loaned chunk and decrement the serdata refcount, so it will be eventually freed Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
d295843
to
ec6ad7e
Compare
@eboasson I see that linters fail in the CI jobs here #365 (review), but then realized that these are already fixed in this PR #363. I have now rebased this branch onto the latest master and the CI should hopefully be good now. Can you please run the CI again? |
Hi @sumanth-nirmal, I've re-run the full CI: The Windows failure is "tf2_ros_py.pytest.missing_result" not producing a test result, which I'm pretty sure is unrelated. So I think it it is good to go. |
@clalancette I notice that GitHub says your review was requested and I don't to go ahead and merge it if there's a possibility that you're reviewing it at the same time. As far as I am concerned, this is good to go. Please let me know (or simply press "merge" as a way of letting me know 🙂) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable enough to me (though I only did a cursory look and verified that the dependencies look correct). @eboasson Feel free to merge.
Fixes #364