Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix freeing of loaned chunks #365

Merged
merged 7 commits into from
Feb 16, 2022
Merged

Conversation

sumanth-nirmal
Copy link
Contributor

Fixes #364

@sumanth-nirmal
Copy link
Contributor Author

@eboasson @MatthiasKillat Can you please review the changes?

Currently, I am depending on iceoryx directly to free the chunk in serdata_free under the assumption that we can remove this dependency once we update RMW to use the latest Cyclone tag. Please let me know if it doesn't make sense.

Copy link
Contributor

@MatthiasKillat MatthiasKillat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic appears correct, there is some potential for minor efficiency gains (to be discussed).

rmw_cyclonedds_cpp/CMakeLists.txt Outdated Show resolved Hide resolved
@@ -164,7 +164,14 @@ static uint32_t serdata_rmw_size(const struct ddsi_serdata * dcmn)

static void serdata_rmw_free(struct ddsi_serdata * dcmn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unhelpful parameter name, can we change this here (even just data or serdata would be less confusing ...). I suppose it is for something like common but this does also not really match with later conversions.

auto * d = static_cast<serdata_rmw *>(dcmn);

#ifdef DDS_HAS_SHM
if(d->iox_chunk && d->iox_subscriber) {
Copy link
Contributor

@MatthiasKillat MatthiasKillat Jan 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second check for d->iox_subscriber should not be necessary. There would be no chunk in this case, could only be non-null due to a bug or misuse (like changing d->iox_subscriber directly as it is public).

Up to you, I favor performance in low level APIs and avoid checks like this which just cause overhead if everything works correctly (and only leads to less severe errors if not ... i.e. no crash here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have this check to be sure that the loaned chunk is returned back to iceoryx only in the subscriber case

#ifdef DDS_HAS_SHM
if(d->iox_chunk && d->iox_subscriber) {
iox_sub_release_chunk(*static_cast<iox_sub_t *>(d->iox_subscriber), d->iox_chunk);
d->iox_chunk = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the destructor tries to access d->iox_chunk (in the delete d call) it this should also not be needed.

const struct sertype_rmw * type = static_cast<const struct sertype_rmw *>(typecmn);
auto d = std::make_unique<serdata_rmw>(type, kind);
d->iox_chunk = iox_buffer;
d->iox_subscriber = sub;
return d.release();
Copy link
Contributor

@MatthiasKillat MatthiasKillat Jan 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create a unique pointer if we release it to return a raw pointer after two instructions that cannot throw (i.e. there is no benefit from RAII here)? We could create a raw pointer instead which would be slightly more efficient.

If we are free to do so we could return the unique_ptr, but I am not sure whether we can change the API here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatthiasKillat these functions are called from C, so returning a unique_ptr isn't a possibility.

Copy link
Contributor Author

@sumanth-nirmal sumanth-nirmal Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is used in multiple places in the code base, I think it is better to change this in a follow-up rather than in the scope of this MR, but I have wrapped the creation of new serdata in a try-catch block to catch std::bad_alloc

Copy link
Collaborator

@eboasson eboasson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @MatthiasKillat's and @clalancette's comments, but also have one of my own that doesn't fit in a comment attached to a line in the diff. The issue is that I run into trouble mentally tracking lifetimes of things.

We construct an rmw_serdata with a chunk attached in the sending and in the receiving path, and because of how Iceoryx is integrated, the two are entirely separate.

For the publishing, the serdata is constructed by publish_loaned_int or rmw_publish_serialized_message. This case obviously has no iox_subscriber attached and ownership is transferred to Iceoryx. This eventually ends up in serdata_rmw_free (because all serdata's end up there) but it should leave any iox_chunk alone.

As a precautionary measure, dds_write and friends clear iox_chunk on handing it over to Iceoryx. serdata_rmw_free is invoked when its refcount has dropped to 0, so iox_chunk will be a null pointer by the time it ends up in serdata_rmw_free.

In short, the publishing case is fine, and with without this PR and @MatthiasKillat is correct that the iox_subscriber test in there is, strictly speaking, superfluous if that "precautionary measure" is elevated to a "contractual behaviour". Checking iox_subscriber avoids that and also makes it quite clear that this doesn't apply to the publishing case.

For the subscribing case, the serdata is always constructed by serdata_rmw_from_iox. It is dropped when:

  • It is pushed out of the RHC (triggering this PR). The only function called in this case is serdata_rmw_free and it follows that in this case, this PR is correct in releasing the Iceoryx chunk there.

  • It is dds_taken out of the RHC, in which case it is preceded by serdata_rmw_to_sample which in turn always deserializes. This doesn't affect the chunk. dds_take is called by the rmw_take family except rmw_take_serialized and rmw_take_loaned_message. I suspect this PR also fixes a leak there, but I may have overlooked something.

  • When rmw_take_serialized drops the last reference. This one frees the iox_chunk but with this PR I think it should no longer do so.

  • When rmw_take_loaned_message takes ownership of the chunk. There are two cases supported:

    • Where the serdata contains a sample. This case just returns a pointer and it is rmw_return_loaned_message_from_subscription that is responsible for freeing it;
    • Where the serdata contains a serialized sample. In this case the "loaned" sample in reality is a newly allocated thing. In this case the Iceoryx chunk could be freed much earlier.
      In both cases (and like above, I may have overlooked something) I don't see a call to serdata_unref, which suggests there may be a memory leak there. In the second case, it may involve shared memory as well.

    In either case, with this PR, I would expect a double-free if serdata_rmw_free were called, because return_loaned_message frees the Iceoryx chunk but I see no reason why serdata_rmw_free would not try to free it again.

The changes in this PR are good, I just think it is incomplete and that completing it will reduce the complexity in rmw_node.cpp, because it will no longer need to deal with actually releasing chunks of shared memory except in one case: that where take_loaned_message actually returned a pointer to shared memory and return_loaned_message must free that memory.

Ideally, one would track the serdata and not do any shared memory management in return_loaned_message, but that is impossible because of the RMW interface (unless one caches a set of outstanding loans in the subscriber). What should work is clearing iox_chunk and iox_subscriber in rmw_take_loaned_message when it returns a pointer to shared memory.

const struct sertype_rmw * type = static_cast<const struct sertype_rmw *>(typecmn);
auto d = std::make_unique<serdata_rmw>(type, kind);
d->iox_chunk = iox_buffer;
d->iox_subscriber = sub;
return d.release();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatthiasKillat these functions are called from C, so returning a unique_ptr isn't a possibility.

@sumanth-nirmal sumanth-nirmal force-pushed the fix-freeing-samples branch 3 times, most recently from df1b655 to f4ed794 Compare January 26, 2022 21:30
@sumanth-nirmal
Copy link
Contributor Author

@eboasson Thanks for the comment, I did a code walkthrough for memory audit in the shared memory case and fixed a couple of memory leaks. I tried to summarize below the data path with allocations to identify if we missed anything

Publishing

rmw_publish_serialized_message
  1. Construct the serdata from the serialized message (serdata allocation)
  2. With dds_data_allocator we loan the chunk from iceoryx (Chunk allocation)
  3. Deserialize the sample to the loaned chunk (this additional deserialization will be fixed when we update rmw to use the latest Cyclone)
  4. Update the serdata with the loaned chunk
  5. Hand over the serdata to the dds_writecdr()
    1. The loaned chunk is handover to the iceoryx (Chunk free)
    2. The chunk pointer is reset to null in serdata
    3. The serdata ref count is decremented
      1. Since the ref count will be 0, the serdata will be deleted in serdata_rmw_free (serdata free)
  • Error scenario
    1. When deserialize fails, serdata refcount is decremented, which will eventually free the serdata in serdata_rmw_free (serdata free)
rmw_publish_loaned_message
  1. Construct the serdata with the placement new operator (serdata allocation)
  2. Update the serdata with the loaned chunk (the chunk is loaned with rmw_borrow_loaned_message) (Chunk allocation)
  3. Hand over the serdata to the dds_writecdr()
    1. The loaned chunk is handover to the iceoryx (Chunk free)
    2. The chunk pointer is reset to null in serdata
    3. The serdata ref count is decremented
      1. Since the ref count will be 0, the serdata will be deleted in serdata_rmw_free (serdata free)
  • Error scenario
    1. When dds_writecdr() fails, serdata refcount is decremented, which will eventually free the serdata in serdata_rmw_free (serdata free)

Subscription

rmw_take_serialized_message/with_info
  1. Get serdata with dds_takecdr() (serdata allocation)
  2. The serdata has valid iox_chunk (updated in serdata_rmw_from_iox) (Chunk allocation)
  3. Serialize the sample from the loaned chunk (this additional serialization will be fixed when we update rmw to use the latest Cyclone)
  4. The serdata ref count is decremented (the ref count will be 0 after this)
    1. The iox_chunk will be released to the iox_subscriber in serdata_rmw_free (Chunk free)
    2. The serdata will be freed in serdata_rmw_free (serdata free)
  • Error scenario
    1. When serialize fails, serdata refcount is decremented, the ref count will drop to 0 which will end in serdata_rmw_free
      1. The iox_chunk will be released to the iox_subscriber in serdata_rmw_free (Chunk free)
      2. The serdata will be freed in serdata_rmw_free (serdata free)
    2. When the received sample is not valid
      1. We decrement the ref count of the serdata (the ref count will be 0 after this)
      1. With SHM, the iox_chunk will be released to the iox_subscriber (Chunk free), and the serdata will be freed in serdata_rmw_free (serdata free)
      2. without SHM, the serdata is freed in serdata_rmw_free (serdata free)
rmw_take_loaned_message/with_info
  1. Get serdata with dds_takecdr() (serdata allocation)
  2. If the serdata has valid iox_chunk (updated in serdata_rmw_from_iox) (Chunk allocation)
    1. Give the reference to the chunk to the user
    2. Initialize the data_alloc with this subscription
    3. The chunk will be freed in rmw_return_loaned_message_from_subscription() (Chunk free)
    4. The iox_chunk pointer is reset to null and serdata ref count is decremented (the ref count will be 0 after this)
      1. The serdata will be deleted in serdata_rmw_free (serdata free)
        (the loaned chunk will not be released here as the iox_chunk pointer is null)
  3. If the serdata doesn't have the chunk (meaning SHM is disabled on the publisher but enabled on subscription)
    1. Allocate memory for the sample on the heap (Sample allocation)
    2. deserialize the sample to this allocated memory
    3. Give the reference to this allocated memory to the user
    4. The allocated memory will be freed in rmw_return_loaned_message_from_subscription() (Sample free)
    5. The serdata ref count is decremented (the ref count will be 0 after this)
      1. The serdata will be deleted in serdata_rmw_free (serdata free)
  • Error scenario
    1. When the received sample is not valid
      1. We decrement the ref count of the serdata (the ref count will be 0 after this)
        1. With SHM, the iox_chunk will be released to the iox_subscriber (Chunk free), and the serdata will be freed in serdata_rmw_free (serdata free)
        2. without SHM, the serdata is freed in serdata_rmw_free (serdata free)

I think with this MR, all the previously unhandled memory leaks with SHM should have been fixed (unless I missed anything)!

Copy link
Collaborator

@eboasson eboasson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it think it covers every path now. CI:

  • Linux Build Status

  • Linux aarch64 Build Status

  • Windows Build Status

… iox chunk

Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
…l available

Currently we depend on iceoryx_binding_c directlyd for the API to release the chunk, but this will be fixed when RMW is updated to the latest Cyclone DDS version
Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
  - Decrement the refcount when serdata is not needed in the take path.
 When the refcount drops to 0, the serdata will be freed in serdata_rmw_free
  - In rmw_take_loan scenario, reset the iox_chunk pointer to null
before returning the loaned pointer to user, so rmw_serdata_free only frees
the serdata with out actually releasing the iox_chunk. The iox_chunk will be
released when rmw_return_loaned_message_from_subscription is called

Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
  - In error, release loaned chunk and decrement the serdata refcount,
so it will be eventually freed

Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
Signed-off-by: Sumanth Nirmal <sumanth.724@gmail.com>
@sumanth-nirmal
Copy link
Contributor Author

@eboasson I see that linters fail in the CI jobs here #365 (review), but then realized that these are already fixed in this PR #363. I have now rebased this branch onto the latest master and the CI should hopefully be good now. Can you please run the CI again?

@eboasson
Copy link
Collaborator

Hi @sumanth-nirmal, I've re-run the full CI:

  • Linux Build Status
  • Linux aarch64 Build Status
  • Windows Build Status

The Windows failure is "tf2_ros_py.pytest.missing_result" not producing a test result, which I'm pretty sure is unrelated. So I think it it is good to go.

@eboasson
Copy link
Collaborator

@clalancette I notice that GitHub says your review was requested and I don't to go ahead and merge it if there's a possibility that you're reviewing it at the same time. As far as I am concerned, this is good to go. Please let me know (or simply press "merge" as a way of letting me know 🙂)

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable enough to me (though I only did a cursory look and verified that the dependencies look correct). @eboasson Feel free to merge.

tdinesh added a commit to tdinesh/rmw_cyclonedds that referenced this pull request Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shared memory chunks are lost when the subscriber callback is delayed
4 participants