Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callback after cancel #2281

Merged
merged 5 commits into from Apr 15, 2024
Merged

Conversation

jmachowinski
Copy link
Contributor

fixes #2265

Note, this commit fixes a bug, that makes the current version of the tutorial
not work any more, as the code now acts according to its documentation.

@fujitatomoya fujitatomoya self-assigned this Aug 25, 2023
@alsora
Copy link
Collaborator

alsora commented Apr 1, 2024

Some more thoughts on this PR (also discussed in the last Client Library WG meeting).

  • I think that the expected behavior should be that the action client keeps the goals alive unless explicitly instructed to drop them.
  • As @wjwwood mentioned, it's ok that this behavior is different from what we do when creating ROS entities (e.g. if you create a subscription, you are responsible for keeping the subscription shared_ptr in scope)

This means that the documentation is wrong, but the demo code is correct.

Having to hold the future even if you don't do anything with it is a very strange user experience, since std::future is a standard C++ class with some expected usage and behavior, we shouldn't overload it with ROS-specific expectations.

I proposed to fix the bug by:

  1. Have the client class store shared_ptr rather than weak_ptr here https://github.com/ros2/rclcpp/blob/rolling/rclcpp_action/include/rclcpp_action/client.hpp#L757
  2. do not capture the goal_handle as a shared_ptr in the lambdas, but rather only as a weak_ptr

This should ensure proper destruction of things

@jmachowinski
Copy link
Contributor Author

  • I think that the expected behavior should be that the action client keeps the goals alive unless explicitly instructed to drop them.

Note, this will break the code of users not setting the callback.

Just to make sure that we are all on the same page, the current behavior is:
If you give a callback to the result:

  • The goal will keep itself alive, as the share_ptr is caputred in the lambda to the result callback

If you don't set a result callback:

  • It works as documented, and the goal will terminate the second you drop the shared_ptr from the future

Whatever we decide, we will break the code of someone...

@alsora
Copy link
Collaborator

alsora commented Apr 2, 2024

Whatever we decide, we will break the code of someone...

This is true, that's why I think we should rather focus on what we think is the correct behavior.

  • Current rolling code:
  • Proposal 1: require the users to keep the std::shared_future (and goal handle) in scope
    • this is the current state of this PR f574591
    • abuses the meaning of a std::future, which not only is used for waiting, but also to keep the goal alive
    • fixes the bug
    • respects the documentation
    • breaks the demo examples (where the future is ignored)
    • the [[no-discard]] attribute can produce a warning (in most compilers) if users don't hold the future, but this can be easily ignored.
    • breaks the code for users that ignore the warning and do not keep the future alive. Goals will be unexpectedly cancelled under the users feet.
  • Proposal 2: have the action client automatically keep goals alive and use a dedicated API to drop them
    • proposed in the Client Library WG meeting and summarized in Callback after cancel #2281 (comment)
    • fixes the bug
    • does not respect the documentation
    • the demo examples are ok
    • breaks the code for users that were relying on dropping the future / goal handle to terminate the goal. These users now need to use the dedicated API

I think that this is a good summary of the the current state and the two proposals

My vote is for "proposal 2"

@clalancette @fujitatomoya @wjwwood @mjcarroll

@mjcarroll
Copy link
Member

I think @ros2/team is worth pinging here.

@jmachowinski
Copy link
Contributor Author

I vote for 1, as users at least get a compile warning.

Additional I would propose, that we directly return the goal handle by the async_send_goal method, as this would also give us a way to drop the goal, before it was accepted. This is currently a second design flaw in the API.

@clalancette
Copy link
Contributor

I think that the expected behavior should be that the action client keeps the goals alive unless explicitly instructed to drop them.

Note, this will break the code of users not setting the callback.

Can you explain more of your thinking around this?

Today, if users are not setting the result callback, then either:

  1. They are holding onto the std::future, and thus keeping it alive, or
  2. They are forgetting to hold onto the handle, and getting UB (though they may get lucky and this may work by accident)

If we change it so that the action client keeps the goals alive, then in case 1., they are unnecessarily holding the std::future, and may delay it from being destroyed, but things should still work. We will fix case 2 for them, in that we'll hold onto the handle. In both cases, I don't think we make things worse for users who are not setting the result callback.

But it is entirely possible I'm missing something here.

@jmachowinski
Copy link
Contributor Author

jmachowinski commented Apr 3, 2024

Can you explain more of your thinking around this?

Users that don't set the callback might rely on the fact, that the goal gets canceled the second they drop the handle.
If we change this to proposal 2, their code will silently break, as goals do not terminate any more after the handle gets dropped.

@clalancette
Copy link
Contributor

Users that don't set the callback might rely on the fact, that the goal gets canceled the second they drop the handle.
If we change this to proposal 2, their code will silently break, as goals do not terminate any more after the handle gets dropped.

Ah, right. Thanks.

So I have a proposal, and then an opinion.

The proposal is option 3, where we add in a completely new API for sending goals (async_send_goal2, for lack of a better idea). This API would implement behavior 2 as specified by @alsora , i.e. it would hold onto the handle and have a dedicated API to drop it. Then we leave async_send_goal as it is, with a deprecation warning. In that case, then, we would have a viable path to telling users that we are going to break their semantics, and we have a "new" way for users to change that they have to explicitly opt into. It is entirely possible that it is difficult to implement both of these behaviors internally, but I think it at least deserves consideration.

My opinion is this: I like option 3 the best, as it is clear to users that changes to these APIs are coming. If we don't want to do option 3 today, then I think we should go for option 1, as it is the least surprising for users. In the future we can always decide to go for option 3 if we think that is better.

Thoughts?

@jmachowinski
Copy link
Contributor Author

I don't think we can implement option 3 until the feature freeze next week. I would also go more into the direction of a whole API redesign, introducing two new client classes, that solve the common 'I called wait on the future and deadlocked' issue.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/client-library-wg-meeting-april-5-2024/36998/1

@fujitatomoya
Copy link
Collaborator

fujitatomoya commented Apr 5, 2024

Edit: Sorry my vote is proposal-1.

I second proposal-1. (from #2281 (comment)) I am okay with current implementation.

(note, i do not think i can make it tomorrow meeting, depends on traffic.)

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still under discussion but lgtm with green CI atm.

@jmachowinski
Copy link
Contributor Author

Summary of the discussion:
We'll add

async_send_goal_unowned()

This version will work as described in the documentation.
We will keep async_send_goal with the current broken behavior, but update the documentation.
Open for discussion : Will we add a [[deprecated, "Deprecated, as the behavior bla bla bla"]] prior to jazzy.

My vote is for adding the deprecated prior to jazzy, as users might then recognize a bug they were experiencing.

@fujitatomoya
Copy link
Collaborator

Open for discussion : Will we add a [[deprecated, "Deprecated, as the behavior bla bla bla"]] prior to jazzy.

just checking. so the decision is, to keep async_send_goal at this moment with broken behavior, but eventually we will remove async_send_goal. and will be replaced with async_send_goal_unowned?
Or do we keep both of them?

@alsora
Copy link
Collaborator

alsora commented Apr 5, 2024

Or do we keep both of them?

Jazzy will have both of them.
The future is less set in stone =)

@jmachowinski
Copy link
Contributor Author

While implementing the proposed solution I rechecked the code and figured out, that the problem is different than described by me in this thread before.

The current behavior of the action client library is :

async_send_goal

  • Send a goal, that is fire & forget. It will be executed regardless if you hold the handle or not.
  • The given handle only controls, if you will receive callbacks.
  • The need to keep the handle is inconsistent.
    • If you only set the feedback callback, you will need to hold the handle, or you will receive no feedback.
    • As soon as you set the result callback the handle is self referencing.

Using the current implementation of async_send_goal will create a memory leak, as the goal handle will not be deleted.

  • This can be patched for nominal cases, as we could delete the result callback after the result callback was called.

I am undecided on what to do with the async_send_goal function, still migrate to a new version owned ?

I would propose, to add a new function

void stop_callbacks(typename GoalHandle::SharedPtr goal_handle)

instead of

void drop_goal_handle(typename GoalHandle::SharedPtr goal_handle)

I would also update the documentation.

Opinions? @clalancette @fujitatomoya @alsora @mjcarroll @wjwwood

@jmachowinski jmachowinski force-pushed the callback_after_cancel branch 3 times, most recently from d419b35 to f366e26 Compare April 12, 2024 16:15
@jmachowinski
Copy link
Contributor Author

@alsora updated docs

@mjcarroll
Copy link
Member

mjcarroll commented Apr 12, 2024

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@alsora
Copy link
Collaborator

alsora commented Apr 12, 2024

Thank you!
Looks good to me with green CI

Janosch Machowinski and others added 5 commits April 13, 2024 13:13
This function allows us to drop the handle in a locked context.
If we do not do this within a lock, there will be a race condition between
the deletion of the shared_ptr of the handle and the result / feedback
callbacks.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
This fixes deadlocks due to release of goal handles in callbacks etc.

Signed-off-by: Janosch Machowinski <j.machowinski@cellumation.com>
This fixes a memory leak due to a self reference in the ClientGoalHandle.
Note, this fix will only work, if the ClientGoalHandle ever receives
a result callback.

Signed-off-by: Janosch Machowinski <j.machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <j.machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
jmachowinski pushed a commit to jmachowinski/build_desc that referenced this pull request Apr 13, 2024
@jmachowinski
Copy link
Contributor Author

rebased to rolling.

CI seems to hang, can someone restart it ?

@wjwwood
Copy link
Member

wjwwood commented Apr 14, 2024

I don't see where it's hung, but I restarted CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@jmachowinski
Copy link
Contributor Author

Ready for merge ?

@mjcarroll mjcarroll merged commit 839348c into ros2:rolling Apr 15, 2024
3 checks passed
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmachowinski lgtm anyway, but one question for doc section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Action Client: Feeback callback called after cancel
10 participants