Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmaccAsyncClientBehavior onExit() freezes when cancelGoal() is called on the attached action client #540

Closed
sukhrajklair opened this issue May 1, 2024 · 9 comments

Comments

@sukhrajklair
Copy link
Contributor

sukhrajklair commented May 1, 2024

Describe the bug
I created a custom asynchronous client behavior by inheriting from SmaccAsyncClientBehavior which sends a goal to an action server (using the attached smacc action client) and cancel the goal inside onExit(). The goal gets sent successfully and EvCbFinished is transmitted. But when onExit() for the behavior is called due to a transition to another state, the cb's onExit() never finishes. The action server does receive a request to cancel all goals and actually cancels the goals. I have other orthogonals in this state which include subscriber clients and even their callbacks don't get called.

Expected behavior
The onExit() of CB should finish and the state should transition

Environment (please complete the following information):

ROS DETAILS:

  • OS: Ubuntu 22.04 LTS
  • Humble

REPO DETAILS:

  • Repository: robosoft-ai/SMACC2
  • Branch: humble
  • Commit (First 5 Digits): c1fa655

BUILD DETAILS

  • Build Command: colcon build --symlink-install

Additional context
I took a look at the implementation of the smacc_action_client_base and believe that this line is an issue. This line pauses all of the callback execution of the node and the node never receives the cancellation response from the action server consequently the future never returns. When I replaced it with rclcpp::spin_until_future_complete(getNode()->get_node_base_interface(), fut);, onExit() of the CB finishes and the state machine transitions to the next state.

┆Issue is synchronized with this Jira Task by Unito

@pabloinigoblasco
Copy link
Contributor

Hello @sukhrajklair, thank you for sharing your findings.

I appreciate your efforts in addressing this issue. It's crucial that we explore this further as it may reveal potential improvements. Nevertheless, I'd like to carefully consider your hypothesis and proposed solution.

You mentioned that a specific line halts all callback executions within the node, which consequently prevents the node from receiving the cancellation response from the action server, leaving the future unresolved. However, I'm inclined to disagree that all callbacks are halted. There's another thread, the "signal detector thread," that operates independently from the onExit function of the AsynchronousClientBehavior. This thread should handle the cancel request/response and unlock the future.

Regarding your solution—replacing the problematic line with rclcpp::spin_until_future_complete(getNode()->get_node_base_interface(), fut)—it's intriguing that this change allowed the onExit of the CB to complete and facilitated the state transition. However, I believe this approach may introduce conflicts with the "signal detector" thread's main ROS loop. Therefore, I'm hesitant to endorse this solution without further examination.

To proceed effectively, I suggest creating a basic test case to replicate this behavior. This would allow us to collaboratively assess whether a fix or modification is necessary.

@sukhrajklair
Copy link
Contributor Author

Hello @pabloinigoblasco

I'm still new to the SMACC2 library, so plesae take everything I say with a grain of salt. I've created an example state machine and clients which reproduce the issue I faced. Please check it out here
image

ClModeSelect+CbModeSelect subscribes to a topic called "/mode_command" and generates an event based on the value received.
ClFibonacci+CbFibonacci creates an action client and sends goal to the Fibonacci_action_server provided in action_tutorials_cpp pkg

Please follow these steps to reproduce the issue:

  1. build the simple_action_client_example pkg that I added to the smacc2_sm_reference_library directory in my above linked forked repo
  2. run the state machine: ros2 run simple_action_client_example simple_action_client_example_node
  3. run the Fibonacci action server: ros2 run action_tutorials_cpp fibonacci_action_server
  4. publish msg on /mode_command topic: ros2 topic pub /mode_command example_interfaces/msg/Int32 "{data: 1}". This should cause the state machine to transition to StStart2. Upon entry to StStart2, CbFibonacci sends a goal to the fibonacci_action_server.
  5. publish another msg /mode_command topic: ros2 topic pub /mode_command example_interfaces/msg/Int32 "{data: 0}". This generates an event and should trigger a transition back to StStart1 which will consequently call onExit() of CbFibonacci. Inside onExit(), it calls cancelGoal() of ClFibonacci. The action server receives a request to cancel all goals and so it does. However, the onExit() never finishes. Any further msgs published on /mode_command topic do not generate any event in the state machine.

When I say "all callback executions within the node are halted", I basically extrapolated this from my observation in the step 5 of the above process.

When I replace fut.wait() with rclcpp::spin_until_future_complete(getNode()->get_node_base_interface(), fut) and repeats the above steps, the onExit() of CbFibonacci finishes and state machine transitions back to StState1.

@pabloinigoblasco
Copy link
Contributor

Thank you for sharing this information, @sukhrajklair. I appreciate your help. I'm going to replicate your case and perform some debugging to better understand the issue. I'll follow up with you soon to provide further insights.

@pabloinigoblasco
Copy link
Contributor

Hello @sukhrajklair we have been analyzing this.
Indeed the bug is there, I can confirm that.
However the proposed solution is incorrect because it blocks the signal detector thread. We are working in an alternative solution that I hope it will be around in the next few days.

Thanks.

@pabloinigoblasco
Copy link
Contributor

This is the solution I propose, esentially removing the condition of synchronicity in a cancel request.

#542

There is some small discussion here why that future wait was created, though. Your demo still works with this solution.

Regards.

@sukhrajklair
Copy link
Contributor Author

@pabloinigoblasco I can't think of any reason, at least for my application, why I would need the cancel request to be synchronous. So this solution works. I appreciate your effort on this.

@brettpac
Copy link
Collaborator

brettpac commented Jun 4, 2024

I just wanted to say thank you @sukhrajklair for writing a fantastic ticket, along with a code sample so that we could reproduce the issue.

Would you be willing to create a pull request with the state machine sm_simple_action_client in the reference library that we could add to the main branch?

@brettpac brettpac reopened this Jun 4, 2024
@sukhrajklair
Copy link
Contributor Author

@brettpac I appreciate the timely response. I've created a PR with my code.

@brettpac
Copy link
Collaborator

brettpac commented Jun 6, 2024

Thank you @sukhrajklair, we'll be merging it shortly.

@brettpac brettpac closed this as completed Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants