Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] State machine blocks with AsyncCb cannot return and busy waiting std::future. #330

Closed
ZhenshengLee opened this issue Nov 12, 2022 · 9 comments

Comments

@ZhenshengLee
Copy link

ZhenshengLee commented Nov 12, 2022

Describe the bug

The ClRoslaunch is about to launch a python launch which will call some ros2 service call
Some times the service call in launch file would stuck, so the launch thread would block.

the output shows waiting for finishing client behavior, before leaving the state. Is the client behavior stuck? requesting force finish

RCLCPP_WARN_THROTTLE(
getLogger(), *(getNode()->get_clock()), 1000,
"[%s] waiting for finishing client behavior, before leaving the state. Is the client "
"behavior stuck? requesting force finish",
this->getName().c_str());

I've seen the code that launching thread in the client behavior is called by std::async, meaning that it
s a async operation and would return right away.

void ClRosLaunch::launch()
{
this->result_ = executeRosLaunch(
packageName_, launchFileName_, [this]() { return this->cancellationToken_.load(); });
}
void ClRosLaunch::stop() { cancellationToken_.store(true); }
std::future<std::string> ClRosLaunch::executeRosLaunch(
std::string packageName, std::string launchFileName, std::function<bool()> cancelCondition)
{
return std::async(std::launch::async, [=]() {
RCLCPP_WARN_STREAM(
rclcpp::get_logger("smacc2"), "[ClRosLaunch static] starting ros launch thread ");
std::stringstream cmd;
cmd << "ros2 launch " << packageName << " " << launchFileName;
std::array<char, 128> buffer;
std::string result;
std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd.str().c_str(), "r"), pclose);
if (!pipe)
{
throw std::runtime_error("popen() failed!");
}
std::stringstream ss;
bool cancelled = false;
while (fgets(buffer.data(), buffer.size(), pipe.get()) != nullptr &&
!(cancelled = cancelCondition()))
{
ss << buffer.data();
}
result = ss.str();
RCLCPP_WARN_STREAM(
rclcpp::get_logger("smacc2"), "[ClRosLaunch static]] RESULT = \n " << ss.str());
return result;
});
}

QST,

  1. why this async behavior block? whe this client behavior happen?
  2. when this happens, how to recovery the behavior of state machine? Must I kill the node and restart the node?
  3. How to avoid this client behavior blocking?

┆Issue is synchronized with this Jira Task by Unito

@pabloinigoblasco
Copy link
Contributor

pabloinigoblasco commented Nov 21, 2022

Thanks @ZhenshengLee for this ticket. I am not sure if the reason why your state is stuck is because of CbLaunch.
CbRosLaunch has not been extensively tested. In theory, it should not block, even if the roslaunch got stuck.

May that stuck be caused by other different client behavior?

Anyways, It would be nice to get a new example inside sm_reference_library to replicate and that specific case you have.

Now I will answer your generic questions (independently if this is related or not with the CbLaunch functioning)

1 - why this async behavior block? whe this client behavior happen?

Asynchronous client behaviors are just client behaviors that are to to run code of the onEntry function in a detached thread. That is why onEntry "finishes" very fast. However, we asume that the Asyncrhonous Client Behavior "Is Running" while that thread is alive. Something very important to know is that the state cant be left until all the client behaviors threads are finished. In order to do that, the state tries to interrupt all of unfinished ones (but that does not always work and depends on the internal implementation of the asyncrhonous function)

Then, the key question around is the following: What happens our state machine needs to move from one new state leaving the current state with some unfinished asynchronous behavior?

Short answer: If the client behavior gets stuck, the state will also do that.

Good client behavior needs to implement their own mechanisms to be interrupted in the case we are leaving the state.
We have a solution for this, but it is responsibility of the client behavior implementation to use it correctly:

  /// \brief onEntry is executed in a new thread. However the current state cannot be left
  /// until the onEntry thread finishes. This flag can be checked from the onEntry thread to force finishing the thread.
  // All asyncrhonous client behaviors should implement the ability of interruping the onEntry thread.
  // to avoid blocking the state machine.
  inline bool isShutdownRequested() { return isShutdownRequested_; }

2. when this happens, how to recovery the behavior of state machine? Must I kill the node and restart the node?

I think the only solution is using asyncrhonous behaviors that are prepared to be interrupted.

3. How to avoid this client behavior blocking?

Using uninterruptible client behaviors.

@ZhenshengLee
Copy link
Author

I think it may be related with std::future, because similar behavior/issue happens with https://github.com/robosoft-ai/SMACC2/blob/galactic/smacc2/include/smacc2/client_behaviors/cb_call_service.hpp

client_ = getNode()->create_client<ServiceType>(serviceName_);
result_ = client_->async_send_request(request_).get();

std::shared_ptr<typename ServiceType::Response> call(
std::shared_ptr<typename ServiceType::Request> & request)
{
auto result = client_->async_send_request(request);
//rclcpp::spin_until_future_complete(getNode(), result);
return result.get();
}

sometimes the call() function cannot return and the Cb could be stuck, thus making state machine stuck.

@ZhenshengLee ZhenshengLee changed the title [BUG] ClRoslaunch std::async launching thread cannot return and client behavior blocks. [BUG] State machine blocks with AsyncCb cannot return and busy waiting std::future. Nov 22, 2022
@Crowdedlight
Copy link

Good client behavior needs to implement their own mechanisms to be interrupted in the case we are leaving the state.
We have a solution for this, but it is responsibility of the client behavior implementation to use it correctly:

I am having the same error even with isShutdownRequested used the same way as you have in the reference example cb_blinking.hpp

My Entry looks like this and this->checkDistanceToTarget(); is not blocking. The state is set to continue to the next one as checkDistanceToTarget() calls postSuccessEvent();.

  void onEntry() override {

    log_info("OnEntry, Checking pos with 2hz");
    rclcpp::Rate rate(2); // 2Hz

    // remember to check the shutdown request, so we don't block state transitions
    while (!isShutdownRequested()) {
      log_info("Is shutdown requested: " + std::to_string(isShutdownRequested()));
      log_info("Checking distance...");
      this->checkDistanceToTarget();
      rate.sleep();
    }
    log_info("Exited loop...");

  }

My log, it will keep repeating these entries which indicates that the function this->checkDistanceToTarget(); keeps being called multiple iterations and the loop never stops. It seems like the isShutdownRequested() never gets updated.

[navigator_node-2] [WARN] [1669644347.285719186] [navigator]: [navigator::cl_drone_interface::CbTargetPosChecker] waiting for finishing client behavior, before leaving the state. Is the client behavior stuck?
[navigator_node-2] [INFO] [1669645400.507621181] [navigator]:  [CbTargetPosChecker] Is shutdown requested: 0
[navigator_node-2] [INFO] [1669644347.664684620] [navigator]:  [CbTargetPosChecker] Checking distance...
[navigator_node-2] [INFO] [1669644347.664729279] [navigator]:  [CbTargetPosChecker] Dist to target pos: 0.001001, Threshold: 0.1m
[navigator_node-2] [INFO] [1669644347.664740655] [navigator]:  [CbTargetPosChecker] At target, returning success
[navigator_node-2] [INFO] [1669644347.664758714] [navigator]: Event smacc2::EvCbSuccess<navigator::cl_drone_interface::CbTargetPosChecker, navigator::OrDroneInterface>
[navigator_node-2] [WARN] [1669644347.664772301] [navigator]: [ISmaccStateMachine] CURRENT STATE SCOPED EVENT DISCARDED, state is exiting/transitioning smacc2::EvCbSuccess<navigator::cl_drone_interface::CbTargetPosChecker, navigator::OrDroneInterface>

Either I am missing something, or it appears that isShutdownRequested() is not behaving as intended?

@pabloinigoblasco
Copy link
Contributor

@Crowdedlight your code looks good to me.
When the state is leaving, it must notify all client behaviors to leave via the SmaccAsyncClientBehavior::requestForceFinish() method.

One question, what branch/commit are you using?

@pabloinigoblasco
Copy link
Contributor

pabloinigoblasco commented Dec 28, 2022

@ZhenshengLee thanks for notifying that problematic use case (CbServiceCall). As I mentioned, ClientBehavior should implement the "interruption ability".

By the moment I created a new PR for galactic (and also for humble) to improve CbServiceCall and allow interruption of the wating process.

https://github.com/robosoft-ai/SMACC2/pull/375/files

@ZhenshengLee
Copy link
Author

https://github.com/robosoft-ai/SMACC2/pull/375/files

once the PR being merged, this canbe closed.

@pabloinigoblasco
Copy link
Contributor

PR is merged into galactic.
Humble is in progress.

@pabloinigoblasco
Copy link
Contributor

pabloinigoblasco commented Dec 30, 2022

CbServiceCall improvements also merged in humble #379

@Crowdedlight
Copy link

@Crowdedlight your code looks good to me. When the state is leaving, it must notify all client behaviors to leave via the SmaccAsyncClientBehavior::requestForceFinish() method.

One question, what branch/commit are you using?

Sorry for the late reply @pabloinigoblasco, vacation and sickness kept me away for a while.

Is SmaccAsyncClientBehavior::requestForceFinish() a method I have to explicitly call when I leave the state or is it called automatically by smacc when a state change is triggered?

We are using branch foxy atm, but planning to upgrade to humble within a few months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants