Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nav2 single trigger node test fail-> halt does not does not convert state to idle #3310

Closed
stevedanomodolor opened this issue Dec 5, 2022 · 21 comments · Fixed by #3605
Closed
Labels
bug Something isn't working open-navigation-priority

Comments

@stevedanomodolor
Copy link
Contributor

Bug report

Required Info:

  • Operating System: ubuntu 22
  • ROS2 Version: rolling
  • Version or commit hash:
  • DDS implementation:

Steps to reproduce issue

  • Building current nav2 codebase

Expected behavior

All test should pass

Actual behavior

/opt/overlay_ws/src/navigation2/nav2_behavior_tree/test/plugins/decorator/test_single_trigger_node.cpp:75
Expected equality of these values:
  bt_node_->executeTick()
    Which is: FAILURE
  BT::NodeStatus::RUNNING
    Which is: RUNNING/opt/overlay_ws/src/navigation2/nav2_behavior_tree/test/plugins/decorator/test_single_trigger_node.cpp:77
Expected equality of these values:
  bt_node_->executeTick()
    Which is: FAILURE
  BT::NodeStatus::SUCCESS
    Which is: SUCCESS

Additional information

  • I tried debuging to finding the root of the issue, it seem that the halt function does not change the bt node to idle, it just remains in failure. The single node plugin is acting as it is supposed to.
    This is bad debugging style but It shows what I mean, for example the test for the single node plugins,
std::cout << "Testing from here" << std::endl;
 std::cout << "Dummy node -> Status() :" << dummy_node_->status() << std::endl;
 std::cout << "Bt node -> Status() :" << bt_node_->status() << std::endl;

 bt_node_->halt();
 std::cout << "Dummy node -> Status() :" << dummy_node_->status() << std::endl;
 std::cout << "Bt node -> Status() :" << bt_node_->status() << std::endl;
 dummy_node_->changeStatus(BT::NodeStatus::RUNNING);
 std::cout << "Dummy node -> Status() :" << dummy_node_->status() << std::endl;
 std::cout << "Bt node -> Status() :" << bt_node_->status() << std::endl;
 EXPECT_EQ(bt_node_->executeTick(), BT::NodeStatus::RUNNING);
 std::cout << "Dummy node -> Status() :" << dummy_node_->status() << std::endl;
 std::cout << "Bt node -> Status() :" << bt_node_->status() << std::endl;
 dummy_node_->changeStatus(BT::NodeStatus::SUCCESS);
 std::cout << "Dummy node -> Status() :" << dummy_node_->status() << std::endl;
 std::cout << "Bt node -> Status() :" << bt_node_->status() << std::endl;

As you will see below, the bt node is never able to back to idle.

Here: SUCCESS
23: Testing from here
23: Dummy node -> Status() :IDLE
23: Bt node -> Status() :FAILURE
23: Dummy node -> Status() :IDLE
23: Bt node -> Status() :FAILURE
23: Dummy node -> Status() :RUNNING
23: Bt node -> Status() :FAILURE
23: /opt/overlay_ws/src/navigation2/nav2_behavior_tree/test/plugins/decorator/test_single_trigger_node.cpp:83: Failure
23: Expected equality of these values:
23:   bt_node_->executeTick()
23:     Which is: FAILURE
23:   BT::NodeStatus::RUNNING
23:     Which is: RUNNING
23: Dummy node -> Status() :RUNNING
23: Bt node -> Status() :FAILURE
23: Dummy node -> Status() :SUCCESS
23: Bt node -> Status() :FAILURE
23: [  FAILED  ] SingleTriggerTestFixture.test_behavior (0 ms)
23: [----------] 1 test from SingleTriggerTestFixture (0 ms total)
23: 
23: [----------] Global test environment tear-down
23: [==========] 1 test from 1 test suite ran. (115 ms total)
23: [  PASSED  ] 0 tests.
23: [  FAILED  ] 1 test, listed below:
23: [  FAILED  ] SingleTriggerTestFixture.test_behavior
23: 
23:  1 FAILED TEST
23: -- run_test.py: return code 1
@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 5, 2022

@facontidavide this might point to a regression in BT.CPP. I looked to find where this started and the first PR that started to fail with this didn't change this file or anything related to it. I think it was just the first PR to turn over with a new release of BT.CPP. When did the last v3 release hit binaries? Looking over these files, they haven't changed in over a year.

This error would point to the BT::DecoratorNode not going back to BT::NodeStatus::IDLE once halt() is called upon it.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 5, 2022

I've looked at the recent change to those files

The first added

   if( !child_node_ ){
        return;
    }

Which shouldn't be a problem since we do have a child node on that single trigger decorator.

The second removes setStatus(NodeStatus::IDLE); from halt() in the action and condition base nodes. The control nodes now use resetStatus in haltChild, but its own node has setStatus(NodeStatus::IDLE); removed from the halt implementation. Action and Decorator nodes also has setStatus(NodeStatus::IDLE); removed with resetStatus set on the child only!

That only works resursively - not if the root node of the actual BT is a decorator / action / condition - that BT node then would not be reset back to idle - which is what our unit tests do. As a result, everyone's BTs will not have the root node reset to IDLE.

In light of that @facontidavide, I think the setStatus(NodeStatus::IDLE); in halt() should be readded to deal with this issue. I can submit the PR if you agree.

@stevedanomodolor do you agree that's likely the issue?

@facontidavide
Copy link
Contributor

facontidavide commented Dec 5, 2022

I will look at this in the next days. Thanks for reporting

@stevedanomodolor
Copy link
Contributor Author

I've looked at the recent change to those files

The first added

   if( !child_node_ ){
        return;
    }

Which shouldn't be a problem since we do have a child node on that single trigger decorator.

The second removes setStatus(NodeStatus::IDLE); from halt() in the action and condition base nodes. The control nodes now use resetStatus in haltChild, but its own node has setStatus(NodeStatus::IDLE); removed from the halt implementation. Action and Decorator nodes also has setStatus(NodeStatus::IDLE); removed with resetStatus set on the child only!

That only works resursively - not if the root node of the actual BT is a decorator / action / condition - that BT node then would not be reset back to idle - which is what our unit tests do. As a result, everyone's BTs will not have the root node reset to IDLE.

In light of that @facontidavide, I think the setStatus(NodeStatus::IDLE); in halt() should be readded to deal with this issue. I can submit the PR if you agree.

@stevedanomodolor do you agree that's likely the issue?

@SteveMacenski Pretty much yes, the issue was not from nav2 side but the halt not updating the bt root node. Yes, you can submit the PR when you can.

@facontidavide
Copy link
Contributor

Let me explain my reasoning, considering also that I am NOT AGAINST revert my changes.

The concept I am trying to push is that:

  1. A Node must NEVER reset itself (where "reset" means changing the status to IDLE).
  2. A node parent (Control, Decorator) MUST reset its children.

This influence the way Decorators and Control nodes should be implemented.
For instance, I don't see you doing this in any of your plugins.

You should use:

DecoratorNode::haltChild() or ControlNode::haltChildren() whenever your Decorator/Control is completed.

Since "halt" is a bad name here, I just pushed this change in branch v3.8: BehaviorTree/BehaviorTree.CPP@87c3669

Summarizing, I think that your unit test is expecting something to happen that is not intended in the library.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 7, 2022

A Node must NEVER reset itself

That sounds nice in theory, as long as all the default nodes properly reset all children when halt is called and the root node of the BT has some special logic to be halted (which is missing and causing this concern). But its also error prone if anyone makes a BT node without that same philosophy and/or comes to expect the previous behavior of BT.CPP that up until now has had the base nodes reset themselves as well for protection. I think that's wise defensive programming to have the nodes reset themselves especially for something exposed to users. It gives them the blueprint for doing that as well as they're using your BT nodes as models and removes a very subtle potential misunderstanding about the root nodes.

A node parent (Control, Decorator) MUST reset its children.

Yeah, that seems sensible - no argument here. But I'd ask "why not both?" especially for the default control / decorator nodes. Just cover your bases in case the user messes up with their custom nodes & provides a blueprint for success so the subtle nature of the root node (since there isn't special handling in the tree to halt that one -- yet?) isn't even something they need to worry about.

There's been a handful of changes to BT.CPP like this (and the PR we asked to be reverted persisting state between executions) which create changes in behavior that makes it problematic to re-use behavior trees. We expect halting the tree to properly reset the state fully and we rely on that in Nav2. Up until recently, this has chugged along for us no problem. I don't think these are done intentionally to remove support for re-using behavior trees for multiple tasks sequentially, but seems like something that needs to be added to the review process of changes to make sure its not breaking that behavioral support. It may be being overlooked.

DecoratorNode::haltChild() or ControlNode::haltChildren() whenever your Decorator/Control is completed.

We call BT::ControlNode::halt(); in halt() on all of our control nodes - which does that for us https://github.com/BehaviorTree/BehaviorTree.CPP/blob/v3.8/src/control_node.cpp#L34. We simply don't override halt() in the decorators and use the default base implementation https://github.com/BehaviorTree/BehaviorTree.CPP/blob/v3.8/src/decorator_node.cpp#L34 which does that as well.

Note that separate of halting we also reset children / halt at the end of a cycle in the control nodes (ex I checked them all). We don't for decorators though.

But regardless, the issue that we're having is that the root node - which is commonly going to be a decorator or a control flow node of some kind, isn't being reset in the BT because there is no parent to it for resetChildren or similar to be called upon it. As a result, either there needs to be special handling for the root node or the default nodes should defensively reset themselves -- which comes with the added benefit of fixing users' mess ups without them needing to concern themselves with it if using the default halt implementations (like we often do)

@stevedanomodolor
Copy link
Contributor Author

@SteveMacenski has any pr been pushed regarding this?

@SteveMacenski
Copy link
Member

Nope, I think we're still discussing this with @facontidavide to know what to do

@SteveMacenski
Copy link
Member

@facontidavide following up here on this

@facontidavide
Copy link
Contributor

the request changes where applied here:
BehaviorTree/BehaviorTree.CPP@87c3669

I need to "bloom" the new version 3.8.2

@SteveMacenski
Copy link
Member

@facontidavide has it been bloomed up yet? We're still experiencing this error keeping our CI in the red 😢

@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 5, 2023

This is still a bug impacting us

@SteveMacenski SteveMacenski added bug Something isn't working open-navigation-priority labels Jun 5, 2023
@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 7, 2023

I investigated this today. I found that the change that caused it was in the commit backporting a bunch of changes from V4 to V3.8 that we use BehaviorTree/BehaviorTree.CPP@cc77171. I went line by line and agree with most all of it except change removing the setStatus(NodeStatus::IDLE); from all of the halt functions. No shock there, that's what the discussion above pointed to.

@facontidavide 's opinion was that it was the role of the parent nodes to reset the child's. I generally agree with that. The issue is resetting the root-most node, which is done in the tree's tickRoot() function when the tree is completed, but not self-contained when we're just playing with the node API directly without the tree's use for execution.

I'll follow up in an hour when I look more deeply at that commit and decide if (1) we just need to refactor the test or (2) this is still an outstanding problem in BT.CPP that is worth me opening a PR with a suggested repose so we can get our CI green again.

I'm very confident I should have a solution for this by EOD

@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 7, 2023

OK... in summary, I think the test just needs to be updated. It would be nice if the root node was automatically reset in the state especially since setStatus() is a protected method which makes testing harder but if used in context of the Tree execution as intended, there's no problem. The failure we see in CI is a result of bypassing the normal entry-point to do lower-level tests.

I think the only change necessary is to move at least resetStatus, if not also setStatus to public or protected methods so we can access them to write tests.

@SteveMacenski
Copy link
Member

BehaviorTree/BehaviorTree.CPP#579 and #3605 resolve

@SteveMacenski
Copy link
Member

BT.CPP merged, just needs release

@SteveMacenski
Copy link
Member

@facontidavide can we get a release before the Rolling sync?

@facontidavide
Copy link
Contributor

facontidavide commented Jun 18, 2023

Which version are you talking about?

3.8.3 or latest branch v3.8?
do you want me to create 3.8.4?

I am a little worry about this: BehaviorTree/BehaviorTree.CPP#563

Solving it obliged me to do a change that breaks the ABI, and even if I think it is a good change, @Rid0n mentions that it doesn't fix the issue and introduces a regression.

Can someone in Nav2 check if the latest version of branch v3.8 works as expected for them?

@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 18, 2023

I'm not sure how you do versioning, but yeah bumping a new subversion with that commit would be good. We merged the change needed to fix our tests (exposing the resetStatus as protected rather than private) into the branch v3.8 as you probably recall. To merge our PR to fix Nav2's CI, we need BT.CPP's binaries to reflect it, so a release would need to be run in bloom.

Solving it obliged me to do a change that breaks the ABI, and even if I think it is a good change, @Rid0n mentions that it doesn't fix the issue and introduces a regression.

Obviously I'm not in the trenches with you on that particular issue, but if those changes didn't fix the issue, why not simply revert it? Then there's no ABI issues + no more broken/fixed than before. I assume that your changes are closer in the right direction, but if it doesn't fully solve it that seems like a reasonable enough solution and re-apply the patch once a fuller solution is found.

We don't make much use of subtrees TBH so I don't know that any of the changes made w.r.t. that would change one way or another to us

@facontidavide
Copy link
Contributor

Almost there, I should release 3.9 soon

@SteveMacenski
Copy link
Member

OK! Its not a huge rush or anything, but I would like the CI green as soon as practically possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working open-navigation-priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants