Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race with launch context changes when loading composable nodes #166

Merged
merged 8 commits into from
Aug 17, 2020

Conversation

jacobperron
Copy link
Member

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the necessary information from the context before execute() finishes.

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps)  pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron jacobperron added the bug Something isn't working label Aug 4, 2020
@jacobperron jacobperron self-assigned this Aug 4, 2020
Copy link
Member

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a regression test?

@ivanpauno
Copy link
Member

@jacobperron Does this fix #114?

Copy link
Contributor

@hidmic hidmic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix and +1 to having a regression test.

@jacobperron
Copy link
Member Author

Can you add a regression test?

Will do.

Does this fix #114?

I had completely forgotten about that issue. I'll check to see if it fixes it.

@jacobperron
Copy link
Member Author

So, it appears this PR fixes the example mentioned in #114 (comment), which involves setting the namespace, but not the original issue that involves setting parameters on the node. I'll look into if a similar fix to the one proposed in this PR can be used to fix #114.

@ivanpauno
Copy link
Member

So, it appears this PR fixes the example mentioned in #114 (comment), which involves setting the namespace, but not the original issue that involves setting parameters on the node. I'll look into if a similar fix to the one proposed in this PR can be used to fix #114.

That's weird, because get_composable_node_load_request also resolves the parameters to be loaded.

@jacobperron
Copy link
Member Author

That's weird, because get_composable_node_load_request also resolves the parameters to be loaded.

Is there a difference between "global" params (aka launch config ros_params) and parameters passed directly to a node?

@jacobperron
Copy link
Member Author

jacobperron commented Aug 5, 2020

hmm, but it's not the parameters that are being dropped, rather a user-defined launch configuration "bar_arg". I can do some more debugging.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron
Copy link
Member Author

I've added some regression tests in 719ced8.

There's still a bug related to parameters (and some other corner-cases); I'll open up separate PRs for those.

assert get_node_name_count(context, f'/{TEST_CONTAINER_NAME}') == 1

# Check the remaps in load service request
assert len(load_composable_nodes_action._LoadComposableNodes__load_node_requests) == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the sent request could be a property, so we don't have to check a private member variable

Copy link
Member Author

@jacobperron jacobperron Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed a similar pattern that is used in test_node.py.

I think a user should not have access to that information; it seems like an implementation detail subject to change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed a similar pattern that is used in test_node.py.

Yeah, it's not a great pattern, but acceptable I guess.

I think a user should have access to that information; it seems like an implementation detail subject to change.

That sounds contradictory, I guess you meant "should not".


I prefer not testing implementation details, but I don't feel strongly in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, I meant "should not" (updated my comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the ideal way to write tests in this situation would be to write a mock component container and assert things about it (e.g. if the correct service request was received). It adds a bit more to the test complexity, but could be better long-term. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think?

Sounds like a good proposal to me, though I don't mind approving this as-is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and refactored the tests to use a mock object. I think it's much nicer. PTAL 5d9dcf3

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
Copy link
Member

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
This was only needed for the tests before the refactor.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
assert request.parameters[0].value.type == ParameterType.PARAMETER_STRING
assert request.parameters[0].value.string_value == 'test_value_param1'
assert request.parameters[1].name == 'test.param2'
# TODO(jacobperron): I would expect this to be a double value, but we can only pass strings
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at resolving this in a follow-up.

@jacobperron
Copy link
Member Author

ugh, there's an issue with the tests where they hang sometimes... I can reproduce locally. I think there's an issue shutting down the launch context before LoadComposableNodes has had a chance to make a service request.

@jacobperron
Copy link
Member Author

I've discovered a race in launch (reported in #169), with a proposed fix in ros2/launch#449. With the proposed fix, the tests in the PR pass, otherwise we can expect them to hang forever.

There's no need to run asynchronously and manually send a shutdown signal.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@ivanpauno
Copy link
Member

I've discovered a race in launch (reported in #169), with a proposed fix in ros2/launch#449. With the proposed fix, the tests in the PR pass, otherwise we can expect them to hang forever.

I understand the race, but what is causing it here?
The thread you're creating isn't emiting any event, so I would like to understand what exactly is causing the race here (the launch service is running in a single threaded event loop)

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me for getting the right values when the action is executed.

@jacobperron jacobperron added this to Proposed in Foxy Patch Release 3 via automation Aug 12, 2020
@jacobperron
Copy link
Member Author

Sadly, it still seems like there's a race somewhere causing the tests to hang somteimes 😞

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron
Copy link
Member Author

I wasn't properly shutting down the mock container and so the synchronous service call in LoadComposableNodes was blocking indefinitely. I've fixed the test in 9c50db2 and opened up a follow-up ticket to improve LoadComposableNodes such that we avoid this situation #171.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron
Copy link
Member Author

Note, I don't expect the PR CI job to pass until ros2/launch#449 is released.

@jacobperron
Copy link
Member Author

jacobperron commented Aug 14, 2020

Copy link
Member

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
I didn't realize that shutdown wasn't done properly

@jacobperron
Copy link
Member Author

@ros-pull-request-builder retest this please

@jacobperron jacobperron merged commit e181799 into master Aug 17, 2020
@jacobperron jacobperron deleted the jacob/fix_load_composable_node_remap branch August 17, 2020 16:59
@jacobperron jacobperron removed this from Needs backport in Foxy Patch Release 3 Oct 9, 2020
@jacobperron jacobperron added this to Proposed in Foxy Patch Release 4 via automation Oct 9, 2020
@jacobperron jacobperron moved this from Proposed to Needs backport in Foxy Patch Release 4 Oct 9, 2020
jacobperron added a commit that referenced this pull request Nov 19, 2020
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
jacobperron added a commit that referenced this pull request Nov 19, 2020
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
jacobperron added a commit that referenced this pull request Nov 19, 2020
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
jacobperron added a commit that referenced this pull request Nov 19, 2020
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
jacobperron added a commit that referenced this pull request Nov 19, 2020
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
jacobperron added a commit that referenced this pull request Nov 19, 2020
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
jacobperron added a commit that referenced this pull request Nov 20, 2020
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron jacobperron removed this from Needs backport in Foxy Patch Release 4 Nov 20, 2020
jacobperron added a commit that referenced this pull request Nov 20, 2020
…odes (#166) (#206)

* Fix race with launch context changes when loading composable nodes (#166)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Fix docblock in LoadComposableNodes (#207)

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron jacobperron removed this from Proposed in Dashing Patch Release 8 Nov 21, 2020
@jacobperron jacobperron added this to Proposed in Dashing Patch Release 9 via automation Nov 21, 2020
@clalancette clalancette moved this from Proposed to Needs Backport in Dashing Patch Release 9 Jan 21, 2021
clalancette pushed a commit that referenced this pull request May 18, 2021
)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Backporting note: As far as I can tell, the *Composable* launch_ros classes in
Dashing are not setup to be able to be tested.  We would need to add a bunch
of additional dependencies to write tests here.  I didn't think
that was worth it for Dashing, so I just removed the tests.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
clalancette added a commit that referenced this pull request May 20, 2021
) (#238)

* Fix race with launch context changes when loading composable nodes

This bug was discovered when trying load composable nodes from a GroupAction.
The ROS namespace (and presumably other remaps) pushed onto the context stack
was popped after the LoadComposableNodes execute() function finished.
But because the loading happens asynchronously, we need to make sure we get the
necessary information from the context before execute() finishes.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Add regression tests for LoadComposableNode

Signed-off-by: Jacob Perron <jacob@openrobotics.org>

* Properly shutdown mock conatiner node

Also added some debug logs to the load node action for posterity.

Backporting note: As far as I can tell, the *Composable* launch_ros classes in
Dashing are not setup to be able to be tested.  We would need to add a bunch
of additional dependencies to write tests here.  I didn't think
that was worth it for Dashing, so I just removed the tests.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>

Co-authored-by: Jacob Perron <jacob@openrobotics.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants