Trigger the intraprocess guard condition with data #2164

Merged
merged 4 commits into from Apr 12, 2023


@mjcarroll mjcarroll commented Apr 11, 2023

If the intraprocess buffer still has data after taking, re-trigger the guard condition to ensure that the executor will continue to service it, even if incoming publications stop.

I found this bug when working on #2142. If an intraprocess publisher sends all of its messages before the subscription starts servicing them, the executor will fail to service the entire queue.

Note that in the current rclcpp implementation, this really only impacts the StaticSingleThreadedExecutor, as the SingleThreadedExecutor and MultiThreadedExecutor both automatically trigger the waitset at the end of an executable cycle. I discovered this because the rclcpp::WaitSet-based executor also does not recompute work. I added the test to run against all executors to prevent regressions.
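To make the failure mode concrete, here is a minimal, self-contained sketch of the idea. It does not use the rclcpp API: `ToyGuardCondition`, `ToyIntraProcessSubscription`, `spin_until_idle`, and `serviced_after_burst` are hypothetical names invented for illustration, and the model assumes that several triggers before a wake-up coalesce into a single wake-up and that the executor services one message per wake-up without recomputing work.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy stand-in for a guard condition: a flag the toy executor waits on.
// Multiple trigger() calls before a wake-up coalesce into one wake-up.
struct ToyGuardCondition {
  bool triggered = false;

  void trigger() { triggered = true; }

  // Reports whether the flag was set and clears it (one wake-up).
  bool consume() {
    const bool was_triggered = triggered;
    triggered = false;
    return was_triggered;
  }
};

// Toy stand-in for an intraprocess subscription with its buffer.
struct ToyIntraProcessSubscription {
  std::deque<int> buffer;
  ToyGuardCondition gc;
  bool retrigger_if_data_remains;

  explicit ToyIntraProcessSubscription(bool retrigger)
  : retrigger_if_data_remains(retrigger) {}

  // Publishing stores the message and triggers the guard condition.
  void publish(int msg) {
    buffer.push_back(msg);
    gc.trigger();
  }

  // Take one message. With the fix modeled here, the guard condition is
  // triggered again whenever data is still left in the buffer.
  int take() {
    assert(!buffer.empty());
    const int msg = buffer.front();
    buffer.pop_front();
    if (retrigger_if_data_remains && !buffer.empty()) {
      gc.trigger();
    }
    return msg;
  }
};

// Toy executor loop: one message per wake-up, and it stops as soon as
// nothing wakes it (a real executor would block in its wait instead).
std::size_t spin_until_idle(ToyIntraProcessSubscription & sub) {
  std::size_t serviced = 0;
  while (sub.gc.consume()) {
    sub.take();
    ++serviced;
  }
  return serviced;
}

// Publish a whole burst before the executor starts spinning, then
// report how many messages were serviced.
std::size_t serviced_after_burst(bool retrigger, int burst_size) {
  ToyIntraProcessSubscription sub(retrigger);
  for (int i = 0; i < burst_size; ++i) {
    sub.publish(i);
  }
  return spin_until_idle(sub);
}
```

In this toy model, `serviced_after_burst(false, 5)` services a single message and leaves four stranded in the buffer, while `serviced_after_burst(true, 5)` drains all five, which is the behavior the re-trigger is meant to guarantee.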

If the intraprocess buffer still has data after taking, re-trigger the
guard condition to ensure that the executor will continue to service it,
even if incoming publications stop.

Signed-off-by: Michael Carroll <michael@openrobotics.org>
Signed-off-by: Michael Carroll <michael@openrobotics.org>
@mjcarroll mjcarroll force-pushed the mjcarroll/retrigger_intraprocess branch from 67a1607 to 0100d73 Compare April 11, 2023 15:07
@mjcarroll mjcarroll marked this pull request as ready for review April 11, 2023 15:08
@mjcarroll
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Signed-off-by: Michael Carroll <michael@openrobotics.org>
@mjcarroll
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Contributor

@clalancette clalancette left a comment

Looks good to me with green CI.

Signed-off-by: Michael Carroll <michael@openrobotics.org>
@mjcarroll
Member Author

mjcarroll commented Apr 11, 2023

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@mjcarroll mjcarroll merged commit 5f9695a into rolling Apr 12, 2023
2 of 3 checks passed
@delete-merged-branch delete-merged-branch bot deleted the mjcarroll/retrigger_intraprocess branch April 12, 2023 00:20
Collaborator

@fujitatomoya fujitatomoya left a comment

@mjcarroll I think this is a critical bug; should we backport it to humble? I am inclined to backport, since many applications may hit this issue when they use StaticSingleThreadedExecutor. (Not backporting is also fine, since this is experimental.)

CC: @clalancette @alsora

@mjcarroll
Member Author

I think it is a bug, but it is unclear how many people would actually run into it.

If you have anything else that wakes up the executor, it will still service the intraprocess subscriptions unconditionally (which may be another bug, but seems to work).

In the case of the current SingleThreadedExecutor implementation, it actually fires the interrupt guard condition after each executable, guaranteeing that it will re-evaluate. In the case of the StaticSingleThreadedExecutor, you would need an executor whose only entities are intraprocess subscriptions that still hold data but are no longer being published to.
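The difference between the two executor behaviors described above can be sketched with another small toy model. Again, this is not rclcpp code: `ToyExecutor`, `spin_once`, `external_wakeup`, and `remaining_after_spin` are hypothetical names, and the model assumes one message is serviced per wake-up and that any wake-up services a subscription holding data.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy executor with one intraprocess subscription and no re-trigger
// on the subscription side (i.e. the behavior before the fix).
struct ToyExecutor {
  bool interrupt_after_each_executable;
  bool interrupt_flag = false;     // executor-owned guard condition
  bool subscription_flag = false;  // subscription guard condition
  std::deque<int> buffer;

  explicit ToyExecutor(bool interrupt_after_each)
  : interrupt_after_each_executable(interrupt_after_each) {}

  void publish(int msg) {
    buffer.push_back(msg);
    subscription_flag = true;
  }

  // Anything else that wakes the executor, for example a timer.
  void external_wakeup() { interrupt_flag = true; }

  // One wait-and-execute cycle. Returns false if nothing woke it up.
  bool spin_once() {
    if (!interrupt_flag && !subscription_flag) {
      return false;
    }
    interrupt_flag = false;
    subscription_flag = false;
    if (!buffer.empty()) {
      buffer.pop_front();  // service one message
      if (interrupt_after_each_executable) {
        interrupt_flag = true;  // forces a re-evaluation on the next cycle
      }
    }
    return true;
  }
};

// Publish a burst up front, spin until idle, then apply some external
// wake-ups and report how many messages are still stuck in the buffer.
std::size_t remaining_after_spin(
  bool interrupt_after_each, int burst_size, int external_wakeups)
{
  assert(burst_size >= 0 && external_wakeups >= 0);
  ToyExecutor exec(interrupt_after_each);
  for (int i = 0; i < burst_size; ++i) {
    exec.publish(i);
  }
  while (exec.spin_once()) {}
  for (int i = 0; i < external_wakeups; ++i) {
    exec.external_wakeup();
    while (exec.spin_once()) {}
  }
  return exec.buffer.size();
}
```

Under these assumptions, `remaining_after_spin(true, 5, 0)` leaves nothing behind because the self-interrupt keeps the loop alive, `remaining_after_spin(false, 5, 0)` leaves four messages stuck, and `remaining_after_spin(false, 5, 2)` leaves two, since each unrelated wake-up happens to service one more message.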

@fujitatomoya
Collaborator

Yeah, we never know... 💭
Given those conditions, I would say it is unlikely to be seen, but it is still a bug worth addressing so that we avoid possible problems in user applications. Besides, the fix keeps ABI compatibility.

We can probably give it a couple of weeks to see whether anything breaks on rolling, and then backport to humble.

Any objections?

@mjcarroll
Member Author

No objections! Better safe than sorry.

@fujitatomoya
Collaborator

@Mergifyio backport to humble

@mergify

mergify bot commented Apr 13, 2023

backport to humble

❌ No backport have been created

  • Backport to branch to failed

GitHub error: Branch not found

  • Backport to branch humble in progress

mergify bot pushed a commit that referenced this pull request Apr 13, 2023
If the intraprocess buffer still has data after taking, re-trigger the
guard condition to ensure that the executor will continue to service it,
even if incoming publications stop.

Signed-off-by: Michael Carroll <michael@openrobotics.org>
(cherry picked from commit 5f9695a)
@fujitatomoya
Collaborator

Backport to branch to failed

😢

@Mergifyio backport humble

@fujitatomoya
Collaborator

@mjcarroll I guess I did something wrong, which led Mergify not to create the backport PR for humble... 😓

So I just created a backport PR for humble; could you take a look at #2167?

alsora pushed a commit to irobot-ros/rclcpp that referenced this pull request Apr 28, 2023
If the intraprocess buffer still has data after taking, re-trigger the
guard condition to ensure that the executor will continue to service it,
even if incoming publications stop.


Signed-off-by: Michael Carroll <michael@openrobotics.org>
alsora pushed a commit to irobot-ros/rclcpp that referenced this pull request Apr 29, 2023
If the intraprocess buffer still has data after taking, re-trigger the
guard condition to ensure that the executor will continue to service it,
even if incoming publications stop.


Signed-off-by: Michael Carroll <michael@openrobotics.org>
alsora pushed a commit to irobot-ros/rclcpp that referenced this pull request May 3, 2023
If the intraprocess buffer still has data after taking, re-trigger the
guard condition to ensure that the executor will continue to service it,
even if incoming publications stop.


Signed-off-by: Michael Carroll <michael@openrobotics.org>
fujitatomoya added a commit that referenced this pull request Jun 15, 2023
If the intraprocess buffer still has data after taking, re-trigger the
guard condition to ensure that the executor will continue to service it,
even if incoming publications stop.

Signed-off-by: Michael Carroll <michael@openrobotics.org>
(cherry picked from commit 5f9695a)

Co-authored-by: Michael Carroll <michael@openrobotics.org>
Barry-Xu-2018 pushed a commit to Barry-Xu-2018/rclcpp that referenced this pull request Jan 12, 2024
If the intraprocess buffer still has data after taking, re-trigger the
guard condition to ensure that the executor will continue to service it,
even if incoming publications stop.


Signed-off-by: Michael Carroll <michael@openrobotics.org>