Independent mutex to check and remove timer from scheduled timer #1168

Open
peterpena wants to merge 6 commits into rolling from peterpena/timer-mutex
Conversation

@peterpena (Contributor)

This PR tries to fix issue #1148. An independent mutex is added to protect the scheduled_timers_ set. This lets one thread remove a timer from scheduled_timers_ while another thread asynchronously looks for the next executable. It fixes the problem where one thread cannot remove the timer (because the other thread holds the mutex) while that other thread keeps waking up and retrieving the timer, but cannot execute it because it has not been removed yet. I ran tests to verify that the single- and multi-threaded executors can run the timer callbacks without any issues (a simplified sketch of the two-mutex approach follows the test output):

-- run_test.py: extra environment variables to append:
 - LD_LIBRARY_PATH+=/home/peter/ros2_foxy/build/rclcpp
-- run_test.py: invoking following command in '/home/peter/ros2_foxy/build/rclcpp':
 - /home/peter/ros2_foxy/build/rclcpp/test_wait_mutex --gtest_output=xml:/home/peter/ros2_foxy/build/rclcpp/test_results/rclcpp/test_wait_mutex.gtest.xml

Running main() from /home/peter/ros2_foxy/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from TestWaitMutex
[ RUN      ] TestWaitMutex.set_timer_wait_mutex_multi_threaded
[       OK ] TestWaitMutex.set_timer_wait_mutex_multi_threaded (12031 ms)
[ RUN      ] TestWaitMutex.set_timer_wait_mutex_single_threaded
[       OK ] TestWaitMutex.set_timer_wait_mutex_single_threaded (12024 ms)
[----------] 2 tests from TestWaitMutex (24055 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (24062 ms total)
[  PASSED  ] 2 tests.


-- run_test.py: return code 0
-- run_test.py: inject classname prefix into gtest result file '/home/peter/ros2_foxy/build/rclcpp/test_results/rclcpp/test_wait_mutex.gtest.xml'
-- run_test.py: verify result file '/home/peter/ros2_foxy/build/rclcpp/test_results/rclcpp/test_wait_mutex.gtest.xml'
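
For context, here is a small, self-contained sketch of the two-mutex idea (a simplified model, not the actual rclcpp change; the names wait_mutex_, scheduled_timers_mutex_ and scheduled_timers_ follow the snippets quoted later in this thread). One mutex serializes waiting for the next executable, while a second, independent mutex protects only the set of already-claimed timers, so a thread that has just finished a timer callback can remove the timer without contending with a thread that is blocked inside the wait:

// Simplified sketch: two mutexes with different responsibilities.
#include <chrono>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

std::mutex wait_mutex_;              // serializes waiting for / picking the next executable
std::mutex scheduled_timers_mutex_;  // protects scheduled_timers_ only
std::set<int> scheduled_timers_;     // ids of timers already claimed by a thread

void worker(int timer_id)
{
  for (int i = 0; i < 100; ++i) {
    {
      std::lock_guard<std::mutex> wait_lock(wait_mutex_);
      // ... stand-in for get_next_executable(): wait until the timer is ready ...
      std::lock_guard<std::mutex> timer_lock(scheduled_timers_mutex_);
      if (scheduled_timers_.count(timer_id) != 0) {
        continue;  // another thread is already executing this timer
      }
      scheduled_timers_.insert(timer_id);
    }
    // ... execute the timer callback outside of wait_mutex_ ...
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    {
      // Removal only needs scheduled_timers_mutex_, so it no longer has to
      // wait for a thread that holds wait_mutex_ while waiting for work.
      std::lock_guard<std::mutex> timer_lock(scheduled_timers_mutex_);
      scheduled_timers_.erase(timer_id);
    }
  }
}

int main()
{
  std::vector<std::thread> threads;
  for (int t = 0; t < 2; ++t) {
    threads.emplace_back(worker, 0);  // both threads contend for the same timer id
  }
  for (auto & th : threads) {
    th.join();
  }
  return 0;
}

Before this change the removal step also took wait_mutex_, which is the contention the PR description refers to and what shows up as the repeated zero-timeout wake-ups in the debug logs quoted further down.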

@fujitatomoya (Collaborator) left a comment

There are some minor comments.

5 review threads on rclcpp/test/test_wait_mutex.cpp (outdated, resolved)

using namespace std::chrono_literals;

class TestWaitMutex : public ::testing::Test
Collaborator

I think this test checks whether the wall_timer user callback is issued the expected number of times? If so, it would be better to change the file name and class name to something like test_wall_timer_count.cpp.

Contributor Author

I changed it to test_timer_call_count.cpp.
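
For reference, a rough sketch of the kind of callback-count test being discussed (the test name, the 1 s period, the 5 s spin time, and the tolerance are illustrative and not taken from the actual test_timer_call_count.cpp; main() is assumed to come from gtest_main, as in the run_test.py output above):

#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

#include <gtest/gtest.h>

#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

TEST(TestTimerCallCount, multi_threaded_executor_counts_callbacks)
{
  rclcpp::init(0, nullptr);
  auto node = std::make_shared<rclcpp::Node>("multi_timer");

  std::atomic<int> count{0};
  auto timer = node->create_wall_timer(1s, [&count]() {count++;});
  (void)timer;  // keep the timer alive for the duration of the test

  rclcpp::executors::MultiThreadedExecutor executor;
  executor.add_node(node);

  // Spin in a background thread for a bounded time, then stop.
  std::thread spinner([&executor]() {executor.spin();});
  std::this_thread::sleep_for(5s);
  executor.cancel();
  spinner.join();

  // With a 1 s period over ~5 s the callback should fire about 5 times;
  // a starved or doubly-executed timer would push the count outside this range.
  EXPECT_NEAR(count.load(), 5, 1);

  rclcpp::shutdown();
}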

@fujitatomoya (Collaborator)

@peterpena

btw, I confirmed the following difference between before and after. In the "before" log, once the timer is ready the executor keeps waking up with a 0 ns timeout several times before the timer is finally called; in the "after" log, the timer is called as soon as it is ready:

  • before
[INFO] [1591939878.346141939] [multi_timer]: Timer executed!
[DEBUG] [1591939878.346165295] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1591939878.346255275] [rcl]: Guard condition in wait set is ready
[DEBUG] [1591939878.346336561] [rcl]: Waiting with timeout: 1s + 997190395ns
[DEBUG] [1591939878.346362808] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.343696436] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.343978198] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344031401] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344079956] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.344231231] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344254341] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344274941] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.344339097] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344353732] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344371500] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.344482527] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344898532] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344935506] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.345049010] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.345086660] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.345131275] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.345275321] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.345309318] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.345347118] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.345414981] [rcl]: Calling timer
[INFO] [1591939880.345447669] [multi_timer]: Timer executed!
  • after
[INFO] [1591941505.293213375] [multi_timer]: Timer executed!
[DEBUG] [1591941505.293298531] [rcl]: Waiting without timeout
[DEBUG] [1591941505.293502878] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1591941505.293542102] [rcl]: Guard condition in wait set is ready
[DEBUG] [1591941505.293691008] [rcl]: Waiting with timeout: 1s + 999020401ns
[DEBUG] [1591941505.293723050] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591941507.292893558] [rcl]: Timer in wait set is ready
[DEBUG] [1591941507.293109779] [rcl]: Calling timer
[INFO] [1591941507.293163348] [multi_timer]: Timer executed!

@fujitatomoya (Collaborator) left a comment

lgtm, but there seems to be a merge conflict.

@peterpena force-pushed the peterpena/timer-mutex branch 2 times, most recently from 61e37df to f0a59be on June 15, 2020
@fujitatomoya (Collaborator) left a comment

lgtm, I'd like someone else to check.

@ivanpauno (Member)

This removes the issue where one thread cannot remove the timer while the other thread keeps waking up and tries to get the timer but cannot execute it because the other thread is locked by the mutex

This was only a performance problem, right?
At some point the first thread unlocked the mutex and the other threads could start executing.

@ros2 deleted a comment from fujitatomoya on Jun 16, 2020
@ivanpauno (Member) left a comment

Using two mutexes LGTM!

Review thread on rclcpp/test/rclcpp/test_timer_call_count.cpp (resolved)
@ivanpauno (Member)

@ivanpauno deleted a comment from fujitatomoya

I was deleting a couple of comments I made when reviewing (that were wrong), and by accident I deleted your comment @fujitatomoya 😁. I don't know how to restore a comment; I hope you remember what it was about.

@peterpena (Contributor Author)

This removes the issue where one thread cannot remove the timer while the other thread keeps waking up and tries to get the timer but cannot execute it because the other thread is locked by the mutex

This was only a performance problem, right?
At some point the first thread unlocked the mutex and the others can start executing.

Yes, that is correct. After some number of iterations, the first thread is able to remove the timer from scheduled_timers_.

@ivanpauno (Member) left a comment

LGTM


executor.add_node(node);
executor.spin();
}
Member

nit: missing newline before EOF

@ivanpauno (Member)

Yes, that is correct. After some number of iterations, the first thread is able to remove the timer from the scheduled_timers_.

Thanks for clarifying!!

@peterpena force-pushed the peterpena/timer-mutex branch 2 times, most recently from f0a59be to fedf242 on June 19, 2020
@ivanpauno (Member) commented on Jun 29, 2020

@peterpena is this ready?

(edit) there seem to be a bunch of linter failures

@peterpena (Contributor Author)

@ivanpauno I still have to check why I am getting errors that were not there before.

@peterpena force-pushed the peterpena/timer-mutex branch 5 times, most recently from 0a40fcf to 4345721 on July 6, 2020
6 commits, each Signed-off-by: Pedro Pena <peter.a.pena@gmail.com>
@peterpena (Contributor Author)

It's good to go now. I believe @wjwwood wanted to take a look before I merge it, though.

@wjwwood (Member) left a comment

lgtm! sorry for the delay in reviewing.

@peterpena (Contributor Author)

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status


@fujitatomoya (Collaborator) left a comment

I think get_next_executable and scheduled_timers_ should be protected by the same mutex; otherwise there is a possibility that the same timer is executed multiple times. The interleaving is spelled out in the numbered steps below (a small sketch of the scenario follows step 3).

}
continue;
}

Collaborator

  1. (Thread-2) Context switch happens here, with the timer already scheduled by Thread-1.

if (yield_before_execute_) {
std::this_thread::yield();
}

execute_any_executable(any_exec);

if (any_exec.timer) {
std::lock_guard<std::mutex> wait_lock(wait_mutex_);  // removed in this PR
std::lock_guard<std::mutex> wait_lock(scheduled_timers_mutex_);  // added in this PR
Collaborator

  2. (Thread-1) Execute the timer and remove it from scheduled_timers_.

}

if (any_exec.timer) {
std::lock_guard<std::mutex> wait_lock(scheduled_timers_mutex_);
Collaborator

  3. (Thread-2) The same timer will be scheduled again and executed.
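
To make the concern concrete, here is a toy, single-threaded model of the interleaving in steps 1-3 above (the steps are simply executed in the order described; this illustrates the reviewer's scenario and is not rclcpp code). One ready period of the timer ends up producing two callback executions:

#include <cassert>
#include <set>

int main()
{
  std::set<int> scheduled_timers;  // timers currently claimed by some thread
  const int timer = 42;
  int callback_count = 0;

  // Thread-1 has already claimed the timer and is about to execute it.
  scheduled_timers.insert(timer);

  // Step 1 (Thread-2): gets the same ready timer from get_next_executable,
  // sees it in scheduled_timers, and is context-switched out at the `continue`,
  // i.e. before it goes back to waiting.
  bool already_scheduled = scheduled_timers.count(timer) != 0;
  assert(already_scheduled);

  // Step 2 (Thread-1): executes the timer, then removes it from scheduled_timers.
  ++callback_count;
  scheduled_timers.erase(timer);

  // Step 3 (Thread-2): resumes, the wait set still reports the timer as ready,
  // the scheduled_timers check now passes, so it claims and executes the same
  // timer again.
  if (scheduled_timers.count(timer) == 0) {
    scheduled_timers.insert(timer);
    ++callback_count;
  }

  assert(callback_count == 2);  // one ready period, two executions
  return 0;
}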
