Allow participants to sync up with remote databases when discrepancies arise #145
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Michael X. Grey <grey@openrobotics.org>
This looks amazing! Thanks for the explanation and comments along the way. I just have a few nitpicks, but in general I'm seeing the intended behavior, and it should be in good shape to merge after the base branch is updated.
rmf_traffic_ros2/test/mock_schedule_nodes/missing_participant.cpp
rmf_traffic_ros2/test/mock_participants/repetitive_delay_participant.cpp
rmf_traffic_ros2/test/mock_schedule_nodes/missing_participant.cpp
…open-rmf/rmf_ros2 into gbiggs/add-participant-robustness
Tests and demos are working fine, and the example you provided is working as expected too. Thanks!
If a race condition occurs during a server restart, there is a very small probability that a participant will disagree with its remote database on details such as its participant ID and description.
This PR allows participants to identify when these issues occur and work out a correction with the remote database.
We limit the rate at which these corrections can occur because there is a risk of a pathological case in which two participant instances each believe they own the same name. If that happens, they might cycle viciously against each other, each trying to "correct" the "bad" information being pushed by the other. In the worst case, that could become a very tight loop of hammering "corrective" messages at the schedule node. We limit the rate to 3 corrections per minute. Corrections happening at a higher rate than that should be brought to the attention of an operator.
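For readers who want a concrete picture of the rate limit, here is a minimal sketch of a sliding-window limiter that permits at most 3 corrections in any one-minute window. It is not the code from this PR: the class name `CorrectionRateLimiter` and the method `try_acquire` are hypothetical, and the real implementation may track its limit differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical helper, not taken from this PR: allows at most `max_count`
// corrections within any sliding `window` of time.
class CorrectionRateLimiter
{
public:
  using Clock = std::chrono::steady_clock;

  CorrectionRateLimiter(std::size_t max_count, Clock::duration window)
  : _max_count(max_count),
    _window(window)
  {
    assert(max_count > 0);
  }

  // Returns true and records the attempt if a correction may be sent at `now`.
  // Returns false if the limit has already been reached for the current
  // window; the caller would then skip the correction and alert an operator.
  bool try_acquire(Clock::time_point now)
  {
    // Forget corrections that have aged out of the window.
    while (!_history.empty() && now - _history.front() >= _window)
      _history.pop_front();

    if (_history.size() >= _max_count)
      return false;

    _history.push_back(now);
    return true;
  }

private:
  std::size_t _max_count;
  Clock::duration _window;
  std::deque<Clock::time_point> _history;
};
```

A participant could construct this as `CorrectionRateLimiter limiter(3, std::chrono::minutes(1));` and call `limiter.try_acquire(CorrectionRateLimiter::Clock::now())` before sending each correction. Passing the time point in explicitly, rather than reading the clock inside the class, keeps the limiter easy to test with synthetic timestamps.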
To test this PR you can run four terminals with the following commands:
Then watch as the participant makes corrections to the sabotage that the mocked up schedule node is causing. This scenario is also designed to demonstrate the rate limiting of the corrections.
This PR depends on