Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put query updates on a diet #86

Merged
merged 19 commits into from
Aug 26, 2021
Merged

Put query updates on a diet #86

merged 19 commits into from
Aug 26, 2021

Conversation

gbiggs
Copy link
Collaborator

@gbiggs gbiggs commented Jul 20, 2021

Closes open-rmf/rmf#75.

New feature implementation

Implemented feature

Remove the query and query ID from the MirrorUpdate message.

Implementation description

The query ID was previously used to know if a query update should be ignored or applied to the mirror's internal copy of the database. This is no longer necessary because query updates are separated by topic, so a mirror will only receive the updates for its registered query. Thus the query ID is removed from the MirrorUpdate message along with checks on it in the MirrorManager class.

The query was added to the query update message as part of the initial redundancy work to allow a MirrorManager to verify that it is receiving updates for the correct query (it hasn't subscribed to the wrong topic, or the schedule node hasn't got its queries mixed up, etc.). This is inefficient, as the query may be large and query updates are frequent. Therefore the query is removed from the query update message. To retain the functionality that it allowed, the MirrorManager now subscribes to the latched "registered queries" broadcast topic. Each time it receives the list of registered queries from the Schedule node, it verifies both that its own query is present and that the query ID assigned by the Schedule node matches its own opinion of the query ID.

When the query is missing, or the ID is incorrect, the MirrorManager will re-register its query and request a new update.

To handle possible corner cases when the Schedule node fails and the redundant backup takes over (such as a query being registered just before failure and the redundant backup not being aware of it), the MirrorManager will force query validation mode on when it receives the fail-over announcement. However there is a possibility of a query update from the new schedule node arriving before the fail-over announcement, which may not be for the correct query anymore. To identify this situation, the Schedule node is given an "edition" number. Each time fail-over occurs, the new schedule node gets a new edition number. The edition number is sent with each query update. The MirrorManager checks the received edition number, and if it is greater than the expected edition number, it also enters query validation mode. While in query validation mode, it stashes all updates. If the query is validated, then it processes the stashed updates. If the query is found to be missing or the ID is found to be incorrect, the stashed updates are dumped and a fresh update is requested.

Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
@gbiggs gbiggs requested a review from mxgrey July 20, 2021 06:09
@gbiggs gbiggs self-assigned this Jul 20, 2021
@gbiggs gbiggs added the enhancement New feature or request label Jul 20, 2021
@gbiggs gbiggs added this to In Review in Research & Development via automation Jul 20, 2021
@gbiggs gbiggs changed the title Geoff/put query updates on a diet Put query updates on a diet Jul 20, 2021
Copy link
Contributor

@mxgrey mxgrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed everything yet, but I've looked over the MirrorManager class pretty closely and left some feedback. This design and implementation is looking fantastic; I just have some recommendations, most of which are nitpicks, but some may be actual soundness concerns.

});
update_timer = node.create_wall_timer(
5s,
[&]() -> void
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is relevant to #71

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That just shows how easy a mistake it is to make...

Copy link
Contributor

@mxgrey mxgrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finished reviewing the rest of the files! Just adding a few more comments. I'll start doing some manual tests of the fail over now.

Signed-off-by: Michael X. Grey <grey@openrobotics.org>
Signed-off-by: Michael X. Grey <grey@openrobotics.org>
Signed-off-by: Michael X. Grey <grey@openrobotics.org>
gbiggs and others added 11 commits July 29, 2021 08:41
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
…updates-on-a-diet

Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>
Signed-off-by: Michael X. Grey <grey@openrobotics.org>
Signed-off-by: Michael X. Grey <grey@openrobotics.org>
@codecov
Copy link

codecov bot commented Aug 26, 2021

Codecov Report

Merging #86 (22ab9e0) into main (3d8874d) will decrease coverage by 0.15%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main      #86      +/-   ##
==========================================
- Coverage   22.11%   21.95%   -0.16%     
==========================================
  Files         410      828     +418     
  Lines       32898    66160   +33262     
  Branches    16050    32044   +15994     
==========================================
+ Hits         7274    14528    +7254     
- Misses      17795    36043   +18248     
- Partials     7829    15589    +7760     
Flag Coverage Δ
tests 21.95% <ø> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...2/rmf_fleet_adapter/src/rmf_fleet_adapter/Task.hpp
...rmf_fleet_adapter/test/phases/test_RequestLift.cpp
.../src/rmf_task_ros2/bidding/internal_Auctioneer.hpp
...xcpp/RxCpp-4.1.0/Rx/v2/src/rxcpp/rx-observable.hpp
...et_adapter/src/rmf_fleet_adapter/jobs/Planning.cpp
...os2/rmf_traffic_ros2/examples/participant_node.cpp
...lude/rmf_fleet_adapter_python/graph/PyExecutor.hpp
...c_ros2/src/rmf_traffic_ros2/convert_Trajectory.cpp
...src/rmf_fleet_adapter/jobs/detail/impl_Rollout.hpp
.../src/rmf_traffic_ros2/schedule/NegotiationRoom.cpp
... and 1228 more

@mxgrey mxgrey merged commit 1953ae2 into main Aug 26, 2021
Research & Development automation moved this from In Review to Done Aug 26, 2021
@mxgrey mxgrey deleted the geoff/put-query-updates-on-a-diet branch August 26, 2021 10:42
arjo129 pushed a commit that referenced this pull request Oct 12, 2021
Signed-off-by: Geoffrey Biggs <gbiggs@killbots.net>

Signed-off-by: Michael X. Grey <grey@openrobotics.org>
Co-authored-by: Michael X. Grey <grey@openrobotics.org>
Signed-off-by: Arjo Chakravarty <arjo@openrobotics.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Put query update message on a diet
2 participants