Skip to content
This repository has been archived by the owner on Jul 22, 2021. It is now read-only.

Feature/persistent ids #242

Merged
merged 34 commits into from
Feb 25, 2021
Merged

Feature/persistent ids #242

merged 34 commits into from
Feb 25, 2021

Conversation

arjo129
Copy link
Member

@arjo129 arjo129 commented Dec 31, 2020

This PR adds persistence to the IDs so that fleet adapters/schedule node may be restarted without the system loosing the IDs. It does it via maintaining a transaction log. An entry is added to this log whenever a participant registers/unregisters. The log also maintains footprint and vicinity information. Upon a restart, the log file will be read and rerun through the Database in sequential order, this will automatically restore the correct ids as ids are issued sequentially.

This PR also adds checks to enforce the (name, owner) pair is unique during registration as this makes it much easier to create a log file. The architecture of the PR is simple: there is a YamlLogger class which is in charge of the actual file I/O and YAML. This class inherits from the AbstractParticipantLogger which defines an interface. If one wishes to use some other format other than YAML that is also possible by simply inheriting AbstractParticipantLogger This is useful for testing purposes as well.

The ParticpantRegistry class is in charge of input validation and acts as a wrapper around the Database class for handling participant registration and unregistration. These are converted to AtomicEvent objects and serialized by whatever AbstractParticipantLogger has been passed to it.

On the node side we have a minor change during initiallization it looks for the rosparam log_file_location and reads from/creates a log file in this location. If no location is passed it defaults to loading or logging in the file .rmf_schedule_node.yml.

This PR also adds scaffolding for unit-testing as I needed to test the correctness of the serialization without entering a tmux hell.
Finally, this PR adds a dependency on yaml-cpp (already used in traffic editor) so you may or may not need to run rosdep again.

Expected behaviours and failure modes

  • If a log file is not found, the system will create a log file.
  • If a log file is invalid, the system will immediately shutdown and throw an error. The rationale for this is that in the event user feeds in the wrong file as a logfile, we should not be erasing the users data.
  • Currently, it does not support updates to a participant's description yet. If a participant description has changed it should unregister first then re-register with a different ID. This will be implemented in a separate PR (likely needs a change within the register_participant in Database.cpp OR a new update_participant call also in Database.cpp).

Copy link
Member

@mxgrey mxgrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very well designed and well implemented approach to persistent participant IDs.

Many of my review comments are nitpicks, and most of them are aimed at improving the failure modes. However, there's an important fundamental change that deserves special attention:

After thinking it over very carefully, I've come to the conclusion that we should not unregister participants from the schedule node, ever. The rmf_traffic::schedule::Database class can keep its unregister_participant function, but we should not use it for a distributed system that intends to have consistent, persistent identities for participants.

Because of the RAII design of the rmf_traffic::schedule::Participant class, we will receive an UnregisterParticipant message any time a fleet adapter is gracefully torn down, but we should not treat that as a true unregistering within the schedule node. Instead we should erase the participant's itinerary while keeping it registered. Then if that participant "registers" itself back later, we just give it back its original ID. The motivation for this is to allow fleet adapters to tear down and spin back up without their participant IDs changing.

I can't think of a compelling reason that a realistic deployment would ever want a participant to have its ID reassigned. Also, the memory savings of permanently unregistering a participant are pretty negligible; erasing the itinerary would free up roughly the same amount of resources.

I've left an inline comment that spells out the changes needed for this. Mostly it's replacing the database->unregister_participant(~) function and getting rid of the remove_participant API for the participant logger.

return ParticipantDescription::Rx::Unresponsive;
if(response == "Responsive")
return ParticipantDescription::Rx::Responsive;
throw std::runtime_error("Responsiveness field contains invalid identifier");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Since Invalid is one of the identifiers, it would be more accurate to say Responsiveness field contains unknown identifier.


/// Removes a participant from the registry.
/// \param[in] id - participant to remove
void remove_participant(ParticipantId id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of this function for now.

enum class OpType : uint8_t
{
Add = 0,
Remove
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of OpType::Remove for now.


} // end namespace rmf_traffic_ros2

#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Include a newline at the end of the file.

get_parameter_or<std::string>(
"log_file_location",
log_file_name,
".rmf_schedule_node.yml");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Official docs recommend the use of .yaml as the extension name: https://yaml.org/faq.html

Implementation(std::string file_path):
_file_path(file_path)
{
_counter = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_counter = 0;
_counter = 0;
if (!std::filesystem::exists(file_path))
{
std::filesystem::create_directories(
std::filesystem::absolute(file_path).parent_path());
return;
}

If the file doesn't exist, let's create the directory here (to avoid an exception later) and skip trying to read it.

#define CATCH_CONFIG_MAIN
#include <rmf_utils/catch.hpp>

// This will create the main(int argc, char* argv[]) entry point for testing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Add new line at the end of the file.

}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Add a newline at the end of the file.

YamlLogger::YamlLogger(std::string file_path):
_pimpl(rmf_utils::make_unique_impl<Implementation>(file_path))
{

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I recommend putting something like // Do nothing in any function (including constructors) where the body is empty. It's a way of saying "This function is intentionally left blank".

}
}

bool file_exists(const char *fileName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using C++17 so you can start using std::filesystem::exists(~) instead of doing this.

- Switched to unique pointers
- Added new line at end of the file
- Moved serialization functions into internal header.
- Removed `remove()`
@arjo129
Copy link
Member Author

arjo129 commented Jan 25, 2021

@mxgrey I have addressed all the issues. I think its ready for round 2.

@mxgrey
Copy link
Member

mxgrey commented Jan 26, 2021

An important detail just occurred to me. When a participant is rebooted, the schedule node will have to tell it what its last known schedule version was so that the rebooted participant can pick back up where it left off. I'm afraid this might require a small API breakage, although maybe I can come up with a clever way to work it in.

I think this will be a blocking issue for this PR, unfortunately. I'll work on a fix for this as soon as time permits.

@arjo129
Copy link
Member Author

arjo129 commented Feb 4, 2021

Alternatively, we don't actually need to change any API and add persistence on the fleet adapter side as well.

Signed-off-by: Michael X. Grey <grey@openrobotics.org>
@mxgrey mxgrey merged commit e8dc70b into master Feb 25, 2021
@mxgrey mxgrey deleted the feature/persistent-ids branch February 25, 2021 06:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants