Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to load any available RMW implementation. #189

Merged
merged 3 commits into from
May 5, 2021

Conversation

clalancette
Copy link
Contributor

The old logic was to load what the user asked for (via the
RMW_IMPLEMENTATION environment variable). If the user didn't
explicitly ask for an implementation, then it would attempt
to load the default RMW implementation.

The new logic is to load what the user asked for. If the user
didn't explicitly ask for an implementation, attempt to load
the default RMW. If that fails for some reason, fall back to
loading any available RMW. If all of that fails, fail the
load.

Signed-off-by: Chris Lalancette clalancette@openrobotics.org

One important thing to note about this PR is that it introduces a dependency between rmw_implementation and ament_index_cpp. I don't think that is a problem, but it bears thinking about.

Besides making the code more robust, this PR should fix problems we are seeing in running demo PR jobs, like https://build.ros2.org/job/Rpr__demos__ubuntu_focal_amd64/131 .

I'm going to run full CI on this; I'm leaving this in draft state until that is done. Any early feedback is welcome.

The old logic was to load what the user asked for (via the
RMW_IMPLEMENTATION environment variable).  If the user didn't
explicitly ask for an implementation, then it would attempt
to load the default RMW implementation.

The new logic is to load what the user asked for.  If the user
didn't explicitly ask for an implementation, attempt to load
the default RMW.  If that fails for some reason, fall back to
loading any available RMW.  If all of that fails, fail the
load.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@clalancette
Copy link
Contributor Author

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@clalancette clalancette marked this pull request as ready for review May 2, 2021 18:50
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
Copy link
Member

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch seems reasonable to me, I left some minimal comments about error handling.

Besides making the code more robust, this PR should fix problems we are seeing in running demo PR jobs, like https://build.ros2.org/job/Rpr__demos__ubuntu_focal_amd64/131 .

I don't get why those tests are failing to find rmw_cyclonedds_cpp, could you explain?

rmw_implementation/src/functions.cpp Outdated Show resolved Hide resolved
rmw_implementation/src/functions.cpp Show resolved Hide resolved
@clalancette
Copy link
Contributor Author

I don't get why those tests are failing to find rmw_cyclonedds_cpp, could you explain?

It's tricky, and has to do with how our PR jobs and Debian packages are setup and because the demos repository is "special".

The way that the PR jobs work is that they look through all of the packages in a particular repository, and find all of the package dependencies declared in the package.xml. The job then installs the dependencies listed via apt-get (this may or may not be done through rosdep, it isn't clear to me). One of the (recursive) dependencies is ros-rolling-rmw-implementation.

Our Debian package for rmw_implementation has this line in it:

Depends: ros-rolling-rmw-cyclonedds-cpp | ros-rolling-rmw-connextdds | ros-rolling-rmw-fastrtps-cpp

(I've removed the other dependencies for brevity). What that means is that any of the rmw-cyclonedds-cpp, rmw-connextdds, or rmw-fastrtps-cpp packages will satisfy the necessary dependency; if none are explicitly chosen somewhere else, the first one in the list (rmw_cyclonedds_cpp) will be selected by apt.

Our demos repository is special in that a package in it (demo_nodes_cpp_native) declares a dependency on a specific RMW implementation that is not the default in Galactic, namely rmw_fastrtps_cpp. That's because demo_nodes_cpp_native uses rmw_fastrtps_cpp APIs directly (avoiding the rmw layer).

Putting this all together, what happens is that the PR job goes to find dependencies. Since rmw_fastrtps_cpp is explicitly listed as a dependency, it satisfies the dependency line for rmw_implementation above, and so rmw_cyclonedds_cpp is never installed. When we go to actually run the tests for the demos, the current code first tries to load RMW_IMPLEMENTATION, but that isn't specified in the test. It then tries to load the default RMW implementation, but that isn't installed. At this point, it just gives up and throws an error message (failed to load shared library 'librmw_cyclonedds_cpp.so' due to dlopen error: librmw_cyclonedds_cpp.so: cannot open shared object file: No such file or directory).

One other interesting thing to note is that only 3 tests in the demos repository currently fail this way. You'd think that all of the tests would fail, but this doesn't happen because most of the tests in that repository are parameterized on the available RMW implementations. Thus, we get tests like test_showimage_cam2image__rmw_fastrtps_cpp, which work fine since they explicitly call out an RMW to load via RMW_IMPLEMENTATION. It's only ones that are trying to find the default rmw_cyclonedds_cpp that fail.

The new code fixes this by searching for any additional RMW implementations if the default isn't installed, so it will find the installed rmw_fastrtps_cpp.

Let me know if that all makes sense.

@ivanpauno
Copy link
Member

Let me know if that all makes sense.

Makes sense to me, thanks for clarifying!!!

One important thing to note about this PR is that it introduces a dependency between rmw_implementation and ament_index_cpp. I don't think that is a problem, but it bears thinking about.

It doesn't sound like a problem to me.
The other alternative is to look for all the rmw implementations that where available when building (which can be done through some cmake magic + configuring a define), but I don't mind strongly.

@clalancette
Copy link
Contributor Author

The other alternative is to look for all the rmw implementations that where available when building (which can be done through some cmake magic + configuring a define), but I don't mind strongly.

Yeah, that would also work. Yet another way to fix this would be to change the demo_nodes_cpp_native code to use CycloneDDS, rather than Fast-RTPS, as the "native" backend.

But changing the loader this way seemed like an overall improvement anyway to me, so I went with this code.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@clalancette
Copy link
Contributor Author

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status
  • Windows Debug Build Status

@clalancette
Copy link
Contributor Author

The macOS failures are unrelated. Since this is approved, I'm going to go ahead and merge it, then do a release into Rolling. That should hopefully fix the PR jobs.

@clalancette clalancette merged commit f1c92dd into master May 5, 2021
@clalancette clalancette deleted the clalancette/load-any-available-rmw branch May 5, 2021 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants