
Setup CI #310

Merged 5 commits into ros2:master on May 10, 2021

Conversation

@hsd-dev (Contributor) commented Apr 3, 2021

This PR sets up CI using GitHub Actions. It builds and tests ros1_bridge for Foxy (ros:foxy-ros1-bridge-focal) and Rolling (ros:rolling-ros1-bridge-focal).

It also sets up ccache, which helps reduce the build times from ~18 minutes to ~2 minutes. Here are the ccache stats:

cache directory                     /home/runner/.ccache
primary config                      /home/runner/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
stats updated                       Fri Apr  2 10:43:59 2021
cache hit (direct)                   249
cache hit (preprocessed)               0
cache miss                           247
cache hit rate                     50.20 %
called for link                       42
cleanups performed                     0
files in cache                       496
cache size                         744.5 MB
max cache size                       5.0 GB
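As a sanity check on the stats above, the reported hit rate is the direct plus preprocessed hits divided by total compilations (hits plus misses):

```python
# Verify the ccache hit rate from the stats above.
direct_hits = 249
preprocessed_hits = 0
misses = 247

total = direct_hits + preprocessed_hits + misses
hit_rate = 100.0 * (direct_hits + preprocessed_hits) / total
assert round(hit_rate, 2) == 50.2  # matches the reported 50.20 %
```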

Finally, a small fix (renaming FieldHash in ros1_bridge/__init__.py to lowercase field_hash) for the tests to pass.
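For context, the "make field types hashable" pattern the fix touches makes introspected field types usable as dict/set keys by hashing their string form. A minimal, hypothetical sketch (the Field class and names here are illustrative, not the actual ros1_bridge internals):

```python
# Hypothetical sketch of the "make field types hashable" pattern.
# `Field` is a stand-in for the introspected field type class;
# it is NOT the actual ros1_bridge implementation.

def field_hash(self):
    # Hash by string representation so equal-looking fields collide.
    return hash(str(self))

class Field:
    def __init__(self, type_name, name):
        self.type = type_name
        self.name = name

    def __str__(self):
        return '%s %s' % (self.type, self.name)

    def __eq__(self, other):
        return str(self) == str(other)

# Attach the hash function to the class after the fact, which is one
# way such patching of imported types is typically done.
Field.__hash__ = field_hash

a = Field('string', 'data')
b = Field('string', 'data')
assert len({a, b}) == 1  # both hash and compare equal
```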

Here is the link for the push job.

@jacobperron (Member) left a comment:

Considering that ros1_bridge does not have CI from the ROS build farm, I think adding a GitHub action for CI could be useful. However, I think it would be better to reuse existing ROS actions if possible (see my comment below).

@@ -860,12 +860,12 @@ def load_ros2_service(ros2_srv):


# make field types hashable
-def FieldHash(self):
+def field_hash(self):
Member:

Why was this change necessary?

Member:

To clarify, I'm a bit surprised that this is causing tests to fail considering it's been in the code for over 6 years. Perhaps we're running a different set of tests for our manually triggered CI?

@hsd-dev (Contributor, Author) Apr 23, 2021:

Now I am not sure which test had failed. I reverted the change and all the tests passed.

Contributor (Author):

Reverted this change.

.github/workflows/main.yml (comment thread outdated; resolved)
Comment on lines 22 to 27
uses: actions/cache@v2
with:
path: ${{ env.CCACHE_DIR }}
key: ccache-${{ matrix.distro }}-${{github.run_id}}
restore-keys: |
ccache-${{ matrix.distro }}
@jacobperron (Member) Apr 22, 2021:

@nuclearsandwich @cottsay As we've had issues on the build farm related to ccache, could you comment on whether you think there's any issue with this? Perhaps it's better to avoid caching altogether and live with longer CI times.

Member:

Part of the reason we disabled ccache on the buildfarm is that the implementation leveraged a shared ccache across packages, meaning that one package could poison the cache for everyone. Since this is a single ccache only for this package, I don't think that's a concern here. You may want to brush up on how to purge the cache, and also determine the scope of the cache: for example, can someone open a PR with "bad" changes which poison the cache for normal runs?

I'd guess that the buildfarm concerns were rooted in security, and since the results of this job are discarded, it's less of a concern.

Contributor (Author):

> You may want to brush up on how to purge the cache and also determine the scope of the cache - for example, can someone open a PR with "bad" changes which poison the cache for normal runs.

I think the cache is not uploaded for runs that are not successful. But I can check this again.

Member:

To add to @cottsay's answer: a threat faced by the build farm is that ccache itself has no security model. In addition to ccache placing objects in its own database as part of regular operations, any script with access to the underlying ccache directory can also place potentially malicious objects manually, and these would be picked up as cache hits on subsequent builds. One reason this matters on the build farm is that the binary artifacts are the primary output of that farm (of the binary package jobs, at least). Here the produced artifacts are not propagated beyond the action and are only used for in-context testing. So the primary impact of an "attack" on this action's ccache database would be a sort of denial of service: pushing a malformed cache object would cause subsequent builds to fail all tests until the object was purged, either by a change in the sources or a full cache purge.

All that's to say, I think ccache in this context is fine. I'd want to explore further if this action produced artifacts.

Contributor (Author):

Interesting!

runs-on: ubuntu-latest
strategy:
matrix:
distro: ["ros:rolling-ros1-bridge-focal", "ros:foxy-ros1-bridge-focal"]
Member:

The master branch is not targeting Foxy, so I would only target Rolling for CI here. Later, we can port this PR to the foxy branch, making the necessary changes to target Foxy.

Contributor (Author):

Now targets only Rolling.

@jacobperron jacobperron added the enhancement New feature or request label Apr 22, 2021
hsd-dev added a commit to hsd-dev/ros1_bridge that referenced this pull request Apr 28, 2021
ros2#310 (comment)

Signed-off-by: Harsh Deshpande <harshavardhan.deshpande@ipa.fraunhofer.de>
@hsd-dev (Contributor, Author) commented May 6, 2021

The cache hit rate with ros-tooling/action-ros-ci is just 1.44% (see here, and the config), and it still takes ~30 min to build and test. With my custom scripts, I was just building and testing, but with ros-tooling/action-ros-ci, the steps performed are:

Invoking: bash -c 'source /opt/ros/noetic/setup.sh && source /opt/ros/rolling/setup.sh && colcon build --event-handlers console_cohesion+ --packages-up-to ros1_bridge   --symlink-install'
Invoking: bash -c 'source /opt/ros/noetic/setup.sh && source /opt/ros/rolling/setup.sh && colcon lcov-result --initial'
Invoking: bash -c 'source /opt/ros/noetic/setup.sh && source /opt/ros/rolling/setup.sh && colcon test --event-handlers console_cohesion+ --return-code-on-test-failure --packages-select ros1_bridge '
Invoking: bash -c 'source /opt/ros/noetic/setup.sh && source /opt/ros/rolling/setup.sh && colcon lcov-result --filter  --packages-select ros1_bridge --verbose'
Invoking: bash -c 'source /opt/ros/noetic/setup.sh && source /opt/ros/rolling/setup.sh && colcon coveragepy-result --packages-select ros1_bridge --verbose --coverage-report-args -m'

If build times are not a major concern, we can leave out ccache entirely.
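For comparison, a minimal workflow along the lines of the custom scripts (build and test only, no ccache) might look roughly like the sketch below. This is hypothetical and illustrative only, not the merged .github/workflows/main.yml:

```yaml
# Hypothetical sketch; job and step names are illustrative.
name: CI
on: [push, pull_request]
jobs:
  build_and_test:
    runs-on: ubuntu-latest
    container:
      image: ros:rolling-ros1-bridge-focal
    steps:
      - uses: actions/checkout@v2
      - name: Build and test
        shell: bash
        run: |
          source /opt/ros/noetic/setup.bash
          source /opt/ros/rolling/setup.bash
          colcon build --packages-up-to ros1_bridge --symlink-install
          colcon test --packages-select ros1_bridge --return-code-on-test-failure
```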

@jacobperron (Member):

I don't think a 30min CI time is anything to worry about. We typically run this Jenkins job to test ros1_bridge, which takes about 2 hours. Leaving out ccache sounds good to me, keeping things simple.

Signed-off-by: Harsh Deshpande <harshavardhan.deshpande@ipa.fraunhofer.de>
ros2#310 (comment)

Signed-off-by: Harsh Deshpande <harshavardhan.deshpande@ipa.fraunhofer.de>
build times are not a concern and it is not significantly improved when using ros-tooling actions

Signed-off-by: Harsh Deshpande <harshavardhan.deshpande@ipa.fraunhofer.de>
@hsd-dev (Contributor, Author) commented May 10, 2021

@jacobperron should be good to go.

@jacobperron (Member) left a comment:

LGTM

@ipa-hsd Thanks for the contribution!

@jacobperron jacobperron merged commit 5d77ce0 into ros2:master May 10, 2021
@hsd-dev hsd-dev deleted the ci branch May 10, 2021 19:09
Labels: enhancement (New feature or request)
4 participants