Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: merging moveit_ci #452

Closed
wants to merge 3 commits into from
Closed

Conversation

rhaschke
Copy link
Contributor

As a follow up of our discussion in moveit/moveit_ci#100, I looked a little deeper into the differences between industrial_ci and moveit_ci. I like your code base very much! It's a lot cleaner and more modular than ours, supporting several CI providers, ROS versions, and builders. On the other hand, moveit_ci has some features that are missing in industrial_ci. Thus, I would like to see these two projects to merge. From my point of view, features of moveit_ci that are interesting to port are:

  • Hierarchically nested folding
    Travis supports nested folding and we make heavy use of it to nicely structure the web output,
    see here for an example.
    Recently, Travis also added support for nested timing, which we didn't yet implement.
  • Unbound variable checking
    You already use bash options errexit and pipefail. I'm wondering, why you don't use nounset as well? Checking for unbound variables can easily detect spelling errors in variables (see best practices). We hardened moveit_ci a while ago along this line and found a few issues doing so...
    My suggestion is to "declare" all variables with their defaults in a separate variables.sh file to avoid introducing default syntax ${var:-} everywhere (which would be error-prone again).
  • Consider CI provider's global timeout
    Usually CI providers have a global timeout for the job, the default for public Travis jobs is 50min.
    If a build takes longer (which frequently occurs for MoveIt, when built from scratch), Travis doesn't save the cache (most notably ccache), such that the next build attempt will not be any faster.
    To avoid this issue, we implemented (moveit/moveit_ci@b66a50a) an internal timeout mechanism that deliberately exits the main CI script, leaving Travis enough time to save the cache.
    I'm not sure Travis changed the default behavior and saves the cache meanwhile... If not, this is an essential feature for large projects like MoveIt.
  • Fail on cmake/compiler warnings
    We have a mechanism to check for any warnings during the build process: check_warnings.sh.
    A poor man's solution would simply set -Werror, but this would make the compiler fail on the first warning. Our solution compiles the whole project and subsequently checks for stderr output.
  • Support for codecov.io
    This was only recently added (Add codecov.io support moveit/moveit_ci#96) and just compiles everything with option --coverage with subsequent calls to lcov.
    For better results, the build should use debug options as well, but this caused issues with ccache on Travis (Add codecov.io support moveit/moveit_ci#96 (comment)).
  • Speed optimizations
    I have seen some optimization potential with regard to build time:
    • In moveit_ci, we restrict slow tests, like clang-tidy, to files that actually changed in a PR.
    • The base for an ABI check usually doesn't change across many tests and thus is a good candidate for caching. Marking the corresponding folder for caching in .travis.yml and checking for it's existence might be sufficient.
    • I particularly liked the DOCKER_COMMIT feature. Is it possible to commit the image to dockerhub as well? How do you handle login credentials in this case?
  • Verbosity
    Personally, I don't like the generated output of industrial_ci. Marking sections via >>>>> and <<<<< clutters the output and imho is superfluous if you have folding. In moveit_ci we use boldface to mark those main section headers. The first commit in this PR ports moveit_ci's folding approach generating a much cleaner output.
    Additionally, I have the following remarks:
    • In contrast to the documentation, VERBOSE_OUTPUT seems to be true by default (see here).
    • You suppress the output of many commands via ici_quiet. While this is a great function, I would reduce its use to a minimum. Using hierarchical folding will allow to better structure the output and avoid clutter while keeping those outputs.
    • Because most commands are not echoed, it's hard to follow the bash script from the web output.
      In moveit_ci we have various travis_run_* functions that essentially all echo the passed command and most bash commands go through this wrapper...
    • When a particular command fails and thus exits the main CI script, I suggest to not close open folds. This allows us to immediately browse to the erroneous output, which wouldn't be hidden in a - potentially deeply nested - fold otherwise.

This PR just comprises a few tests that I initially did as well as a few minor cleanups/fixes.
Consider them as a starting point for further discussions. I'm looking forward to contribute to this package and bring together the best from two worlds...

@mathias-luedtke
Copy link
Member

Thank you very much for your patches and the comments!
I am a little busy this week, but I will try to answer and review as soon as I find more time.

This PR has a merge conflict, because it is not based the latest master (including #451).
That's why the test dont't get run. I will push it to my fork to get this version tested.

@rhaschke
Copy link
Contributor Author

@ipa-mdl, just to clarify: I don't think that these commits can/should be merged in the present form.
Rather, I consider them as initial ideas how certain aspects of the (admittedly long) list above could be implemented. Let's start a discussion on which items to tackle first, and which implementation to prefer...

@mathias-luedtke
Copy link
Member

In contrast to the documentation, VERBOSE_OUTPUT seems to be true by default (see here).

Actually, that's a bug. Will be fixed in #458

@mathias-luedtke
Copy link
Member

RE folding

Hierarchically nested folding
Travis supports nested folding and we make heavy use of it to nicely structure the web output,
see here for an example.

This is a great addition. I will try cherry-pick this. However, we cannot rely 100% on this feature, as it is not supported by all CI providers. Local CI runs do not support folding at all.

When a particular command fails and thus exits the main CI script, I suggest to not close open folds. This allows us to immediately browse to the erroneous output, which wouldn't be hidden in a - potentially deeply nested - fold otherwise.

For nested folds this makes sense. I don't have strong opinion here.
Perhaps this can be made a style option (see below).

Marking sections via >>>>> and <<<<< clutters the output and imho is superfluous if you have folding. In moveit_ci we use boldface to mark those main section headers.

I was not a big fan either, but one gets used to it after a while ;)
As already said, not all system support folding (yet). These markers are very helpful in browsing raw logs or - my use case - reviewing local runs.
#451 is the preparation for supporting folding styles.
Your style could become the default for Travis, if users agree.

@mathias-luedtke
Copy link
Member

RE new features

Fail on cmake/compiler warnings

👍

Support for codecov.io

Again, 👍

In moveit_ci, we restrict slow tests, like clang-tidy, to files that actually changed in a PR.

Good idea, shouldn't be too hard to add.

The base for an ABI check usually doesn't change across many tests and thus is a good candidate for caching. Marking the corresponding folder for caching in .travis.yml and checking for it's existence might be sufficient.

Which folder do you cache here?
Right now the base can be a branch as well, so just checking the existence is not enough.

Consider CI provider's global timeout

I wonder if timeout 47m ./travis.sh would not be sufficient.

@mathias-luedtke
Copy link
Member

Unbound variable checking
You already use bash options errexit and pipefail. I'm wondering, why you don't use nounset as well? Checking for unbound variables can easily detect spelling errors in variables (see best practices). We hardened moveit_ci a while ago along this line and found a few issues doing so...

We have hardened the scripts with shellcheck, which detects spelling errors etc. as well.
Moving towards unbound variable checking will take some time, but I agree that it would be worth the effort.

My suggestion is to "declare" all variables with their defaults in a separate variables.sh file to avoid introducing default syntax ${var:-} everywhere (which would be error-prone again).

I'd like to refactor the variable system in a general.
I could imagine to "register" the common variables and a set with defaults for each test.

@mathias-luedtke
Copy link
Member

I particularly liked the DOCKER_COMMIT feature. Is it possible to commit the image to dockerhub as well?

For CI testing the images can be pushed and reused, I am using this feature frequently with rerun_ci.
For regular CI runs the reuse part is a little bit more complicated. Perhaps we can simplify it with some fallback mechanism in case the image was not pushed yet.

How do you handle login credentials in this case?

Docker Registry login? This should be done with docker login in .travis.yml etc.
SSH keys etc. should be wiped before pushing in AFTER_SCRIPT.

@@ -39,8 +39,8 @@ function run_clang_tidy {
return 0
fi

local build; build="$(dirname "$db")"
local name; name="$(basename "$build")"
local build="$(dirname "$db")"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davetcoleman
Copy link
Contributor

I think standardizing on industrial_ci for MoveIt is a great idea! Thanks for putting together this list, @rhaschke!

We use moveit_ci extensively at PickNik for our projects, and we've contributed a lot to it over the years (I was the original developer), though @rhaschke has done a ton of work on it since then. We'd like to switch our projects to industrial_ci also, but I'd feel better if more people had admin access to this repo so we can help review PRs and contribute improvements. I nominate @rhaschke of course, and maybe someone from the PickNik team (note: I'm not a good fit anymore). Thoughts?

@rhaschke
Copy link
Contributor Author

I still think that both, industrial_ci and moveit_ci could benefit from a merge. However, I didn't (and won't have) time since December (and until summer at least).
@davetcoleman, I guess you are talking about maintainer access, not admin?

@davetcoleman
Copy link
Contributor

Either or...

@rhaschke
Copy link
Contributor Author

rhaschke commented Apr 8, 2021

Closing due to inactivity. I will file smaller PRs...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants