
workspace caching feature from CircleCI learning (pipelining?) #548

Open
tylerjw opened this issue Jul 9, 2020 · 5 comments

@tylerjw (Contributor) commented Jul 9, 2020

Some context:

@ruffsl did some work a while ago to help us use CircleCI for moveit2: moveit/moveit2#44

It hasn't been merged, but I've been playing around with his approach to understand how you need to use CircleCI to take advantage of its caching.

In my own testing it seems much cruder than the caching done by ccache in other systems, and I haven't found a way to integrate ccache into CircleCI. However, the way it does caching is still interesting and would apply to GitLab caches too. The approach is: if nothing has changed in a whole workspace (like upstream and its dependencies in /opt/ros), don't rebuild it; keep the whole workspace as a cache so that CI runs in the near future can reuse it until it changes.

To support this sort of caching from industrial_ci we'd need to support running only part of the workflow at a time. If we could do this, we could support using industrial_ci in a more pipelined fashion (like how GitLab seems to encourage you to set up your CI). This would enable reusing the output of any stage of the pipeline, by storing that output in a cache (sort of like a docker layer) and reusing it until the inputs (upstream package commit hashes plus a hash of the debian dependencies, for example) change.

This is to start a discussion around supporting pipeline-like workflows in industrial_ci to enable caching the output of steps.

This is the workflow I imagine:

  1. Calculate a hash of the debian dependencies of the upstream workspace.
  2. Look for a docker image tagged with that hash; if it doesn't exist, create it.
  3. Calculate a hash of the upstream source dependencies (git commit hashes?).
  4. Look for a cache tagged with that hash; if it doesn't exist, create it.
  5. Calculate a hash of the target workspace; if a matching cache doesn't exist, create it.
  6. Run tests.
  7. Clean out old cached data.
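A minimal sketch of how the cache keys in steps 1–4 above could be derived (the file names and the exact key format are illustrative assumptions, not part of industrial_ci or any CI system):

```shell
# Sketch of the cache-key derivation above; paths, file names, and the
# key format are illustrative, not part of industrial_ci.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Step 1: hash the debian dependency list (e.g. the resolved rosdep output).
printf 'ros-foxy-ament-cmake\nros-foxy-rclcpp\n' > deb-deps.txt
deb_hash=$(sha256sum deb-deps.txt | cut -c1-12)

# Step 3: hash the upstream source dependencies by their pinned commits.
printf 'repoA 1a2b3c\nrepoB 4d5e6f\n' > upstream-commits.txt
src_hash=$(sha256sum upstream-commits.txt | cut -c1-12)

# Combine into a single cache key; the cached workspace (or docker image)
# is reused for as long as this key is unchanged.
cache_key="upstream-${deb_hash}-${src_hash}"
echo "$cache_key"
```

The same scheme extends to step 5 by folding the target workspace's commit hash into the key.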

It would be awesome if we could somehow combine this workflow with ccache. That would let us skip rebuilding entire workspaces when they haven't changed, and when they have changed, use ccache to avoid rebuilding the parts of the existing workspace that haven't. This could either be built into industrial_ci (with some sort of S3 hook), or, if industrial_ci could be run in stages in a pipeline, it could be done natively in the CI system.

@tylerjw (Contributor) commented Jul 10, 2020

After carefully re-reading the caching documentation for CircleCI, I do think there is a nice native way to do ccache caching.
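For reference, a hedged sketch of what that might look like in a `.circleci/config.yml`, using CircleCI's `restore_cache`/`save_cache` steps keyed on the branch (the job name, base image, paths, and key template here are illustrative assumptions, not taken from an existing config):

```yaml
# Illustrative CircleCI fragment: persist the ccache directory across
# builds, falling back to the most recent cache for any branch.
jobs:
  build:
    docker:
      - image: ros:foxy
    environment:
      CCACHE_DIR: /ccache
    steps:
      - checkout
      - restore_cache:
          keys:
            - ccache-{{ .Branch }}-
            - ccache-
      - run: apt-get update && apt-get install -y ccache
      - run: colcon build --cmake-args -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
      - save_cache:
          key: ccache-{{ .Branch }}-{{ epoch }}
          paths:
            - /ccache
```

Because the restore keys are prefixes, a partial match still warms the cache, and ccache itself handles invalidating stale objects.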

@ruffsl commented Jul 10, 2020

@tylerjw, I've been adding a bunch of docs about the nav2 CI setup, and it may clarify a few things (we are using both ccache and CircleCI's caching across jobs and workflows). I think if you go through the PR carefully, you may get a more concrete idea of what you'd need to replicate from CircleCI, as your workflow above is as yet a bit too unstructured:

One thing I'd like to get to is refactoring this to use GitHub Actions, as the executor for Actions (unlike CircleCI's) is open source, so one could host their own bare-metal worker machine for running containers to test CI workflows. One could then use things like nvidia-docker with GH Actions to have a local/affordable GPU CI worker for DNN, reinforcement learning, or heavy CV simulations, while still benefiting from the GH integrations.

Additionally, to push workspace caching further, to the package level, I've been thinking about how one might write a colcon plugin to detect when packages with modified source trees need to be rebuilt, as well as any downstream dependencies.
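As a rough illustration of that idea (this is not an existing colcon plugin; the fingerprinting scheme is an assumption), one could hash each package's source tree and trigger a rebuild only when the fingerprint changes:

```shell
# Hypothetical sketch: fingerprint a package's source tree so a build
# tool could skip packages whose fingerprint is unchanged.
set -eu

ws=$(mktemp -d)
mkdir -p "$ws/src/pkg_a"
echo 'int main() { return 0; }' > "$ws/src/pkg_a/main.cpp"

hash_pkg() {
  # Stable hash over the file paths and contents of one package.
  (cd "$1" && find . -type f | LC_ALL=C sort | xargs sha256sum) \
    | sha256sum | cut -c1-12
}

before=$(hash_pkg "$ws/src/pkg_a")
echo 'int main() { return 1; }' > "$ws/src/pkg_a/main.cpp"  # modify source
after=$(hash_pkg "$ws/src/pkg_a")

# The fingerprint changed, so pkg_a (and its downstream dependencies,
# via the package dependency graph) would be rebuilt.
[ "$before" != "$after" ] && echo "pkg_a needs rebuild"
```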

@mathias-luedtke (Member)

Not sure if it is applicable in this use case, but the DOCKER_COMMIT option is meant for multi-stage builds.
It is used in rerun_ci, which hashes the input variables.

@tylerjw (Contributor) commented Jul 10, 2020

@ipa-hsd I'm not sure how one would use DOCKER_COMMIT to get a multi-stage build.

Some of what I'm trying to achieve is the ability to cache the entire upstream workspace between CI runs when it hasn't changed (as determined by the commit hashes).

@mathias-luedtke (Member)

@tylerjw: You can run the build once and push the committed result (incl. dependencies and build folder) somewhere. This image can be used in other stages or to speed up the next build.
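A dry-run sketch of that flow (the container and image names are made up for illustration, and the script only echoes the docker commands instead of running them):

```shell
# Dry-run sketch: after a successful build, commit the build container
# (including dependencies and build folder) as an image, push it, and
# pull it in a later stage or the next run instead of rebuilding.
# Names below are illustrative assumptions.
set -eu
run() { echo "+ $*"; }   # print the command instead of executing it

CONTAINER=industrial_ci_build                          # container that ran the build
IMAGE=registry.example.com/ci/upstream-cache:abc123    # tag derived from input hashes

run docker commit "$CONTAINER" "$IMAGE"
run docker push "$IMAGE"
# Later stage or next CI run: start from the committed image.
run docker pull "$IMAGE"
```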
