Implement proper Caching #147
I like the proposal in #143. We can cache folders using volumes. Maybe something like this in the yaml:
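The yaml snippet itself was lost from this comment; based on the `cache:` section shown later in the thread, it was presumably along these lines (the image and paths are illustrative, not a confirmed format):

```yaml
# sketch: each directory listed under cache would be mounted as a
# Docker volume and persisted between builds (paths illustrative)
image: node0.10
script:
  - npm install
  - npm test
cache:
  - node_modules
```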
On the host machine, the directories would need to follow some sort of naming convention that includes the repository and branch. For example, when we build a new branch and no cache exists yet, we could copy the cache from master.
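A minimal sketch of that copy-from-master fallback, assuming a hypothetical `<cache-root>/<repo>/<branch>` layout (the root, repo name, and layout are assumptions, not Drone's actual scheme):

```shell
# Seed a new branch's cache from master's cache if none exists yet.
set -e
CACHE_ROOT=$(mktemp -d)                       # stand-in for the host cache root
REPO="github.com/octocat/hello-world"         # hypothetical repository
BRANCH="feature-x"
mkdir -p "$CACHE_ROOT/$REPO/master/.bundler"  # pretend master already has a cache
if [ ! -d "$CACHE_ROOT/$REPO/$BRANCH" ]; then
  # first build of this branch: copy master's cache as a starting point
  cp -r "$CACHE_ROOT/$REPO/master" "$CACHE_ROOT/$REPO/$BRANCH"
fi
```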
I have this in a local branch and it works; however, there are a few minor gotchas I found: permissions and paths.
I've added code to turn this into an absolute path, relative to where the code is cloned in the container:
However, the following examples will FAIL:
added docs to the README. This is still alpha quality given the above issues. The biggest issue will be permission-related, when the container USER is not root; the workaround is to chown the directory in the container as part of your build script. That being said, feel free to play around with it and add your feedback to this thread.
Great work @bradrydzewski, I'll give it a whirl today and test it.
Thanks @bradrydzewski, I'll give it a try later today.
I added caching to the .drone.yml
and ran into the following error in the drone console log when building:
I'm using my own java docker image since I needed Maven 3.1.1.
@ralfschimmel I ran into the same problem, but everything worked fine after I moved the cached directory out of my repo into /tmp (which makes sense, because the cache is mounted before the repo is cloned, and the clone needs to happen into an empty directory).
Interesting ... @ralfschimmel thanks for testing and @mnutt thanks for troubleshooting and finding the root cause. Does anyone know if there is a command line flag we can use to force clone into a non-empty folder? |
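There is no `git clone` flag for cloning into a non-empty directory, but the same result can be had with `git init` + `git fetch` + `git checkout`. A self-contained sketch using a local throwaway "origin" (all paths below are stand-ins):

```shell
set -e
tmp=$(mktemp -d)

# create a throwaway origin repository with a single commit
git init -q "$tmp/origin"
git -C "$tmp/origin" config user.email drone@example.com
git -C "$tmp/origin" config user.name drone
echo hello > "$tmp/origin/README"
git -C "$tmp/origin" add README
git -C "$tmp/origin" -c commit.gpgsign=false commit -qm init

# the build directory already contains a mounted cache dir, so it is NOT empty
mkdir -p "$tmp/build/cache"
cd "$tmp/build"
git init -q
git remote add origin "$tmp/origin"
git fetch -q origin
git checkout -q -b build FETCH_HEAD   # README now sits next to cache/
```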
Indeed, using absolute paths works just fine! @bradrydzewski Two options I can think of off the top of my head:
I just tried with: and according to https://github.com/drone/drone/blob/master/pkg/build/build.go#L360 it should be 0777, so I'm not sure what's going on. Is the .deb package available at http://downloads.drone.io/latest/drone.deb up to date with what's in master, or do I need to build the .deb myself?
@Propheris are you using the ruby1.9.3 image? For me, /tmp is 0777. The drone.deb package is automatically built on every commit to master (that drone can successfully build).
I also found that
@Propheris this is because the default Drone images run as the ubuntu user, not root. The solution is pretty simple, although a bit of a pain: I need to re-create and re-test all the Drone images (at github.com/drone/images) to run as root instead of ubuntu.
👍 The cache feature is awesome!

```yaml
image: node0.10
script:
  # workaround for cache, see https://github.com/drone/drone/issues/147
  - mkdir -p /tmp/npm
  - sudo chown -R ubuntu:ubuntu /tmp/npm
  - npm config set cache /tmp/npm
  # actual script:
  - npm test
cache:
  - /tmp/npm
```
I think we're going to need to alter our caching approach, and I wanted to describe my thoughts here.

So why change the existing approach? There are a few issues, but I'm going to focus on the most critical. Our current approach requires us to have physical access to the machine that is running the build (to create the cache folders, remove the folders, etc). What if we want to spread builds across multiple servers? We can do almost everything via Docker's remote API, over TCP, with the exception of creating and managing our cache directories. This means we have two options: 1) we can create an agent that is installed on each machine to execute filesystem commands, or 2) we can come up with a caching solution that works with the Docker remote API.

I'd like to explore the latter option. I'm going to experiment with snapshotting container images, splitting the build into separate phases.
As mentioned, we could snapshot the container after the setup section runs. I'm hoping to get some feedback or ideas for alternate approaches. I'll create an experimental branch for this and comment on the thread when it is ready for review.
@bradrydzewski The approach you describe would be really awesome, because it would unlock the real advantages of Docker for a CI environment. I just wonder how to make the snapshotting work with stuff like

I still don't see however that the
I think it would work well with
We would split the build into two parts. First we would:
And then we would:
Next time we run the build, we use the snapshot as the starting image. Bonus: since Docker uses unique hashes and overlay filesystems, we won't have to worry about two builds altering the same cache. I think this could work; of course, it is just an idea in my head. It will also be kind of a pain to implement, but we do have very good mock testing at that layer...
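To make the two-phase idea concrete, a split yaml might look something like this (the section names `setup`/`build` and the snapshot semantics are assumptions about the proposal, not an implemented format):

```yaml
image: node0.10
setup:
  # runs only when no snapshot exists; the container would be committed
  # (snapshotted) after these steps succeed
  - npm install
build:
  # runs on every build, starting from the snapshot image
  - npm test
```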
Just to confirm that I get you right: you always want to snapshot after successful setup runs and use these snapshots the next time a build starts? Concerning the node.js example: so in the second build, when you use
So the second build will make use of the npm cache in

My initial idea was to be able to re-use the state of the image after a successful

Thinking of the

OK, so far this was just to understand your idea again, and for the cases I can think of, the solution sounds reasonable ;). One more thing I would like to understand better is parallel and distributed builds. Assuming we have two parallel builds that start off the same base image: they will produce two different snapshots after the setup phase. Which one will be used for the next build? Will docker take care of this?
My concern with snapshotting is how we would revert back to the base image if we screwed something up. Say we do something in

I really like the idea of snapshotting. It's elegant. But I also want protection from shooting myself in the foot.
Fair point. I think we could provide various mechanisms to flush the cache. These are just some ideas that I can think of off the top of my head:
Another idea I just want to throw in would be that steps in the

If I understand the docker

Of course that won't help for setup steps that do not define a file as a dependency...
We could also use the sha value of the |
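Keying the cache on a file's checksum could look like this sketch (the lockfile content and cache layout below are made up for illustration):

```shell
# Key the dependency cache on the lockfile checksum, so the cache is
# invalidated whenever the declared dependencies change.
set -e
work=$(mktemp -d)
printf "gem 'rails', '4.0.2'\n" > "$work/Gemfile.lock"   # stand-in lockfile
KEY=$(sha256sum "$work/Gemfile.lock" | cut -d' ' -f1)
CACHE_DIR="$work/cache/bundle-$KEY"
if [ -d "$CACHE_DIR" ]; then
  echo "cache hit: reuse $CACHE_DIR"
else
  echo "cache miss: run the setup section, then populate $CACHE_DIR"
  mkdir -p "$CACHE_DIR"
fi
```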
Unless I misunderstand Brad's last proposal, the effective sequence of steps executed would be:
(Note that commitN might actually be earlier than commit{N-1}, in case commitN is being rebuilt.) This can cause subtle bugs: assume that you've accidentally removed a dependency from the project and setup doesn't install it anymore. In that case you wouldn't notice the problem until you've flushed the cache. Ctavan's proposal is free of this problem. In that case, the effective sequence would be:
This would provide a truly stateless build and, if we used docker's build mechanism for the setup phase, would give us correct cache invalidation for free. A downside of this proposal that I see is that we'd need to check out the files required in the setup phase somewhere outside of the container. If this problem can be overcome without large complications I'd be much in favour of ctavan's proposal. |
I think the discussion got sidetracked. For rubygems caching, the current solution of having

does the job pretty well. The only issue I found is that this cache doesn't persist between builds for different branches, so it is almost never used when running builds on pull requests.
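The omitted config was presumably along these lines (a guess based on the surrounding discussion, not grk's actual file):

```yaml
# cache the vendored gem directory between builds
script:
  - bundle install --path .bundler
cache:
  - .bundler
```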
@grk This approach is hard to use when the docker host and the host that drone runs on are different and we can't rely on the contents of any directories on the docker host. Please correct me if I'm wrong: isn't it the case that using that approach allows such subtle bugs as I've mentioned to appear? |
Hi. Is there any news regarding this issue?
I would really like for my build to run faster, all the |
i am doing a |
it looks like the cache is currently branch-specific, which makes it awkward for feature-branch based PRs, as the cache gets duped for every feature branch, but doesn't have the advantage of faster builds |
Per-branch caching is removed per #912 (thanks @nathwill). Note that #902 will expose much more of the underlying Docker implementation and will allow mounting volumes from the yaml file. So #902 will end up replacing the

We're also working on modularity and plugins. The

This will give us much more flexibility and should allow us to perform a
consider this: you have cached installed dependencies and run a build again. since the last build the dependencies have changed, so we have to run the commands from the setup section again to update the dependencies. that would still be faster than installing everything again.
@glaszig #912 changes the cached folder from per-repo-branch to per-repo, but is still repo specific, and not generic to the build-box (maybe you're testing different repos?). you can also poke around under /tmp/drone on the build host and find the cached directory for direct inspection. in any case, it's definitely working on our system; maybe you can share your .drone.yml and drone version? |
alright. that's what i read from the code; what i expected.
no. always the same. only different branches. so, i should see your changes having an effect.
yeah. there's a folder structure there and also my drone.yml
seems right to me, but i noticed that the drone version didn't update when my patch went in... the version i have installed is:
outside of that, i've no idea why it might not be caching for you. |
same version. somehow can't get the cache working. giving up for now.
follow-up. during a build today i ran
what i see there is an assumably correct

Drone/docker is writing the content of my cache folder to a new directory during every build. That's why the cache is always empty. Any idea what is wrong here?
I think |
update: @donny-dont has been working on proper caching, including the ability to cache portions of the git directory (which prior to his changes complains if you clone into a non-empty directory). I think this will take time to perfect, but it will be a really good start |
Hoping to get through the pull request process today and then this should be closed. Will write some docs around it too. |
drone-plugins/drone-git#1 is needed for caching as |
Alrighty, drone-plugins/drone-git#1 is merged; just need to write docs and this can close.
I was playing around with Drone.io over the weekend and I'm really impressed.
However, there is one big issue for our Rails project: bundling all the gems (~250 gems, about 10 of which are git checkouts) takes about 10 minutes for each build, as I've found no way to provide a cacheable directory to Bundler.
I've seen issues #43 and #143, but as far as I understood the solution proposed in #43, the cache would only be invalidated when the actual setup commands have changed. In my case, it would need to re-run the commands when the content of the project's Gemfile.lock has changed. Furthermore, it would be neat to be able to share a cache directory between different projects. In our current Jenkins setup, we're sharing a global Bundler directory, which speeds up new-project builds enormously.
Here is an excerpt of our build file:
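The excerpt itself did not survive extraction; a typical shared-Bundler setup of the kind described might look like this (illustrative only, not the reporter's actual build file, and the shared path is an assumption):

```shell
# Point Bundler at a directory shared across projects and builds,
# so gems installed once are reused everywhere.
export BUNDLE_PATH=/var/cache/bundler   # hypothetical shared directory
echo "bundler will install gems into $BUNDLE_PATH"
# bundle install          # uncomment in a real build
# bundle exec rake test
```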