This repository has been archived by the owner on Dec 13, 2018. It is now read-only.

Use cached layers if repository has a yarn lockfile #91

Open

rclark opened this issue Feb 9, 2017 · 16 comments

Comments

@rclark (Contributor) commented Feb 9, 2017

Right now, images are built with the --no-cache flag. Using cached layers is a way to significantly decrease build times, and long build times are one of the biggest bummers about our current CI flow.

One of the arguments in favor of --no-cache is that without it, npm install with semver ranges in package.json could lead to images that reuse old cached layers for node.js dependencies. This could lead to unexpected (and very non-deterministic) mismatches between your local environment and your production environment.

Yarn's use of a lockfile that pins node.js dependency versions and is committed in the repo avoids this misstep, and makes me wonder if we could drop the --no-cache flag when there's a yarn lockfile in the repo.
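
For illustration, here's a minimal sketch of the pattern (a hypothetical node.js Dockerfile, not any specific ecs-conex-built image): with a committed lockfile, the cache key for the dependency layer is the lockfile's exact contents, so a cache hit can only ever reproduce the pinned tree.

```bash
# Hypothetical Dockerfile written out from a shell step; names and base image
# are assumptions for illustration, not ecs-conex internals.
cat > Dockerfile <<'EOF'
FROM node:6

WORKDIR /app

# Copy only the dependency manifests first, so this layer is reused until
# package.json or yarn.lock actually changes.
COPY package.json yarn.lock /app/

# --frozen-lockfile fails if yarn.lock is out of sync with package.json,
# so a cache hit on this layer is always the pinned dependency tree.
RUN yarn install --frozen-lockfile

# Application code changes only invalidate the layers from here down.
COPY . /app
EOF

# $GITSHA is a placeholder tag for this sketch.
docker build -t myapp:"$GITSHA" .
```

With only package.json and semver ranges, a cache hit on the install layer can hand you whatever versions npm resolved on some earlier build, which is exactly the mismatch described above.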

However there are still a few other questions to weigh against such a decision:

  • You would only get build-time caching benefits sometimes and not all the time. This depends on whether your conex worker task lands on an EC2 that still has the cached layers from a previous build.

  • Because of the above, you may want to try to keep cached layers lying around on the EC2s for longer, which leads to disk space management problems.

It may be worth exploring this anyways, without adjusting anything about how we have our EC2s clean up old images/layers. If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

cc @springmeyer @scothis @mcwhittemore @mapsam @GretaCB

@mapsam (Member) commented Feb 9, 2017

You would only get build-time caching benefits sometimes and not all the time

Personally, this doesn't feel like a reason not to do it, but I can see why it would be confusing if ecs-conex build times start varying wildly.

disk space management problems

This would impact the entire EC2 right? So any task running on this EC2 could be affected, not just the service which has started caching images? Which services currently use up the most disk space?

@rclark (Contributor, Author) commented Feb 9, 2017

Yes, disk space is an EC2-wide problem. Controls on disk space are really outside the scope of ecs-conex's responsibilities, except that at present conex explicitly cleans up after itself, removing everything about the image it just built, in an effort to reduce its impact on any other services that might be sharing the host EC2s.
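
For context, the kind of post-build cleanup being described looks roughly like this (a sketch with assumed variable names, not the actual ecs-conex script):

```bash
# Remove the image that was just built, then drop any dangling layers it left
# behind, so other services sharing the host aren't squeezed for disk.
# REPO_URI and GITSHA are assumed names for this sketch.
docker rmi "${REPO_URI}:${GITSHA}" || true
docker rmi $(docker images -q --filter dangling=true) 2>/dev/null || true
```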

@mcwhittemore

If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

The move to yarn itself will help a lot with node.js dependencies, so if that's a requirement and there is a downside to doing this, we might want to see what the gains from using yarn alone are first.

@springmeyer

Thanks for writing this up and providing a view into the state of things.

Non-deterministic builds feel like a major drawback.

If we can demonstrate a significant benefit for projects with hefty node.js dependency trees or huge unix package dependencies, it may be worthwhile.

My feeling is that we should continue to focus on trimming the dependency trees and unix package dependencies as a way to speed things up. There are still a lot of duplicate npm packages happening in projects, and we can move to using more mason packages in place of apt deps to drop weight.

@mapsam (Member) commented Feb 10, 2017

That's a good point @mcwhittemore - yarn will definitely bring in some big savings, especially around duplicate npm packages.

@mcwhittemore

There are still a lot of duplicate npm packages happening in projects

Agreed. This is one benefit of yarn. Another is that, unlike npm, it is deterministic thanks to its lockfile.

@lukasmartinelli

One of the arguments in favor of --no-cache is that without it, npm install with semver ranges in package.json could lead to images that reuse old cached layers for node.js dependencies. This could lead to unexpected (and very non-deterministic) mismatches between your local environment and your production environment.

What about being able to specify whether caching should be enabled or not in a package.json or .ecs-conex file?

A simple way to get caching would be to download the image of the previous commit and then run the build without --no-cache.
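
Roughly, and hedging on the details (the tag-by-commit scheme and variable names are assumptions; on Docker 1.13+ the pulled image has to be passed in via --cache-from to act as a cache source):

```bash
# Warm the layer cache from the previous commit's image before building.
# REGISTRY, REPO, and GITSHA are placeholders for this sketch.
PREVIOUS_SHA=$(git rev-parse HEAD^)

# Best effort: a missing previous image just means a cold cache, not a failure.
docker pull "${REGISTRY}/${REPO}:${PREVIOUS_SHA}" || true

docker build \
  --cache-from "${REGISTRY}/${REPO}:${PREVIOUS_SHA}" \
  -t "${REGISTRY}/${REPO}:${GITSHA}" .
```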

@rclark (Contributor, Author) commented Mar 3, 2017

download the image of the previous commit and then run the build without --no-cache

I believe we originally did this, and then removed it once we implemented --no-cache. Worth exploring for sure, though you'd want some way to make sure that downloading time isn't going to be a significant penalty.

@lukasmartinelli

IMHO it is the responsibility of the image creator to make sure the build always yields consistent results - not of the build infrastructure.

The long wait times for an image build slow down iteration a lot.

I believe we originally did this, and then removed it once we implemented --no-cache. Worth exploring for sure, though you'd want some way to make sure that downloading time isn't going to be a significant penalty.

In my experience, fetching from npm or apt is almost always slower than downloading the compressed filesystem layer over the internal network (even though AWS provides its own mirrors).
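
That's easy enough to sanity-check per project before committing to the approach, e.g. (hypothetical names; numbers will vary by host and network):

```bash
# Compare "pull the previous image" against a genuinely cold dependency
# install. REGISTRY/REPO/PREVIOUS_SHA are placeholders for this sketch.
time docker pull "${REGISTRY}/${REPO}:${PREVIOUS_SHA}"

rm -rf node_modules && yarn cache clean   # force a cold install
time yarn install --frozen-lockfile
```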

@rclark (Contributor, Author) commented Mar 16, 2017

I think that the next action here is a PR to download previous images before a build, and then gather some metrics on build durations to confirm that we're seeing a benefit. Once #94 lands, the duration of each build will be captured in CloudWatch automatically (new watchbot feature).

In terms of consistency between builds, I'm comfortable using the existence of a yarn lockfile as a cue that we should download a prior image and build off of it. I agree with the sentiment that images should handle this themselves, @lukasmartinelli, but the reality of our npm usage is that we aren't there for most of our builds.
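
To sketch what that cue could look like in the build step (assumed variable names and tag scheme, not a settled design):

```bash
# If the repo commits a yarn.lock, reuse the prior image as a cache source;
# otherwise keep building with --no-cache. BEFORE_SHA/AFTER_SHA, REGISTRY, and
# REPO are assumptions for this sketch, not ecs-conex internals.
if [ -f yarn.lock ]; then
  docker pull "${REGISTRY}/${REPO}:${BEFORE_SHA}" || true
  docker build --cache-from "${REGISTRY}/${REPO}:${BEFORE_SHA}" \
    -t "${REGISTRY}/${REPO}:${AFTER_SHA}" .
else
  docker build --no-cache -t "${REGISTRY}/${REPO}:${AFTER_SHA}" .
fi
```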

@lukasmartinelli

🤗 I still really, really want faster image builds! Give us caching, platform overlords 🤗

@lukasmartinelli

(image)

@rclark (Contributor, Author) commented May 3, 2017

@lukasmartinelli let me just clone myself real quick. brb.

@lukasmartinelli

@lukasmartinelli let me just clone myself real quick. brb.

Imagine if we had three clarks!! 🤗 ❤️

@lukasmartinelli commented Jun 22, 2017

🤗 🤗 🤗

Bump. I think this will lead to a lot more developer productivity 🔨 + saved ⏲ => saved 💵 + more dev 😄.

Especially when iterating on CloudFormation templates: when one edits only the CloudFormation files, the Docker image shouldn't need to be rebuilt.
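
For example, something along these lines could skip the build entirely for template-only pushes (assuming templates live under cloudformation/, which may not hold for every repo):

```bash
# Skip the image build when a push only touched CloudFormation templates.
# The cloudformation/ path and the BEFORE_SHA/AFTER_SHA/REGISTRY/REPO
# variables are assumptions for this sketch.
if git diff --name-only "${BEFORE_SHA}" "${AFTER_SHA}" | grep -qv '^cloudformation/'; then
  docker build -t "${REGISTRY}/${REPO}:${AFTER_SHA}" .
else
  echo "only template changes; skipping image build"
fi
```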

@emilymcafee (Contributor)

Per chat, this is not something that is likely to move in the next couple months.
