
Rails 5.1 Webpacker Cache Support #892

Closed
wants to merge 22 commits

Conversation

kpheasey

@kpheasey kpheasey commented Jun 14, 2019

Rails 5.1 includes Webpacker by default, but heroku-buildpack-ruby has only the most basic support for it: calling yarn install. Currently, builds take a long time and are very bulky. This PR aims to complete the webpacker integration by adding the caching and cleanup operations necessary for quick and trim builds.

Features

Defines a new Rails51 language pack that does the following:

  • Drastically speed up asset compilation by caching the following directories:
    • node_modules
    • ~/.yarn-cache
    • ~/.cache/yarn
    • tmp/cache/webpacker
    • public/packs
  • Reduce build size by removing the following non-essential directories from the final build:
    • node_modules
    • tmp/cache/webpacker
  • Add Rails51#load_asset_cache method that is called before assets:precompile is run
  • Add Rails51#store_asset_cache method that is called after assets:precompile is run
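The two hook methods above could be sketched roughly as follows. This is a hypothetical illustration only: the `ASSET_CACHE_PATHS` constant, the method bodies, and the cache layout are assumptions for the sketch, not the PR's actual implementation.

```ruby
require "fileutils"

# Illustrative list of directories to round-trip through the build cache.
ASSET_CACHE_PATHS = %w[
  node_modules
  tmp/cache/webpacker
  public/packs
].freeze

# Restore cached directories into the build before assets:precompile runs.
def load_asset_cache(cache_dir)
  ASSET_CACHE_PATHS.each do |path|
    cached = File.join(cache_dir, path)
    next unless Dir.exist?(cached)
    FileUtils.rm_rf(path)
    FileUtils.mkdir_p(File.dirname(path))
    FileUtils.cp_r(cached, path)
  end
end

# Save the directories back to the cache after assets:precompile finishes.
def store_asset_cache(cache_dir)
  ASSET_CACHE_PATHS.each do |path|
    next unless Dir.exist?(path)
    target = File.join(cache_dir, path)
    FileUtils.rm_rf(target)
    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.cp_r(path, target)
  end
end
```

The real buildpack would also need to handle cache invalidation (e.g. on Node or yarn version changes), which this sketch ignores.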

Notes

If you are using webpacker 4.1 or older, the slug size will keep increasing because the old packs are not removed. Install the clean-webpack-plugin to delete old packs.

With our Rails 5.2 application, we saw the following improvements:

  • Build time reduced from ~7 minutes to ~3 minutes
  • Build size reduced from ~150MB to ~100MB

UPDATE 2019-11-14

Refactored the way we define cache paths and included the updated yarn cache path.

UPDATE 2019-09-18

Added public/packs back. Included note on installing clean-webpack-plugin.

UPDATE 2019-07-25

Removed public/packs from the asset cache directories because there was no way to remove old assets. rake assets:clean only removes old assets from public/assets; there is an open webpacker issue requesting the same functionality, rails/webpacker#1410.

This change increased build time from ~3 minutes to ~5 minutes. It's still faster and smaller.

@kpheasey
Author

This would fix #654

@kpheasey
Author

@michaelherold This is ready for review again.

@michaelherold

It all makes sense to me, but I think it requires review from one of the owners of the repo. I was commenting merely because our app wouldn't receive the benefit of this change if we excluded 6.0.0.rc1. 😁

@kpheasey
Author

@schneems @hone, any chance of a review, or feedback on why this hasn't been merged? I see other work coming through, but no activity on this.

@kpheasey
Author

@schneems @hone A month later and still nothing. What's the point of making this open source if you won't take, or even look at, community contributions? If there is a problem or some reason the PR would never be merged, please let me know.

…emove old versions, container image keeps growing
@2468ben

2468ben commented Jul 25, 2019

@kpheasey I could also greatly use this; I've got pretty much the exact same problem and build times as you do.
Is there a way to publish this as a custom buildpack, while we're waiting for Heroku's peer review to eventually come around?

@kpheasey
Author

@2468ben

Is there a way to publish this as a custom buildpack, while we're waiting for Heroku's peer review to eventually come around?

You can use it straight from GitHub.

[Screenshot: setting the buildpack to the GitHub repository URL]

@samstickland

Would also love to see this PR merged, as we currently have deploy times of 5 to 10 minutes.

@schneems
Contributor

schneems commented Aug 2, 2019

Hello. I previously closed a similar PR due to my concerns here: #876 (comment). TLDR is that webpack does not ship with caching and is not generally cache aware (i.e. cache invalidation is not a "solved" problem in webpack). I do not want to bake in caching support for a framework that does not directly provide caching support.

In general, I would say that I'm willing to cache whatever heroku/nodejs is willing to cache, so node_modules seems fair game; I'm less familiar with the yarn cache. I would guess that caching tmp/cache/webpacker out of the box is a non-starter, based on what I listed as concerns in the issue comment.

have deploy times of 5 to 10 minutes.

In my experience, caching node modules would save about 60 seconds, but the other 4-9 minutes might be coming from webpack configuration. People who are reporting webpack build times of 30-45 minutes (rare, but it happens) are definitely in this camp.

Any thoughts here @jmorrell

@kpheasey
Author

kpheasey commented Aug 2, 2019

@schneems

You make a valid argument here and in the previous comments. That being said, build times are crippling workflows for anyone building a modern Rails application, and something needs to be done about it.

This PR simply utilizes existing caching features as much as possible in order to reduce build times and slug size. It's not meant to improve caching in Webpacker, Sprockets, or any other tool.

I do not want to bake in caching support for a framework that does not directly provide caching support.

I understand that this solution isn't perfect, but it does reduce build times and reduce slug size. It does not reinvent the wheel or try to solve caching problems for tools that don't provide caching. Yarn and Webpacker both have caching support in some way.

node modules seems fair game, I'm less familiar with the yarn cache

Webpacker uses yarn as the package manager and the benefits of doing that are lost without including the yarn cache. https://yarnpkg.com/lang/en/docs/cli/cache/

caching node modules would save about 60 seconds, but the other 4-9 minutes might be coming from webpack configuration

The node_modules cache saves the package download time; however, if the yarn cache isn't saved, yarn will re-download everything because nothing is linked.

The other time saved is during assets:precompile, because webpacker has a cache to build from. Without the webpacker cache, the precompile starts from scratch. This is equivalent to rails assets:clobber assets:precompile in sprockets, even if you made no changes to your assets!

@2468ben

2468ben commented Aug 2, 2019

I noticed that the nodejs buildpack respects my package.json's cacheDirectories list for which folders to cache; could the Ruby buildpack do the same, so we can choose our folders, instead of having no caching or a stalemate over which folders should be hard-coded?

@brandoncc

I found this PR from this blog post. I'd love to see this problem solved in the official buildpack!

@aaronchi

This is great but it should load based on the presence of webpacker instead of the Rails version.

@brandoncc

Agreed

@schneems
Contributor

could the Ruby buildpack do the same, so we can choose our folders

Ideally I want to get out of the node/webpack game. One way to do that would be to delegate responsibility to the node buildpack, but that's been "on the table" for the last few years and we've made no progress.

I think the best intermediate option is to get together with the node maintainer and try to reach some kind of parity, or alternatively figure out a way to let the node buildpack do more of the heavy lifting for people who want to use it. Right now, if you have the node buildpack and then the Ruby buildpack, we install your dependencies twice. The only way I could avoid that is by detecting it somehow and then telling the assets:precompile task to skip yarn:install. I'm not sure what other integration points we would have to coordinate.
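One way to wire up that "skip yarn:install" coordination could look like the following sketch. Everything here is an assumption for illustration: `NODE_BUILDPACK_RAN_YARN` is an invented flag, not a real Heroku interface, and this is not how the buildpacks actually communicate today.

```ruby
require "rake"
include Rake::DSL

# Hypothetical sketch: if the node buildpack already installed JS
# dependencies, turn webpacker's yarn:install into a no-op so yarn
# is not invoked a second time. NODE_BUILDPACK_RAN_YARN is an
# invented coordination flag, not a real environment variable.
def neutralize_yarn_install
  return unless ENV["NODE_BUILDPACK_RAN_YARN"]

  if Rake::Task.task_defined?("yarn:install")
    # Drop the original actions from the existing task.
    Rake::Task["yarn:install"].clear_actions
  else
    # Define an empty task so callers of yarn:install still succeed.
    namespace :yarn do
      task :install
    end
  end

  Rake::Task["yarn:install"].enhance do
    puts "yarn:install skipped: dependencies already installed by the node buildpack"
  end
end
```

A rake task's actions are additive, so `clear_actions` plus `enhance` is the idiomatic way to replace a task's behavior without redefining it.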

…es not remove old versions, container image keeps growing"

This reverts commit a9781db.
# Conflicts:
#	lib/language_pack.rb
@schneems
Contributor

The webpacker gem (>= 4.2.2) now cleans assets the same way that sprockets does. I believe that means there is nothing blocking this PR from being merged.

That's worth looking into more. When sprockets introduced the same feature it took a full Rails version release to get the bugs out. I'll have to dig in some more. Next on my plate is shipping CNB branch and some other housekeeping and performance fixes. Also observability work is queued up.

Ideally, I would still like to not have this problem at all, i.e. delegate to the Node buildpack instead of both of us reimplementing logic. We got a new Node maintainer who is also focused on shipping and refining CNB. We've gone down this path before but didn't make much headway. There are a number of edge cases to consider, such as how to stop webpacker from re-running its own yarn:install.

Short term

That's all long term. I think the best chance in getting this functionality short term might look like this:

task "load_asset_cache" do
  custom_load_from_cache_implemented_in_this_gem if ENV["HEROKU_BUILD_CACHE_LOCATION"]
end

# Run the cache load before, and the cache store after, the existing task
Rake::Task["assets:precompile"].enhance(["load_asset_cache"]) do
  custom_store_in_cache_implemented_in_this_gem if ENV["HEROKU_BUILD_CACHE_LOCATION"]
end

So anyone who wanted this perf bump could take two steps: Add a buildpack, add a gem. And that's it. I know it's not as nice as natively supported code in the Ruby buildpack, but at least it's something people could use today rather than being blocked by my official blessing and support.

Timeline

I can't forecast when I'll be able to prioritize the webpacker cache work. I do appreciate the effort that has gone in here, and also being kept updated with developments in webpacker. If you or someone else on this thread wants to move forward, my suggestion would be to use existing supported interfaces to hook into Heroku and get the functionality you want today.

@PChambino

I would like to share a short-term solution we have been using which doesn't require a fork of the buildpack or a wrapper gem.

We use these buildpacks to provide caching:
https://github.com/carwow/heroku-buildpack-cacheload
https://github.com/carwow/heroku-buildpack-cachesave

and this one to cleanup:
https://github.com/carwow/heroku-buildpack-cleanup

These are very simple buildpacks; we forked them and customised the cacheload/save ones to allow caching files outside the project directory (though that isn't really necessary).

If you add these buildpacks in this order:

https://github.com/carwow/heroku-buildpack-cacheload
nodejs
ruby
https://github.com/carwow/heroku-buildpack-cachesave
https://github.com/carwow/heroku-buildpack-cleanup

And add these files to your project:
.buildcache:

public/packs
tmp/cache/webpacker
~/.cache/yarn

.slugcleanup:

node_modules
tmp

That should provide you with webpacker cache support, and you can customise it for other things. We, for example, use Elm, which requires caching additional directories.

@bbugh

bbugh commented May 11, 2020

@PChambino that's great, thanks for sharing. I like the nice simple buildpacks.

One thing to note is that yarn defaults to always installing packages from the Internet, even if the package is available in the cache. With Rails' default yarn:install setup, caching yarn will have no effect and installs will take the same amount of time. For us, with the caching buildpack our yarn install was 90 seconds, but with --prefer-offline and caching it was 12 seconds.

To read from the cache first and then fall back to online, you have to use the --prefer-offline argument for yarn. You will have to override the yarn:install rake task to add this switch, unfortunately.

yarn install --prefer-offline ...

@0duaht

0duaht commented Jun 8, 2020

@PChambino thanks for sharing.

I see from your .slugcleanup file that you're removing the node_modules directory.

How do you handle binaries required at runtime, though, since the node_modules/.bin directory gets cleaned up? Heroku's .slugcleanup doesn't allow us to exclude folders.

@bbugh

bbugh commented Jun 8, 2020

Thanks again @PChambino for these build pack references, it's worked really well for us.

  1. Thanks to the buildpack cachesave/load, we are able to completely skip yarn installs when webpacker doesn't need to run again, by overriding the yarn:install task and checking whether the compiler is stale:
namespace :yarn do
  task :install do
    # don't run if webpacker doesn't need to run
    next unless Webpacker::Instance.new.compiler.stale?

    # These yarn settings come from the nodejs buildpack
    # https://github.com/heroku/heroku-buildpack-nodejs/blob/02af461e42c37e7d10120e94cb1f2fa8508cb2a5/lib/dependencies.sh
    yarn_flags = "--production=false --frozen-lockfile --ignore-engines --prefer-offline --cache-folder=~/.cache/yarn"

    puts "*** [yarn:install] Configuring yarn with all build dependencies"
    command = "yarn install #{yarn_flags}"
    puts "Running: #{command}"
    system command
  end
end
  2. We had issues cleaning up the tmp folder because puma (3.12.6) looks for tmp/pids. We're removing tmp/cache instead of tmp, which is where the big stuff is. Our .slugcleanup:
app/assets
app/javascript
node_modules
spec
tmp/cache

This has reduced our slug size by a couple hundred megabytes, even with the extra cache directory from buildpack-cachesave. Partial builds that don't need to build JS or install any new gems/packages are now super fast. A++++.

(By the way, Bootsnap loosely and unofficially recommends that you clean the bootsnap cache with some regularity.)

@PChambino

@0duaht We don't rely on any binaries at runtime, only during build, so that is not an issue for us. There are other cleanup buildpacks that can do more complex things, so you can look for one that works for you, or change that one. Or just keep the node_modules directory; it is mostly there for reducing slug size, and usually that isn't really required.

@0duaht

0duaht commented Jun 9, 2020

@PChambino thanks, yeah for now, we've had to keep the directory and just clear out other directories

linkesch added a commit to makerinc/heroku-buildpack-ruby that referenced this pull request Jul 18, 2020
@nicalpi

nicalpi commented Sep 11, 2020

@PChambino & @bbugh thank you so much for sharing these!

@schneems schneems closed this Nov 3, 2020
@schneems
Contributor

schneems commented Nov 3, 2020

I'm not going to merge this. I've suggested in a prior comment a way to expose the build cache if you wanted to implement something like this yourself.

I'm working towards a rewrite of the Ruby buildpack, and going forward I'm going to rely on the node buildpack to install and cache node and yarn dependencies. I'll also look to them to find out how generally safe caching webpack assets is.

@justin808

In terms of caching build-related products, specifically the public/webpack directory which contains the bundles, React on Rails Pro shipped with asset precompilation caching for this issue, so you can build your assets in CI and have Review Apps and Staging use that cache. It's easy to configure. We're using it on some super-high-load sites. Feel free to email me for more details.

FriedSock added a commit to DFE-Digital/early-careers-framework that referenced this pull request Oct 20, 2021
Parallelize Cypress
There is built-in parallel functionality in Cypress:
See https://github.com/cypress-io/github-action#parallel

But we can't use this without paying for a cypress.io account
(we originally tried it and blew through the free-usage limit in
one day). Instead, add a new command that randomly splits feature files
based on the number of nodes available. We get diminishing
returns the more nodes we add, so 6 seems like a good sweet spot.

Parallelize rspec

Adds a new command that randomly splits tests, similar to
Cypress.

Additional js improvements:
 - Improve yarn install command
    use flags based off the node heroku buildpack:
    heroku/heroku-buildpack-ruby#892 (comment)
    This seems to prevent the occasional times that yarn was
    just ignoring the cache and taking 5 minutes to install
    rather than 30s

- Remove existing caching steps
   These are already covered by the action we are using

- Remove yarn installation step
   Yarn is preinstalled

- Remove Misc step definition meta tests
  This is a test for a single step that isn't used in any test and takes ages to run
@schneems
Contributor

schneems commented Nov 3, 2022

I was thinking this was mostly about build performance and speed, but I'm now wondering how webpack expects to be used regarding multiple asset versions over time. Sprockets will retain the last 3 asset versions; Heroku supports this via the cache, but we do not cache public/packs. Does webpacker also support a similar "last N versions" scheme that just doesn't work on Heroku out of the box right now?

I'm surprised if that's the case, that this hasn't come up before.

@granth

granth commented Nov 4, 2022

@schneems Yes, the webpacker:clean task removes older assets based on count or age. It hooks itself into assets:clean.

https://github.com/rails/webpacker/blob/4c5d13a2aedd3e63e1cdadf46575bc832a126235/lib/tasks/webpacker/clean.rake#L5-L12

I worked around the missing build cache by using the cacheload and cachesave buildpacks mentioned above, and caching public/packs and tmp/cache/webpacker.
