Reintroduce gzip file generation #193

Merged
schneems merged 1 commit into 3.x from schneems/gzip on Dec 3, 2015

Conversation

@schneems
Member

schneems commented Dec 3, 2015

This change re-introduces compressed file generation and parallel file writing.

Gzip file generation was taken out in this PR: sstephenson#589

This was then discussed in this issue: #26

It was decided that compressing assets to maximize compression ratio at compile time was a valuable feature and within the scope of sprockets. This is one possible implementation.

Assets are written to disk in parallel using `Concurrent::Future`. Since the gzip file cannot be generated until the original file is written to disk, each future writes the original file first and then compresses it. The speed impact of writing files in parallel varies based on the number of assets being written to disk, disk speed, and IO contention.

Gzipping can be turned off at the environment level.
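Roughly, the write-then-compress pattern described above looks like the sketch below. This is not the actual diff: the `assets` collection and its `path`/`source` readers are placeholders, and the real code lives in the manifest and gzip utility.

require 'concurrent'
require 'zlib'

# One future per asset: write the original file first, then gzip it,
# because the compressed copy depends on the uncompressed file existing.
futures = assets.map do |asset|
  Concurrent::Future.execute do
    File.binwrite(asset.path, asset.source)
    Zlib::GzipWriter.open("#{asset.path}.gz", Zlib::BEST_COMPRESSION) do |gz|
      gz.write(asset.source)
    end
  end
end

futures.each(&:wait) # block until every write/compress pair has finished

The environment-level switch mentioned above would simply skip the GzipWriter step (something like env.gzip = false).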

cc @fxn

@rails-bot

rails-bot commented Dec 3, 2015

⚠️ Warning ⚠️

  • Pull requests are usually filed against the master branch for this repo, but this one is against 3.x. Please double check that you specified the right target!
@@ -0,0 +1,64 @@
require 'concurrent'

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

Can we require this in the file that is using it? Also, we could require only the future file, since that is all we need.

@schneems

schneems Dec 3, 2015

Member

Good call, I moved it when I did some refactoring. I'll see about only requiring future.
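For reference, a minimal sketch of the narrower require being discussed, assuming concurrent-ruby exposes the Future class as its own file:

# Pull in only Concurrent::Future instead of all of concurrent-ruby:
require 'concurrent/future'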

@fxn

Member

fxn commented Dec 3, 2015

❤️ ❤️ ❤️ ❤️ ❤️ ❤️

@@ -158,3 +158,4 @@ module Sprockets
end
require 'sprockets/legacy'
require 'sprockets/utils/gzip'

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

We can require this in the manifest file; that is the only place that needs it.

#
# Since it deals heavily with disk IO, it include logic
# to compress files in the background. To ensure the
# compression is completed before the process exits

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

That is not true. This logic is in the manifest class.

@charset = asset.charset
end
# Private: Returns whether or not an asset can be compressed

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

Missing .

# files as they may already be compressed and running them
# through a compression algorithm would make them larger.
#
# Return boolean

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

missing .

false
end
# Private: Opposite of `can_compress?`

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

missing .

# Private: Opposite of `can_compress?`
#
# Returns Boolean

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

Missing .

!can_compress?(mime_types)
end
# Private: Generates a gzipped file based off of reference asset

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

missing .

#
# Compresses the target asset's contents and puts it into a file with
# the same name plus a `.gz` extension in the same folder as the original.
# Does not modify the target asset

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

Missing .

gz.write(@source)
gz.close
end
return nil

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

✂️ return.

test "do not compress binary assets" do
manifest = Sprockets::Manifest.new(@env, @dir)
%W{ blank.gif }.each do |file_name|

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

extra whitespace before the }

@schneems schneems force-pushed the schneems/gzip branch from d69518e to 12903b3 Dec 3, 2015

@schneems

Member

schneems commented Dec 3, 2015

Updated

@rafaelfranca

Member

rafaelfranca commented Dec 3, 2015

:shipit:

@rafaelfranca

Member

rafaelfranca commented Dec 3, 2015

❤️ 💚 💙 💛 💜

@@ -2,6 +2,29 @@
This guide is for using a Sprockets::Environment to process assets. You would use this class directly if you were building a feature similar to Rail's asset pipeline. If you aren't building an asset processing frameworks, you will want to refer to the [End User Asset Generation](end_user_asset_generation.md) guide instead. For a reference use of `Sprockets::Environemnt` see [sprockets-rails](github.com/rails/sprockets-rails).
## Gzip
By default when sprockets generates a compiled asset file it will also produce a gzipped copy of that file. Sprockets only gzips non-binary files such as CSS, javascript, and SVG files.

@fxn

fxn Dec 3, 2015

Member

Here (and later), "javascript" would be probably "JavaScript".

@fxn

fxn Dec 3, 2015

Member

There are some instances of "sprockets" with lowercase "s", does that refer to the executable bin/sprockets?

# can be compressed.
return true if @charset
# Svg images are text but do not have

@fxn

fxn Dec 3, 2015

Member

"SVG" here.

# Svg images are text but do not have
# a charset defined, this is special cased.
return true if @content_type == "image/svg+xml".freeze

@fxn

fxn Dec 3, 2015

Member

This method could be written as `@charset || @content_type == "image/svg+xml".freeze` plus the comments... only feedback, if you prefer this implementation that's fine.
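A sketch of the one-liner being suggested, with the comments folded in; the method name, parameter, and instance variables are taken from the surrounding diff, and the exact context in the gzip utility is assumed:

def can_compress?(mime_types)
  # Text assets report a charset; SVG is text but carries no charset,
  # so it is special-cased. Everything else is treated as binary.
  @charset || @content_type == "image/svg+xml".freeze
end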

%W{ gallery.css application.js logo.svg }.each do |file_name|
original_path = @env[file_name].digest_path
manifest.compile(file_name)
refute File.exist?("#{@dir}/#{original_path}.gz"), "Expecting '#{original_path}' to generate gzipped file: '#{original_path}.gz' but it did not"

@fxn

fxn Dec 3, 2015

Member

The assertion message here should be negated maybe?

@schneems

schneems Dec 3, 2015

Member

You mean "assert !"? Instead of "refute"?

@rafaelfranca

rafaelfranca Dec 3, 2015

Member

I think he means the message "Expecting ... to not generate gzipped file ..."

@schneems schneems force-pushed the schneems/gzip branch 2 times, most recently from 1874504 to f918345 Dec 3, 2015

@schneems

Member

schneems commented Dec 3, 2015

All concerns addressed. I'll merge when green. Going to cut a release, 3.5.0. I'll port this to master.

@elia

Contributor

elia commented Dec 3, 2015

❤️

@schneems schneems force-pushed the schneems/gzip branch from f918345 to 7faa6ed Dec 3, 2015

schneems added a commit that referenced this pull request Dec 3, 2015

Merge pull request #193 from rails/schneems/gzip
Reintroduce gzip file generation

@schneems schneems merged commit d074aea into 3.x Dec 3, 2015

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed

@schneems schneems referenced this pull request Dec 3, 2015

Closed

Bug in dataflow? #478

@mattbrictson

Contributor

mattbrictson commented Dec 3, 2015

I am new to concurrent-ruby, so perhaps I am missing something here, but it looks like the introduction of Futures means that any exceptions raised by write_to or compress are being captured but not re-raised.

An example, to illustrate:

future = Concurrent::Future.execute { raise StandardError }
future.wait      # => returns cleanly without raising
future.rejected? # => true
future.reason    # => #<StandardError: StandardError>

In other words, this code:

concurrent_writers.each(&:wait)

is ignoring that one or more of the writers could have failed.

This seems like a regression, in that exceptions that previously were raised are now being swallowed?

@mattbrictson

Contributor

mattbrictson commented Dec 3, 2015

I think an easy fix is to use Future#wait! (bang version) instead, which does raise.

future = Concurrent::Future.execute { raise StandardError }
future.wait! # => StandardError: StandardError

@schneems

Member

schneems commented Dec 4, 2015

Yes, that's the safe thing to do. Good catch. Can you give me a PR to the 3.x branch?

@mattbrictson

Contributor

mattbrictson commented Dec 4, 2015

@schneems Will do.

@nateberkopec

Contributor

nateberkopec commented Dec 4, 2015

I love that nearly half of this PR is documentation. 👏

@eliotsykes

eliotsykes commented Dec 4, 2015

Thx @schneems - I'm intending to work on the zopfli vs gzip benchmark script this weekend.

Anyone know any other gzip-compatible compression algos that would be worth including in the benchmark?

@eliotsykes

eliotsykes commented Dec 4, 2015

Adding this note for completeness. Zopfli (gzip-compatible) + Brotli (not gzip-compatible) were suggested by @igrigorik in a tweet. Brotli is clearly out of scope, but it's interesting to note that multiple compressed output formats might be optimal (one day).

@igrigorik

igrigorik commented Dec 4, 2015

Why is Brotli out of scope? It's available in FF44 and it's coming to Chrome as well.

@schneems

Member

schneems commented Dec 4, 2015

How does it work now? You serve a gzipped file generated via Brotli and FF44 just does the right thing (TM)? What happens to other browsers that can't decompress it, do they just fail to load?

@igrigorik

igrigorik commented Dec 4, 2015

@schneems that's the magic of content negotiation at work -- see my FF link above, they explain it well. In short, it's the same mechanism that determines whether the server can serve gzip: if the client advertises support, the server picks the gzip version and provides the correct Content-Encoding header.
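For illustration only, a toy Rack-style sketch of that negotiation; none of this is sprockets, Rails, or nginx code, and the class name and header handling are simplified (a real implementation would also set Content-Type, Vary: Accept-Encoding, and honor q-values):

class PrecompressedStatic
  def initialize(app, root:)
    @app  = app
    @root = root
  end

  def call(env)
    accept = env["HTTP_ACCEPT_ENCODING"].to_s
    path   = File.join(@root, env["PATH_INFO"])

    if accept.include?("br") && File.exist?("#{path}.br")
      serve("#{path}.br", "br")      # client advertised Brotli support
    elsif accept.include?("gzip") && File.exist?("#{path}.gz")
      serve("#{path}.gz", "gzip")    # fall back to the gzip variant
    else
      @app.call(env)                 # no precompressed file or no support
    end
  end

  private

  def serve(file, encoding)
    body = File.binread(file)
    [200, { "Content-Encoding" => encoding, "Content-Length" => body.bytesize.to_s }, [body]]
  end
end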

@eliotsykes

eliotsykes commented Dec 6, 2015

First attempt at comparing file sizes for gzip with/without zopfli:

$ bundle exec ruby compare.rb
----------------------------------------------------------------------
                    File: codetriage.js
                 Gzipped: 43752 bytes
Zopflied (1 iterations): 42257 bytes (96.58% of codetriage.js.gzip.gz)
Zopflied (15 iterations): 42199 bytes (96.45% of codetriage.js.gzip.gz)
Zopflied (225 iterations): 42185 bytes (96.42% of codetriage.js.gzip.gz)
----------------------------------------------------------------------
                    File: codetriage.css
                 Gzipped: 5018 bytes
Zopflied (1 iterations): 4848 bytes (96.61% of codetriage.css.gzip.gz)
Zopflied (15 iterations): 4828 bytes (96.21% of codetriage.css.gzip.gz)
Zopflied (225 iterations): 4824 bytes (96.13% of codetriage.css.gzip.gz)
----------------------------------------------------------------------
                    File: codetriage.svg
                 Gzipped: 1466 bytes
Zopflied (1 iterations): 1403 bytes (95.7% of codetriage.svg.gzip.gz)
Zopflied (15 iterations): 1399 bytes (95.43% of codetriage.svg.gzip.gz)
Zopflied (225 iterations): 1398 bytes (95.36% of codetriage.svg.gzip.gz)
----------------------------------------------------------------------

Source: https://github.com/eliotsykes/gz-compare#results

If there's anything that'd be helpful to add/change, please ask or submit a PR.

@schneems

Member

schneems commented Dec 6, 2015

Awesome, thanks! I'll play around with those scripts a little when I get a chance!

@schneems

Member

schneems commented Dec 7, 2015

Using those same methods, I compared the speed:

SOURCE =  File.read("codetriage.js")

require 'benchmark/ips'
require 'tmpdir'

Benchmark.ips do |x|
  x.report("zlib") {
    Dir.mktmpdir do |dir|
      gzip_compress(SOURCE, File.join(dir, "codetriage.js.gz".freeze))
    end
  }
  x.report("zopfli") {
    Dir.mktmpdir do |dir|
      zopfli_compress(SOURCE, File.join(dir, "codetriage.js.gz".freeze))
    end
  }
end

# Calculating -------------------------------------
#                 zlib    10.000  i/100ms
#               zopfli     1.000  i/100ms
# -------------------------------------------------
#                 zlib    106.450  (± 6.6%) i/s -    530.000
#               zopfli      1.804  (± 0.0%) i/s -     10.000  in   5.550239s

It looks like the standard library zlib compression that ships with Ruby is roughly 60 times faster than this implementation of zopfli (about 106 i/s versus 1.8 i/s).
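For reference, a minimal stand-in for the gzip_compress helper used in these numbers; the real helpers live in the gz-compare scripts linked above, and the zopfli counterpart is assumed to wrap the zopfli gem in the same way:

require 'zlib'

# Hypothetical equivalent of the benchmark helper: gzip `source` into
# `target` at maximum zlib compression, mirroring what sprockets does.
def gzip_compress(source, target)
  Zlib::GzipWriter.open(target, Zlib::BEST_COMPRESSION) do |gz|
    gz.write(source)
  end
end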

@eliotsykes

eliotsykes commented Dec 8, 2015

The above benchmark is with Zopfli using its default 15 iterations. Results with a single Zopfli iteration bring the difference down to zlib being ~25 times faster than Zopfli. A single Zopfli iteration is almost as good as 15 Zopfli iterations in terms of the sample file size reductions.

Calculating -------------------------------------
                zlib     9.000  i/100ms
zopfli (1 iteration)     1.000  i/100ms
-------------------------------------------------
                zlib     96.592  (± 6.2%) i/s -    486.000 
zopfli (1 iteration)      4.195  (± 0.0%) i/s -     21.000

SOURCE = File.read("codetriage.js")
ZOPFLI_ITERATIONS = 1

require 'benchmark/ips'
require 'tmpdir'

Benchmark.ips do |x|
  x.report("zlib") {
    Dir.mktmpdir do |dir|
      gzip_compress(SOURCE, File.join(dir, "codetriage.js.gz".freeze))
    end
  }
  x.report("zopfli (1 iteration)") {
    Dir.mktmpdir do |dir|
      zopfli_compress(SOURCE, File.join(dir, "codetriage.js.gz".freeze), ZOPFLI_ITERATIONS)
    end
  }
end

@schneems

Member

schneems commented Dec 8, 2015

That's better, but still a bit too slow to make the default, I think. Maybe we can add it in and make it configurable.

It looks like very little of the code is actually written in C. We could probably get a larger speedup by rewriting more of it in C and doing some benchmarking.

I have another concern with adding this in. I know libraries like Rails ship with gems with C extensions (nokogiri) and somehow they manage to play nice with other Rubies like JRuby, but I'm not sure how exactly. We need to make sure we don't break JRuby compatibility. There's no way to conditionally add something to a gemspec based on Ruby implementation (that I'm aware of). I also want to be cautious about adding C extensions to dependencies. They take much longer to install, and by declaring it in the gemspec it would be installed even if someone was not using it. Deploy timeouts from too many C extensions are a thing.

So what would a configurable solution look like? An easy to use, but hard to implement solution would look like:

environment.compression = :zopfli

This is hard, because it would require that we put zopfli in as a dependency and maintain all the zopfli compression and configuration logic. It adds a C extension to the dependencies, and may make JRuby installs harder. I also guarantee other people will want other compression algorithms supported, due to reasons.

The other option is to accept a block as a configuration option:

require 'zopfli'
environment.compression = Proc.new do |options|
  zopfli_compress(options[:source], options[:target], 1)
end

This would be easier to implement, because it puts the ownership of declaring the dependency and maintaining the compression code outside of sprockets. However, it might be harder to support going forward. For example, maybe we figure out that we are memory constrained and that instead of keeping the source in memory it makes sense to read from disk; it would then be hard to deprecate or modify that interface, and it adds much more surface area to the sprockets API. Maybe we could pass in a value object and have values accessed by methods; that would be easier to deprecate, though it isn't the fastest implementation. By using a proc instead of supporting two built-in strategies, we support custom code execution and an unlimited number of strategies. It would also allow the creation of a gem like sprockets-rails-zopfli that declares the dependency and sets the compression for you.

I'm mostly thinking aloud right now.

I also looked into Brotli. That link on FF44 is really helpful. It sounds like we would generate file-<sha>.js.br, and then we would have to add extra logic to any servers to make use of that file. It wouldn't be that hard in ActionDispatch::Static. The downside of adding it to the current implementation is that it would require yet another disk access, which would add an unnecessary slowdown for anyone serving files who knows for sure they don't have a Brotli-generated file on disk. I'm thinking more and more that we need to write a replacement for ActionDispatch::Static that is optimized around a happy-path workflow where files don't change on disk and configuration can be toggled at a lower level (such as disabling gz or br file detection).

I'm not sure what mechanisms exist in nginx/apache to tell them to serve that file when an Accept-Encoding: gzip, br header comes across the line. If we make compression configurable, it would be pretty easy to add something to your app if you know you'll use it:

require 'zopfli'
environment.compression = Proc.new do |options|
  zopfli_compress(options[:source], options[:target], 1)
  brotli_compress(options[:source], options[:target], 1)
end

You would want both files, since not all browsers support Brotli. The brotli gem has even less traffic than the zopfli gem: https://rubygems.org/gems/brotli.

It seems like the most reasonable way forward is to allow a proc object to be used, and to pass in a custom value object so we can deprecate where we need to. When not set, we can default to the zlib implementation that ships with sprockets. This maximizes extensibility, has a reasonable maintenance/deprecation plan going forward, and covers all the use cases.
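To make the proc-plus-value-object idea concrete, a hedged sketch of what such an interface might look like; none of these names (CompressionRequest, environment.compression=) exist in sprockets, they are purely illustrative:

require 'zlib'

# Hypothetical value object handed to a user-supplied compression proc.
# Reader methods (rather than a bare options hash) give sprockets room
# to deprecate or rename individual fields later.
CompressionRequest = Struct.new(:source, :target, :content_type)

# Hypothetical default: the zlib implementation sprockets already ships.
DEFAULT_COMPRESSOR = Proc.new do |request|
  Zlib::GzipWriter.open(request.target, Zlib::BEST_COMPRESSION) do |gz|
    gz.write(request.source)
  end
end

# A gem like the hypothetical sprockets-rails-zopfli could then do:
#
#   require 'zopfli'
#   environment.compression = Proc.new do |request|
#     zopfli_compress(request.source, request.target, 1)
#   end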

@nateberkopec

Contributor

nateberkopec commented Dec 8, 2015

I also guarantee other people will want other compression algorithms supported due to reasons.

There have clearly been a lot of changes in compression algorithms in the last year or so in web-land. Whatever solution sprockets goes with, I think it needs to be compatible with a future where compression algorithms are changing rapidly. Brotli is a great example: it performs great and doesn't have much adoption yet, but it will soon. The proposed Proc interface does a good job of making new compression algorithms easy enough to support.

@eliotsykes

eliotsykes commented Dec 8, 2015

A proc sounds ideal.

What do you think would be an acceptable way to nudge developers towards exploring this Proc and to raise awareness of Zopfli & Brotli within the community? Maybe something in the Rails guide? Maybe something like how the default Gemfile mentions Unicorn & Capistrano but has them commented out?

@schneems

Member

schneems commented Dec 8, 2015

Unsure. We should ship an interface first and figure out how to promote it later.

@sfcgeorge

sfcgeorge commented Dec 9, 2015

Nokogiri ships both C and Java extensions—that's how it works on both Ruby implementations. So any extension has to be written twice.

@rafaelfranca

Member

rafaelfranca commented Dec 9, 2015

@schneems about the API: why not just reintroduce the assets encoding API? That way people would be able to write their own encodings.

@rafaelfranca rafaelfranca deleted the schneems/gzip branch Dec 9, 2015
