Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

static file serving fails to set etag based on file contents #626

Closed
awwx opened this Issue Jan 19, 2013 · 9 comments

Comments

Projects
None yet
4 participants
Contributor

awwx commented Jan 19, 2013

I was watching assets being loaded from the Meteor server and was surprised to see "200 OK" responses instead of "304 Not Modified" for files that hadn't changed. Turns out gzippo isn't setting the etag header based on the file contents, but instead is using the file modification time (!)

https://github.com/tomgco/gzippo/blob/master/lib/staticGzip.js#L120

The result is after a redeployment (when file modification times change even if the content hasn't), the browser ends up downloading everything again -- increasing load on the server and slowing things down for the user.

connect.static doesn't set the etag header correctly either, sigh.

Just as a sidenote: That shouldn't be an issue in production mode, right? Just to make sure you are aware of it.

Oops, sorry, this is about assets so you mean the public folder? In that case production mode probably won't change anything.

Contributor

awwx commented Jan 22, 2013

Right, it will affect images and other assets in public. (I suppose it might also affect bundled code in production if the user manually refreshes the page and if the browser bypasses the cache in this case, but not much of a concern even if true).

One trick would be to have the bundler preserve the file timestamps when it created the bundle -- this would also be a solution and a simpler implementation than having gzippo compute a file hash for the etag, and cache the hash in memory for efficiency, check for file modification to recalculate the hash, etc.

Owner

n1mmy commented Jan 23, 2013

Good catch, @awwx!

I'm not sure exactly how to solve this.

Having the bundler preserve the timestamps could work, but is a little tricky. For example, pre-processed files can have more than one input file (eg .less file with an @import statement).

Using the hash of the file as the etag instead of the timestamp would also avoid the problem, but is complicated as you suggest. For meteor's purposes, it is probably OK to skip the file modification check and cache, since re-bundling the meteor app means the server will be restarted. But I don't think gzippo would take a patch that didn't support changing files out from under a running server like their current code.

Contributor

awwx commented Jan 23, 2013

I have an idea.

The bundler already calculates file hashes for the "cache busting" URLs. The bundler is a good place to calculate these hashes because it knows which files it's putting into the bundle.

In turns out that an app cache manifest has the same issue as etag's do. An appcache smart package generating a manifest needs the file hashes in order to update the manifest when file contents change. But it's unnecessary for appcache to hash the files at runtime when the bundler is already calculating file hashes. So it would be better for the appcache package to get the hashes from the bundler.

For gzippo, there's tomgco/gzippo#46, but that would be inefficient for us because it calculates the file hash at runtime. Someone else might want an etag implementation that does the file-modification-check-and-cache, but we don't need that because we know the file contents ahead of time. And yet someone else might prefer the current gzippo behavior of an etag based merely on the file size and modification time. So for gzippo I think it might be better to have an option to pass in a custom etag generator function, and different people could then use their own preferred etag implementation strategy.

Then for Meteor we could use an etag generator which returned the hash calculated ahead of time by the bundler.

Owner

n1mmy commented Jan 24, 2013

This is an interesting idea! The bundler does seem a natural place to put hash computation. How do you envision the bundler communicating this information to the app?

I don't really worry so much about the overhead of computing the hashes at runtime. gzippo caches the results for a day (or a configurable amount). As long as we're not recomputing the hash for a file every time we serve it, the savings of precomputing is relatively small. This is not to say precomputing isn't an easier or better solution, just saying that the performance wins alone don't seem worth a lot of added complexity or a bigger API surface area.

Contributor

awwx commented Jan 24, 2013

I agree it's not a performance problem to calculate and cache the hash at runtime. I think though that it's going to end up being conceptually simpler to have the bundler publish the hashes. We're calculating hashes for the same file in three different places now:

  1. in the bundler, to create the "cache busting" URL.
  2. in the file server, to generate a better etag.
  3. in the app cache, to ensure that file changes get to the browser.

So we've duplicating of the same logic "hash the file, and cache the hash for efficiency, but we don't need to invalidate the cache for file changes because the file is static" in multiple places.

The bundler is creating a static distribution, so it's reasonable for the bundle to include a manifest of the files to be published and their hash.

And for the app cache we already have to be able get the "cache busting" URLs the bundler is creating, because the manifest file has to contain the exact same URLs. So for the app cache to work the bundler has to be recording the URLs anyway. (In my experimental app cache branch https://github.com/awwx/meteor/tree/bundler-record-urls I add a "urls" array to app.json). Since we have to have the URLs, it's a small step to also include the hash for those URLs. For example, we could have a "publish" array in app.json:

{
  "publish": [
    {
      "type": "js",
      "url": "/packages/underscore/underscore.js?47479149fe12fc56685a9de90c5a9903390cb451",
      "hash": "47479149fe12fc56685a9de90c5a9903390cb451"
    },
    {
      "type": "static",
      "url": "/cat.jpg",
      "hash': "08a501a50f0b2ebf1d24e2b7a7f8232b48af9057"
    }
  ]
  ...
}

The bundler is saying "these are the URLs that need to be published and here are their hashes". Which is a good job for the bundler to do because it is figuring out which files need to be published and is already calculating the hash for some of them.

Now the etag generator is extremely simple. It doesn't need to calculate hashes or cache them, it simply uses the hash provided by the bundler.

Also consider that this is the right direction to be going in. Today the bundler creates app.html which is rather ad-hoc. (Why should the bundler have to know about HTML?? Why do we have to run templates at bundle time??) If the bundler publishes its knowledge in json, a smart package can be the one that creates app_html instead, which is a cleaner approach.

@awwx awwx added a commit to awwx/meteor that referenced this issue Feb 6, 2013

@awwx awwx Generate a bundler manifest.
Creates a manifest of the static files delivered to the client, for use
by the app cache.  The manifest is also designed to be usable to generate
etag's for issue #626.

In the original bundler code `self.css` and `self.js.client` starts
out as an array of os-specific file paths and later becomes an array
of URLs (including query parameters).  While I tried to minimize code
changes to avoid creating extra work for the engine project, this
turned out to be too crazy to deal with.  In this version `self.css`
and `self.js.client` stay as file paths, and _generate_app_html now
pulls the client URLs from the new manifest.

This PR is thus proposing a design where the bundler manifest becomes
the source of knowledge about client static resources included in the
bundle, and is then used to generate the app html, the app cache, and
perhaps etag's in the future.  (If it made sense then the `load` list
of server Javascript files could also be folded into the manifest,
making the manifest the source of knowledge about all static
resources... but the code in this PR don't include any steps in that
direction).
18f9661

@n1mmy n1mmy added a commit that referenced this issue Feb 12, 2013

@awwx @n1mmy awwx + n1mmy Generate a bundler manifest.
Creates a manifest of the static files delivered to the client, for use
by the app cache.  The manifest is also designed to be usable to generate
etag's for issue #626.

In the original bundler code `self.css` and `self.js.client` starts
out as an array of os-specific file paths and later becomes an array
of URLs (including query parameters).  While I tried to minimize code
changes to avoid creating extra work for the engine project, this
turned out to be too crazy to deal with.  In this version `self.css`
and `self.js.client` stay as file paths, and _generate_app_html now
pulls the client URLs from the new manifest.

This PR is thus proposing a design where the bundler manifest becomes
the source of knowledge about client static resources included in the
bundle, and is then used to generate the app html, the app cache, and
perhaps etag's in the future.  (If it made sense then the `load` list
of server Javascript files could also be folded into the manifest,
making the manifest the source of knowledge about all static
resources... but the code in this PR don't include any steps in that
direction).
99ebc78
Contributor

awwx commented Feb 28, 2013

Ooof, it turns out that gzippo delegates to connect.static.send for files that it doesn't gzip (it only gzips text, javascript, and json content types by default), so generating an etag from the hash means either also patching connect.static.send, gzipping all files, or rolling our own static file sender.

@glasser glasser changed the title from gzippo fails to set etag based on file contents to static file serving fails to set etag based on file contents Mar 28, 2015

Owner

glasser commented Mar 28, 2015

Note that we no longer use gzippo; we call send directly. And send does this:


SendStream.prototype.setHeader = function(stat){
  var res = this.res;
  if (!res.getHeader('Accept-Ranges')) res.setHeader('Accept-Ranges', 'bytes');
  if (!res.getHeader('ETag')) res.setHeader('ETag', utils.etag(stat));
  if (!res.getHeader('Date')) res.setHeader('Date', new Date().toUTCString());
  if (!res.getHeader('Cache-Control')) res.setHeader('Cache-Control', 'public, max-age=' + (this._maxage / 1000));
  if (!res.getHeader('Last-Modified')) res.setHeader('Last-Modified', stat.mtime.toUTCString());
};

So on the one hand, we can just call res.setHeader('ETag', hash) before calling send. On the other hand, it still ALSO sets Last-Modified by default, and so just because the etag hasn't changed doesn't mean the last-modified hasn't, so the issue isn't actually fixed.

@benjamn benjamn referenced this issue Dec 23, 2015

Closed

Cordova improvements in Meteor 1.3 #5786

26 of 32 tasks complete

@martijnwalraven martijnwalraven added a commit that referenced this issue Jan 7, 2016

@martijnwalraven martijnwalraven Set ETag header to asset hash if available
Previously, the ETag header was set (by `send`) to a default value based on the
inode of the file. Using the asset hash instead allows for proper conditional
requests even after redeployments.

To take advantage of content-based caching, we also have to disable the
Last-Modified header because having this set to the file date would still make
requests conditional on the most recent deployment. This requires updating the
send dependency and is done in a separate commit.

Fixes #626.
68ae8b4

@lassombra lassombra added a commit to lassombra/meteor that referenced this issue Feb 13, 2016

@martijnwalraven @lassombra martijnwalraven + lassombra Set ETag header to asset hash if available
Previously, the ETag header was set (by `send`) to a default value based on the
inode of the file. Using the asset hash instead allows for proper conditional
requests even after redeployments.

To take advantage of content-based caching, we also have to disable the
Last-Modified header because having this set to the file date would still make
requests conditional on the most recent deployment. This requires updating the
send dependency and is done in a separate commit.

Fixes #626.
1c871a5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment