Revised fileserver Accept-Encoding and ETag #1432


@rickb777
Contributor

The parsing of Accept-Encoding was inadequate, not recognising word boundaries. The returned ETag was not using strong and weak variants correctly.

This PR addresses both issues and improves the related tests.
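For illustration, the word-boundary problem looks like this: a plain substring test such as `strings.Contains(header, "gzip")` wrongly matches values like `x-gzip`, whereas splitting the header into tokens does not. A minimal sketch (hypothetical helper, not the PR's actual code; a full parser would also honour quality values such as `gzip;q=0`):

```go
package main

import (
	"fmt"
	"strings"
)

// acceptsEncoding reports whether the Accept-Encoding header value lists
// the given encoding as a complete token, rather than as a substring.
func acceptsEncoding(header, encoding string) bool {
	for _, part := range strings.Split(header, ",") {
		// Drop any quality value such as ";q=0.8" before comparing.
		token := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		if strings.EqualFold(token, encoding) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(acceptsEncoding("gzip, deflate", "gzip")) // true
	fmt.Println(acceptsEncoding("x-gzip", "gzip"))        // false: a token, not a substring, must match
	fmt.Println(strings.Contains("x-gzip", "gzip"))       // true: the naive check that misfires
}
```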

Rick: Revised fileserver to handle parsing of Accept-Encoding better and to switch between strong and weak etags.
b5a92d6
@CLAassistant
CLAassistant commented Feb 13, 2017

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

rickb777
Rick


Rick seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@rickb777
Contributor

I signed the CLA. It claims I didn't. Fooey.

@tpng
Collaborator
tpng commented Feb 14, 2017

It seems you need to add the email address used in the commit to your GitHub account.
rickb777@github.com
https://github.com/settings/emails

@rickb777 rickb777 Merge branch 'master' into master
f9a76a7
@mholt
Owner
mholt commented Feb 14, 2017

Hm, yeah, the CLA tool doesn't work if your email isn't on your GitHub profile. It's fine though, I can override that check.

@mholt

Thanks for the PR! I just have a question that I inlined.

caddyhttp/staticfiles/fileserver.go
+// calculateEtag produces a strong etag by default. Prefix the result with "W/" to convert this into a weak one.
+// see https://tools.ietf.org/html/rfc7232#section-2.3
+func calculateEtag(d os.FileInfo) string {
+ return fmt.Sprintf(`"%x-%x"`, d.ModTime().Unix(), d.Size())
@mholt
mholt Feb 14, 2017 Owner

This is a strong ETag, but we don't use the actual bytes of the file to produce it. Will that be a problem? (FWIW, I like not reading over the whole file for efficiency purposes.)

@rickb777
rickb777 Feb 14, 2017 Contributor

This is a good question. Generally, I think you're right to be more concerned about the inefficiency of scanning the whole file to compute a hash. And the use of timestamp+length is not a complete guarantee that the file contents haven't changed.

But the bullish high-performance argument is that timestamp+length is good enough in all normal use-cases. This is what most people will be happy with, I think, and is what Nginx does. After all, strong etags lead to more aggressive caching, boosting perceived performance significantly.

Thinking about this a little more, the same argument suggests that the compressed artefacts should also have strong etags (based on timestamp+length).

If you think Caddy should be more accommodating, perhaps we can provide an option to force the etags to be weak, or to disable them completely.

And there's more work to be done on 304s.

So I am now thinking that maybe this needs a little more work to

a) use strong etags for all served resources by default
b) provide an option to use weak etags instead when required.
c) implement 304-not-modified checks on If-None-Match conditional requests.

I'll take another look tonight.

[also of interest: PR191 https://github.com/mholt/caddy/issues/191]

@rickb777
rickb777 Feb 14, 2017 Contributor

I just checked Nginx 1.10.1:

  • when Accept-Encoding:gzip is sent and gzip_static is on, the precompressed content is served if available.
  • when Accept-Encoding:gzip is missing or gzip_static is off or the precompressed content is not available, the uncompressed content is served.
  • the ETag is computed from the timestamp+length algorithm in both cases above.
  • strong ETags are always used.
@mholt
mholt Feb 16, 2017 Owner

> But the bullish high-performance argument is that timestamp+length is good enough in all normal use-cases.

Because if the contents of the file change, it is highly likely that the timestamp or length will change. I agree that's reasonable in pretty much every case I can think of.

> Thinking about this a little more, the same argument suggests that the compressed artefacts should also have strong etags (based on timestamp+length).

Sure, I don't see why the caching on those needs to be treated any differently? It's still a static file.

Let's try strong ETags then.

rickb777 and others added some commits Feb 14, 2017
@rickb777 rickb777 Merge branch 'master' into master
7e39ef9
Rick: Revised fileserver further to use shorter ETags based on base-36. All ETags are now of the 'strong' variety.
ef755ac
Rick Merge branch 'master' of github.com:rickb777/caddy
9b7bef9
@rickb777 rickb777 closed this Feb 14, 2017