HTTP/2 push support plugin (golang 1.8) #1215

Merged
merged 19 commits into from Feb 17, 2017

Projects

None yet
@wendigo
Collaborator
wendigo commented Oct 26, 2016 edited

Hi all,

This change introduces new push plugin (adressing #816)

This will be possible in golang 1.8 due to http.Pusher and http.PushOptions interfaces being introduced (golang/go@cf73bbf, tracking golang/go#13443)

As interface was published yesterday I could not resist and implemented new push directive ๐Ÿ˜

This middleware operates in two modes: rules-based and link-based.

Rules-based syntax

push /index.html /index.css

push /index2.html /index.css

push / /ga.js

push /index.html /pushedResource.js /pushedCss.css {
   method GET
   header That-Was-Pushed Wow
   header Server Caddy-Push
}

Link-based mode

Middleware intercepts Link headers from the Next middleware and tries to use Pusher to push resources to the client.

For more information about Link header: https://w3c.github.io/preload/

Note for reviewers

This plugin is complete and tested but non functional as of current go's tip (actual implementation in https://go-review.googlesource.com/#/c/29439/ was not yet merged).

But... when go 1.8 will be released this plugin will actually support HTTP/2 Push in Caddy

Discussion

I'm not sure if this plugin should be integral part of the caddy's core or should I move it to it's own repo - opinions on that will be appreciated.

https://www.youtube.com/watch?v=vCadcBR95oU ๐Ÿ•ถ

@wendigo wendigo changed the title from HTTP/2 push support (golang 1.8) to HTTP/2 push support plugin (golang 1.8) Oct 26, 2016
@mholt
Owner
mholt commented Oct 26, 2016 edited

Wow, fantastic @wendigo! I haven't looked at this yet (just woke up) but I am really surprised -- wasn't expecting to see this show up in my inbox. ๐Ÿ˜„

This definitely belongs in core, not a separate repo.

I have had some thoughts on server push and how Caddy should/might handle it -- let me elaborate them here briefly just to give you an idea.

  • Link headers, as described.
  • Enumerating rules by mapping URIs/paths to lists of resources to push, as described (almost every other server push implementation does these two things).
  • I would like to try to make server push automatic. Can we have Caddy parse the HTML, for instance, and determine which JS, CSS, and image files to push down? We could then cache those results for some time. This is kind of tricky and won't cover every use case for server push but probably will for most.
  • I put a lot of thought into using machine learning for server push (I study deep learning in grad school), since it seems reasonable that Caddy could learn a mapping between requests and subsequent requests by the same client. However, I don't think this is beneficial because each website is different. In other words, it doesn't generalize well. Caddy would have to be re-trained for each website and on an ongoing basis, which might defeat the purpose.
  • It would, perhaps, be more effective to just hardcode rules for inferring a map of requests to dependencies. In other words, Caddy just watches to see which resources a client requests "immediately after" (exact definition TBD) an initial request. It then builds a mapping of URI or path to []Path, for example. However: this doesn't work well for dynamic sites, unless Caddy invokes an HTTP request within itself to get the bytes of the response to push to the client.

In any case, I like the idea of the user being able manually specify push rules and using the Link header, at least to start. But I'm really intent on making server push automatic in the future. What do you think?

Thanks to build tags, this PR is safe to use pre-Go 1.8 (nice work). I'm willing to accept this PR with the proposed set of functionality (without the "automatic push" features) -- but I and other reviewers should go over it in more detail first of course. Expect some iterations on this before it's ready to go.

Thanks, I'm excited for this!

@mholt mholt added the under review label Oct 26, 2016
@wendigo
Collaborator
wendigo commented Oct 26, 2016

I don't know if parsing HTML body is a good idea - it won't catch most of the resources (like images, fonts referenced in CSS files, javascripts/images loaded by javascripts and so on).

Maybe supporting something like push manifest is a good idea? It can be dynamically reloaded / watched via fsnotify and constructed by any tool you want (using machine learning based on logs).

Also pushing whole directories might be a good idea, i.e. push everything in css/ and img/ folders.

@wendigo
Collaborator
wendigo commented Oct 26, 2016 edited

I've added header/method validation and loaded plugin. I've built caddy both on 1.8 (current tip) and 1.7.3 and it does not break Caddy.

@wendigo
Collaborator
wendigo commented Oct 26, 2016 edited

I've managed to build custom go1.8 with cherry-picked Pusher implementation and test Caddy with push support enabled. I have couple of changes that I didn't anticipate + buildsrv fix (does not work in go 1.8)

But still...

yeah

@wendigo
Collaborator
wendigo commented Oct 26, 2016
@wendigo
Collaborator
wendigo commented Oct 26, 2016

I've also fixed the case when using:

push / /index.css

will cause recursive push (requested /index.css will match rule and try to push /index.css and so on).

@wendigo
Collaborator
wendigo commented Oct 27, 2016

Updated H2 bundle landed in golang's tip (golang/go@07e7266) so this change can be easily tested from now on (and functional on tip ๐Ÿ‘)

@mholt
Owner
mholt commented Oct 28, 2016

I'll need some time before I can review this thoroughly -- any other collaborators are welcome and invited to review in the meantime! (Don't merge yet though. I don't normally say that, but this is a pretty big change.)

@elcore
Collaborator
elcore commented Oct 30, 2016 edited

Hello @wendigo,

we want to implement X25519 and CHACHA20-POLY1305 ๐Ÿ˜„

There will be no need to use // +build go1.8 after the release of Golang 1.8 (Q1 2017)

P.S.: I will review your code in the next weeks ๐Ÿ˜„, if I have enough time.

@wendigo
Collaborator
wendigo commented Oct 30, 2016

@elcore: I know, I just wanted this change to not break current go version 1.7 - so it can be merged without breaking anything. After 1.8 release build tags can be removed.

@elcore
Collaborator
elcore commented Oct 30, 2016

@wendigo perfect ๐Ÿ˜„

@mholt
Owner
mholt commented Nov 9, 2016

I am keeping my eye on this, just so you know ;)

caddyhttp/push/handler_go18.go
+ pusher, hasPusher := w.(http.Pusher)
+
+ // No Pusher, no cry
+ if hasPusher {
@carlmjohnson
carlmjohnson Nov 21, 2016

You can get rid of a lot of nesting with

if !hasPusher {
    return h.Next.ServeHTTP(w, r)
}
@wendigo
wendigo Nov 21, 2016 Collaborator

You're right - thx :) Fixed!

@mholt
Owner
mholt commented Dec 1, 2016

I don't know if parsing HTML body is a good idea - it won't catch most of the resources (like images, fonts referenced in CSS files, javascripts/images loaded by javascripts and so on).

I only partly agree. ๐Ÿ˜‰ We can parse CSS bodies too. JavaScript, a little harder; I wouldn't worry about JS for now. I bet parsing HTML and CSS files would get us most of the way there without much difficulty.

@carlmjohnson

The machine "learning" could be pretty dumb and still work. What if one-in-N requests was served without using push, and then you noted which subsequent requests were serving files with a mime-type of CSS/JS/image?

@mholt
Owner
mholt commented Dec 14, 2016 edited

The machine "learning" could be pretty dumb and still work. What if one-in-N requests was served without using push, and then you noted which subsequent requests were serving files with a mime-type of CSS/JS/image?

Yes, indeed. It would basically look like a map of request path to a list of subsequent request paths. (There would be some more data attached to each path in a struct, like validity period or eviction time or something like that... in order to keep the map updated over the lifetime of the server.)

So those are the two approaches I'm deciding between to make push automatic: parsing HTML/CSS documents is one way, and the other way is observing request patterns over time. The first works instantly and is fairly mechanical, straightforward, but may miss some pushes. The second takes a little time to get optimal, but should work well in that it won't "miss" anything it couldn't parse.

@lenovouser

Something which would be nice is if a backend server sends a preload header e.g. using cloudflare/netjet to automatically push these resources too

@wendigo
Collaborator
wendigo commented Dec 14, 2016

@lenovouser Link preload headers are already supported by this change so https://github.com/cloudflare/netjet will work with it as it basically generates Link headers (https://github.com/cloudflare/netjet/blob/master/index.js#L43)

@lenovouser

Ah, okay. So that means Caddy sees the Link headers and pushes them already? Because the node module doesn't push. It just creates the header.

@lenovouser

Ah, yeah. Seems like it. Sorry - totally missed that.

@wendigo
Collaborator
wendigo commented Dec 14, 2016

@lenovouser exactly, Caddy with push plugin will intercept Link headers and push resources for you

@mholt
Owner
mholt commented Dec 17, 2016

@tdewolff just made https://github.com/tdewolff/push which does the automatic push based on parsing contents. If we can find a way to cache the parse results so the parsing doesn't have to happen on every request, I think we should be able to add this to Caddy. (But it doesn't need to be a part of this PR.)

@wendigo
Collaborator
wendigo commented Dec 17, 2016

Looking into it @mholt :)

@mholt
Owner
mholt commented Dec 21, 2016

@brasilikum Yes, see my comment two above yours ๐Ÿ˜‰

@abh
abh commented Dec 27, 2016

Regarding how to use server push, it turns out it's hard. These links might be of interest:
https://www.ietf.org/mail-archive/web/httpbisa/current/msg27742.html (and the ensuing discussion).
https://tools.ietf.org/html/draft-ietf-httpbis-cache-digest-01

@mholt mholt referenced this pull request Jan 25, 2017
Open

Update Caddy to use Go 1.8 #1375

9 of 11 tasks complete
@mholt
Owner
mholt commented Jan 25, 2017 edited

@wendigo I checked this out locally and started using it -- this rocks! Thank you!

I have some suggestions/requests regarding the Caddyfile syntax. I keep finding it very tempting to do this:

push / {
    /resource1.jpg
    /resource2.js
    # etc...
}
push /other-page {
    # another list of resources
}
push /simple-page /just-one-resource.css

What do you think? Since it's just one per line, it shouldn't break the use of method and header...

That does remind me to ask: what exactly do the method and header properties do? (Edit: nevermind, found the docs for PushOptions from the code)

Since this won't be merged until Go 1.8 is out and I don't mind requiring Go 1.8 to build Caddy, how would you feel about removing all the build tags that protect it from older versions? In other words, you can target Go 1.8 -- no need to worry about 1.7 at this point.

@mholt
Owner
mholt commented Jan 25, 2017

One other thing -- while I think manually specifying resources is a good capability to have, how about integrating @tdewolff's auto-push (by parsing certain kinds of content) package into this PR?

For sites that are configured to serve over HTTP/2, I'd actually like to make the auto-push enabled by default, with the ability to turn it off by doing a manual configuration like this. What do you both think?

@tdewolff
tdewolff commented Jan 25, 2017 edited

Sounds nice, but it has to make sure that caching works (using a req URL -> resources mapping), otherwise it is not so much faster to parse and push. Also, as @abh mentioned it should support cache digest to prevent sending already cached resources to the client. The first case I have already implemented and the second one I want to support in the future.

EDIT: as soon as an HTML URL needs to be revalidated in terms of the resource URLs it contains, doing this in the original request slows down page requests occasionally. You could add said URL to a worker thread that parses and extracts resource URLs so the cache list stays updated.

I like @carlmjohnson his proposal of self-learning. But how do you know which resources belong to which HTML request? Also, is there feedback from pushes whether they were needed or redundant, so the resource list can stay updated?

@mholt
Owner
mholt commented Jan 25, 2017 edited

@tdewolff You're right about the caching; however, a background goroutine would need to acquire a write lock during the whole refresh period, which I don't think would be more efficient, because then all requests are blocked until all assets are refreshed.

Maybe each request could check the cache item timestamp and refresh if old (stochastic update) or if the ETag (if present) changes.

The "learning" idea is fine too, this is something I came up with a little over a year ago -- but the tricky part, as you said, would be mapping with confidence the initial request to the subsequent resources for that request. It would require knowledge of the connection which isn't exposed in http.Request.

When do you plan on implementing cache digest?

@wendigo
Collaborator
wendigo commented Jan 30, 2017

@mholt I like your proposed syntax - working on it right now :) I will also remove build tags so this change can be merged when 1.8 (hopefully) will be released in 2 days

@wendigo
Collaborator
wendigo commented Jan 30, 2017

Ok, build tags - removed, new syntax - added :)

@stp-ip
Contributor
stp-ip commented Jan 30, 2017

The .idea/ directory seems unnecessary ;)

@wendigo
Collaborator
wendigo commented Jan 30, 2017

You're absolutely right! :)

caddyhttp/gzip/gzip.go
+}
+
+// Interface guard
+var _ http.Flusher = (*gzipResponseWriter)(nil)
@nussjustin
nussjustin Jan 30, 2017

Shouldn't this be http.Pusher?

@wendigo
wendigo Jan 30, 2017 Collaborator

Should be: http.Pusher, Flusher, Hijacker and CloseNotifier :) Added, thx!

caddyhttp/header/header.go
+}
+
+// Interface guard
+var _ http.Flusher = (*responseWriterWrapper)(nil)
@nussjustin
nussjustin Jan 30, 2017

Shouldn't this be http.Pusher?

caddyhttp/httpserver/recorder.go
+}
+
+// Interface guard
+var _ http.Flusher = (*ResponseRecorder)(nil)
@nussjustin
nussjustin Jan 30, 2017

Shouldn't this be http.Pusher?

@mholt
Owner
mholt commented Feb 4, 2017

Thanks Mateusz! Will be reviewing this shortly.

@mholt
Owner
mholt commented Feb 16, 2017

I'll be getting back to this PR this weekend as, hopefully, Go 1.8 comes out.

@wendigo
Collaborator
wendigo commented Feb 16, 2017

Let's hope so ;)

@mholt

This is an impressive PR, @wendigo. Nicely done.

I think I need your help with something. After we merge this (very soon by the looks of it), can you help me augment this with @tdewolff's auto-push functionality (linked above)? I think having auto-server-push will be really awesome, and the changes in this PR provide a solid foundation to build on.

caddyhttp/push/setup.go
+}
+
+// ErrNotSupported is returned when push directive is not available
+var ErrNotSupported = errors.New("push directive is available when build on golang 1.8")
@mholt
mholt Feb 17, 2017 Owner

This variable is not used anywhere anymore! ๐ŸŽ‰ It can be removed.

@wendigo
wendigo Feb 17, 2017 Collaborator

Removed!

caddyhttp/push/setup_test.go
+ "reflect"
+ "testing"
+
+ "github.com/davecgh/go-spew/spew"
@mholt
mholt Feb 17, 2017 Owner

I'm sure this is useful, but we try to keep our dependencies (even in tests) as lean as possible. Any chance this package could be replaced with an old faithful fmt.Printf("%+v") (or %#v) even though it's not quite as pretty?

@wendigo
wendigo Feb 17, 2017 Collaborator

Removed.

caddyhttp/push/handler.go
+ }
+ }
+
+ headers := w.Header()
@mholt
mholt Feb 17, 2017 Owner

This variable is only referenced once... maybe we don't need to declare this one?

@wendigo
Collaborator
wendigo commented Feb 17, 2017

Sure thing! Let's merge it and I will add autopush :)

wendigo added some commits Oct 26, 2016
@wendigo wendigo WIP 26baa0c
@wendigo wendigo HTTP2/Push for golang 1.8 326ca10
@wendigo wendigo Push plugin completed for review b135229
@wendigo wendigo Correct build tag 8e39e26
@wendigo wendigo Move push plugin position ab0acb4
@wendigo wendigo Add build tags to tests 06af279
@wendigo wendigo Gofmt that code daaa887
@wendigo wendigo Add header/method validations d266d7c
@wendigo wendigo Load push plugin 90add3c
@wendigo wendigo Fixes for wrapping writers 6f1d7b8
@wendigo wendigo Push after delivering file 6cefca4
@wendigo wendigo Fixes, review changes bfc2f28
@wendigo wendigo Remove build tags, support new syntax 33c356b
@wendigo wendigo Fix spelling 7123ffa
@wendigo wendigo gofmt -s -w . 385a25b
@wendigo wendigo Gogland time af132fd
@wendigo wendigo Add interface guards b5e9e19
@wendigo wendigo gofmt c2a67fa
@wendigo wendigo After review fixes
e9885ba
@mholt
mholt approved these changes Feb 17, 2017 View changes
@mholt mholt removed the under review label Feb 17, 2017
@mholt mholt merged commit cdf7cf5 into master Feb 17, 2017

5 checks passed

continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
licence/cla Contributor License Agreement is signed.
Details
@mholt
Owner
mholt commented Feb 17, 2017

@wendigo This should help get things started, take a look at this issue: tdewolff/push#1

Basically, we should be caching (for some configurable amount of time) the results of parsing the resources so that we don't have to do it on every single request. It would be interesting to first implement it very simply without caching, and just do the bare-bones auto-push, and then benchmark it using wrk or something. Then after that, implement the caching and see how it performs. Getting it set up for the first benchmarks should only be a few lines of code then. Does that make sense?

@wendigo wendigo deleted the 1.8_http2_push branch Feb 17, 2017
@nwidger
nwidger commented Feb 17, 2017

Should this feature implicitly copy certain request headers into the the *http.PushOptions's Header field passed to the http.Pusher's Push method? The most important one in my mind would be to copy the original request's Accept-Encoding header if it exists. Otherwise, the pushed resources will not be gzip'd even if the client previously indicated support for compression.

@mholt
Owner
mholt commented Feb 17, 2017

@nwidger Do you mean the auto-push (TODO) or the manually-configured push? I don't believe the Accept-Encoding header, which is on the request, needs to be pushed into the response anyway, though.

@wendigo
Collaborator
wendigo commented Feb 17, 2017

@mholt as far as I understand @nwidger it should. If client request has "Accept-Encoding" it should be passed via PushOptions so any backend (fastcgi or fileserver) can handle this request and push (gzipped or whatever) resource to the client. I will think about solution.

@mholt
Owner
mholt commented Feb 17, 2017

But a push response goes down to the client, not to a backend.

@nwidger
nwidger commented Feb 17, 2017

Other important headers to copy might include Authorization, WWW-Authenticate and Cookie for authentication and session info.

@nwidger
nwidger commented Feb 17, 2017

@mholt The target parameter, opts.Method and opts.Header passed to the http.Pusher.Push call constitute a synthetic request that gets serviced by the HTTP server just like any other. The response to that synthetic request is what gets pushed to the client. That is my understanding at least.

@mholt
Owner
mholt commented Feb 17, 2017

Ah, I see what you mean. Yes... perhaps, see what the proxy middleware sends upstream to get a decent list.

@nwidger
nwidger commented Feb 18, 2017

If I'm reading the code right, it looks like Apache's mod_http2 implementation copies the User-Agent, Accept, Accept-Encoding, Accept-Language and Cache-Control headers from the original request: https://github.com/icing/mod_h2/blob/master/mod_http2/h2_push.c#L280

@nwidger
nwidger commented Feb 18, 2017

Similarly, nghttp2 appears to copy the Accept-Encoding, Accept-Language, Cache-Control, Host and User-Agent headers: https://github.com/nghttp2/nghttp2/blob/master/src/shrpx_http2_upstream.cc#L1976

@wendigo
Collaborator
wendigo commented Feb 18, 2017

@nwidger thx, working on it right now

+ return h.Next.ServeHTTP(w, r)
+ }
+
+ // Serve file first
@nwidger
nwidger Feb 18, 2017

The description for http.Pusher contains the following text:

        // Handlers that wish to push URL X should call Push before sending any
        // data that may trigger a request for URL X. This avoids a race where the
        // client issues requests for X before receiving the PUSH_PROMISE for X.

By serving the original request first, aren't we violating this rule? The client will receive the Link headers and may send out additional requests for those resources before we send out the push promises for them.

@wendigo
wendigo Feb 18, 2017 Collaborator

I didn't see that comment - if it's a rule - it should not be violated :) Fixed in #1453

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment