use a HTTP header to convey the digest #167

Closed
wants to merge 10 commits into from

Conversation

kazuho
Contributor

@kazuho kazuho commented Jan 18, 2016

This PR changes how the cache digest is conveyed: from an HTTP/2 frame to an HTTP request header.

There are several merits in doing so. By defining it as a header,

  • a cache-digest can be sent by ServiceWorker
  • a cache-digest can easily be transferred to application servers over HTTP/1
  • a client connecting through a cache-digest-unaware proxy can benefit from push

The anticipated downside was that including a cache-digest in every HTTP request might consume too much upstream bandwidth (compared to using an HTTP/2 frame), but I have concluded that by sending multiple cache-digest headers we can get effective compression from HPACK, so the overhead may be negligible.

For example, a client can send in the first HTTP request after establishing an HTTP/2 connection a cache-digest header containing the entire digest:

cache-digest: fresh=ABC...xyz    # entire cache digest

and in the following HTTP requests, send the same big header (which will be compressed to 1 to 2 bytes by HPACK) and a second header containing a small delta since the first request.

cache-digest: fresh=ABC...xyz    # entire cache digest
cache-digest: fresh=0uec         # small delta

The ABNF has been generalized to use a set of name-value pairs in order to give room for future extensions.

Some relevant mails regarding the topic:
https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0076.html
https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0081.html
https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0095.html
https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0096.html
https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0120.html

- truncate the hash value (instead of calculating the modulo)
- emit N and P in 5 bits, so that `hash-values` can always be represented using a 64-bit value type
- split the emissions of N and P to separate steps to avoid confusion

The changes were suggested by Martin Thomson (https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0026.html), and Alex Rousskov (https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0025.html).
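
A minimal sketch of the "truncate instead of modulo" change listed above. It assumes SHA-256 as the hash and assumes that "N and P in 5 bits" means log2(N) and log2(P) are each stored as 5-bit values, so that log2(N*P) is at most 62 and a hash-value fits in a 64-bit integer. The function name and the choice of keeping the leading bits are illustrative, not taken from the PR.

```python
import hashlib


def truncated_hash(url: str, log2_n: int, log2_p: int) -> int:
    """Return the leading log2(N*P) bits of SHA-256(url) as an integer.

    Assumption: log2(N) and log2(P) are each 5-bit values (0..31).
    """
    if not (0 <= log2_n < 32 and 0 <= log2_p < 32):
        raise ValueError("log2(N) and log2(P) must each fit in 5 bits")
    bits = log2_n + log2_p  # log2(N*P); at most 62, so it fits in 64 bits
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    as_int = int.from_bytes(digest, "big")
    # Truncate by shifting away everything except the leading `bits` bits.
    return as_int >> (256 - bits)
```

Truncation keeps the value uniformly distributed over the range when N*P is a power of two, which is why no modulo is needed in this sketch.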
@kazuho
Contributor Author

kazuho commented Jan 19, 2016

Sorry please consider this PR as WIP; we need to:

@igrigorik

👍 excited to see this!

@mnot
Owner

mnot commented Jan 31, 2016

My .02 -

I think it's fine to define a header for this; like you say, there are some good use cases for it.

However, why are we removing the HTTP/2 frame? It has a number of benefits:

  • The UA can complete the first request as quickly as possible, without fate-sharing between the digest and it.
  • The UA can update/replace the server's digest state without a request being issued
  • The digest state is explicitly connection-specific and hop-by-hop
  • The digest doesn't take up limited HPACK dictionary space

@kazuho
Contributor Author

kazuho commented Feb 1, 2016

Thank you for your sensible response.

Let's consider how we should define both the frame and the header so that they can be kept in sync.

My idea was that we could just use the HTTP header for simplicity, but I agree that it might be helpful to keep the frame definition as well. Specifically, I hadn't considered the possibility of the digest header taking up HPACK dictionary space; considering that the default size is 4K, I agree that we should provide a way that does not use a header.

I think the best way to proceed is to first agree on the basic design regarding syntax and semantics. After that I can update this PR (if you want me to), and move on to discussing the details.

Syntax

I see three options here:

  • a) use the frame definition as the base, and define the header as the base64-encoded form of the frame
  • b) use the header definition as the base and define the frame as the non-base64-encoded form of the digest-value plus the parameters conveyed as a string
  • c) same as b, but encode the parameters using a binary notation

I do not have a strong opinion here, but my weak preference is b, which would state:

+----------------------+
| Parameters-Len? (16) |
+------------------+---+
| Digest-Value (*) |
+----------------+-+
| Parameters (*) |
+----------------+

Digest-Value is the non-base64-encoded digest, and the length is: length-of-frame - Parameters-Len
Parameters contain the parameters (in string) defined for the `cache-digest` header
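
A minimal sketch of a parser for the option (b) layout above. It assumes that Parameters-Len is always present, that the parameters are ASCII, and that the 2-octet length field itself is excluded when computing the Digest-Value length; none of these details are settled in the thread, and the function name is hypothetical.

```python
import struct
from typing import Tuple


def parse_option_b(payload: bytes) -> Tuple[bytes, str]:
    """Split a frame payload into (digest_value, parameters).

    Layout assumed: Parameters-Len (16 bits), Digest-Value, Parameters.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for Parameters-Len")
    (params_len,) = struct.unpack_from("!H", payload, 0)
    body = payload[2:]
    if params_len > len(body):
        raise ValueError("Parameters-Len exceeds the payload size")
    # Digest-Value occupies everything that is not the trailing parameters.
    split = len(body) - params_len
    return body[:split], body[split:].decode("ascii")
```

Putting the parameters last means a receiver needs the length field to find where the digest ends, which is why the sketch reads Parameters-Len first.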

Semantics

Do you agree that we should add host, path and type parameters to the cache digest?

If yes, we would need to state that a frame replaces the previous one with the same values assigned for the three parameters.

@mnot
Owner

mnot commented Feb 5, 2016

(b) seems reasonable to me.

Regarding the parameters -

  • host seems pretty reasonable, although we should probably specify it in terms of origins, for clarity.
  • path seems a bit odd. What does it imply to the server? It seems to me that really it's an identifier for the digest so that it can be replaced in the future. i.e., a server can consult any digest that's valid for the origin (taking account of host restrictions), and can have multiple applicable digests for a given origin, but this identifier (not the name I'd suggest :) tells it when to replace and when to consider it a different digest. That way, you can even change the scope of host on a digest if you want to, over time.
  • type seems reasonable, except as soon as we have more than one value, we'll need some way for the server to request a particular type from the client. Not sure what to do here, except default type to what we define, and really try to get it right the first time; the big win for cache digests is sending them right after the first request on a connection, so if there are too many formats (i.e., it fills more than 1RT), the advantage is lost.

@kazuho
Contributor Author

kazuho commented Feb 9, 2016

@mnot Thank you for the response.

  • path seems a bit odd. What does it imply to the server?

It implies that the server should use the cache-digest only for determining the cache state of resources that belong to the specified path.

The addition was suggested by @martinthomson, and my understanding is that the primary use-case of the attribute is SW-based implementations.

For a SW-based cache implementation, it would be natural to implement a cache that only retains files under certain paths (e.g. /scripts or /css), considering that the size of the SW cache may be pretty small. In such a case, the client needs to notify the server that the cache-digest being sent only covers those paths, so that the server does not use the cache-digest to determine whether a response that does not fall under the cached paths is already cached.

  • type seems reasonable, except as soon as we have more than one value, we'll need some way for the server to request a particular type from the client.

Agreed. The only reason for defining the type attribute is for keeping the specification open for future extensions.

@mnot
Owner

mnot commented Feb 10, 2016

WRT path - imagine a server has set digests with path=/foo and path=/foo/bar. Does this mean that the server should only consult the latter digest for /foo/bar/baz, or should it consult both?

What I was suggesting was something like id=abc, purely as a mechanism for there to be multiple active digests for a given origin, and to allow individual digests to be updated.

That way, we're not tying the life cycle of a digest to a specific path on the server.

@martinthomson WDYT?

@kazuho
Contributor Author

kazuho commented Feb 10, 2016

@mnot

WRT path - imagine a server has set digests with path=/foo and path=/foo/bar. Does this mean that the server should only consult the latter digest for /foo/bar/baz, or should it consult both?

It should consult both (I agree that for path it seems a bit odd, but we would anyways have the same issue for host).

IMO specifying that a server should consult all the active cache-digests that match the scope (i.e. host and path attributes) is a natural thing to do, considering that for HPACK efficiency of header-conveyed digests we need to allow splitting a digest covering a single scope into multiple cache-digest entries (discussed in these lines of the PR).

What I was suggesting was something like id=abc, purely as a mechanism for there to be multiple active digests for a given origin, and to allow individual digests to be updated.

That way, we're not tying the life cycle of a digest to a specific path on the server.

The issue is that without a path attribute it is impossible for a client to send a cache-digest for a small part of the server.

Consider the case when a client sends a cache-digest only containing the digests for files under /scripts. When a server considers whether it should push /img/foo.jpg, it should not refer to the digest, since the digest does not cover the file. Without a path attribute, a server cannot make such a distinction; it would estimate that the image file is not cached by the client, and push the file to the client every time it receives a new request (since the image will never become part of a cache-digest that only covers files under /scripts).

In other words, having path attribute defined is a must if we are to allow a client to send a digest covering only a specific portion of a host, regardless of whether we should provide a mechanism to update part of the digest (by using id or something alike).
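
A minimal sketch of the scoping rule argued for above: a server consults every active digest whose host and path scope cover the resource, and consults none when no scope matches. The `ScopedDigest` type, its fields, and the segment-wise prefix match are assumptions made for illustration; the thread does not define how path matching would work.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScopedDigest:
    host: str   # e.g. "example.com"
    path: str   # path prefix the digest covers, e.g. "/scripts"
    label: str  # stand-in for the actual digest value


def covers(digest: ScopedDigest, host: str, path: str) -> bool:
    """True if the digest's scope includes the given host and path."""
    if digest.host != host:
        return False
    prefix = digest.path.rstrip("/")
    # Match on whole path segments so "/foo" does not cover "/foobar".
    return path == prefix or path.startswith(prefix + "/")


def applicable(digests: List[ScopedDigest], host: str, path: str) -> List[ScopedDigest]:
    """Return every active digest the server should consult for this resource."""
    return [d for d in digests if covers(d, host, path)]
```

With digests scoped to /foo and /foo/bar, a request for /foo/bar/baz yields both, while /img/foo.jpg against a /scripts digest yields none, so the server treats its cache state as not covered rather than as not cached.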

And regarding if we should define a way to update part of a digest, I wonder what the use-case would be.

For an H2 frame-based digest, a server can make a good guess of the client's cache state, since the server knows what it has sent. OTOH for the header-based digest, we should not provide a way to update a previously sent digest, since many HTTP client APIs (e.g. Fetch) do not provide a guarantee that multiple requests will be sent over a single HTTP/2 connection.

@martinthomson
Contributor

It's always possible to send a digest over a small slice of the server, just include resources that are in that slice. That leaves the main advantage of including a path to be one of reducing the chance of a false positive. That is, if /foo/1 ends up colliding with /bar/55, then including path=/bar will avoid that collision.

It's a small thing, and in retrospect, it's probably not worth the bits that it saves. Better to spend those bits on making the false positive rate lower by giving the digest more bits.

@mnot
Owner

mnot commented Feb 16, 2016

Stepping back, there are a few possible ways that this can work:

  1. Every digest received on a connection is valid. That means that the server consults them all, and treats all as current. If the browser clears its cache, evicts, etc., it has no way to invalidate them except to drop the connection.
  2. Only the most recent digest on a connection is valid. That means that there is only one "current" digest, and so multiple digest sources can't co-exist on the same origin.
  3. We have some sort of digest identifier (with a sensible default), so that the browser can either create a new digest or replace an existing one on the server at will.

path seemed like a confusing version of (c) to me; it conflated digest version management with scoping. I was proposing something like id=foo for (c).

@martinthomson it sounds like you're leaning towards (a) above, correct?

@mnot
Owner

mnot commented Feb 16, 2016

Reconsidering the syntax discussion above, my personal preference (which is not a demand :) is (a), because in the long term, we'd hope that browsers create digests natively; the header is for experimentation. The canonical form should be the one that's the most used.

YMMV, just my .02.

@martinthomson
Contributor

Yes, (a) makes the most sense to me too. Almost. All requests are potential sources of useful information, but there is a risk of destroying stateless request handling.

Think of it as having the digest on the request determining what you push for that request. The server probably should not infer that a request without a digest has anything cached, even if the previous request had the header. We should try to avoid making inter-request state a real thing, even for cases like this where inference is key.

Some inference is inevitable, but no need to codify it.

@kazuho
Contributor Author

kazuho commented Feb 23, 2016

@mnot @martinthomson Sorry for the delay.

Reconsidering the syntax discussion above, my personal preference (which is not a demand :) is (a)

No objections. Moving one step forward, how should we encode the attributes (e.g. origin)? My preference goes to something like below.

By using the first octet to contain flags to indicate what extensions are being used, we can keep the spec. open to future extensions (e.g. the path attribute, sending digest for stale resources).

+--------------+----------------+
| Reserved (7) | Has-Origin (1) |
+--------------+---+------------+
| Origin-Len? (16) |
+------------------+
|   Origin? (*)    |
+------------------+--------------------------------------------+
|         Digest-Value? (*)                    ...
+---------------------------------------------------------------+

The CACHE_DIGEST frame payload has the following fields:

* Reserved: Reserved bits. Clients MUST set the bits to zero. Servers MUST ignore CACHE_DIGEST frames that have any of the reserved bits set.
* Has-Origin: A flag indicating if Origin-Len and Origin fields exist.
* Origin-Len: An unsigned, 16-bit integer indicating the length, in octets, of the Origin field.
* Origin: Scope of Digest-Value
* Digest-Value: An optional sequence of octets containing the digest as computed in {{computing}}.
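
A minimal sketch of decoding the payload layout proposed above. It assumes Has-Origin is the least significant bit of the first octet (following the usual left-to-right, most-significant-first bit order of such diagrams) and that the origin is ASCII. The function name is hypothetical, and returning `None` is just one way to model "ignore the frame".

```python
import struct
from typing import Optional, Tuple

HAS_ORIGIN = 0x01  # assumed position: lowest bit of the flags octet


def parse_cache_digest(payload: bytes) -> Optional[Tuple[Optional[str], bytes]]:
    """Return (origin, digest_value), or None if the frame must be ignored."""
    if not payload:
        raise ValueError("payload is missing the flags octet")
    flags = payload[0]
    if flags & ~HAS_ORIGIN & 0xFF:
        return None  # a reserved bit is set, so the frame is ignored
    pos = 1
    origin = None
    if flags & HAS_ORIGIN:
        if len(payload) < pos + 2:
            raise ValueError("truncated Origin-Len")
        (origin_len,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if len(payload) < pos + origin_len:
            raise ValueError("truncated Origin")
        origin = payload[pos:pos + origin_len].decode("ascii")
        pos += origin_len
    # Whatever remains (possibly nothing) is the Digest-Value.
    return origin, payload[pos:]
```

A frame without the flag set carries only the digest, so the common case costs a single extra octet in this layout.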

@kazuho
Contributor Author

kazuho commented Feb 23, 2016

@mnot

Stepping back, there are a few possible ways that this can work:

  1. Every digest received on a connection is valid. That means that the server consults them all, and treats all as current. If the browser clears its cache, evicts, etc., it has no way to invalidate them except to drop the connection.
  2. Only the most recent digest on a connection is valid. That means that there is only one "current" digest, and so multiple digest sources can't co-exist on the same origin.
  3. We have some sort of digest identifier (with a sensible default), so that the browser can either create a new digest or replace an existing one on the server at will.

path seemed like a confusing version of (c) to me; it conflated digest version management with scoping. I was proposing something like id=foo for (c).

Thank you for the explanation. That is a good point.

I'd assume we would have the same issue for the Origin attribute as well.

Consider a case where a server is authoritative for *.example.com. A client might first send a request for a.example.com and a cache digest with its origin attribute set to a.example.com, then send a request for b.example.com and another cache digest with its origin set to b.example.com.

In this case, I believe that the server should retain both digests, and replace either of them when a client sends a new cache digest with the same origin value. In other words, to me it seems natural to conflate the scope and version management (and my understanding was that we should do the same for path).

@martinthomson
Contributor

@kazuho, I think that a header field could just include the origin as a parameter. It will encode more efficiently that way:

Cache-Digest: d=goihsgosihdfoih, o=https://example.com:8443

Noting of course that paying attention to origin only reduces false positive rate because servers won't test the digest against resources on other origins. That makes the o= attribute ignorable.
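
A minimal sketch of parsing the parameterized header form shown above. It assumes parameters are comma-separated name=value pairs and that neither commas nor whitespace occur inside a value; the names `d` and `o` come from the example, while the function name is hypothetical.

```python
from typing import Dict


def parse_cache_digest_header(value: str) -> Dict[str, str]:
    """Parse 'd=..., o=...' into a dict of parameter names to values."""
    params: Dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so base64 padding in a value survives.
        name, sep, val = item.partition("=")
        if not sep or not name:
            raise ValueError("malformed parameter: %r" % item)
        params[name.strip()] = val.strip()
    return params
```

A receiver that chooses to ignore `o` would simply read `d` from the result and test it against its own resources.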

@kazuho
Contributor Author

kazuho commented Feb 23, 2016

@martinthomson

@kazuho, I think that a header field could just include the origin as a parameter. It will encode more efficiently that way:

Cache-Digest: d=goihsgosihdfoih, o=https://example.com:8443

I agree that it would be more natural (and easy to use from sender's standpoint) if we use parameters for header fields. The downside would be that on the server side there needs to be two decoders for the attributes (one for the header and one for the frame), and that we would need to keep the definitions for the two in sync.

Therefore, if we are going to use name=value pairs for the header definition, I'd be happy if we could include the same name=value pairs as-is in the HTTP/2 frame (that was the intention of (b)).

Noting of course that paying attention to origin only reduces false positive rate because servers won't test the digest against resources on other origins. That makes the o= attribute ignorable.

I think we have a disagreement here.

In my view, a client's cache state for a resource, from the server's viewpoint, is one of three states: unknown, cached, not-cached. The state of any resource is unknown until a client sends a cache-digest. When a client sends a cache-digest for a specific origin, the cache states of the resources belonging to that origin switch to either cached or not-cached. Cache states of resources not belonging to the origin continue to be unknown.

How should a server use the information for optimizing content delivery? IMO a server should:

  • for cached resources, do not push
  • for not-cached resources, always push
  • for unknown resources, push if the resource is tiny (or do not push at all)

As you can see, the strategy is different between not-cached and unknown. In other words, distinguishing between the two states by using the origin attribute is highly recommended; we are likely to see false-negatives if we permit the servers to ignore the attribute.
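
A minimal sketch of the three-state model and push strategy described above. The `tiny_threshold` value, the function names, and modelling digest membership as a plain boolean are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class CacheState(Enum):
    UNKNOWN = "unknown"
    CACHED = "cached"
    NOT_CACHED = "not-cached"


def classify(resource_origin: str, digest_origin: Optional[str], in_digest: bool) -> CacheState:
    """Derive the server's view of one resource from a received digest."""
    # No digest yet, or a digest scoped to another origin: nothing is known.
    if digest_origin is None or digest_origin != resource_origin:
        return CacheState.UNKNOWN
    return CacheState.CACHED if in_digest else CacheState.NOT_CACHED


def should_push(state: CacheState, size: int, tiny_threshold: int = 1024) -> bool:
    """Apply the strategy: never push cached, always push not-cached,
    and push unknown resources only when they are tiny."""
    if state is CacheState.CACHED:
        return False
    if state is CacheState.NOT_CACHED:
        return True
    return size <= tiny_threshold  # hypothetical cutoff for "tiny"
```

If the server ignored the origin attribute, every resource outside the digest's origin would be classified not-cached instead of unknown and would always be pushed, which is the concern raised in this comment.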

@martinthomson
Contributor

I wasn't aware that you were still pursuing the frame.

As for the tri-state, yes, I can see that failing to narrow the scope does cause the unknown set to be eliminated. Maybe default origin to that of the request and allow it to be overridden. Note that in that case, a path (or maybe a combined origin and path, i.e., a URL prefix) does make sense.

@kazuho
Contributor Author

kazuho commented Feb 25, 2016

@martinthomson

I wasn't aware that you were still pursuing the frame.

It seems likely that I was not clear enough when I stated: a) use the frame definition as the base, and define the header as the base64-encoded form of the frame, to which we have agreed.

Rereading the comments on this PR, I think we have an agreement that the frame definition should be defined as the base (and to define the header as a variant to the frame definition), but do not have a clear agreement on how to encode the attributes (e.g. origin) when either of the frame or the header is used.

@mnot, @martinthomson Therefore, would you mind confirming your preference on how to encode the attributes in the frame and the header? Sorry for the fuss.

As said, my weak preference is to use name=value pairs (defined using ABNF) in both cases. In the case of the CACHE_DIGEST frame, the pairs should be embedded as a string. The advantages of this approach are that the spec. would be simple (since at least the syntactical definition of the attributes can be 100% shared) and that the header would be easy to understand. The size of the frame can be kept small if we choose short names for pre-defined attributes (e.g. use o=https://... to represent the origin attribute).

Maybe default origin to that of the request and allow it to be overridden.

+1

Note that in that case, a path (or maybe a combined origin and path, i.e., a URL prefix) does make sense.

I agree that we need path.

Whether we should combine origin host and path into a single attribute containing a URL prefix IMO depends on how much the pushed contents are scattered across the domains. If it is typical to have a website that scatters CSS and/or JavaScript files to tens of hosts under a single domain, you would likely want to use a wild-card hostname to specify the origin of the cache-digest under the premise that the server uses a wild-card certificate. And if we are to support wild-card hostnames with the origin attribute, we should define origin and path attributes separately, since a URL prefix including a wild-card hostname (e.g. https://*.example.com/foo) is confusing.

@mnot
Owner

mnot commented Mar 2, 2016

I don't have strong opinions on the encoding inside the frame; I'd suggest as long as it's sensible (e.g., there's no unnecessary encoding), that should be fine.

Regarding origin -- yes, that potentially limits the scope too.

I'm not so sure of the tri-state. The draft currently says:

A client MAY choose a subset of the available stored responses to include in the set.

This is to allow it to limit the size of the digest (e.g., if it has thousands or more cached responses, with a long heavy tail).

So, the server can't really assume that once it has a digest, any non-matching URLs are not-cached.

We could introduce a flag from the client to indicate that it's a complete digest for the origin if we think that's an important distinction. Without that, path doesn't make sense to me still (sorry). Even with it, I think that path is potentially more complex than it's worth (sorry again), because I suspect Web server operators won't be segmenting their resources into separate paths for different digests, but instead there will be overlap.

I also sometimes wonder if we should have a fresh flag to indicate whether a cached representation is fresh, so the server can choose to optimistically push a 304 (which is really cheap).

@kazuho
Contributor Author

kazuho commented Mar 2, 2016

@mnot Thank you for your answer, and thank you for pointing out that the current draft allows clients to send a digest that does not cover all the fresh resources.

We could introduce a flag from the client to indicate that it's a complete digest for the origin if we think that's an important distinction.

Due to the reasons stated below, I think we should introduce such a flag, or else prohibit clients from sending incomplete digests.

It is typical to see cases where the RTT from a client to the application server is much greater than the RTT to the edge server.

The bandwidth an edge server can utilize for pushing assets before starting to serve a response from an origin server is exponentially proportional to the ratio between the RTTs of the two. It is exponential because the TCP send window multiplies by 2x every round-trip. If the RTT to the application server is 4x of that to the edge server, the bandwidth available for push is 3 round-trips (~100KB). If the ratio is 5x, available bandwidth would be 4 round-trips (~200KB).
(see the figure on http://blog.kazuhooku.com/2015/12/optimizing-performance-of-multi-tiered.html)
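
A minimal sketch of the arithmetic above, assuming an initial congestion window of 10 segments of 1,460 octets each and a window that doubles every round-trip without loss. Both constants and the function name are assumptions; the comment only states the resulting ~100KB and ~200KB figures.

```python
def push_budget_bytes(rtt_ratio: int, initcwnd_segments: int = 10, mss: int = 1460) -> int:
    """Bytes an edge server can send while waiting for the application server.

    rtt_ratio is (RTT to application server) / (RTT to edge server); the edge
    gets rtt_ratio - 1 round-trips of its own before the response arrives.
    """
    round_trips = rtt_ratio - 1
    # The send window doubles every round-trip during slow start.
    return sum(initcwnd_segments * mss * 2 ** i for i in range(round_trips))
```

Under these assumptions, a ratio of 4 gives 102,200 octets and a ratio of 5 gives 219,000 octets, in line with the rough figures quoted in the comment.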

However, I doubt that we would typically want to use that much bandwidth for pushing resources that might be cached, considering that not all users subscribe to a fixed-rate plan with their carriers.

In other words, IMO to fully utilize the bandwidth available for server-push, being able to distinguish between unknown and not-cached is a must.

I also sometimes wonder if we should have a fresh flag to indicate whether a cached representation is fresh, so the server can choose to optimistically push a 304 (which is really cheap).

I've always been eager to have this. And I agree on using a flag to indicate whether a digest (conveyed in an H2 frame or an HTTP header) is for fresh resources or for stale ones.

The obstacle here is that there has not yet been an agreement between HTTP/2 implementers on how to push a 304. Neither Chrome nor Firefox correctly implements support for pushed 304s (see https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0222.html), nor do they recognize a push using the HEAD method.

@mnot
Owner

mnot commented Mar 11, 2016

Thinking about this, one thing that strikes me is that the use cases for the frame vs. the header are subtly different.

I suspect the frame is always (or nearly always) going to be generated by the HTTP cache of the client, not application code, and the header is going to be generated by application code, not the HTTP cache.

The HTTP cache has a definitive view of all responses it holds for the origin, and it doesn't have any concept of application boundaries within the origin. Therefore, I think that the path parameter doesn't make much sense for the frame type, and it's probably safe to always make the origin that of the associated stream. I'm not necessarily against defining these parameters for the frame, but I suspect they won't get used much.

On the other hand, the header is going to be used by ServiceWorker, and that has a definite "footprint" on the origin that is useful to describe with path.

Sending an incomplete digest is very important for the HTTP cache / frame use case, because the client wants to send enough digest to be useful, but not so much as to require the server to pause before starting to push, because it's still consuming the digest (remembering that if the connection is new, filling the initial congestion window is a risk for a cache holding even ~700 responses for the origin). Remember, the HTTP cache can't make rational decisions about how to segment up the origin into paths.

It's not nearly as important for the service worker case, since the SW has application knowledge, and is not (usually) handling the complete origin.

@mnot
Owner

mnot commented Mar 11, 2016

Regarding pushing 304s -- we should write an I-D :)

@kazuho
Contributor Author

kazuho commented Mar 16, 2016

@mnot I agree with your view that path is unlikely to be used with a frame.

And I think it might be better for us to (at least for the time being) step back from trying to determine what is right for SW, considering the fact that having a spec. is not essential for a SW-based cache digest implementation.

There is no need to define every aspect of the cache-digest that will be used by an SW-based implementation, considering that a SW is fully controlled by its origin. It is safe for every origin to use its own variant of cache-digest. Actually, users of a SW-based implementation might want to use more application-specific information than path as a key for determining whether a resource should be part of the digest. For example, it might be a good idea to use the content-type.

Therefore, I think in the next revision we should try to define:

  • how to express origin within the H2 frame
  • how to express a partial digest
  • how to make the frame extensible (so that we could add support for digest of stale resources, etc.)

We could potentially discuss how it should be encoded as a H1 header, but I do not think it is necessary (due to the reasons explained).

@kazuho
Contributor Author

kazuho commented Mar 16, 2016

@mnot Regarding your comment on the necessity of a partial digest, I agree that we should have a flag to express that.

OTOH I assume that a cache-digest sent within INITCWND can be much larger than one for 700 resources (this might be a nitpick, I'm sorry). If P is 256 (i.e. probability of false positive is 1/256), then the space consumed by a single resource will be 8 bits, which means that you could store a digest of 1,400 resources within the first packet, or a digest of 14,000 resources within INITCWND (in case INITCWND is 10).

Also, I would suggest that a client cut down the value of P to cram more resources into a cache digest before starting to send a partial digest. IMO a cache digest with P=2 still has merit. Consider a case where a client is not in possession of 10 resources that are required to render a document it requested. Even in the case of P=2, we would on average be able to determine 5 resources that are not-cached, and start pushing them.

PS. In case P=2, the required space for a digest will be slightly above 2 bits per resource, which means that you can send a digest of ~5000 resources within a single packet (1,400 octets).
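
A minimal sketch of the capacity estimates in this comment, using its own simplification of about log2(P) bits per resource and 1,400 usable octets per packet. The function names are hypothetical, and the real encoded size per resource is somewhat larger than log2(P), as the PS notes for P=2.

```python
def resources_per_budget(bits_per_resource: float, packets: int = 1, payload_octets: int = 1400) -> int:
    """How many resources fit in the given number of packets."""
    return int(packets * payload_octets * 8 // bits_per_resource)


def expected_detected_missing(missing: int, p: int) -> float:
    """Expected number of missing resources the server can identify.

    Each missing resource is falsely reported as cached with probability 1/P.
    """
    return missing * (1 - 1 / p)
```

With 8 bits per resource this gives 1,400 resources in one packet and 14,000 in ten; with exactly 2 bits it gives 5,600, which drops toward the ~5,000 quoted once the per-resource cost is slightly above 2 bits.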

@mnot
Owner

mnot commented Mar 16, 2016

@kazuho I was assuming that client-side INITCWND is still largely around 3, not 10. I've looked for data on this, but haven't found much; do you know of any? @igrigorik any idea?

@igrigorik

I don't have conclusive data either but, FWIW, in the experiments I've run in the past (against ~top 1K) the majority were 10.. Part of the problem is that there is no definite way to test this.

@mnot
Owner

mnot commented Mar 21, 2016

@igrigorik not servers (which "top 1K" sounds like) -- clients (e.g., OSX, Windows, iOS). Since the client is sending the cache digest, that's what's important here.

@igrigorik

Doh, misread that -- yes, I was referring to server. I don't have any good data for clients.. and I'd love to get my hands on it.

@igrigorik

Regarding pushing 304s -- we should write an I-D :)

FWIW, I started this a while back: https://groups.google.com/a/chromium.org/d/msg/net-dev/yfkW4mkWIPU/5RckmfktJgAJ - check out the linked doc, would love to hear your thoughts.

And yes, we probably should.. Assuming we agree on what the behavior should be.

@kazuho
Contributor Author

kazuho commented Mar 22, 2016

@mnot

I was assuming that client-side INITCWND is still largely around 3, not 10. I've looked for data on this, but haven't found much; do you know of any?

That's a good question! I hadn't thought that the numbers could be different between server and client.

That said, regardless of how large INITCWND on the client side is, I have no objection to having a flag to indicate a partial digest. Having some information is definitely better than having none.

@igrigorik

Regarding pushing 304s -- we should write an I-D :)

FWIW, I started this a while back: https://groups.google.com/a/chromium.org/d/msg/net-dev/yfkW4mkWIPU/5RckmfktJgAJ - check out the linked doc, would love to hear your thoughts.

And yes, we probably should.. Assuming we agree on what the behavior should be.

Yeah. Now I remember leaving a comment on the linked doc pointing out that cache validators can be pushed without using 304, which we discussed in detail in h2o/h2o#447. But the workaround described in the issue is a hack, and I would definitely love to see the method of pushing a cache validator being formalized.

And for the matter, my two cents go to pushing a HEAD request for the purpose.

@kazuho
Contributor Author

kazuho commented Apr 6, 2016

Wondering what would be the right strategy for a web server to take if it receives a partial cache digest, I am starting to think that using a 1xx response is the best way to do it.

What I am suggesting is, that if a server receives a partial digest it should:

  • push tiny resources that are not known to be cached
  • for larger resources not known to be cached, send Link: rel=preload using 1xx as an early metadata indication

Regarding the use of a 1xx response for the purpose, I agree to @reschke's comment that we need to define a new status code.

I also agree with @mnot's comment that we should be careful of interoperability issues when using 1xx on the open web, and in the case of early metadata indication, I believe we could make it an opt-in from the client side (e.g. Expect: 1xx-early-metadata or something), or just limit the feature to HTTP/2 over TLS.

see also: w3c/preload#38

This was referenced May 4, 2016
@mnot
Owner

mnot commented May 4, 2016

I chatted with Jana about this in B-A, who asserted that client-side INITCWND is often 10.

I've created some issues to cover what I think there's agreement upon above.

@kazuho, above you mention wanting to define "how to express origin within the H2 frame." What's the use case for that? In the current draft, the origin of the digest is linked to the stream that the digest is sent on. Is there a case where we'd want to send a digest before any requests are sent to an origin?

@kazuho
Contributor Author

kazuho commented May 4, 2016

@mnot Thank you for the INITCWND info and for opening the issues.

above you mention wanting to define "how to express origin within the H2 frame." What's the use case for that? In the current draft, the origin of the digest is linked to the stream that the digest is sent on. Is there a case where we'd want to send a digest before any requests are sent to an origin?

Yes for SW use-case, but I am not sure for H2 frames.

In the case of a SW with prior knowledge of how the website is deployed, the client may want to send a digest that covers hosts other than the origin (if the server presents a wild-card certificate). This would be useful for websites that serve resource files (e.g. JS, CSS) from a separate hostname.

OTOH, for H2 (without the prior knowledge) it would be less likely to see such use.

So under the premise that the draft would mainly focus on defining the H2 frame (i.e. avoid defining the path attribute), I do not think there's a strong need for an attribute to override the origin.

@mnot
Owner

mnot commented May 10, 2016

OK, that makes sense. I think we can close this pull request (if I have that wrong, please re-open!), and we can focus on those issues and getting a new draft out.

@mnot mnot closed this May 10, 2016
@kazuho
Contributor Author

kazuho commented May 10, 2016

OK, that makes sense. I think we can close this pull request (if I have that wrong, please re-open!), and we can focus on those issues and getting a new draft out.

👍


4 participants