HPACK encoder doesn't handle duplicate headers appropriately #50
Comments
Right, the actual HPACK logic is easy. What's substantially harder is trying to represent the multiple header values on the far side. My current solution is a dictionary whose values are lists, but this causes huge pain in the tests. That's probably ok, I'll see if the tests can be fixed tomorrow. |
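A minimal sketch of the "dictionary whose values are lists" representation mentioned above. The helper name `group_headers` is hypothetical and is not part of hyper's API.

```python
from collections import defaultdict


def group_headers(headers):
    """Collect (name, value) pairs into a dict mapping name -> list of values."""
    grouped = defaultdict(list)
    for name, value in headers:
        grouped[name].append(value)  # repeated names keep every value, in order
    return dict(grouped)


headers = [
    ("set-cookie", "a=1"),
    ("set-cookie", "b=2"),
    ("content-type", "text/html"),
]
grouped = group_headers(headers)
```

Note that even single-valued headers end up as one-element lists here, which is one plausible reason existing test expectations would need updating.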
Actually, this isn't simple. We need to handle two separate situations:
The HPACK decoder uses Python sets internally, which allow for efficient differencing. Unfortunately, it doesn't allow us to easily have multiple headers in the same header set. Looks like this may require a more dramatic rearchitecting of the HPACK implementation. |
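To illustrate the trade-off described above, here is a small standalone sketch (not hyper's actual code) showing that a Python set makes differencing easy but silently collapses identical headers:

```python
# Two identical cookie headers plus one other header.
headers = [("cookie", "a=b"), ("cookie", "a=b"), ("accept", "*/*")]

as_set = set(headers)  # the duplicate ("cookie", "a=b") is collapsed

# Differencing against a previous header set is a single operation...
previous = {("accept", "*/*")}
new_headers = as_set - previous

# ...but the fact that the cookie header appeared twice is lost.
duplicates_lost = len(headers) - len(as_set)
```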
Yeah, the rewrite here is pretty big, but the impact of this bug is low. I won't block releases behind this bug. EDIT: This got discussed at length on the WG list. The short position is that hyper's HPACK implementation is totally wrong because of some subtleties in the spec. |
So, given that this is a full redesign, may as well do my design out here. Open development, right? Specific notes:
Specific requirements:
Notes:
|
Let's first consider decoding. The algorithm is as follows:

1. Prepare a header set to be emitted. This set is initially empty.
2. Determine the representation of the element. This is one of: "indexed", "literal added to header table", "literal not added to header table".
3. If the representation is "indexed", check whether it's in the reference set (does anything in the reference set compare equal to the reference?).
   - If it is, remove it from the reference set and do not emit the header.
   - If it is not, and it references the static table: 1) emit the header; 2) insert the static entry at the start of the header table; 3) add a reference to the reference set.
   - If it is not, and it references the header table: 1) emit the header; 2) add a reference to the reference set.
4. If the representation is "literal not added to the header table", only emit the header.
5. If the representation is "literal added to the header table": 1) emit the header; 2) insert the header at the beginning of the header table; 3) add a reference to the reference set.
6. Finally, everything in the reference set that has not been emitted as part of this process gets added to the header set.

This last step is the trickiest bit. The core problem is that we need a way to work out whether a header in the reference set has been emitted yet. Fundamentally, any header that was added to the reference set during this procedure and is still in there at the end was emitted, and none of the others were. However, we need a way to keep track of which ones we emitted. Some options for this:
|
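The decoding procedure described above can be sketched roughly as follows. This is a simplified illustration, not hyper's implementation: the `SketchDecoder` and `Entry` names, the pre-parsed representation tuples, and the choice to track emitted references by `id()` are all assumptions, a static index is assumed never to count as already referenced, and header table size limits and eviction are ignored.

```python
from collections import namedtuple

# Hypothetical entry type: each call creates a distinct object, so entries
# with equal name/value can still be told apart with `is`.
Entry = namedtuple("Entry", ["name", "value"])


class SketchDecoder:
    def __init__(self, static_table):
        self.static_table = list(static_table)
        self.header_table = []   # newest entry first
        self.reference_set = []  # header-table entries, compared by identity

    def _in_reference_set(self, entry):
        return any(ref is entry for ref in self.reference_set)

    def decode(self, representations):
        emitted = []         # the header set being built, initially empty
        emitted_ids = set()  # ids of references emitted during this call

        def emit_and_reference(entry):
            emitted.append(entry)
            emitted_ids.add(id(entry))
            self.reference_set.append(entry)

        for rep in representations:
            kind = rep[0]
            if kind == "indexed" and rep[1] == "header":
                entry = self.header_table[rep[2]]
                if self._in_reference_set(entry):
                    # Already referenced: drop it and emit nothing.
                    self.reference_set = [
                        ref for ref in self.reference_set if ref is not entry
                    ]
                else:
                    emit_and_reference(entry)
            elif kind == "indexed" and rep[1] == "static":
                # Assumption: a static index never counts as already referenced.
                entry = Entry(*self.static_table[rep[2]])
                self.header_table.insert(0, entry)
                emit_and_reference(entry)
            elif kind == "literal_indexed":
                entry = Entry(rep[1], rep[2])
                self.header_table.insert(0, entry)
                emit_and_reference(entry)
            elif kind == "literal":
                emitted.append(Entry(rep[1], rep[2]))
            else:
                raise ValueError("unknown representation: %r" % (kind,))

        # Final step: anything still referenced but not yet emitted.
        for ref in self.reference_set:
            if id(ref) not in emitted_ids:
                emitted.append(ref)
        return [tuple(entry) for entry in emitted]


decoder = SketchDecoder([(":method", "GET")])
first = decoder.decode(
    [("indexed", "static", 0), ("literal_indexed", "cookie", "a=b")]
)
second = decoder.decode([])                          # nothing new on the wire
third = decoder.decode([("indexed", "header", 0)])   # removes the cookie entry
```

In this sketch an empty second block still yields both headers, because they are carried over in the reference set, and indexing the cookie entry in the third block removes it without emitting it.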
Alright, so how does this 'reference' object work? The constraints are as follows:
The obvious implementation here is to use object identity on the tuples. |
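A small sketch of what "object identity on the tuples" could look like in practice. The helper names `make_entry` and `was_emitted` are hypothetical. The key point is that two header tuples can compare equal while still being distinct objects, so identity can tell repeated headers apart where equality cannot.

```python
def make_entry(name, value):
    # Building the tuple from a list yields a fresh object on every call,
    # so equal headers still get distinct identities.
    return tuple([name, value])


first_cookie = make_entry("cookie", "a=b")
second_cookie = make_entry("cookie", "a=b")

same_value = first_cookie == second_cookie   # equal by value
same_object = first_cookie is second_cookie  # but not the same object

# Track emitted references by identity. id() is only meaningful while the
# object is alive, so the entries must stay referenced elsewhere
# (for example, in the header table).
emitted_ids = {id(first_cookie)}


def was_emitted(entry):
    return id(entry) in emitted_ids
```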
Ok, so the decoder is done and seems like it works. Encoder time. This is a little harder. Big problem is repeated headers. If we have a unique header (i.e. name-value pair) then either it's in the reference set and we don't encode it, or it's not and we do encode it (and add it to the reference set). If we have a header that's repeated, each instance following the first needs to be encoded as two headers, one removing from the reference set and the other re-adding it (and emitting it again). Some examples:
Encoding this in a way that doesn't special-case is tricky, and I just don't think I can do it in a way that I really like. The problem is that searching the header set for all repeated headers every single time seems wildly inefficient. Is it? Not sure. |
It's not inefficient if we use the Counter data structure! This is an awesome brainwave. =D |
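The comment above does not spell out how `Counter` would be used, so the following is only one plausible reading: count every (name, value) pair in a single pass, then look up the repeats. The function name `repeated_headers` is hypothetical.

```python
from collections import Counter


def repeated_headers(headers):
    """Return {header: count} for every (name, value) pair seen more than once."""
    counts = Counter(headers)  # one pass over the header set
    return {header: n for header, n in counts.items() if n > 1}


headers = [
    ("cookie", "a=b"),
    ("accept", "*/*"),
    ("cookie", "a=b"),
    ("cookie", "a=b"),
]
repeats = repeated_headers(headers)
```

This avoids rescanning the whole header set for each header, since the counts are built once and each lookup afterwards is a dictionary access.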
This rewrite was required because certain assumptions in the original HPACK implementation turned out to be untrue. Unfortunately, this rewrite adds substantial complexity to much of this code. See GitHub issue #50 for more details.
I ended up going a totally different route with this, using the ability of HPACK to be streamed. It involved a pretty gnarly rewrite, the sum-total of which can be seen here. This passes all my tests including the fixed up HPACK integration tests. Time for me to go back and re-open http2jp/hpack-test-case#13 to confirm that |
Definitely fixed. |
As seen in http2jp/hpack-test-case#13, there's a very specific edge-case behaviour where the HPACK encoder incorrectly handles repeated headers. Specifically, if a header set contains two identical headers (both key and value) that are already in the reference set, we won't emit any headers, causing the output to contain only a single instance of that repeated header.
The nicest fix here is actually likely to be to fix up #36: repeated identical headers will therefore be concatenated together and will look different to the HPACK internals.