
QPACK improvement: wrap absolute index values #1644

Closed
dtikhonov opened this issue Aug 9, 2018 · 19 comments
Assignees
Labels
-qpack design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@dtikhonov
Member

What follows is a modified version of the original proposal on the mailing list. It includes changes to address a few problems identified by @martinthomson.

Abstract

This proposal describes the practical and theoretical problems caused by the unbounded absolute index values in the current QPACK Internet-Draft and proposes limiting the range of absolute index values by wrapping them.

Background

The current QPACK draft states that

[the] first entry inserted has an absolute index of "1"; indices increase sequentially with each insertion.

No upper limit is specified. This is reasonable, as it is impossible to run out of numbers.

Nevertheless, an absolute index whose value is ever-increasing creates significant problems.

Problem 1: Compression Impact

A large absolute index value has a negative effect on compression. The unadjusted value of the
absolute index is used in the Largest Reference field in the Header Data Prefix.

Larger values mean that more bytes are necessary to encode them. For example, using 8-bit prefix integer encoding, values larger than 254 encode to 2 bytes, larger than 382 to 3 bytes, and larger than 16638 to 4 bytes.

Relatively quickly, the overhead associated with communicating the absolute index doubles and then triples.
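These byte boundaries follow directly from the prefix-integer encoding; a small sketch (the function name is illustrative) makes them easy to check:

```python
def prefix_int_len(value: int, prefix_bits: int = 8) -> int:
    """Bytes needed to encode `value` as an HPACK/QPACK prefix integer
    with a `prefix_bits`-bit prefix."""
    limit = (1 << prefix_bits) - 1  # 255 for an 8-bit prefix
    if value < limit:
        return 1                    # fits entirely in the prefix
    value -= limit
    nbytes = 1                      # the prefix byte itself
    while value >= 128:             # continuation bytes carry 7 bits each
        value >>= 7
        nbytes += 1
    return nbytes + 1               # final continuation byte
```

For example, `prefix_int_len(254)` is 1, `prefix_int_len(255)` is 2, and `prefix_int_len(16639)` is 4, matching the breakpoints above.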

Problem 2: Theoretical Concerns

Why should compression at t(1) be worse than at t(0)? An ability to operate at the optimum level in perpetuity is a good property for a protocol to have.

Problem 3: Implementation Efficiency

A QPACK implementation has to place some limits on the value of the absolute index -- simply because it has to pick an integral type to store it. For example, uint64_t is likely large enough for all uses of the protocol, but it is wider than uint32_t and may be more expensive to process. On other platforms, uint16_t may be the fastest integer; the unbounded nature of the absolute index would rule out its use.

Such a limit also requires an extra check to stop inserting entries into the dynamic table once the maximum value of the integer is reached. (One can run out of numbers in practice...)

Solution: Have the Absolute Index Wrap Around

The main insight of the proposal is that the dynamic table has a bounded number of entries, that both the encoder and the decoder know what this number is, and that each can use this information to place an absolute index value communicated by its peer at the correct position in the sequence.

The dynamic table has a maximum size which is specified at the beginning of the connection and which never changes afterwards. The current size of the table is the sum of all its entry sizes. The entry size is the fixed overhead plus the lengths of the name and value strings. It is this fixed entry overhead that places a hard limit on the number of entries in the dynamic table:

    MaxEntries = MaxTableSize / FixedEntryOverhead

For example, using the current QPACK defaults, this number is

    MaxEntries = 4096 / 32 = 128
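As a sketch, the calculation is plain integer (floor) division; parameter names are illustrative, and 32 is the QPACK fixed entry overhead:

```python
def max_entries(max_table_size: int, fixed_entry_overhead: int = 32) -> int:
    """Upper bound on the number of entries the dynamic table can hold.

    Integer division gives the floor, since a partial entry cannot fit.
    """
    return max_table_size // fixed_entry_overhead
```

With the defaults, `max_entries(4096)` yields 128.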

Imagine an encoder or decoder whose largest absolute index value (referred to as the total number of insertions in the QPACK draft) is X. The smallest valid absolute index value its peer could possibly communicate to it is X - MaxEntries + 1.

The value of the largest possible absolute index value at a certain point in time cannot be known in the current specification of QPACK. This is because the encoder is allowed to insert an unlimited number of entries into the dynamic table -- for example, speculative updates -- as long as their eviction is not blocked. Nevertheless, such behavior is unlikely to be useful in practice. The proposal is to limit the number of unacked insertions into the dynamic table to a multiple of MaxEntries. Let's call this number CeilMul.

Given the two limits above, we can wrap the absolute index value without loss of information. The proposal is to limit the absolute index values to the range

    [ 1, MaxEntries * (1 + CeilMul) ]

Continuing with the QPACK defaults example above and using CeilMul = 1, the sequence would go

    1, 2, 3, ... 254, 255, 256, 1, 2, ... (and so on).
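The wrapping can be sketched as modular arithmetic on the 1-based insertion count (the constants follow the example above; the function name is illustrative):

```python
MAX_ENTRIES = 128               # QPACK defaults: 4096 / 32
INDEX_RANGE = 2 * MAX_ENTRIES   # MaxEntries * (1 + CeilMul), with CeilMul = 1

def wrapped_abs_index(total_insertions: int) -> int:
    """Map the 1-based total number of insertions onto [1, INDEX_RANGE]."""
    return (total_insertions - 1) % INDEX_RANGE + 1
```

Insertion 256 maps to 256, and insertion 257 wraps back to 1, reproducing the sequence above.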

The compression overhead cost is now a function of the maximum table size (and not a function of time as before).

To continue with our example, out of 256 possible values, 254 encode to 1 byte and 2 to two bytes. A 64 KB table results in a maximum encoding of 3 bytes, while a 1 MB table has a maximum encoding of 4 bytes.

Changes to the Draft

The following changes are necessary:

  1. The definition of the absolute index changes: now its values are calculated using modular arithmetic.
  2. The window size calculation based on the maximum dynamic table size needs to be added.

Conclusion

Limiting the range of values the absolute index can have improves compression and implementation efficiency and removes the time penalty from the QPACK protocol. A mechanism to do just that using existing QPACK parameters is hereby proposed.

@MikeBishop
Contributor

I don't disagree with the goals or the general mechanism. However, given that the max table size can change unilaterally over the course of a connection, it seems problematic to use that to determine the wrapping point. It needs to be something relatively stable over the lifetime of the connection, and that probably means the setting which bounds the max table size.

Even that can change on a 0-RTT connection when the SETTINGS frame is received, but there's a simple way around that (assuming #1641). You tentatively calculate MaxEntries based on the remembered value, and prohibit sending more than MaxEntries entries prior to seeing the SETTINGS frame. Once you see the SETTINGS frame, you know where the wrap point actually is and you can wrap if you need to. For any reasonable value of the bound, that prohibition shouldn't be an issue.

@dtikhonov
Member Author

It needs to be something relatively stable over the lifetime of the connection, and that probably means the setting which bounds the max table size.

That's what I meant by the sentence "The dynamic table has a maximum size which is specified at the beginning of the connection and which never changes afterwards."

You tentatively calculate MaxEntries based on the remembered value, and prohibit sending more than MaxEntries entries prior to seeing the SETTINGS frame. Once you see the SETTINGS frame, you know where the wrap point actually is and you can wrap if you need to. For any reasonable value of the bound, that prohibition shouldn't be an issue.

That's a good idea. You can actually send up to MaxEntries * CeilMul (whatever CeilMul is set to in the end) until you see the SETTINGS frame.

@MikeBishop
Contributor

Okay, just terminology, then. Yes, I agree you can send as much as you're able to without wrapping, and artificially expanding the index range gives you more room at the potential price of bits in part of the sequence.

It seems like the more efficient way to do that would be to:

  • Calculate MaxEntries
  • Figure out how many bytes are required to encode MaxEntries
  • Wrap at the maximum number encodable in that number of bytes

It gets a less-fixed expansion, but the expansion is always free. MaxEntries as defined will already always be enough items to overflow the table, so I'm not particularly inclined to be much more aggressive than that. However, that's a little more obtuse to specify.

@dtikhonov
Member Author

Why do you think that the range can be as small as MaxEntries? The proposal has it as at least twice that.

I think the price of expansion to the maximum encodable number can be steep. If the window is 256, then the maximum encodable value in two bytes is 382. When the window is 256, the average length of the encoded number is (254 * 1 + 2 * 2) / 256 = 1.008. When you expand the window to 382, that number grows to (254 * 1 + (382 - 254) * 2) / 382 = 1.335.
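The averages above can be sketched with an illustrative helper (this sketch only covers windows up to 382, i.e. the one- and two-byte cases):

```python
def avg_encoded_len(window: int) -> float:
    """Average bytes needed to encode the values 1..window as an 8-bit
    prefix integer; values up to 254 take one byte, 255..382 take two."""
    one_byte = min(window, 254)       # values encodable in a single byte
    two_byte = window - one_byte      # remainder needs a continuation byte
    return (one_byte + 2 * two_byte) / window
```

This reproduces 1.008 for a window of 256 and 1.335 for a window of 382.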

@MikeBishop
Contributor

MikeBishop commented Aug 9, 2018

I think doubling-or-more is attempting to guard against a situation where X was valid pre-wrap but about to expire, then enough entries are added that X is valid again post-wrap, and there's confusion about whether a new frame refers to the old or new one, correct? If so, the existing prohibition on evicting entries which still have references should ensure that there's no confusion, and the fact that real entry sizes will produce more of a gap in real usage further insulates us from the problem. If it's intended to prevent a different issue, please clarify.

Your math about how much it costs is correct, but somewhat misconstrues the idea -- the window is being expanded to the rest of the byte instead of being doubled or more, not expanded after doubling. For smaller table sizes, that's more conservative.

So for example, modulo any off-by-one errors in counting:

| MaxEntries | Expansion | Doubled |
| --- | --- | --- |
| 128 | [0..254) = 1 byte | (254 * 1 + 2 * 2) / 256 = 1.008 bytes |
| 256 | [0..382) = (254 * 1 + 128 * 2) / 382 = 1.335 bytes | [0..512) = (254 * 1 + 128 * 2 + 130 * 3) / 512 = 1.76 bytes |
| 2000 | [0..16639) = (254 * 1 + 128 * 2 + 16257 * 3) / 16639 = 2.96 bytes | [0..4000) = (254 * 1 + 128 * 2 + 3618 * 3) / 4000 = 2.81 bytes |
| 16000 | [0..16639) = (254 * 1 + 128 * 2 + 16257 * 3) / 16639 = 2.96 bytes | [0..32000) = (254 * 1 + 128 * 2 + 16257 * 3 + 15361 * 4) / 32000 = 3.46 bytes |

However, you're right that it becomes more tilted to the largest possible encoding once you pass two bytes, so that's not ideal for very large table sizes. Effectively it's a step function rather than a smooth increase, so it will be worse for things that just cross the step.

@dtikhonov
Member Author

I think doubling-or-more is attempting to guard against a situation where X was valid pre-wrap but about to expire, then enough entries are added that X is valid again post-wrap, and there's confusion about whether a new frame refers to the old or new one, correct? If so,

Almost, but not quite (or maybe we are talking about the same thing -- in that case, apologies): The particular values of the index -- pre-wrap or post-wrap -- do not really matter, as what matters is where they are relative to the position of the latest entry -- either to the left of it or to the right of it.

the existing prohibition on evicting entries which still have references should ensure that there's no confusion

Please note that a dynamic entry may have no references (e.g. a speculative update). The encoder is currently allowed to insert an unlimited number of entries into the table -- and evict them -- if no headers reference them.

The purpose of making the window larger than MaxEntries is to provide low and high boundaries. Since the numbers wrap, the decoder must be able to classify an incoming Largest Reference as something that it is supposed to possess (the lower bound, or past) or as a future reference that the header block should be blocked on (the upper bound).

The low bound is the absolute index of the decoder's newest entry in the dynamic table minus MaxEntries plus one. A difference of MaxEntries or more would indicate that the encoder has inserted more than MaxEntries into the dynamic table, which is an error.

Thus, the window must be at least MaxEntries to cover entries that may already exist. Because the index wraps, we need more room for the "future" entries -- those on which the header block will block. The proposal is to extend the window by another MaxEntries and place additional restrictions on the encoder to guarantee that no more than that number of entries be unacknowledged.

The full range of values can be separated into two broad sections, Past and Future:

  1                                                                       MaxAbsIndexId

  |                Past                             |              Future             |
  |  Error area     |   Current Dynamic Table     X |                                 |

                                                  ^
                                                  |
                                       Latest Entry

(Absolute index value increases from left to right and then wraps around. The diagram has X = MaxEntries for convenience.)

After classifying the incoming Largest Reference as Past or Future, the decoder either:

  • Marks the header block as blocked and parks the stream (Future);
  • Processes the header block (Past, with Largest Reference pointing to a valid table entry); or
  • Produces an error if the Largest Reference points to an already-evicted entry.

Reserving MaxEntries for the lower bound and a healthy number of entries for the future entries makes for rules that are easy to reason about. This gives us a theoretical guarantee, so to speak, without resorting to arguments such as "in practice, entries are larger than the fixed overhead." A smaller window, I am afraid, would require much more complicated rules.
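The Past/Future classification can be sketched with ring-distance arithmetic. This is an illustrative sketch, not the draft's algorithm; the table lookup that distinguishes a valid past reference from an already-evicted one is omitted:

```python
MAX_ENTRIES = 128
INDEX_RANGE = 2 * MAX_ENTRIES

def classify(largest_ref: int, latest: int) -> str:
    """Classify a wrapped Largest Reference relative to the decoder's
    newest entry; both arguments are wrapped indices in [1, INDEX_RANGE]."""
    # Distance from largest_ref forward around the ring to the latest entry
    behind = (latest - largest_ref) % INDEX_RANGE
    if behind < MAX_ENTRIES:
        return "past"    # within the table window (evicted entries are errors)
    return "future"      # block the header block until the entry arrives
```

With `latest = 10`, a reference to 11 is the nearest future entry, while 139 (128 positions back around the ring) is the oldest possible past entry.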

@MikeBishop
Contributor

Okay, that helps a lot. I see where you're coming from.

My perspective was that MaxEntries is an absolute bound on the size of the table, so if entries are off the end of the table (the "Error area" in your diagram) the correct interpretation would be that they're in the future. That always allows some room to insert without acknowledgement, because of the difference between real entries and the zero-size entries used to calculate MaxEntries. I don't think it leads to confusion since you have to reference the table on a stream to have anything to get confused by 😉 and once you've referenced an entry you have a new pin in how far the table can advance without acknowledgement.

In any scheme, you can't reuse a slot until the eviction of the item which previously occupied it has been acknowledged. If the window is 2*MaxEntries or more, that's at least a full turnover of the table and the limit on unacknowledged entries automatically implies that requirement. If the window is MaxEntries, you have to explicitly track that dependency which will create the limit on how many unacknowledged entries can exist. I think either is easy enough to reason about, but yours tracks a single value rather than a ring buffer of references, which is a decided advantage.

Okay -- I'm sufficiently convinced.

@martinthomson
Member

I'm also convinced and would like to see a PR.

CeilMul can be 1. I don't see any point in entertaining anything less (too complicated) and anything more is bananas: you are creating a header block that is blocked for >1 RTT if that arrives.

I think that we should work with 0-based indexing. HPACK having 1-based indexing was awful and we managed to get rid of that by tweaking the opcodes in QPACK.

Make sure that you include floor() in your calculation of MaxEntries.

I like Mike's addendum about 0-RTT. We will need to be careful about that. And implementations will need to test that specially. Of course, for the table to wrap, you need to have evictions in 0-RTT, which won't happen because that takes round trips (at least two, with CeilMul at 1, I believe). If you have round trips in your 0-RTT, something is badly wrong.

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -qpack labels Aug 10, 2018
@dtikhonov
Member Author

dtikhonov commented Aug 10, 2018

I am glad that you approve the proposal, @MikeBishop and @martinthomson . I will create a PR.

I agree: setting CeilMul to 1 is the best choice. It gives the index range the property of being exactly twice the size of the maximum possible number of entries in the dynamic table. This permits some immediate optimizations in the implementation.

I also agree that beginning the dynamic table at index 0 is better, as this is how most of us index arrays (C, C++, Go, etc.). Should this be a separate PR -- assuming that @afrind, @krasic, and @MikeBishop agree as well? Also, shall we start the static table with 0 for consistency's sake?

The remarks regarding floor() and the 0-RTT scenario have been noted.

@MikeBishop
Contributor

No one's approval is needed to create the PR. 😉

1-based indexing shows up wherever a sentinel is needed at 0, basically. We managed to get rid of the sentinel requirement for table references (the static table actually is 0-based in QPACK, there's just nothing in slot 0 until I stop being lazy and get #1355 ready to go). However, for absolute indices, 0 is currently the sentinel for "no dynamic references in this packet." If we want to go back to using a flag for that, or if you have some other way to communicate that, there's no reason it ought to be 1-based.

@dtikhonov
Member Author

0 is currently the sentinel for "no dynamic references in this packet." If we want to go back to using a flag for that, or if you have some other way to communicate that, there's no reason it ought to be 1-based.

That's pretty easy: 0 means "no dynamic references", non-zero means "decode the value and subtract 1."
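That mapping can be sketched directly (names are illustrative):

```python
from typing import Optional

def encode_largest_ref(largest_ref: Optional[int]) -> int:
    """None means "no dynamic references"; 0-based indices shift up by one."""
    return 0 if largest_ref is None else largest_ref + 1

def decode_largest_ref(wire_value: int) -> Optional[int]:
    """Inverse mapping: 0 is the sentinel, anything else decodes minus one."""
    return None if wire_value == 0 else wire_value - 1
```

The round trip is lossless: index 0 travels as 1, and the sentinel 0 stays distinct.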

@MikeBishop
Contributor

...which is functionally what we have now, just with different wording to describe it. As far as I'm concerned, that's purely editorial semantics, so whatever @afrind wants is fine.

@martinthomson
Member

I'm completely comfortable with largest reference starting at 1. Because I use len(table) there; that is, the number of entries in the table. In other words, "you need N entries available before you can use this header block"; as opposed to "the index of the last entry referenced in this header block is N". It's better than trying to work out how to signal the special value.

@dtikhonov
Member Author

In other words, "you need N entries available before you can use this header block"; as opposed to "the index of the last entry referenced in this header block is N".

I don't see how this could work. Imagine a full table -- then, on average, one insertion results in one eviction. The number of entries stays about the same. This number does not tell you anything useful. You need the actual index value.

@martinthomson
Member

The number counts insertions without regard for evictions that might happen. It's no different in practice to the current number, but is simply moving the goalpost.

The other way of thinking of this is as follows:

0 -----
   header: value
1 -----
   other: 1
2 -----   <<< in 0-based indexing, this refers to the start of :host
   :host: example.com
3 -----   <<< but with our indexing scheme we work backwards
          <<<  so it makes sense to identify the end of the entry instead

That's consistent.

@dtikhonov
Member Author

I am still lost. Is this number the absolute index of the next entry to be inserted, then?

In that case, you will still need signaling -- this is because zero is a valid index value. 0 cannot both mean "no references" and "reference to entry MaxEntries * 2 - 1."

@martinthomson martinthomson added the needs-discussion An issue that needs more discussion before we can resolve it. label Sep 14, 2018
@afrind
Contributor

afrind commented Sep 14, 2018

I implemented the simplest form of the PR -- just encoding the LR value mod MaxEntries and reconstructing the value in the decoder. It shows a savings of 148 bytes (0.26%) on the wire when encoding a browser session with a few hundred requests. It took ~20 lines of code in the encoder and decoder to make it work, including handling a new error case (encoded LR > MaxEntries * 2 + 1).
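A rough sketch of that mod-and-reconstruct scheme, assuming a wrap range of 2 * MaxEntries; the names, constants, and error handling here are illustrative, not the exact PR text:

```python
MAX_ENTRIES = 128
FULL_RANGE = 2 * MAX_ENTRIES

def encode_insert_count(req: int) -> int:
    """Wrap the value for the wire; 0 stays the "no references" sentinel."""
    return 0 if req == 0 else req % FULL_RANGE + 1

def decode_insert_count(encoded: int, total_inserts: int) -> int:
    """Reconstruct the unwrapped value using the decoder's own insert count."""
    if encoded == 0:
        return 0
    if encoded > FULL_RANGE:
        raise ValueError("encoded value out of range")
    max_value = total_inserts + MAX_ENTRIES   # furthest valid future reference
    max_wrapped = (max_value // FULL_RANGE) * FULL_RANGE
    req = max_wrapped + encoded - 1
    if req > max_value:                       # overshot: step back one wrap
        if req <= FULL_RANGE:
            raise ValueError("invalid reference")
        req -= FULL_RANGE
    if req == 0:
        raise ValueError("invalid reference")
    return req
```

For instance, with 300 total insertions at the decoder, a wrapped reference to insertion 260 or to the future insertion 400 both reconstruct unambiguously.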

@martinthomson martinthomson removed the needs-discussion An issue that needs more discussion before we can resolve it. label Sep 15, 2018
@afrind afrind assigned afrind and unassigned afrind Sep 17, 2018
afrind added a commit that referenced this issue Sep 18, 2018
@afrind
Contributor

afrind commented Sep 27, 2018

Discussed this at the NYC interim considering three outcomes:

  1. Do nothing
  2. Encode LR mod 2*MaxEntries
  3. Redefine Absolute Indices to wrap at 2*MaxEntries

We took a hum to gauge consensus around doing something or doing nothing, and the room was split.

I'm planning to move forward with option 2. Please review and comment on #1763. Note that the concepts from option 3 are not precluded and I'd be ok including text describing an implementation that wrapped all absolute indexes internally.

@dtikhonov
Member Author

We took a hum to gauge consensus around doing something or doing nothing, and the room was split.

...with the outcome that we do something -- option 2 or option 3 -- as interpreted by the chairs.

afrind added a commit that referenced this issue Oct 2, 2018
* QPACK: Encode Largest Reference modulo MaxEntries

Fixes #1644

* s/MUST not/MUST NOT/

Update decoder algorithm so that it actually works

* Fix another edge case

When CurrentWrapped + MaxEntries == LargestReference, the encoder is a full table ahead of the decoder, and they have wrapped an equal number of times.
@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019