
Window Size for Shared Dictionary Content Encodings #2754

Closed
nidhijaju opened this issue Mar 8, 2024 · 17 comments · Fixed by #2769 or #2780


nidhijaju commented Mar 8, 2024

Context: https://datatracker.ietf.org/doc/draft-jaju-httpbis-zstd-window-size/ aims to clarify the window size limit for the zstd Content-Encoding token, as it wasn't clear in RFC 8878.

Since the Compression Dictionary Transport draft hasn't been published yet, we should also add some window size considerations to it for br-d and zstd-d to maintain interoperability.
I think in Chrome we ended up using max(dictionary size, 8 MB).


pmeenan commented Mar 9, 2024

@eustas could you comment on how this impacts Brotli dictionary decode? Is the dictionary independent from the compression window (so we can adopt the same defaults as br), or do we need to specify something relative to the size of the dictionary?

@Cyan4973, @felixhandte for ZStandard, to allow for delta-encoding large resources, is max(dictionary size, 8 MB) reasonable or should it be dictionary size + 8 MB?

We have use cases for delta-encoding wasm applications where the dictionary can easily be upwards of 50 MB.


Cyan4973 commented Mar 9, 2024

For Zstandard, the specification states that the entirety of the dictionary is accessible as long as the input is <= Window Size.

Therefore, to give an extreme case, it's possible to have a 50 MB dictionary that is entirely accessible even if the Window Size is only 1 KB, but only as long as the data to de/compress is <= 1 KB. As soon as we reach 1 KB + 1, references into the dictionary are no longer valid.

For delta compression, this has consequences. Presuming the reference is ~50 MB, and the updated content is also ~50 MB, one would need a window size of ~50 MB to ensure access into the dictionary during the entire reconstruction process.
But if the updated content is just a bit bigger, say ~51 MB, a ~50 MB window size would allow delta operation only during the first ~50 MB, the last ~1 MB would have to be compressed without dictionary.

It follows that the size of the dictionary is irrelevant for the window size. What matters is for how long one wants to access the dictionary while rebuilding the content. As a general approximation, one wants the dictionary available during the entire reconstruction process, so the window size is effectively the size of the content to rebuild.

Zstandard has a "windowless" mode which states exactly that: the size of the "window" is the size of the content. For this to work, one has to set the size of the content at the beginning of the compression process.
Setting the content size is either an automatic operation when data is provided entirely upfront (ZSTD_compress2()), or, when data is provided in small increments (ZSTD_compressStream()), it must be stated explicitly at the beginning (ZSTD_CCtx_setPledgedSrcSize()).
libzstd will automatically trigger the "windowless" mode when the declared content size is smaller than the set Window size (effectively reducing the memory requirement for the decoder side). I'm not sure if there is another way to trigger it.
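
For illustration, a minimal sketch of that flow using the stable libzstd streaming API (the function name, buffers, and error handling here are mine, not from this thread): the old resource is referenced as a raw-content prefix, and the payload size is pledged upfront so the library can use the content size as the effective window.

```c
#include <zstd.h>

/* Sketch: delta-compress `src` against the old resource `dict`,
 * pledging the content size before streaming starts so libzstd can
 * trigger the "windowless" mode described above when the content fits. */
size_t delta_compress(void* dst, size_t dstCapacity,
                      const void* src, size_t srcSize,
                      const void* dict, size_t dictSize)
{
    ZSTD_CCtx* cctx = ZSTD_createCCtx();

    /* Reference the old resource as a raw-content prefix (delta mode). */
    ZSTD_CCtx_refPrefix(cctx, dict, dictSize);

    /* Declare the payload size upfront; ZSTD_compress2() would set this
     * automatically since it receives the whole input at once. */
    ZSTD_CCtx_setPledgedSrcSize(cctx, srcSize);

    ZSTD_outBuffer out = { dst, dstCapacity, 0 };
    ZSTD_inBuffer  in  = { src, srcSize, 0 };
    size_t const remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    ZSTD_freeCCtx(cctx);

    /* 0 means the frame is complete; anything else is treated as failure. */
    return (ZSTD_isError(remaining) || remaining != 0) ? (size_t)-1 : out.pos;
}
```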

Note that the decoder will be informed of the Window Size requirement by reading the frame header, and may decide to not proceed if this value is deemed too large. For example, zstd has a default 128 MB limit, and will refuse to decompress payloads requiring more than that, unless explicitly authorized to employ more memory (--memory=).

Just for reference, zstd offers a "large delta compression" mode with the command --patch-from=. It has made us somewhat familiar with some of these issues. In situations where the amount of data to compress and compare is very large, we now prefer to mmap both the dictionary content and the destination file. It has proven useful to reduce memory pressure.


pmeenan commented Mar 9, 2024

Thanks. That makes things a bit more complicated for the case where we want to require a minimum window size that the server can trust all clients advertising the br-d content-encoding will be able to decode.

For the dictionary case it might be cleanest to use the same 128MB limit for window sizes as the default zstd limit (knowing that the client won't allocate that much unless it actually gets a stream that uses it).


eustas commented Mar 25, 2024

Hello.

The shared dictionary is placed in the address space between the backward-reference window and the static dictionary. In other words, it does not interfere with the normal compression window, but using it makes "static dictionary" references more expensive. We decided to order it that way because the shared dictionary works almost the same way as an extra window.

Sorry for the late response.

Best regards,
Eugene.


eustas commented Mar 25, 2024

Extra note.

As you can see, brotli allows backward distances beyond the window. IIRC, regular (as opposed to large-window) brotli allows distances up to ~56 MiB, i.e. larger shared dictionaries would not be useful (== referenced).


pmeenan commented Mar 26, 2024

Thanks. For the dictionaries to be usable for a broad set of apps (including large wasm bundles), I'm considering spec'ing:

- Brotli: regular 16 MB window, allowing for delta compression of resources up to ~50 MB.
- ZStandard: 128 MB window, allowing for delta compression of resources up to around that size, depending on where the duplicated bytes are in the updated resource.

The main difference with regular content-encoding is on the ZStandard side where the window is 8MB.

For platforms that are resource constrained and are not willing to support a 128MB window they would be limited to Brotli (and then decide how large of a dictionary they are willing to keep and use).

Another option on the ZStandard side would be to allow for a maximum window that is a multiple of the dictionary size (i.e. 8 MB or 2 × the dictionary size, whichever is larger, with a 128 MB max). That way the client can have an upper limit that it knows in advance, before advertising zstd-d as being available. That adds complexity to the encoding side though.

We have already seen cases where people wanted to delta-compress resources > 50 MB (wasm apps) and had to use ZStandard instead of Brotli because of the Brotli limits, so I don't think a small limit like 8 MB for ZStandard would work for this use case.

@martinthomson

That massive resources exist is not sufficient justification to build a system that will not be available to some users. I would prefer a far smaller default, if only to encourage the use of smaller resources overall.

Defining a separate codepoint for massive windows is possible, say zstd-bloated-d. The scalable option also seems reasonable. Clients can then choose whether to offer dictionaries when they would exceed available resources.


pmeenan commented Mar 27, 2024

I'm ok using the same 8 MB window (-wlog=23, I believe) that we use for zstd for zstd-d, and noting that the effectiveness of delta compression for responses > 8 MB using zstd-d may be limited. The scalable option concerns me: people are likely to get it wrong, it won't be obvious, and decompression will start to fail as the dictionary and resource sizes change.

@eustas does that sound about right?

@felixhandte

The alternative floated, where zstd-d's window size is derived from the size of the dictionary, is much more attractive to me. (That is, cap the window size log at ceil(log_2(clamp(8 MB, N * dictionary_size, 128 MB))), where the spec picks an N somewhere in the range [1, 2].)

In such a scheme, where the window size is deterministically derived from the size of the dictionary (a resource the client must already have access to), the client is free to decline to advertise support for zstd-d for dictionaries which would require the use of a window larger than the client is willing to support. I.e., in resource-constrained environments, Chrome could avoid having to allocate more than 8 MB windows for zstd-d by not advertising support for zstd-d on requests where the resolved available dictionary is larger than 8 MB.

Put another way, given that the client will have the tools to make these choices for themselves, I would rather give the client discretion here rather than just bake a flat restriction into the spec.
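
For concreteness, a sketch of that derivation (illustrative only; the helper name and N = 2 are placeholders, since the spec would pick an N in [1, 2]):

```c
#include <stdint.h>

/* Illustrative window-log cap per the formula above:
 * ceil(log2(clamp(8 MB, N * dictionary_size, 128 MB))), with N = 2. */
static int zstd_d_max_window_log(uint64_t dictSize)
{
    const uint64_t lo = 8ULL << 20;    /* 8 MB floor  */
    const uint64_t hi = 128ULL << 20;  /* 128 MB cap  */
    uint64_t target = dictSize * 2;    /* N = 2 is an assumption for the sketch */
    if (target < lo) target = lo;
    if (target > hi) target = hi;

    int wlog = 0;                      /* ceil(log2(target)) */
    while ((1ULL << wlog) < target) wlog++;
    return wlog;
}
```

With a rule like this, the client can recompute the cap from the dictionary it already holds and simply decline to advertise zstd-d when the result exceeds what it is willing to allocate.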

pmeenan reopened this Apr 11, 2024

pmeenan commented Apr 11, 2024

@felixhandte would you be supportive of updating the zstd command-line tooling to add an option that would set the window to the larger of 8MB or the dictionary size (i.e. match the content-encoding behavior)?

My main concern is that it is easy to document and get people to use a fixed command line for an 8 MB window, but if it involves a more complicated algorithm, it is likely to break in subtle ways that they may not notice until the resources are bigger than the dictionary (and bigger than 8 MB).

@felixhandte

@pmeenan, yeah, that would be pretty straightforward. Happy to add that.


pmeenan commented Apr 17, 2024

If we end up creating a new file/stream format that bundles the dictionary hash with the compressed stream, we can make sure the tooling for that file format handles the window size for ZStandard automatically (my main concern is unexpected breakage from different server and client window requirements that only triggers when resources grow).

The language will be a bit complex but will probably have to look something like "a stream of bytes compressed using the Zstandard protocol with an external dictionary, with a window size of not more than the larger of 8 MB and the size of the dictionary".


Cyan4973 commented Apr 17, 2024

If I may chime in with some tactical considerations:

In an update scenario, it's common for a new resource to be slightly larger than the old one it replaces. The expectation is that we generally tend to add stuff over time.

A formulation which caps the window size to the size of the dictionary would introduce problems for all these update cases, which are expected to be the majority of update scenarios.

On the other hand, we don't want this increase in size to be completely uncapped, in order to keep control of memory size. And if the new resource is really much larger than the old one, the benefits of delta compression are dubious anyway.

So the suggestion is to authorize a window size which is slightly larger than the external dictionary size.
As to how much larger exactly, a suggestion here is +25% (×1.25).

Detailed explanation:

I would expect a large majority of update scenarios to fit into this picture, with the new resource being slightly larger than the old one by a few %.

For the less common cases where the new resource is much larger than the old one, Zstandard could still employ the dictionary, but only for the beginning of decompression. The compression process will have to select a window size which remains under the specified limit (×1.25 the external dictionary size).

As a technical insight, note that a byte-accurate window size is only possible when the window is the entire decompressed content. Otherwise, the window size is encoded in a condensed format, which can only represent increments of 1/8th of a power of 2.
This is not a problem for determining the "largest window size under the specified limit", but due to this limited accuracy, you can already guess what the problem would be if we were to keep the formulation "not more than the larger of 8 MB and the size of the dictionary".

Imagine that the external dictionary size is 9 MB - 1 bytes, and the new resource to delta-compress is 9 MB, i.e. 1 byte larger than the dictionary. With the initial formulation, since the window size is limited to the size of the dictionary, the compressor cannot use 9 MB as a window size, because it's 1 byte larger than the limit. So it will have to find the next best window size under the limit. Due to granularity limitations, this next best window size is 8 MB.
Delta compression would then proceed fine for the first 8 MB, but for the last 1 MB, data would have to be compressed without the benefit of the dictionary.

With the newly proposed limit of ×1.25 the dictionary size, delta compression would remain active in the above scenario during the entire decompression process.
And it would remain fully active if the new resource to decompress is 10 MB, 11 MB or 11.24 MB.
But if the new resource is 11.25 MB or larger, then the compressor will automatically limit the window size to 11 MB (the largest representable window size under the limit, which in this particular instance is 9437183 × 1.25 ≈ 11796479 bytes, i.e. ~11.25 MB), achieving the wanted memory size control.
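
For readers who want to check these numbers, here is a small standalone sketch (mine, not libzstd code) of the window sizes representable in a zstd frame header per RFC 8878's Window_Descriptor: each size is a power of 2 plus a 0-7 eighths increment of that power. It reproduces the 8 MB and 11 MB results above.

```c
#include <stdint.h>
#include <stdio.h>

/* Per RFC 8878, Window_Size = 2^(10+Exponent) + Mantissa * 2^(10+Exponent)/8,
 * i.e. increments of 1/8th of a power of 2. Return the largest
 * representable size that does not exceed `limit`. */
static uint64_t largest_window_under(uint64_t limit)
{
    uint64_t best = 0;
    for (int exp = 0; exp <= 31; exp++) {          /* 5-bit exponent field */
        uint64_t base = 1ULL << (10 + exp);
        for (int man = 0; man <= 7; man++) {       /* 3-bit mantissa field */
            uint64_t ws = base + (base / 8) * (uint64_t)man;
            if (ws <= limit && ws > best) best = ws;
        }
    }
    return best;
}

int main(void)
{
    /* Limit = dictionary size of 9 MB - 1 bytes: best fit is 8 MB. */
    printf("%llu\n", (unsigned long long)largest_window_under(9437183));  /* 8388608 */
    /* Same dictionary with the x1.25 rule: best fit is 11 MB. */
    printf("%llu\n", (unsigned long long)largest_window_under(11796478)); /* 11534336 */
    return 0;
}
```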


pmeenan commented Apr 17, 2024

Since neither the server nor the client may know what the resource size will be when compression starts (say, in a streaming HTML case), won't the 1/8th power of 2 granularity apply even for cases where the resource is the same size as (or sometimes smaller than) the dictionary?

For resources slightly larger than the old one, the dictionary would still be used for the majority of the resource, but the last bit would fall back to regular window-based compression. I agree that having the dictionary available for the whole resource would provide better compression, but I'm not sure where to draw the line. Something like a 25% fudge factor can be quite a bit when you start talking about 50 MB+ resources that might be close to where the client wants to draw the line anyway. They may have allowed the dictionary at 50 MB but not allow its use at all if it needed 62 MB available for a window.

A similar case could be made for the non-update case where you have a custom-built dictionary but the resource being compressed exceeds 8 MB. Only the first 8 MB of the resource would be able to use the dictionary. Hopefully at that point the compression window would hold most of what was in the dictionary, but the dictionary becoming unavailable is something we should expect in some cases with ZStandard, even if we use a fudge factor.

Is there a way to convey the 1/8th power of two granularity on the window size or is it something that the ZStandard libraries will take care of automatically for users (i.e. setting the window size to 11 MB when the caller sets it to 11.24 MB)? I'm wondering how much of this implementors are going to have to deal with directly.


Cyan4973 commented Apr 18, 2024

Since neither the server nor the client may know what the resource size will be when compression starts (say, in a streaming HTML case), won't the 1/8th power of 2 granularity apply even for cases where the resource is the same size as (or sometimes smaller than) the dictionary?

It depends on whether the compression side (server) is informed upfront of the size of the payload to compress.
If not, then indeed you are right, the compressor will have to set a Window Size, and the granularity rule applies.

So for example, let's assume the payload to delta-compress is 50 MB, and the reference dictionary is also 50 MB. Let's then assume this implies a window size limit of 50 MB (the dictionary size) and that the compressor is not informed of the size of the payload. In that case, the compressor will have to set a Window Size, and it will have to settle for the largest representable window size under the limit, which happens to be 48 MB. Therefore, the dictionary won't be used during the compression of the last 2 MB of the payload.

With the proposed +25% tolerance rule, the limit would become 62.5 MB, and the largest representable window size under the limit is 60 MB. Now the dictionary remains active during the entire compression of the 50 MB payload.

When the data to decompress ends up being smaller than the window size, it's obviously not a problem for the decompression process: the final decompressed size is still properly reported at the end. It's just that the intermediate buffer allocation will be larger than it could have been.
Note that libzstd uses malloc for the allocation of buffers, so the address space is just reserved, and the physical memory may never be used if it's not needed (i.e. if there is no write action on the later part of the segment).

Something like a 25% fudge factor can be quite a bit when you start talking about 50 MB+ resources that might be close to where the client wants to draw the line anyway. They may have allowed the dictionary at 50 MB but not allow its use at all if it needed 62 MB available for a window.

+25% is more of a suggestion; this can certainly be discussed if some other limit feels better.
I liked that it's a clean (binary) size/4 factor, and that it's larger than 1/8th, which covers the scenario above and gives a little bit of breathing room if the newer payload is a bit larger than the older reference one.

A similar case could be made for the non-update case where you have a custom-built dictionary but the resource being compressed exceeds 8 MB.

I see this scenario as being very distinct from delta-compression.

For delta-compression, we expect large segments to be identical between the reference and the payload to compress, so we want this ability to access the reference during the entire compression of the payload. The price to pay is that the reference is very specific, and is not designed to offer any serious benefit to any other payload.

Custom dictionaries are different: they are supposed to carry the most relevant "commonalities" detected across a set of samples. Presuming a large enough sample set, the common components are not expected to be very large. In such a scenario, the custom dictionary is expected to be (relatively) lightweight compared to delta compression, likely in the <= 100 KB range.
Consequently, by the time the payload to compress reaches 1 MB, it should already contain all the references it needs to compress the rest of the file. I would not expect a large impact from the custom dictionary beyond that point, let alone beyond 8 MB.

Is there a way to convey the 1/8th power of two granularity on the window size or is it something that the ZStandard libraries will take care of automatically for users (i.e. setting the window size to 11 MB when the caller sets it to 11.24 MB)?

The plan would be for the library to do this adaptation automatically. The user would not have to know anything about it, though it's better if the user understands that the library selects the closest window size which is <= the limit, so that they are not surprised if, upon inspection, the final window size is not byte-identical to the set limit. Good API documentation will likely help reach this objective.

I'm wondering how much of this implementors are going to have to deal with directly.

If by implementors you mean "server-side programmers", they would have to know the implicit window size limit when a client requests the zstd-d format (essentially the rule you are currently defining). They would then set this limit as a compression parameter, and the library will take care of the adaptation into a window size.

@felixhandte

[...] won't the 1/8th power of 2 granularity apply even for cases where the resource is the same size as (or sometimes smaller than) the dictionary?

Yes. (Keeping in mind of course that most traffic in question will presumably be under the 8 MB floor and so none of this complexity will apply.)

I'm wondering how much of this implementors are going to have to deal with directly.

Right now the zstd library has limited controls for setting detailed window constraints. We'll need to add a new option to give users that fine-grained control. We can do that with an eye towards user simplicity.

Frankly, I'm also very open to adding a --rfcXXXX-window-mode flag (and an equivalent compression parameter in the library) that would just look at the size of the dictionary (which zstd necessarily has access to) and configure the window accordingly. In that case, users wouldn't have to do any math at all.


pmeenan commented Apr 25, 2024

OK, I changed the structure and language so that they are no longer streams of bytes compressed with a given algorithm but are now specifically Dictionary-Compressed Brotli and Dictionary-Compressed Zstandard.

The references for those named formats have the window restrictions (which for Zstandard includes the 1.25x dictionary requirement).

Currently the byte stream itself is still a native Brotli/Zstandard stream, just with specific settings but this also makes it cleaner if we decide to embed the hash in the response stream.

PR with the update is here: #2780
