
RFP: Zstandard Compression and the application/zstd Media Type #105

Closed
Jxck opened this issue Oct 5, 2018 · 29 comments
Labels
venue: IETF Specifications in IETF

Comments

@Jxck

Jxck commented Oct 5, 2018

Request for Mozilla Position on an Emerging Web Specification

  • Specification Title: Zstandard Compression and the application/zstd Media Type
  • Specification or proposal URL: https://tools.ietf.org/html/rfc8478
  • Caniuse.com URL (optional):
  • Bugzilla URL (optional):
  • Mozillians who can provide input (optional):

Other information

https://facebook.github.io/zstd/

@dbaron
Contributor

dbaron commented Oct 5, 2018

Is there a particular context in which you're interested in it being supported? As an HTTP Content-Encoding and Content-Transfer-Encoding? Maybe integration with transform streams once they exist? Other uses?

I think @indygreg has some familiarity with Zstandard, and maybe @ddragana or @martinthomson might have opinions?

@dbaron added the venue: IETF Specifications in IETF label Oct 5, 2018
@Jxck
Author

Jxck commented Oct 10, 2018

@dbaron I'm interested in supporting zstd as a Content-Encoding, like brotli.
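
To make that concrete, here is a minimal sketch (using the python-zstandard bindings discussed later in this thread) of the one-shot compression and decompression a server and client would perform around a hypothetical Content-Encoding: zstd exchange; the payload and level are invented for illustration:

```python
import zstandard

# Hypothetical response body a server would send with "Content-Encoding: zstd".
body = b'{"message": "hello", "ok": true}' * 100

# Server side: one-shot compression (level 3 is zstd's default).
compressed = zstandard.ZstdCompressor(level=3).compress(body)

# Client side: one-shot decompression after removing the content coding.
assert zstandard.ZstdDecompressor().decompress(compressed) == body
print(f"{len(body)} bytes -> {len(compressed)} bytes on the wire")
```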

@martinthomson
Member

I believe that the Facebook folks indicated that they didn't intend for this to be used on the web, on the basis that brotli was sufficiently performant. If that is indeed the case, this isn't that interesting. We'd probably want to see performance numbers that justified the costs (which would include changes to the HTTP/QUIC header compression static tables, if we were serious).

@indygreg

Caveat emptor: I'm not familiar with the implications of commenting on this matter and my words here reflect my personal opinion as someone familiar with the technology and not that of an official Mozilla position. I'm also not familiar with the nuances involved in making a decision to support zstandard in a web browser.

When I wrote https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/ in March 2017, my opinion would have been "zstandard on the web doesn't make much sense because the web has brotli and zstandard isn't sufficiently different from brotli." This was also the conclusion we reached in https://bugzilla.mozilla.org/show_bug.cgi?id=1352595 when investigating zstandard for omni.ja compression (we ended up supporting brotli compression because it was readily available). Had zstandard been added to the web before brotli, I would have said the same thing at the time were someone to propose adding brotli to an already zstandard-enabled web.

Fast forward ~1.5 years. In the time since, zstandard has continued to improve substantially.

A lot of work has gone into dictionary compression, which could have positive benefits to the web. Read more at https://github.com/facebook/zstd/releases/tag/v1.2.0, https://github.com/facebook/zstd/releases/tag/v1.3.5, and https://github.com/facebook/zstd/releases/tag/v1.3.6.
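
To illustrate the dictionary workflow, here is a sketch using python-zstandard's training API; the samples and dictionary size are toy values chosen so the example runs, not a realistic training corpus:

```python
import zstandard

# Toy stand-ins for many small, similar resources (e.g. JSON API responses).
samples = [
    b'{"user": "user-%04d", "status": "active", "roles": ["reader", "editor"], "score": %d}'
    % (i, i * 7)
    for i in range(2000)
]

# Train a shared dictionary from the samples (8 KB is an arbitrary size here).
dictionary = zstandard.train_dictionary(8 * 1024, samples)

# Both peers must hold the same dictionary to compress and decompress.
cctx = zstandard.ZstdCompressor(dict_data=dictionary)
dctx = zstandard.ZstdDecompressor(dict_data=dictionary)

frame = cctx.compress(samples[0])
assert dctx.decompress(frame) == samples[0]
print(f"{len(samples[0])} bytes -> {len(frame)} bytes with the shared dictionary")
```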

Another major new feature is support for faster compression at lower ratios (exposed as negative compression levels). This allows zstandard to approach lz4's compression/decompression speed. Read more at https://github.com/facebook/zstd/releases/tag/v1.3.4.
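
As a sketch of what those negative levels look like from the API (assuming a recent python-zstandard release, which documents negative integers as valid level values; the input data here is made up, so the printed sizes are only illustrative):

```python
import zstandard

data = b"The quick brown fox jumps over the lazy dog. " * 20000  # ~900 KB of toy input

# Negative levels trade compression ratio for speed (zstd >= 1.3.4).
for level in (-5, -1, 1, 3, 19):
    size = len(zstandard.ZstdCompressor(level=level).compress(data))
    print(f"level {level:>3}: {size} compressed bytes")
```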

While I have doubts it will be useful for web scenarios (due to high memory usage requirements), the "long distance matching" or "long range mode" has improved a bit, allowing faster compression at ultra high compression ratios.

A potentially killer feature for the web is "adaptive compression," where zstandard can dynamically adjust compression parameters to account for line speed. e.g. if the receiver isn't consuming data as fast as zstandard can generate it, zstandard can throw more CPU at compression and reduce the amount of data going over the wire. Or if things are sender-side bottlenecked, zstandard can reduce CPU/compression and send more bits over the wire. The good news is zstandard has this feature and it is actively being improved (see https://github.com/facebook/zstd/releases/tag/v1.3.6). The bad news is it isn't part of the libzstd C API. I'm not sure if this feature will ever be part of the C API. Nor do I know how much work it would be to port this feature to the web.

When I wrote my aforementioned blog post about zstandard, I lauded the flexibility of zstandard's compression/performance settings. You could go from low CPU/memory and very fast but poor ratio compression all the way to high CPU/memory and slow but high ratio compression. In the time since, the introduction of negative compression levels and long distance matching has broadened the use cases for zstandard. I believe it is without a doubt the best general purpose compression format available today.

I should add a caveat that I haven't been following brotli's development super closely. But its development velocity is slower than zstandard's and a quick perusal of its release notes doesn't seem to reveal anything too exciting. It kind of looks like it is in maintenance mode or only looking for iterative improvements. (This could be a good thing for web technologies, I dunno.)

One aspect of zstandard that is important for web consideration is its memory usage. Different compression settings have vastly different memory requirements on both producer and receiver. Obviously not all devices are able or willing to use all available memory settings. So considerations must be made on what "acceptable" memory use should be. RFC 8478 recommends limiting decoder-side memory usage to 8 MB. Should zstandard be exposed to the web, some thought should go into more formally expressing memory limits. My (pretty naive about web matters) opinion is that it would be wrong to limit to 8 MB across the board because some exchanges could benefit from using the extra memory and 8 MB could be really small in the future (just like zlib/deflate's 32 KB max window size is absurdly small in 2018). I think it would be better for peers to advertise memory limits and to negotiate an appropriate setting. Maybe 8 MB is the default and adaptive compression is used to increase, if allowed. I suspect a media type parameter could be leveraged to express memory requirements. I'm not sure if this was discussed as part of publishing RFC 8478...
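
To make the memory discussion concrete, here is a sketch of capping the window on both sides with python-zstandard; the 8 MiB figure mirrors the RFC 8478 recommendation, and the exact parameter names (ZstdCompressionParameters, max_window_size) are my reading of the binding, so treat them as assumptions:

```python
import zstandard

# Encoder: window_log=23 caps the history window at 8 MiB, so any decoder that
# follows RFC 8478's 8 MB guidance can handle the frames we produce.
params = zstandard.ZstdCompressionParameters(compression_level=19, window_log=23)
cctx = zstandard.ZstdCompressor(compression_params=params)

# Decoder: refuse frames that would need more than 8 MiB of window memory.
dctx = zstandard.ZstdDecompressor(max_window_size=8 * 1024 * 1024)

payload = b"x" * (1 << 20)  # 1 MiB of toy data
assert dctx.decompress(cctx.compress(payload)) == payload
```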

It's also worth noting that RFC 8478 and the application/zstd media type only begin to scratch the surface of what's possible with (zstandard) compression on the web. Compression contexts in existing web technologies seem to map to single requests/responses. e.g. an HTTP Content-Encoding or HTTP/2 stream has a lifetime for the HTTP message payload. But you can do so much more with zstandard.

For example, you can keep the compression context alive across logical units and "flush" data at those logical boundaries. This would allow the compressor/decompressor to reference already-sent data in a future send, reducing bytes over the wire. This was recently discussed at facebook/zstd#1360. And Mercurial's new wire protocol leverages this to minimize bytes over wire.
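
Here is a sketch of that flush-at-logical-boundaries pattern with python-zstandard's compressobj API; the messages are invented, and the point is only that a single compression context spans all of them:

```python
import zstandard

cobj = zstandard.ZstdCompressor(level=3).compressobj()
dobj = zstandard.ZstdDecompressor().decompressobj()

messages = [b'{"event": "login", "user": "alice"}',
            b'{"event": "login", "user": "alice", "retry": true}']

for msg in messages:
    # Compress inside the same context, then flush a complete block so the peer
    # can decode this logical unit immediately. Later messages may still
    # back-reference bytes from earlier ones, shrinking what goes on the wire.
    chunk = cobj.compress(msg) + cobj.flush(zstandard.COMPRESSOBJ_FLUSH_BLOCK)
    print(len(chunk), "bytes on the wire ->", dobj.decompress(chunk))

# End the frame once the logical stream is finished.
dobj.decompress(cobj.flush(zstandard.COMPRESSOBJ_FLUSH_FINISH))
```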

You can also "chain" logical units so the compression context for item N+1 is seeded with the content of item N, allowing zstandard to effectively generate deltas between logical units. I have an API in python-zstandard for this https://github.com/indygreg/python-zstandard#prefix-dictionary-chain-decompression.

Both these "flushing" and "chaining" concepts can be implemented in the form of a custom media type (and I don't believe are unique to zstandard). But I believe web technologies could potentially benefit by promoting these ideas to 1st class citizens, where appropriate. e.g. in "stream" APIs that send N discrete logical units between peers. (There are obviously security/performance considerations to keeping long-running compression contexts alive in memory, potentially across logical requests.) What I'm trying to say is it feels like web technologies are only scratching the surface of what's possible with compression and there's potentially some significant performance wins that can be realized by leveraging "modern compression" on the web. But I digress.

All that being said, I'm not sure if there's enough here to justify both brotli and zstandard on the web. I do believe zstandard is the superior technology. But a strong case can be made that brotli is "good enough," especially if we're limiting the web's use of compression to "simple" use cases, such as traditional one-shot or streaming compression using mostly-fixed compression settings. Zstandard's case grows stronger if you want to explore dictionary compression, adaptive compression, negotiation of compression levels/settings, and flushing/chaining scenarios.

With regards to Martin's comment about Facebook's prior indications, I would encourage reaching out to Yann Collet (the primary author of zstandard - and lz4) for his thoughts. He's Cyan4973 on GitHub.

I hope this information is useful in making a decision!

@martinthomson
Member

Well, maybe @Cyan4973 can add thoughts here.

I understand that there are advantages, but the disadvantages of having yet another format are not insubstantial.

That dictionary-based scheme is where I hold the most hope for zstd. There are several highly-motivated people working on studying this now. Those schemes are considerably more complex than a simple Content-Encoding: zstd, but offer massive compression advantages. However, the security risks are significant. Exploitable side-channels created as a result of mixing secret and attacker-controlled inputs to compression mean that any effort to deploy a dictionary-based scheme is fraught. We've been burned in the past and so we are being extremely cautious. You can see some of the discussion in this thread.

@Cyan4973

Cyan4973 commented Oct 16, 2018

"Normal" zstd for http traffic is more targeted at productivity tools, such as wget / curl.
That's a domain where the advanced features mentioned by @indygreg provide nice benefits.

For the web, aka the typical webpage displayed by Mozilla's Firefox, the situation is substantially different. A single innocent-looking web page is nowadays composed of multiple resources of different natures (and from different sources). HTML proper is merely one small part of it; there is also CSS, JavaScript, JSON, XML, etc.
That's not even mentioning the elephant in the room, images and video, which are excluded from this discussion since they have their own compression technology attached.

One could imagine dividing typical web traffic into a manageable set of ~5/6 categories, and then using a dedicated dictionary for each category. It seems like a simple design.
So we ran some preliminary tests based on this idea, and the benefits were shockingly good: way better than expected, and significantly better than the best alternative method we could compare to.

Sure, dynamic dictionary fetching can provide even more benefits, but I don't see that happening any time soon (for public environments). I suspect the current scrutiny regarding potential (unknown) security implications will delay any adoption in this area by a number of years, if ever.

As a consequence, I am more in favor of baby steps, reducing risks, and delivering some of the benefits of dictionary compression within a manageable timeframe. Introducing a static set of dictionaries, bundled with zstd (identified, for example, by a dedicated encoding tag like zstd-dictsetv1), can provide substantial benefits in the short term, without carrying the security implications of dynamic fetching.
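
To make that concrete, here is a sketch of both peers holding the same bundled set and selecting an entry per category; the dictionary contents, the category keys, and the zstd-dictsetv1 signalling are all hypothetical placeholders, and the raw-content dictionaries merely stand in for real trained artifacts:

```python
import zstandard

def raw_dict(content):
    # Stand-in for a standardized, pre-trained dictionary shipped with the codec.
    return zstandard.ZstdCompressionDict(content, dict_type=zstandard.DICT_TYPE_RAWCONTENT)

# Hypothetical static dictionary set, identical on client and server.
BUNDLED = {
    "json": raw_dict(b'{"id": 0, "ok": true, "error": null, "items": [], "next": null}' * 64),
    "html": raw_dict(b'<meta charset="utf-8"/><link rel="stylesheet" href="/static/main.css"/>' * 64),
}

def encode(category, body):
    # A real deployment would signal the set via something like the encoding tag
    # zstd-dictsetv1; here the category key simply stands in for that signal.
    return zstandard.ZstdCompressor(dict_data=BUNDLED[category]).compress(body)

def decode(category, frame):
    return zstandard.ZstdDecompressor(dict_data=BUNDLED[category]).decompress(frame)

body = b'{"id": 42, "ok": true, "error": null, "items": [], "next": null}'
assert decode("json", encode("json", body)) == body
```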

Of course, it also means that designing this static set of dictionaries becomes a critical operation, since global efficiency will directly depend on it, and eventually it will become a baseline for a number of years.

For this critical stage, it's very important to know the web (types of resources, their respective shares, and how they evolve), to have a representative sample set available, and to do some training, testing, shadowing, etc.
And that's also why I'm cautious about publishing any figure right now. We did some tests, but with the limited knowledge and tooling we had to model the web. A real team of web experts would do all these things better, and likely influence the design in subtle yet important ways, ultimately for the greater benefit of web efficiency.

As far as web expertise is concerned, it's hard to imagine any organization better than Mozilla.
Hence comes a suggestion: would Mozilla be interested in designing this initial set of static dictionaries? I do believe that, if any step in this direction is to be attempted, it only makes sense with the involvement of an organization as important as Mozilla.

@mnot
Collaborator

mnot commented Oct 16, 2018

How would you choose a static dictionary for something like JavaScript? Beyond the basic syntax and idioms of the language, anything more seems like it would be Mozilla (or whoever) choosing a winner among JS libraries / frameworks -- something that seems like it would be good to avoid on an Open Web.

@Cyan4973

Cyan4973 commented Oct 16, 2018

An efficient dictionary is built on a collection of statistics. A general plan to reach this goal is to grab a sufficiently large and representative sample of the web and pass it to the generator. The generator automatically determines the best fragments and their ranking. There is no manual selection of any content anywhere in the process, so no one gets to pick a winner.

One could say that selecting the samples could be an indirect way of favoring a winner. And that's why it's important to collect samples in a way which is as neutral and universal as possible. A few players stand out in this respect, and can be trusted, both technically and ethically, to get close to this objective (I obviously think of Mozilla as one of them). I also believe the sampling methodology should be published, to increase trust.

There are a few golden rules to create a good sample set :

  • the set should be geographically diverse (avoid getting all samples from the same region)
  • the set should be temporally diverse (avoid having all samples from the same day)
  • the set should be "large enough" (which is no problem when it comes to web content)

A compact dictionary is merely a few dozen kilobytes, so the final selected fragments truly "stand out" and appear many times in the sample set. When it comes to text-based sources, the final dictionary can even be visually inspected.

One could say that, even with all these safeguards in place, the sample set will be representative of the web as it is now, and therefore not follow its future evolution.
That would be fair : that's in essence the limitation of "static" dictionaries.

For this topic, one can answer :

  • The "static set of dictionary" proposal is a temporary solution, allowing the web to enjoy the benefits of dictionary compression on short-term, while waiting for consensus to emerge on dynamic-fetching. Once dynamic fetching exists, any framework will be free to create its own tailor-made dictionary to maximize its own efficiency. That's the final goal.
  • If it takes a long time to reach any consensus for dynamic fetching, even the static dictionary set can be updated later on. The update rate will be slow, but at least this will make it possible to follow long-term evolution of the web. As previously, the methodology for building this refresh should be published.

For discussion.

@kaizhu256

i'm skeptical of custom/dynamic dictionaries (and state-based compression) being worth the trouble for most [web] product-developers. web-projects are fraught with risks, and many have learned (the hard-way) to always choose the safe, zero-config solution over the marginally-better-but-more-complicated one.

you should limit your scope with the jquery-approach - focus on a universal, zero-config solution that's easy-to-use (and with "good-enough" performance-improvements over zlib), rather than for all-out performance at the cost of usability.

@felixhandte

@kaizhu256: you're absolutely right. A solution that requires individual site maintainers to create, configure, and deploy the state that forms the basis for stateful compression will never be (and should never be) widely adopted by site operators.

However, I think that that objection misses the point. Judging a tool's utility by looking at unweighted operator adoption ignores the fact that actual HTTP traffic is extremely strongly skewed towards a very small number of very large operators (e.g., Akamai, Cloudflare, AWS S3, Google, Facebook...). For these organizations, anything that produces wins will probably be worth deploying, and the engineering cost to do so will be tractable given those organizations' scale. And even if only those organizations were to deploy such a system, internet users as a whole would benefit, since a significant amount of their traffic is terminated by those origins.

Beyond that though, I think we can also pursue some form of state-based compression that can be implemented correctly/safely with no operator oversight (other than enabling/linking some mod-cdict in their HTTP server). Certainly that's possible if we define a static set of dictionaries.

@mnot
Collaborator

mnot commented Oct 17, 2018

@Cyan4973 I understand how to build a representative dictionary. What I'm asking is whether empowering content that is already popular by making it more compressible -- thereby giving it a competitive advantage over its competition, including new libraries -- is good for the long-term health of the Web.

@Cyan4973

I suspect there is a difference in projected timeline.
The "static dictionary set" suggested here is not expected to become a standard that will remain "forever".
It's merely a snapshot. By definition, the snapshot is representative of the traffic sampled to create it, but then traffic changes, so the snapshot must change too.

We have been looking at long-term pattern evolution, and our current understanding is that there is no such thing as a "universal timeless referential" immune to framework/coding trends. If one looks at the code of websites from 10 years ago, one sees little in common with today's. All referentials decay. Artificially undervaluing present samples mainly produces a dictionary which is less relevant today, without making it any better in the future (we tried).

A dictionary is expected to have a short lifespan. How short is the real question. In private environments, dictionaries can be updated every week, even faster for special cases. For the web "at large" it wouldn't work. I suspect that just discussing the update mechanism could take several months. Targeting a lifespan of a few years, including overlapping periods with older / newer dictionaries, feels more comfortable. Even at such a slow pace, the main property remains : this scheme does not remain "stuck in time", it evolves with the web.

Also :

  • The suspicion that a dictionary built from sampling a period benefits successful frameworks of the same period is valid, but it doesn't mean that the resulting dictionary has no generality. Sure, patterns emerge, but surviving patterns do not necessarily "belong to a single framework". Here are some examples from a recent (limited) test :

    • js:
      • return true;
      • return false}
      • Number.isInteger
      • d.type="text/javascript";
      • which is licensed under the Apache License, Version 2.0 */
      • window.console&&window.console.log
      • );break;case
    • html:
      • <meta charSet="utf-8"/>
      • <link href="/favicon.ico" rel="icon" type="image/x-icon" />
      • <meta name="description" content=""/>
  • There is of course no guarantee that any given document will benefit from any of these entries, but it should be little surprise to find them, irrespective of the framework used to create these pages. Note that due to scarcity of time and resources, I had to run the above test with a limited set of samples; a larger, more diverse and more representative sample set would force the elected entries to be more general.

  • When it comes to text documents, a visual inspection can also be completed on the resulting dictionary, in order to check if included content is too "specific" or advantageous for a single framework.

  • The chances that a brand new compression scheme quickly becomes important enough to influence web traffic look dim. Due to required client / server compatibility, deployment will be slow, usage will be partial, etc. hence I can't imagine an initial static set of dictionaries having any impact during the course of its lifetime.

  • Let's remember that the "set of static dictionaries" is merely a "baseline default", while waiting for availability of dynamic dictionaries. When dynamic ones become available, larger web sites and frameworks will be better off providing their own more specialized dictionaries, for substantially improved efficiency.

  • While the discussion is focused on Javascript, it's just one data type among many that can be addressed. Different types can be more consistent over time, and it might be better to start with the easier ones.

@jyrkialakuijala

Would be nice to learn about streaming abilities of zstd. With some of our previous designs of compressors (like gipfeli) we used formats that were faster but less streamable than brotli. In brotli nearly every byte you receive will bring you new decodable data, i.e., no hidden buffer, and you can start decoding from the first bytes. If further processing of decompressed data is CPU heavy (such as parsing and dom reconstruction), being able to start it earlier during the transfer can lead to substantial savings. If further processing depends on high latency events like fetching new data, being able to issue these fetches earlier is rather important.

How many bytes do you need to receive before you can emit the first byte? Does making it acceptable for web use need special settings that impact compression density or decoding speed?
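
For what it's worth, incremental decoding is at least expressible through the python-zstandard bindings; here is a sketch (not an answer about zstd's internal buffering, which depends on block boundaries and window settings) that feeds a frame to the decoder in small pieces and reports when output first appears:

```python
import zstandard

payload = b"All work and no play makes Jack a dull boy. " * 2000
frame = zstandard.ZstdCompressor(level=3).compress(payload)

dobj = zstandard.ZstdDecompressor().decompressobj()
for received in range(100, len(frame) + 100, 100):   # pretend 100-byte network reads
    out = dobj.decompress(frame[received - 100:received])
    if out:
        # Output appears once a complete block has arrived, so this number
        # mostly reflects zstd's block boundaries for this particular input.
        print(f"first decoded bytes after ~{min(received, len(frame))} compressed bytes")
        break
```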

@jyrkialakuijala

jyrkialakuijala commented Nov 14, 2018

Would be nice to see a comparative benchmark that compresses and decompresses payloads for the internet with the planned decoding memory use (something like 256 kB to 4 MB range for backward reference window). Even better if I can run that benchmark myself on my favorite architecture.

zstd used to be in this benchmark, but the benchmark author removed it. Perhaps we could convince them to add it back. https://sites.google.com/site/powturbo/home/web-compression

@adamroach
Contributor

(adding for later use) -- Bugzilla bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1301878

@adamroach
Contributor

What I'm seeing in this thread is that, while everyone who has weighed in so far thinks zstandard is a neat technology, its utility in a general-purpose web browser is somewhat limited (given that we're shipping brotli already), and (as @martinthomson points out) adding more formats without significant improvements is generally something we want to avoid (due, among other things, to increased maintenance cost).

This calculus may change as standardized dictionaries are published and we get a feel for performance with such dictionaries. My proposal is to mark this as defer, pending availability of such dictionaries and information about their effectiveness.

Any final comments before I close this issue accordingly?

@Cyan4973

Cyan4973 commented Nov 28, 2019

1 year ago (which is when this thread was started), I would have agreed with this conclusion. But since then, our experience with http traffic has grown, and as a consequence, our position has shifted a bit.

Facebook is a big user of brotli. We will continue to use brotli, but we had to dial it down a bit, for the following reasons :

  • static resources : this is an area where brotli excels. It used to be deployed with maximum compression level, for best ratio, on the grounds that static resources are compressed once, then shipped many times.
    In practice, this view is too simplified. I'm not sure how much detail can be shared here, but let's say that this view disregards the quantity of "static" resources to generate, their lifetime, which is not always long, and content distribution patterns. Maximum compression level was making a visible cpu dent in edge infrastructure, so we had to dial down the compression level to something more manageable.
    At the new compression level, the compression ratio is more comparable to zstd, which can offer the same or slightly better at comparable cpu cost. That being said, a little better is not enough, and we are likely going to continue to use brotli for static resources in the foreseeable future.

  • dynamic resources : there was an initial push for brotli in this area. It was cancelled, and gzip was preferred. Some limitations become visible "at scale". On a benchmark involving a single connection from a server to a modern PC, brotli is likely to produce nice results. But when the traffic involves a myriad of parallel connections, serving terminals of many forms and various capabilities, it no longer holds. For many android devices, especially low-end ones, gzip offers a more consistent experience, which is a better trade-off for end users.
    On the server side, the impact is even more pronounced : the accumulation of parallel connections is pretty taxing on resources, and the scenario favors gzip, which is more frugal. Due to the amount of requests served every second, this makes a noticeable difference for our bottom line, so selecting gzip for mobile traffic, which is the largest share of Facebook traffic, was non-controversial.
    What changed since last year is that we introduced zstd compression for mobile traffic, and it's working great. Compared to gzip, it offers improvement on all key metrics simultaneously, saving bandwidth, reducing server load, and even improving client responsiveness, so much that it's visible in our interaction metrics.
    As a consequence, nowadays, http traffic for Facebook mobile applications is being delivered zstd-compressed.

Therefore, if Facebook found it advantageous for its architecture to deploy zstd compression for dynamic content even without the benefit of dictionaries, one can wonder whether the same conclusion might hold for other Internet services.

That's a new element, which wasn't available one year ago, and may be relevant to the outcome of this RFP.


On the topic of dictionary compression for the web, there is progress too, and early experiments show excellent results. But it's early days, it's not widely deployed yet, and we hope to share more details in the future.
That being said, we understand that dictionaries add a level of complexity, and hence can't be expected to be deployed in a short timeframe.
For the fully dynamic dictionary proposal, an effort has been started at IETF, with a first document centered on security considerations, led by @felixhandte. It's a first step, and reaching full standardization is likely going to require some time.

A set of static dictionaries would be a nice way to bridge that gap, since it removes the issue of dynamic fetching, where most security risks are concentrated. Even as a temporary solution bridging a few years while waiting for standardization of fully dynamic dictionaries, it can bring valuable benefits for the Internet ecosystem. Plus it's not new since brotli already ships a standard dictionary as part of its library.

I believe this is a different topic, and it may deserve opening another RFP, keeping this one for zstd without dictionary only. There already are arguments about the scope and fairness of such a proposal, which would be better discussed in a dedicated thread. More importantly, I believe Facebook should not carry the topic alone: it makes more sense if a number of Internet actors get involved in the selection of a dictionary set.

@brunoais

brunoais commented Nov 28, 2019

I have built a POC infrastructure at work with a custom-tweaked wget which receives zstd-compressed responses for dynamic content.
What Cyan4973 mentions holds for us too. Network traffic decreased by a significant amount when running the same tests that were known to overwhelm our load-test datacenter, due to too much data passing through the switches and CPUs topping out trying to compress with gzip. The network was jamming for ~20% of the time on average.
With zstd at medium settings (3-5), at the same call load, our servers were only at around 60% CPU and the network was no longer overwhelmed (<3% jamming signal).
This was such a good result that there are suggestions floating around about transcoding zstd to gzip in the load balancer (a bad idea, btw) to reduce load on that part of the datacenter and serve dynamic http content to users gzipped.

On our end, zstd is proving its worth. We agree on pushing this one forward!

@adamroach
Contributor

Tagging @indygreg, @ddragana, @martinthomson, @dbaron, @lukewagner, @bholley, and @annevk for input, taking the conversation so far into account. It would be most ideal if you could weigh in on your proposed disposition of this topic (from among important, worth prototyping, non-harmful, defer, and harmful), but any input would be welcome.

@brunoais

brunoais commented Dec 5, 2019

I classify as: worth prototyping / non-harmful

@bholley
Collaborator

bholley commented Dec 5, 2019

I lean towards defer. It seems like promising technology with support from some key players, but I'm not yet sold on the ROI.

To reconsider, I'd want to see a more comprehensive and quantitative analysis of the benefits zstd would unlock on the web today - either dictionary-less, with static dictionaries, or with dynamic dictionaries. For either of the latter two options, we'd also need a credible plan for generating and delivering those dictionaries that aligns with our values.

Compression schemes can mature and flourish without ground-floor support from web browsers. There are lots of organizations with strong economic incentives to use the best technology between endpoints under their control - so if zstd is truly the superior choice, I expect we'll see other players starting to deploy it. Repeated and diverse success stories would certainly bolster the case for inclusion in the web platform.

@rockdaboot

I have built a POC infrastructure at work with a custom-tweaked wget which receives zstd-compressed responses for dynamic content.

@brunoais There is wget2 with upstream support for zstd and brotli (and others). You can adjust the number of parallel threads for testing (for lists of URLs or for recursive downloads) to quickly compare impact of different compression types. Wget2 also has --stats-* options to write timings and payload sizes (compressed and uncompressed) and more as CSV - easy to feed that into most stats/gfx tools. https://gitlab.com/gnuwget/wget2

@adamroach
Contributor

I sincerely appreciate the input from advocates of the ZSTD scheme here, and I thank you for taking the time and effort to make your case.

In parsing out the positions of Mozilla community members, I'm seeing a pretty clear signal that we want to place this in defer for the time being. I'll note that some commenters are participating pseudonymously, which means I may have overlooked a contributor. If you think this is the case, please let me know what name on about:credits or https://mozillians.org/ I should be looking for.

I plan to close this as defer, with the following detail (thanks to @bholley for much of the text):

While we believe zstd is a promising technology, its use in a general-purpose web browser (given the existing slate of compression algorithms) has not been demonstrated to provide compelling new utility, and does not clearly warrant the additional maintenance cost and attack surface of adding such code. We are deferring a final position pending a more comprehensive and quantitative analysis of the benefits zstd would unlock on the web today - either dictionary-less, with static dictionaries, or with dynamic dictionaries. For either of the latter two options, we'd also need a credible plan for generating and delivering those dictionaries that aligns with our values.

@gvollant

Do you know of a public http/https server with zstd compression enabled?

@rockdaboot

wget2 -d 'https://de-de.facebook.com/unsupportedbrowser' shows that the server uses zstd over http/2. From the debug log:

25.101812.362 :status: 200
25.101812.362 content-encoding: zstd

@gvollant

I ran this successfully:
curl "https://de-de.facebook.com" -H "Accept-Encoding: zstd" -otestzstd.zst
zstd -d testzstd.zst

@gvollant

curl/curl#5453

@felixhandte

@gvollant, most of Facebook's services now support the zstd content-coding:

  • *.facebook.com / *.workplace.com / *.messenger.com: served from HHVM (source)
  • *.fbcdn.net, *.whatsapp.com: served by Proxygen (source)
  • *.instagram.com: served by Django

There's a pretty good diversity of implementations, which exercise the various features of the spec pretty well. We do chunked, streaming transfers (should be visible by requesting https://www.facebook.com/, for example). We also have paths that stream a sequence of independent frames, which is an exciting feature of the spec that lets the server and client drop the compression context between chunks, rather than having to hold a window buffer open. This feature let us significantly increase the connection tenancy on one of our servers that streams updates over time (which had previously been memory-bound holding these contexts open).
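
For illustration, here is a sketch of that independent-frames pattern with python-zstandard (the update payloads are invented): each chunk is a complete zstd frame, so the receiver can decode it and drop all compression state before the next one arrives:

```python
import zstandard

updates = [b'{"seq": 1, "data": "a"}',
           b'{"seq": 2, "data": "bb"}',
           b'{"seq": 3, "data": "ccc"}']

cctx = zstandard.ZstdCompressor(level=3)

# Each compress() call emits a self-contained frame: no window or context has
# to stay alive between updates on either side of the connection.
frames = [cctx.compress(u) for u in updates]

dctx = zstandard.ZstdDecompressor()
for frame in frames:
    print(dctx.decompress(frame))  # decode, then all per-frame state can go away
```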

@rektide

rektide commented Apr 23, 2023

#771 seems like a potential path to getting Compression Dictionaries, which could resolve some of the concerns that left this request with a defer status.
