You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I’m filing this issue to provide some background for a series of changes I’m about to send. What follows are excerpts from a number of different internal documents and investigations, in the hope that they provide enough background to follow along at a high level. Let me know if anyone is super interested in some detail for some reason, and I can expand.
Background
Currently, Go Protobuf supports lazy decoding for extensions, but only when compiled with support for the feature: see internal/flags.LazyUnmarshalExtensions, which is enabled when building with the protolegacy build flag.
However, the current lazy decoding support is not lazy across proto.Size() and proto.Marshal() calls.
This means that programs such as logs analysis pipelines benefit from lazy extension decoding, as they typically don’t read many extensions (or read no extensions at all).
Programs like proxy servers, which read, maybe extend and then write the same proto message, do not benefit from lazy extension decoding, as they will need to lazily decode the extension just to encode it immediately afterwards.
Because we’re spending a significant cost in this code path (decoding just to re-encode), we’d like to change the code to support fully lazy extensions, meaning they will be encoded with the same bytes that they were read from the wire.
Non-minimal wire format
One complication with this effort is what we call non-minimal wire format:
When encoding Protobuf messages, there is one minimal wire format size and a number of larger non-minimal wire formats that decode to the same message.
Non-minimal wire format refers to scenarios like non-repeated fields appearing multiple times, non-optimal varint encoding, packed repeated fields that appear non-packed on the wire and others.
We can encounter non-minimal wire format in different scenarios:
Accidentally. A (possibly third-party) Protobuf encoder does not encode ideally (e.g. uses more space than necessary when encoding a varint). Note: The Protobuf wire format doesn’t have a canonical encoding.
Maliciously. An attacker could craft Protobuf messages specifically to trigger crashes over the network.
While we believe that non-minimal wire format is rare in general, we do have at least one report of non-minimal wire format (the kind where a non-repeated field is present multiple times) encountered in our production systems (more investigation pending).
Shrinking messages
When lazy decoding is enabled, it might be surprising at first that messages can shrink: accessing a message (typically expected to be a read-only operation) needs to lazily decode the message and can thereby modify how Go Protobuf will re-encode the decoded message.
Consider the following sequence of events:
Call proto.Size() on a message with lazy extensions that use non-minimal wire format.
Access the extension, thereby causing lazy decoding, thereby causing normalization.
Now, proto.Size() reports fewer bytes than before!
Encoding invalid wire format
When a sub-message has to be encoded, Protobuf’s Tag-Length-Value wire format requires that we put the correct size on the wire before the message contents. [Side note: Other implementations like upb follow a back-to-front approach which sidesteps this problem. We have thoroughly investigated adopting this approach for Go Protobuf but discarded the option because of the complexity of supporting back-to-front marshaling.]
In today’s implementation, the marshaler calculates the size of each sub-message before recursing into the message marshaling function. To avoid duplicate sizing costs, the size is cached inside the Protobuf message struct after the first calculation in the sizeCache field.
There are edge cases where a message can shrink between populating the size cache and encoding it, which would result in Go Protobuf producing invalid wire format. This would be a terrible outcome, so we’ll prevent producing invalid wire format by adding checks to Go Protobuf’s encoder that detect the situation and error out instead.
Plan
Add checks against invalid wire format
Fix discovered options plumbing issues
Enable fully lazy extensions
Out of scope, but maybe worthwhile at a later point: make lazy extensions available without the protolegacy build tag.
The text was updated successfully, but these errors were encountered:
There currently is no risk of producing invalid wire format,
but that will change with subsequent changes regarding lazy decoding.
We have been running this change in production for about a month,
without ever triggering the check (until lazy decoding is involved).
related to golang/protobuf#1609
Change-Id: I3c5c956aee2fa81f99dea03ed2a977a1547081fc
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/579595
Auto-Submit: Michael Stapelberg <stapelberg@google.com>
Reviewed-by: Lasse Folger <lassefolger@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot
pushed a commit
to protocolbuffers/protobuf-go
that referenced
this issue
Apr 18, 2024
This used to not be necessary, but a subsequent change will change behavior
based on MarshalOptions.Deterministic, so it is important that we do not
accidentally drop MarshalOptions anywhere.
related to golang/protobuf#1609
Change-Id: I6b53352707d93939642a627eb41c930f8ac3157f
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/579995
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lasse Folger <lassefolger@google.com>
I’m filing this issue to provide some background for a series of changes I’m about to send. What follows are excerpts from a number of different internal documents and investigations, in the hope that they provide enough background to follow along at a high level. Let me know if anyone is super interested in some detail for some reason, and I can expand.
Background
Currently, Go Protobuf supports lazy decoding for extensions, but only when compiled with support for the feature: see internal/flags.LazyUnmarshalExtensions, which is enabled when building with the
protolegacy
build flag.However, the current lazy decoding support is not lazy across
proto.Size()
andproto.Marshal()
calls.This means that programs such as logs analysis pipelines benefit from lazy extension decoding, as they typically don’t read many extensions (or read no extensions at all).
Programs like proxy servers, which read, maybe extend and then write the same proto message, do not benefit from lazy extension decoding, as they will need to lazily decode the extension just to encode it immediately afterwards.
Because we’re spending a significant cost in this code path (decoding just to re-encode), we’d like to change the code to support fully lazy extensions, meaning they will be encoded with the same bytes that they were read from the wire.
Non-minimal wire format
One complication with this effort is what we call non-minimal wire format:
When encoding Protobuf messages, there is one minimal wire format size and a number of larger non-minimal wire formats that decode to the same message.
Non-minimal wire format refers to scenarios like non-repeated fields appearing multiple times, non-optimal varint encoding, packed repeated fields that appear non-packed on the wire and others.
We can encounter non-minimal wire format in different scenarios:
While we believe that non-minimal wire format is rare in general, we do have at least one report of non-minimal wire format (the kind where a non-repeated field is present multiple times) encountered in our production systems (more investigation pending).
Shrinking messages
When lazy decoding is enabled, it might be surprising at first that messages can shrink: accessing a message (typically expected to be a read-only operation) needs to lazily decode the message and can thereby modify how Go Protobuf will re-encode the decoded message.
Consider the following sequence of events:
proto.Size()
on a message with lazy extensions that use non-minimal wire format.proto.Size()
reports fewer bytes than before!Encoding invalid wire format
When a sub-message has to be encoded, Protobuf’s Tag-Length-Value wire format requires that we put the correct size on the wire before the message contents. [Side note: Other implementations like upb follow a back-to-front approach which sidesteps this problem. We have thoroughly investigated adopting this approach for Go Protobuf but discarded the option because of the complexity of supporting back-to-front marshaling.]
In today’s implementation, the marshaler calculates the size of each sub-message before recursing into the message marshaling function. To avoid duplicate sizing costs, the size is cached inside the Protobuf message struct after the first calculation in the sizeCache field.
There are edge cases where a message can shrink between populating the size cache and encoding it, which would result in Go Protobuf producing invalid wire format. This would be a terrible outcome, so we’ll prevent producing invalid wire format by adding checks to Go Protobuf’s encoder that detect the situation and error out instead.
Plan
Out of scope, but maybe worthwhile at a later point: make lazy extensions available without the protolegacy build tag.
The text was updated successfully, but these errors were encountered: