Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue: Fully Lazy Extension decoding #1609

Open
3 tasks
stapelberg opened this issue Apr 17, 2024 · 0 comments
Open
3 tasks

Tracking issue: Fully Lazy Extension decoding #1609

stapelberg opened this issue Apr 17, 2024 · 0 comments
Assignees

Comments

@stapelberg
Copy link

I’m filing this issue to provide some background for a series of changes I’m about to send. What follows are excerpts from a number of different internal documents and investigations, in the hope that they provide enough background to follow along at a high level. Let me know if anyone is super interested in some detail for some reason, and I can expand.

Background

Currently, Go Protobuf supports lazy decoding for extensions, but only when compiled with support for the feature: see internal/flags.LazyUnmarshalExtensions, which is enabled when building with the protolegacy build flag.

However, the current lazy decoding support is not lazy across proto.Size() and proto.Marshal() calls.

This means that programs such as logs analysis pipelines benefit from lazy extension decoding, as they typically don’t read many extensions (or read no extensions at all).

Programs like proxy servers, which read, maybe extend and then write the same proto message, do not benefit from lazy extension decoding, as they will need to lazily decode the extension just to encode it immediately afterwards.

Because we’re spending a significant cost in this code path (decoding just to re-encode), we’d like to change the code to support fully lazy extensions, meaning they will be encoded with the same bytes that they were read from the wire.

Non-minimal wire format

One complication with this effort is what we call non-minimal wire format:

When encoding Protobuf messages, there is one minimal wire format size and a number of larger non-minimal wire formats that decode to the same message.

Non-minimal wire format refers to scenarios like non-repeated fields appearing multiple times, non-optimal varint encoding, packed repeated fields that appear non-packed on the wire and others.

We can encounter non-minimal wire format in different scenarios:

While we believe that non-minimal wire format is rare in general, we do have at least one report of non-minimal wire format (the kind where a non-repeated field is present multiple times) encountered in our production systems (more investigation pending).

Shrinking messages

When lazy decoding is enabled, it might be surprising at first that messages can shrink: accessing a message (typically expected to be a read-only operation) needs to lazily decode the message and can thereby modify how Go Protobuf will re-encode the decoded message.

Consider the following sequence of events:

  1. Call proto.Size() on a message with lazy extensions that use non-minimal wire format.
  2. Access the extension, thereby causing lazy decoding, thereby causing normalization.
  3. Now, proto.Size() reports fewer bytes than before!

Encoding invalid wire format

When a sub-message has to be encoded, Protobuf’s Tag-Length-Value wire format requires that we put the correct size on the wire before the message contents. [Side note: Other implementations like upb follow a back-to-front approach which sidesteps this problem. We have thoroughly investigated adopting this approach for Go Protobuf but discarded the option because of the complexity of supporting back-to-front marshaling.]

In today’s implementation, the marshaler calculates the size of each sub-message before recursing into the message marshaling function. To avoid duplicate sizing costs, the size is cached inside the Protobuf message struct after the first calculation in the sizeCache field.

There are edge cases where a message can shrink between populating the size cache and encoding it, which would result in Go Protobuf producing invalid wire format. This would be a terrible outcome, so we’ll prevent producing invalid wire format by adding checks to Go Protobuf’s encoder that detect the situation and error out instead.

Plan

  • Add checks against invalid wire format
  • Fix discovered options plumbing issues
  • Enable fully lazy extensions

Out of scope, but maybe worthwhile at a later point: make lazy extensions available without the protolegacy build tag.

@stapelberg stapelberg self-assigned this Apr 17, 2024
gopherbot pushed a commit to protocolbuffers/protobuf-go that referenced this issue Apr 18, 2024
There currently is no risk of producing invalid wire format,
but that will change with subsequent changes regarding lazy decoding.

We have been running this change in production for about a month,
without ever triggering the check (until lazy decoding is involved).

related to golang/protobuf#1609

Change-Id: I3c5c956aee2fa81f99dea03ed2a977a1547081fc
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/579595
Auto-Submit: Michael Stapelberg <stapelberg@google.com>
Reviewed-by: Lasse Folger <lassefolger@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit to protocolbuffers/protobuf-go that referenced this issue Apr 18, 2024
This used to not be necessary, but a subsequent change will change behavior
based on MarshalOptions.Deterministic, so it is important that we do not
accidentally drop MarshalOptions anywhere.

related to golang/protobuf#1609

Change-Id: I6b53352707d93939642a627eb41c930f8ac3157f
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/579995
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lasse Folger <lassefolger@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant