@jgraettinger jgraettinger commented Aug 21, 2025

Update all string protobuf fields which use casttype
encoding/json.RawMessage to instead be bytes.

This is a correctness issue, and is required for correct processing
semantics in later Go gRPC library versions. The decoder must be able to
assume that casting a []byte to a string field copies the data, so that
the []byte can be re-used post-cast. Because json.RawMessage is itself a
[]byte, the cast aliases the buffer instead, which breaks this assumption.
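
The aliasing hazard can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `decodeAsString` and `decodeAsRawMessage` are hypothetical stand-ins for what a generated decoder might do, not the actual generated code, and the buffer re-use is simulated with a plain `copy`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeAsString is a hypothetical stand-in for decoding a plain string
// field: converting []byte to string copies the bytes.
func decodeAsString(buf []byte) string {
	return string(buf)
}

// decodeAsRawMessage is a hypothetical stand-in for a string field with a
// json.RawMessage casttype: json.RawMessage is a []byte, so the conversion
// shares buf's backing array instead of copying it.
func decodeAsRawMessage(buf []byte) json.RawMessage {
	return json.RawMessage(buf)
}

func main() {
	buf := []byte(`{"a":1}`)
	s := decodeAsString(buf)
	raw := decodeAsRawMessage(buf)

	// Simulate the transport re-using its read buffer for the next message.
	copy(buf, `{"b":2}`)

	fmt.Println(s)           // {"a":1}: the string kept its own copy
	fmt.Println(string(raw)) // {"b":2}: the RawMessage was overwritten
}
```

Declaring the field as `bytes` makes the sharing explicit in the type, so the decoder no longer relies on a copy that never happens.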

It's also a performance concern, because prost in Rust is able to
use zero-copy bytes::Bytes to back a bytes field, while String must
be deeply cloned.

No semantic or wire changes to the protocol -- this is just a low-level refactor of the in-memory type representation.



Update post-processing of generated protobuf serde implementations to
account for new bytes::Bytes underlying type.
Various mechanical updates to Rust crates to account for the
String => bytes::Bytes conversion of affected fields.
@jgraettinger jgraettinger marked this pull request as ready for review August 21, 2025 20:19
@jgraettinger jgraettinger requested a review from psFried August 21, 2025 20:19
@psFried psFried left a comment

LGTM

  doc_json,
  exists: meta.front(),
- key_json: String::new(), // TODO(johnny)
+ key_json: bytes::Bytes::new(), // TODO(johnny)

nit: do you remember what these TODOs were for? Remove em?

Member Author

They're for plumbing in a JSON encoding (an array) of the key, for connectors that prefer to work with JSON instead of packed FoundationDB tuples. It obviously hasn't been super important, but it may become so if we start supporting user-implemented connectors.

@jgraettinger jgraettinger merged commit c87ad4f into master Aug 21, 2025
12 checks passed
@jgraettinger jgraettinger deleted the johnny/proto-bytes branch August 22, 2025 14:11
Alex-Bair added a commit to estuary/airbyte that referenced this pull request Sep 19, 2025
Updates ATF to the latest protocol changes introduced in estuary/flow#2354.