Skip to content

Support gRPC proxy phase3#39

Merged
mwfj merged 17 commits into
mainfrom
support-grpc-proxy-phase3
May 25, 2026
Merged

Support gRPC proxy phase3#39
mwfj merged 17 commits into
mainfrom
support-grpc-proxy-phase3

Conversation

@mwfj
Copy link
Copy Markdown
Owner

@mwfj mwfj commented May 23, 2026

Support gRPC-Web bridge (Phase 3 — final phase)

Summary

Phase 3 of the gRPC proxy feature ships the gRPC-Web bridge: a per-route serialization shim that lets HTTP/1.1 and HTTP/2 clients carrying application/grpc-web or application/grpc-web-text (with optional +suffix) reach existing gRPC upstream pools without a dedicated sidecar proxy. The bridge translates wire bytes to and from canonical gRPC, while every Phase 1+2 surface (Trailers-Only synthesis, grpc-timeout deadlines, OTel rpc.* emission, trailer-status retry, breaker classification) applies unchanged because is_grpc_web_ = true always forces is_grpc_ = true at the classifier.

This is the final phase of the gRPC proxying feature — Phases 1 and 2 are already merged on main (PR #37 + PR #38).

What ships

Surface Behaviour
Per-route config New proxy.grpc_web.enabled field (restart-only). Rejected at boot when proxy.protocol = "rest". Live on protocol ∈ {grpc, auto}.
Inbound classifier (H1 + H2) Strict media-type parser IsGrpcWebMediaType admits application/grpc-web[-text] with optional +proto / +json / vendor suffix; rejects spelling-adjacent neighbours (application/grpc-websocket, application/grpc-web-extras). H1 hook fires from HttpConnectionHandler::SetHeadersCompleteCallback — gRPC-Web-only (raw application/grpc on H1 stays unclassified per PROTOCOL-HTTP2 §3.2). H2 classifier extension reuses the existing dispatch site.
Inbound translation Streaming text-mode: consumer-side GrpcWebInboundBodyStream decorator with the Rev 4 F3 Read matrix (never OK + bytes_read == 0; explicit WOULD_BLOCK / END_OF_STREAM / ABORTED states). Buffered text-mode: in-place base64 decode in ProxyTransaction::Start() BEFORE dispatch. Binary mode: transparent passthrough. Outbound content-type rewritten to application/grpc; client-sent te: trailers stripped (force-re-injected by the existing client_te_trailers_ || is_grpc_ fold).
Outbound translation Binary mode: passthrough. Text mode: per-call base64 encoding with 3-byte residue buffering; residue flushed with padding before the terminal trailer-frame.
Terminal trailer-frame emission gRPC trailers serialised to the in-stream wire shape: 0x80 + 4-byte BE length + lowercase ASCII header lines + ': ' separator + CRLF between lines, no trailing CRLF after the final line. Emitted as DATA on the streaming path (OnResponseComplete gRPC-Web fork) and appended to response_body_ on the buffered path (BuildClientResponse gRPC-Web fork). When upstream omits trailers, the bridge synthesises grpc-status from MapHttpToGrpcStatus(response_status_code) and writes it to the observability snapshot.
Trailers-Only re-serialisation RewriteTrailersOnlyForGrpcWeb hoisted into HttpServer::FinalizeIfSnapshot (H1 wrap-ownership) + explicit per-callsite calls at every H2 emission site (Http2Session::DispatchStreamRequest{,Streaming}, H2 sync dispatch callback in SetupHttp2Handlers, H2 async-resume submit lambda, H2 async-handler completion callback). All paths idempotent via IsTrailersOnly() + IsGrpcWebRewritten() defense-in-depth flags.
Bridge-driven failures are breaker-neutral Malformed base64 / truncated final-group are client-shape failures: H1 streaming PumpH1StreamingBody_ ABORTED, H2 streaming data-source ABORTED, buffered cap-overrun, and MaybeRetry denial all call ReleaseBreakerAdmissionNeutral() before terminal error. Wire status maps to RESULT_PARSE_ERROR → INTERNAL (Rev 1 user decision: no new RESULT_* codes).
H2 streaming-abort callback signature H2StreamingAbortCallback extended from (int code, const std::string& msg) to (int code, const std::string& msg, bool breaker_neutral). ProxyTransaction::MakeDeferredErrorCallback's closure consumes the flag and calls ReleaseBreakerAdmissionNeutral() before OnError.
Response Content-Length handling Stripped on streaming gRPC-Web outbound (chunked / no-CL on the wire); auto-computed on buffered path from the post-bridge body size. New HttpResponse::ClearPreservedContentLength() helper.
MAX_RESPONSE_BODY_SIZE re-checks Re-checked at TWO insertion points after the bridge expands bytes: per-chunk in OnBodyChunk after TranslateOutboundData, and buffered terminal after FlushAndBuildTrailerFrame. Overrun emits a deterministic MakeGrpcWebErrorResponse Trailers-Only INTERNAL response with snapshot-write-before-delivery.

Architecture

Four extension axes against the Phase 1+2 baseline:

  1. ClassifierIsGrpcWebMediaType strict parser + MaybeClassifyGrpcWebOnH1 H1 hook + H2 ClassifyRequest extension. Both flips is_grpc_web_ AND is_grpc_=true.
  2. GrpcWebBridge — per-request shim owned by ProxyTransaction::grpc_web_bridge_ (unique_ptr). Holds binary/text mode + 3-byte outbound residue buffer + +suffix for response content-type echo. Reset across retries via new GrpcWebBridge::Reset().
  3. GrpcWebInboundBodyStream — consumer-side decode decorator wrapping the parser's body stream. All 13 BodyStream virtuals forwarded; text-mode Read decodes on demand; SnapshotForSubmit clamps residue-only to bytes_queued ≥ 1.
  4. Callsite-wrap rolloutRewriteTrailersOnlyForGrpcWeb hoisted into FinalizeIfSnapshot for H1, explicit per-site calls for H2, both wraps run BEFORE wire submit. Audit table covers 9 H2 callsites.

Locked operator decisions (per the design plan Rev 7):

  • Symmetric request/response mode (request content-type drives bridge mode; HTTP Accept parsed for log/observability but ignored — documented compatibility limitation).
  • Bridge-decode failures are breaker-neutral (upstream did nothing wrong).
  • gRPC-Web admission applies on protocol ∈ {grpc, auto}; rest rejected at boot.
  • No CORS handling — operators deploy CORS middleware externally (Envoy, NGINX); browser preflights without external CORS will fail.
  • grpc-encoding / grpc-accept-encoding pass through verbatim (the bridge is a wire-format shim, not a message-compression transformer).

Files changed

24 files, +1112 / -60 lines.

New files (6):

  • include/grpc/grpc_web_bridge.h + server/grpc_web_bridge.cc — bridge class, IsGrpcWebMediaType, MaybeClassifyGrpcWebOnH1, RewriteTrailersOnlyForGrpcWeb, MakeGrpcWebErrorResponse, ComputeClientFacingContentType, IsGrpcWebBridgeDecodeFailureReason, BuildTrailerFrame, FlushAndBuildTrailerFrame.
  • include/upstream/grpc_web_inbound_body_stream.h + server/grpc_web_inbound_body_stream.cc — consumer-side decode decorator.
  • test/grpc_web_test.h — 67 unit tests.
  • test/grpc_web_edge_test.h — 24 edge / race / leak / perf tests.

Modified files (18): http_request.h (gRPC-Web fields), http_response.h (new helpers + grpc_web_rewritten_ flag), route_options.h (grpc_web_enabled), server_config.h (GrpcWebConfig), upstream_callbacks.h (3-arg callback), upstream_h2_stream.h (streaming_abort_breaker_neutral field), proxy_transaction.h (bridge + flags), util/base64.{h,cc} (strict DecodeStandard + INT_MAX guard), config_loader.cc, http_connection_handler.cc, http_server.cc, http2_session.cc, upstream_h2_connection.cc, proxy_transaction.cc, grpc_synthesis.{h,cc}, header_rewriter.cc, plus 4 test fixtures (h2_upstream_test.h, grpc_test.h, grpc_proxy_test.h, run_test.cc).

Build / CI: Makefile (new sources + test_grpc_web / test_grpc_web_edge targets), .github/workflows/ci.yml (new suites in build-linux-tsan-rest + build-macos), .github/workflows/weekly-valgrind.yml (memory-safety subset).

Docs: docs/grpc.md (operator-facing gRPC-Web section + compatibility limitations).

Test coverage

1826 / 1826 tests passing across two consecutive stability runs. Baseline was 1728; Phase 3 adds 98 tests:

  • 67 unit tests in test/grpc_web_test.h — bridge class, decoder, classifier, helpers, decorator Read matrix.
  • 24 edge / race / leak / perf tests in test/grpc_web_edge_test.h — concurrent multi-stream gRPC-Web on a single H2 connection, client disconnect mid-decode, 10K-iteration leak guards, perf upper bounds on TranslateOutboundData / BuildTrailerFrame / IsGrpcWebMediaType.
  • 2 wire-level integration tests in test/grpc_proxy_test.hTestGW1_TrailersOnlyRewriteOnGrpcWebRoute (H2 Trailers-Only synthesis → in-stream trailer-frame on the wire) and TestGW2_BufferedTextDecodesInboundBody (regression guard for the bridge-construction-order bug caught in code review).
  • 5 regression guards added during the post-implementation review cycle covering the 7 xhigh findings (bridge residue reset, H2 buffered CL update, missing-trailers synthesis, async-handler observability snapshot capture, H1 Trailer header strip, BytesQueued residue clamp, base64 INT_MAX guard).

Review trail

The PR went through extensive iterative review before merge:

Stage Reviewer Findings Applied
Design plan (8 rounds) architect + 8 reviewer passes 55 55
Code review (gateway-code-reviewer) 10 (1 BLOCKING + 9 important/minor) 10 10
/code-review xhigh (5 finder angles + Phase-3 sweep) 7 findings 7 7
Total post-design review 17 17 (none declined)

Key bugs caught and fixed during review:

  1. B1 (BLOCKING) — bridge construction was ordered after the buffered-text decode gate; pre-decode bytes would have flowed to the upstream as raw base64. Fixed by hoisting construction earlier in Start().
  2. xhigh F1ResetForRetryAttempt did not clear bridge residue; text-mode retries would have prepended stale 1-2 byte residue to attempt-2 output (corrupted gRPC frame). Fixed by adding GrpcWebBridge::Reset().
  3. xhigh F2 — buffered text-mode H2 path emitted stale Content-Length alongside decoded DATA, causing strict upstream H2 servers to reject with PROTOCOL_ERROR. Fixed by updating rewritten_headers_["content-length"] after DecodeBufferedTextBody.
  4. xhigh F3 — empty response_trailers_ (non-gRPC upstream or misbehaving gRPC) produced a 5-byte trailer-frame with NO grpc-status. Fixed by synthesising grpc-status from MapHttpToGrpcStatus(response_status_code) when trailers are empty.
  5. xhigh F4 — H2 async-handler completion lambda's synthetic_req had no obs_snapshot populated; SynthesizeMiddlewareReject silently skipped the snapshot write, exporting rpc.response.status_code = __missing__ for every async-handler 4xx/5xx on gRPC routes. Fixed by capturing obs_snap_local into the synthetic request.

Compatibility notes

  • No breaking API changes for non-gRPC traffic. The classifier is opt-in via proxy.grpc_web.enabled (default false). All Phase 1+2 behaviour for application/grpc traffic is unchanged.
  • H2StreamingAbortCallback signature change (std::function alias in include/upstream/upstream_callbacks.h) is the only public-interface change — extended from 2-arg to 3-arg. Production callers (ProxyTransaction::MakeDeferredErrorCallback) updated in this PR. Any downstream sink overriding MakeDeferredErrorCallback must update the lambda signature.
  • Compatibility limitations documented in docs/grpc.md:
    • Symmetric request/response wire mode only — Accept is parsed for log but does not override response encoding.
    • No CORS — browser preflights fail without an external CORS layer.
    • grpc-encoding / grpc-accept-encoding pass through verbatim; the bridge is not a per-message compression transformer.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a per-route gRPC-Web bridge, allowing the gateway to support application/grpc-web[-text] clients by translating them to the standard gRPC wire format for upstreams. The implementation includes updated request classification for both HTTP/1.1 and HTTP/2, a new GrpcWebBridge for outbound translation, and a GrpcWebInboundBodyStream decorator for decoding text-mode requests. Review feedback highlights several performance improvement opportunities, particularly regarding the reduction of redundant string allocations and copies during data translation and header normalization. Suggestions were also made to enable zero-copy passthrough for binary mode traffic to minimize overhead.

Comment thread server/grpc_web_bridge.cc
Comment thread server/grpc_web_bridge.cc
Comment thread server/grpc_web_bridge.cc
Comment thread server/grpc_web_bridge.cc
Comment thread server/proxy_transaction.cc
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65b68e6876

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread util/base64.cc
Comment thread server/grpc_web_bridge.cc
@mwfj
Copy link
Copy Markdown
Owner Author

mwfj commented May 25, 2026

LGTM

@mwfj mwfj merged commit 616224e into main May 25, 2026
6 checks passed
@mwfj mwfj deleted the support-grpc-proxy-phase3 branch May 25, 2026 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant