Support gRPC proxy phase1 #37

Merged
mwfj merged 15 commits into main from support-grpc-proxy-phase1
May 21, 2026
Owner

@mwfj mwfj commented May 19, 2026

gRPC Proxying — Phase 1 (gRPC↔gRPC passthrough)

Ships the minimum-credible "gRPC proxying" layer on top of the existing HTTP/2 transport: inbound classification, grpc-timeout-driven local deadline enforcement, end-to-end Trailers-Only synthesis, and gRPC-aware error mapping at every middleware-reject + proxy-failure emission site.

Almost no new transport code — HTTP/2 inbound, HTTP/2 outbound (ALPN, multiplexed sessions, donated leases), end-to-end trailer forwarding, streaming request/response bodies, and the proxy engine are already shipped. This PR is policy plus a thin per-request classifier that flips a flag at HEADERS-complete.
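
As a rough illustration of the per-request classifier described above, the decision can be sketched as a pure function over values already known at HEADERS-complete. This is a minimal sketch under stated assumptions, not the PR's ClassifyRequest: the names IsGrpcContentType and LooksLikeGrpc are hypothetical, the decision order is assumed, treating RouteProtocol::Grpc as an unconditional force is an assumption, and content-type matching is shown case-sensitive for brevity.

```cpp
#include <string_view>

// Hypothetical mirror of the per-route proxy.protocol setting (auto / grpc / rest).
enum class RouteProtocol { Auto, Grpc, Rest };

// True for "application/grpc" and "+xxx" variants such as "application/grpc+proto",
// optionally followed by ";params". "application/grpc-web" is deliberately excluded.
inline bool IsGrpcContentType(std::string_view ct) {
    constexpr std::string_view kBase = "application/grpc";
    if (ct.substr(0, kBase.size()) != kBase) return false;
    ct.remove_prefix(kBase.size());
    return ct.empty() || ct.front() == '+' || ct.front() == ';';
}

// Assumed order: HTTP/2 gate first, then the route override, then method + content-type.
inline bool LooksLikeGrpc(int http_major, std::string_view method,
                          std::string_view content_type, RouteProtocol route) {
    if (http_major != 2) return false;               // gRPC requires HTTP/2
    if (route == RouteProtocol::Rest) return false;  // route opts out
    if (route == RouteProtocol::Grpc) return true;   // assumption: route forces gRPC
    return method == "POST" && IsGrpcContentType(content_type);
}
```

Because the function only compares a few short strings and an enum, it is cheap enough to run on every request, which is consistent with the "flag flipped at HEADERS-complete" design.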

Summary

  • Inbound classifier (HTTP/2 + content-type + per-route proxy.protocol) flips req.is_grpc_ at HEADERS-complete with sub-microsecond cost.
  • Quad-state grpc-timeout parser distinguishes Valid / Invalid / SubMs / Absent; a sub-millisecond value produces a Trailers-Only DEADLINE_EXCEEDED before dispatch, without ever touching the upstream.
  • ProxyTransaction::ArmGrpcDeadline enforces min(client grpc-timeout, route.response_timeout_ms). Deadline expiry routes through the new LocalAbortAndDeliver path (gateway-driven termination, distinct from Cancel() for client disconnect).
  • MakeGrpcErrorResponse wired into DeliverTerminalError — every proxy failure on a gRPC route (502 / 503 / 504 / 413) is delivered as Trailers-Only grpc-status: UNAVAILABLE / RESOURCE_EXHAUSTED / etc., not a raw HTTP response.
  • MaybeSynthesizeGrpcRejectFromHttpStatus + HandleClassifierReject helpers wrap every middleware-reject emission site (sync H1, async-resume, H2 streaming dispatch, H2 buffered dispatch) — auth / rate-limit / body-limit / 404 / 405 / 5xx all become Trailers-Only on gRPC routes.
  • OnHeaders / OnTrailers scan peer-emitted grpc-status and publish it through the observability snapshot for happy-path metrics.
  • RESULT_GRPC_DEADLINE_EXCEEDED = -19 is a new result code distinct from RESULT_RESPONSE_TIMEOUT so the circuit breaker classifies deadline expiry correctly.
  • Outbound te: trailers forced via force_te_trailers_ = client_te_trailers_ || is_grpc_ cached at construction. Some gRPC backends use the header as a proxy-compatibility marker.
  • HeaderRewriter strips client-supplied grpc-status / grpc-message / grpc-status-details-bin so a forged request can't surface trailer-equivalent semantics.
  • ObservabilitySnapshot owns gRPC identity (is_grpc_, grpc_service_, grpc_method_, grpc_fully_qualified_method_) and response status (grpc_response_status_). Populated by ObservabilityMiddleware::SetGrpcIdentity at construction and by every synthesis path before delivery.
  • Per-route proxy.protocol config (auto / grpc / rest) is loaded, validated, round-tripped, and threaded through RouteProtocolFromConfigString at route registration. The field is restart-required; a reload that changes it surfaces a warning.
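
The quad-state grpc-timeout parse from the summary can be sketched as follows. This is an illustrative sketch, not the PR's ParseGrpcTimeoutMs: the names TimeoutKind, ParsedTimeout, and ParseTimeoutSketch are hypothetical, and treating any value that truncates to 0 ms (including a literal zero) as SubMs is an assumption. The wire grammar itself (1 to 8 digits followed by one of H, M, S, m, u, n) comes from the gRPC over HTTP/2 specification.

```cpp
#include <cstdint>
#include <string_view>

enum class TimeoutKind { Absent, Valid, Invalid, SubMs };

struct ParsedTimeout {
    TimeoutKind kind;
    std::int64_t ms;  // meaningful only when kind == Valid
};

// has_header distinguishes a missing grpc-timeout header from an empty one.
inline ParsedTimeout ParseTimeoutSketch(bool has_header, std::string_view v) {
    if (!has_header) return {TimeoutKind::Absent, 0};
    // Wire grammar: 1 to 8 ASCII digits followed by exactly one unit character.
    if (v.size() < 2 || v.size() > 9) return {TimeoutKind::Invalid, 0};
    std::int64_t n = 0;
    for (char c : v.substr(0, v.size() - 1)) {
        if (c < '0' || c > '9') return {TimeoutKind::Invalid, 0};
        n = n * 10 + (c - '0');
    }
    // n <= 99999999, so even the hours multiplier cannot overflow int64.
    std::int64_t ms = 0;
    switch (v.back()) {
        case 'H': ms = n * 3600000; break;
        case 'M': ms = n * 60000; break;
        case 'S': ms = n * 1000; break;
        case 'm': ms = n; break;
        case 'u': ms = n / 1000; break;     // microseconds, truncated
        case 'n': ms = n / 1000000; break;  // nanoseconds, truncated
        default: return {TimeoutKind::Invalid, 0};
    }
    // Assumption: anything that truncates to 0 ms is reported as SubMs.
    if (ms == 0) return {TimeoutKind::SubMs, 0};
    return {TimeoutKind::Valid, ms};
}
```

Keeping SubMs separate from Invalid lets the caller answer a well-formed but already-expired deadline with DEADLINE_EXCEEDED rather than a malformed-header error.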

What's NOT in this PR (deferred)

  • Outbound grpc-timeout decrement: EncodeGrpcTimeout is shipped but not yet called at HEADERS-build time. ArmGrpcDeadline already enforces the deadline locally; the only gap is that the deadline visible to the upstream is not decremented. Reserved for a Phase 1.5 follow-up.
  • Phase 2 — trailer-status retry: ParseGrpcRetryPushback (gRFC A6 tri-state) and IsTrailersOnlyResponseShape are reserved hook points. MaybeTriggerGrpcTrailerStatusRetry and RetryPolicy::RetryCondition::GRPC_UNAVAILABLE are not wired.
  • Phase 2 — OTel rpc.* emission: the snapshot owns the data, but finalize-site emission of rpc.method / rpc.response.status_code / rpc.server.call.duration is deferred.
  • Phase 3 — gRPC-Web bridge: is_grpc_web_ is reserved on HttpRequest; the bridge module is not built.
  • gRPC over HTTP/1: the classifier suppresses classification when http_major != 2, per spec; gRPC requires HTTP/2.
  • grpc_proxy end-to-end integration suite (~30 wire-level tests): deferred. The unit suite + existing http2/proxy regression suites exercise the integration sites indirectly.
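
For the deferred outbound decrement, the encoding step could look roughly like the sketch below: pick the smallest unit whose value still fits in the 8 digits the wire format allows. This is a hypothetical stand-in for EncodeGrpcTimeout (the name EncodeTimeoutSketch is invented here), and rounding up when a coarser unit is needed is an assumption; a real gateway might prefer rounding down so the upstream deadline never exceeds the local one.

```cpp
#include <cstdint>
#include <string>

// Encodes a millisecond budget as a grpc-timeout value: at most 8 digits plus a unit.
inline std::string EncodeTimeoutSketch(std::int64_t ms) {
    if (ms < 0) ms = 0;
    constexpr std::int64_t kMaxValue = 99999999;  // largest 8-digit value
    struct Unit { std::int64_t ms_per_unit; char suffix; };
    constexpr Unit kUnits[] = {{1, 'm'}, {1000, 'S'}, {60000, 'M'}, {3600000, 'H'}};
    for (const Unit& u : kUnits) {
        // Assumption: round up when the value is not a whole number of units.
        const std::int64_t n = ms / u.ms_per_unit + (ms % u.ms_per_unit != 0 ? 1 : 0);
        if (n <= kMaxValue) return std::to_string(n) + u.suffix;
    }
    return std::to_string(kMaxValue) + 'H';  // clamp to the largest encodable value
}
```

Under these assumptions, a remaining budget of 250 ms is sent as 250m, and only budgets above 99999999 ms fall through to seconds, minutes, or hours.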

Architecture

include/grpc/
  grpc_reject_kind.h     # MiddlewareRejectKind enum (own header to break the
                         #   http_request.h ↔ grpc_synthesis.h include cycle)
  grpc_status.h          # GrpcStatus constants, MapHttpToGrpcStatus,
                         #   MapProxyResultToGrpcStatus, GrpcStatusName
  grpc_timeout.h         # ParseGrpcTimeoutResult quad-state, ParseGrpcTimeoutMs,
                         #   EncodeGrpcTimeout
  grpc_synthesis.h       # ClassifyRequest, MakeTrailersOnlyResponse,
                         #   PercentEncodeGrpcMessage, MapMiddlewareRejectToGrpcStatus,
                         #   SynthesizeMiddlewareReject, HandleClassifierReject,
                         #   MaybeSynthesizeGrpcRejectFromHttpStatus,
                         #   RouteProtocolFromConfigString, ParseGrpcRetryPushback

server/
  grpc_status.cc         # ~80 LoC
  grpc_timeout.cc        # ~120 LoC — quad-state parser with overflow-safe
                         #   cap-before-multiply; encoder picks smallest fitting unit
  grpc_synthesis.cc      # ~290 LoC

test/
  grpc_test.h            # 37 pure-logic unit tests

Layer placement: new code lives at Layer 6 (proxy/transport-adjacent) for ProxyTransaction extensions and Layer 7 (middleware) for SynthesizeMiddlewareReject + classifier. No new transport-layer changes.
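
For orientation, the canonical HTTP-to-gRPC status mapping that grpc_status.h is described as covering can be sketched as below. The table follows the gRPC project's published "HTTP to gRPC Status Code Mapping"; the function name HttpToGrpcSketch and the enum are hypothetical. The summary suggests the PR's own mapping goes further (for example 413 alongside RESOURCE_EXHAUSTED); that extension is not reproduced here.

```cpp
// Numeric gRPC status codes used below (values from the gRPC status code list).
enum GrpcCode : int {
    kUnknown = 2,
    kPermissionDenied = 7,
    kUnimplemented = 12,
    kInternal = 13,
    kUnavailable = 14,
    kUnauthenticated = 16,
};

// Canonical mapping for an HTTP status received without a grpc-status.
inline int HttpToGrpcSketch(int http_status) {
    switch (http_status) {
        case 400: return kInternal;
        case 401: return kUnauthenticated;
        case 403: return kPermissionDenied;
        case 404: return kUnimplemented;
        case 429:
        case 502:
        case 503:
        case 504: return kUnavailable;
        default:  return kUnknown;
    }
}
```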

Files changed

New (8 files, ~1300 LoC):

  • include/grpc/grpc_reject_kind.h
  • include/grpc/grpc_status.h + server/grpc_status.cc
  • include/grpc/grpc_timeout.h + server/grpc_timeout.cc
  • include/grpc/grpc_synthesis.h + server/grpc_synthesis.cc
  • test/grpc_test.h

Modified (15 files, +612 / -23):

| File | Change |
| --- | --- |
| include/http/http_request.h | is_grpc_, is_grpc_web_, grpc_timeout_ms, grpc_reject_kind_, grpc_service_, grpc_method_ + symmetric Reset() |
| include/http/http_response.h | MarkTrailersOnly() / IsTrailersOnly() marker; Trailer(k,v) / GetTrailers() for post-commit trailer attach |
| include/http/route_options.h | RouteProtocol enum (Auto/Grpc/Rest) + protocol field |
| include/config/server_config.h | ProxyConfig::protocol string field threaded through operator== |
| include/observability/observability_snapshot.h | gRPC identity fields + SetGrpcIdentity(req) template + set_grpc_response_status setter |
| include/upstream/proxy_transaction.h | RESULT_GRPC_DEADLINE_EXCEEDED = -19; MakeGrpcErrorResponse, OnGrpcTimeoutExpired, ArmGrpcDeadline, ClearGrpcDeadline, LocalAbortAndDeliver, AbortUpstreamForLocalReason, IsTrailersOnlyResponseShape; fields is_grpc_, grpc_deadline_ms_, grpc_deadline_generation_, body_bytes_seen_, force_te_trailers_ |
| server/proxy_transaction.cc | Constructor captures gRPC fields; Start() computes effective deadline + arms ArmGrpcDeadline; Cleanup() bumps deadline generation; OnBodyChunk increments body_bytes_seen_; OnHeaders / OnTrailers publish peer grpc-status via ExtractGrpcStatusToSnapshot helper; both H2 submit sites use cached force_te_trailers_; DeliverTerminalError forks on gRPC for pre-commit (Trailers-Only) and post-commit (trailers-with-END_STREAM); new method impls at file tail (MakeGrpcErrorResponse, OnGrpcTimeoutExpired, LocalAbortAndDeliver, AbortUpstreamForLocalReason, ArmGrpcDeadline, ClearGrpcDeadline, IsTrailersOnlyResponseShape) |
| server/header_rewriter.cc | Strip client-supplied grpc-status / grpc-message / grpc-status-details-bin |
| server/http2_session.cc | ClassifyRequest runs at both HEADERS-complete sites; both dispatch sites call HandleClassifierReject (sentinel short-circuit) + MaybeSynthesizeGrpcRejectFromHttpStatus (post-handler wrap) |
| server/http_server.cc | H1 sync-path: sentinel consumption + emit wrap; async-resume submit closure: emit wrap before tweak_response; proxy-registration call sites pass RouteProtocolFromConfigString(proxy.protocol) |
| server/observability_middleware.cc | snap->SetGrpcIdentity(request) at snapshot construction |
| server/config_loader.cc | Load + ToJson + Validate for proxy.protocol (allowlist auto/grpc/rest) |
| Makefile | GRPC_SRCS added to LIB_SRCS |
| test/run_test.cc | Registers grpc suite + CLI flag + PrintUsage entry |
| .github/workflows/ci.yml | grpc added to build-linux-tsan-rest suite enumeration |

Correctness contracts (load-bearing)

  • Snapshot-write-before-delivery: every gRPC synthesis path writes obs_snapshot_->set_grpc_response_status(grpc_status) BEFORE returning the response or emitting trailers. Missed sites surface as __missing__ in rpc.response.status_code.
  • Breaker-report-before-cleanup: LocalAbortAndDeliver calls ReportBreakerOutcome(result_code) BEFORE AbortUpstreamForLocalReason and the delivery branch. Cleanup neutral-releases breaker admission, so reporting after cleanup would lose the failure signal.
  • Generation-counter discipline: ArmGrpcDeadline, ClearGrpcDeadline, Cleanup, and AbortUpstreamForLocalReason bump grpc_deadline_generation_ to invalidate in-flight closures. Closure body gates on generation match, cancelled_, IsKilledForShutdown(), and complete_cb_invoked_.
  • Local-abort vs Cancel distinction: Cancel() is the client-disconnect path (nulls complete_cb_, marks invoked). LocalAbortAndDeliver is the gateway-driven termination path with a synthesized response — it leaves complete_cb_ alive for the pre-commit DeliverResponse to fire.
  • No double-Cleanup: AbortUpstreamForLocalReason deliberately does NOT call Cleanup. The pre-commit branch reaches Cleanup via DeliverResponse; the post-commit branch calls Cleanup directly after stream_sender_.End.
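
The generation-counter discipline in the contracts above can be illustrated with a small self-contained sketch. The type DeadlineGuardSketch and its members are hypothetical and much simpler than ProxyTransaction: the real closure is described as also gating on IsKilledForShutdown() and complete_cb_invoked_, and production code would need to keep the owner alive (for example through shared ownership) rather than capturing a raw this.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the deadline bookkeeping on a transaction.
struct DeadlineGuardSketch {
    std::uint64_t generation = 0;
    bool cancelled = false;
    int fired = 0;  // counts deadline expiries that were actually honoured

    // Arms a deadline and returns the closure a timer would later invoke.
    std::function<void()> Arm() {
        const std::uint64_t armed_gen = ++generation;  // invalidates older closures
        return [this, armed_gen] {
            // Stale closure (re-armed or cleared since) or cancelled: do nothing.
            if (armed_gen != generation || cancelled) return;
            ++fired;
        };
    }

    // Clearing bumps the generation so any in-flight closure becomes a no-op.
    void Clear() { ++generation; }
};
```

The point of the pattern is that a timer callback which has already been queued cannot be recalled, so the closure has to prove it is still current before acting.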

Test plan

  • ./test_runner grpc: 37/37 new unit tests covering:
    • canonical HTTP→gRPC status mapping table
    • RESULT_* → grpc-status family classification (UNAVAILABLE / INTERNAL / DEADLINE_EXCEEDED)
    • quad-state ParseGrpcTimeoutMs (Valid / Invalid / SubMs / Absent + overflow + missing-unit + 9+ digits)
    • EncodeGrpcTimeout round-trip across all units (m/S/M/H)
    • PercentEncodeGrpcMessage %-escape per PROTOCOL-HTTP2 (100% → 100%25) + non-ASCII / control-byte escape
    • MakeTrailersOnlyResponse shape verification (:status 200 + grpc-status + grpc-message + Trailers-Only marker)
    • every MiddlewareRejectKind → grpc-status branch
    • SynthesizeMiddlewareReject idempotency + non-gRPC pass-through
    • MaybeSynthesizeGrpcRejectFromHttpStatus HTTP→Trailers-Only mapping + 2xx pass-through
    • ClassifyRequest: happy path, H1 suppression, application/grpc-web exclusion, +xxx content-type variants, RouteProtocol::Rest suppression, RouteProtocol::Grpc force, non-POST + invalid + sub-ms + valid grpc-timeout
    • ParseGrpcRetryPushback tri-state (Absent / Delay / negative-Terminal / unparseable-Terminal)
    • MakeGrpcErrorResponse for DEADLINE_EXCEEDED and CHECKOUT_FAILED
    • HttpResponse::Trailer / GetTrailers accessor contract
  • Full regression sweep: 1678 / 1678 pass (was 1641 before; +37 new gRPC tests). No regressions in basic, http, http2, ws, tls, proxy, h2_upstream, config, auth, observability, streaming_request, h2_trailer, race, timeout, stress.
  • Build clean across Makefile defaults; both test_runner (34 MB) and server_runner (20 MB) link without errors.
  • TODO before merge: full TSan-rest run on CI to confirm the grpc suite is picked up.
  • Deferred to follow-up PR: grpc_proxy end-to-end integration suite (~30 wire-level tests including AsyncJwksFailureEmitsTrailersOnly and +xxx content-type variants on real gRPC clients).
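
The grpc-message escaping exercised by the test plan can be sketched as follows. The rule is taken from the gRPC over HTTP/2 specification: bytes 0x20 to 0x7E pass through unchanged except for the percent sign, and every other byte is written as a percent sign followed by two uppercase hex digits. The function name PercentEncodeSketch is hypothetical and this is not the PR's PercentEncodeGrpcMessage.

```cpp
#include <string>
#include <string_view>

// Percent-encodes a status message for use as a grpc-message header value.
inline std::string PercentEncodeSketch(std::string_view msg) {
    static constexpr char kHex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(msg.size());
    for (unsigned char c : msg) {
        // Space and visible ASCII pass through, except '%' itself.
        const bool passthrough = c >= 0x20 && c <= 0x7E && c != '%';
        if (passthrough) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(kHex[c >> 4]);
            out.push_back(kHex[c & 0x0F]);
        }
    }
    return out;
}
```

Working byte by byte means multi-byte UTF-8 sequences are escaped one byte at a time, which is what the wire format expects.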


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive gRPC support to the proxy, including a request classifier, status mapping, deadline enforcement, and support for Trailers-Only responses. Key changes involve adding gRPC-specific configuration options, observability tracking for gRPC identity and status, and logic for handling gRPC-specific error synthesis. Additionally, a new unit test suite has been added to verify the gRPC helper functionality. I have no feedback to provide.

Owner Author

mwfj commented May 19, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ade512ea5

Comment thread server/proxy_transaction.cc
Comment thread server/http2_session.cc
Owner Author

mwfj commented May 21, 2026

LGTM

@mwfj mwfj merged commit e6ddb60 into main May 21, 2026
6 checks passed
@mwfj mwfj deleted the support-grpc-proxy-phase1 branch May 21, 2026 04:00
This was referenced May 21, 2026