[codex] Enhance request gate diagnostics and add an optional wait cap #265
Merged
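Change overview
This PR focuses on improving the diagnostics of the gateway request gate and adds an optional bounded-wait setting, for investigating and mitigating head-of-line blocking when many concurrent requests share the same API key, path, and model.
Main changes: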
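- When CODEXMANAGER_REQUEST_GATE_WAIT_TIMEOUT_MS is unset or 0, the gate still waits until the request deadline, as before.
- The CODEXMANAGER_REQUEST_GATE_WAIT_TIMEOUT_MS option: an administrator can explicitly set it to, for example, 5000 so that the gate waits at most 5 seconds; on timeout it logs REQUEST_GATE_SKIP reason=gate_wait_timeout and sends the request on to the upstream.
- Diagnostics for wait_ms and the skip reason, which make it easier to tell gate head-of-line blocking apart from a genuinely slow upstream or network.
- first_response_ms in BRIDGE_RESULT, to help pinpoint time to first byte.
- A commented-out CODEXMANAGER_REQUEST_GATE_WAIT_TIMEOUT_MS=5000 example, not enabled by default, so that generic Docker deployment behavior does not change.
- docs/report/gateway-request-gate-fix-20260523.md, documenting the root cause, the optional mitigation, Docker load-test results, and remaining risks.
Impact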
The changes are mainly in:
- crates/service/src/gateway/core/runtime_config.rs
- crates/service/src/gateway/upstream/proxy_pipeline/request_gate.rs
- crates/service/src/gateway/observability/trace_log.rs
- crates/service/src/gateway/observability/error_log.rs
- crates/service/src/gateway/observability/http_bridge/delivery.rs
- crates/service/src/gateway/upstream/protocol/aggregate_api.rs
- crates/service/src/gateway/upstream/proxy_pipeline/response_finalize.rs
- docs/report/gateway-request-gate-fix-20260523.md
Scope notes
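This PR does not modify account authentication, API key persistence, billing, permission boundaries, or the SQLite schema.
Nor does this PR make the 5-second gate wait cap a general default; the cap takes effect only when CODEXMANAGER_REQUEST_GATE_WAIT_TIMEOUT_MS is explicitly set to a non-zero value. The problem raised in issue #264, that UPSTREAM_STREAM_TIMEOUT_MS serves as both the first-byte wait and the stream idle timeout, is currently treated as expected configured behavior and is out of scope for this PR.
Verification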
Ran and passed:
- cargo test -p codexmanager-service request_gate --lib
- cargo test -p codexmanager-service trace_log --lib
- cargo check -p codexmanager-service --all-targets
Additional notes:
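cargo fmt -- --check was attempted on the rebased branch, but the latest upstream/main contains unrelated formatting differences in compact-model code. To avoid mixing in formatting changes that do not belong to this PR, those unrelated formatting fixes are not included.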
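For readers who want a concrete picture of the opt-in semantics of CODEXMANAGER_REQUEST_GATE_WAIT_TIMEOUT_MS described under the main changes, here is a minimal sketch. It is not the code in request_gate.rs or runtime_config.rs: the function names parse_gate_wait_timeout and effective_gate_wait are hypothetical. Treating an unparsable value as "no cap" and clamping the cap to the remaining request deadline are also assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical helper: turn the raw environment value into an optional cap.
/// Unset or `0` means "no cap" (wait until the request deadline).
/// Assumption: an unparsable value is also treated as "no cap".
fn parse_gate_wait_timeout(raw: Option<&str>) -> Option<Duration> {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|ms| *ms > 0)
        .map(Duration::from_millis)
}

/// Hypothetical helper: how long the gate may wait for one request.
/// Without a cap the budget is the remaining request deadline; with a cap
/// it is the smaller of the two (assumed, so the cap never extends the deadline).
fn effective_gate_wait(remaining_deadline: Duration, cap: Option<Duration>) -> Duration {
    match cap {
        Some(cap) => cap.min(remaining_deadline),
        None => remaining_deadline,
    }
}
```

Under these assumptions, a caller would read the variable with std::env::var, pass the result through parse_gate_wait_timeout, and, if the gate is still not acquired once the budget from effective_gate_wait has elapsed, log REQUEST_GATE_SKIP reason=gate_wait_timeout and proceed to the upstream.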