feat(sre): integrate traces, logs, metrics into one sdk #6580
c121914yu merged 8 commits into labring:v4.14.9-dev
c121914yu
left a comment
PR Review: feat(sre): integrate traces, logs, metrics into one sdk
Change Overview
- PR number: #6580
- Author: @xqvvu
- Branch: v4.14.9-dev <- feat/metrics-monitor
- Change stats: +2821 -475 lines, 42 files
- CI status: all passing
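Strengths
- Sound architectural direction: unifying logger, metrics, and tracing under @fastgpt-sdk/otel is a reasonable evolution for observability, and the incremental migration strategy lowers the risk.
- Workflow observability: the step-level tracing and metrics instrumentation is well designed, and the span attributes cover the key dimensions.
- HTTP middleware integration: the span integration in NextEntry is clean and handles error cases correctly.
- Environment variable design: each signal has its own switch, and the standard OTEL environment variables remain supported.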
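Issue Summary
Critical issues (3)
- Duplicated magic number for the span status code: SPAN_STATUS_CODE_ERROR = 2 is hard-coded in 3 files and should be imported from a single place.
- Module-level getMeter() call timing: metrics.ts calls getMeter() at module load time, which may run before configureMetrics().
- z.url() type change: in Zod v4, z.url() returns a URL object rather than a string, which may cause runtime errors downstream.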
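To illustrate the getMeter() timing issue, here is a minimal sketch of deferring instrument creation until first use. The Meter and Counter interfaces, the no-op meter, and the lazyCounter helper are hypothetical stand-ins; the actual @fastgpt-sdk/otel API is not shown in this review and may differ.

```typescript
// Hypothetical stand-ins for the SDK's meter types (assumption, not the real API).
interface Counter {
  add(value: number): void;
}
interface Meter {
  createCounter(name: string): Counter;
}

// A no-op meter, standing in for whatever getMeter() yields before configuration.
const noopMeter: Meter = {
  createCounter: () => ({ add: () => {} })
};

let activeMeter: Meter = noopMeter;

// Stand-in for configureMetrics(): installs the real meter during startup.
function configureMetrics(meter: Meter): void {
  activeMeter = meter;
}

function getMeter(): Meter {
  return activeMeter;
}

// Problematic pattern: resolved once at module load, so it may capture noopMeter.
// const stepCounter = getMeter().createCounter('workflow.step.count');

// Lazy pattern: resolve the instrument on first use, then cache it.
function lazyCounter(name: string): () => Counter {
  let cached: Counter | undefined;
  return () => {
    if (!cached) cached = getMeter().createCounter(name);
    return cached;
  };
}

// Safe to define at module level: no meter is touched until stepCounter() is called.
const stepCounter = lazyCounter('workflow.step.count');
```

Call sites would then use stepCounter().add(1). Note that this only helps if the first call happens after configureMetrics(); a call made earlier would still cache an instrument from the unconfigured meter.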
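Suggested improvements (4)
- normalizeAttributes is implemented twice, in 2 separate files.
- Calling process.memoryUsage() on every step is a potential performance hazard.
- Workflow dispatch is nested too deeply; consider separating observability from business logic.
- The shutdown calls to disposeMetrics() / disposeTracing() are missing.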
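For the process.memoryUsage() concern, one possible mitigation is to throttle sampling so that steps running close together reuse the last reading. This is a sketch under the assumption that approximate values are acceptable for the metric; createThrottledHeapSampler and its parameters are hypothetical names, not part of the PR.

```typescript
// Reads the current heap usage in bytes. In Node.js this could be
// () => process.memoryUsage().heapUsed (injected here to keep the sketch testable).
type HeapReader = () => number;
// Returns the current time in milliseconds, e.g. Date.now.
type Clock = () => number;

// Returns a sampler that calls readHeap at most once per minIntervalMs
// and otherwise returns the most recent reading.
function createThrottledHeapSampler(
  readHeap: HeapReader,
  now: Clock,
  minIntervalMs: number
): () => number {
  let lastValue = 0;
  let lastSampledAt = Number.NEGATIVE_INFINITY; // forces a real read on first call

  return () => {
    const t = now();
    if (t - lastSampledAt >= minIntervalMs) {
      lastValue = readHeap();
      lastSampledAt = t;
    }
    return lastValue;
  };
}
```

The trade-off is precision: two steps inside the same interval see the same value, so per-step deltas can read as zero. Whether that is acceptable depends on how the metric is used.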
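Optional optimizations (3)
- HTTP span names should follow the OTel semantic conventions.
- The memory growth metric is process-level, so it cannot be attributed to an individual step when steps run concurrently.
- The old @fastgpt-sdk/logger dependency can be cleaned up in a follow-up.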
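On span naming, the OTel HTTP semantic conventions recommend "{method} {route}" when a low-cardinality route template is available, and just "{method}" otherwise. A small sketch of that rule follows; httpServerSpanName is a hypothetical helper and the route strings are illustrative, not taken from the PR.

```typescript
// Builds an HTTP server span name following the OTel convention:
// "{METHOD} {route}" when a route template is known, otherwise "{METHOD}".
// The route should be a template (e.g. "/api/users/:id"), never the raw URL,
// so that span names stay low-cardinality.
function httpServerSpanName(method: string, route?: string): string {
  const verb = method.toUpperCase();
  return route ? `${verb} ${route}` : verb;
}

// Example usage with illustrative routes.
const named = httpServerSpanName('get', '/api/users/:id');
const fallback = httpServerSpanName('POST');
```

Using the route template rather than the concrete path keeps requests for different IDs grouped under one span name in the tracing backend.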
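Overall Assessment
- Code quality: 4/5
- Security: 5/5
- Performance: 3/5
- Maintainability: 4/5
Review Conclusion
Changes requested: fix the critical issues before merging. The overall direction is right, and this is a valuable improvement to the observability infrastructure.
See the line-level comments below for detailed code feedback.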
✅ Build Successful - Preview sandbox Image for this PR:
✅ Build Successful - Preview fastgpt Image for this PR:
✅ Build Successful - Preview mcp_server Image for this PR:
* fix: 1.image read 2.JSON parsing error
* dataset cite and pause
* perf: plancall second parse
* add test
---------
Co-authored-by: archer <545436317@qq.com>
* fix: image read and json error (Agent) (#6502)
* fix: 1.image read 2.JSON parsing error
* dataset cite and pause
* perf: plancall second parse
* add test
---------
Co-authored-by: archer <545436317@qq.com>
* master message
* remove invalid code
* feat(sre): integrate traces, logs, metrics into one sdk (#6580)
* fix: image read and json error (Agent) (#6502)
* fix: 1.image read 2.JSON parsing error
* dataset cite and pause
* perf: plancall second parse
* add test
---------
Co-authored-by: archer <545436317@qq.com>
* master message
* wip: otel sdk
* feat(sre): integrate traces, logs, metrics into one sdk
* fix(sre): use SpanStatusCode constants
* fix(sre): clarify step memory measurement
* update package
* fix: ts
---------
Co-authored-by: YeYuheng <57035043+YYH211@users.noreply.github.com>
Co-authored-by: archer <545436317@qq.com>
* doc
* sandbox in agent (#6579)
* doc
* update template
* fix: pr
* fix: sdk package
* update lock
* update next
* update dockerfile
* dockerfile
* dockerfile
* update sdk version
* update dockerefile
* version
---------
Co-authored-by: YeYuheng <57035043+YYH211@users.noreply.github.com>
Co-authored-by: Ryo <whoeverimf5@gmail.com>
No description provided.