-
Notifications
You must be signed in to change notification settings - Fork 8
feat(service_manager):将发布系统接入服务管理模块 #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- 新增部署任务状态常量"error"表示失败状态 - 重构部署任务表结构,使用服务名和版本作为联合主键 - 实现部署适配器处理与deploy模块的交互 - 新增部署任务API支持服务名和版本作为路径参数 - 完善部署任务创建、更新、暂停、继续和回滚逻辑 - 新增配置模块支持数据库配置加载 - 更新SQL schema文件反映部署任务表结构变更 - 优化API分组注释和代码结构
- 添加本地部署包存储路径支持,修改包URL构建逻辑 - 优化部署服务异步执行上下文传递 - 新增部署测试脚本和打包脚本 - 添加数据库schema文件用于部署模块 - 更新配置文件包含数据库连接和私钥信息
检查表达式是否已包含比较操作符,避免重复添加 更新告警Webhook服务地址配置
Summary of ChangesHello @JaD1ng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 本次拉取请求引入了一个新的Prometheus Adapter服务,旨在集中化指标查询和告警规则管理,从而连接内部系统与Prometheus。同时,它将部署系统集成到服务管理模块中,增强了其管理服务部署和回滚的能力。这些更改还通过添加HTTP延迟注入和详细的HTTP指标收集,改进了模拟服务的可观测性。 Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
本次 PR 引入了发布系统与服务管理模块的集成,并新增了一个完整的 prometheus_adapter
服务,功能变更范围非常大。代码整体结构清晰,特别是 prometheus_adapter
的分层设计值得肯定。然而,在实现中发现了一些严重问题,需要优先解决:
- 严重安全漏洞:
internal/deploy/config.yaml
文件中硬编码了 RSA 私钥,这是一个严重的安全风险,私钥绝不应该提交到版本控制系统中。 - 严重设计问题:
prometheus_adapter
服务通过执行docker
命令行来修改容器内的文件和重载配置。这种做法破坏了容器化的原则,非常脆弱且不安全,强耦合了应用和外部环境。 - 硬编码路径:多处代码和脚本中包含了开发者的本地绝对路径,导致代码不可移植。
此外,还存在一些配置加载逻辑混乱、数据库 schema 类型不匹配等问题。建议在合并前优先处理上述严重问题,以确保系统的安全性、稳定性和可维护性。
privateKey: | | ||
-----BEGIN RSA PRIVATE KEY----- | ||
MIICXQIBAAKBgQDZsfv1qscqYdy4vY+P4e3cAtmvppXQcRvrF1cB4drkv0haU24Y | ||
7m5qYtT52Kr539RdbKKdLAM6s20lWy7+5C0DgacdwYWd/7PeCELyEipZJL07Vro7 | ||
Ate8Bfjya+wltGK9+XNUIHiumUKULW4KDx21+1NLAUeJ6PeW+DAkmJWF6QIDAQAB | ||
AoGBAJlNxenTQj6OfCl9FMR2jlMJjtMrtQT9InQEE7m3m7bLHeC+MCJOhmNVBjaM | ||
ZpthDORdxIZ6oCuOf6Z2+Dl35lntGFh5J7S34UP2BWzF1IyyQfySCNexGNHKT1G1 | ||
XKQtHmtc2gWWthEg+S6ciIyw2IGrrP2Rke81vYHExPrexf0hAkEA9Izb0MiYsMCB | ||
/jemLJB0Lb3Y/B8xjGjQFFBQT7bmwBVjvZWZVpnMnXi9sWGdgUpxsCuAIROXjZ40 | ||
IRZ2C9EouwJBAOPjPvV8Sgw4vaseOqlJvSq/C/pIFx6RVznDGlc8bRg7SgTPpjHG | ||
4G+M3mVgpCX1a/EU1mB+fhiJ2LAZ/pTtY6sCQGaW9NwIWu3DRIVGCSMm0mYh/3X9 | ||
DAcwLSJoctiODQ1Fq9rreDE5QfpJnaJdJfsIJNtX1F+L3YceeBXtW0Ynz2MCQBI8 | ||
9KP274Is5FkWkUFNKnuKUK4WKOuEXEO+LpR+vIhs7k6WQ8nGDd4/mujoJBr5mkrw | ||
DPwqA3N5TMNDQVGv8gMCQQCaKGJgWYgvo3/milFfImbp+m7/Y3vCptarldXrYQWO | ||
AQjxwc71ZGBFDITYvdgJM1MTqc8xQek1FXn1vfpy2c6O | ||
-----END RSA PRIVATE KEY----- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd := exec.Command("docker", "exec", containerName, "sh", "-c", | ||
fmt.Sprintf("cat > /etc/prometheus/rules/alert_rules.yml << 'EOF'\n%s\nEOF", string(data))) | ||
|
||
if output, err := cmd.CombinedOutput(); err != nil { | ||
// 如果直接写入容器失败,尝试使用临时文件+docker cp | ||
log.Warn(). | ||
Err(err). | ||
Str("output", string(output)). | ||
Msg("Failed to write directly to container, trying docker cp") | ||
|
||
// 写入临时文件 | ||
tmpFile := "/tmp/prometheus_alert_rules.yml" | ||
if err := os.WriteFile(tmpFile, data, 0644); err != nil { | ||
return fmt.Errorf("failed to write temp rules file: %w", err) | ||
} | ||
|
||
// 使用docker cp复制到容器 | ||
if err := s.syncRuleFileToContainer(tmpFile); err != nil { | ||
return fmt.Errorf("failed to sync to container: %w", err) | ||
} | ||
|
||
// 清理临时文件 | ||
os.Remove(tmpFile) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
严重设计问题:在应用内部通过 exec.Command
调用 docker
CLI 来修改容器内的文件,这是非常危险且脆弱的做法。
这种方式存在以下问题:
- 破坏容器化原则:应用不应该感知到自己运行在容器中,更不应该依赖外部的
docker
命令。 - 安全性差:如果
containerName
等变量可被外部影响,可能导致命令注入漏洞。 - 可移植性差:强依赖于运行环境中存在并配置好了
docker
CLI。 - 脆弱性:任何
docker
命令的语法变化或环境问题都可能导致应用失败。
推荐的替代方案:
- 使用卷挂载(Volume Mounts):将 Prometheus 的规则目录挂载到
prometheus_adapter
容器的一个路径下,adapter
直接读写这个挂载的文件系统即可,Prometheus 会自动感知到变化。
func NewDeployAdapter(instanceManager InstanceManager) *DeployAdapter { | ||
return &DeployAdapter{ | ||
instanceManager: instanceManager, | ||
baseURL: "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages", // 本地包仓库路径 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
严重配置问题:代码中硬编码了一个开发者的本地绝对路径 "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages"
。
这会导致代码在任何其他环境或其他开发者机器上都无法正常工作。这个路径必须通过配置文件或环境变量进行配置,使其在不同部署环境中可以被修改。
baseURL: "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages", // 本地包仓库路径 | |
baseURL: os.Getenv("PACKAGE_REPO_BASE_URL"), // 或者从配置中读取 |
adapter, err := prometheusadapter.NewPrometheusAdapterServer(&config.Config{}) | ||
if err != nil { | ||
log.Fatal().Err(err).Msg("Failed to create Prometheus Adapter server") | ||
} | ||
|
||
// 获取 Prometheus Adapter 内部配置的绑定地址 | ||
bindAddr := ":9999" // 默认端口 | ||
if adapter.GetBindAddr() != "" { | ||
bindAddr = adapter.GetBindAddr() | ||
} | ||
|
||
// 如果有环境变量,优先使用环境变量的端口 | ||
if port := os.Getenv("ADAPTER_PORT"); port != "" { | ||
bindAddr = ":" + port | ||
} | ||
|
||
// 更新配置(虽然已经创建了 adapter,但需要端口信息用于启动服务器) | ||
cfg := &config.Config{ | ||
Server: config.ServerConfig{ | ||
BindAddr: bindAddr, | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id VARCHAR(255) NOT NULL PRIMARY KEY, -- VARCHAR类型主键,非自增,不为空 | ||
service_name VARCHAR(255), | ||
service_version VARCHAR(255), | ||
host_id VARCHAR(255), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err.Error() == fmt.Sprintf("rule '%s' not found", ruleName) { | ||
SendErrorResponse(c, http.StatusNotFound, model.ErrorCodeInvalidParameter, | ||
err.Error(), nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getDefaultConfig() *PrometheusAdapterConfig { | ||
return &PrometheusAdapterConfig{ | ||
Prometheus: PrometheusConfig{ | ||
Address: "http://10.210.10.33:9090", | ||
ContainerName: "mock-s3-prometheus", | ||
}, | ||
AlertWebhook: AlertWebhookConfig{ | ||
URL: "http://alert-module:8080/v1/integrations/alertmanager/webhook", | ||
PollingInterval: "10s", | ||
}, | ||
AlertRules: AlertRulesConfig{ | ||
LocalFile: "../rules/alert_rules.yml", | ||
PrometheusRulesDir: "/etc/prometheus/rules/", | ||
}, | ||
Server: ServerConfig{ | ||
BindAddr: "0.0.0.0:9999", | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log_info "打包服务..." | ||
|
||
local package_name="storage-${VERSION}.tar.gz" | ||
local project_root="/Users/dingnanjia/workspace/mock/zeroops" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# 配置参数 | ||
SERVICE_NAME="storage" | ||
BASE_URL="http://localhost:8080" | ||
PACKAGE_PATH="/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func NewDeployAdapter(instanceManager InstanceManager) *DeployAdapter { | ||
return &DeployAdapter{ | ||
instanceManager: instanceManager, | ||
baseURL: "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages", // 本地包仓库路径 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 CRITICAL: Hardcoded absolute path
This hardcoded path will fail in any environment except your local machine. This breaks production deployments, CI/CD, and other developers' environments.
Fix: Use environment variable or configuration:
baseURL := os.Getenv("PACKAGE_REPO_PATH")
if baseURL == "" {
baseURL = "./packages" // Relative default
}
|
||
// 创建元信息 | ||
if len(rule.Labels) > 0 { | ||
labelsJSON, _ := json.Marshal(rule.Labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 HIGH: Ignored JSON error causes silent data corruption
If marshaling fails, labelsJSON
will be nil
and the error is ignored. This can lead to invalid data being stored without any indication.
Fix:
labelsJSON, err := json.Marshal(rule.Labels)
if err != nil {
log.Warn().Err(err).Msg("Failed to marshal labels")
continue
}
service := &AlertService{ | ||
promClient: promClient, | ||
config: config, | ||
currentRules: []model.AlertRule{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currentRules
and currentRuleMetas
are accessed from HTTP handlers (concurrent goroutines) without any mutex protection. This can cause data races, panics, or corrupted state.
Fix: Add mutex:
type AlertService struct {
mu sync.RWMutex
currentRules []model.AlertRule
currentRuleMetas []model.AlertRuleMeta
// ...
}
Then use s.mu.Lock()
/s.mu.RLock()
in all methods that access these fields.
return &AlertmanagerService{ | ||
config: config, | ||
webhookURL: config.AlertWebhook.URL, | ||
httpClient: &http.Client{Timeout: 30 * time.Second}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡ PERFORMANCE: Missing connection pooling
HTTP client without configured Transport
will not reuse connections, causing:
- Increased latency (TCP handshake overhead)
- Port exhaustion under load
- High memory usage
Fix:
httpClient: &http.Client{
Timeout: 30 * time.Second,
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
},
}
|
||
// 解析 Prometheus 发送的告警 | ||
var alerts []model.AlertmanagerAlert | ||
body, err := io.ReadAll(r.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔒 SECURITY: Unbounded request body read
Reading entire request body without size limits can cause OOM with large/malicious payloads.
Fix:
body, err := io.ReadAll(io.LimitReader(r.Body, 10*1024*1024)) // 10MB limit
@@ -0,0 +1,122 @@ | |||
package api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 SECURITY CRITICAL: No authentication or authorization
All API endpoints are completely unauthenticated. Anyone with network access can:
- Create/update/delete alert rules
- Query metrics
- Trigger deployments
This is a critical security vulnerability for production use.
Recommendations:
- Implement API key or JWT authentication middleware
- Add authorization checks for different operations (RBAC)
- Add rate limiting to prevent abuse
- Consider mTLS for service-to-service communication
This should be addressed before merging to production.
综合评审总结通过代码质量、性能、安全和文档四个维度的专业评审,发现以下关键问题: 🚨 关键问题(必须修复)
|
添加MockS3的QPS监控仪表板,包含服务QPS、总QPS、按路径QPS等面板 修改docker-compose配置,启用Grafana匿名访问并更改端口映射
变更背景和解决方案
关联issue: #
文档更新(架构文档、API文档、升级文档)
Checklist