feat(service_manager) 重构服务管理模块 #50

JaD1ng · 2025-09-11T09:09:13Z

重构服务管理系统，将原有release-system模块整合到service_manager中，包含以下主要变更：

新增服务基础信息、版本、实例等核心模型定义
实现服务管理相关API接口
添加部署任务管理和批次控制功能
完善数据库操作层和业务逻辑层
定义服务状态监控和错误处理机制

移除不再使用的release-system模块及相关文件

添加发布系统核心模块，包括： - 服务信息仓储接口定义 - 发布任务仓储接口定义 - 部署任务和服务信息模型 - 外部系统适配器接口

重构服务管理系统，将原有release-system模块整合到service_manager中，包含以下主要变更： 1. 新增服务基础信息、版本、实例等核心模型定义 2. 实现服务管理相关API接口 3. 添加部署任务管理和批次控制功能 4. 完善数据库操作层和业务逻辑层 5. 定义服务状态监控和错误处理机制移除不再使用的release-system模块及相关文件

niupilot · 2025-09-11T09:09:16Z

Code Review for PR #50: Service Manager Refactoring

Review Tasks:

Gather context and understand the refactoring scope
Review architecture and design decisions
Analyze code quality and best practices
Check for potential bugs and security issues
Review API design and database operations
Verify error handling and validation
Check documentation consistency
Provide comprehensive feedback

Overview

This PR represents a significant refactoring effort that consolidates the release-system module into the service_manager. The refactoring introduces a well-structured layered architecture with clear separation of concerns across models, APIs, database operations, and business logic.

✅ Positive Aspects

1. Excellent Architecture Design

Clean layered architecture: API → Service → Database
Proper separation of concerns with dedicated packages for each layer
Well-defined models with appropriate data structures
Consistent use of context throughout the application

2. Comprehensive API Design

RESTful API endpoints with proper HTTP methods
Consistent error response format across all endpoints
Good use of HTTP status codes (400, 404, 409, 500)
Proper parameter validation and binding

3. Robust Error Handling

Custom error types defined in service/errors.go:6-11
Consistent error propagation from database → service → API layers
Proper logging with structured fields using zerolog

4. Database Layer Design

Transaction support with proper rollback handling
Parameterized queries preventing SQL injection
Proper connection management and context usage

⚠️ Issues and Improvements Needed

1. Critical: Inconsistent ID Types

File: service/deploy_service.go:218

return s.db.GetDeployBatches(ctx, 1) // 临时硬编码，需要根据实际deployID转换

Issue: Hardcoded ID conversion indicates fundamental type inconsistency
Impact: Will cause runtime errors when deployID is not "1"
Fix: Standardize ID types (either all string or all int64) across the system

2. String-Based Error Comparison

File: api/info_api.go:202, 246, 281

if err.Error() == "service not found" {

Issue: Error comparison using string matching is fragile
Fix: Use typed errors like in deployment APIs:

if err == service.ErrServiceNotFound {

3. Missing Input Validation

File: api/deploy_api.go:43-49

if req.Service == "" || req.Version == "" {
    c.JSON(http.StatusBadRequest, map[string]any{
        "error":   "bad request", 
        "message": "service and version are required",
    })
    return
}

Issue: Redundant validation since model already has binding:"required" tags
Recommendation: Remove redundant checks or explain why both are needed

4. Potential SQL Injection Risk

File: database/deploy_repo.go:129-156

sql := `UPDATE service_deploy_tasks SET `
// ... dynamic SQL building
sql += updates[0]
for i := 1; i < len(updates); i++ {
    sql += ", " + updates[i]
}

Risk: Dynamic SQL construction could be vulnerable
Current State: Appears safe due to hardcoded field names, but brittle
Recommendation: Consider using a query builder or more explicit field handling

5. Missing Transaction in Critical Operations

File: database/deploy_repo.go:167-177

func (d *Database) PauseDeployment(ctx context.Context, deployID string) error {
    query := `UPDATE service_deploy_tasks SET deploy_state = ? WHERE id = ? AND deploy_state = ?`
    _, err := d.ExecContext(ctx, query, model.StatusStop, deployID, model.StatusDeploying)
    return err
}

Issue: State-changing operations lack transaction protection
Risk: Inconsistent state if multiple operations fail mid-process

6. Incomplete Model Definitions

File: model/service.go:4-7

type Service struct {
    Name string   `json:"name" db:"name"`
    Deps []string `json:"deps" db:"deps"`
}

Issue: Very minimal service model
Missing: Created/updated timestamps, description, owner, etc.
Impact: May need schema changes later

7. API Design Concerns

Inconsistent Response Formats:

Some endpoints return {"items": [...]}
Others return the array directly
Some return {"message": "success", "service": "name"}

Missing Pagination:

GET /v1/deployments has limit but no offset/cursor pagination
Could cause performance issues with large datasets

8. Database Connection Not Initialized

File: database/database.go:15-18

func NewDatabase(cfg *config.DatabaseConfig) (*Database, error) {
    database := &Database{config: cfg}
    return database, nil
}

Issue: Database connection is never actually established
Impact: All database operations will fail with nil pointer errors

🔧 Recommendations

High Priority

Fix ID type consistency - Choose string or int64 and use consistently
Implement proper database connection initialization
Replace string error comparisons with typed errors
Add transactions to state-changing operations

Medium Priority

Standardize API response formats
Add proper pagination support
Enhance service model with standard fields
Add comprehensive input validation

Low Priority

Add API documentation/OpenAPI specs
Consider adding health check endpoints
Add metric endpoints for monitoring

📚 Documentation Consistency

✅ Code comments are in Chinese, consistent with team preference
✅ API endpoint documentation in comments is helpful
❌ Missing: README updates for new service_manager architecture
❌ Missing: API documentation for the new endpoints

🎯 Overall Assessment

This is a well-architected refactoring that demonstrates good software engineering principles. The layered architecture, error handling, and API design show mature development practices. However, there are some critical issues (ID type consistency, database initialization) that must be addressed before merging.

Recommendation: Address the high-priority issues before merging. The architecture is solid, but the implementation needs these fixes to be production-ready.

👉 Want me to re-run this review? Just type /review in the comments! For more usage examples, visit CodeAgent GitHub Repository.

gemini-code-assist

Summary of Changes

Hello @JaD1ng, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

本次拉取请求对服务管理系统进行了大规模重构，核心目标是将原有的发布系统功能整合进服务管理模块，从而提供一个统一且功能更完善的服务生命周期管理平台。这不仅新增了对服务基本信息、版本、实例的精细化管理能力，还实现了部署任务的创建、查询、更新、删除以及暂停、继续、回滚等全流程控制，显著提升了系统的服务治理和自动化部署能力。

Highlights

服务管理系统重构与模块整合: 将原有的release-system模块完全整合到service_manager中，实现了统一的服务管理平台。
新增核心模型定义: 引入了服务基础信息、服务版本、服务实例等核心数据模型，为精细化管理奠定基础。
实现全面的API接口: 新增了服务信息管理（CRUD）和部署任务管理（CRUD、暂停、继续、回滚）的API接口，提供了丰富的功能操作。
强化数据库操作层与业务逻辑层: 数据库层增加了上下文感知操作，并细化了服务信息和部署任务的仓储逻辑；业务逻辑层则分离了服务信息和部署管理的职责，提升了代码的可维护性。
定义服务状态监控与错误处理机制: 引入了服务健康状态、异常处理状态等常量，并定义了业务层面的错误类型，为后续的服务监控和异常处理提供了基础。
移除废弃模块: 彻底移除了不再使用的release-system模块及其相关文件，精简了代码库。

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

这次的PR重构了服务管理系统，整合了原有的release-system模块，并增加了大量新功能，包括服务信息管理、部署任务管理等。整体结构清晰，分层合理。代码在API层、服务层和数据访问层都做了很好的划分。

我发现了一些可以改进的地方，主要集中在以下几个方面：

代码重复：API处理函数中有重复的错误处理逻辑，可以抽象成辅助函数来简化代码。
错误处理：部分地方使用了字符串比较来判断错误类型，这是一种脆弱的实践，建议使用errors.Is。
性能问题：存在N+1查询问题，可能会在服务数量多时导致性能瓶颈。
硬编码和TODO：代码中存在一些硬编码的ID和TODO注释，这些是未完成的功能或潜在的bug，需要修复。

具体的修改建议请看下面的评论。

gemini-code-assist · 2025-09-11T09:11:36Z

internal/service_manager/service/deploy_service.go

+	// TODO:将deployID转换为int (这里简化处理)
+	// 实际项目中应该在数据库层统一ID类型
+	return s.db.GetDeployBatches(ctx, 1) // 临时硬编码，需要根据实际deployID转换


GetDeployBatches 函数中硬编码了 deployID 为 1，并且有一个 TODO 注释表明这是一个临时解决方案。这是一个严重的bug，会导致函数总是查询同一个部署任务的批次，而不是传入的 deployID 对应的任务。需要将字符串类型的 deployID 转换为整数，并传递给数据库查询。请记得导入 strconv 包。

deployIDInt, err := strconv.ParseInt(deployID, 10, 64) if err != nil { // 或者返回一个更具体的错误 return nil, err } return s.db.GetDeployBatches(ctx, int(deployIDInt))

gemini-code-assist · 2025-09-11T09:11:36Z

internal/service_manager/service/info_service.go

+
+		activeVersion := model.ActiveVersionItem{
+			Version:                 version,
+			DeployID:                "1001", // TODO:临时值，实际需要从部署任务中获取


GetServiceActiveVersions 函数中 DeployID 被硬编码为 "1001"。这是一个bug，会导致所有活跃版本都返回错误的部署ID。这里应该实现从部署任务中获取正确的 DeployID 的逻辑。

Suggested change

DeployID: "1001", // TODO:临时值，实际需要从部署任务中获取

DeployID: "", // TODO:临时值，实际需要从部署任务中获取

gemini-code-assist · 2025-09-11T09:11:37Z

internal/service_manager/api/info_api.go

+		if err.Error() == "service not found" {
+			c.JSON(http.StatusNotFound, map[string]any{
+				"error":   "not found",
+				"message": "service not found",
+			})
+			return
+		}


此处使用 err.Error() == "service not found" 来检查错误类型，这是一种脆弱的实践，因为错误信息文本可能会改变。建议使用 errors.Is 和哨兵错误（sentinel error）进行比较，这样代码会更健壮。
你需要：

在 info_api.go 中导入 errors 和 github.com/qiniu/zeroops/internal/service_manager/service 包。

使用 errors.Is(err, service.ErrServiceNotFound) 进行判断。
这个建议同样适用于 UpdateService 和 DeleteService 函数中的类似逻辑。

Suggested change

if err.Error() == "service not found" {

c.JSON(http.StatusNotFound, map[string]any{

"error": "not found",

"message": "service not found",

})

return

}

if errors.Is(err, service.ErrServiceNotFound) {

c.JSON(http.StatusNotFound, map[string]any{

"error": "not found",

"message": "service not found",

})

return

}

gemini-code-assist · 2025-09-11T09:11:37Z

internal/service_manager/service/info_service.go

+	for i, service := range services {
+		// 获取服务状态来确定健康状态
+		state, err := s.db.GetServiceState(ctx, service.Name)
+		if err != nil {
+			log.Error().Err(err).Str("service", service.Name).Msg("failed to get service state")
+		}


GetServicesResponse 函数在循环中调用 s.db.GetServiceState，这会导致N+1查询问题。当服务数量很大时，会产生大量的数据库查询，严重影响性能。建议优化为一次性查询所有服务的最新状态，然后在内存中进行匹配。
例如，你可以实现一个 GetLatestStatesForServices(ctx, serviceNames) 的数据库函数。

gemini-code-assist · 2025-09-11T09:11:37Z

internal/service_manager/api/deploy_api.go

+func (api *Api) CreateDeployment(c *fox.Context) {
+	ctx := c.Request.Context()
+
+	var req model.CreateDeploymentRequest
+	if err := c.ShouldBindJSON(&req); err != nil {
+		c.JSON(http.StatusBadRequest, map[string]any{
+			"error":   "bad request",
+			"message": "invalid request body: " + err.Error(),
+		})
+		return
+	}
+
+	if req.Service == "" || req.Version == "" {
+		c.JSON(http.StatusBadRequest, map[string]any{
+			"error":   "bad request",
+			"message": "service and version are required",
+		})
+		return
+	}
+
+	deployID, err := api.service.CreateDeployment(ctx, &req)
+	if err != nil {
+		if err == service.ErrServiceNotFound {
+			c.JSON(http.StatusBadRequest, map[string]any{
+				"error":   "bad request",
+				"message": "service not found",
+			})
+			return
+		}
+		if err == service.ErrDeploymentConflict {
+			c.JSON(http.StatusConflict, map[string]any{
+				"error":   "conflict",
+				"message": "deployment conflict: service version already in deployment",
+			})
+			return
+		}
+		log.Error().Err(err).
+			Str("service", req.Service).
+			Str("version", req.Version).
+			Msg("failed to create deployment")
+		c.JSON(http.StatusInternalServerError, map[string]any{
+			"error":   "internal server error",
+			"message": "failed to create deployment",
+		})
+		return
+	}
+
+	c.JSON(http.StatusCreated, map[string]any{
+		"id":      deployID,
+		"message": "deployment created successfully",
+	})
+}


在这个文件中的多个API处理函数（如 CreateDeployment, GetDeploymentByID 等）中，错误处理和JSON响应的逻辑有大量重复。这会增加代码维护的难度。建议提取一个辅助函数来统一处理错误响应，使代码更简洁、更易于维护。
例如，可以创建一个这样的辅助函数：

func (api *Api) sendError(c *fox.Context, statusCode int, errCode, message string) { c.JSON(statusCode, map[string]any{ "error": errCode, "message": message, }) }

然后，CreateDeployment中的错误处理可以简化为：

if err := c.ShouldBindJSON(&req); err != nil { api.sendError(c, http.StatusBadRequest, "bad request", "invalid request body: "+err.Error()) return }

gemini-code-assist · 2025-09-11T09:11:37Z

internal/service_manager/api/deploy_api.go

+	if limitStr := c.Query("limit"); limitStr != "" {
+		if limit, err := strconv.Atoi(limitStr); err == nil {
+			query.Limit = limit
+		}
+	}


当解析limit查询参数失败时，代码选择忽略错误。虽然这在某些情况下可以接受，但更好的做法是向客户端返回一个400 Bad Request错误。这样可以让API的使用者明确知道他们提供了无效的参数。

if limitStr := c.Query("limit"); limitStr != "" { limit, err := strconv.Atoi(limitStr) if err != nil { c.JSON(http.StatusBadRequest, map[string]any{ "error": "bad request", "message": "invalid limit parameter", }) return } query.Limit = limit }

gemini-code-assist · 2025-09-11T09:11:37Z

internal/service_manager/database/deploy_repo.go

+func (d *Database) UpdateDeployment(ctx context.Context, deployID string, req *model.UpdateDeploymentRequest) error {
+	sql := `UPDATE service_deploy_tasks SET `
+	args := []any{}
+	updates := []string{}
+
+	if req.Version != "" {
+		updates = append(updates, "version = ?")
+		args = append(args, req.Version)
+	}
+
+	if req.ScheduleTime != nil {
+		updates = append(updates, "deploy_begin_time = ?")
+		args = append(args, req.ScheduleTime)
+	}
+
+	if len(updates) == 0 {
+		return nil
+	}
+
+	sql += updates[0]
+	for i := 1; i < len(updates); i++ {
+		sql += ", " + updates[i]
+	}
+
+	sql += " WHERE id = ? AND deploy_state = ?"
+	args = append(args, deployID, model.StatusUnrelease)
+
+	_, err := d.ExecContext(ctx, sql, args...)
+	return err
+}


在 UpdateDeployment 函数中，动态构建SQL更新语句的部分可以通过 strings.Join 来简化，使代码更简洁易读。请记得导入 strings 包。

func (d *Database) UpdateDeployment(ctx context.Context, deployID string, req *model.UpdateDeploymentRequest) error { args := []any{} updates := []string{} if req.Version != "" { updates = append(updates, "version = ?") args = append(args, req.Version) } if req.ScheduleTime != nil { updates = append(updates, "deploy_begin_time = ?") args = append(args, req.ScheduleTime) } if len(updates) == 0 { return nil } query := `UPDATE service_deploy_tasks SET ` + strings.Join(updates, ", ") + ` WHERE id = ? AND deploy_state = ?` args = append(args, deployID, model.StatusUnrelease) _, err := d.ExecContext(ctx, query, args...) return err }

gemini-code-assist · 2025-09-11T09:11:37Z

internal/service_manager/service/info_service.go

+	// TODO:根据类型过滤（这里简化处理，实际需要根据业务需求过滤）
+	if versionType == "unrelease" {
+		// 返回未发布的版本，这里简化返回所有版本
+		return versions, nil
+	}


GetServiceAvailableVersions 函数的过滤逻辑尚未完成，如 TODO 注释所示。当前实现会直接返回所有版本，而不是根据 versionType 进行过滤。请完成这部分逻辑以确保API按预期工作。

gemini-code-assist · 2025-09-11T09:11:37Z

internal/service_manager/service/info_service.go

+func (s *Service) GetServiceMetricTimeSeries(ctx context.Context, serviceName, metricName string, query *model.MetricTimeSeriesQuery) (*model.PrometheusQueryRangeResponse, error) {
+	// TODO:这里应该调用实际的Prometheus或其他监控系统API
+	// 现在返回模拟数据
+
+	response := &model.PrometheusQueryRangeResponse{
+		Status: "success",
+		Data: model.PrometheusQueryRangeData{
+			ResultType: "matrix",
+			Result: []model.PrometheusTimeSeries{
+				{
+					Metric: map[string]string{
+						"__name__": metricName,
+						"service":  serviceName,
+						"instance": "instance-1",
+						"version":  query.Version,
+					},
+					Values: [][]any{
+						{1435781430.781, "1.2"},
+						{1435781445.781, "1.5"},
+						{1435781460.781, "1.1"},
+					},
+				},
+				{
+					Metric: map[string]string{
+						"__name__": metricName,
+						"service":  serviceName,
+						"instance": "instance-2",
+						"version":  query.Version,
+					},
+					Values: [][]any{
+						{1435781430.781, "0.8"},
+						{1435781445.781, "0.9"},
+						{1435781460.781, "1.0"},
+					},
+				},
+			},
+		},
+	}
+
+	return response, nil
+}


GetServiceMetricTimeSeries 函数目前返回的是模拟数据，并且有 TODO 注释。为了使功能完整，这里需要实现对实际监控系统（如Prometheus）的API调用来获取真实的时序指标数据。

JaD1ng added 2 commits September 11, 2025 13:56

feat: 初始化发布系统基础模块和模型定义

2250f38

添加发布系统核心模块，包括： - 服务信息仓储接口定义 - 发布任务仓储接口定义 - 部署任务和服务信息模型 - 外部系统适配器接口

gemini-code-assist bot reviewed Sep 11, 2025

View reviewed changes

JaD1ng changed the title ~~Feat/issue 41~~ feat(service_manager) 重构服务管理模块 Sep 11, 2025

gemini-code-assist bot reviewed Sep 11, 2025

View reviewed changes

refactor(api): 使用错误常量替代字符串比较以提高可维护性

be8ff6c

JaD1ng merged commit 39b66f7 into qiniu:develop Sep 11, 2025

	DeployID: "1001", // TODO:临时值，实际需要从部署任务中获取
	DeployID: "", // TODO:临时值，实际需要从部署任务中获取

feat(service_manager) 重构服务管理模块 #50

feat(service_manager) 重构服务管理模块 #50

Uh oh!

Conversation

JaD1ng commented Sep 11, 2025

Uh oh!

niupilot bot commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for PR #50: Service Manager Refactoring

Review Tasks:

Overview

✅ Positive Aspects

1. Excellent Architecture Design

2. Comprehensive API Design

3. Robust Error Handling

4. Database Layer Design

⚠️ Issues and Improvements Needed

1. Critical: Inconsistent ID Types

2. String-Based Error Comparison

3. Missing Input Validation

4. Potential SQL Injection Risk

5. Missing Transaction in Critical Operations

6. Incomplete Model Definitions

7. API Design Concerns

8. Database Connection Not Initialized

🔧 Recommendations

High Priority

Medium Priority

Low Priority

📚 Documentation Consistency

🎯 Overall Assessment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

niupilot bot commented Sep 11, 2025 •

edited

Loading