Skip to content

Conversation

@JaD1ng
Copy link
Collaborator

@JaD1ng JaD1ng commented Sep 11, 2025

重构服务管理系统,将原有release-system模块整合到service_manager中,包含以下主要变更:

  1. 新增服务基础信息、版本、实例等核心模型定义
  2. 实现服务管理相关API接口
  3. 添加部署任务管理和批次控制功能
  4. 完善数据库操作层和业务逻辑层
  5. 定义服务状态监控和错误处理机制

移除不再使用的release-system模块及相关文件

添加发布系统核心模块,包括:
- 服务信息仓储接口定义
- 发布任务仓储接口定义
- 部署任务和服务信息模型
- 外部系统适配器接口
重构服务管理系统,将原有release-system模块整合到service_manager中,包含以下主要变更:

1. 新增服务基础信息、版本、实例等核心模型定义
2. 实现服务管理相关API接口
3. 添加部署任务管理和批次控制功能
4. 完善数据库操作层和业务逻辑层
5. 定义服务状态监控和错误处理机制

移除不再使用的release-system模块及相关文件
@niupilot
Copy link

niupilot bot commented Sep 11, 2025

Code Review for PR #50: Service Manager Refactoring

Review Tasks:

  • Gather context and understand the refactoring scope
  • Review architecture and design decisions
  • Analyze code quality and best practices
  • Check for potential bugs and security issues
  • Review API design and database operations
  • Verify error handling and validation
  • Check documentation consistency
  • Provide comprehensive feedback

Overview

This PR represents a significant refactoring effort that consolidates the release-system module into the service_manager. The refactoring introduces a well-structured layered architecture with clear separation of concerns across models, APIs, database operations, and business logic.

✅ Positive Aspects

1. Excellent Architecture Design

  • Clean layered architecture: API → Service → Database
  • Proper separation of concerns with dedicated packages for each layer
  • Well-defined models with appropriate data structures
  • Consistent use of context throughout the application

2. Comprehensive API Design

  • RESTful API endpoints with proper HTTP methods
  • Consistent error response format across all endpoints
  • Good use of HTTP status codes (400, 404, 409, 500)
  • Proper parameter validation and binding

3. Robust Error Handling

  • Custom error types defined in service/errors.go:6-11
  • Consistent error propagation from database → service → API layers
  • Proper logging with structured fields using zerolog

4. Database Layer Design

  • Transaction support with proper rollback handling
  • Parameterized queries preventing SQL injection
  • Proper connection management and context usage

⚠️ Issues and Improvements Needed

1. Critical: Inconsistent ID Types

File: service/deploy_service.go:218

return s.db.GetDeployBatches(ctx, 1) // 临时硬编码,需要根据实际deployID转换
  • Issue: Hardcoded ID conversion indicates fundamental type inconsistency
  • Impact: Will cause runtime errors when deployID is not "1"
  • Fix: Standardize ID types (either all string or all int64) across the system

2. String-Based Error Comparison

File: api/info_api.go:202, 246, 281

if err.Error() == "service not found" {
  • Issue: Error comparison using string matching is fragile
  • Fix: Use typed errors like in deployment APIs:
if err == service.ErrServiceNotFound {

3. Missing Input Validation

File: api/deploy_api.go:43-49

if req.Service == "" || req.Version == "" {
    c.JSON(http.StatusBadRequest, map[string]any{
        "error":   "bad request", 
        "message": "service and version are required",
    })
    return
}
  • Issue: Redundant validation since model already has binding:"required" tags
  • Recommendation: Remove redundant checks or explain why both are needed

4. Potential SQL Injection Risk

File: database/deploy_repo.go:129-156

sql := `UPDATE service_deploy_tasks SET `
// ... dynamic SQL building
sql += updates[0]
for i := 1; i < len(updates); i++ {
    sql += ", " + updates[i]
}
  • Risk: Dynamic SQL construction could be vulnerable
  • Current State: Appears safe due to hardcoded field names, but brittle
  • Recommendation: Consider using a query builder or more explicit field handling

5. Missing Transaction in Critical Operations

File: database/deploy_repo.go:167-177

func (d *Database) PauseDeployment(ctx context.Context, deployID string) error {
    query := `UPDATE service_deploy_tasks SET deploy_state = ? WHERE id = ? AND deploy_state = ?`
    _, err := d.ExecContext(ctx, query, model.StatusStop, deployID, model.StatusDeploying)
    return err
}
  • Issue: State-changing operations lack transaction protection
  • Risk: Inconsistent state if multiple operations fail mid-process

6. Incomplete Model Definitions

File: model/service.go:4-7

type Service struct {
    Name string   `json:"name" db:"name"`
    Deps []string `json:"deps" db:"deps"`
}
  • Issue: Very minimal service model
  • Missing: Created/updated timestamps, description, owner, etc.
  • Impact: May need schema changes later

7. API Design Concerns

Inconsistent Response Formats:

  • Some endpoints return {"items": [...]}
  • Others return the array directly
  • Some return {"message": "success", "service": "name"}

Missing Pagination:

  • GET /v1/deployments has limit but no offset/cursor pagination
  • Could cause performance issues with large datasets

8. Database Connection Not Initialized

File: database/database.go:15-18

func NewDatabase(cfg *config.DatabaseConfig) (*Database, error) {
    database := &Database{config: cfg}
    return database, nil
}
  • Issue: Database connection is never actually established
  • Impact: All database operations will fail with nil pointer errors

🔧 Recommendations

High Priority

  1. Fix ID type consistency - Choose string or int64 and use consistently
  2. Implement proper database connection initialization
  3. Replace string error comparisons with typed errors
  4. Add transactions to state-changing operations

Medium Priority

  1. Standardize API response formats
  2. Add proper pagination support
  3. Enhance service model with standard fields
  4. Add comprehensive input validation

Low Priority

  1. Add API documentation/OpenAPI specs
  2. Consider adding health check endpoints
  3. Add metric endpoints for monitoring

📚 Documentation Consistency

  • ✅ Code comments are in Chinese, consistent with team preference
  • ✅ API endpoint documentation in comments is helpful
  • Missing: README updates for new service_manager architecture
  • Missing: API documentation for the new endpoints

🎯 Overall Assessment

This is a well-architected refactoring that demonstrates good software engineering principles. The layered architecture, error handling, and API design show mature development practices. However, there are some critical issues (ID type consistency, database initialization) that must be addressed before merging.

Recommendation: Address the high-priority issues before merging. The architecture is solid, but the implementation needs these fixes to be production-ready.


👉 Want me to re-run this review? Just type /review in the comments! For more usage examples, visit CodeAgent GitHub Repository.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @JaD1ng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

本次拉取请求对服务管理系统进行了大规模重构,核心目标是将原有的发布系统功能整合进服务管理模块,从而提供一个统一且功能更完善的服务生命周期管理平台。这不仅新增了对服务基本信息、版本、实例的精细化管理能力,还实现了部署任务的创建、查询、更新、删除以及暂停、继续、回滚等全流程控制,显著提升了系统的服务治理和自动化部署能力。

Highlights

  • 服务管理系统重构与模块整合: 将原有的release-system模块完全整合到service_manager中,实现了统一的服务管理平台。
  • 新增核心模型定义: 引入了服务基础信息、服务版本、服务实例等核心数据模型,为精细化管理奠定基础。
  • 实现全面的API接口: 新增了服务信息管理(CRUD)和部署任务管理(CRUD、暂停、继续、回滚)的API接口,提供了丰富的功能操作。
  • 强化数据库操作层与业务逻辑层: 数据库层增加了上下文感知操作,并细化了服务信息和部署任务的仓储逻辑;业务逻辑层则分离了服务信息和部署管理的职责,提升了代码的可维护性。
  • 定义服务状态监控与错误处理机制: 引入了服务健康状态、异常处理状态等常量,并定义了业务层面的错误类型,为后续的服务监控和异常处理提供了基础。
  • 移除废弃模块: 彻底移除了不再使用的release-system模块及其相关文件,精简了代码库。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@JaD1ng JaD1ng changed the title Feat/issue 41 feat(service_manager) 重构服务管理模块 Sep 11, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

这次的PR重构了服务管理系统,整合了原有的release-system模块,并增加了大量新功能,包括服务信息管理、部署任务管理等。整体结构清晰,分层合理。代码在API层、服务层和数据访问层都做了很好的划分。

我发现了一些可以改进的地方,主要集中在以下几个方面:

  • 代码重复:API处理函数中有重复的错误处理逻辑,可以抽象成辅助函数来简化代码。
  • 错误处理:部分地方使用了字符串比较来判断错误类型,这是一种脆弱的实践,建议使用errors.Is
  • 性能问题:存在N+1查询问题,可能会在服务数量多时导致性能瓶颈。
  • 硬编码和TODO:代码中存在一些硬编码的ID和TODO注释,这些是未完成的功能或潜在的bug,需要修复。

具体的修改建议请看下面的评论。

Comment on lines +216 to +218
// TODO:将deployID转换为int (这里简化处理)
// 实际项目中应该在数据库层统一ID类型
return s.db.GetDeployBatches(ctx, 1) // 临时硬编码,需要根据实际deployID转换
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

GetDeployBatches 函数中硬编码了 deployID1,并且有一个 TODO 注释表明这是一个临时解决方案。这是一个严重的bug,会导致函数总是查询同一个部署任务的批次,而不是传入的 deployID 对应的任务。需要将字符串类型的 deployID 转换为整数,并传递给数据库查询。请记得导入 strconv 包。

	deployIDInt, err := strconv.ParseInt(deployID, 10, 64)
	if err != nil {
		// 或者返回一个更具体的错误
		return nil, err
	}
	return s.db.GetDeployBatches(ctx, int(deployIDInt))


activeVersion := model.ActiveVersionItem{
Version: version,
DeployID: "1001", // TODO:临时值,实际需要从部署任务中获取
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

GetServiceActiveVersions 函数中 DeployID 被硬编码为 "1001"。这是一个bug,会导致所有活跃版本都返回错误的部署ID。这里应该实现从部署任务中获取正确的 DeployID 的逻辑。

Suggested change
DeployID: "1001", // TODO:临时值,实际需要从部署任务中获取
DeployID: "", // TODO:临时值,实际需要从部署任务中获取

Comment on lines 202 to 208
if err.Error() == "service not found" {
c.JSON(http.StatusNotFound, map[string]any{
"error": "not found",
"message": "service not found",
})
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

此处使用 err.Error() == "service not found" 来检查错误类型,这是一种脆弱的实践,因为错误信息文本可能会改变。建议使用 errors.Is 和哨兵错误(sentinel error)进行比较,这样代码会更健壮。
你需要:

  1. info_api.go 中导入 errorsgithub.com/qiniu/zeroops/internal/service_manager/service 包。
  2. 使用 errors.Is(err, service.ErrServiceNotFound) 进行判断。
    这个建议同样适用于 UpdateServiceDeleteService 函数中的类似逻辑。
Suggested change
if err.Error() == "service not found" {
c.JSON(http.StatusNotFound, map[string]any{
"error": "not found",
"message": "service not found",
})
return
}
if errors.Is(err, service.ErrServiceNotFound) {
c.JSON(http.StatusNotFound, map[string]any{
"error": "not found",
"message": "service not found",
})
return
}

Comment on lines +22 to +27
for i, service := range services {
// 获取服务状态来确定健康状态
state, err := s.db.GetServiceState(ctx, service.Name)
if err != nil {
log.Error().Err(err).Str("service", service.Name).Msg("failed to get service state")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

GetServicesResponse 函数在循环中调用 s.db.GetServiceState,这会导致N+1查询问题。当服务数量很大时,会产生大量的数据库查询,严重影响性能。建议优化为一次性查询所有服务的最新状态,然后在内存中进行匹配。
例如,你可以实现一个 GetLatestStatesForServices(ctx, serviceNames) 的数据库函数。

Comment on lines +31 to +82
func (api *Api) CreateDeployment(c *fox.Context) {
ctx := c.Request.Context()

var req model.CreateDeploymentRequest
if err := c.ShouldBindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, map[string]any{
"error": "bad request",
"message": "invalid request body: " + err.Error(),
})
return
}

if req.Service == "" || req.Version == "" {
c.JSON(http.StatusBadRequest, map[string]any{
"error": "bad request",
"message": "service and version are required",
})
return
}

deployID, err := api.service.CreateDeployment(ctx, &req)
if err != nil {
if err == service.ErrServiceNotFound {
c.JSON(http.StatusBadRequest, map[string]any{
"error": "bad request",
"message": "service not found",
})
return
}
if err == service.ErrDeploymentConflict {
c.JSON(http.StatusConflict, map[string]any{
"error": "conflict",
"message": "deployment conflict: service version already in deployment",
})
return
}
log.Error().Err(err).
Str("service", req.Service).
Str("version", req.Version).
Msg("failed to create deployment")
c.JSON(http.StatusInternalServerError, map[string]any{
"error": "internal server error",
"message": "failed to create deployment",
})
return
}

c.JSON(http.StatusCreated, map[string]any{
"id": deployID,
"message": "deployment created successfully",
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

在这个文件中的多个API处理函数(如 CreateDeployment, GetDeploymentByID 等)中,错误处理和JSON响应的逻辑有大量重复。这会增加代码维护的难度。建议提取一个辅助函数来统一处理错误响应,使代码更简洁、更易于维护。
例如,可以创建一个这样的辅助函数:

func (api *Api) sendError(c *fox.Context, statusCode int, errCode, message string) {
    c.JSON(statusCode, map[string]any{
        "error":   errCode,
        "message": message,
    })
}

然后,CreateDeployment中的错误处理可以简化为:

if err := c.ShouldBindJSON(&req); err != nil {
    api.sendError(c, http.StatusBadRequest, "bad request", "invalid request body: "+err.Error())
    return
}

Comment on lines +127 to +131
if limitStr := c.Query("limit"); limitStr != "" {
if limit, err := strconv.Atoi(limitStr); err == nil {
query.Limit = limit
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

当解析limit查询参数失败时,代码选择忽略错误。虽然这在某些情况下可以接受,但更好的做法是向客户端返回一个400 Bad Request错误。这样可以让API的使用者明确知道他们提供了无效的参数。

if limitStr := c.Query("limit"); limitStr != "" {
		limit, err := strconv.Atoi(limitStr)
		if err != nil {
			c.JSON(http.StatusBadRequest, map[string]any{
				"error":   "bad request",
				"message": "invalid limit parameter",
			})
			return
		}
		query.Limit = limit
	}

Comment on lines +128 to +157
func (d *Database) UpdateDeployment(ctx context.Context, deployID string, req *model.UpdateDeploymentRequest) error {
sql := `UPDATE service_deploy_tasks SET `
args := []any{}
updates := []string{}

if req.Version != "" {
updates = append(updates, "version = ?")
args = append(args, req.Version)
}

if req.ScheduleTime != nil {
updates = append(updates, "deploy_begin_time = ?")
args = append(args, req.ScheduleTime)
}

if len(updates) == 0 {
return nil
}

sql += updates[0]
for i := 1; i < len(updates); i++ {
sql += ", " + updates[i]
}

sql += " WHERE id = ? AND deploy_state = ?"
args = append(args, deployID, model.StatusUnrelease)

_, err := d.ExecContext(ctx, sql, args...)
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

UpdateDeployment 函数中,动态构建SQL更新语句的部分可以通过 strings.Join 来简化,使代码更简洁易读。请记得导入 strings 包。

func (d *Database) UpdateDeployment(ctx context.Context, deployID string, req *model.UpdateDeploymentRequest) error {
	args := []any{}
	updates := []string{}

	if req.Version != "" {
		updates = append(updates, "version = ?")
		args = append(args, req.Version)
	}

	if req.ScheduleTime != nil {
		updates = append(updates, "deploy_begin_time = ?")
		args = append(args, req.ScheduleTime)
	}

	if len(updates) == 0 {
		return nil
	}

	query := `UPDATE service_deploy_tasks SET ` + strings.Join(updates, ", ") + ` WHERE id = ? AND deploy_state = ?`
	args = append(args, deployID, model.StatusUnrelease)

	_, err := d.ExecContext(ctx, query, args...)
	return err
}

Comment on lines +107 to +111
// TODO:根据类型过滤(这里简化处理,实际需要根据业务需求过滤)
if versionType == "unrelease" {
// 返回未发布的版本,这里简化返回所有版本
return versions, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

GetServiceAvailableVersions 函数的过滤逻辑尚未完成,如 TODO 注释所示。当前实现会直接返回所有版本,而不是根据 versionType 进行过滤。请完成这部分逻辑以确保API按预期工作。

Comment on lines +144 to +184
func (s *Service) GetServiceMetricTimeSeries(ctx context.Context, serviceName, metricName string, query *model.MetricTimeSeriesQuery) (*model.PrometheusQueryRangeResponse, error) {
// TODO:这里应该调用实际的Prometheus或其他监控系统API
// 现在返回模拟数据

response := &model.PrometheusQueryRangeResponse{
Status: "success",
Data: model.PrometheusQueryRangeData{
ResultType: "matrix",
Result: []model.PrometheusTimeSeries{
{
Metric: map[string]string{
"__name__": metricName,
"service": serviceName,
"instance": "instance-1",
"version": query.Version,
},
Values: [][]any{
{1435781430.781, "1.2"},
{1435781445.781, "1.5"},
{1435781460.781, "1.1"},
},
},
{
Metric: map[string]string{
"__name__": metricName,
"service": serviceName,
"instance": "instance-2",
"version": query.Version,
},
Values: [][]any{
{1435781430.781, "0.8"},
{1435781445.781, "0.9"},
{1435781460.781, "1.0"},
},
},
},
},
}

return response, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

GetServiceMetricTimeSeries 函数目前返回的是模拟数据,并且有 TODO 注释。为了使功能完整,这里需要实现对实际监控系统(如Prometheus)的API调用来获取真实的时序指标数据。

@JaD1ng JaD1ng merged commit 39b66f7 into qiniu:develop Sep 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant