Skip to content

ETCD Cluster 管理与 Discovery 剥离#50

Closed
yousongyang wants to merge 3 commits into
owent:mainfrom
yousongyang:main
Closed

ETCD Cluster 管理与 Discovery 剥离#50
yousongyang wants to merge 3 commits into
owent:mainfrom
yousongyang:main

Conversation

@yousongyang
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:273

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:698

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

发现两个中等严重度问题

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


审查范围

本 PR 重构 etcd_module,拆分为 etcd_module(集群管理)和 service_discovery_module(服务发现),引入 etcd_cluster_holder 作为集群状态容器,并支持多集群服务发现。

发现的问题

问题1: stop() 未处理额外集群上下文

文件: src/atframe/modules/service_discovery_module.cpp:273
严重度: medium

stop() 方法仅调用 cluster_context_.reset_internal_watchers_and_keepalives(),未迭代处理 cluster_context_list_ 中的额外上下文。当使用 add_service_discovery_cluster_context 配置多集群时,这些额外上下文的 watchers 和 keepalives 在模块关闭时不会被清理,可能导致已停止的 etcd 连接被访问。

问题2: http 回调中潜在 use-after-free

文件: src/atframe/modules/etcd_module.cpp:698
严重度: medium

http_callback_on_etcd_closed 回调中直接重置 cleanup_request_ 而未先清空 priv_data,随后从 clusters_ 列表删除 holder。若 holder 被 erase 销毁,http_request 的后续清理若访问 priv_data 将导致 use-after-free。同文件 reset() 方法已展示正确模式。

建议

  1. stop() 中迭代 cluster_context_list_ 调用每个上下文的 reset_internal_watchers_and_keepalives()
  2. http_callback_on_etcd_closed 重置前调用 cleanup_request_->set_priv_data(nullptr)

Problems (2)

# Severity Category Location Message
0 MEDIUM correctness src/atframe/modules/service_discovery_module.cpp:273 stop() 方法仅清理默认的 cluster_context_,未处理 cluster_context_list_ 中的额外集群上下文。当使用 add_service_discovery_cluster_context 添加多集群时,这些上下文的 watchers 和 keepalives 在模块关闭时不会被正确停止,可能导致 etcd 模块停止后 watchers 仍尝试访问 etcd 连接。
1 MEDIUM memory_safety src/atframe/modules/etcd_module.cpp:698 http_callback_on_etcd_closed 回调中直接重置 cleanup_request_ 而未先清空 priv_data,随后从 clusters_ 列表中删除 holder。如果 holder 被 erase 销毁,http_request 的清理机制(可能在回调返回后执行)若尝试访问 priv_data,将导致 use-after-free。reset() 方法(683-687行)展示了安全模式:先调用 set_priv_data(nullptr) 再重置,此处应遵循相同模式。

Code reference: src/atframe/modules/service_discovery_module.cpp:273

LIBATAPP_MACRO_API int service_discovery_module::stop() {
  cluster_context_.reset_internal_watchers_and_keepalives();
  return 0;
}

Code reference: src/atframe/modules/etcd_module.cpp:698

  }

  self->cleanup_request_.reset();

  LIBATAPP_MACRO_ETCD_CLUSTER_LOG_DEBUG(self->cluster_, "Etcd revoke lease finished");

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for include/atframe/atapp_conf.proto:583

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:702

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:260

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

发现 1 个高优先级兼容性问题

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent


审查范围

重构 PR 将 etcd_module 分拆为 etcd_module(底层集群管理)和 service_discovery_module(服务发现逻辑),支持多 etcd 集群场景。

问题汇总

  • 1 个:Protobuf 字段类型变更破坏配置兼容性
  • 1 个:HTTP 回调中原始指针生命周期风险
  • 1 个:冗余对象构造

关键问题

atapp_conf.proto 字段 401 从 atapp_etcd 改为 atapp_etcd_common 是破坏性变更。用户现有配置文件中的 hosts/path 等字段在升级后会被丢弃。测试配置已更新,但缺少迁移文档。

建议优先级

  1. 评估 protobuf 变更的兼容性策略(保留旧字段或提供迁移方案)
  2. 考虑使用 weak_ptr 替代原始指针 module_
  3. 清理构造函数中的冗余创建

Problems (3)

# Severity Category Location Message
0 HIGH api_schema_compatibility include/atframe/atapp_conf.proto:583 Protobuf field 401 类型从 atapp_etcd 改为 atapp_etcd_common 是破坏性变更。旧配置文件中 etcd.hosts、etcd.path、etcd.authorization 等字段在反序列化时会丢失,因为新类型只包含 enable 字段。现有用户升级后配置会丢失 etcd 连接信息,导致服务无法连接到 etcd 集群。触发场景:用户使用旧版本生成的二进制配置文件或 protobuf 序列化数据升级到新版本。建议:保留字段 401 的原类型,或提供配置迁移工具和文档。
1 MEDIUM memory_lifecycle_safety src/atframe/modules/etcd_module.cpp:702 http_callback_on_etcd_closed 回调中访问 self->module_->clusters_ 存在潜在生命周期风险。module_ 是 etcd_module 的原始指针,在 HTTP 请求完成回调触发时,如果 etcd_module 已被销毁(例如 app 异常终止或超时),会导致 use-after-free。触发场景:app shutdown 时 stop() 返回 1 表示清理未完成,但上层代码未能继续 tick() 等待完成就直接销毁 app。影响:崩溃或数据损坏。缓解措施:在 etcd_module 析构前调用 reset() 清除回调,但依赖调用方遵循正确的 shutdown 流程。建议:使用 weak_ptr 或在回调中增加有效性检查。
2 LOW resource_efficiency src/atframe/modules/service_discovery_module.cpp:260 service_discovery_cluster_context 构造函数中创建 etcd_cluster_holder_(line 260),但 init() 中立即替换为新实例(line 178)。构造函数创建的 holder 从未被使用,造成不必要的内存分配和对象构造。触发场景:每次 service_discovery_module 初始化。影响:轻微性能损失。建议:移除构造函数中的 holder 创建,保留 nullptr 初始化。

Code reference: include/atframe/atapp_conf.proto:583

  atbus_configure bus = 301;

  atapp_etcd_common etcd = 401;
  atapp_etcd service_discovery_etcd = 402;

Code reference: src/atframe/modules/etcd_module.cpp:702

  LIBATAPP_MACRO_ETCD_CLUSTER_LOG_DEBUG(self->cluster_, "Etcd revoke lease finished");

  self->module_->clusters_.erase(self->self_iterator_);
  return 0;
}

Code reference: src/atframe/modules/service_discovery_module.cpp:260

      discovery_watcher_snapshot_index_allocator_(0),
      topology_watcher_snapshot_index_allocator_(0) {
  etcd_cluster_holder_ = std::make_shared<etcd_cluster_holder>();
  last_etcd_event_topology_header_.cluster_id = 0;
  last_etcd_event_topology_header_.member_id = 0;

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH · api_schema_compatibility

protobuf 字段 401 类型从 atapp_etcd 变更为 atapp_etcd_common,导致向后兼容性破坏。现有配置文件中 etcd.hostsetcd.path 等字段数据在新 schema 下会变为未知字段无法访问,而新代码从 service_discovery_etcd (字段 402) 读取这些配置。未迁移的旧配置会导致服务发现功能因缺少 hosts 而失败。触发场景:使用旧版本配置文件启动新版本程序。建议方案:保留字段 401 为原类型并提供迁移文档,或增加配置兼容性检测逻辑。

Location: include/atframe/atapp_conf.proto:579

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:220

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent


No summary provided.

Problems (2)

# Severity Category Location Message
0 HIGH api_schema_compatibility include/atframe/atapp_conf.proto:579 protobuf 字段 401 类型从 atapp_etcd 变更为 atapp_etcd_common,导致向后兼容性破坏。现有配置文件中 etcd.hostsetcd.path 等字段数据在新 schema 下会变为未知字段无法访问,而新代码从 service_discovery_etcd (字段 402) 读取这些配置。未迁移的旧配置会导致服务发现功能因缺少 hosts 而失败。触发场景:使用旧版本配置文件启动新版本程序。建议方案:保留字段 401 为原类型并提供迁移文档,或增加配置兼容性检测逻辑。
1 MEDIUM memory_lifecycle src/atframe/modules/service_discovery_module.cpp:220 lambda 捕获 context (shared_ptr) 存入 etcd_cluster_holder::init_keepalive_watcher_func_,而 etcd_cluster_holdercontext->etcd_cluster_holder_ 持有,形成引用环:contextetcd_cluster_holder_lambda(context)contextstop()reset_internal_watchers_and_keepalives() 未清除这些函数对象,导致 cluster_context_list_ 中的条目内存泄漏。触发场景:调用 add_service_discovery_cluster_context 并随后销毁模块。建议:改用 std::weak_ptr 捕获,或在清理阶段显式清空函数对象。

Code reference: src/atframe/modules/service_discovery_module.cpp:220

  auto context = std::make_shared<service_discovery_cluster_context>();
  int ret = get_app()->get_etcd_module()->init_cluster(
      context->etcd_cluster_holder_, load_conf_func,
      [this, context](atframework::atapp::app &app, atframework::atapp::etcd_cluster_holder &cluster_holder,
                      bool init) -> int {

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:1434

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:617

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent


No summary provided.

Problems (2)

# Severity Category Location Message
0 HIGH correctness src/atframe/modules/service_discovery_module.cpp:1434 update_internal_watcher_event(topology_info_t&) 逻辑错误:当版本号未增加(upgrade模式)但内容不同时,topology info 未被更新。在etcd集群重启或跨成员监听的边缘场景下,可能导致 topology 数据变为陈旧数据。同时,当内容相同但从不同集群上下文(context_ptr)到达时,事件被静默丢弃,context_ptr未更新,这与 discovery 处理逻辑不一致,可能导致多集群场景下的拓扑数据追踪错误。
1 MEDIUM safety src/atframe/modules/etcd_module.cpp:617 etcd_cluster_holder::tick() 直接解引用 atapp_ 和 module_ 而未进行空指针检查。虽然当前初始化流程确保这些指针在 tick 调用前已设置,但作为防御性编程,应在关键路径添加空指针检查,以应对边缘情况(如初始化部分失败、手动创建 holder 等异常场景)。

Code reference: src/atframe/modules/service_discovery_module.cpp:1434

    internal_topology_info_set_.erase(local_cache_iter);
  } else {
    std::unordered_map<uint64_t, topology_storage_t>::iterator local_cache_iter =
        internal_topology_info_set_.find(topology_info.storage.info->id());
    if (local_cache_iter != internal_topology_info_set_.end()) {

Code reference: src/atframe/modules/etcd_module.cpp:617

int etcd_cluster_holder::tick() {
  // Slow down the tick interval of etcd module, it require http request which is very slow compared to atbus
  if (tick_next_timepoint_ >= atapp_->get_last_tick_time()) {
    return 0;
  }

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:1451

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

发现一个未初始化成员导致清理逻辑失效的问题

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


审查完成。本次变更将 etcd 服务发现功能从 etcd_module 重构到新的 service_discovery_module,整体架构清晰。

发现 1 个高严重性问题:

service_discovery_module.cpp:1451 — topology_info_t.storage.context_ptr 在首次插入 topology map 时未初始化。watcher callback 仅设置外层 topology_info.context_ptr (line 1238),但嵌套的 storage.context_ptr 保持未定义值。这破坏了按 context_ptr 清理孤立节点的逻辑,导致节点在所属集群断开后无法正确移除。

其余变更(配置结构调整、etcd_cluster_holder 抽象、API 迁移)未发现明显缺陷。测试文件同步更新了新增的 context_ptr 参数,传入 0 作为默认值,符合预期。

Problems (1)

# Severity Category Location Message
0 HIGH correctness src/atframe/modules/service_discovery_module.cpp:1451 topology_info.storage.context_ptr 未初始化就被插入到 topology map 中。在 watcher callback (line 1237-1238) 只设置了 topology_info.context_ptr,但 topology_info.storage.context_ptr 保持未定义值。这导致 cleanup_old_topology_peers (line 1057) 的 context_ptr 过滤失效——新插入的节点永远不会匹配任何 context,在所属集群断开时无法正确清理。修复:在 line 1238 后添加 topology_info.storage.context_ptr = topology_info.context_ptr; 或在 line 1451 插入前设置 storage.context_ptr。

Code reference: src/atframe/modules/service_discovery_module.cpp:1451

      }
    } else {
      internal_topology_info_set_[topology_info.storage.info->id()] = topology_info.storage;
      info_ptr = topology_info.storage.info;
    }

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for include/atframe/modules/service_discovery_module.h:82

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for include/atframe/modules/service_discovery_module.h:88

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:645

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

架构重构安全,结构体初始化需加固

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


审查范围

本次变更为 libatapp 的 etcd/service discovery 架构重构:将服务发现与拓扑管理逻辑从 etcd_module 分离至新增的 service_discovery_module,引入 etcd_cluster_holder 支持多集群上下文,配置项拆分为 etcd.enableservice_discovery_etcd

问题概览

  • Medium 1topology_storage_t::context_ptr 缺少默认初始化,存在未定义值风险
  • Low 2topology_info_ttopology_storage_tcontext_ptr 字段冗余,语义重叠
  • Low 3get_raw_etcd_ctx() 无空指针防御,潜在崩溃路径

核心风险

新增的 context_ptr 机制用于区分不同集群上下文的节点归属。若该值因未初始化而持垃圾数据,cleanup_old_nodes/cleanup_old_topology_peers 的指针比对逻辑可能误判,导致错误清理全局 discovery 集。当前代码路径显式设置该值,但结构体定义层面未固化初始化保证。

建议

优先修复 topology_storage_t 的字段初始化问题;其余两项为维护性与防御性改进,可延后处理。

未覆盖内容

  • 未审查 test/case 目录下的全部测试实现细节
  • 未验证配置迁移对现有用户的兼容性影响(属 API 变更范畴)

Problems (3)

# Severity Category Location Message
0 MEDIUM correctness include/atframe/modules/service_discovery_module.h:82 topology_storage_t 新增的 context_ptr 字段缺少默认初始化。该结构体无构造函数,uintptr_t context_ptr 在默认构造时将持有未定义值(垃圾数据)。虽然当前生产代码路径在 watcher 回调中显式设置该值,但结构体定义本身不安全——若后续新增代码路径直接创建 topology_storage_t 而未初始化 context_ptr,可能导致 cleanup_old_topology_peers 中的指针值比对错误,进而误删或漏删节点。建议添加 in-class 初始化:uintptr_t context_ptr = 0;
1 LOW correctness include/atframe/modules/service_discovery_module.h:88 topology_info_t 与其内嵌的 topology_storage_t 均包含 context_ptr 字段,存在冗余。watcher 回调中两者被设为相同值,update_internal_watcher_event(topology_info_t&) 通过 topology_info.context_ptr 参数更新 storage.context_ptr。此设计虽在当前实现中一致,但两个同名字段语义重叠,若后续维护中仅修改其一而遗漏另一,可能引发 context 比对失效。建议明确文档说明两字段关系,或考虑合并为单一字段。
2 LOW correctness src/atframe/modules/service_discovery_module.cpp:645 get_raw_etcd_ctx() 直接解引用 cluster_context_.etcd_cluster_holder_ 而无空指针检查。该成员在构造时为空(nullptr),仅在 init() 中被赋值。若在模块初始化完成前调用此方法(例如外部代码在 app 构造后、init 前访问),将导致空指针崩溃。当前测试与生产代码均在 init 后调用,实际风险较低,但缺乏防御性编程。建议添加断言或早期返回空引用保护。

Code reference: include/atframe/modules/service_discovery_module.h:82

  using atapp_topology_info_ptr_t = atfw::util::memory::strong_rc_ptr<atapp::protocol::atapp_topology_info>;

  struct LIBATAPP_MACRO_API_HEAD_ONLY topology_storage_t {
    atapp_topology_info_ptr_t info;
    uintptr_t context_ptr;

Code reference: include/atframe/modules/service_discovery_module.h:88

  };

  struct LIBATAPP_MACRO_API_HEAD_ONLY topology_info_t {
    topology_storage_t storage;
    uintptr_t context_ptr;

Code reference: src/atframe/modules/service_discovery_module.cpp:645

}

LIBATAPP_MACRO_API const ::atframework::atapp::etcd_cluster &service_discovery_module::get_raw_etcd_ctx() const {
  return cluster_context_.etcd_cluster_holder_->get_etcd_cluster();
}

  • Powered by AICodeReviewer*

Comment thread src/atframe/modules/service_discovery_module.cpp Fixed
Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:122

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:693

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


No summary provided.

Problems (2)

# Severity Category Location Message
0 HIGH correctness src/atframe/modules/etcd_module.cpp:122 etcd_module::init() 缺少返回语句。当 etcd_enabled_ 为 false 时,代码执行完 FWLOGINFO 后函数结束,但没有 return 语句。这导致返回值未定义(实际可能返回栈上残留值),调用者无法正确判断初始化是否成功。应在 line 121 的 FWLOGINFO 后添加 return 0 或重构控制流。
1 HIGH correctness src/atframe/modules/etcd_module.cpp:693 etcd_cluster_holder::http_callback_on_etcd_closed 与原 etcd_module 的实现相比,缺少触发 app stop 的逻辑。原实现(已删除)在 lease revoke 完成后会调用 get_app()->stop() 继续关闭流程。新实现只调用 reset(),可能导致 app 在收到关闭命令后无法正常退出。应在 reset() 后检查 atapp_ 并触发继续关闭流程。

Code reference: src/atframe/modules/etcd_module.cpp:122

    FWLOGINFO("etcd disabled, start single mode");
  }
  return 0;
}

Code reference: src/atframe/modules/etcd_module.cpp:693

}

int etcd_cluster_holder::http_callback_on_etcd_closed(atfw::util::network::http_request &req) {
  etcd_cluster_holder *self = reinterpret_cast<etcd_cluster_holder *>(req.get_priv_data());
  if (nullptr == self) {

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for include/atframe/atapp_conf.proto:583

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:160

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

Proto schema 向后兼容性问题

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


Review scope: etcd_module 与 service_discovery_module 分离重构,涉及 proto schema 变更、模块拆分、测试配置更新。

Issues: 2 actionable problems reported.

  • medium: Proto schema 字段类型变更可能导致配置数据丢失
  • low: init_cluster 错误处理不一致

Uncertainty: Proto schema 变更可能是有意的 API 重设计,而非 bug。但缺少向后兼容处理会导致用户升级时遇到静默配置丢失。建议确认是否有迁移指南或兼容层计划。

Problems (2)

# Severity Category Location Message
0 MEDIUM api_compatibility include/atframe/atapp_conf.proto:583 Proto schema field 401 类型从 atapp_etcd 改为 atapp_etcd_common,可能导致向后兼容性问题。现有的序列化配置(包含 hosts/path 等字段)在新 schema 下会被静默丢弃这些字段,可能导致升级后 etcd 连接失败。建议考虑:添加向后兼容的迁移逻辑,或在 proto 中保留旧字段结构并使用不同的字段号。
1 LOW correctness src/atframe/modules/etcd_module.cpp:160 init_cluster 在初始化失败时仍将 holder 添加到 clusters_ 列表。第 160-166 行:先 push_back(holder),再调用 stop(),最后返回 -1。这意味着即使初始化失败,后续 tick() 仍会遍历并调用该 holder 的 tick()。建议:失败时不应添加到列表,或添加后应有明确的错误状态标记和清理逻辑。

Code reference: include/atframe/atapp_conf.proto:583

  atbus_configure bus = 301;

  atapp_etcd_common etcd = 401;
  atapp_etcd service_discovery_etcd = 402;

Code reference: src/atframe/modules/etcd_module.cpp:160

    int result = cluster_holder->init_keepalive_watcher_func_(*cluster_holder->atapp_, *cluster_holder, true);
    if (result != 0) {
      clusters_.push_back(cluster_holder);
      cluster_holder->stop();
      return -1;

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:125

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:658

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

重构架构存在空指针风险

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


审查范围

本次 PR 将 etcd_module 重构为两部分:etcd_module(基础集群管理)和新增的 service_discovery_module(服务发现逻辑)。引入 etcd_cluster_holder 作为集群实例容器,新增 context_ptr 字段用于多集群上下文追踪,并调整 protobuf 配置 schema。

问题摘要

发现 2 个中等严重度 问题,均为空指针解引用风险:

  1. init_cluster() 空指针风险 (etcd_module.cpp:143):get_app() 返回值未校验即被解引用调用。
  2. stop() 空指针风险 (etcd_module.cpp:672-674):atapp_ 的空值检查未覆盖 reset_keepalive_watcher_func_ lambda 调用。

备注

  • protobuf schema 变换(atapp_etcd_common 字段401 + atapp_etcd 字段402)因字段编号1的 enable 布尔值兼容,不构成兼容性问题。
  • tick() 方法已包含空值早返回检查(第612行),该路径无风险。
  • 测试文件更新与新 API 调用一致,未发现测试逻辑缺陷。

Problems (2)

# Severity Category Location Message
0 MEDIUM correctness src/atframe/modules/etcd_module.cpp:125 init_cluster() 第143行调用 cluster_holder->load_conf_func_(*cluster_holder->atapp_, *cluster_holder) 时,若 get_app() 返回空指针(第138行 cluster_holder->atapp_ = get_app()),将导致空指针解引用。虽然正常初始化流程中模块依附于 app 后 get_app() 不应为空,但缺少显式空值检查降低了代码健壮性。触发场景:模块被错误地在未依附 app 的状态下调用 init_cluster()
1 MEDIUM correctness src/atframe/modules/etcd_module.cpp:658 etcd_cluster_holder::stop() 第672-674行调用 reset_keepalive_watcher_func_(*atapp_, *this) 时,仅第661行的 if (atapp_ != nullptr) 检查用于 revoke_lease 分支,并未保护后续的 lambda 调用。若 atapp_ 为空且 reset_keepalive_watcher_func_ 已设置,调用 reset_keepalive_watcher_func_(*atapp_, *this) 将崩溃。建议在调用 lambda 前增加 atapp_ 的空值检查。

Code reference: src/atframe/modules/etcd_module.cpp:125

}

LIBATAPP_MACRO_API int etcd_module::init_cluster(
    etcd_cluster_holder_ptr_t cluster_holder, etcd_cluster_load_conf_func_t load_conf_func,
    etcd_cluster_init_keepalive_watcher_func_t init_keepalive_watcher_func,

Code reference: src/atframe/modules/etcd_module.cpp:658

}

void etcd_cluster_holder::stop() {
  if (!cleanup_request_ && !cluster_.check_flag(etcd_cluster::flag_t::kClosing)) {
    bool revoke_lease = true;

  • Powered by AICodeReviewer*

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:271

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


No summary provided.

Problems (1)

# Severity Category Location Message
0 MEDIUM correctness src/atframe/modules/service_discovery_module.cpp:271 stop() 总是返回 0,但底层 etcd_cluster_holder::stop() 可能触发异步清理请求。当清理请求仍在运行时,调用方可能误以为模块已完全停止。应检查所有 context 是否已完全停止,并与 etcd_module::stop() 的返回值语义保持一致。

Code reference: src/atframe/modules/service_discovery_module.cpp:271

}

LIBATAPP_MACRO_API int service_discovery_module::stop() {
  cluster_context_.reset_internal_watchers_and_keepalives();
  for (std::shared_ptr<service_discovery_cluster_context> context : cluster_context_list_) {

  • Powered by AICodeReviewer*

@owent owent self-assigned this May 16, 2026
@owent owent requested a review from Copilot May 16, 2026 13:28
Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for include/atframe/atapp_conf.proto:583

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/worker_pool_module.cpp:488

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:158

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:486

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review Summary

etcd模块拆分重构发现Proto兼容性与等待逻辑问题

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent


审查范围

重构将etcd功能拆分为 etcd_module(底层集群管理)与 service_discovery_module(服务发现逻辑),引入 etcd_cluster_holder 抽象支持多集群上下文。

主要发现

严重度 类别 问题
high api_compatibility Proto字段类型变更导致配置兼容性断裂
medium correctness worker_pool等待时长计算可能负数
medium correctness init_cluster失败后仍加入列表
low correctness service_discovery同样等待逻辑风险

建议

  1. Proto变更应评估配置迁移路径,或使用版本化字段保持向后兼容。
  2. worker_pool/service_discovery等待逻辑应在累计超时分支使用预设preserve时长而非原tick差值。
  3. init_cluster失败分支应避免将半初始化holder加入活跃列表。

上下文限制

worker_pool完整tick流程需运行时验证;未检查所有新模块的初始化顺序依赖。

Problems (4)

# Severity Category Location Message
0 HIGH api_compatibility include/atframe/atapp_conf.proto:583 Proto字段类型变更导致API兼容性断裂:atapp_configure.etcdatapp_etcd 改为 atapp_etcd_common。现有配置文件(如旧的YAML/CONF)若使用旧schema将无法正确解析。需确认是否有兼容性迁移机制或版本化策略。
1 MEDIUM correctness src/atframe/modules/worker_pool_module.cpp:488 wait_for等待时长计算可能产生负值或异常行为:当累计busy+wait时间超过1秒时,使用 tick_interval - (busy_end_time - start_time) 计算等待时间。此时 busy_end_time - start_time 可能大于 tick_interval,导致负数时长传入wait_for,行为未定义或无实际等待。应在该分支使用 wait_preserve_us 作为等待时长而非原表达式。
2 MEDIUM correctness src/atframe/modules/etcd_module.cpp:158 init_cluster中初始化失败后仍将cluster_holder加入clusters_列表(line 160-162)。当init_keepalive_watcher_func返回非0时,代码push了holder并调用stop(),但该holder可能处于不完整初始化状态,后续tick/reload操作可能访问未初始化字段。应考虑失败时不加入列表或完全清理。
3 LOW correctness src/atframe/modules/service_discovery_module.cpp:486 wait_for锁传入表达式 tick_interval - (busy_end_time - start_time) 与上方477行的preserve wait逻辑存在同样风险:此时 busy_end_time 已超出本轮tick窗口,表达式值可能为负。

Code reference: include/atframe/atapp_conf.proto:583

  atbus_configure bus = 301;

  atapp_etcd_common etcd = 401;
  atapp_etcd service_discovery_etcd = 402;

Code reference: src/atframe/modules/worker_pool_module.cpp:488

        std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
        self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
        self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));
        self->status_.store(static_cast<uint8_t>(worker_status::kRunning), std::memory_order_release);

Code reference: src/atframe/modules/etcd_module.cpp:158

  if (cluster_holder->init_keepalive_watcher_func_) {
    int result = cluster_holder->init_keepalive_watcher_func_(*cluster_holder->atapp_, *cluster_holder, true);
    if (result != 0) {
      clusters_.push_back(cluster_holder);

Code reference: src/atframe/modules/service_discovery_module.cpp:486

  if (nullptr == owner) {
    return std::string();
  }

  return LOG_WRAPPER_FWAPI_FORMAT("{}{}/{}-{}", path, ETCD_MODULE_BY_ID_DIR, owner->get_app_name(), owner->get_id());

  • Powered by AICodeReviewer*

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors libatapp’s etcd responsibilities by extracting service discovery (node discovery + topology watchers/keepalives) into a new service_discovery_module, leaving etcd_module primarily responsible for etcd cluster lifecycle/transport. It also updates configuration schema and test coverage to reflect the split.

Changes:

  • Introduce service_discovery_module (with multi-cluster-context support) and wire it into atapp::app + atbus connector reload flow.
  • Adjust configuration model: atapp.etcd becomes a minimal enable flag, and full discovery configuration moves to atapp.service_discovery_etcd.
  • Update tests and discovery node APIs (copy_from / update_version) to carry a context_ptr for multi-context isolation.

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/case/atapp_upstream_forward_test.cpp Switch discovery injection from etcd_module to service_discovery_module; update copy_from signature.
test/case/atapp_topology_change_test.cpp Use service_discovery_module global discovery and updated copy_from signature.
test/case/atapp_test_etcd.yaml Enable service_discovery_etcd for unit-test configs.
test/case/atapp_test_etcd_node2.yaml Enable service_discovery_etcd for unit-test configs.
test/case/atapp_test_etcd_node1.yaml Enable service_discovery_etcd for unit-test configs.
test/case/atapp_test_etcd_module.yaml Enable service_discovery_etcd for module tests.
test/case/atapp_test_etcd_module_node3.yaml Add a third-node config for multi-context discovery tests.
test/case/atapp_test_etcd_module_node2.yaml Enable service_discovery_etcd for node2 module tests.
test/case/atapp_test_etcd_module_node1.yaml Enable service_discovery_etcd for node1 module tests.
test/case/atapp_message_test.cpp Use service_discovery_module global discovery and new copy_from signature.
test/case/atapp_etcd_module_unit_test.cpp Port topology-storage tests to service_discovery_module::topology_storage_t.
test/case/atapp_etcd_module_test.cpp Port etcd-module tests to use service_discovery_module APIs where discovery/topology are involved; add multi-context test.
test/case/atapp_etcd_cluster_test.cpp Point cluster tests at service_discovery_module’s etcd ctx access path.
test/case/atapp_error_recovery_test.cpp Update discovery injection to service_discovery_module and new copy_from signature.
test/case/atapp_downstream_send_test.cpp Update discovery injection to service_discovery_module and new copy_from signature.
test/case/atapp_discovery_test.cpp Update discovery node version APIs (copy_from, update_version) with context_ptr.
test/case/atapp_discovery_reconnect_test.cpp Use service_discovery_module for global discovery add/remove in reconnect tests.
test/case/atapp_direct_connect_test.cpp Update discovery module usage and refine test behavior for pre-discovery sends vs post-discovery direct connects.
test/case/atapp_configure_loader_test.yaml Add service_discovery_etcd section for configure-loader tests.
test/case/atapp_configure_loader_test.env.txt Rename env variables from ATAPP_ETCD_* (hosts/path/etc) to ATAPP_SERVICE_DISCOVERY_ETCD_*.
test/case/atapp_configure_loader_test.cpp Validate service_discovery_etcd is parsed/loaded as the etcd discovery config.
test/case/atapp_configure_loader_test.conf Rename config keys from etcd.* (hosts/path/etc) to service_discovery_etcd.*.
test/case/atapp_configure_expression_test.yaml Add service_discovery_etcd section for expression tests.
test/case/atapp_configure_expression_test.env.txt Rename env variables to ATAPP_SERVICE_DISCOVERY_ETCD_* for expression tests.
test/case/atapp_configure_expression_test.conf Rename config keys to service_discovery_etcd.* for expression tests.
src/CMakeLists.txt Build-system wiring for new service_discovery_module sources/headers.
src/atframe/modules/worker_pool_module.cpp Add per-second tick “preserve sleep” accounting/behavior controls.
src/atframe/modules/service_discovery_module.cpp New implementation: discovery/topology keepalives, watchers, snapshot handling, multi-context support.
src/atframe/etcdcli/etcd_discovery.cpp Extend discovery node to track context_ptr and update signatures.
src/atframe/connectors/atapp_connector_atbus.cpp Route topology keepalive refresh trigger via service_discovery_module.
src/atframe/atapp.cpp Register service_discovery_module, route discovery accessors/senders/stats to it, and update discovery node construction signature.
include/atframe/modules/service_discovery_module.h New public header for service_discovery_module.
include/atframe/modules/etcd_module.h Refactor: introduce etcd_cluster_holder + init_cluster() and remove discovery/topology responsibilities from the header.
include/atframe/etcdcli/etcd_discovery.h Update discovery node API signatures and add get_context_ptr().
include/atframe/atapp.h Add get_service_discovery_module() and store module instance in app.
include/atframe/atapp_conf.proto Introduce atapp_etcd_common and move full discovery etcd config to service_discovery_etcd.
Comments suppressed due to low confidence (1)

src/atframe/atapp.cpp:4814

  • command_handler_disable_etcd checks internal_module_service_discovery_ for initialization, but the command’s logic operates on internal_module_etcd_ (is_etcd_enabled()/disable_etcd()). This makes the guard inconsistent and can incorrectly report “Etcd module is not initialized” based on the wrong module. The initialization check should align with the module actually being used (or the command should be updated to disable the service discovery etcd context instead).
int app::command_handler_disable_etcd(atfw::util::cli::callback_param params) {
  if (!internal_module_service_discovery_) {
    const char *msg = "Etcd module is not initialized, skip command.";
    FWLOGERROR("{}", msg);
    add_custom_command_rsp(params, msg);
  } else if (internal_module_etcd_->is_etcd_enabled()) {
    internal_module_etcd_->disable_etcd();

Comment on lines +481 to +491
auto wait_preserve_us = self->get_owner().configure_tick_preserve_microseconds_in_second.load(std::memory_order_acquire);
if (wait_preserve_us <= 0) {
wait_preserve_us = 8000;
}

std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));
self->status_.store(static_cast<uint8_t>(worker_status::kRunning), std::memory_order_release);

self->cpu_time_sleep_us_.fetch_add(wait_preserve_us, std::memory_order_release);
Comment on lines +1449 to +1452
if (changed) {
local_cache_iter->second.info = topology_info.storage.info;
info_ptr = local_cache_iter->second.info;
}
Comment on lines +36 to +46
// atapp_log nullptr 使用默认Logger
// atapp_log 非空但 category 为空不使用日志
class etcd_cluster_holder;
using etcd_cluster_load_conf_func_t = std::function<void(atframework::atapp::app &, etcd_cluster_holder &)>;
using etcd_cluster_init_keepalive_watcher_func_t =
std::function<int(atframework::atapp::app &, etcd_cluster_holder &, bool init)>;
using etcd_cluster_reset_keepalive_watcher_func_t =
std::function<void(atframework::atapp::app &, etcd_cluster_holder &)>;

using etcd_cluster_holder_ptr_t = std::shared_ptr<etcd_cluster_holder>;

Comment on lines +22 to +33
#include <atframe/etcdcli/etcd_cluster.h>
#include <atframe/etcdcli/etcd_discovery.h>
#include <atframe/etcdcli/etcd_keepalive.h>
#include <atframe/etcdcli/etcd_watcher.h>
#include <atframe/modules/etcd_module.h>

#include <list>
#include <mutex>
#include <set>
#include <string>
#include <unordered_map>

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/worker_pool_module.cpp:486

wait_preserve_us = 8000;
}

std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM
·
correctness

worker_pool_module 的秒级保护等待逻辑存在问题:wait_for 使用 tick_interval - (busy_end_time - start_time) 作为等待时长,但该计算可能产生负值(若 busy_end_time - start_time >= tick_interval),且 cpu_time_sleep_us_ 记录的是 wait_preserve_us(默认8000μs),而非实际等待时长。当常规等待和秒级保护等待在同一迭代中先后触发时,实际等待时长会是双倍,但统计值不准确。

Location: src/atframe/modules/worker_pool_module.cpp:486

Referenced code (src/atframe/modules/worker_pool_module.cpp:486):

      }

      std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
      self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
      self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));

Suggested fix:
考虑使用 wait_preserve_us 作为实际等待时长,并统一 sleep 统计逻辑;或在秒级保护等待块中重新计算等待时长。

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for include/atframe/atapp_conf.proto:583

atbus_configure bus = 301;

atapp_etcd etcd = 401;
atapp_etcd_common etcd = 401;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH
·
api_compatibility

atapp_configure.etcd 字段类型从 atapp_etcd 改为 atapp_etcd_common 是破坏性 API 变更。现有配置文件中的 etcd.hosts、etcd.path 等字段将被静默忽略,用户需手动迁移至 service_discovery_etcd 字段。虽然 wire format 上 bool enable 字段可能兼容,但语义层面会导致功能失效。

Location: include/atframe/atapp_conf.proto:583

Referenced code (include/atframe/atapp_conf.proto:583):

atbus_configure bus = 301;

atapp_etcd_common etcd = 401;
atapp_etcd service_discovery_etcd = 402;

Suggested fix:
建议在发布说明中明确标注此为破坏性变更,并提供配置迁移指南;或考虑保留 etcd 字段的完整 atapp_etcd 类型向后兼容。

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/service_discovery_module.cpp:644

cluster_context_.add_topology_watcher_callback(fn);
}

LIBATAPP_MACRO_API const ::atframework::atapp::etcd_cluster &service_discovery_module::get_raw_etcd_ctx() const {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOW
·
correctness

get_raw_etcd_ctx() 直接解引用 cluster_context_.etcd_cluster_holder_ 而无空指针检查。若在模块初始化前调用此方法(虽然正常流程不应发生),会导致崩溃。建议添加防御性检查或在文档中明确调用前置条件。

Location: src/atframe/modules/service_discovery_module.cpp:644

Referenced code (src/atframe/modules/service_discovery_module.cpp:644):

}

LIBATAPP_MACRO_API const ::atframework::atapp::etcd_cluster &service_discovery_module::get_raw_etcd_ctx() const {
return cluster_context_.etcd_cluster_holder_->get_etcd_cluster();
}

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/etcd_module.cpp:693

if (conf_path_cache_.empty()) {
etcd_module *self = const_cast<etcd_module *>(this);
self->conf_path_cache_ = generate_etcd_path(get_configure().path());
int etcd_cluster_holder::http_callback_on_etcd_closed(atfw::util::network::http_request &req) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOW
·
maintainability

http_callback_on_etcd_closed 函数内日志消息仍引用 "etcd_module",但该方法已迁移至 etcd_cluster_holder 类,日志消息应更新以反映当前类名。

Location: src/atframe/modules/etcd_module.cpp:693

Referenced code (src/atframe/modules/etcd_module.cpp:693):

}

int etcd_cluster_holder::http_callback_on_etcd_closed(atfw::util::network::http_request &req) {
etcd_cluster_holder *self = reinterpret_cast<etcd_cluster_holder *>(req.get_priv_data());
if (nullptr == self) {

@owent
Copy link
Copy Markdown
Owner

owent commented May 18, 2026

AI Code Review

Updated: 2026-05-19 03:23:11 UTC | Commit: 5aa964c

AI Code Review Summary

etcd/service_discovery模块重构问题

Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang


评审范围

审查了将 service discovery 功能从 etcd_module 分离到新的 service_discovery_module 的大型重构 PR,涉及:

  • 新增 service_discovery_module 类及相关上下文管理
  • etcd_cluster_holder 抽象层引入
  • etcd_discovery_node 增加 context_ptr 字段支持多集群场景
  • 配置 schema 变化(etcd 字段拆分)
  • worker_pool 增加秒级 tick 压力调节逻辑

发现问题

高严重度

  • worker_pool_module.cpp:488 - 第二次 wait_for 使用过时的时间计算值,可能导致等待时长错误

中严重度

  • atapp_conf.proto:583 - Protobuf 字段 401 类型变化导致向后兼容性风险,旧配置的 etcd 详细字段会被忽略

限制与不确定性

  • 未运行测试验证重构后的实际行为
  • 配置 schema 兼容性问题的实际影响取决于用户迁移策略,可能已有外部迁移方案但未在 diff 中体现

Problems (2)

# Severity Category Location Message
0 HIGH correctness src/atframe/modules/worker_pool_module.cpp:488 在计数器重置后的第二次等待中使用了过时的时间值。busy_end_timestart_time 在 line 457-458 计算后可能已经经过了第一次 wait_for 等待(lines 461-466),但 line 488 的 wait_for(lk_cv, tick_interval - (busy_end_time - start_time)) 继续使用这些过时值计算等待时长。当线程已经等待一段时间后,这些变量不再反映当前状态,导致等待时长计算错误。

触发场景:当 current_tick_second_busy_us_ + current_tick_second_waited_us_ >= 1000000 时进入重置逻辑,此时使用的等待时长基于之前的时间点,可能产生负数或显著偏离预期值的等待时间。

修复方向:应在重置计数器后重新计算当前状态下的等待时长,或在 line 488 使用独立的 wait_preserve_us 值而非重新计算 tick 间隔差值。
|
| 1 | MEDIUM | api_compatibility | include/atframe/atapp_conf.proto:583 | Protobuf schema 字段编号 401 被复用为不同消息类型,存在向后兼容性风险。旧配置将 atapp_etcd(包含 hosts、path、authorization 等多个字段)序列化到字段 401。新代码将字段 401 改为 atapp_etcd_common(仅包含 enable 字段),原有的 etcd 配置字段在反序列化时会被静默忽略。

触发场景:使用旧版本配置文件的部署环境在升级后,etcd 的 hosts、path、authorization 等关键配置会丢失,导致服务发现功能异常或默认行为改变。

影响:这是一个破坏性 API 变化,需要用户更新所有配置文件将 etcd 配置迁移到新的 service_discovery_etcd 字段(402)。

修复方向:考虑保留字段 401 的 atapp_etcd 类型并新增字段,或在文档中明确标注此为 Breaking Change 并提供迁移指南。
|

Code reference: src/atframe/modules/worker_pool_module.cpp:488

        std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
        self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
        self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));
        self->status_.store(static_cast<uint8_t>(worker_status::kRunning), std::memory_order_release);

Code reference: include/atframe/atapp_conf.proto:583

  atbus_configure bus = 301;

  atapp_etcd_common etcd = 401;
  atapp_etcd service_discovery_etcd = 402;

  • Powered by AICodeReviewer*

Open Issues (2)

  1. [HIGH] correctnesssrc/atframe/modules/worker_pool_module.cpp:488 (new in 5aa964c)
  2. [MEDIUM] api_compatibilityinclude/atframe/atapp_conf.proto:583 (new in 5aa964c)
Resolved (4)

The following previously reported issues are no longer present:

  • 08a7f428ac77b3c9 ✅ Resolved
  • e9259c573fa5c1a7 ✅ Resolved
  • 1640e4cfb6affe5c ✅ Resolved
  • 6c2df27a67ddf2f7 ✅ Resolved

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for src/atframe/modules/worker_pool_module.cpp:488


std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH
·
correctness

在计数器重置后的第二次等待中使用了过时的时间值。busy_end_timestart_time 在 line 457-458 计算后可能已经经过了第一次 wait_for 等待(lines 461-466),但 line 488 的 wait_for(lk_cv, tick_interval - (busy_end_time - start_time)) 继续使用这些过时值计算等待时长。当线程已经等待一段时间后,这些变量不再反映当前状态,导致等待时长计算错误。

触发场景:当 current_tick_second_busy_us_ + current_tick_second_waited_us_ >= 1000000 时进入重置逻辑,此时使用的等待时长基于之前的时间点,可能产生负数或显著偏离预期值的等待时间。

修复方向:应在重置计数器后重新计算当前状态下的等待时长,或在 line 488 使用独立的 wait_preserve_us 值而非重新计算 tick 间隔差值。

Location: src/atframe/modules/worker_pool_module.cpp:488

Referenced code (src/atframe/modules/worker_pool_module.cpp:488):

      std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
      self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
      self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));
      self->status_.store(static_cast<uint8_t>(worker_status::kRunning), std::memory_order_release);

Copy link
Copy Markdown
Owner

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AICR problem for include/atframe/atapp_conf.proto:583

atbus_configure bus = 301;

atapp_etcd etcd = 401;
atapp_etcd_common etcd = 401;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM
·
api_compatibility

Protobuf schema 字段编号 401 被复用为不同消息类型,存在向后兼容性风险。旧配置将 atapp_etcd(包含 hosts、path、authorization 等多个字段)序列化到字段 401。新代码将字段 401 改为 atapp_etcd_common(仅包含 enable 字段),原有的 etcd 配置字段在反序列化时会被静默忽略。

触发场景:使用旧版本配置文件的部署环境在升级后,etcd 的 hosts、path、authorization 等关键配置会丢失,导致服务发现功能异常或默认行为改变。

影响:这是一个破坏性 API 变化,需要用户更新所有配置文件将 etcd 配置迁移到新的 service_discovery_etcd 字段(402)。

修复方向:考虑保留字段 401 的 atapp_etcd 类型并新增字段,或在文档中明确标注此为 Breaking Change 并提供迁移指南。

Location: include/atframe/atapp_conf.proto:583

Referenced code (include/atframe/atapp_conf.proto:583):

atbus_configure bus = 301;

atapp_etcd_common etcd = 401;
atapp_etcd service_discovery_etcd = 402;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants