ETCD Cluster 管理与 Discovery 剥离#50
Conversation
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:273
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:698
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
发现两个中等严重度问题
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang
审查范围
本 PR 重构 etcd_module,拆分为 etcd_module(集群管理)和 service_discovery_module(服务发现),引入 etcd_cluster_holder 作为集群状态容器,并支持多集群服务发现。
发现的问题
问题1: stop() 未处理额外集群上下文
文件: src/atframe/modules/service_discovery_module.cpp:273
严重度: medium
stop() 方法仅调用 cluster_context_.reset_internal_watchers_and_keepalives(),未迭代处理 cluster_context_list_ 中的额外上下文。当使用 add_service_discovery_cluster_context 配置多集群时,这些额外上下文的 watchers 和 keepalives 在模块关闭时不会被清理,可能导致已停止的 etcd 连接被访问。
问题2: http 回调中潜在 use-after-free
文件: src/atframe/modules/etcd_module.cpp:698
严重度: medium
http_callback_on_etcd_closed 回调中直接重置 cleanup_request_ 而未先清空 priv_data,随后从 clusters_ 列表删除 holder。若 holder 被 erase 销毁,http_request 的后续清理若访问 priv_data 将导致 use-after-free。同文件 reset() 方法已展示正确模式。
建议
- 在
stop()中迭代cluster_context_list_调用每个上下文的reset_internal_watchers_and_keepalives() - 在
http_callback_on_etcd_closed重置前调用cleanup_request_->set_priv_data(nullptr)
Problems (2)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | MEDIUM | correctness | src/atframe/modules/service_discovery_module.cpp:273 |
stop() 方法仅清理默认的 cluster_context_,未处理 cluster_context_list_ 中的额外集群上下文。当使用 add_service_discovery_cluster_context 添加多集群时,这些上下文的 watchers 和 keepalives 在模块关闭时不会被正确停止,可能导致 etcd 模块停止后 watchers 仍尝试访问 etcd 连接。 |
| 1 | MEDIUM | memory_safety | src/atframe/modules/etcd_module.cpp:698 |
http_callback_on_etcd_closed 回调中直接重置 cleanup_request_ 而未先清空 priv_data,随后从 clusters_ 列表中删除 holder。如果 holder 被 erase 销毁,http_request 的清理机制(可能在回调返回后执行)若尝试访问 priv_data,将导致 use-after-free。reset() 方法(683-687行)展示了安全模式:先调用 set_priv_data(nullptr) 再重置,此处应遵循相同模式。 |
Code reference: src/atframe/modules/service_discovery_module.cpp:273
LIBATAPP_MACRO_API int service_discovery_module::stop() {
cluster_context_.reset_internal_watchers_and_keepalives();
return 0;
}Code reference: src/atframe/modules/etcd_module.cpp:698
}
self->cleanup_request_.reset();
LIBATAPP_MACRO_ETCD_CLUSTER_LOG_DEBUG(self->cluster_, "Etcd revoke lease finished");- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for include/atframe/atapp_conf.proto:583
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:702
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:260
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
发现 1 个高优先级兼容性问题
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent
审查范围
重构 PR 将 etcd_module 分拆为 etcd_module(底层集群管理)和 service_discovery_module(服务发现逻辑),支持多 etcd 集群场景。
问题汇总
- 高 1 个:Protobuf 字段类型变更破坏配置兼容性
- 中 1 个:HTTP 回调中原始指针生命周期风险
- 低 1 个:冗余对象构造
关键问题
atapp_conf.proto 字段 401 从 atapp_etcd 改为 atapp_etcd_common 是破坏性变更。用户现有配置文件中的 hosts/path 等字段在升级后会被丢弃。测试配置已更新,但缺少迁移文档。
建议优先级
- 评估 protobuf 变更的兼容性策略(保留旧字段或提供迁移方案)
- 考虑使用 weak_ptr 替代原始指针 module_
- 清理构造函数中的冗余创建
Problems (3)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | HIGH | api_schema_compatibility | include/atframe/atapp_conf.proto:583 |
Protobuf field 401 类型从 atapp_etcd 改为 atapp_etcd_common 是破坏性变更。旧配置文件中 etcd.hosts、etcd.path、etcd.authorization 等字段在反序列化时会丢失,因为新类型只包含 enable 字段。现有用户升级后配置会丢失 etcd 连接信息,导致服务无法连接到 etcd 集群。触发场景:用户使用旧版本生成的二进制配置文件或 protobuf 序列化数据升级到新版本。建议:保留字段 401 的原类型,或提供配置迁移工具和文档。 |
| 1 | MEDIUM | memory_lifecycle_safety | src/atframe/modules/etcd_module.cpp:702 |
http_callback_on_etcd_closed 回调中访问 self->module_->clusters_ 存在潜在生命周期风险。module_ 是 etcd_module 的原始指针,在 HTTP 请求完成回调触发时,如果 etcd_module 已被销毁(例如 app 异常终止或超时),会导致 use-after-free。触发场景:app shutdown 时 stop() 返回 1 表示清理未完成,但上层代码未能继续 tick() 等待完成就直接销毁 app。影响:崩溃或数据损坏。缓解措施:在 etcd_module 析构前调用 reset() 清除回调,但依赖调用方遵循正确的 shutdown 流程。建议:使用 weak_ptr 或在回调中增加有效性检查。 |
| 2 | LOW | resource_efficiency | src/atframe/modules/service_discovery_module.cpp:260 |
service_discovery_cluster_context 构造函数中创建 etcd_cluster_holder_(line 260),但 init() 中立即替换为新实例(line 178)。构造函数创建的 holder 从未被使用,造成不必要的内存分配和对象构造。触发场景:每次 service_discovery_module 初始化。影响:轻微性能损失。建议:移除构造函数中的 holder 创建,保留 nullptr 初始化。 |
Code reference: include/atframe/atapp_conf.proto:583
atbus_configure bus = 301;
atapp_etcd_common etcd = 401;
atapp_etcd service_discovery_etcd = 402;
Code reference: src/atframe/modules/etcd_module.cpp:702
LIBATAPP_MACRO_ETCD_CLUSTER_LOG_DEBUG(self->cluster_, "Etcd revoke lease finished");
self->module_->clusters_.erase(self->self_iterator_);
return 0;
}Code reference: src/atframe/modules/service_discovery_module.cpp:260
discovery_watcher_snapshot_index_allocator_(0),
topology_watcher_snapshot_index_allocator_(0) {
etcd_cluster_holder_ = std::make_shared<etcd_cluster_holder>();
last_etcd_event_topology_header_.cluster_id = 0;
last_etcd_event_topology_header_.member_id = 0;- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
HIGH · api_schema_compatibility
protobuf 字段 401 类型从 atapp_etcd 变更为 atapp_etcd_common,导致向后兼容性破坏。现有配置文件中 etcd.hosts、etcd.path 等字段数据在新 schema 下会变为未知字段无法访问,而新代码从 service_discovery_etcd (字段 402) 读取这些配置。未迁移的旧配置会导致服务发现功能因缺少 hosts 而失败。触发场景:使用旧版本配置文件启动新版本程序。建议方案:保留字段 401 为原类型并提供迁移文档,或增加配置兼容性检测逻辑。
Location: include/atframe/atapp_conf.proto:579
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:220
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent
No summary provided.
Problems (2)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | HIGH | api_schema_compatibility | include/atframe/atapp_conf.proto:579 |
protobuf 字段 401 类型从 atapp_etcd 变更为 atapp_etcd_common,导致向后兼容性破坏。现有配置文件中 etcd.hosts、etcd.path 等字段数据在新 schema 下会变为未知字段无法访问,而新代码从 service_discovery_etcd (字段 402) 读取这些配置。未迁移的旧配置会导致服务发现功能因缺少 hosts 而失败。触发场景:使用旧版本配置文件启动新版本程序。建议方案:保留字段 401 为原类型并提供迁移文档,或增加配置兼容性检测逻辑。 |
| 1 | MEDIUM | memory_lifecycle | src/atframe/modules/service_discovery_module.cpp:220 |
lambda 捕获 context (shared_ptr) 存入 etcd_cluster_holder::init_keepalive_watcher_func_,而 etcd_cluster_holder 由 context->etcd_cluster_holder_ 持有,形成引用环:context → etcd_cluster_holder_ → lambda(context) → context。stop() 和 reset_internal_watchers_and_keepalives() 未清除这些函数对象,导致 cluster_context_list_ 中的条目内存泄漏。触发场景:调用 add_service_discovery_cluster_context 并随后销毁模块。建议:改用 std::weak_ptr 捕获,或在清理阶段显式清空函数对象。 |
Code reference: src/atframe/modules/service_discovery_module.cpp:220
auto context = std::make_shared<service_discovery_cluster_context>();
int ret = get_app()->get_etcd_module()->init_cluster(
context->etcd_cluster_holder_, load_conf_func,
[this, context](atframework::atapp::app &app, atframework::atapp::etcd_cluster_holder &cluster_holder,
bool init) -> int {- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:1434
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:617
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent
No summary provided.
Problems (2)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | HIGH | correctness | src/atframe/modules/service_discovery_module.cpp:1434 |
update_internal_watcher_event(topology_info_t&) 逻辑错误:当版本号未增加(upgrade模式)但内容不同时,topology info 未被更新。在etcd集群重启或跨成员监听的边缘场景下,可能导致 topology 数据变为陈旧数据。同时,当内容相同但从不同集群上下文(context_ptr)到达时,事件被静默丢弃,context_ptr未更新,这与 discovery 处理逻辑不一致,可能导致多集群场景下的拓扑数据追踪错误。 |
| 1 | MEDIUM | safety | src/atframe/modules/etcd_module.cpp:617 |
etcd_cluster_holder::tick() 直接解引用 atapp_ 和 module_ 而未进行空指针检查。虽然当前初始化流程确保这些指针在 tick 调用前已设置,但作为防御性编程,应在关键路径添加空指针检查,以应对边缘情况(如初始化部分失败、手动创建 holder 等异常场景)。 |
Code reference: src/atframe/modules/service_discovery_module.cpp:1434
internal_topology_info_set_.erase(local_cache_iter);
} else {
std::unordered_map<uint64_t, topology_storage_t>::iterator local_cache_iter =
internal_topology_info_set_.find(topology_info.storage.info->id());
if (local_cache_iter != internal_topology_info_set_.end()) {Code reference: src/atframe/modules/etcd_module.cpp:617
int etcd_cluster_holder::tick() {
// Slow down the tick interval of etcd module, it require http request which is very slow compared to atbus
if (tick_next_timepoint_ >= atapp_->get_last_tick_time()) {
return 0;
}- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:1451
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
发现一个未初始化成员导致清理逻辑失效的问题
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang
审查完成。本次变更将 etcd 服务发现功能从 etcd_module 重构到新的 service_discovery_module,整体架构清晰。
发现 1 个高严重性问题:
service_discovery_module.cpp:1451 — topology_info_t.storage.context_ptr 在首次插入 topology map 时未初始化。watcher callback 仅设置外层 topology_info.context_ptr (line 1238),但嵌套的 storage.context_ptr 保持未定义值。这破坏了按 context_ptr 清理孤立节点的逻辑,导致节点在所属集群断开后无法正确移除。
其余变更(配置结构调整、etcd_cluster_holder 抽象、API 迁移)未发现明显缺陷。测试文件同步更新了新增的 context_ptr 参数,传入 0 作为默认值,符合预期。
Problems (1)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | HIGH | correctness | src/atframe/modules/service_discovery_module.cpp:1451 |
topology_info.storage.context_ptr 未初始化就被插入到 topology map 中。在 watcher callback (line 1237-1238) 只设置了 topology_info.context_ptr,但 topology_info.storage.context_ptr 保持未定义值。这导致 cleanup_old_topology_peers (line 1057) 的 context_ptr 过滤失效——新插入的节点永远不会匹配任何 context,在所属集群断开时无法正确清理。修复:在 line 1238 后添加 topology_info.storage.context_ptr = topology_info.context_ptr; 或在 line 1451 插入前设置 storage.context_ptr。 |
Code reference: src/atframe/modules/service_discovery_module.cpp:1451
}
} else {
internal_topology_info_set_[topology_info.storage.info->id()] = topology_info.storage;
info_ptr = topology_info.storage.info;
}- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for include/atframe/modules/service_discovery_module.h:82
owent
left a comment
There was a problem hiding this comment.
AICR problem for include/atframe/modules/service_discovery_module.h:88
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:645
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
架构重构安全,结构体初始化需加固
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang
审查范围
本次变更为 libatapp 的 etcd/service discovery 架构重构:将服务发现与拓扑管理逻辑从 etcd_module 分离至新增的 service_discovery_module,引入 etcd_cluster_holder 支持多集群上下文,配置项拆分为 etcd.enable 与 service_discovery_etcd。
问题概览
- Medium 1:
topology_storage_t::context_ptr缺少默认初始化,存在未定义值风险 - Low 2:
topology_info_t与topology_storage_t的context_ptr字段冗余,语义重叠 - Low 3:
get_raw_etcd_ctx()无空指针防御,潜在崩溃路径
核心风险
新增的 context_ptr 机制用于区分不同集群上下文的节点归属。若该值因未初始化而持垃圾数据,cleanup_old_nodes/cleanup_old_topology_peers 的指针比对逻辑可能误判,导致错误清理全局 discovery 集。当前代码路径显式设置该值,但结构体定义层面未固化初始化保证。
建议
优先修复 topology_storage_t 的字段初始化问题;其余两项为维护性与防御性改进,可延后处理。
未覆盖内容
- 未审查 test/case 目录下的全部测试实现细节
- 未验证配置迁移对现有用户的兼容性影响(属 API 变更范畴)
Problems (3)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | MEDIUM | correctness | include/atframe/modules/service_discovery_module.h:82 |
topology_storage_t 新增的 context_ptr 字段缺少默认初始化。该结构体无构造函数,uintptr_t context_ptr 在默认构造时将持有未定义值(垃圾数据)。虽然当前生产代码路径在 watcher 回调中显式设置该值,但结构体定义本身不安全——若后续新增代码路径直接创建 topology_storage_t 而未初始化 context_ptr,可能导致 cleanup_old_topology_peers 中的指针值比对错误,进而误删或漏删节点。建议添加 in-class 初始化:uintptr_t context_ptr = 0; |
| 1 | LOW | correctness | include/atframe/modules/service_discovery_module.h:88 |
topology_info_t 与其内嵌的 topology_storage_t 均包含 context_ptr 字段,存在冗余。watcher 回调中两者被设为相同值,update_internal_watcher_event(topology_info_t&) 通过 topology_info.context_ptr 参数更新 storage.context_ptr。此设计虽在当前实现中一致,但两个同名字段语义重叠,若后续维护中仅修改其一而遗漏另一,可能引发 context 比对失效。建议明确文档说明两字段关系,或考虑合并为单一字段。 |
| 2 | LOW | correctness | src/atframe/modules/service_discovery_module.cpp:645 |
get_raw_etcd_ctx() 直接解引用 cluster_context_.etcd_cluster_holder_ 而无空指针检查。该成员在构造时为空(nullptr),仅在 init() 中被赋值。若在模块初始化完成前调用此方法(例如外部代码在 app 构造后、init 前访问),将导致空指针崩溃。当前测试与生产代码均在 init 后调用,实际风险较低,但缺乏防御性编程。建议添加断言或早期返回空引用保护。 |
Code reference: include/atframe/modules/service_discovery_module.h:82
using atapp_topology_info_ptr_t = atfw::util::memory::strong_rc_ptr<atapp::protocol::atapp_topology_info>;
struct LIBATAPP_MACRO_API_HEAD_ONLY topology_storage_t {
atapp_topology_info_ptr_t info;
uintptr_t context_ptr;Code reference: include/atframe/modules/service_discovery_module.h:88
};
struct LIBATAPP_MACRO_API_HEAD_ONLY topology_info_t {
topology_storage_t storage;
uintptr_t context_ptr;Code reference: src/atframe/modules/service_discovery_module.cpp:645
}
LIBATAPP_MACRO_API const ::atframework::atapp::etcd_cluster &service_discovery_module::get_raw_etcd_ctx() const {
return cluster_context_.etcd_cluster_holder_->get_etcd_cluster();
}- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:122
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:693
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang
No summary provided.
Problems (2)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | HIGH | correctness | src/atframe/modules/etcd_module.cpp:122 |
etcd_module::init() 缺少返回语句。当 etcd_enabled_ 为 false 时,代码执行完 FWLOGINFO 后函数结束,但没有 return 语句。这导致返回值未定义(实际可能返回栈上残留值),调用者无法正确判断初始化是否成功。应在 line 121 的 FWLOGINFO 后添加 return 0 或重构控制流。 |
| 1 | HIGH | correctness | src/atframe/modules/etcd_module.cpp:693 |
etcd_cluster_holder::http_callback_on_etcd_closed 与原 etcd_module 的实现相比,缺少触发 app stop 的逻辑。原实现(已删除)在 lease revoke 完成后会调用 get_app()->stop() 继续关闭流程。新实现只调用 reset(),可能导致 app 在收到关闭命令后无法正常退出。应在 reset() 后检查 atapp_ 并触发继续关闭流程。 |
Code reference: src/atframe/modules/etcd_module.cpp:122
FWLOGINFO("etcd disabled, start single mode");
}
return 0;
}Code reference: src/atframe/modules/etcd_module.cpp:693
}
int etcd_cluster_holder::http_callback_on_etcd_closed(atfw::util::network::http_request &req) {
etcd_cluster_holder *self = reinterpret_cast<etcd_cluster_holder *>(req.get_priv_data());
if (nullptr == self) {- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for include/atframe/atapp_conf.proto:583
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:160
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
Proto schema 向后兼容性问题
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang
Review scope: etcd_module 与 service_discovery_module 分离重构,涉及 proto schema 变更、模块拆分、测试配置更新。
Issues: 2 actionable problems reported.
- medium: Proto schema 字段类型变更可能导致配置数据丢失
- low: init_cluster 错误处理不一致
Uncertainty: Proto schema 变更可能是有意的 API 重设计,而非 bug。但缺少向后兼容处理会导致用户升级时遇到静默配置丢失。建议确认是否有迁移指南或兼容层计划。
Problems (2)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | MEDIUM | api_compatibility | include/atframe/atapp_conf.proto:583 |
Proto schema field 401 类型从 atapp_etcd 改为 atapp_etcd_common,可能导致向后兼容性问题。现有的序列化配置(包含 hosts/path 等字段)在新 schema 下会被静默丢弃这些字段,可能导致升级后 etcd 连接失败。建议考虑:添加向后兼容的迁移逻辑,或在 proto 中保留旧字段结构并使用不同的字段号。 |
| 1 | LOW | correctness | src/atframe/modules/etcd_module.cpp:160 |
init_cluster 在初始化失败时仍将 holder 添加到 clusters_ 列表。第 160-166 行:先 push_back(holder),再调用 stop(),最后返回 -1。这意味着即使初始化失败,后续 tick() 仍会遍历并调用该 holder 的 tick()。建议:失败时不应添加到列表,或添加后应有明确的错误状态标记和清理逻辑。 |
Code reference: include/atframe/atapp_conf.proto:583
atbus_configure bus = 301;
atapp_etcd_common etcd = 401;
atapp_etcd service_discovery_etcd = 402;
Code reference: src/atframe/modules/etcd_module.cpp:160
int result = cluster_holder->init_keepalive_watcher_func_(*cluster_holder->atapp_, *cluster_holder, true);
if (result != 0) {
clusters_.push_back(cluster_holder);
cluster_holder->stop();
return -1;- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:125
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:658
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
重构架构存在空指针风险
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang
审查范围
本次 PR 将 etcd_module 重构为两部分:etcd_module(基础集群管理)和新增的 service_discovery_module(服务发现逻辑)。引入 etcd_cluster_holder 作为集群实例容器,新增 context_ptr 字段用于多集群上下文追踪,并调整 protobuf 配置 schema。
问题摘要
发现 2 个中等严重度 问题,均为空指针解引用风险:
init_cluster()空指针风险 (etcd_module.cpp:143):get_app()返回值未校验即被解引用调用。stop()空指针风险 (etcd_module.cpp:672-674):atapp_的空值检查未覆盖reset_keepalive_watcher_func_lambda 调用。
备注
- protobuf schema 变换(
atapp_etcd_common字段401 +atapp_etcd字段402)因字段编号1的enable布尔值兼容,不构成兼容性问题。 tick()方法已包含空值早返回检查(第612行),该路径无风险。- 测试文件更新与新 API 调用一致,未发现测试逻辑缺陷。
Problems (2)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | MEDIUM | correctness | src/atframe/modules/etcd_module.cpp:125 |
init_cluster() 第143行调用 cluster_holder->load_conf_func_(*cluster_holder->atapp_, *cluster_holder) 时,若 get_app() 返回空指针(第138行 cluster_holder->atapp_ = get_app()),将导致空指针解引用。虽然正常初始化流程中模块依附于 app 后 get_app() 不应为空,但缺少显式空值检查降低了代码健壮性。触发场景:模块被错误地在未依附 app 的状态下调用 init_cluster()。 |
| 1 | MEDIUM | correctness | src/atframe/modules/etcd_module.cpp:658 |
etcd_cluster_holder::stop() 第672-674行调用 reset_keepalive_watcher_func_(*atapp_, *this) 时,仅第661行的 if (atapp_ != nullptr) 检查用于 revoke_lease 分支,并未保护后续的 lambda 调用。若 atapp_ 为空且 reset_keepalive_watcher_func_ 已设置,调用 reset_keepalive_watcher_func_(*atapp_, *this) 将崩溃。建议在调用 lambda 前增加 atapp_ 的空值检查。 |
Code reference: src/atframe/modules/etcd_module.cpp:125
}
LIBATAPP_MACRO_API int etcd_module::init_cluster(
etcd_cluster_holder_ptr_t cluster_holder, etcd_cluster_load_conf_func_t load_conf_func,
etcd_cluster_init_keepalive_watcher_func_t init_keepalive_watcher_func,Code reference: src/atframe/modules/etcd_module.cpp:658
}
void etcd_cluster_holder::stop() {
if (!cleanup_request_ && !cluster_.check_flag(etcd_cluster::flag_t::kClosing)) {
bool revoke_lease = true;- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:271
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @yousongyang
Branch: main
Reviewers: @yousongyang
No summary provided.
Problems (1)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | MEDIUM | correctness | src/atframe/modules/service_discovery_module.cpp:271 |
stop() 总是返回 0,但底层 etcd_cluster_holder::stop() 可能触发异步清理请求。当清理请求仍在运行时,调用方可能误以为模块已完全停止。应检查所有 context 是否已完全停止,并与 etcd_module::stop() 的返回值语义保持一致。 |
Code reference: src/atframe/modules/service_discovery_module.cpp:271
}
LIBATAPP_MACRO_API int service_discovery_module::stop() {
cluster_context_.reset_internal_watchers_and_keepalives();
for (std::shared_ptr<service_discovery_cluster_context> context : cluster_context_list_) {- Powered by AICodeReviewer*
owent
left a comment
There was a problem hiding this comment.
AICR problem for include/atframe/atapp_conf.proto:583
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/worker_pool_module.cpp:488
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:158
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:486
owent
left a comment
There was a problem hiding this comment.
AI Code Review Summary
etcd模块拆分重构发现Proto兼容性与等待逻辑问题
Target: ETCD Cluster 管理与 Discovery 剥离
Author: @owent
Branch: main
Reviewers: @owent
审查范围
重构将etcd功能拆分为 etcd_module(底层集群管理)与 service_discovery_module(服务发现逻辑),引入 etcd_cluster_holder 抽象支持多集群上下文。
主要发现
| 严重度 | 类别 | 问题 |
|---|---|---|
| high | api_compatibility | Proto字段类型变更导致配置兼容性断裂 |
| medium | correctness | worker_pool等待时长计算可能负数 |
| medium | correctness | init_cluster失败后仍加入列表 |
| low | correctness | service_discovery同样等待逻辑风险 |
建议
- Proto变更应评估配置迁移路径,或使用版本化字段保持向后兼容。
- worker_pool/service_discovery等待逻辑应在累计超时分支使用预设preserve时长而非原tick差值。
- init_cluster失败分支应避免将半初始化holder加入活跃列表。
上下文限制
worker_pool完整tick流程需运行时验证;未检查所有新模块的初始化顺序依赖。
Problems (4)
| # | Severity | Category | Location | Message |
|---|---|---|---|---|
| 0 | HIGH | api_compatibility | include/atframe/atapp_conf.proto:583 |
Proto字段类型变更导致API兼容性断裂:atapp_configure.etcd 从 atapp_etcd 改为 atapp_etcd_common。现有配置文件(如旧的YAML/CONF)若使用旧schema将无法正确解析。需确认是否有兼容性迁移机制或版本化策略。 |
| 1 | MEDIUM | correctness | src/atframe/modules/worker_pool_module.cpp:488 |
wait_for等待时长计算可能产生负值或异常行为:当累计busy+wait时间超过1秒时,使用 tick_interval - (busy_end_time - start_time) 计算等待时间。此时 busy_end_time - start_time 可能大于 tick_interval,导致负数时长传入wait_for,行为未定义或无实际等待。应在该分支使用 wait_preserve_us 作为等待时长而非原表达式。 |
| 2 | MEDIUM | correctness | src/atframe/modules/etcd_module.cpp:158 |
init_cluster中初始化失败后仍将cluster_holder加入clusters_列表(line 160-162)。当init_keepalive_watcher_func返回非0时,代码push了holder并调用stop(),但该holder可能处于不完整初始化状态,后续tick/reload操作可能访问未初始化字段。应考虑失败时不加入列表或完全清理。 |
| 3 | LOW | correctness | src/atframe/modules/service_discovery_module.cpp:486 |
wait_for锁传入表达式 tick_interval - (busy_end_time - start_time) 与上方477行的preserve wait逻辑存在同样风险:此时 busy_end_time 已超出本轮tick窗口,表达式值可能为负。 |
Code reference: include/atframe/atapp_conf.proto:583
atbus_configure bus = 301;
atapp_etcd_common etcd = 401;
atapp_etcd service_discovery_etcd = 402;
Code reference: src/atframe/modules/worker_pool_module.cpp:488
std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));
self->status_.store(static_cast<uint8_t>(worker_status::kRunning), std::memory_order_release);Code reference: src/atframe/modules/etcd_module.cpp:158
if (cluster_holder->init_keepalive_watcher_func_) {
int result = cluster_holder->init_keepalive_watcher_func_(*cluster_holder->atapp_, *cluster_holder, true);
if (result != 0) {
clusters_.push_back(cluster_holder);Code reference: src/atframe/modules/service_discovery_module.cpp:486
if (nullptr == owner) {
return std::string();
}
return LOG_WRAPPER_FWAPI_FORMAT("{}{}/{}-{}", path, ETCD_MODULE_BY_ID_DIR, owner->get_app_name(), owner->get_id());- Powered by AICodeReviewer*
There was a problem hiding this comment.
Pull request overview
This PR refactors libatapp’s etcd responsibilities by extracting service discovery (node discovery + topology watchers/keepalives) into a new service_discovery_module, leaving etcd_module primarily responsible for etcd cluster lifecycle/transport. It also updates configuration schema and test coverage to reflect the split.
Changes:
- Introduce
service_discovery_module(with multi-cluster-context support) and wire it intoatapp::app+ atbus connector reload flow. - Adjust configuration model:
atapp.etcdbecomes a minimal enable flag, and full discovery configuration moves toatapp.service_discovery_etcd. - Update tests and discovery node APIs (
copy_from/update_version) to carry acontext_ptrfor multi-context isolation.
Reviewed changes
Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/case/atapp_upstream_forward_test.cpp | Switch discovery injection from etcd_module to service_discovery_module; update copy_from signature. |
| test/case/atapp_topology_change_test.cpp | Use service_discovery_module global discovery and updated copy_from signature. |
| test/case/atapp_test_etcd.yaml | Enable service_discovery_etcd for unit-test configs. |
| test/case/atapp_test_etcd_node2.yaml | Enable service_discovery_etcd for unit-test configs. |
| test/case/atapp_test_etcd_node1.yaml | Enable service_discovery_etcd for unit-test configs. |
| test/case/atapp_test_etcd_module.yaml | Enable service_discovery_etcd for module tests. |
| test/case/atapp_test_etcd_module_node3.yaml | Add a third-node config for multi-context discovery tests. |
| test/case/atapp_test_etcd_module_node2.yaml | Enable service_discovery_etcd for node2 module tests. |
| test/case/atapp_test_etcd_module_node1.yaml | Enable service_discovery_etcd for node1 module tests. |
| test/case/atapp_message_test.cpp | Use service_discovery_module global discovery and new copy_from signature. |
| test/case/atapp_etcd_module_unit_test.cpp | Port topology-storage tests to service_discovery_module::topology_storage_t. |
| test/case/atapp_etcd_module_test.cpp | Port etcd-module tests to use service_discovery_module APIs where discovery/topology are involved; add multi-context test. |
| test/case/atapp_etcd_cluster_test.cpp | Point cluster tests at service_discovery_module’s etcd ctx access path. |
| test/case/atapp_error_recovery_test.cpp | Update discovery injection to service_discovery_module and new copy_from signature. |
| test/case/atapp_downstream_send_test.cpp | Update discovery injection to service_discovery_module and new copy_from signature. |
| test/case/atapp_discovery_test.cpp | Update discovery node version APIs (copy_from, update_version) with context_ptr. |
| test/case/atapp_discovery_reconnect_test.cpp | Use service_discovery_module for global discovery add/remove in reconnect tests. |
| test/case/atapp_direct_connect_test.cpp | Update discovery module usage and refine test behavior for pre-discovery sends vs post-discovery direct connects. |
| test/case/atapp_configure_loader_test.yaml | Add service_discovery_etcd section for configure-loader tests. |
| test/case/atapp_configure_loader_test.env.txt | Rename env variables from ATAPP_ETCD_* (hosts/path/etc) to ATAPP_SERVICE_DISCOVERY_ETCD_*. |
| test/case/atapp_configure_loader_test.cpp | Validate service_discovery_etcd is parsed/loaded as the etcd discovery config. |
| test/case/atapp_configure_loader_test.conf | Rename config keys from etcd.* (hosts/path/etc) to service_discovery_etcd.*. |
| test/case/atapp_configure_expression_test.yaml | Add service_discovery_etcd section for expression tests. |
| test/case/atapp_configure_expression_test.env.txt | Rename env variables to ATAPP_SERVICE_DISCOVERY_ETCD_* for expression tests. |
| test/case/atapp_configure_expression_test.conf | Rename config keys to service_discovery_etcd.* for expression tests. |
| src/CMakeLists.txt | Build-system wiring for new service_discovery_module sources/headers. |
| src/atframe/modules/worker_pool_module.cpp | Add per-second tick “preserve sleep” accounting/behavior controls. |
| src/atframe/modules/service_discovery_module.cpp | New implementation: discovery/topology keepalives, watchers, snapshot handling, multi-context support. |
| src/atframe/etcdcli/etcd_discovery.cpp | Extend discovery node to track context_ptr and update signatures. |
| src/atframe/connectors/atapp_connector_atbus.cpp | Route topology keepalive refresh trigger via service_discovery_module. |
| src/atframe/atapp.cpp | Register service_discovery_module, route discovery accessors/senders/stats to it, and update discovery node construction signature. |
| include/atframe/modules/service_discovery_module.h | New public header for service_discovery_module. |
| include/atframe/modules/etcd_module.h | Refactor: introduce etcd_cluster_holder + init_cluster() and remove discovery/topology responsibilities from the header. |
| include/atframe/etcdcli/etcd_discovery.h | Update discovery node API signatures and add get_context_ptr(). |
| include/atframe/atapp.h | Add get_service_discovery_module() and store module instance in app. |
| include/atframe/atapp_conf.proto | Introduce atapp_etcd_common and move full discovery etcd config to service_discovery_etcd. |
Comments suppressed due to low confidence (1)
src/atframe/atapp.cpp:4814
command_handler_disable_etcdchecksinternal_module_service_discovery_for initialization, but the command’s logic operates oninternal_module_etcd_(is_etcd_enabled()/disable_etcd()). This makes the guard inconsistent and can incorrectly report “Etcd module is not initialized” based on the wrong module. The initialization check should align with the module actually being used (or the command should be updated to disable the service discovery etcd context instead).
int app::command_handler_disable_etcd(atfw::util::cli::callback_param params) {
if (!internal_module_service_discovery_) {
const char *msg = "Etcd module is not initialized, skip command.";
FWLOGERROR("{}", msg);
add_custom_command_rsp(params, msg);
} else if (internal_module_etcd_->is_etcd_enabled()) {
internal_module_etcd_->disable_etcd();
| auto wait_preserve_us = self->get_owner().configure_tick_preserve_microseconds_in_second.load(std::memory_order_acquire); | ||
| if (wait_preserve_us <= 0) { | ||
| wait_preserve_us = 8000; | ||
| } | ||
|
|
||
| std::unique_lock<std::mutex> lk_cv(self->waker_lock_); | ||
| self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release); | ||
| self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time)); | ||
| self->status_.store(static_cast<uint8_t>(worker_status::kRunning), std::memory_order_release); | ||
|
|
||
| self->cpu_time_sleep_us_.fetch_add(wait_preserve_us, std::memory_order_release); |
| if (changed) { | ||
| local_cache_iter->second.info = topology_info.storage.info; | ||
| info_ptr = local_cache_iter->second.info; | ||
| } |
| // atapp_log nullptr 使用默认Logger | ||
| // atapp_log 非空但 category 为空不使用日志 | ||
| class etcd_cluster_holder; | ||
| using etcd_cluster_load_conf_func_t = std::function<void(atframework::atapp::app &, etcd_cluster_holder &)>; | ||
| using etcd_cluster_init_keepalive_watcher_func_t = | ||
| std::function<int(atframework::atapp::app &, etcd_cluster_holder &, bool init)>; | ||
| using etcd_cluster_reset_keepalive_watcher_func_t = | ||
| std::function<void(atframework::atapp::app &, etcd_cluster_holder &)>; | ||
|
|
||
| using etcd_cluster_holder_ptr_t = std::shared_ptr<etcd_cluster_holder>; | ||
|
|
| #include <atframe/etcdcli/etcd_cluster.h> | ||
| #include <atframe/etcdcli/etcd_discovery.h> | ||
| #include <atframe/etcdcli/etcd_keepalive.h> | ||
| #include <atframe/etcdcli/etcd_watcher.h> | ||
| #include <atframe/modules/etcd_module.h> | ||
|
|
||
| #include <list> | ||
| #include <mutex> | ||
| #include <set> | ||
| #include <string> | ||
| #include <unordered_map> | ||
|
|
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/worker_pool_module.cpp:486
| wait_preserve_us = 8000; | ||
| } | ||
|
|
||
| std::unique_lock<std::mutex> lk_cv(self->waker_lock_); |
There was a problem hiding this comment.
MEDIUM
·
correctness
worker_pool_module 的秒级保护等待逻辑存在问题:wait_for 使用 tick_interval - (busy_end_time - start_time) 作为等待时长,但该计算可能产生负值(若 busy_end_time - start_time >= tick_interval),且 cpu_time_sleep_us_ 记录的是 wait_preserve_us(默认8000μs),而非实际等待时长。当常规等待和秒级保护等待在同一迭代中先后触发时,实际等待时长会是双倍,但统计值不准确。
Location: src/atframe/modules/worker_pool_module.cpp:486
Referenced code (src/atframe/modules/worker_pool_module.cpp:486):
}
std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));Suggested fix:
考虑使用 wait_preserve_us 作为实际等待时长,并统一 sleep 统计逻辑;或在秒级保护等待块中重新计算等待时长。
owent
left a comment
There was a problem hiding this comment.
AICR problem for include/atframe/atapp_conf.proto:583
| atbus_configure bus = 301; | ||
|
|
||
| atapp_etcd etcd = 401; | ||
| atapp_etcd_common etcd = 401; |
There was a problem hiding this comment.
HIGH
·
api_compatibility
atapp_configure.etcd 字段类型从 atapp_etcd 改为 atapp_etcd_common 是破坏性 API 变更。现有配置文件中的 etcd.hosts、etcd.path 等字段将被静默忽略,用户需手动迁移至 service_discovery_etcd 字段。虽然 wire format 上 bool enable 字段可能兼容,但语义层面会导致功能失效。
Location: include/atframe/atapp_conf.proto:583
Referenced code (include/atframe/atapp_conf.proto:583):
atbus_configure bus = 301;
atapp_etcd_common etcd = 401;
atapp_etcd service_discovery_etcd = 402;
Suggested fix:
建议在发布说明中明确标注此为破坏性变更,并提供配置迁移指南;或考虑保留 etcd 字段的完整 atapp_etcd 类型向后兼容。
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/service_discovery_module.cpp:644
| cluster_context_.add_topology_watcher_callback(fn); | ||
| } | ||
|
|
||
| LIBATAPP_MACRO_API const ::atframework::atapp::etcd_cluster &service_discovery_module::get_raw_etcd_ctx() const { |
There was a problem hiding this comment.
LOW
·
correctness
get_raw_etcd_ctx() 直接解引用 cluster_context_.etcd_cluster_holder_ 而无空指针检查。若在模块初始化前调用此方法(虽然正常流程不应发生),会导致崩溃。建议添加防御性检查或在文档中明确调用前置条件。
Location: src/atframe/modules/service_discovery_module.cpp:644
Referenced code (src/atframe/modules/service_discovery_module.cpp:644):
}
LIBATAPP_MACRO_API const ::atframework::atapp::etcd_cluster &service_discovery_module::get_raw_etcd_ctx() const {
return cluster_context_.etcd_cluster_holder_->get_etcd_cluster();
}
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/etcd_module.cpp:693
| if (conf_path_cache_.empty()) { | ||
| etcd_module *self = const_cast<etcd_module *>(this); | ||
| self->conf_path_cache_ = generate_etcd_path(get_configure().path()); | ||
| int etcd_cluster_holder::http_callback_on_etcd_closed(atfw::util::network::http_request &req) { |
There was a problem hiding this comment.
LOW
·
maintainability
http_callback_on_etcd_closed 函数内日志消息仍引用 "etcd_module",但该方法已迁移至 etcd_cluster_holder 类,日志消息应更新以反映当前类名。
Location: src/atframe/modules/etcd_module.cpp:693
Referenced code (src/atframe/modules/etcd_module.cpp:693):
}
int etcd_cluster_holder::http_callback_on_etcd_closed(atfw::util::network::http_request &req) {
etcd_cluster_holder *self = reinterpret_cast<etcd_cluster_holder *>(req.get_priv_data());
if (nullptr == self) {
AI Code Review
AI Code Review Summaryetcd/service_discovery模块重构问题Target: ETCD Cluster 管理与 Discovery 剥离 评审范围审查了将 service discovery 功能从
发现问题高严重度
中严重度
限制与不确定性
Problems (2)
触发场景:当 修复方向:应在重置计数器后重新计算当前状态下的等待时长,或在 line 488 使用独立的 触发场景:使用旧版本配置文件的部署环境在升级后,etcd 的 hosts、path、authorization 等关键配置会丢失,导致服务发现功能异常或默认行为改变。 影响:这是一个破坏性 API 变化,需要用户更新所有配置文件将 etcd 配置迁移到新的 修复方向:考虑保留字段 401 的 Code reference:
|
owent
left a comment
There was a problem hiding this comment.
AICR problem for src/atframe/modules/worker_pool_module.cpp:488
|
|
||
| std::unique_lock<std::mutex> lk_cv(self->waker_lock_); | ||
| self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release); | ||
| self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time)); |
There was a problem hiding this comment.
HIGH
·
correctness
在计数器重置后的第二次等待中使用了过时的时间值。busy_end_time 和 start_time 在 line 457-458 计算后可能已经经过了第一次 wait_for 等待(lines 461-466),但 line 488 的 wait_for(lk_cv, tick_interval - (busy_end_time - start_time)) 继续使用这些过时值计算等待时长。当线程已经等待一段时间后,这些变量不再反映当前状态,导致等待时长计算错误。
触发场景:当 current_tick_second_busy_us_ + current_tick_second_waited_us_ >= 1000000 时进入重置逻辑,此时使用的等待时长基于之前的时间点,可能产生负数或显著偏离预期值的等待时间。
修复方向:应在重置计数器后重新计算当前状态下的等待时长,或在 line 488 使用独立的 wait_preserve_us 值而非重新计算 tick 间隔差值。
Location: src/atframe/modules/worker_pool_module.cpp:488
Referenced code (src/atframe/modules/worker_pool_module.cpp:488):
std::unique_lock<std::mutex> lk_cv(self->waker_lock_);
self->status_.store(static_cast<uint8_t>(worker_status::kSleeping), std::memory_order_release);
self->waker_cv_.wait_for(lk_cv, tick_interval - (busy_end_time - start_time));
self->status_.store(static_cast<uint8_t>(worker_status::kRunning), std::memory_order_release);
owent
left a comment
There was a problem hiding this comment.
AICR problem for include/atframe/atapp_conf.proto:583
| atbus_configure bus = 301; | ||
|
|
||
| atapp_etcd etcd = 401; | ||
| atapp_etcd_common etcd = 401; |
There was a problem hiding this comment.
MEDIUM
·
api_compatibility
Protobuf schema 字段编号 401 被复用为不同消息类型,存在向后兼容性风险。旧配置将 atapp_etcd(包含 hosts、path、authorization 等多个字段)序列化到字段 401。新代码将字段 401 改为 atapp_etcd_common(仅包含 enable 字段),原有的 etcd 配置字段在反序列化时会被静默忽略。
触发场景:使用旧版本配置文件的部署环境在升级后,etcd 的 hosts、path、authorization 等关键配置会丢失,导致服务发现功能异常或默认行为改变。
影响:这是一个破坏性 API 变化,需要用户更新所有配置文件将 etcd 配置迁移到新的 service_discovery_etcd 字段(402)。
修复方向:考虑保留字段 401 的 atapp_etcd 类型并新增字段,或在文档中明确标注此为 Breaking Change 并提供迁移指南。
Location: include/atframe/atapp_conf.proto:583
Referenced code (include/atframe/atapp_conf.proto:583):
atbus_configure bus = 301;
atapp_etcd_common etcd = 401;
atapp_etcd service_discovery_etcd = 402;
No description provided.