
Refactor usage polling to paged APIs with legacy fallback #138

Merged
seakee merged 12 commits into seakee:main from zly2006:optimize-usage-conditional-requests
May 27, 2026

@zly2006 zly2006 commented May 24, 2026

Changes

  • Split the monitoring page's high-frequency usage polling from the full /v0/management/usage payload into smaller paged/aggregate endpoints: summary, accounts, api-keys, realtime, and models.
  • /v0/management/usage/summary now returns only global aggregate data plus filter facets; detailed breakdowns are no longer included. The frontend builds dropdown options from summary facets first and keeps legacy row-derived options for old backends.
  • Added a paged model breakdown endpoint and compute estimated cost from the loaded model aggregates. Full model aggregate traversal now always starts from page 1 to avoid skipped or duplicated pages.
  • Fixed multi-page model usage merging so it combines tokens.*, latency_sum_ms, latency_count, and appends details for repeated endpoint/model keys.
  • Kept legacy backend compatibility only for explicit unsupported-endpoint responses: summary or paged APIs fall back to /v0/management/usage on 404/405. Network errors, timeouts, 500s, 401/403s, and other non-compatibility failures are surfaced instead of being hidden by fallback.
  • Added a hard backend cap for page_size and backend whitelist validation for sort_key / sort_direction, preventing oversized pages or user input from directly affecting sorting logic.
  • Fixed totalCost sorting for account/API-key grouped pages so it uses saved model prices to estimate cost instead of silently sorting by token count.
  • Summary facets now keep only time range and search constraints instead of selected account/provider/model/channel/API key/status filters, so dropdowns do not collapse to the current selection.
  • Accounts, API-key, and realtime paged endpoints now return direct items; their usage.apis detail aggregates are empty, so page size controls the returned page items instead of shipping endpoint-grouped details.
  • Replaced usage search LIKE matching with SQLite FTS5 and included API Key aliases. FTS now uses token prefix semantics, so co / cod can match codex, while multi-term input keeps AND semantics.
  • Documented the FTS trigger strategy in code: usage_events is currently append-only; future retention cleanup, delete, or update paths must add matching FTS triggers.
  • Fixed stale relative time ranges where end_ms was frozen after the first render, which could keep total tokens and estimated cost unchanged after the first refresh.
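
The stale `end_ms` fix in the last bullet can be illustrated with a minimal sketch. It assumes a relative range is stored as a window length and resolved against the clock on every refresh; `RelativeRange`, `ResolvedRange`, and `resolveRange` are hypothetical names, not the PR's actual code.

```typescript
// Hypothetical shape of a relative selection such as "last 24 hours".
interface RelativeRange {
  windowMs: number;
}

interface ResolvedRange {
  start_ms: number;
  end_ms: number;
}

// Resolve the window against the clock at call time. Calling this inside the
// refresh handler (rather than once at first render) gives every poll a fresh
// end_ms, so totals and estimated cost keep moving.
function resolveRange(range: RelativeRange, nowMs: number = Date.now()): ResolvedRange {
  return { start_ms: nowMs - range.windowMs, end_ms: nowMs };
}
```

The point of the design is that only the window length is kept in state; the absolute bounds are derived per request.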

Current Boundary

  • accounts / api-keys are a phase-one optimization: the HTTP response is paginated, but the backend still aggregates details for the current filter and then groups/sorts/slices in Go memory. This PR adds a larger dataset pagination test; if retention or data volume grows further, these two aggregations should be pushed down to SQL with GROUP BY + ORDER BY + LIMIT/OFFSET.
  • The earlier hidden/bidirectional Unicode warning came from the generated embedded panel artifact. That file is kept at the main-branch placeholder state and is no longer part of the PR diff. The current changed files were scanned and contain no bidi/hidden Unicode control characters.
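
To make the phase-one boundary concrete, here is a rough sketch of group/sort/slice done in memory, with `totalCost` computed from saved model prices. The real backend is written in Go; this is TypeScript only for illustration, and every type, field, and function name below is an assumption.

```typescript
interface DetailRow { account: string; model: string; inputTokens: number; outputTokens: number }
interface ModelPrice { inputPerMillion: number; outputPerMillion: number }
interface AccountItem { account: string; totalTokens: number; totalCost: number }
interface AccountPage { items: AccountItem[]; total: number }

function pageAccountsByCost(
  rows: DetailRow[],
  prices: Map<string, ModelPrice>,
  page: number,
  pageSize: number,
): AccountPage {
  // Group: every filtered detail row is visited, which is the memory cost
  // the phase-one boundary describes.
  const groups = new Map<string, AccountItem>();
  for (const row of rows) {
    const price = prices.get(row.model);
    // Assumption: a model without a saved price contributes zero cost.
    const cost = price
      ? (row.inputTokens * price.inputPerMillion + row.outputTokens * price.outputPerMillion) / 1e6
      : 0;
    const item = groups.get(row.account) ?? { account: row.account, totalTokens: 0, totalCost: 0 };
    item.totalTokens += row.inputTokens + row.outputTokens;
    item.totalCost += cost;
    groups.set(row.account, item);
  }

  // Sort by estimated cost (not token count), with a stable tie-break on name.
  const sorted: AccountItem[] = [];
  groups.forEach((item) => sorted.push(item));
  sorted.sort((a, b) => b.totalCost - a.totalCost || a.account.localeCompare(b.account));

  // Slice: only now is the result cut down to one page.
  const start = (Math.max(1, page) - 1) * pageSize;
  return { items: sorted.slice(start, start + pageSize), total: sorted.length };
}
```

Pushing this down to SQL would replace the loop and the sort with `GROUP BY` and `ORDER BY`, and the slice with `LIMIT/OFFSET`, so only one page of groups ever leaves the database.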

Verification

  • go test ./...
  • go test ./internal/store ./internal/httpapi
  • npm test -- src/features/monitoring/hooks/useUsageData.test.ts src/services/api/usageService.test.ts src/features/monitoring/hooks/useMonitoringData.test.ts src/pages/MonitoringCenterPage.test.tsx
  • npm run type-check
  • npm run lint
  • npm run build
  • Ran an independent dpsk / DeepSeek review and fixed the confirmed real totalCost backend sorting issue it found.
  • Deployed to off: cpa-manager-office cpa-manager:pr138-no-page-aggregates-20260527-0314, and /health returned {"ok":true,"service":"cpa-manager"}.
  • After deployment, verified /v0/management/usage/summary returns apis=0 with non-empty facets: providers=2, accounts=2, models=13, channels=2, api_keys=6.
  • After deployment, verified FTS prefix searches search=co, search=cod, and search=gem return matching summaries while keeping summary detail apis empty.
  • After deployment, verified that /v0/management/usage/accounts?page=1&page_size=2&sort_key=totalCost&sort_direction=desc and the accounts, api-keys, realtime, and models paged endpoints all return 200 with paged data.
  • After deployment, verified accounts/api-keys/realtime page responses return direct items with usage.apis=0; models still returns model aggregates through usage.apis.
  • After deployment, verified a summary request with account/provider/model/channel/API key/status filters still returns broad facet candidates: providers=2, accounts=2, models=13, channels=2, api_keys=6.

@zly2006 zly2006 force-pushed the optimize-usage-conditional-requests branch from 3ff8db6 to 07aceca Compare May 24, 2026 22:18
@zly2006 zly2006 changed the title Reduce usage polling payloads with conditional requests Reduce usage polling payloads with conditional & aggregate requests May 24, 2026
@zly2006 zly2006 force-pushed the optimize-usage-conditional-requests branch 4 times, most recently from 0ae6c22 to d3ddf09 Compare May 24, 2026 22:49
@zly2006 zly2006 marked this pull request as draft May 24, 2026 22:52
@zly2006 zly2006 force-pushed the optimize-usage-conditional-requests branch 4 times, most recently from 1d3000d to 1e2f1ec Compare May 24, 2026 23:41
@zly2006 zly2006 changed the title Reduce usage polling payloads with conditional & aggregate requests Refactor usage polling to summary API with legacy fallback May 24, 2026
@zly2006 zly2006 force-pushed the optimize-usage-conditional-requests branch from 1e2f1ec to 522d7f1 Compare May 25, 2026 00:04
@zly2006 zly2006 changed the title Refactor usage polling to summary API with legacy fallback Refactor usage polling to paged APIs with legacy fallback May 25, 2026
@zly2006 zly2006 force-pushed the optimize-usage-conditional-requests branch from 522d7f1 to 9417d6d Compare May 25, 2026 00:10
@zly2006 zly2006 marked this pull request as ready for review May 25, 2026 00:13
Author

zly2006 commented May 25, 2026

@seakee could u please review?

Owner

seakee commented May 26, 2026

> @seakee could u please review?

Thanks for the PR.

The current PR does help alleviate the problems of slow loading, timeouts, and high frontend aggregation pressure on the request monitoring page under large data volumes. The subsequent refresh logic fix is also valuable, as it prevents the end_ms from being fixed under a relative time range, which would otherwise cause the token and cost statistics not to update after a refresh.

However, this PR affects the core data path of CPA-Manager request monitoring, so I’m not recommending merging it directly just yet. A few things still need to be addressed:

  1. Although the paginated endpoint has already been split out, the /summary endpoint may still return a large number of details grouped by dimensions such as account, provider, model, api key hash, source, etc. When there are many combinations of accounts, models, and API keys, the summary itself could still become a large payload. I’d suggest making the summary return only top-level statistics and offloading all detailed/breakdown data to the paginated endpoint as much as possible, or at least adding a limit / lazy-loading parameter.

  2. Please confirm that there is a hard upper limit for page_size on the backend to prevent excessively large values from causing internal DoS or putting too much pressure on SQLite. In addition, sort_key / sort_direction must be mapped through a backend whitelist and must not be directly concatenated from user-supplied values.

  3. Currently, the search performs a LIKE "%term%" query across multiple fields, which could be quite slow under large data volumes. I’d suggest documenting the performance boundaries for a large time range combined with search, restricting the time range when necessary, or considering indexes / FTS optimization later.

  4. GitHub is flagging hidden / bidirectional Unicode characters that need to be handled. Hidden Unicode characters should not remain in core code. Please clean them up or explain their specific source and impact.

  5. At this point, the change is no longer just a frontend refactoring; it introduces a set of stable management APIs. Tests need to be added for summary, pagination, filtering, sorting, page size limits, legacy fallback, etc., to avoid discrepancies in request monitoring statistics down the line.
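
Point 2 above (hard `page_size` cap plus whitelisted sorting) might look roughly like the following. This is a sketch under assumptions: the cap of 200, the default of 50, and the column names in `SORT_COLUMNS` are invented for illustration, and the actual backend is Go rather than TypeScript.

```typescript
const MAX_PAGE_SIZE = 200;     // assumed hard cap
const DEFAULT_PAGE_SIZE = 50;  // assumed default

// Whitelist: public sort keys map to fixed column names. A Map is used so that
// keys like "constructor" cannot resolve through the object prototype.
const SORT_COLUMNS = new Map<string, string>([
  ["totalCost", "total_cost"],
  ["totalTokens", "total_tokens"],
  ["totalRequests", "total_requests"],
]);

function clampPageSize(raw: string | undefined): number {
  const parsed = parseInt(raw ?? "", 10);
  if (isNaN(parsed) || parsed < 1) return DEFAULT_PAGE_SIZE;
  return Math.min(parsed, MAX_PAGE_SIZE);
}

interface SortSpec { column: string; direction: "ASC" | "DESC" }

function resolveSort(sortKey: string | undefined, sortDirection: string | undefined): SortSpec {
  // Unknown keys fall back to a fixed default; user input is never
  // concatenated into the query text.
  const column = SORT_COLUMNS.get(sortKey ?? "") ?? "total_requests";
  const direction = (sortDirection ?? "").toLowerCase() === "asc" ? "ASC" : "DESC";
  return { column, direction };
}
```

Because both outputs come from closed sets, the values are safe to place into an `ORDER BY` clause even though they cannot be bound as SQL parameters.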

Author

zly2006 commented May 26, 2026

thanks for your review, I will fix them

@zly2006 zly2006 force-pushed the optimize-usage-conditional-requests branch from d741d30 to a34cfbb Compare May 26, 2026 11:27
Author

zly2006 commented May 26, 2026

I have implemented the suggestions above. I tested manually and it works very well, but I have not yet read through the code myself; I will review it later.

One thing to check with you: after switching to FTS5, users can no longer search with partial queries. For example, if I want to find codex usage, "co" or "code" returns nothing; I have to type the full token. Is this good enough for the user experience? @seakee

@zly2006 zly2006 force-pushed the optimize-usage-conditional-requests branch from a34cfbb to 2f5b26c Compare May 26, 2026 11:31
Owner

seakee commented May 26, 2026

First of all, thank you again for this PR and for the follow-up updates.

I agree with the overall direction, and I can see that you have already addressed a number of earlier review points. Splitting the monitoring page from high-frequency polling of the full /v0/management/usage payload into smaller aggregate/paged endpoints such as summary / accounts / api-keys / realtime / models is the right direction for improving CPA-Manager’s request monitoring.

One extra note: if this PR is eventually merged into CPA-Manager, these usage aggregation/pagination APIs will very likely be referenced or adapted in CPA-Manager-Plus as well. CPA-Manager-Plus is also improving request monitoring, data summaries, caller API key statistics, and account-level statistics, so having this API design stabilized here would be directly useful there too.

You are also very welcome to continue contributing PRs to CPA-Manager-Plus. Once the usage query protocol, pagination shape, FTS search behavior, and model cost aggregation are polished here, reusing them in Plus should be quite natural.

That said, because this PR touches the core data path of request monitoring, I still think a few issues should be addressed before merge:

  1. loadModelPages should always start from page 1

Currently the first request uses usagePageQueries.models as-is. If the caller ever passes models.page = 3, it would fetch page 3 first, then page 2..N, skipping page 1 and potentially double-counting page 3.

Even if the current caller always passes page 1, the helper contract is still unsafe. Please either force the first request to use page: 1, or remove page from the model aggregate query type and make it explicit that this path performs a full model aggregate traversal.
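
A minimal sketch of the safer contract, assuming the page fetcher is injected and each page reports a total page count. `loadAllModelPages`, `ModelPage`, and the `totalPages` field are hypothetical names, not the PR's `loadModelPages` itself.

```typescript
interface ModelPage<T> { items: T[]; totalPages: number }
interface ModelPageQuery { page?: number; pageSize: number }
type FetchModelPage<T> = (query: { page: number; pageSize: number }) => Promise<ModelPage<T>>;

async function loadAllModelPages<T>(
  query: ModelPageQuery,
  fetchPage: FetchModelPage<T>,
): Promise<T[]> {
  // Deliberately ignore query.page: a full traversal always begins at page 1.
  const first = await fetchPage({ page: 1, pageSize: query.pageSize });

  // Request pages 2..N exactly once each.
  const pending: Promise<ModelPage<T>>[] = [];
  for (let page = 2; page <= first.totalPages; page++) {
    pending.push(fetchPage({ page, pageSize: query.pageSize }));
  }
  const rest = await Promise.all(pending);

  const items: T[] = [];
  for (const item of first.items) items.push(item);
  for (const pageResult of rest) {
    for (const item of pageResult.items) items.push(item);
  }
  return items;
}
```

With this shape, a caller that passes `page: 3` still gets pages 1, 2, and 3 once each, with no skipped or double-counted page.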

  2. mergeUsagePayloads is incomplete

It currently merges only total_requests / success_count / failure_count / total_tokens, but not tokens.*, latency_sum_ms, or latency_count.

Also, when merging apis.models, it appears to assign by model key. If different pages contain the same endpoint + model but different resolved model / failed dimensions, the later page may overwrite previous details. Please append details instead of overwriting, and recompute tokens / latency / totals consistently.
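
One way to read this request, as a sketch only: sum every counter, merge `tokens.*` key by key, and concatenate details for a repeated endpoint + model pair. The flattened `UsageEntry` shape below is an assumption made to keep the example short; the real payload nests models under `apis`.

```typescript
interface ModelUsage {
  total_requests: number;
  success_count: number;
  failure_count: number;
  total_tokens: number;
  tokens: Record<string, number>;
  latency_sum_ms: number;
  latency_count: number;
  details: unknown[];
}

interface UsageEntry { endpoint: string; model: string; usage: ModelUsage }

function mergeModelUsage(a: ModelUsage, b: ModelUsage): ModelUsage {
  // Merge tokens.* per key so kinds present on only one page survive.
  const tokens: Record<string, number> = {};
  for (const kind of Object.keys(a.tokens)) tokens[kind] = a.tokens[kind] ?? 0;
  for (const kind of Object.keys(b.tokens)) tokens[kind] = (tokens[kind] ?? 0) + (b.tokens[kind] ?? 0);
  return {
    total_requests: a.total_requests + b.total_requests,
    success_count: a.success_count + b.success_count,
    failure_count: a.failure_count + b.failure_count,
    total_tokens: a.total_tokens + b.total_tokens,
    tokens,
    latency_sum_ms: a.latency_sum_ms + b.latency_sum_ms,
    latency_count: a.latency_count + b.latency_count,
    // Append instead of overwrite, so earlier pages' details are kept.
    details: a.details.concat(b.details),
  };
}

function mergeUsagePages(pages: UsageEntry[][]): UsageEntry[] {
  const merged = new Map<string, UsageEntry>();
  for (const page of pages) {
    for (const entry of page) {
      const key = entry.endpoint + "\u0000" + entry.model;
      const existing = merged.get(key);
      merged.set(
        key,
        existing ? { ...existing, usage: mergeModelUsage(existing.usage, entry.usage) } : entry,
      );
    }
  }
  const out: UsageEntry[] = [];
  merged.forEach((entry) => out.push(entry));
  return out;
}
```

Keeping `latency_sum_ms` and `latency_count` as separate sums means the average latency can be recomputed after the merge rather than averaged across pages.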

  3. Summary fallback should only happen on 404/405

Right now, if the summary request fails due to a network error, timeout, CORS issue, or connection failure, the status can be undefined and the code may still fall back to the legacy /v0/management/usage endpoint.

This can hide the real error and make refreshes slower. Please only fallback on 404/405, and rethrow all other errors.
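
The rule can be sketched as a small predicate plus a wrapper. It assumes HTTP errors carry a numeric `status` property and that network failures carry none; `shouldFallbackToLegacy` and `fetchSummaryWithFallback` are hypothetical names.

```typescript
interface HttpLikeError { status?: number }

// Only an explicit "endpoint not supported" answer counts as a reason to fall
// back. A missing status (network error, timeout, CORS) does not.
function shouldFallbackToLegacy(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const status = (error as HttpLikeError).status;
  return status === 404 || status === 405;
}

async function fetchSummaryWithFallback<T>(
  fetchSummary: () => Promise<T>,
  fetchLegacy: () => Promise<T>,
): Promise<T> {
  try {
    return await fetchSummary();
  } catch (error) {
    if (shouldFallbackToLegacy(error)) return fetchLegacy();
    throw error; // 401/403, 500, timeouts, and network errors surface as-is
  }
}
```

The key detail is that `undefined` is compared strictly against 404 and 405, so an error without a status can never trigger the legacy request.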

  4. FTS5 search should support prefix search

After replacing LIKE "%term%" with regular FTS5 token matching, users may need to type the full token. For example, searching for co or code may not match codex, which is a UX regression for the monitoring page.

I do not suggest reverting to LIKE, but I think FTS5 should support prefix search, for example:

  • co -> co*
  • code -> code*
  • multi-term input can be converted into multiple prefix tokens using the current search semantics

The FTS table could also use a prefix index such as prefix='2 3 4'. I do not think arbitrary substring search like dex -> codex is required; prefix search should be enough.
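
A possible way to build such a query string, as a sketch: each whitespace-separated term becomes a quoted FTS5 string followed by `*`, and FTS5 treats whitespace between phrases as an implicit AND. The function name is hypothetical, and how the backend actually escapes input is not shown in this thread.

```typescript
// Turn raw user input into an FTS5 MATCH expression with prefix semantics.
// Each term is wrapped in double quotes (embedded quotes are doubled) so that
// FTS5 operators typed by the user are treated as plain text.
function toFtsPrefixQuery(input: string): string {
  return input
    .trim()
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => '"' + term.replace(/"/g, '""') + '"*')
    .join(" ");
}
```

For example, `co` becomes `"co"*` and `code gem` becomes `"code"* "gem"*`. An empty result means there is nothing to match, so the caller would skip the `MATCH` clause entirely in that case. The resulting string should still be passed as a bound parameter rather than concatenated into SQL.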

  5. Please clean up the hidden / bidirectional Unicode warning

GitHub still shows a hidden/bidirectional Unicode warning. Unless there is a specific reason to keep those characters, please remove them before merge to avoid confusion in future reviews and maintenance.

  6. accounts / api-keys pagination is still not SQL-level pagination

The frontend payload is now paginated, which is a good improvement. However, the backend still appears to call usageSummary(..., includeDetails=true), then flatten, group, sort, and slice in Go memory for accounts and api-keys.

This means large time ranges can still create backend memory and aggregation pressure. If this PR is intended as a phase-one optimization, that is acceptable, but please document the current boundary in the PR description and ideally add a larger dataset test.

Longer term, accounts and api-keys should also be pushed down into SQL with GROUP BY + ORDER BY + LIMIT/OFFSET.

  7. FTS trigger strategy needs clarification

usage_events_fts appears to mainly rely on an insert trigger. If usage_events is append-only, that is acceptable for now. But if retention, pruning, delete, or update logic is added later, update/delete triggers will be needed; otherwise the FTS table may contain stale rows.

Please either add those triggers now, or at least add a clear comment that usage_events is currently append-only and that FTS triggers must be extended before introducing cleanup/mutation logic.

Overall, I am positive on this PR. The direction is correct, it has real value, and it aligns well with the future monitoring improvements in both CPA-Manager and CPA-Manager-Plus.

However, because this may become a foundation for later request monitoring work and possible Plus-side reuse, I would prefer to polish the issues above before merging.

Author

zly2006 commented May 26, 2026

tysm! I will search for some docs about prefix searching and try to improve it

Author

zly2006 commented May 26, 2026

oh sorry I didn't notice that this repo was maintenance only. Do you want me to create a new PR in CPA-Manager-Plus?

Owner

seakee commented May 26, 2026

> oh sorry I didn't notice that this repo was maintenance only. Do you want me to create a new PR in CPA-Manager-Plus?

No worries, and thanks again for working on this.

My plan is not to reject this PR just because CPA-Manager is mostly maintenance-only. In my view, this kind of monitoring performance improvement still falls within maintenance work, and it was already part of my original plan. This PR actually helps fill an important gap in that plan, so I still think it has value for CPA-Manager.

That said, CPA-Manager-Plus is where I plan to continue more active development. It already has some optimizations around request monitoring and usage aggregation, but the implementation is still not thorough enough, and further performance work is also on the roadmap there.

So my suggestion is:

  • Please continue polishing this PR for CPA-Manager if you are willing. I think it is still useful and relevant here.
  • You are also very welcome to create a new PR in CPA-Manager-Plus, especially if you want to continue improving the monitoring performance work there.
  • The design and fixes from this PR can be a good reference for the Plus-side implementation, but Plus may need some adjustments because its frontend/backend structure is different.

Owner

seakee commented May 27, 2026

Thanks for the follow-up updates. I reviewed the latest head again.

This version looks much better, and most of the previous blocking issues have been addressed.

Confirmed fixed:

  • loadModelPages now always starts the full model aggregate traversal from page 1, and there is a regression test for the models.page > 1 case.
  • mergeUsagePayloads now merges tokens.*, latency aggregates, and appends repeated endpoint/model details instead of overwriting them.
  • summary legacy fallback is now limited to 404/405. Network errors, timeouts, 500s, and auth errors are no longer hidden by fallback.
  • FTS5 prefix search is now supported with prefix indexes, so searches like co / cod can match codex.
  • accounts / api-keys / realtime paged endpoints now return direct items, and usage.apis no longer carries detail aggregates for those page types.
  • totalCost sorting now uses saved model prices instead of falling back to token count.
  • The PR description now clearly documents the current boundary for accounts/api-keys backend aggregation.

At this point I no longer think this needs to be blocked on the previous review items.

There are still a few follow-up items worth keeping in mind:

  1. accounts / api-keys pagination is still phase-one optimization. The HTTP response is paginated, but the backend still aggregates details and then groups/sorts/slices in Go memory. This is now documented, so I am fine with tracking SQL-level GROUP BY + ORDER BY + LIMIT/OFFSET as a follow-up.
  2. The route dispatch still uses strings.HasSuffix; trimming trailing slashes or using exact path matching would make the new endpoints more predictable.
  3. loadModelPages still fans out with Promise.all across all model pages. This is probably fine for now because the model cardinality should be small, but a future concurrency cap would be safer.
  4. For environments that already ran an older build of this PR, the existing FTS virtual tables may need to be rebuilt to pick up the new prefix index. This should not affect a clean merge into main, since main does not have these FTS tables yet.
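
For follow-up item 3, a concurrency cap could be sketched as below. This is a generic helper under assumptions, not code from the PR: `mapWithConcurrency` is a hypothetical name, and it simply runs a fixed number of worker lanes that pull the next index until the input is exhausted.

```typescript
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index, so at most `laneCount`
  // workers are in flight at any time and results keep input order.
  const runLane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  };

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) lanes.push(runLane());
  await Promise.all(lanes);
  return results;
}
```

In a model-page traversal, `items` would be the page numbers 2..N and `worker` the page fetch, replacing an unbounded `Promise.all` over every page. As with `Promise.all`, the first rejected worker rejects the whole call.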

Overall, I am positive on this PR now. It fits the maintenance/performance optimization scope of CPA-Manager, and it can also serve as a useful reference for CPA-Manager-Plus monitoring optimization later.

I think this is now mergeable.

@seakee seakee merged commit 8ca939c into seakee:main May 27, 2026
3 checks passed
@zly2006 zly2006 deleted the optimize-usage-conditional-requests branch May 27, 2026 08:23