Skip to content

fix: validate and dedupe geo ask platforms#23

Closed
study8677 wants to merge 1 commit into
mainfrom
codex/propose-fix-for-geo-ask-vulnerability
Closed

fix: validate and dedupe geo ask platforms#23
study8677 wants to merge 1 commit into
mainfrom
codex/propose-fix-for-geo-ask-vulnerability

Conversation

@study8677
Copy link
Copy Markdown
Owner

Motivation

  • The GEO ad-hoc ask flow allowed an attacker to submit unbounded or repeated platform entries which were expanded into parallel provider tasks, enabling cost/amplification and availability issues.
  • The endpoint also did not validate element types which could raise uncaught errors during provider selection.

Description

  • Add strict request-side validation in POST /api/v1/projects/{project_id}/geo/ask to require platforms be a list, cap length to 20, require non-empty string elements, and perform case-insensitive deduplication before dispatch.
  • Harden opencmo.tools.geo_ask._select_providers to safely handle non-string entries, ignore blank names, and deduplicate case-insensitively so duplicate inputs cannot cause fan-out even if upstream validation is bypassed.
  • Add regression tests exercising deduplication, non-string handling, max-entry validation, and router-side normalization to prevent reintroduction of the amplification bug.

Testing

  • Ran the router/unit test file with pytest tests/test_geo_ask.py which initially failed when PYTHONPATH was not set; this was resolved by running PYTHONPATH=src pytest tests/test_geo_ask.py -q.
  • With PYTHONPATH=src the test run executed the new tests and reported the suite ran with skips for optional FastAPI dependencies and no failing assertions (summary: 1 skipped, 2 warnings).
  • The changes are minimal and focused on input validation and selection logic and do not change provider execution semantics beyond preventing duplicate or malformed dispatches.

Codex Task

Copy link
Copy Markdown
Owner Author

Code Review — fix: validate and dedupe geo ask platforms

总体评价:这是一个针对 GEO ask 端点放大攻击漏洞的精准安全修复,采用了路由层 + 工具层的纵深防御策略,逻辑清晰,测试覆盖主要路径。建议 Approve,但有一个格式问题(阻断级)需要修复。


问题清单

级别 文件 & 行号 描述 建议
🔴 阻断 tests/test_geo_ask.py:217-218 test_select_providers_non_string_is_unknowntest_list_available_platforms_reports_status 之间缺少空行,两函数直接相邻,违反 PEP 8,部分 linter/解析工具会报错 在第 217 行 assert unknown == ["123"] 后补两个空行
🟡 建议 src/opencmo/web/routers/projects.py:406-407 长度检查在去重之前执行,导致 ["Perplexity"] * 21(21 个相同条目,去重后实为 1 个)被 400 拒绝,与 API 语义不符 先去重,再做 > 20 长度校验;或将错误信息改为 "max 20 unique entries" 说明意图
🟡 建议 tests/test_geo_ask.py:203-205 三个连续空行(PEP 8 要求顶层函数间两个空行) 删除多余的一个空行
🟢 优化 src/opencmo/tools/geo_ask.py:60 空白字符串在 _select_providers 中被静默跳过(不加入 unknown),而非字符串类型却会加入 unknown,两者行为不一致 将空白字符串也加入 unknown,使调用方可感知被忽略的条目
🟢 优化 tests/test_geo_ask.py 缺少对 _select_providers([" "]) 纯空白字符串静默跳过行为的测试 补充 test_select_providers_blank_string_is_skipped
🟢 优化 PR 描述中测试结果 1 skipped, 2 warnings 新增 3 个路由测试 + 2 个单元测试共 5 个,skipped 只有 1,需确认路由测试实际执行(而非因缺 httpx 被跳过) 完整运行 pytest tests/test_geo_ask.py -v 确认无意外 skip

亮点

  • 纵深防御:路由层和 _select_providers 工具层各自独立校验,即使绕过路由层,工具层仍能正确处理脏数据。
  • ask_platforms 以 keyword argument 传递 platform_namesprojects.py:429),使 test_geo_ask_endpoint_platforms_deduped_before_call 中的 await_args.kwargs["platform_names"] 断言可靠。
  • 去重时保留首次出现的原始大小写,符合最小惊讶原则。
  • 路由层注释清晰说明了安全意图(# Collapse duplicate provider names…to prevent fan-out amplification)。

修改示例

🔴 修复缺失空行

# tests/test_geo_ask.py
def test_select_providers_non_string_is_unknown(stub_registry):
    ...
    assert unknown == ["123"]


def test_list_available_platforms_reports_status(stub_registry):  # ← 补两个空行

🟡 先去重再校验长度

# projects.py
if platforms is not None:
    if not isinstance(platforms, list):
        return JSONResponse({"error": "platforms must be a list of strings"}, status_code=400)
    if any(not isinstance(name, str) or not name.strip() for name in platforms):
        return JSONResponse({"error": "platforms must be a list of non-empty strings"}, status_code=400)
    # Deduplicate first, then enforce cap on distinct entries.
    deduped: list[str] = []
    seen: set[str] = set()
    for name in platforms:
        key = name.strip().lower()
        if key not in seen:
            seen.add(key)
            deduped.append(name.strip())
    if len(deduped) > 20:
        return JSONResponse({"error": "platforms too long (max 20 unique entries)"}, status_code=400)
    platforms = deduped

Generated by Claude Code

study8677 added a commit that referenced this pull request May 21, 2026
* fix: validate and dedupe geo ask platforms

* fix: bind docker compose port to localhost by default

* fix: prevent verify-email auth bypass for verified users

* fix: fail closed for account-scoped publish credentials

* fix: prevent admin privilege escalation via signup

* test: update admin/publisher tests for security fixes

- test_publishers.py: replace env vars with llm.set_request_keys() since
  publish credentials no longer fall back to os.environ (account-scoped).
- test_trial_platform.py: add _seed_admin() helper that activates the
  bootstrapped !unusable admin row directly, since signup can no longer
  claim that row to prevent admin privilege escalation.

* test: fix ruff I001 import order
@study8677
Copy link
Copy Markdown
Owner Author

Incorporated via #27 → main.

@study8677 study8677 closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant