fix: extend AI co-author pattern check with more model names #788

Merged: BLumia merged 1 commit into linuxdeepin:master from hudeng-go:fix-claude on Jun 5, 2026

hudeng-go (Contributor) commented Jun 5, 2026

Add claude, anthropic, openai, chatgpt, gemini, deepseek, mistral, llama, codestral to the blocked co-author pattern list to catch more AI-generated co-author lines.

Summary by Sourcery

Build:

  • Update commitlint GitHub Actions workflow to block co-author lines referencing additional AI tools and models.

sourcery-ai Bot commented Jun 5, 2026

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Extends the existing GitHub Actions commitlint workflow’s AI co-author detection by expanding the blocked regex pattern with additional popular AI/model vendor names for both PR body and commit message checks.

Sequence diagram for updated AI co-author pattern check in commitlint workflow

sequenceDiagram
  actor Developer
  participant GitHub
  participant commitlint_workflow
  participant pr_body_check_step
  participant commit_msg_check_step
  participant grep

  Developer->>GitHub: open pull_request
  GitHub->>commitlint_workflow: trigger workflow

  commitlint_workflow->>pr_body_check_step: set BLOCKED_PATTERN
  pr_body_check_step->>grep: grep -Eiq BLOCKED_PATTERN on PR_BODY
  alt [PR body matches extended AI co-author pattern]
    pr_body_check_step-->>commitlint_workflow: nonzero exit (fail job)
  else [no match]
    pr_body_check_step-->>commitlint_workflow: success
  end

  commitlint_workflow->>commit_msg_check_step: set BLOCKED_PATTERN
  commit_msg_check_step->>commit_msg_check_step: git log commit messages
  commit_msg_check_step->>grep: grep -Eiq BLOCKED_PATTERN on COMMIT_MSGS
  alt [commit messages match extended AI co-author pattern]
    commit_msg_check_step-->>commitlint_workflow: nonzero exit (fail job)
  else [no match]
    commit_msg_check_step-->>commitlint_workflow: success
  end
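The flow in the diagram can be sketched as a small shell helper. This is only an illustration under stated assumptions: the pattern below is a deliberately shortened stand-in for the workflow's `BLOCKED_PATTERN`, and `check_text` and the sample messages are hypothetical names that do not come from the repository.

```shell
# Sketch only: a shortened stand-in for the workflow's BLOCKED_PATTERN.
BLOCKED_PATTERN='Co-authored-by:.*(copilot|claude|openai)'

# Hypothetical helper: $1 is a label, $2 is the text to scan.
# Returns 1 when a blocked co-author line is found, mirroring the
# "nonzero exit (fail job)" branch in the diagram.
check_text() {
  if printf '%s\n' "$2" | grep -Eiq "${BLOCKED_PATTERN}"; then
    echo "FAIL: blocked co-author pattern found in $1."
    return 1
  fi
  echo "PASS: no blocked co-author pattern in $1."
}

# Usage: the same helper serves both the PR body and the commit messages.
check_text "PR description" "Fix crash on startup" || echo "(job would fail here)"
check_text "commit messages" "Co-authored-by: Claude <noreply@example.com>" || echo "(job would fail here)"
```

In the real workflow each check is a separate step, so a failure in either one fails the job independently.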

File-Level Changes

Change: Broaden the AI co-author regex to include additional AI/model vendor keywords for the PR body and commit message checks.
Details:
  • Update the BLOCKED_PATTERN in the PR body validation step to append new AI/model-related name patterns (claude, anthropic, openai, chatgpt, gemini, deepseek, mistral, llama, codestral).
  • Update the BLOCKED_PATTERN in the commit message validation step with the same extended set of AI/model-related name patterns to keep behavior consistent across checks.
Files: .github/workflows/commitlint.yml

deepin-ci-robot (Contributor)

deepin pr auto review

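Hello! I am CodeGeeX. I have carefully reviewed the Git diff you provided.

The main purpose of this change is to widen the set of AI-related Co-authored-by trailers that the GitHub Actions CI pipeline blocks. The intent is good (preventing unreviewed AI code from being merged), but the current implementation has clear problems in regex logic, code quality, and code performance, and may even introduce a security risk.

Detailed review comments and suggested improvements follow: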

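1. Syntax and logic issues

  • Over-matching (logic flaw): the regular expression uses .* as a connector, which leads to serious false positives.
    • Co-authored-by:.*ai matches any word containing "ai", for example Co-authored-by: Dave or Co-authored-by: email@mail.com.
    • Co-authored-by:.*openai matches Co-authored-by: openai_contributor, but because of regex backtracking it may also match other, unrelated long strings.
    • Co-authored-by:.*llama matches Co-authored-by: llamas (if that is a real person's name or nickname).
  • Case sensitivity: the i flag in grep -Eiq makes matching case-insensitive, so both Claude and claude are blocked, which is the intended behavior. However, the regex itself hard-codes lowercase and relies on the grep flag; this works logically, but is not rigorous.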
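The over-matching described above can be probed directly. The sketch below assumes the original pattern contains the alternative `Co-authored-by:.*ai` exactly as quoted in this review; `probe` and the sample trailers are hypothetical. One detail worth noticing when running it: `mail` and `Aiden` contain the substring `ai`, whereas `Dave` does not, so that particular example is not actually matched.

```shell
# Sketch only: one alternative from the pattern as quoted in the review.
SUBSTRING_PATTERN='Co-authored-by:.*ai'

# Hypothetical helper: prints "match" or "no match" for one trailer line.
probe() {
  if printf '%s\n' "$1" | grep -Eiq "${SUBSTRING_PATTERN}"; then
    echo "match"
  else
    echo "no match"
  fi
}

probe 'Co-authored-by: Jane <email@mail.com>'  # match: "mail" contains "ai"
probe 'Co-authored-by: Aiden Smith'            # match: "Ai" matches case-insensitively
probe 'Co-authored-by: Dave'                   # no match: no "ai" substring
```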

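2. Code quality

  • Violates the DRY principle: BLOCKED_PATTERN is defined twice, identically, in two different steps. If a new AI name needs to be added later, both places must be changed, which is easy to miss.
  • Very poor readability: a long single-line regular expression joined with | is very hard to read and maintain in a shell script.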
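One way to address both points is to keep the names in a single list and derive the regex from it. The following is a minimal sketch, not the workflow's actual code: the variable `BLOCKED_NAMES` is a hypothetical name, and the list of names is the one that appears in the refactored pattern in this review.

```shell
# Sketch only: a single, easy-to-edit list of blocked names.
BLOCKED_NAMES='ai agent copilot llm gpt claude anthropic openai chatgpt gemini deepseek mistral llama codestral'

# Turn the space-separated list into one grouped alternation.
BLOCKED_PATTERN="Co-authored-by:.*($(printf '%s' "${BLOCKED_NAMES}" | tr ' ' '|'))"

echo "${BLOCKED_PATTERN}"
```

Adding a name then means editing one list, and both checks can reuse the derived pattern.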

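3. Code performance

  • Regex backtracking overhead: when grep -E processes such a long regular expression containing many .* sequences, long input text (such as a commit message) can cause heavy backtracking overhead and slow down CI.
  • Repeated regex compilation: grep has to re-match this complex regular expression against every input line, which is inefficient.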

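4. Code security

  • Command injection risk: in the first step, PR_BODY: ${{ github.event.pull_request.body }} is injected directly into the run script. An attacker can craft a malicious PR body (for example one containing backticks, $(...), or other shell metacharacters), and when echo "${PR_BODY}" runs, this may lead to command injection.
  • Variable not protected by quotes: in the second step, after COMMIT_MSGS=$(...) captures the content, echo "${COMMIT_MSGS}" does use double quotes, but if a commit message contains special characters it may still break the grep logic.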
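To see how a shell treats untrusted text that arrives in a variable, the sketch below assumes the body reaches the script as an environment variable, as in the `env:` mapping quoted above. The body text is a made-up example. As far as I know, GitHub's guidance on script injection recommends exactly this intermediate-variable approach, and the risk arises when a `${{ ... }}` expression is expanded directly inside the `run` text; treat that as background, not as a statement about this workflow.

```shell
# Sketch only: a hostile-looking body held in a variable.
PR_BODY='$(echo INJECTED) `echo INJECTED`
Co-authored-by: Claude <noreply@example.com>'

# Expanding the variable inside double quotes yields its characters verbatim;
# the shell does not re-parse the expanded value as code.
printed="$(printf '%s\n' "${PR_BODY}" | head -n 1)"
echo "${printed}"

# The co-author check still works on the literal text (GNU grep assumed for \b).
if printf '%s\n' "${PR_BODY}" | grep -Eiq 'Co-authored-by:.*\bclaude\b'; then
  echo "blocked trailer detected"
fi
```

The first line is printed with `$(...)` and the backticks intact, which shows that neither command substitution was executed.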

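Suggested improvements and refactored code

To address the issues above, the following refactoring is suggested:

  1. Replace .* with \b or a more precise boundary: avoid matching ordinary words by mistake.
  2. Extract a shared variable: move BLOCKED_PATTERN into a step-level environment variable.
  3. Optimize the regex structure: use grouping to avoid repeating Co-authored-by:.
  4. Fix the security issue: pass the PR body through a file to avoid command injection.

Refactored code: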

      - name: check PR description for AI co-author pattern
        env:
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          # Regex: use a group so Co-authored-by: is not repeated
          # Use \b word boundaries so ordinary words (e.g. Dave, email) are not matched
          BLOCKED_PATTERN='Co-authored-by:.*\b(ai|agent|copilot|llm|gpt|claude|anthropic|openai|chatgpt|gemini|deepseek|mistral|llama|codestral)\b'
          
          # Security: write the PR body to a file and grep the file, so shell metacharacters cannot cause command injection
          echo "${PR_BODY}" > pr_body.txt
          
          if grep -Eiq "${BLOCKED_PATTERN}" pr_body.txt; then
            echo "FAIL: PR description contains blocked co-author AI pattern." >&2
            echo "Blocked pattern: ${BLOCKED_PATTERN}" >&2
            exit 1
          fi
          echo "PASS: No blocked AI co-author patterns found in PR description."
          rm -f pr_body.txt

      - name: check commit messages for AI co-author pattern
        env:
          # Quality: extract the shared regex into a top-level env (or share it across steps via GITHUB_ENV)
          BLOCKED_PATTERN: 'Co-authored-by:.*\b(ai|agent|copilot|llm|gpt|claude|anthropic|openai|chatgpt|gemini|deepseek|mistral|llama|codestral)\b'
        shell: bash
        run: |
          # Performance: redirect git log output straight to a file, avoiding the parsing risk of variable assignment and echo
          git log --format=%B ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }} > commit_msgs.txt 2>/dev/null || true
          
          if [[ -s commit_msgs.txt ]] && grep -Eiq "${BLOCKED_PATTERN}" commit_msgs.txt; then
            echo "FAIL: Commit messages contain blocked co-author AI pattern." >&2
            echo "Blocked pattern: ${BLOCKED_PATTERN}" >&2
            exit 1
          fi
          echo "PASS: No blocked AI co-author patterns found in commit messages."
          rm -f commit_msgs.txt

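Details of the improvements:

  1. Regex optimization: Co-authored-by:.*\b(ai|agent|...)\b
    • Using an (a|b|c) group means the regex engine only needs to compile Co-authored-by:.* once, which greatly improves matching efficiency.
    • Adding \b (word boundary) means .*\bai\b only matches the standalone word ai and not the ai in Dave, which eliminates the false positives.
  2. Security: instead of echo "${VAR}" | grep, the content is written to the temporary files pr_body.txt and commit_msgs.txt and then checked with grep ... file. This fully separates external input from the shell interpreter and rules out command injection.
  3. Performance: the git log output is redirected straight to a file with >, avoiding the memory overhead of storing a large amount of text in a shell variable and a potential argument-length limit (ARG_MAX).
  4. Maintainability: BLOCKED_PATTERN is declared at step level via env, so later changes only need to be made in one place. If both steps are in the same job, it can even be moved to the job-level env to reduce duplication further.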
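The word-boundary behavior can be checked with a short sketch. It assumes GNU grep, where `\b` is accepted in `-E` patterns; `scan_file_for_blocked` and the sample trailers are hypothetical. The helper follows the write-to-file approach from the refactored steps, but removes the temporary file on both the matching and non-matching paths.

```shell
# Sketch only: the grouped, word-bounded pattern from the refactored code.
BLOCKED_PATTERN='Co-authored-by:.*\b(ai|agent|copilot|llm|gpt|claude|anthropic|openai|chatgpt|gemini|deepseek|mistral|llama|codestral)\b'

# Hypothetical helper: write the text to a temp file, grep the file,
# and always clean up before reporting the verdict.
scan_file_for_blocked() {
  tmp="$(mktemp)"
  printf '%s\n' "$1" > "${tmp}"
  if grep -Eiq "${BLOCKED_PATTERN}" "${tmp}"; then
    result="blocked"
  else
    result="ok"
  fi
  rm -f "${tmp}"
  echo "${result}"
}

scan_file_for_blocked 'Co-authored-by: Jane <email@mail.com>'           # ok
scan_file_for_blocked 'Co-authored-by: llamas <llamas@example.com>'     # ok
scan_file_for_blocked 'Co-authored-by: Claude <noreply@anthropic.com>'  # blocked
scan_file_for_blocked 'Co-authored-by: my-ai-bot <bot@example.com>'     # blocked
```

The last case shows a remaining trade-off: a hyphen counts as a word boundary, so any name or address containing a standalone `ai` token is still blocked.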

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've left some high level feedback:

  • The BLOCKED_PATTERN regex is duplicated in two steps; consider factoring it into a single definition (e.g., an env var at the job level or a YAML anchor) to avoid divergence when updating the list in future.
  • The regex has become quite long and hard to read; consider breaking the alternatives onto multiple lines with a here-doc or using grouping (e.g., Co-authored-by:.*(ai|agent|copilot|...)) to make future edits less error-prone.
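The here-doc idea from the second point could look like the sketch below. It is an assumption-laden illustration: the three patterns are a shortened sample, `pattern_file` is a hypothetical name, and it relies on `grep -f`, which reads one pattern per line and treats a line as matching if any pattern matches.

```shell
# Sketch only: one alternative per line, written with a here-doc.
pattern_file="$(mktemp)"
cat > "${pattern_file}" <<'EOF'
Co-authored-by:.*copilot
Co-authored-by:.*claude
Co-authored-by:.*openai
EOF

# Scan a sample trailer against every pattern in the file.
if printf '%s\n' 'Co-authored-by: OpenAI Codex <codex@example.com>' \
    | grep -Eiq -f "${pattern_file}"; then
  verdict="blocked"
else
  verdict="ok"
fi
rm -f "${pattern_file}"
echo "${verdict}"   # blocked
```

Each new name then becomes a one-line addition that is easy to review in a diff.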

deepin-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BLumia, hudeng-go

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BLumia BLumia merged commit 72756c8 into linuxdeepin:master Jun 5, 2026
5 checks passed
3 participants