Add local code review subagent and multi-model PR review support by lukasmasuch · Pull Request #13855 · streamlit/streamlit

lukasmasuch · 2026-02-07T16:47:01Z

Describe your changes

Refactors our ai-pr-review workflow to:

Extracts the code review instructions into a reusable file
Adds a subagent for local code review (works for claude & cursor) reusing the instructions.
Updates the ai-pr-review workflow support running review with multiple models. The model reviews are run in parallel and consolidated by a judge: e.g.: gpt-5.2-codex-high,opus-4.6-thinking will first execute a review with gpt-5.2-codex-high and opus-4.6-thinking in parallel and use opus-4.6-thinking to consolidate both feedbacks into one. Example: Add local code review subagent and multi-model PR review support #13855 (comment)

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

snyk-io · 2026-02-07T16:47:10Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

github-actions · 2026-02-07T16:47:18Z

✅ PR preview is ready!

Name	Link
📦 Wheel file	https://core-previews.s3-us-west-2.amazonaws.com/pr-13855/streamlit-1.54.0-py3-none-any.whl
📦 `@streamlit/component-v2-lib`	Download from artifacts
🕹️ Preview app	pr-13855.streamlit.app (☁️ Deploy here if not accessible)

github-actions · 2026-02-07T16:51:34Z

Summary

This PR refactors the AI code review infrastructure by extracting shared code review instructions into a single source of truth (scripts/assets/code-review-instructions.md) and adding a new Claude Code agent (.claude/agents/reviewing-local-changes.md) for performing local code reviews. The GitHub Actions workflow (ai-pr-review.yml) is updated to reference the shared instructions via $(cat ...) instead of inlining them, and a sync mechanism is added to scripts/generate_agent_rules.py to keep the Claude agent file consistent with the shared source.

Changed files:

.claude/.gitignore — Allow the new agents/ directory
.claude/agents/reviewing-local-changes.md — New Claude Code agent for local code reviews
.claude/skills/fixing-streamlit-ci/SKILL.md — Minor fix: gh pr diff --stat → gh pr diff
.github/workflows/ai-pr-review.yml — Refactored to use shared instructions file
scripts/assets/code-review-instructions.md — New shared code review instructions (single source of truth)
scripts/generate_agent_rules.py — New sync_code_review_instructions() function

Code Quality

Well-structured refactoring. The shared instructions approach follows the DRY principle and is consistent with the existing pattern in the codebase (e.g., AGENTS.md files as source of truth for Cursor rules and GitHub Copilot instructions).

scripts/generate_agent_rules.py (lines 127–185): The new sync_code_review_instructions() function is clean and well-documented:

Proper error handling with FileNotFoundError and ValueError for missing files or missing headers.
The sync logic (find ## Review Checklist header, replace everything from that point onwards) is straightforward and robust.
Uses Final type annotations for module-level constants, consistent with the rest of the file.
The function follows the same patterns as the existing generate_agent_rules() function.

.claude/agents/reviewing-local-changes.md: The agent configuration is well-designed:

readonly: true and disallowedTools: Write, Edit correctly enforce read-only behavior for a review agent.
The instructions are thorough, including guidance on using both git and gh to gather context.
The dynamic head branch via !git branch --show-current`` is a nice touch for context.

.github/workflows/ai-pr-review.yml: The refactored prompt construction is cleaner. Using PROMPT="..." as a variable before passing to cursor-agent -p "$PROMPT" improves readability. The $(cat scripts/assets/code-review-instructions.md) command substitution is safe — the output of $() substitution is not re-parsed for further expansions (backticks in the markdown won't trigger command substitution).

Minor note: The review instructions now reference AGENTS.md files (e.g., e2e_playwright/AGENTS.md, frontend/AGENTS.md) instead of the old .cursor/rules/*.mdc paths. This is an improvement since AGENTS.md files are the canonical source of truth from which Cursor rules are generated.

One observation on scripts/generate_agent_rules.py: the file does not use from __future__ import annotations as recommended by lib/AGENTS.md for Python files. However, this was already the case before this PR (the file uses list[str] which is valid in Python 3.10+ without the import), so this is not a regression.

Test Coverage

No new tests are added. This is appropriate — the changes are entirely to CI tooling, agent configuration, and build-time scripts. These are not part of the Streamlit library itself, and the sync mechanism is a development-time utility invoked manually or via scripts/generate_agent_rules.py. The existing CI pipeline (which runs this workflow) serves as an integration test.

Backwards Compatibility

No backwards compatibility concerns. The changes affect only internal CI tooling and developer-facing agent configuration. No library code, public API, protobuf definitions, or user-facing behavior is modified.

Security & Risk

Low risk. The changes are scoped to CI configuration and developer tooling.

The Claude agent is correctly configured as read-only (readonly: true, disallowedTools: Write, Edit).
The workflow maintains the same permission model: the AI review job has read-only permissions (contents: read, pull-requests: read), and only the separate post-review job has write access (no agent runs in that job).
The $(cat scripts/assets/code-review-instructions.md) approach is safe for shell expansion — the markdown file contains backticks and other formatting characters, but these are not re-interpreted after command substitution.

Accessibility

No frontend changes — not applicable.

Recommendations

Consider adding the workflow's ai-pr-review.yml prompt template to the sync targets. Currently, only .claude/agents/reviewing-local-changes.md is synced from the shared source. The workflow prompt inlines the file via $(cat ...) at runtime, which is effective but means there are two different mechanisms for consuming the same content. This is fine for now but worth noting if more consumers are added in the future.
Minor: The PR description is empty. While the code changes are self-explanatory, a brief description in the PR body would help reviewers and future reference (e.g., "Extracts shared code review instructions into a single source of truth and adds a Claude Code agent for local reviews").

Verdict

APPROVED: Clean refactoring that extracts shared code review instructions into a single source of truth, adds a well-configured local review agent, and improves maintainability of the AI review infrastructure with no risk to library functionality.

This is an automated AI review. Please verify the feedback and use your judgment.

Model: opus-4.6-thinking.

Copilot

Pull request overview

Adds a reusable “code review instructions” document and wires it into both the GitHub Actions AI PR review workflow and a new local .claude review agent, with a small generator script helper to keep instructions in sync.

Changes:

Introduce scripts/assets/code-review-instructions.md as the shared review prompt content.
Update the AI PR review workflow to embed the shared instructions into the agent prompt.
Add a new .claude/agents/reviewing-local-changes.md agent and a generator sync step to keep its checklist section aligned.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`scripts/generate_agent_rules.py`	Adds a sync routine to propagate shared review instructions into one or more `.claude` agent files.
`scripts/assets/code-review-instructions.md`	New shared, canonical code review checklist/instructions used by automation and local agent docs.
`.github/workflows/ai-pr-review.yml`	Refactors the prompt to include the shared instructions file and updates guidance/examples.
`.claude/skills/fixing-streamlit-ci/SKILL.md`	Adjusts the “gather context” section to prefer full diffs over `--stat`.
`.claude/agents/reviewing-local-changes.md`	Adds a local read-only code review agent definition and instructions.
`.claude/.gitignore`	Un-ignores the new `.claude/agents/` content so it can be committed.

- Change default model to multi-model: gpt-5.2-codex-high,opus-4.6-thinking - Fix concurrency group to use direct input reference instead of env variable - Add placeholder test and comment in alert module

lukasmasuch · 2026-02-07T17:16:20Z

+
+Review this branch's changes and ensure the changes are bug-free, backwards compatible, and ready for merge.
+
+## Review Checklist


This section below is auto-synced with the code-review-instructions and shared between the ai-pr-review job

1. Guard against empty/whitespace model input in parse step - Check if MODEL is empty/whitespace and fall back to default - Verify at least one model after parsing, fall back if needed 2. Add checkout to consolidate-reviews job - Now gh/git are actually available as the prompt states - Added GH_TOKEN env for gh commands 3. Use environment variables in consolidate-reviews job - Added MODELS_JSON and JUDGE_MODEL as env variables - Use shell variables instead of ${{ }} for security consistency

1. Fix JSON quotes breaking sed: - Use awk with ENVIRON instead of sed for placeholder substitution - ENVIRON safely accesses env vars without shell parsing issues - JSON like ["a","b"] no longer breaks the substitution command 2. Improve fallback documentation: - Clarify that judge model review is used as deterministic fallback - Explain this prevents random selection from multiple reviews

github-actions · 2026-02-07T18:29:13Z

Summary

This PR adds multi-model AI code review support to the CI pipeline and introduces a local code review sub-agent for Claude. The key changes:

Multi-model PR review workflow (.github/workflows/ai-pr-review.yml): Refactored from a single-model to a matrix-based multi-model review pipeline with four jobs: parse-models → ai-pr-review (matrix) → consolidate-reviews → ai-pr-post-review. Models run in parallel, and a "judge" model consolidates the results.
Local code review agent (.claude/agents/reviewing-local-changes.md): A new read-only Claude agent for performing code reviews on the current branch locally.
Shared review instructions (scripts/assets/code-review-instructions.md): A single source of truth for the review checklist, synced into both the local agent and the CI prompt.
Sync tooling (scripts/generate_agent_rules.py): New sync_code_review_instructions() function to keep the shared instructions in sync across targets, integrated into the pre-commit hook.
Minor fix to .claude/skills/fixing-streamlit-ci/SKILL.md: Changed gh pr diff --stat to gh pr diff for fetching full changes.

Code Quality

Reviewers agree: The overall structure and separation of responsibilities are well-designed. The four-job pipeline is clean, with clear job boundaries and comprehensive fallback logic (consolidated review → judge model's individual review → failure notice).

Security practices are solid (both reviewers concur): User-provided model names are handled through shell variables rather than ${{ }} expression interpolation, preventing script injection. The consolidation prompt uses a quoted heredoc and awk with ENVIRON for safe placeholder substitution. Read-only permissions are correctly scoped to the review jobs, with write access confined to the post-review job that runs no AI agents.

Duplicated sanitization logic (raised by opus-4.6-thinking): The model-name sanitization sed pipeline (sed 's/[^a-zA-Z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//') appears in both the parse-models job (line 72) and the ai-pr-review sanitize step (line 117). While a comment acknowledges this, if these ever diverge, the fallback artifact download in ai-pr-post-review would silently fail. This is a minor maintenance risk, not blocking.

"No write operations" instruction clarity (raised by gpt-5.2-codex-high): The shared instructions say "Do NOT attempt to post comments, edit PRs, or perform any write operations" while the CI workflow appends a "Finalize Review" section that asks agents to write review.md and verdict.txt. Upon verification, this is not a real contradiction — the shared instructions are designed for the local agent (which has readonly: true and disallowedTools: Write, Edit), and the CI workflow explicitly overrides this by appending the finalize section. However, a clarifying comment could prevent future confusion.

Python code (generate_agent_rules.py): The sync_code_review_instructions() function is clean, follows existing patterns in the file, has proper error handling, and is correctly wired into both __main__ and the pre-commit trigger.

Test Coverage

Both reviewers agree: No automated tests are needed. These are CI infrastructure and agent configuration changes, not user-facing library code. The workflow itself will be validated by running it on actual PRs, and the pre-commit hook (generate-agent-rules) serves as an integration test for the sync function. The new file pattern for scripts/assets/code-review-instructions.md is correctly added to .pre-commit-config.yaml.

Backwards Compatibility

Both reviewers agree: No breaking changes. Single-model usage still works — when a single model is specified, the consolidation step is correctly skipped via the model_count > 1 guard. The default model changed from "opus-4.6-thinking" to "gpt-5.2-codex-high,opus-4.6-thinking", but existing manual triggers specifying a single model continue to work since the comma-separated parsing handles single values.

Security & Risk

Both reviewers agree: Low risk. The workflow remains read-only for the agent jobs, the local agent has defense-in-depth with readonly: true and disallowedTools, and no new privileged operations are introduced. No library code, runtime behavior, or user-facing functionality is affected.

Accessibility

No frontend changes in this PR. Not applicable.

Recommendations

PR description: The PR body uses the default template with no actual description filled in. Consider adding a description of the changes and motivation for multi-model review support for future reference.
Consider extracting the sanitization logic: The duplicated sed pipeline between parse-models and the ai-pr-review sanitize step is a maintenance risk. At minimum, a cross-referencing comment in both locations would help. Alternatively, consider extracting it to a shared shell function or script.
Minor: Clarify write-operation guidance: While not technically contradictory (the local agent truly is read-only, and the CI prompt appends an explicit override), consider adding a brief note in the shared instructions clarifying that the "no write operations" rule refers to external operations (posting comments, editing PRs) and that writing local files when instructed by the workflow is expected.

Verdict

APPROVED: Well-engineered infrastructure change that adds multi-model AI review support with proper security practices, comprehensive fallback handling, and a clean single-source-of-truth pattern for review instructions. Both reviewers approved with only minor non-blocking suggestions.

This is a consolidated AI review by opus-4.6-thinking. Please verify the feedback and use your judgment.

📋 Review by `gpt-5.2-codex-high`

Summary

Adds multi-model AI PR review support (model parsing, per-model review artifacts, consolidation), introduces a shared code review instructions source synced into the local agent, and updates pre-commit and .claude ignore rules accordingly.

Code Quality

Overall structure and separation of responsibilities look solid. One clarity issue remains: the shared instructions explicitly forbid any write operations while the workflow later requires writing review.md and verdict.txt, which can cause some models to refuse output. Consider aligning the instructions to avoid contradictory guidance.

- Do NOT run linting, tests, or build commands - focus only on code review.
- Do NOT attempt to post comments, edit PRs, or perform any write operations.
- Focus on the root cause of issues, not cascading failures.
- Be specific with file references and line numbers when noting issues.

          $(cat scripts/assets/code-review-instructions.md)

          ## Finalize Review:

          1. In the review footer, replace 'This is an automated AI review.' with 'This is an automated AI review by `$MATRIX_MODEL`.'
          2. Write your review into a file named 'review.md' in the current directory.
          3. Write either 'APPROVED' or 'CHANGES_REQUESTED' (exactly one of these) to a file named 'verdict.txt'.

Test Coverage

No tests added. These are workflow and tooling changes; validation will rely on running the updated workflow in CI.

Backwards Compatibility

No user-facing runtime behavior changes. Existing AI review behavior should remain compatible, with added support for multiple models and consolidation.

Security & Risk

Low risk. The workflow remains read-only for the agent job and only aggregates review outputs; no new privileged operations are introduced.

Accessibility

No frontend changes.

Recommendations

Clarify the "no write operations" rule in the shared instructions to explicitly allow writing the local review.md/verdict.txt artifacts required by the workflow.

Verdict

APPROVED: Changes are well-scoped and improve the AI review pipeline with minimal operational risk.

This is an automated AI review by gpt-5.2-codex-high. Please verify the feedback and use your judgment.

📋 Review by `opus-4.6-thinking`

Summary

This PR introduces multi-model AI code review support for the CI workflow and adds a local code review sub-agent for Claude. The key changes are:

Multi-model PR review workflow (.github/workflows/ai-pr-review.yml): Refactored from a single-model to a matrix-based multi-model review pipeline. Models run in parallel, and a consolidation step merges reviews using a "judge" model.
Local code review agent (.claude/agents/reviewing-local-changes.md): A new read-only Claude agent for performing code reviews on the current branch locally.
Shared review instructions (scripts/assets/code-review-instructions.md): A single source of truth for the review checklist/instructions, synced into both the local agent and the CI prompt.
Sync tooling (scripts/generate_agent_rules.py): New sync_code_review_instructions() function to keep the shared instructions in sync across targets.
Minor fix to .claude/skills/fixing-streamlit-ci/SKILL.md: Changed gh pr diff --stat to gh pr diff for fetching full changes.

Code Quality

Workflow structure (ai-pr-review.yml): The four-job pipeline (parse-models → ai-pr-review → consolidate-reviews → ai-pr-post-review) is well-structured with clear separation of concerns. The fallback logic in the post-review job (consolidated → judge model's individual review → failure notice) is comprehensive and handles edge cases correctly.

Security: The workflow properly avoids script injection by using shell variables ($MODEL, $MATRIX_MODEL) instead of ${{ }} expressions for user-controllable inputs. The consolidation prompt is written to a file via quoted heredoc (<< 'PROMPT_EOF') and uses awk with ENVIRON for safe placeholder substitution. These are good practices.

Python code (generate_agent_rules.py, lines 140–185): The sync_code_review_instructions() function is clean, follows existing patterns in the file, has proper error handling with informative messages, and uses appropriate file I/O.

Duplication of sanitization logic: The model-name sanitization (sed 's/[^a-zA-Z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//') appears in two places — the parse-models job (line 72) and the ai-pr-review sanitize step (line 117). If these ever diverge, the fallback artifact download in ai-pr-post-review would silently fail. The comment on line 71 ("same logic as sanitize step") acknowledges this, but it's still a maintenance risk.

Test Coverage

The changes are to CI infrastructure, agent configuration, and a development tooling script — none of which are user-facing library code. Automated testing is not standard for these types of changes:

The workflow itself will be validated by running it on actual PRs.
The pre-commit hook (generate-agent-rules) serves as an integration test for the sync function, ensuring the shared instructions stay in sync when modified.
The new file pattern for scripts/assets/code-review-instructions.md is correctly added to the pre-commit trigger in .pre-commit-config.yaml (line 79).

No additional unit or e2e tests are needed.

Backwards Compatibility

No breaking changes. The workflow maintains full backwards compatibility:

Single-model usage still works: When a single model is specified, the consolidation step is correctly skipped (model_count > 1 guard on line 211), and the post-review job falls back to downloading that single model's artifact.
Default model change: The default changed from "opus-4.6-thinking" to "gpt-5.2-codex-high,opus-4.6-thinking". Existing manual triggers specifying a single model will continue to work since the comma-separated parsing handles single values.
Label trigger preserved: The ai-review label trigger continues to work as before.

Security & Risk

Input handling: User-provided model names (via workflow_dispatch) are handled safely through shell variables rather than expression interpolation, preventing script injection.
Read-only permissions: The review jobs correctly use read-only permissions (contents: read, pull-requests: read). Write access is confined to the post-review job which does not run any AI agents.
Local agent safety: The Claude agent is configured with readonly: true and disallowedTools: Write, Edit, providing defense-in-depth.
Low risk: These changes only affect the AI review infrastructure and do not touch any library code, runtime behavior, or user-facing functionality.

Accessibility

No frontend changes in this PR. Not applicable.

Recommendations

PR description: The PR body is empty — consider adding a description of the changes and motivation for multi-model review support for future reference.
Consider extracting the sanitization logic: The model-name sanitization sed pipeline is duplicated between parse-models (line 72) and ai-pr-review (line 117). If feasible, consider referencing the pre-computed judge_model_id from parse-models outputs in the fallback, which is already done, but ensure any future model-ID usage also reuses the same source. A comment cross-referencing the two locations would help.
Consolidation prompt leading whitespace: While YAML block scalars strip the common indentation, it's worth verifying that the heredoc content in the consolidation step (lines 285–336) renders cleanly in prompt.txt. If the YAML indentation ever changes asymmetrically, the prompt formatting could break subtly.

Verdict

APPROVED: Well-engineered infrastructure change that adds multi-model AI review support with proper security practices, comprehensive fallback handling, and a clean single-source-of-truth pattern for review instructions.

This is an automated AI review by opus-4.6-thinking. Please verify the feedback and use your judgment.

- Use fromJson() for explicit type coercion in model_count comparison - Document judge model selection behavior in workflow input description - Consolidate sed pipelines for cleaner shell commands - Simplify file truncation and chain awk/mv commands

github-actions · 2026-02-07T18:39:47Z

Summary

This PR enhances the AI PR review infrastructure with three main capabilities:

Multi-model PR reviews: The GitHub Actions workflow (ai-pr-review.yml) is refactored from a single-model review to a matrix-based approach that runs reviews with multiple AI models in parallel, then consolidates them into a single unified review using a "judge" model (the last model in the comma-separated list).
Local code review subagent: A new Claude Code subagent (.claude/agents/reviewing-local-changes.md) enables developers to perform code reviews locally using the same review checklist and output format as the CI workflow.
Shared review instructions: Code review instructions are extracted into a single source of truth (scripts/assets/code-review-instructions.md) and synced into target files via scripts/generate_agent_rules.py, ensuring consistency between the CI workflow prompt and the local subagent.

Supporting changes include updates to .claude/.gitignore, .pre-commit-config.yaml, and a minor fix in the CI skill documentation.

Code Quality

Workflow structure (ai-pr-review.yml): The workflow is well-organized into four clear jobs with appropriate dependency chains: parse-models → ai-pr-review (matrix) → consolidate-reviews → ai-pr-post-review. Each job has properly scoped permissions (read-only for review jobs, write only for posting). Both reviewers agreed on the clean organization.

Security-conscious shell scripting: The workflow correctly uses environment variables instead of direct ${{ }} expression interpolation in shell scripts (e.g., MATRIX_MODEL: ${{ matrix.model }} then "$MATRIX_MODEL" in shell). The consolidation prompt uses a single-quoted heredoc (<< 'PROMPT_EOF') to prevent shell expansion, then safely substitutes placeholders via awk with ENVIRON to handle special characters. Both reviewers noted this as a robust pattern.

Sanitization logic duplication (minor, both reviewers noted): The model name sanitization (sed 's/[^a-zA-Z0-9]/-/g; s/--*/-/g; s/^-//; s/-$//') is duplicated in two places — the parse-models job (line 72) for judge_model_id and the ai-pr-review job sanitize step (line 117) for model_id. A comment cross-referencing both locations, or extracting to a reusable step, would reduce maintenance risk.

Python script (generate_agent_rules.py): The sync_code_review_instructions() function follows established patterns in the file. Error handling is thorough with descriptive error messages for missing files and missing section headers.

Subagent config (.claude/agents/reviewing-local-changes.md): Appropriately configured with readonly: true and disallowedTools: Write, Edit to prevent accidental modifications during review.

Test Coverage

No tests added. Both reviewers agree this is reasonable given the nature of the changes (workflow configuration, agent prompts, build script tooling). The pre-commit hook integration provides a form of validation by running the sync on every commit that touches the relevant files.

Backwards Compatibility

No breaking changes. Both reviewers agree:

The workflow default model changes from "opus-4.6-thinking" to "gpt-5.2-codex-high,opus-4.6-thinking", which means existing label-triggered reviews will now use two models instead of one. This is behavioral but not breaking — users can still pass a single model via workflow_dispatch.
Artifact naming changes from review-output to review-output-{model_id} / review-output-consolidated, but artifacts are ephemeral and not consumed by external systems.
No changes to the Streamlit library, frontend, or user-facing APIs.

Security & Risk

Strengths (both reviewers agreed):

Separation of read-only review jobs from the write-access posting job is maintained.
Model names from workflow_dispatch input are properly sanitized for artifact naming and quoted for shell use.
The consolidation prompt avoids shell injection by using literal heredoc + awk with ENVIRON.
persist-credentials: false is used in both the review and consolidation job checkouts.

Model parsing robustness: The gpt-5.2-codex-high reviewer noted that a malformed model input like "," could cause the pipeline to error before reaching the fallback. Upon verification, the code already guards against this: after the grep -v '^$' | jq pipeline, the MODEL_COUNT check on line 63 catches zero-model cases and falls back to defaults. The pipefail concern is valid in theory (since grep returns exit code 1 when no lines match), but the workflow does not explicitly set pipefail — bash defaults in GitHub Actions do have pipefail enabled via set -eo pipefail. However, the practical impact is extremely low since the MODEL env var would need to be a degenerate value like pure commas, and the empty/whitespace guard on line 52 already handles the most common failure modes.

Risk assessment: Low. Changes are isolated to CI tooling and don't affect the Streamlit library runtime.

Accessibility

No frontend changes — not applicable.

Recommendations

Consider extracting the sanitization logic into a reusable shell function or at minimum add a cross-reference comment between the parse-models job (line 72) and the sanitize step (line 117) to keep them in sync. Both reviewers flagged this duplication.
Harden model parsing against degenerate input: While the zero-model fallback exists, the grep -v '^$' step could fail the pipeline under pipefail before reaching the guard. A simple fix: replace grep -v '^$' with { grep -v '^$' || true; } on line 59, or pipe through sed '/^$/d' instead (which doesn't fail on empty input).
Minor: heredoc indentation — The consolidation prompt heredoc content (lines 286-336) includes leading whitespace from the YAML run block indentation. This is functionally harmless (the AI model handles it fine), but using <<- with tab indentation could produce cleaner prompt output if desired. No action needed.

Reviewer Agreement Summary

Topic	gpt-5.2-codex-high	opus-4.6-thinking	Consensus
Overall verdict	APPROVED	APPROVED	Unanimous APPROVED
Code quality	Good	Good	Agreed
Test coverage	Acceptable (none needed)	Acceptable (none needed)	Agreed
Backwards compat	No breaking changes	No breaking changes	Agreed
Security	Low risk	Low risk, well-handled	Agreed
Sanitization duplication	Not flagged	Flagged as minor	Consolidated: valid observation
Model parsing robustness	Flagged as concern	Not flagged	Consolidated: valid but low-impact
`persist-credentials`	Not flagged	Flagged as missing	Resolved: already present in code (line 227) — reviewer error

Note on disagreement: The opus reviewer recommended adding persist-credentials: false to the consolidation job's checkout step (lines 225-227), claiming it was missing. Upon verification, persist-credentials: false is already present at line 227. This recommendation can be disregarded.

Verdict

APPROVED: Well-structured infrastructure enhancement that adds multi-model review support with proper security practices, graceful fallback handling, and a clean DRY approach for shared review instructions. The two minor robustness recommendations are non-blocking. No risk to the Streamlit library or user-facing functionality.

This is a consolidated AI review by opus-4.6-thinking (judge model), synthesizing reviews from 2/2 expected models. All expected models completed their reviews successfully. Please verify the feedback and use your judgment.

📋 Review by `gpt-5.2-codex-high`

Summary

Adds a local code review agent, introduces shared review instructions with a sync step, and upgrades the AI PR review workflow to support multi-model reviews with consolidation and a judge fallback.

Code Quality

Workflow and script changes are organized and readable. One edge-case to consider: the model parsing pipeline uses grep -v '^$' with pipefail; an input like "," yields no lines and causes the step to error before the intended fallback (lines 51-65 in .github/workflows/ai-pr-review.yml).

Test Coverage

No tests added; changes are workflow/config and a small Python utility, so this is reasonable. Validation will come from running the workflow and pre-commit hook.

Backwards Compatibility

No user-facing or runtime API changes; impacts are limited to tooling and CI behavior.

Security & Risk

Low risk overall. The only concern is a workflow failure on malformed model input as noted above.

Accessibility

No frontend changes.

Recommendations

Harden model parsing to tolerate comma-only or empty entries without failing the step (e.g., avoid grep failure or guard the pipeline) in .github/workflows/ai-pr-review.yml lines 51-65.

Verdict

APPROVED: Solid workflow/tooling improvements with a minor robustness tweak recommended.

This is an automated AI review by gpt-5.2-codex-high. Please verify the feedback and use your judgment.

📋 Review by `opus-4.6-thinking`

Summary

This PR enhances the AI PR review infrastructure with two main capabilities:

Multi-model PR reviews: The GitHub Actions workflow (ai-pr-review.yml) is refactored from a single-model review to a matrix-based approach that runs reviews with multiple AI models in parallel, then consolidates them into a single unified review using a "judge" model (the last model in the comma-separated list).
Local code review subagent: A new Claude Code subagent (.claude/agents/reviewing-local-changes.md) enables developers to perform code reviews locally using the same review checklist and output format as the CI workflow.
Shared review instructions: Code review instructions are extracted into a single source of truth (scripts/assets/code-review-instructions.md) and synced into target files via scripts/generate_agent_rules.py, ensuring consistency between the CI workflow prompt and the local subagent.

Supporting changes include updates to .claude/.gitignore, .pre-commit-config.yaml, and a minor fix in the CI skill documentation.

Code Quality

Workflow structure (ai-pr-review.yml): The workflow is well-organized into four clear jobs with appropriate dependency chains: parse-models -> ai-pr-review (matrix) -> consolidate-reviews -> ai-pr-post-review. Each job has properly scoped permissions (read-only for review jobs, write only for posting).

Security-conscious shell scripting: The workflow correctly uses environment variables instead of direct ${{ }} expression interpolation in shell scripts (e.g., MATRIX_MODEL: ${{ matrix.model }} then "$MATRIX_MODEL" in shell). The consolidation prompt uses a single-quoted heredoc (<< 'PROMPT_EOF') to prevent shell expansion, then safely substitutes placeholders via awk with ENVIRON to handle special characters like JSON double quotes. This is a robust pattern.

Sanitization logic duplication (minor): The model name sanitization (sed 's/[^a-zA-Z0-9]/-/g; s/--*/-/g; s/^-//; s/-$//') is duplicated in two places — the parse-models job (line 72) for judge_model_id and the ai-pr-review job sanitize step (line 117) for model_id. If the sanitization logic ever changes, both need to be updated in sync.

Python script (generate_agent_rules.py): The sync_code_review_instructions() function follows the established patterns in the file. Error handling is thorough with descriptive error messages for missing files and missing section headers. The function is properly integrated into the __main__ block and the pre-commit hook.

Subagent config (.claude/agents/reviewing-local-changes.md): Appropriately configured with readonly: true and disallowedTools: Write, Edit to prevent accidental modifications during review. The model: inherit setting allows flexibility.

Test Coverage

This PR modifies CI/CD infrastructure, agent configuration, and a build script — none of which have existing unit tests in the repository. The scripts/generate_agent_rules.py file has no test file currently, so the new sync_code_review_instructions() function follows the same (untested) pattern.

Given the nature of these changes (workflow configuration, agent prompts, script tooling), the lack of automated tests is acceptable. The pre-commit hook integration provides a form of validation by running the sync on every commit that touches the relevant files.

Backwards Compatibility

No breaking changes. The changes are purely to CI/CD infrastructure and developer tooling:

The workflow default model changes from "opus-4.6-thinking" to "gpt-5.2-codex-high,opus-4.6-thinking", which means existing label-triggered reviews will now use two models instead of one. This is a behavioral change but not a breaking one — users can still pass a single model via workflow_dispatch.
The artifact naming changes from review-output to review-output-{model_id} / review-output-consolidated, but artifacts are ephemeral and not consumed by external systems.
No changes to the Streamlit library, frontend, or user-facing APIs.

Security & Risk

Strengths:

The separation of read-only review jobs from the write-access posting job is maintained.
Model names from workflow_dispatch input are properly sanitized for artifact naming and quoted for shell use.
The consolidation prompt avoids shell injection by using literal heredoc + awk with ENVIRON.
persist-credentials: false is used in the review job checkout.

Minor observations:

The $(cat scripts/assets/code-review-instructions.md) command substitution in the review prompt (line 188) is safe — bash doesn't re-interpret backticks or $ in command substitution output within double quotes.
The consolidation job's actions/checkout@v6 (line 225) does not specify persist-credentials: false, unlike the review job checkout (line 137). This is low-risk since the consolidation job only has contents: read permission, but adding it for consistency would be good practice.

Risk assessment: Low. The changes are isolated to CI tooling and don't affect the Streamlit library runtime.

Accessibility

No frontend changes — not applicable.

Recommendations

Consider adding persist-credentials: false to the consolidation job's checkout step (line 225-227 of ai-pr-review.yml) for consistency with the review job's checkout, even though it has read-only permissions.
Consider extracting the sanitization logic into a reusable shell function or step to avoid the duplication between parse-models (line 72) and the sanitize step (line 117). If the sanitization regex changes, both locations must be updated. A simple approach would be to add a comment cross-referencing the two locations.
Minor: heredoc indentation — The consolidation prompt heredoc content (lines 286-336) includes leading whitespace from the YAML run block indentation. Since it uses << rather than <<-, these spaces are preserved in the prompt file. This is functionally harmless (the AI model handles it fine), and is consistent with the existing prompt pattern. No action needed, but <<- with tab indentation could make the output cleaner if desired.

Verdict

APPROVED: Well-structured infrastructure enhancement that adds multi-model review support with proper security practices, graceful fallback handling, and a clean DRY approach for shared review instructions. No risk to the Streamlit library or user-facing functionality.

This is an automated AI review by opus-4.6-thinking. Please verify the feedback and use your judgment.

Multi-model support remains available via comma-separated input.

github-actions · 2026-02-07T19:20:47Z

Summary

This PR adds multi-model support to the AI PR review workflow and introduces a local Claude code review subagent. The main changes are:

Multi-model reviews: The ai-pr-review.yml workflow now accepts a comma-separated list of models, runs reviews in parallel via a matrix strategy, and consolidates results using a "judge" model.
Shared review instructions: A new scripts/assets/code-review-instructions.md file serves as the single source of truth for review instructions, synced into both the workflow prompt and the Claude agent file.
Local code review agent: A new .claude/agents/reviewing-local-changes.md file enables local code review via Claude Code.
Tooling updates: scripts/generate_agent_rules.py gains a sync_code_review_instructions() function, and the pre-commit hook is updated to trigger on changes to the shared instructions file.

Code Quality

The code is well-structured and follows existing patterns in the repository. Specific observations:

Good security practice in the workflow: User-controlled inputs (model names) are consistently passed through shell environment variables ($MODEL, $MATRIX_MODEL, $JUDGE_MODEL) rather than ${{ }} interpolation, preventing command injection. This is called out with clear comments (e.g., .github/workflows/ai-pr-review.yml lines 58, 116, 152).
DRY principle applied well: The shared scripts/assets/code-review-instructions.md eliminates duplication between the workflow prompt and the Claude agent. The review instructions now reference AGENTS.md files (the canonical source) instead of .cursor/rules/*.mdc files, which is an improvement over the previous version.
Duplicated sanitization logic: The model name sanitization (sed 's/[^a-zA-Z0-9]/-/g; s/--*/-/g; s/^-//; s/-$//') appears in both the parse-models job (line 72) and the ai-pr-review job sanitize step (line 117). While the comment on line 71 notes "same logic as sanitize step," divergence between these two would cause artifact name mismatches and break the fallback download in the post-review job. Consider extracting this into a reusable composite action or at minimum adding a stronger cross-reference comment.
Heredoc indentation: The consolidation prompt heredoc (lines 285-336) uses << 'PROMPT_EOF' which preserves the ~10 spaces of YAML indentation. This produces a prompt with leading whitespace on every line. While unlikely to cause functional issues (the AI model will handle it), using a quoted PROMPT_EOF marker at column 0 or writing via a script would be cleaner. This is consistent with how the single-model prompt handles indentation, so it's not a regression.
Python code quality: The sync_code_review_instructions() function in scripts/generate_agent_rules.py follows the existing patterns well — proper error handling with FileNotFoundError/ValueError, explicit encoding, and clean string manipulation. Type annotations via Final are consistent with the rest of the file.

Test Coverage

This PR contains no unit tests or e2e tests. Given the nature of the changes (CI workflow configuration, agent definition files, and a build-time sync script), this is acceptable:

The workflow changes are best validated by running the workflow itself (which would happen when the ai-review label is applied or the workflow is manually dispatched).
The sync_code_review_instructions() function is invoked by the pre-commit hook (generate-agent-rules), providing implicit validation that the sync works correctly on every commit that touches the relevant files.
The Claude agent file (.claude/agents/reviewing-local-changes.md) is configuration, not executable code.

One improvement would be adding a simple unit test for sync_code_review_instructions() to validate edge cases (missing header, missing files), but this is optional for a build script.

Backwards Compatibility

This change is fully backwards compatible:

Default behavior preserved: When a single model is provided (the default opus-4.6-thinking), the workflow behaves as before — the consolidation job is skipped (model_count <= 1), and the post-review job falls back to downloading the single model's review artifact.
Input interface compatible: The model input still accepts a single string. Comma-separated values are a superset of the previous behavior.
No library/API changes: All changes are in CI tooling, agent configuration, and build scripts. No user-facing Streamlit library code is affected.

Security & Risk

Input sanitization is solid: The model name sanitization prevents artifact name injection. Shell variable usage (instead of ${{ }} expression interpolation) prevents script injection from user-provided model names. The comment on line 58 explicitly calls out this security measure.
Permissions are appropriately scoped: The ai-pr-review and consolidate-reviews jobs maintain read-only permissions (contents: read, pull-requests: read, issues: read). Only ai-pr-post-review has pull-requests: write, and it runs no AI agents.
The consolidation agent has CURSOR_API_KEY access (line 253), which is expected since it needs to call the Cursor API for the judge model.
Low risk of regression: The single-model path has a clean fallback (download judge model's artifact directly). If consolidation fails, the workflow degrades gracefully to posting the judge model's individual review.
Minor risk — consolidation checkout: The consolidate-reviews job checks out the repo (line 225) without specifying the PR's head SHA or fetching full history. If the consolidation agent attempts to use git to analyze PR changes (as the prompt suggests is possible on line 293), it would be looking at the wrong ref. This is unlikely to cause issues in practice since the agent's primary task is reading all_reviews.md, but consider either removing the git/gh mention from the consolidation prompt or checking out the correct ref.

Accessibility

No frontend changes are included in this PR. No accessibility considerations apply.

Recommendations

Consider checking out the PR branch in the consolidation job: If the consolidation agent might use git or gh to verify reviewer claims, it should have access to the correct code. Either checkout the PR's head SHA (like the review job does) or remove the "git/gh available" statement from the consolidation prompt to avoid misleading the agent.
Add artifact upload if: always() guard in the review job: Currently, if cursor-agent fails mid-review (e.g., timeout), any partial review.md won't be uploaded. Adding if: always() to the upload step (line 199) with a file existence check would allow partial reviews to be captured, improving debuggability.
Add a brief note about the consolidation cost: Running N models + 1 judge model means N+1 API calls. The PR description should note this so maintainers understand the cost implications of multi-model reviews.
PR description is empty: The body contains only the template placeholders. Consider documenting the design decisions (e.g., why the last model is the judge, the fallback strategy) in the PR description for future reference.

Verdict

APPROVED: Well-structured addition of multi-model review support with good security practices, clean fallback behavior, and appropriate separation of concerns. The recommendations above are minor improvements, not blocking issues.

This is an automated AI review by opus-4.6-thinking. Please verify the feedback and use your judgment.

lukasmasuch · 2026-02-09T10:28:26Z

This is configured as a subagent (not a skill) so that it always starts with a fresh context and enforces read-only access.

lukasmasuch · 2026-02-09T10:29:59Z

The shared review instructions

Dynamically detect the base branch from PR metadata when available, falling back to develop. This allows the agent to correctly compare against the actual merge target for stacked PRs.

Add local code review subagent

f0c3e29

lukasmasuch requested a review from a team as a code owner February 7, 2026 16:47

Copilot AI review requested due to automatic review settings February 7, 2026 16:47

Copilot started reviewing on behalf of lukasmasuch February 7, 2026 16:47 View session

lukasmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

github-actions Bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

Copilot AI reviewed Feb 7, 2026

View reviewed changes

Comment thread scripts/generate_agent_rules.py

Comment thread .github/workflows/ai-pr-review.yml

cursor Bot reviewed Feb 7, 2026

View reviewed changes

Comment thread .claude/.gitignore

Comment thread .github/workflows/ai-pr-review.yml Outdated

lukasmasuch added 7 commits February 7, 2026 17:59

Update

4b6f77b

Update

9fc1ea1

Update

862b8d2

Update

fa439f2

Add mulit-model review support

f4a3714

Update default models and fix concurrency group in AI PR review workflow

957b3b8

- Change default model to multi-model: gpt-5.2-codex-high,opus-4.6-thinking - Fix concurrency group to use direct input reference instead of env variable - Add placeholder test and comment in alert module

Update

a9ebfb8

lukasmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

github-actions Bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

lukasmasuch added 2 commits February 7, 2026 18:14

Remove placeholder comment and test from alert module

03959a5

Fix extra closing brace in MODEL env variable

89a83f9

lukasmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

github-actions Bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

lukasmasuch changed the title ~~Add local code review subagent~~ Add local code review subagent and multi-model PR review support Feb 7, 2026

lukasmasuch marked this pull request as draft February 7, 2026 17:15

lukasmasuch commented Feb 7, 2026

View reviewed changes

Use compact JSON output for GITHUB_OUTPUT compatibility

939241d

lukasmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

github-actions Bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

lukasmasuch added 2 commits February 7, 2026 19:18

lukasmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

streamlit deleted a comment from github-actions Bot Feb 7, 2026

github-actions Bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

streamlit deleted a comment from github-actions Bot Feb 7, 2026

lukasmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

github-actions Bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

Switch default model to single opus-4.6-thinking

716ffa3

Multi-model support remains available via comma-separated input.

lukasmasuch added security-assessment-completed change:other PR contains other type of change impact:internal PR changes only affect internal code ai-review If applied to PR or issue will run AI review workflow labels Feb 7, 2026

lukasmasuch marked this pull request as ready for review February 7, 2026 19:19

github-actions Bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 7, 2026

lukasmasuch commented Feb 9, 2026

View reviewed changes

Comment thread scripts/assets/code-review-instructions.md

Copy link
Copy Markdown

Collaborator Author

lukasmasuch Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shared review instructions

sfc-gh-bnisco reviewed Feb 9, 2026

View reviewed changes

Comment thread .claude/agents/reviewing-local-changes.md

Comment thread .claude/agents/reviewing-local-changes.md

Support stacked PRs in local code review agent

dc1f54b

Dynamically detect the base branch from PR metadata when available, falling back to develop. This allows the agent to correctly compare against the actual merge target for stacked PRs.

sfc-gh-bnisco approved these changes Feb 9, 2026

View reviewed changes

lukasmasuch added 3 commits February 10, 2026 12:16

Clarify that PR may not exist yet for current branch

ed358e0

Simplify base branch detection description

571e3ad

Clarify develop is used as fallback base branch

200f9be

graphite-app Bot reviewed Feb 10, 2026

View reviewed changes

Comment thread .github/workflows/ai-pr-review.yml

Comment thread .github/workflows/ai-pr-review.yml

sfc-gh-lmasuch merged commit 48a3bd7 into develop Feb 10, 2026
42 checks passed

sfc-gh-lmasuch deleted the lukasmasuch/code-review-subagent branch February 10, 2026 11:39


		Review this branch's changes and ensure the changes are bug-free, backwards compatible, and ready for merge.

		## Review Checklist

Conversation

lukasmasuch commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes

Uh oh!

snyk-io Bot commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Snyk checks have passed. No issues have been found so far.

Uh oh!

github-actions Bot commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ PR preview is ready!

Uh oh!

github-actions Bot commented Feb 7, 2026

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Verdict

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lukasmasuch Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Feb 7, 2026

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Verdict

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Verdict

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Verdict

Uh oh!

github-actions Bot commented Feb 7, 2026

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Reviewer Agreement Summary

Verdict

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Verdict

Summary

lukasmasuch commented Feb 7, 2026 •

edited

Loading

snyk-io Bot commented Feb 7, 2026 •

edited

Loading

github-actions Bot commented Feb 7, 2026 •

edited

Loading