docs: update Red Teaming documentation to match current API #276

Merged
Aryansharma28 merged 1 commit into main from docs/red-teaming on Mar 10, 2026

Conversation

@Aryansharma28
Contributor

Summary

Supersedes #265 (merged then reverted in #272).

Test plan

  • Verify docs build locally with cd docs && pnpm dev
  • Check all Python code examples match red_team_agent.py signatures
  • Check all TypeScript code examples match red-team-agent.ts signatures

🤖 Generated with Claude Code

Key changes from the original docs (#265, reverted in #272):
- Fix Python param name: fast_refusal_detection (not detect_refusals)
- Remove max_backtracks/backtrack_threshold from config (not configurable,
  backtracking triggers on hard refusals with a fixed limit of 10)
- Early exit and backtracking now available in both Python and TypeScript
- Update TypeScript config examples to include successScore/successConfirmTurns
- Add backtrack_history param to custom strategy build_system_prompt example
- Remove "Python only" callouts for features now in both languages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
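
For readers unfamiliar with the backtracking behaviour mentioned above, here is a minimal, self-contained sketch of the idea. It is not the code in red_team_agent.py: the names `Conversation`, `run_attack`, `target`, and `is_hard_refusal` are hypothetical, and the assumption that a backtracked turn is dropped from the conversation and recorded in a `backtrack_history` list is an illustration only. The one detail taken from this PR is the fixed limit of 10.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Fixed limit taken from the PR description; everything else here is hypothetical.
MAX_BACKTRACKS = 10


@dataclass
class Conversation:
    # (attack, reply) pairs that remain in the conversation
    turns: List[Tuple[str, str]] = field(default_factory=list)
    # attacks that were rolled back after a hard refusal
    backtrack_history: List[str] = field(default_factory=list)


def run_attack(
    attacks: List[str],
    target: Callable[[str], str],
    is_hard_refusal: Callable[[str], bool],
) -> Conversation:
    """Send each attack to the target, rolling back turns that are hard-refused."""
    convo = Conversation()
    for attack in attacks:
        reply = target(attack)
        can_backtrack = len(convo.backtrack_history) < MAX_BACKTRACKS
        if is_hard_refusal(reply) and can_backtrack:
            # Assumed behaviour: drop the refused turn and remember it,
            # so a later prompt builder could avoid repeating it.
            convo.backtrack_history.append(attack)
            continue
        # Either the reply was not a refusal or the backtrack budget is spent.
        convo.turns.append((attack, reply))
    return convo


if __name__ == "__main__":
    convo = run_attack(
        attacks=[f"attempt {i}" for i in range(12)],
        target=lambda msg: "I can't help with that.",
        is_hard_refusal=lambda reply: reply.startswith("I can't"),
    )
    print(len(convo.backtrack_history), len(convo.turns))
```

With a target that refuses all twelve attempts, the first ten are rolled back and the last two stay in the conversation, because the budget is exhausted.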
@github-actions (bot) added the low-risk-change label ("PR qualifies as low-risk per policy and can be merged without manual review") on Mar 10, 2026
@github-actions
Contributor

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure.

  • Scope: Adds Red Teaming documentation (docs/pages/advanced/red-teaming.mdx) and a navigation entry (docs/vocs.config.tsx); documentation and example updates only, no runtime/security/config changes.
  • Exclusions confirmed: no changes to auth, security settings, database schema, business-critical logic, or external integrations.
  • Classification: low-risk-change under the documented policy.

The PR only adds a new documentation page (red-teaming.mdx) and updates the docs navigation (vocs.config.tsx) — it does not modify authentication, secrets, database schemas, business logic, or external integrations. All changes are documentation and site configuration only, which fits the allowed low-risk categories.

This classification allows merging without manual review once all required CI checks are passing and branch protection rules are satisfied.

@Aryansharma28 merged commit 2992070 into main on Mar 10, 2026
7 checks passed
@Aryansharma28 deleted the docs/red-teaming branch on March 10, 2026 09:26