fix: prompt hardening — security, negative rules, tone (research-backed) #59
kienbui1995 merged 1 commit into main
…tion reorder

Based on research from Augment Code (11 techniques), Claude Code leak, and leaked prompts repo (134K stars):

1. Security section: prompt injection detection, no untrusted execution, no credential exposure
2. What NOT to Do: 7 negative rules (no write for small edits, no guess, no destructive commands, no modify tests, no silent deps, no repeat fails)
3. Output Format enhanced: confidence level, risks/side effects
4. Section reorder: Cost Awareness + Error Recovery moved to END (models pay most attention to beginning + end of prompt)

274 tests, 0 fail.
📝 Walkthrough
The system prompt in the CLI's main function was extended with new Security and "What NOT to Do" sections and an improved Output Format section. Duplicate guidance was removed, consolidating overlapping instructions into a single comprehensive prompt.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Code Review
This pull request updates the system prompt in mc-cli by adding new sections for Security, What NOT to Do, and Output Format, while reorganizing existing instructions. A review comment suggests consolidating the What NOT to Do section to remove redundancies with other parts of the prompt, which would help reduce token usage and improve clarity.
    ## What NOT to Do\n\
    - Do NOT use `write_file` to make small edits — use `edit_file` instead.\n\
    - Do NOT read entire large files — use offset/limit in `read_file`.\n\
    - Do NOT guess when requirements are unclear — use `ask_user`.\n\
    - Do NOT run destructive commands (rm -rf, drop table) without user confirmation.\n\
    - Do NOT modify test files unless explicitly asked.\n\
    - Do NOT install new dependencies without mentioning it first.\n\
    - Do NOT repeat a failed approach — try a different strategy.\n\n\
The 'What NOT to Do' section introduces several rules that are already covered in other sections, leading to significant redundancy. For example:
- Line 1877 is redundant with lines 1863 and 1893.
- Line 1878 is redundant with line 1895.
- Line 1879 is redundant with line 1871.
- Line 1883 is redundant with line 1897.
While negative constraints are useful, repeating the same instructions multiple times across different sections increases token usage and can lead to instruction fatigue for the model. Consider consolidating these into a single, clear instruction per topic. For instance, you could move the unique negative constraints (like destructive commands or test file modifications) here and keep the tool-specific ones in 'Tool Usage Guidelines'.
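One way to spot this kind of overlap before it reaches review is to group rules by the tool names they mention. The following is a minimal sketch, not code from mc-cli: `backticked_names` and `overlapping_tools` are hypothetical helpers, and the sketch assumes tool names appear in balanced backticks inside each rule string, as they do in the rules quoted above.

```rust
use std::collections::BTreeMap;

/// Extract names wrapped in backticks, e.g. `edit_file`.
/// Assumes backticks are balanced within the rule text.
fn backticked_names(rule: &str) -> Vec<&str> {
    // Splitting on '`' leaves the quoted spans at odd indices.
    rule.split('`')
        .enumerate()
        .filter(|(i, s)| i % 2 == 1 && !s.is_empty())
        .map(|(_, s)| s)
        .collect()
}

/// Map each tool name to the section titles whose rules mention it,
/// keeping only tools that show up in more than one section.
fn overlapping_tools<'a>(
    sections: &[(&'a str, &[&'a str])],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut seen: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(title, rules) in sections {
        for &rule in rules {
            for tool in backticked_names(rule) {
                let titles = seen.entry(tool).or_default();
                // Record each section once per tool, in first-seen order.
                if !titles.contains(&title) {
                    titles.push(title);
                }
            }
        }
    }
    // A tool mentioned in a single section is not a redundancy candidate.
    seen.retain(|_, titles| titles.len() > 1);
    seen
}
```

Passing the "What NOT to Do" rules alongside the rules of another section would return each tool named in both, which gives a short list of candidates to consolidate. It only catches overlap that shares a tool name; rules that repeat an idea in different words still need a human read.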
🧹 Nitpick comments (1)
mc/crates/mc-cli/src/main.rs (1)
1881-1881: Soften the absolute “no test edits” rule to avoid blocking required fixes

Line 1881 is currently absolute; this can prevent necessary test updates when behavior changes are implemented. Consider allowing test edits when strictly required, with explicit justification.
✏️ Proposed wording tweak
- - Do NOT modify test files unless explicitly asked.\n\
+ - Do NOT modify test files unless explicitly asked; if test changes are strictly required for correctness, do so and explain why.\n\
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mc/crates/mc-cli/src/main.rs` at line 1881, Update the hardline prohibition string "- Do NOT modify test files unless explicitly asked.\n" in main.rs to a softer message that allows test edits when necessary; change the text to indicate test modifications are permitted only with explicit justification and a brief note explaining why the change is required (e.g., "Do not modify tests unless strictly necessary — if a test must be updated, include an explicit justification and link to the relevant issue/PR"). Ensure you update the exact string literal where it's defined so help output and any related help/usage text reflect the new, permissive-but-justified policy.
📒 Files selected for processing (1)
mc/crates/mc-cli/src/main.rs
Research-backed prompt improvements
Sources: Augment Code 11 techniques, Claude Code leak analysis, 134K-star leaked prompts repo.
New sections
Reordered
Cost Awareness + Error Recovery moved to END of static prompt (Augment: 'models pay more attention to beginning and especially end')
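The reorder could also be expressed in code rather than by hand-editing one long string literal. The sketch below is an illustration only: the PR itself edits a static prompt string in `main.rs`, and the `Section` struct, its `pin_to_end` flag, and `render_prompt` are hypothetical names introduced here.

```rust
/// One titled block of the system prompt.
struct Section {
    title: &'static str,
    body: &'static str,
    /// Hypothetical flag: true for sections that should close the prompt.
    pin_to_end: bool,
}

/// Render sections under `## ` headings, moving pinned sections to the end
/// while keeping the relative order inside each group.
fn render_prompt(sections: &[Section]) -> String {
    // `partition` puts items matching the predicate in the first collection.
    let (tail, head): (Vec<&Section>, Vec<&Section>) =
        sections.iter().partition(|s| s.pin_to_end);
    head.into_iter()
        .chain(tail)
        .map(|s| format!("## {}\n{}\n", s.title, s.body))
        .collect::<Vec<_>>()
        .join("\n")
}
```

With Cost Awareness and Error Recovery flagged as pinned, they render last regardless of where they sit in the input slice, so adding a section later cannot accidentally push them back into the middle of the prompt.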
All 8 new rules
✅ Prompt injection detection
✅ No write_file for small edits
✅ No destructive commands without confirmation
✅ No modify tests unless asked
✅ No install deps silently
✅ No repeat failed approach
✅ State confidence level
✅ Mention risks/side effects
274 tests, 0 fail.