feat(security): implement skill content scanning with shared prompt injection detection#408
Merged
Merged
Conversation
…skill content scanning (#395) Replace no-op stubs with real scanning infrastructure in Netclaw.Security. RegexPromptInjectionDetector provides 22 categorized patterns across 5 threat categories (prompt injection, data exfiltration, privilege escalation, destructive ops, invisible unicode) as shared infrastructure reusable by skills, webhooks, and any future content trust boundary. RegexSkillContentScanner applies trust-tier-aware policy: System/User tiers bypass scanning, Community warns on medium risk, External/Agent rejects on medium+. Fix daemon DI wiring to resolve scanner from the built service provider instead of hardcoding NoOpSkillContentScanner.
Preserve the scanner boundary and content-scan hardening work on a dedicated branch so dev stays clean while we reconcile it with the PR branch.
Collaborator
Author
Self-review
Remaining caveat
|
Address all findings from the PR #408 security review: - Remove silent NoOp fallback from SkillLoadTool and SkillReadResourceTool constructors — ISkillContentScanner is now required, not nullable. Aligns with "no silent fallbacks" constitution rule. - Elevate scan tier in SkillLoadTool and SkillReadResourceTool to Community minimum, matching SkillManageTool and SystemSkillSyncService. User-placed skills on disk are now scanned at load time. - Tighten false-positive-prone regex patterns: downgrade YouAreNowRegex from High to Medium (Community tier now warns instead of rejects), narrow ActAsRegex to "act as if you" to avoid false positives on legitimate skill instructions like "act as a code reviewer." - Fix exception message leakage in RegexSkillContentScanner — log full exception internally but return generic "content scanning failed" message to avoid leaking internal paths. - Add CachingSkillContentScanner decorator that caches scan results by content hash and trust tier, avoiding redundant regex scanning on repeated skill_load calls. Wired as the DI-registered ISkillContentScanner. - Document regex evasion limitations (homoglyphs, encoding indirection, synonyms, multi-file split, non-English) in RegexPromptInjectionDetector XML docs and TOCTOU race caveat in SkillScanner symlink check. All 195 security/skill tests pass including 4 new CachingSkillContentScanner tests and updated assertions for tightened patterns.
Local filesystem access implies far worse attack vectors than skill symlink manipulation — the comment was noise.
…tions) Keep version 1.2.0 from HEAD. Interleave both branches' new tests: RejectsFrontmatterNameMismatch from HEAD plus orphan recovery tests from dev.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #395
RegexPromptInjectionDetector— shared regex engine inNetclaw.Securitywith 22[GeneratedRegex]patterns across 5 threat categories (prompt injection, data exfiltration, privilege escalation, destructive ops, invisible unicode). Reusable by any content trust boundary viaIPromptInjectionDetector.DetectAsync().RegexSkillContentScanner— trust-tier-aware adapter that delegates to the detector and applies policy: System/User bypass scanning, Community warns on medium/rejects high, External/Agent warns on low/rejects medium+.Program.cswas hardcodingnew NoOpSkillContentScanner()instead of resolving from DI. Moved skill tool registration to post-build following the existingChannelToolRegistrationpattern.SkillManageTool— now passesSkillTrustTierthrough on create/edit/patch and surfaces Warning verdicts in tool response.Risk matrix
Test plan
dotnet slopwatch analyze— 0 violations