feat(security): implement skill content scanning with shared prompt injection detection by Aaronontheweb · Pull Request #408 · netclaw-dev/netclaw

Aaronontheweb · 2026-03-24T18:51:41Z

Summary

Closes #395

RegexPromptInjectionDetector — shared regex engine in Netclaw.Security with 22 [GeneratedRegex] patterns across 5 threat categories (prompt injection, data exfiltration, privilege escalation, destructive ops, invisible unicode). Reusable by any content trust boundary via IPromptInjectionDetector.DetectAsync().
RegexSkillContentScanner — trust-tier-aware adapter that delegates to the detector and applies policy: System/User bypass scanning, Community warns on medium/rejects high, External/Agent warns on low/rejects medium+.
DI wiring fix — Program.cs was hardcoding new NoOpSkillContentScanner() instead of resolving from DI. Moved skill tool registration to post-build following the existing ChannelToolRegistration pattern.
SkillManageTool — now passes SkillTrustTier through on create/edit/patch and surfaces Warning verdicts in tool response.

Risk matrix

Risk	Community	External/Agent
None	Allow	Allow
Low	Allow	Warn
Medium	Warn	Reject
High	Reject	Reject

Test plan

120 security tests pass (54 new across detector, scanner, DI, and no-op)
67 actor skill tests pass (existing, no regressions)
Daemon builds clean
dotnet slopwatch analyze — 0 violations

…skill content scanning (#395) Replace no-op stubs with real scanning infrastructure in Netclaw.Security. RegexPromptInjectionDetector provides 22 categorized patterns across 5 threat categories (prompt injection, data exfiltration, privilege escalation, destructive ops, invisible unicode) as shared infrastructure reusable by skills, webhooks, and any future content trust boundary. RegexSkillContentScanner applies trust-tier-aware policy: System/User tiers bypass scanning, Community warns on medium risk, External/Agent rejects on medium+. Fix daemon DI wiring to resolve scanner from the built service provider instead of hardcoding NoOpSkillContentScanner.

Preserve the scanner boundary and content-scan hardening work on a dedicated branch so dev stays clean while we reconcile it with the PR branch.

Aaronontheweb · 2026-03-26T00:53:25Z

Self-review

Tightened the scanner boundary so malformed skills are no longer silently skipped: startup and sync now surface degraded inventory explicitly, and the registry tracks rejected skill issues instead of quietly dropping them.
Closed the biggest bypasses in the original PR: skill_load, skill_read_resource, skill_manage patch, and resource-file write_file operations now run skill content scans, not just skill_manage write writes.
Raised the effective scan policy for model-driven skill mutations to at least Community tier so agent-authored skill_manage create and skill_manage edit flows do not inherit the permissive User-tier bypass.
Added frontmatter identity enforcement, duplicate-name rejection, canonical path and symlink checks, and staged system-skill sync replacement so partially unsafe feed updates do not replace the on-disk version.
Added targeted tests for scanner issues, blocked malicious resource reads and writes, detector failure handling, and sync rollback behavior; targeted security, actors, and daemon test slices all passed locally after reconciliation.

Remaining caveat

This is still a regex-based tripwire, not a full malicious-content or malware analysis system. It is materially harder to bypass now, but I still view it as an MVP guardrail for obvious prompt-injection, exfiltration, and privilege-escalation strings rather than a comprehensive skill security solution.

Address all findings from the PR #408 security review: - Remove silent NoOp fallback from SkillLoadTool and SkillReadResourceTool constructors — ISkillContentScanner is now required, not nullable. Aligns with "no silent fallbacks" constitution rule. - Elevate scan tier in SkillLoadTool and SkillReadResourceTool to Community minimum, matching SkillManageTool and SystemSkillSyncService. User-placed skills on disk are now scanned at load time. - Tighten false-positive-prone regex patterns: downgrade YouAreNowRegex from High to Medium (Community tier now warns instead of rejects), narrow ActAsRegex to "act as if you" to avoid false positives on legitimate skill instructions like "act as a code reviewer." - Fix exception message leakage in RegexSkillContentScanner — log full exception internally but return generic "content scanning failed" message to avoid leaking internal paths. - Add CachingSkillContentScanner decorator that caches scan results by content hash and trust tier, avoiding redundant regex scanning on repeated skill_load calls. Wired as the DI-registered ISkillContentScanner. - Document regex evasion limitations (homoglyphs, encoding indirection, synonyms, multi-file split, non-English) in RegexPromptInjectionDetector XML docs and TOCTOU race caveat in SkillScanner symlink check. All 195 security/skill tests pass including 4 new CachingSkillContentScanner tests and updated assertions for tightened patterns.

Local filesystem access implies far worse attack vectors than skill symlink manipulation — the comment was noise.

…tions) Keep version 1.2.0 from HEAD. Interleave both branches' new tests: RejectsFrontmatterNameMismatch from HEAD plus orphan recovery tests from dev.

Aaronontheweb added 2 commits March 24, 2026 18:51

feat(security): capture skill scanning hardening draft

ff71876

Preserve the scanner boundary and content-scan hardening work on a dedicated branch so dev stays clean while we reconcile it with the PR branch.

Aaronontheweb added 5 commits March 25, 2026 20:57

Merge branch 'dev' into claude-wt-skill-scanning

197ca0a

remove unnecessary TOCTOU comment from symlink check

e7fa7ee

Local filesystem access implies far worse attack vectors than skill symlink manipulation — the comment was noise.

Merge branch 'dev' into claude-wt-skill-scanning

4ea7c2a

merge: resolve conflicts with dev (skill-authoring version, test addi…

fbe9992

…tions) Keep version 1.2.0 from HEAD. Interleave both branches' new tests: RejectsFrontmatterNameMismatch from HEAD plus orphan recovery tests from dev.

Aaronontheweb marked this pull request as ready for review March 27, 2026 14:04

Aaronontheweb enabled auto-merge (squash) March 27, 2026 14:04

Aaronontheweb merged commit 301a54a into dev Mar 27, 2026
3 checks passed

Aaronontheweb deleted the claude-wt-skill-scanning branch March 27, 2026 14:08

Aaronontheweb mentioned this pull request Mar 30, 2026

prepare release 0.9.1 #486

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(security): implement skill content scanning with shared prompt injection detection#408

feat(security): implement skill content scanning with shared prompt injection detection#408
Aaronontheweb merged 7 commits into
devfrom
claude-wt-skill-scanning

Aaronontheweb commented Mar 24, 2026

Uh oh!

Aaronontheweb commented Mar 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Aaronontheweb commented Mar 24, 2026

Summary

Risk matrix

Test plan

Uh oh!

Aaronontheweb commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Self-review

Remaining caveat

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Aaronontheweb commented Mar 26, 2026 •

edited

Loading