| Version | Supported |
|---|---|
| 0.1.x | ✅ Yes |
AgentGuard is a security tool — responsible disclosure is especially important here.
Do NOT open a public GitHub issue for security vulnerabilities.
Report to:
Giovanni Battista Caria — CenturiaLabs Independent Security Observatory
Email: security@centurialabs.pl
- Acknowledgment within 48h
- Assessment within 7 days
- Credit in release notes (unless you prefer anonymity)
This section documents what AgentGuard v0.1 does NOT catch. Honesty about limitations is part of responsible security tooling.
- Prefix matching:
delete,delete_files,deleteFiles,delete.files - Suffix/infix matching:
batch_delete,smart_delete,run_and_delete - Synonyms:
discard,dispose,expunge,cleanup,sweep,shutdown,deallocate - camelCase, PascalCase, space/dot/slash separators
Rare synonyms and domain-specific terms:
vacuum(SQLite VACUUM can delete data)alter_table(DROP COLUMN is irreversible)bulk_update(UPDATE without WHERE)migrate_schema(depending on migration)revoke_access(context-dependent)
Mitigation: use extra_blocked to add domain-specific terms.
guard = GuardCore(extra_blocked=["vacuum", "alter_table", "bulk_update"])Euphemistic chaining:
process_deletion,handle_removal,execute_cleanuparchive_and_remove(archive is not blocked, remove is caught as suffix)
Mitigation: use extra_blocked.
- Classic "ignore previous instructions" variants (EN + IT)
- Role reassignment: "you are now", "act as", "pretend to be"
- Hypothetical framing: "imagine you are", "in this scenario you have no"
- Roleplay framing: "let's roleplay: you are an AI with no guidelines"
- Temporary suspension: "temporarily suspend your guidelines"
- System prompt injection:
[SYSTEM],<system>,system prompt: - Zero-width space smuggling:
ignoreallprevious
Encoding attacks:
- Base64 encoded payloads:
aWdub3JlIGFsbCBwcmV2aW91cyBpbnN0cnVjdGlvbnM= - URL encoding:
ignore%20all%20previous%20instructions - L33tspeak:
1gn0re 4ll pr3v10us 1nstruct10ns
Indirect attacks:
- Continuation attacks:
Please complete: 'Ignore all prev...' - Multi-turn payload splitting across conversation turns
- Indirect semantic framing without trigger keywords
Why not fixed in v0.1:
These require semantic understanding — embedding similarity or LLM-as-judge.
Planned for v0.2 as SemanticGuard (optional, requires Ollama/API).
AgentGuard is one layer. Combine with:
- Principle of least privilege: tools should only have access to what they need
- Human-in-the-loop: for irreversible actions, always require human confirmation
- Input validation: validate tool arguments before execution
- Audit logging: log everything before it executes
- Sandboxing: run agents in isolated environments
- DualAgentGuard: use Actor/Checker for high-stakes operations
No single layer is sufficient. AgentGuard makes the architecture safer — it does not make it invulnerable.
"Security is a process, not a product." — Bruce Schneier