proposal: ATR detection rules as a community safety plugin #5740

@eeee2345

ATR (Agent Threat Rules) is an MIT-licensed set of 344 regex detection rules for AI agent threats — prompt injection, tool poisoning, credential exfiltration, privilege escalation, and context manipulation. It ships as an npm package and a Python package (pyatr).

Production deployments: Microsoft Copilot SWE Agent (automated CVE detection loop), Cisco AI Defense (skill scanning pipeline), MISP/CIRCL (taxonomy merged by Alexandre Dulaunoy), OWASP Agentic Security Handbook.

The ADK BasePlugin architecture (cross-cutting policies, after_model_callback / before_tool_callback) is a natural fit for a safety plugin that evaluates agent inputs and tool outputs against this rule corpus.

Concrete proposal: an ATR plugin that hooks before_tool_callback to scan tool arguments against the rules and, optionally, blocks the call on critical- or high-severity matches. This is similar to how Guardrails AI or LiteLLM's guardrail hooks work, but backed by the community-maintained ATR rule corpus.
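
To make the proposal concrete, here is a rough, standalone sketch of the scan-and-block logic. It is not the real plugin: it deliberately avoids importing ADK or pyatr, the three rules are hypothetical stand-ins for the ATR corpus, and the names `Rule`, `scan`, and the simplified `before_tool_callback(tool_name, tool_args, block)` signature are assumptions made for illustration. The real hook would follow whatever signature ADK's BasePlugin defines.

```python
import re
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    severity: str  # e.g. "critical", "high", "medium", "low"
    pattern: re.Pattern


# Hypothetical stand-in rules; the real plugin would load the ATR corpus instead.
RULES = [
    Rule("demo-prompt-injection", "high",
         re.compile(r"ignore (all )?previous instructions", re.I)),
    Rule("demo-credential-path", "critical",
         re.compile(r"\.aws/credentials|id_rsa", re.I)),
    Rule("demo-base64-blob", "low",
         re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")),
]

BLOCKING_SEVERITIES = {"critical", "high"}


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found in a nested dict/list structure of tool arguments."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(key)
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple, set)):
        for item in value:
            yield from iter_strings(item)


def scan(tool_args: dict) -> list:
    """Return every rule whose pattern matches any string in the tool arguments."""
    return [
        rule
        for text in iter_strings(tool_args)
        for rule in RULES
        if rule.pattern.search(text)
    ]


def before_tool_callback(tool_name: str, tool_args: dict,
                         block: bool = True) -> Optional[dict]:
    """Simplified hook: return an error dict to block the call, or None to allow it."""
    blocking = [r for r in scan(tool_args) if r.severity in BLOCKING_SEVERITIES]
    if block and blocking:
        return {
            "error": "blocked by safety rules",
            "tool": tool_name,
            "rule_ids": [r.rule_id for r in blocking],
        }
    return None


result = before_tool_callback("shell", {"cmd": "cat ~/.aws/credentials"})
print(result)
```

With `block=False` the hook always returns None, which corresponds to the "optionally blocks" part of the proposal; a real plugin would presumably log the matches in that mode rather than discard them.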

I can write this as a community plugin if the ADK team would accept a PR under examples/community/ or a separate package. Two questions before starting:

  1. Is examples/community/ the right landing spot, or would a separate pip-installable package be preferred?
  2. Is pyatr (Python) or the npm package the preferred dependency surface for ADK integrations?

ATR repo: https://github.com/Agent-Threat-Rule/agent-threat-rules
