
RFC-ASA-001: 6 production implementation contributions from 6 independent packages #1

Open

jyswee wants to merge 1 commit into h-network:main from jyswee:rfc-asa-001-production-contributions


jyswee commented May 11, 2026

RFC-ASA-001 Contributions - Why This PR

Hi Halil,

After you shared the Asimov Safety Architecture with me, we implemented the full 4-layer safety model across our production systems in six separate packages: JavaScript, Python, a FastAPI reference server, an MCP proxy, a CLI wrapper, and framework SDKs for LangChain and CrewAI. They are covered by 204 end-to-end tests, all passing.

The architecture works. The dual-gate model (deterministic denylist + independent LLM judge) catches things neither layer would catch alone. That's a sound design.
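A minimal Python sketch of how the two gates compose, assuming a hypothetical three-regex denylist (not the 40 production patterns) and a stubbed callable in place of the LLM judge. In this sketch the judge is only consulted when the deterministic gate passes; that ordering is one possible choice, not necessarily the RFC's.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical Gate 1 denylist: three illustrative patterns, not the production set.
DENYLIST = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),          # recursive delete of the filesystem root
    re.compile(r"\bmkfs(\.\w+)?\b"),              # reformatting a filesystem
    re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"),   # piping a download straight into a shell
]


@dataclass
class Verdict:
    allowed: bool
    gate: str
    reason: str


def gate1_denylist(command: str) -> Optional[Verdict]:
    """Deterministic gate: block on the first denylist match, otherwise defer."""
    for pattern in DENYLIST:
        if pattern.search(command):
            return Verdict(False, "gate1", "denylist match: " + pattern.pattern)
    return None


def evaluate(command: str, judge: Callable[[str], bool]) -> Verdict:
    """Dual-gate evaluation: a command is allowed only if both gates pass."""
    blocked = gate1_denylist(command)
    if blocked is not None:
        return blocked
    # `judge` stands in for the independent LLM call; True means "looks safe".
    if not judge(command):
        return Verdict(False, "gate2", "LLM judge rejected the command")
    return Verdict(True, "none", "passed both gates")
```

For example, with `judge = lambda cmd: "secret" not in cmd`, `evaluate("rm -rf /", judge)` is blocked by `gate1` while `evaluate("cat secret.txt", judge)` passes the denylist and is blocked by `gate2`, which is the "neither layer alone" property in miniature.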

Implementing it at production scale revealed areas where we had to build additional detail on top of the spec - things that only surface when you're processing real agent traffic across multiple tenants, multiple languages, and multiple deployment models. This PR captures 6 of those findings, offered as suggestions for the next version.

What we found

Layer 4 could benefit from concrete patterns. The RFC describes detecting "multi-command attack sequences" and leaves the specific sequences to implementors. We formalised 5 named patterns that kept appearing in production - reconnaissance-to-escalation, exfiltration sequences, brute-force loops, command spam, and credential harvesting. Each has formal detection rules, indicator sets, and example sequences. Gate 3 checks all 5 against a 200-entry history in about 5ms.
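The formal rules for ASA-B-001 to ASA-B-005 live in CONTRIBUTIONS.md and are not reproduced here, so the indicator regexes below are hypothetical. This is a minimal sketch of one way to express a named pattern: an ordered series of indicator stages that must each appear, in order, somewhere in a bounded 200-entry history.

```python
import re
from collections import deque


class SequencePattern:
    """A named pattern: indicator stages that must match in order across the history."""

    def __init__(self, pattern_id, stages):
        self.pattern_id = pattern_id
        # Each stage is a list of regexes; any one of them satisfies the stage.
        self.stages = [[re.compile(rx) for rx in stage] for stage in stages]

    def matches(self, history):
        stage = 0
        for command in history:
            if any(rx.search(command) for rx in self.stages[stage]):
                stage += 1
                if stage == len(self.stages):
                    return True
        return False


# Hypothetical indicator sets for two of the five patterns.
PATTERNS = [
    SequencePattern("ASA-B-001", [  # reconnaissance followed by escalation
        [r"\bwhoami\b", r"\buname\s+-a\b", r"cat\s+/etc/passwd"],
        [r"\bsudo\b", r"\bchmod\s+u\+s\b"],
    ]),
    SequencePattern("ASA-B-002", [  # staging data, then sending it out
        [r"\b(tar|zip)\b"],
        [r"\b(curl|wget|scp|nc)\b"],
    ]),
]

history = deque(maxlen=200)  # bounded history; the oldest entries fall off automatically


def record_and_check(command):
    """Append a command and return the ids of every pattern the history now satisfies."""
    history.append(command)
    return [p.pattern_id for p in PATTERNS if p.matches(history)]
```

Rescanning the whole history on every command costs O(patterns × history), which stays small at 200 entries. Once a sequence has matched, the pattern keeps reporting until its earliest indicator ages out of the window, so a real gate would likely deduplicate alerts.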

Layer 4 benefits from state management. Without persistent state across commands, behavioral analysis becomes difficult to implement reliably. We built a sliding window on sorted sets scored by timestamp, with tiered windows (5 minutes by default, 1 hour for high-security) and automatic expiry. The window fails open if the state backend goes down, because the safety layer should never become a denial-of-service vector against the system it's protecting.
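To make the sorted-set mechanics concrete without a Redis server, here is a minimal in-memory sketch that mimics a timestamp-scored sorted set with `bisect`. The class and helper names are hypothetical; the docstring notes which Redis commands each method would correspond to, and the fail-open wrapper returns an empty history when the backend raises.

```python
import bisect
import time
from typing import List, Optional

# Tiered window lengths in seconds (5 minutes default, 1 hour high-security).
WINDOW_SECONDS = {"default": 300, "high_security": 3600}


class SlidingWindow:
    """In-memory stand-in for a per-agent sorted set scored by timestamp.

    Against Redis, add() would map to ZADD and recent() to ZREMRANGEBYSCORE
    (drop scores <= now - window) followed by ZRANGEBYSCORE.
    """

    def __init__(self, window_seconds: int = WINDOW_SECONDS["default"]):
        self.window_seconds = window_seconds
        self._scores: List[float] = []  # timestamps, kept sorted
        self._members: List[str] = []   # commands, aligned with _scores

    def add(self, command: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        i = bisect.bisect_right(self._scores, now)
        self._scores.insert(i, now)
        self._members.insert(i, command)

    def recent(self, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        # Auto-expiry: discard everything scored at or before the cutoff.
        cut = bisect.bisect_right(self._scores, now - self.window_seconds)
        del self._scores[:cut]
        del self._members[:cut]
        return list(self._members)


def history_or_empty(window, now: Optional[float] = None) -> List[str]:
    """Fail open: if the state backend errors, analyse with an empty history instead of blocking."""
    try:
        return window.recent(now)
    except Exception:
        return []
```

One caveat when moving this to Redis: sorted-set members are unique, so adding an identical command string again only updates its score. Members need a distinguishing suffix (for example a request id) to keep repeated commands as separate entries.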

Layer 3 could use a rule language. The RFC describes "who should access what" and leaves the rule format open. We found ourselves needing something concrete, so we implemented glob-pattern matching with a three-role hierarchy (architect bypasses scope, expert must match rules, observer can't execute). Default-deny - empty scope means no access.
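A minimal sketch of the three-role check with glob matching, assuming scope rules are plain glob strings over a hypothetical `env/service/action` target naming scheme. It uses the standard library's `fnmatchcase` so matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def can_execute(role: str, target: str, scope: Sequence[str]) -> bool:
    """Decide whether `role` may execute against `target` under glob `scope` rules."""
    if role == "architect":
        return True  # architects bypass scope entirely
    if role == "expert":
        # Experts need at least one matching glob; an empty scope matches nothing (default-deny).
        return any(fnmatchcase(target, rule) for rule in scope)
    # Observers cannot execute, and unrecognised roles fall through to deny.
    return False
```

Note that in `fnmatch` globs `*` also matches `/`, so a rule like `staging/*` covers every depth below `staging/`. A rule language that needs path-segment semantics would have to add its own `**` handling.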

HMAC covers integrity - we found content-level checks complement it well. The RFC specifies HMAC-SHA256 for message integrity, which solves tampering. In production we hit a case where a compromised agent validly signed a message containing sensitive data: a valid signature on api_key=sk_live_... is still a security breach. We added content-level policy checking - 10 sensitive data patterns matched on every message, independent of HMAC verification.
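A minimal sketch of running both checks side by side, using Python's standard `hmac` module for HMAC-SHA256 and a hypothetical three-entry subset in place of the 10 production patterns. A message is delivered only when the signature verifies and no sensitive pattern matches.

```python
import hashlib
import hmac
import re

# Hypothetical subset of sensitive-data patterns; the production list of 10 is not shown here.
SENSITIVE_PATTERNS = {
    "sk_live_secret_key": re.compile(r"\bsk_live_[0-9A-Za-z]{8,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def sign(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 signature as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def inspect(key: bytes, message: bytes, signature: str):
    """Return (integrity_ok, violations); the two checks are independent."""
    integrity_ok = hmac.compare_digest(sign(key, message), signature)  # constant-time compare
    text = message.decode("utf-8", errors="replace")
    violations = [name for name, rx in SENSITIVE_PATTERNS.items() if rx.search(text)]
    return integrity_ok, violations


def deliver(key: bytes, message: bytes, signature: str) -> bool:
    """Deliver only if the message is both authentic and free of sensitive data."""
    integrity_ok, violations = inspect(key, message, signature)
    return integrity_ok and not violations
```

The case from the paragraph above is the first one this catches: a message signed with the correct key still fails `deliver` when its body contains a live key.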

Implementors would benefit from performance baselines. How fast should each gate be? What's achievable? We found ourselves wanting reference numbers during implementation, so we captured them: Gate 1 at ~1 microsecond (769K evals/sec, pure regex), Gate 3 at about 5ms, full pipeline at around 185ms.
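For implementors who want to reproduce a Gate 1 style number on their own hardware, here is a minimal sketch of a throughput harness around a hypothetical single combined denylist regex (the production gate uses 40 patterns). It reports both evaluations per second and microseconds per evaluation.

```python
import re
import time

# Hypothetical combined denylist; the production Gate 1 uses 40 patterns.
DENY = re.compile(r"\brm\s+-rf\s+/(\s|$)|\bmkfs\b|\bdd\s+if=/dev/zero\b")


def gate1(command: str) -> bool:
    """Return True when the command should be blocked."""
    return DENY.search(command) is not None


def throughput(fn, inputs, rounds=100_000):
    """Time `rounds` calls of `fn`, cycling through `inputs`.

    Returns (evaluations per second, microseconds per evaluation).
    """
    start = time.perf_counter()
    for i in range(rounds):
        fn(inputs[i % len(inputs)])
    elapsed = time.perf_counter() - start
    return rounds / elapsed, elapsed / rounds * 1e6


if __name__ == "__main__":
    rate, micros = throughput(gate1, ["ls -la", "rm -rf /", "git status"])
    print(f"{rate:,.0f} evals/sec, {micros:.2f} us/eval")
```

The two figures are reciprocals: 769K evals/sec works out to about 1.3 µs per evaluation, in line with the "~1 microsecond" figure. Loop and indexing overhead is included in the timing, so this harness slightly understates raw regex speed; the standard `timeit` module is a more careful alternative.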

Safety events benefit from a delivery specification. The RFC defines the evaluation pipeline (command in, verdict out) but the output is returned to the caller only. We found that SOC teams want to monitor agent safety independently, so we defined 5 core safety event types in OCSF format - evaluation allowed/blocked, behavioral pattern detected, scope violation, and content policy violation.
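A minimal sketch of building the five event types as OCSF-shaped dictionaries. The attribute names (`category_uid`, `class_uid`, `activity_id`, `type_uid`, `severity_id`, `time`, `metadata`, `unmapped`) follow the OCSF base event, and `type_uid` uses OCSF's `class_uid * 100 + activity_id` rule. The class, category, and activity numbers, the schema version, and the product name are placeholders for illustration, not values assigned by OCSF or taken from CONTRIBUTIONS.md.

```python
import json
import time

# Placeholder ids for illustration only; they are NOT values assigned by the OCSF schema.
SAFETY_CLASS_UID = 99001
SAFETY_CATEGORY_UID = 99
SAFETY_ACTIVITIES = {
    "evaluation_allowed": 1,
    "evaluation_blocked": 2,
    "behavioral_pattern_detected": 3,
    "scope_violation": 4,
    "content_policy_violation": 5,
}


def safety_event(kind: str, severity_id: int, message: str, **details) -> dict:
    """Build an OCSF-shaped event dict for one of the five safety event types."""
    activity_id = SAFETY_ACTIVITIES[kind]
    return {
        "category_uid": SAFETY_CATEGORY_UID,
        "class_uid": SAFETY_CLASS_UID,
        "activity_id": activity_id,
        # OCSF derives type_uid as class_uid * 100 + activity_id
        "type_uid": SAFETY_CLASS_UID * 100 + activity_id,
        "severity_id": severity_id,  # OCSF scale, e.g. 1 = Informational, 4 = High
        "time": int(time.time() * 1000),  # epoch milliseconds
        "message": message,
        "metadata": {"version": "1.1.0", "product": {"name": "asa-safety-gate"}},  # assumed values
        "unmapped": details,  # gate-specific fields with no OCSF attribute
    }


def emit(event: dict) -> str:
    """Serialise for an external sink; printing stands in for a SIEM/SOC forwarder."""
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line
```

Emitting a JSON line per verdict gives SOC tooling a channel that does not depend on the caller, which is the gap the paragraph above describes.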

What we're NOT doing

We're not proposing a competing architecture. We're not changing the dual-gate model. We're not modifying the 4-layer structure. Every contribution fills in detail within your existing design.

The behavioral patterns slot into Layer 4. The scope rules slot into Layer 3. The content policy complements HMAC in the messaging layer. The event schema adds an output channel the pipeline doesn't currently have. The benchmarks go in an appendix.

The numbers

  • 6 implementations across 3 languages (JS, Python, Bash)
  • 204 E2E tests, 100% pass
  • Gate 1: ~1 microsecond, 769K evals/sec, 40 patterns, 0% false positives
  • Gate 3: ~5ms, 5 named patterns, 200-entry sliding window
  • Full pipeline: ~185ms (all gates)

Full details are in the attached CONTRIBUTIONS.md.

Looking forward to your feedback.

Joe Wee
Tyga.Cloud Ltd


6 contributions based on implementing the full Asimov Safety Architecture
across 6 packages (JS, Python/FastAPI, MCP proxy, CLI, framework SDKs)
with 204 E2E tests passing:

1. Named Behavioral Patterns (ASA-B-001 to ASA-B-005) - formal detection
   rules for recon-escalation, exfiltration, brute-force, spam, credential harvest
2. Sliding Window State Management - Redis ZSET reference implementation
   for cross-command behavioral analysis with tiered windows
3. Scope Rule Language - glob-pattern matching with architect/expert/observer
   role hierarchy and default-deny semantics
4. Channel Content Policy - 10 sensitive data patterns complementing HMAC
   integrity with content-level inspection
5. Production Benchmarks - Gate 1: ~1 µs (769K evals/sec), Gate 3: ~5ms,
   full pipeline: ~185ms, 0% false positives
6. Safety Event Schema - 5 core OCSF-compatible event types for external
   observability (allow, block, behavioral, scope, content)