RFC-ASA-001: 6 production implementation contributions from 6 independent packages by jyswee · Pull Request #1 · h-network/RFCs

jyswee · 2026-05-11T10:40:19Z

RFC-ASA-001 Contributions - Why This PR

Hi Halil,

After you shared the Asimov Safety Architecture with me, we implemented the full 4-layer safety model across our production systems. Six separate packages - JavaScript, Python, a FastAPI reference server, an MCP proxy, a CLI wrapper, and framework SDKs for LangChain and CrewAI. 204 end-to-end tests, all passing.

The architecture works. The dual-gate model (deterministic denylist + independent LLM judge) catches things neither layer would catch alone. That's a sound design.

Implementing it at production scale surfaced places where we had to build extra detail on top of the spec - things that only show up when you're processing real agent traffic across multiple tenants, multiple languages, and multiple deployment models. This PR captures 6 of those findings, offered as suggestions for the next version.

What we found

Layer 4 could benefit from concrete patterns. The RFC describes detecting "multi-command attack sequences" and leaves the specific sequences to implementors. We formalised 5 named patterns that kept appearing in production - reconnaissance-to-escalation, exfiltration sequences, brute-force loops, command spam, and credential harvesting. Each has formal detection rules, indicator sets, and example sequences. Gate 3 checks all 5 against a 200-entry history in about 5ms.
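To make the idea of a named pattern concrete, here is a minimal sketch of a reconnaissance-to-escalation check over a bounded history. The indicator sets, the `min_recon` threshold, and the function names are illustrative assumptions, not the rules from CONTRIBUTIONS.md; only the 200-entry history size comes from the text above.

```python
from collections import deque

# Hypothetical indicator sets, for illustration only.
RECON = ("whoami", "id", "uname", "netstat")
ESCALATION = ("sudo", "chmod", "chown")


def first_word(command: str) -> str:
    """Return the executable name of a shell-style command, or ''."""
    parts = command.split()
    return parts[0] if parts else ""


def detect_recon_to_escalation(history, min_recon=2):
    """True if at least `min_recon` recon commands precede an escalation."""
    recon_seen = 0
    for command in history:
        word = first_word(command)
        if word in RECON:
            recon_seen += 1
        elif word in ESCALATION and recon_seen >= min_recon:
            return True
    return False


# A bounded history: the oldest entries fall off once 200 are stored.
history = deque(maxlen=200)
for cmd in ["whoami", "uname -a", "ls", "sudo cat /etc/shadow"]:
    history.append(cmd)

flagged = detect_recon_to_escalation(history)
```

Each command in this sequence might pass a per-command gate on its own; it is the ordering that the sequence check picks up.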

Layer 4 benefits from state management. Without persistent state across commands, behavioral analysis is hard to implement reliably. We built a sliding window on Redis sorted sets (ZSET), using timestamps as scores. Windows are tiered (5 minutes by default, 1 hour for high-security deployments) and entries auto-expire. The window fails open if the state backend goes down - because the safety layer should never become a denial-of-service vector against the system it's protecting.
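The sliding-window behaviour can be sketched without a live backend. The example below is an in-memory stand-in that keeps entries sorted by timestamp, the same ordering a sorted set with timestamp scores provides; it is not the Redis implementation, and `SlidingWindow`, `safe_recent`, and their parameters are hypothetical names.

```python
import bisect
import time


class SlidingWindow:
    """In-memory stand-in for a sorted set scored by timestamp."""

    def __init__(self, window_seconds=300, max_entries=200):
        self.window_seconds = window_seconds
        self.max_entries = max_entries
        self._entries = []  # kept sorted as (timestamp, command)

    def add(self, command, now=None):
        now = time.time() if now is None else now
        bisect.insort(self._entries, (now, command))
        self._expire(now)

    def _expire(self, now):
        # Drop entries older than the window, then cap the total size.
        cutoff = now - self.window_seconds
        idx = bisect.bisect_left(self._entries, (cutoff, ""))
        del self._entries[:idx]
        del self._entries[:-self.max_entries]

    def recent(self, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        return [command for _, command in self._entries]


def safe_recent(window, now=None):
    """Fail open: a broken state backend yields an empty history."""
    try:
        return window.recent(now)
    except Exception:
        return []
```

Returning an empty history on failure means behavioral checks find nothing to flag, so evaluation continues on the remaining gates instead of blocking all traffic.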

Layer 3 could use a rule language. The RFC describes "who should access what" and leaves the rule format open. We found ourselves needing something concrete, so we implemented glob-pattern matching with a three-role hierarchy (architect bypasses scope, expert must match rules, observer can't execute). Default-deny - empty scope means no access.
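A minimal sketch of these semantics, using the standard library's `fnmatch` for glob matching. The role names come from the text above; the function name `is_allowed` and the target strings are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def is_allowed(role, target, scope):
    """Evaluate a scope decision for one role against one target.

    architect: bypasses scope entirely.
    expert:    must match at least one glob pattern in `scope`.
    observer:  can never execute (as can any unknown role).
    """
    if role == "architect":
        return True
    if role == "expert":
        # An empty scope matches nothing, which gives default-deny.
        return any(fnmatchcase(target, pattern) for pattern in scope)
    return False


decisions = {
    "architect_no_scope": is_allowed("architect", "db/prod", []),
    "expert_match": is_allowed("expert", "web/staging-1", ["web/staging-*"]),
    "expert_miss": is_allowed("expert", "db/prod", ["web/staging-*"]),
    "expert_empty": is_allowed("expert", "web/staging-1", []),
    "observer_wildcard": is_allowed("observer", "web/staging-1", ["*"]),
}
```

Treating unknown roles the same as observer keeps the default-deny property even when a role string is misspelled or missing.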

HMAC covers integrity - we found content-level checks complement it well. The RFC specifies HMAC-SHA256 for message integrity, which solves tampering. In production we hit a case where a compromised agent sent a validly signed message containing sensitive data. A valid signature on api_key=sk_live_... is still a security breach. We added content-level policy checking - 10 sensitive data patterns matched on every message, independent of HMAC verification.
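The gap between integrity and content can be shown in a few lines. This sketch uses Python's standard `hmac` module for HMAC-SHA256 and two example patterns; the pattern set and all function names are assumptions, not the 10 patterns from the implementation.

```python
import hashlib
import hmac
import re

# Two illustrative patterns only; the real list is not reproduced here.
SENSITIVE_PATTERNS = {
    "live_api_key": re.compile(r"sk_live_[A-Za-z0-9]+"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def sign(key: bytes, message: bytes) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_signature(key: bytes, message: bytes, signature: str) -> bool:
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(sign(key, message), signature)


def content_violations(message: bytes):
    """Return the names of sensitive patterns found in the message."""
    text = message.decode("utf-8", errors="replace")
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]


def accept(key: bytes, message: bytes, signature: str) -> bool:
    """Both checks must pass; neither implies the other."""
    return (verify_signature(key, message, signature)
            and not content_violations(message))


key = b"shared-secret"
leak = b"api_key=sk_live_abc123"
leak_sig = sign(key, leak)
```

Here `verify_signature(key, leak, leak_sig)` succeeds because the message is untampered, while `accept` rejects it because the content check runs independently.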

Implementors would benefit from performance baselines. How fast should each gate be? What's achievable? We found ourselves wanting reference numbers during implementation, so we captured them: Gate 1 at ~1 microsecond (769K evals/sec, pure regex), Gate 3 at about 5ms, full pipeline at around 185ms.
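For implementors who want to produce comparable numbers, a throughput measurement for a regex-only gate can be as small as the sketch below. The three denylist patterns and the function names are hypothetical, and results depend on hardware, pattern count, and input mix, so this will not reproduce the figures quoted above.

```python
import re
import time

# Hypothetical denylist; the implementation described above uses 40 patterns.
DENYLIST = [re.compile(p) for p in (r"rm\s+-rf\s+/", r"mkfs\.", r":\(\)\s*\{")]


def gate1_blocked(command: str) -> bool:
    """Deterministic gate: block if any denylist pattern matches."""
    return any(pattern.search(command) for pattern in DENYLIST)


def evals_per_second(commands, repeats=1000):
    """Time repeated evaluations and return evaluations per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        for command in commands:
            gate1_blocked(command)
    elapsed = time.perf_counter() - start
    return (repeats * len(commands)) / elapsed


sample = ["ls -la", "rm -rf /", "git status", "mkfs.ext4 /dev/sda1"]
rate = evals_per_second(sample)
print(f"{rate:,.0f} evals/sec")
```

Dividing one by the reported rate gives the mean per-evaluation latency, which is the more useful figure when budgeting a gate inside a larger pipeline.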

Safety events benefit from a delivery specification. The RFC defines the evaluation pipeline (command in, verdict out) but the output is returned to the caller only. We found that SOC teams want to monitor agent safety independently, so we defined 5 core safety event types in OCSF format - evaluation allowed/blocked, behavioral pattern detected, scope violation, and content policy violation.
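As a rough illustration of an event output channel, the sketch below builds one JSON record per safety event and writes it to a sink. The five event types mirror the list above; the record layout is only loosely modelled on OCSF-style attributes and has not been validated against the OCSF schema, and `build_event`, `emit`, and the product name are hypothetical.

```python
import json
import time
from enum import Enum


class SafetyEventType(Enum):
    EVALUATION_ALLOWED = "evaluation_allowed"
    EVALUATION_BLOCKED = "evaluation_blocked"
    BEHAVIORAL_PATTERN_DETECTED = "behavioral_pattern_detected"
    SCOPE_VIOLATION = "scope_violation"
    CONTENT_POLICY_VIOLATION = "content_policy_violation"


def build_event(event_type, agent_id, command, now_ms=None):
    """Build one event record; field layout is an assumption."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        "time": now_ms,
        "message": f"{event_type.value}: {command}",
        "metadata": {"product": {"name": "asa-pipeline"}},  # hypothetical
        "unmapped": {
            "event_type": event_type.value,
            "agent_id": agent_id,
            "command": command,
        },
    }


def emit(event, sink):
    """Write the event as one JSON line to any file-like sink."""
    sink.write(json.dumps(event) + "\n")
```

Writing newline-delimited JSON to a sink keeps delivery separate from the verdict returned to the caller, so a monitoring team can consume events without sitting in the evaluation path.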

What we're NOT doing

We're not proposing a competing architecture. We're not changing the dual-gate model. We're not modifying the 4-layer structure. Every contribution fills in detail within your existing design.

The behavioral patterns slot into Layer 4. The scope rules slot into Layer 3. The content policy complements HMAC in the messaging layer. The event schema adds an output channel the pipeline doesn't currently have. The benchmarks go in an appendix.

The numbers

6 implementations across 3 languages (JS, Python, Bash)
204 E2E tests, 100% pass
Gate 1: ~1 microsecond, 769K evals/sec, 40 patterns, 0% false positives
Gate 3: ~5ms, 5 named patterns, 200-entry sliding window
Full pipeline: ~185ms (all gates)

All details in the attached CONTRIBUTIONS.md.

Looking forward to your feedback.

Joe Wee
Tyga.Cloud Ltd

…ckages

6 contributions based on implementing the full Asimov Safety Architecture across 6 packages (JS, Python/FastAPI, MCP proxy, CLI, framework SDKs) with 204 E2E tests passing:

1. Named Behavioral Patterns (ASA-B-001 to ASA-B-005) - formal detection rules for recon-escalation, exfiltration, brute-force, spam, credential harvest
2. Sliding Window State Management - Redis ZSET reference implementation for cross-command behavioral analysis with tiered windows
3. Scope Rule Language - glob-pattern matching with architect/expert/observer role hierarchy and default-deny semantics
4. Channel Content Policy - 10 sensitive data patterns complementing HMAC integrity with content-level inspection
5. Production Benchmarks - Gate 1: ~1us/769K evals/sec, Gate 3: ~5ms, full pipeline: ~185ms, 0% false positives
6. Safety Event Schema - 5 core OCSF-compatible event types for external observability (allow, block, behavioral, scope, content)

jyswee wants to merge 1 commit into h-network:main from jyswee:rfc-asa-001-production-contributions