RFC-ASA-001: 6 production implementation contributions from 6 independent packages #1
Open
jyswee wants to merge 1 commit into
…ckages

6 contributions based on implementing the full Asimov Safety Architecture across 6 packages (JS, Python/FastAPI, MCP proxy, CLI, framework SDKs) with 204 E2E tests passing:

1. Named Behavioral Patterns (ASA-B-001 to ASA-B-005) - formal detection rules for recon-escalation, exfiltration, brute-force, spam, credential harvest
2. Sliding Window State Management - Redis ZSET reference implementation for cross-command behavioral analysis with tiered windows
3. Scope Rule Language - glob-pattern matching with architect/expert/observer role hierarchy and default-deny semantics
4. Channel Content Policy - 10 sensitive data patterns complementing HMAC integrity with content-level inspection
5. Production Benchmarks - Gate 1: ~1 µs (769K evals/sec), Gate 3: ~5 ms, full pipeline: ~185 ms, 0% false positives
6. Safety Event Schema - 5 core OCSF-compatible event types for external observability (allow, block, behavioral, scope, content)
RFC-ASA-001 Contributions - Why This PR
Hi Halil,
After you shared the Asimov Safety Architecture with me, we implemented the full 4-layer safety model across our production systems. Six separate packages - JavaScript, Python, a FastAPI reference server, an MCP proxy, a CLI wrapper, and framework SDKs for LangChain and CrewAI. 204 end-to-end tests, all passing.
The architecture works. The dual-gate model (deterministic denylist + independent LLM judge) catches things neither layer would catch alone. That's a sound design.
Implementing it at production scale revealed areas where we found ourselves building additional detail on top of the spec - things that only surface when you're processing real agent traffic across multiple tenants, multiple languages, and multiple deployment models. This PR captures 6 of those findings, offered as suggestions for the next version.
What we found
Layer 4 could benefit from concrete patterns. The RFC describes detecting "multi-command attack sequences" and leaves the specific sequences to implementors. We formalised 5 named patterns that kept appearing in production - reconnaissance-to-escalation, exfiltration sequences, brute-force loops, command spam, and credential harvesting. Each has formal detection rules, indicator sets, and example sequences. Gate 3 checks all 5 against a 200-entry history in about 5ms.
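As a rough illustration of what one named pattern could look like, here is a minimal sketch of a reconnaissance-to-escalation check over a bounded command history. The indicator sets, the `min_recon` threshold, and the function names are assumptions made for this example; the actual ASA-B-001 rules are the ones in CONTRIBUTIONS.md.

```python
from typing import List

# Hypothetical indicator sets, chosen only for illustration.
RECON_INDICATORS = {"whoami", "id", "uname", "netstat", "ps"}
ESCALATION_INDICATORS = {"sudo", "su", "chmod", "chown"}

HISTORY_LIMIT = 200  # mirrors the 200-entry history mentioned above


def first_token(command: str) -> str:
    """Return the executable name of a shell-style command, or '' if empty."""
    parts = command.strip().split()
    return parts[0] if parts else ""


def detect_recon_escalation(history: List[str], min_recon: int = 3) -> bool:
    """Flag a history where at least `min_recon` recon commands
    are later followed by an escalation command."""
    recon_seen = 0
    for command in history[-HISTORY_LIMIT:]:
        token = first_token(command)
        if token in RECON_INDICATORS:
            recon_seen += 1
        elif token in ESCALATION_INDICATORS and recon_seen >= min_recon:
            return True
    return False
```

The check is order-sensitive: an escalation command that appears before any reconnaissance does not trigger it, which is the property that separates a sequence rule from a per-command denylist.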
Layer 4 benefits from state management. Without persistent state across commands, behavioral analysis becomes difficult to implement reliably. We built a sliding window using sorted sets with timestamp scores. Tiered windows (5 minutes default, 1 hour for high-security). Auto-expiry. Fail-open if the state backend goes down - because the safety layer should never become a denial-of-service vector against the system it's protecting.
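The sliding window described above can be sketched with an in-memory stand-in for a sorted set scored by timestamp. This is not the production implementation: the class name, the tier names, and the `history_or_fail_open` helper are hypothetical, and a real deployment would issue the equivalent sorted-set operations against the state backend.

```python
import bisect
import time
from typing import List, Optional

# Tier lengths from the description above: 5 minutes default, 1 hour high-security.
WINDOW_TIERS = {"default": 300.0, "high_security": 3600.0}


class SlidingWindow:
    """In-memory stand-in for a sorted set whose scores are timestamps."""

    def __init__(self, window_seconds: float = WINDOW_TIERS["default"],
                 max_entries: int = 200):
        self.window_seconds = window_seconds
        self.max_entries = max_entries
        self._scores = []   # timestamps, kept sorted ascending
        self._members = []  # commands, parallel to _scores

    def add(self, command: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        idx = bisect.bisect_right(self._scores, now)
        self._scores.insert(idx, now)
        self._members.insert(idx, command)
        self._expire(now)

    def _expire(self, now: float) -> None:
        # Drop entries older than the window, then cap the total size.
        cutoff = now - self.window_seconds
        drop = bisect.bisect_left(self._scores, cutoff)
        drop = max(drop, len(self._scores) - self.max_entries)
        del self._scores[:drop]
        del self._members[:drop]

    def recent(self, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        self._expire(now)
        return list(self._members)


def history_or_fail_open(window, now: Optional[float] = None) -> List[str]:
    """Return the window contents, or an empty history if the backend fails."""
    try:
        return window.recent(now)
    except Exception:
        return []  # fail open: behavioral analysis is skipped, traffic is not blocked
```

Returning an empty history on backend failure means the behavioral gate has nothing to match against, so evaluation continues on the remaining layers instead of rejecting every command.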
Layer 3 could use a rule language. The RFC describes "who should access what" and leaves the rule format open. We found ourselves needing something concrete, so we implemented glob-pattern matching with a three-role hierarchy (architect bypasses scope, expert must match rules, observer can't execute). Default-deny - empty scope means no access.
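A minimal sketch of those semantics, using Python's standard `fnmatch` module for the glob matching. The function name and the choice to match against the full command string are assumptions for illustration; the rule format in CONTRIBUTIONS.md may differ.

```python
from fnmatch import fnmatchcase
from typing import List


def is_allowed(role: str, command: str, scope: List[str]) -> bool:
    """Apply the three-role hierarchy with default-deny scope rules."""
    if role == "architect":
        return True   # architect bypasses scope entirely
    if role == "observer":
        return False  # observer can never execute
    if role == "expert":
        # Expert must match at least one glob; an empty scope matches nothing.
        return any(fnmatchcase(command, pattern) for pattern in scope)
    return False      # unknown roles are denied as well
```

For example, `is_allowed("expert", "git status", ["git *"])` passes, while the same call with an empty scope list is denied.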
HMAC covers integrity - we found content-level checks complement it well. The RFC specifies HMAC-SHA256 for message integrity, which solves tampering. In production we hit a case where a compromised agent legitimately signed a message containing sensitive data. A valid signature on api_key=sk_live_... is still a security breach. We added content-level policy checking - 10 sensitive data patterns matched on every message, independent of HMAC verification.
Implementors would benefit from performance baselines. How fast should each gate be? What's achievable? We found ourselves wanting reference numbers during implementation, so we captured them: Gate 1 at ~1 microsecond (769K evals/sec, pure regex), Gate 3 at about 5ms, full pipeline at around 185ms.
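To make the HMAC-versus-content distinction concrete, the sketch below runs a content check independently of signature verification, so a correctly signed message can still be rejected. The three patterns and all function names are placeholders for illustration; they are not the 10 patterns from the contribution.

```python
import hashlib
import hmac
import re
from typing import List

# Hypothetical pattern set, standing in for the real policy.
SENSITIVE_PATTERNS = {
    "api_key": re.compile(r"api_key\s*=\s*\S+", re.IGNORECASE),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def sign(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 signature as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_signature(key: bytes, message: bytes, signature: str) -> bool:
    """Constant-time comparison against the expected signature."""
    return hmac.compare_digest(sign(key, message), signature)


def content_violations(message: bytes) -> List[str]:
    """Names of every sensitive-data pattern found in the message."""
    text = message.decode("utf-8", errors="replace")
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]


def accept_message(key: bytes, message: bytes, signature: str) -> bool:
    """A message must pass both the integrity check and the content policy."""
    return verify_signature(key, message, signature) and not content_violations(message)
```

Keeping `content_violations` separate from `verify_signature` lets each check be logged and reported on its own, which matters when the signer itself is the compromised party.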
Safety events benefit from a delivery specification. The RFC defines the evaluation pipeline (command in, verdict out) but the output is returned to the caller only. We found that SOC teams want to monitor agent safety independently, so we defined 5 core safety event types in OCSF format - evaluation allowed/blocked, behavioral pattern detected, scope violation, and content policy violation.
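One way to picture the five event types is as a small typed record that can be serialized and shipped to an external monitor. The field names below are illustrative only and have not been mapped to specific OCSF classes or attributes; the authoritative schema is the one in CONTRIBUTIONS.md.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from enum import Enum


class SafetyEventType(str, Enum):
    """The five core event types listed above."""
    EVALUATION_ALLOWED = "evaluation_allowed"
    EVALUATION_BLOCKED = "evaluation_blocked"
    BEHAVIORAL_PATTERN_DETECTED = "behavioral_pattern_detected"
    SCOPE_VIOLATION = "scope_violation"
    CONTENT_POLICY_VIOLATION = "content_policy_violation"


@dataclass
class SafetyEvent:
    # Hypothetical fields; a real OCSF mapping would define its own attributes.
    event_type: SafetyEventType
    agent_id: str
    command: str
    detail: str = ""
    time_ms: int = field(default_factory=lambda: int(time.time() * 1000))

    def to_json(self) -> str:
        """Serialize the event for delivery to an external consumer."""
        payload = asdict(self)
        payload["event_type"] = self.event_type.value
        return json.dumps(payload, sort_keys=True)
```

Emitting events like this on a separate channel is what lets a SOC team observe verdicts without sitting in the caller's request path.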
What we're NOT doing
We're not proposing a competing architecture. We're not changing the dual-gate model. We're not modifying the 4-layer structure. Every contribution fills in detail within your existing design.
The behavioral patterns slot into Layer 4. The scope rules slot into Layer 3. The content policy complements HMAC in the messaging layer. The event schema adds an output channel the pipeline doesn't currently have. The benchmarks go in an appendix.
The numbers
All details in the attached CONTRIBUTIONS.md.
Looking forward to your feedback.
Joe Wee
Tyga.Cloud Ltd