-
Notifications
You must be signed in to change notification settings - Fork 2
architecture memory and state
Note
👋 Hey there! Siyarix is a personal passion project built by a single developer that is growing and under active development. Some of the architectural components and features described on this page might currently be Planned, Work in Progress, or basic implementations. Stay tuned as it evolves! 🚀
Welcome to the heart of Siyarix! This document outlines our multi-layered memory and state management system. We designed this architecture to flawlessly handle everything from lightning-fast in-memory processing to reliable SQLite persistence and portable file-based exports.
At a high level, the system comprises several specialized components:
- KnowledgeGraph: Connects the dots on infrastructure relationships.
- MemoryManager: Powers our semantic memory using embeddings.
- ChatSession: Handles conversational history with powerful branching capabilities.
- SessionKernel: Persists state across sessions using JSON/JSONL.
- CacheManager: Keeps things snappy with LRU (Least Recently Used) and TTL (Time-To-Live) caching.
- Context Manager: Carefully optimizes what the LLM sees to maximize context window efficiency.
- Continuous Learning System (CLS): Learns new skills dynamically while strictly preserving privacy.
Siyarix categorizes memory into three distinct, robust layers.
Note
This layered approach ensures that fast, ephemeral data lives in RAM, critical operations persist safely to disk, and shareable insights can be effortlessly exported.
┌────────────────────────────────────────────────────────────┐
│ ⚡ In-Memory (Session Runtime) │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ Knowledge │ │ MemoryManager │ │ Context │ │
│ │ Graph │ │ (semantic memory │ │ Manager │ │
│ │ (entities, │ │ + embeddings) │ │ (window │ │
│ │ relations) │ │ │ │ build/ │ │
│ │ │ │ │ │ compress) │ │
│ └──────────────┘ └──────────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ CacheManager │ │ Conversation │ │ Continuous │ │
│ │ (LRU + TTL) │ │ History (deque) │ │ Learning │ │
│ │ │ │ Session Messages │ │ System (CLS) │ │
│ │ │ │ maxlen=300) │ │ (skill cache)│ │
│ └──────────────┘ └──────────────────┘ └──────────────┘ │
├────────────────────────────────────────────────────────────┤
│ 💾 SQLite (Persistent) │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ OfflineStore │ │ Continuous │ │ ProviderState│ │
│ │ (scans, │ │ Learning System │ │ Manager │ │
│ │ findings, │ │ .db) │ │ (cooldown, │ │
│ │ plans) │ │ │ │ failures) │ │
│ └──────────────┘ └──────────────────┘ └──────────────┘ │
├────────────────────────────────────────────────────────────┤
│ 📄 File-Based (Export/Import) │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ Reports │ │ ChatSession │ │ Knowledge │ │
│ │ (MD/HTML/ │ │ Exports │ │ Graph JSON │ │
│ │ JSON/SARIF) │ │ (JSONL tree fmt, │ │ Export │ │
│ │ │ │ PDF, TXT, MD) │ │ │ │
│ └──────────────┘ └──────────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ SessionKernel│ │ Tool Failure │ │
│ │ (JSON files) │ │ State │ │
│ │ │ │ (tool_failures │ │
│ │ │ │ .json) │ │
│ └──────────────┘ └──────────────────┘ │
└────────────────────────────────────────────────────────────┘
Located at siyarix/knowledge_graph.py, the KnowledgeGraph is a dynamic, in-memory directed graph. It maps out all discovered infrastructure entities and their intricate relationships.
Tip
Think of this as the "brain's map" of the target environment. It allows Siyarix to understand that a specific vulnerability lives on a service, which in turn runs on a particular host.
| Node | Attributes | Example |
|---|---|---|
HOST |
IP, hostname, OS, MAC | 10.0.0.1 |
PORT |
Number, protocol, state | 80/tcp open |
SERVICE |
Name, version, banner | Apache 2.4.41 |
VULNERABILITY |
CVE ID, severity, CVSS | CVE-2024-1234 |
DOMAIN |
FQDN, registrar, DNS | example.com |
CREDENTIAL |
Username, type, hash | admin:$2y$10$... |
FINDING |
Tool, description, ref | Nmap finding |
| Edge | Source → Target | Meaning |
|---|---|---|
RUNS_ON |
Service → Host | Service runs on host |
HAS_PORT |
Host → Port | Host has open port |
HAS_VULN |
Service → Vulnerability | Service has vulnerability |
RESOLVES_TO |
Domain → Host | Domain resolves to IP |
USES_CRED |
Service → Credential | Service uses credential |
RELATED_TO |
Finding → Finding | Related findings |
- Pathfinding: BFS (Breadth-First Search) to find the shortest path between any two entities.
- Advanced Querying: Extract subgraphs by node type, attribute, or relationship.
- Real-time Parsing: Instantly inserts new nodes and edges directly from tool parser outputs.
-
Persistence: Easily export/import state via JSON (
save_json/load_json) so no context is lost between sessions.
Located at siyarix/memory.py, the MemoryManager handles our semantic, long-term memory utilizing vector embeddings.
Important
Semantic memory empowers Siyarix to recall past learnings contextually, rather than relying on exact keyword matches.
memory = MemoryManager()
# Store a new memory with rich metadata
await memory.store(
content="Host 10.0.0.1 has Apache 2.4.41 running on port 80",
metadata={"source": "nmap", "session_id": "sess-123"}
)
# Search for related concepts
similar = await memory.search_similar("Apache versions", top_k=5)
# Grab all relevant context for a specific target
context = await memory.get_context(target="10.0.0.1")| Method | Purpose |
|---|---|
store(content, metadata) |
Saves a new memory entry into the semantic vault. |
search_similar(query, top_k) |
Uses embeddings to find the most conceptually similar memories. |
get_context(target) |
Retrieves a consolidated background context for a given target. |
Located at siyarix/context.py, the Context Manager is the gatekeeper for the LLM. It intelligently builds, compresses, and optimizes the context window so the LLM gets precisely what it needs without overflowing its token budget.
context = ContextManager(memory=memory_manager)
# Log conversation history
context.add_history("User message", "user")
context.add_history("Assistant response", "assistant")
# Build the perfectly sized context payload
history = context.get_history()
context = context.build_context(
conversation_history=history,
knowledge_subgraph=relevant_entities,
session_state={"mode": "autonomous", "target": "10.0.0.1"},
tool_availability=available_tools,
memory_entries=relevant_memories,
max_tokens=8192,
)When context gets too large, the CompactionEngine (siyarix/compaction.py) steps in to aggressively yet safely compress the payload.
Warning
Failing to compress context effectively can lead to LLM truncation errors and hallucinations. The CompactionEngine prevents this.
compactor = CompactionEngine()
tokens = compactor.analyze_tokens(raw_context)
compressed = compactor.compress_context(raw_context, target_tokens=4096)| Strategy | Description | Token Reduction |
|---|---|---|
| Truncation | Drops the oldest, least relevant conversation turns. | 20–40% |
| Summarization | Uses the LLM to summarize older history blocks. | 40–60% |
| KG Pruning | Retains only high-severity or immediately related graph entities. | 30–50% |
| Memory Prioritization | Filters out memories falling below a calculated importance threshold. | 50–70% |
| Deduplication | Strips out redundant tool outputs. | 10–20% |
Located at siyarix/chat/session.py, the ChatSession manages conversation state. It's not just a flat list—it natively supports complex branching via a JSONL tree structure.
Ever wanted to explore a different train of thought without breaking your current conversation? Siyarix supports conversation forks!
Session Root
├── Branch A (main thread)
│ ├── Message 1
│ ├── Message 2
│ │ └── Branch B (forked from message 2)
│ │ ├── Message 3
│ │ └── Message 4
│ └── Message 5
└── Branch C (forked from root)
└── Message 6
- Retains a rolling window of history (
maxlen=300). - Messages are robustly tracked using unique
id,parent,role,content,timestamp, andbranchidentifiers.
Exporting a session is as simple as calling ChatSession.export().
| Format | Description |
|---|---|
json |
Standard JSON array of messages. |
jsonl |
Advanced JSONL tree format (perfect for reloading). |
pdf |
A polished PDF document for reporting. |
txt |
A simple, raw plain-text transcript. |
md |
Markdown transcript for beautiful rendering. |
html |
An interactive HTML document. |
Located at siyarix/compat.py, the SessionKernel is the master controller for overarching session state and operational tracking.
kernel = SessionKernel()
session = kernel.start(
objective="Scan target network",
scope="10.0.0.0/24",
identity="operator-1",
)
# Track tactical operations
op = kernel.add_operation(session, "scan 10.0.0.1", "scan", "medium")
kernel.update_operation(session, op.operation_id, state="completed")
# Persist and Restore
path = kernel.save(session)
restored = kernel.load(session_id)Note
Unlike other modules that use SQLite, the SessionKernel utilizes JSON-based persistence to easily track operation cards, state, mode, risk tier, and related artifacts.
- Supports distinct persistence tiers:
EPHEMERAL,WORKSPACE, andORG_SHARED.
Located at siyarix/cache_manager.py, the CacheManager speeds up operations by temporarily holding onto frequently accessed data.
cache = CacheManager(
max_size=1000,
ttl=300,
persist_path="~/.siyarix/cache.db"
)
# Easily monitor cache health
stats = cache.get_stats()
# Result: CacheStats(hits=450, misses=30, hit_rate=0.94, size=200, evictions=15)- Implements LRU (Least Recently Used) paired with strict TTL (Time-To-Live).
- Optionally persists to disk to survive reboots.
Located at siyarix/learning_system.py, the Continuous Learning System is how Siyarix gets smarter over time. It organically acquires new skills by observing operator behavior.
Caution
Privacy First Guarantee: Real targets are NEVER stored. Every hostname, IP, URL, email, or hash is strictly replaced with a {target} placeholder before any data is saved.
-
Separate Store: Learning data is completely isolated inside
learning_store.db. - Zero Dependencies: Relies purely on the Python standard library, employing a BM25-style Jaccard similarity engine over NLP token sets.
- Bayesian Confidence: Skills are rated using a Bayesian-smoothed confidence formula that factors in time decay and operational complexity.
@dataclass
class LearnedStep:
tool: str
command_template: str # E.g., "nmap -sS {target}"
description: str
args: dict
@dataclass
class LearnedSkill:
skill_id: str
intent_pattern: str # The anonymised intent
steps: list[LearnedStep]
confidence: float # 0.0 to 1.0 (Bayesian-smoothed)
usage_count: int
success_count: int
tokens: list[str] # NLP tokens for rapid similarity matching
source: str # Origin: 'llm', 'offline', or 'inferred'-
Observe: Functions like
observe_llm_action()passively watch the execution. -
Anonymize: Scour and scrub the data, replacing real endpoints with
{target}. - Match: Run multi-tier similarity checks (≥0.60 is strong, <0.35 implies a brand new skill).
- Learn: Adjust confidence, extract parameters, and merge overlapping steps.
- Inject: High-confidence skills get promoted and can be executed automatically.
- Maintain: Constantly prune, decay old skills, and merge redundancies.
- Integrated Mode: Skills exceeding 80% confidence trigger automatic execution before the LLM is even consulted.
- Offline Mode: Learned skills dramatically enhance the heuristic planner.
- Synonyms: Maps human keywords to specific tools to beef up the NLP engine.
Ever wonder what happens from the moment Siyarix boots up until it safely shuts down?
🚀 Session Start
│
├── Load config from ~/.siyarix/settings.toml
├── Initialize KnowledgeGraph (empty or restore from JSON)
├── Initialize MemoryManager (load persisted embeddings)
├── Initialize CacheManager (load disk cache)
├── Initialize Continuous Learning System (load skill library)
├── Open OfflineStore (SQLite WAL)
├── Open ProviderStateManager (JSON file)
│
▼
🔥 Session Active
│
├── KnowledgeGraph populated from tool outputs (real-time)
├── MemoryManager updated from tool outputs
├── Conversation history appended (deque maxlen=300)
├── Continuous Learning System passively observes execution
├── Findings continuously stored in OfflineStore
├── Commands meticulously tracked via SessionKernel
├── Provider state tracked (cooldowns, failures, API costs)
├── Cache populated/evicted via LRU + TTL strategies
│
▼
🛑 Session End
│
├── Save KnowledgeGraph to JSON (if configured)
├── Persist MemoryManager embeddings safely to disk
├── Save comprehensive session via SessionKernel
├── Flush CacheManager memory to disk
├── Generate polished post-session reports
├── Safely close all SQLite connections
├── Trigger CLS maintenance (prune, decay, merge)
└── Clear ephemeral in-memory state gracefully
Here’s a quick-reference cheat sheet for how everything connects:
| Component | Role |
|---|---|
| Context Manager | Curates and compresses the LLM context from the KG, memory, and history. |
| MemoryManager | Manages vector-based semantic memory. |
| KnowledgeGraph | Maps real-time entity relationships. |
| ChatSession | Houses branching conversation trees in JSONL. |
| SessionKernel | Masters JSON-based session persistence and restoration. |
| CacheManager | Disk-backed LRU + TTL caching. |
| OfflineStore | Persists offline scans and findings to SQLite. |
| OfflineQueue | Queues requests for disconnected execution. |
| CompactionEngine | Trims context payload to respect LLM token budgets. |
| Continuous Learning System | Siyarix's privacy-first evolving skill library. |
| ProviderStateManager | Tracks API provider health, cooldowns, and failures. |
| ToolCallTracker | Remembers tool failures to avoid repeated mistakes. |
| EventBus | Broadcasts state changes globally (e.g., kg.updated, cache.evicted). |
Note
👋 Welcome to Siyarix! This is a personal passion project built by a single developer. It's currently under active development and growing fast. Expect rough edges, but lots of love! ❤️
Welcome to the Siyarix Documentation Map! This page serves as your master compass for navigating the extensive documentation we have built for the platform.
Whether you are a brand new user, a seasoned security operator, or a developer looking to contribute to the core engine, you can find exactly what you need here.
Not sure where to start? Pick the path that best describes you:
Just getting started? We highly recommend following these guides in order:
- Installation Guide — Get Siyarix running on your machine.
- Onboarding Wizard — Let our interactive wizard help you set up your API keys and environment.
- Setup & Configuration — A deeper dive into customizing your setup.
- Your First Run — A gentle walkthrough of your very first Siyarix command.
Ready to put Siyarix to work? Dive into our operational guides:
- Interactive Chat (REPL) — Learn how to use the powerful interactive terminal.
- Security Workflows — Best practices for recon, vulnerability assessment, and incident response.
- Cloud & IaC Scanning — How to secure your cloud environments and infrastructure code.
- Compliance Frameworks — Map your scans to SOC 2, HIPAA, ISO 27001, and more.
Looking under the hood or wanting to write some code? Start here:
- Contribution Guide — Our workflow, standards, and how you can help!
- Codebase Overview — A comprehensive map of our 82+ source modules.
- Testing Standards — How we ensure reliability with pytest and CI/CD.
- Module Architecture — Component design and responsibilities.
If you prefer to browse the raw structure, here is a complete layout of the docs/ folder:
docs/
├── 🚀 getting-started/ # Installation, onboarding, and configuration
│ ├── installation.md # Multi-platform install (pip, brew, winget, docker)
│ ├── onboarding.md # The interactive 11-step setup wizard
│ ├── setup.md # Managing API keys, credentials, and settings
│ ├── first-run.md # A walkthrough of your first session
│ ├── configuration.md # A deep-dive into advanced settings
│ └── troubleshooting.md # Common issues and how to fix them instantly
│
├── 📖 user/ # Daily operations and workflows
│ ├── cli-commands.md # Reference for 50+ CLI commands across 12 groups
│ ├── interactive-chat.md # Mastering the AI REPL and 54+ slash commands
│ ├── security-workflows.md # Recon, vulnerability assessment, incident response
│ ├── cloud-scanning.md # Multi-cloud security scanning (under development)
│ ├── compliance.md # Framework mapping (SOC 2, NIST, GDPR, PCI-DSS)
│ ├── threat-intelligence.md# Integrations with OTX, NVD, and MITRE ATT&CK
│ ├── playbooks.md # Building automated YAML-based IR playbooks
│ ├── workflow-files.md # DAG workflow reference (programmatic API)
│ ├── reporting.md # Multi-format report generation
│ ├── offline-registry.md # Running without AI (Offline/Registry execution mode)
│ └── ai-workflows.md # Advanced AI-driven autonomous operations
│
├── 💻 developer/ # Building, testing, and extending Siyarix
│ ├── codebase-overview.md # Full module structure mapping
│ ├── contribution-guide.md # How to submit PRs and our coding standards
│ ├── module-architecture.md# Component design and responsibilities
│ ├── testing.md # Writing tests (pytest), coverage, and CI/CD
│ └── building.md # Packaging, distribution, and Docker builds
│
├── 🏗️ architecture/ # System design and core internals
│ ├── overview.md # High-level data flow and layered orchestration
│ ├── ai-agent-pipeline.md # The AgentCore reasoning and execution pipeline
│ ├── provider-abstraction.md# How we unify 26 different AI providers
│ ├── execution-engine.md # Plan-based step orchestration
│ ├── memory-and-state.md # Knowledge graph, session persistence, and learning
│ ├── security-model.md # The Permission Gate, DLP, audit logging, and OPSEC
│ └── intent-routing.md # Semantic intent classification and routing
│
├── 🧠 ai/ # Deep dive into the AI provider & agent systems
│ ├── routing.md # Managing 26 providers, failovers, and circuit breakers
│ ├── persona-system.md # Overview of our 10 security personas
│ ├── agent-reasoning.md # The Observe-Reason-Act loop and tool call repair
│ ├── tool-execution.md # The tool registry, capability graph, and parsers
│ ├── ensemble.md # Parallel LLM voting strategies
│ ├── multi-wave.md # Iterative goal execution with context carry-over
│ ├── prompt-architecture.md# System prompt design and management
│ └── safety.md # Our rigorous 8-layer hallucination mitigation system
│
├── 🛡️ security/ # Safety, ethics, and threat models
│ ├── reporting.md # How to safely report vulnerabilities to us
│ ├── threat-model.md # System threat model and our mitigations
│ ├── operational-security.md# TOR routing, stealth modes, and OPSEC controls
│ ├── ethical-policy.md # Mandatory rules of engagement for all users
│ └── abuse-prevention.md # How we prevent misuse of the AI engine
│
└── ⚖️ legal/ # Licensing and governance
├── agpl-guide.md # A plain-English overview of the AGPL-3.0-or-later license
├── why-agpl.md # The philosophy behind our license choice
├── trademark-policy.md # Branding and trademark guidelines
├── responsible-ai.md # Our framework for ethical AI usage
├── disclaimer.md # Important legal disclaimers
└── plugin-exception.md # The license exception for building custom plugins
As you read through the documentation, you might encounter some specific terms. Here is a quick cheat sheet:
| Term | What It Means |
|---|---|
| Provider | The backend AI engine powering Siyarix (e.g., OpenAI, Anthropic, Ollama). |
| Tool | A traditional security executable installed on your system (e.g., nmap, nuclei). |
| Plan | A step-by-step sequence of tool commands intelligently generated by the AI. |
| Workflow | A hardcoded, predefined execution path (usually defined in YAML/JSON) that doesn't require AI generation. |
| Persona | A specialized behavioral profile given to the AI (e.g., instructing it to act specifically as a "Network Recon Specialist"). |
| Knowledge Graph | Siyarix's internal memory where it stores findings (like IP addresses, open ports) to contextually inform future steps. |
Need help finding something specific? Feel free to use the search bar at the top of the documentation site, or open a discussion on our GitHub!