architecture memory and state

🧠 Memory & State Management

Note

👋 Hey there! Siyarix is a personal passion project built by a single developer, and it is growing and under active development. Some of the architectural components and features described on this page may currently be planned, a work in progress, or only basic implementations. Stay tuned as it evolves! 🚀

Welcome to the heart of Siyarix! This document outlines our multi-layered memory and state management system. The architecture spans three tiers:

- fast in-memory processing during a session
- SQLite persistence for data that must survive restarts
- portable file-based exports

At a high level, the system comprises several specialized components:

KnowledgeGraph: Models discovered infrastructure entities and the relationships between them.
MemoryManager: Powers our semantic memory using embeddings.
ChatSession: Handles conversational history with powerful branching capabilities.
SessionKernel: Persists state across sessions using JSON/JSONL.
CacheManager: Keeps things snappy with LRU (Least Recently Used) and TTL (Time-To-Live) caching.
Context Manager: Carefully optimizes what the LLM sees to maximize context window efficiency.
Continuous Learning System (CLS): Learns new skills dynamically while strictly preserving privacy.

🥞 Memory Layers

Siyarix categorizes memory into three distinct, robust layers.

Note

This layered approach ensures that fast, ephemeral data lives in RAM, critical operations persist safely to disk, and shareable insights can be effortlessly exported.

```text
┌────────────────────────────────────────────────────────────┐
│                  ⚡ In-Memory (Session Runtime)            │
│                                                            │
│  ┌──────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │ Knowledge    │  │ MemoryManager    │  │ Context      │  │
│  │ Graph        │  │ (semantic memory │  │ Manager      │  │
│  │ (entities,   │  │  + embeddings)   │  │ (window      │  │
│  │  relations)  │  │                  │  │  build/      │  │
│  │              │  │                  │  │  compress)   │  │
│  └──────────────┘  └──────────────────┘  └──────────────┘  │
│                                                            │
│  ┌──────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │ CacheManager │  │ Conversation     │  │ Continuous   │  │
│  │ (LRU + TTL)  │  │ History (deque)  │  │ Learning     │  │
│  │              │  │ Session Messages │  │ System (CLS) │  │
│  │              │  │ (maxlen=300)     │  │ (skill cache)│  │
│  └──────────────┘  └──────────────────┘  └──────────────┘  │
├────────────────────────────────────────────────────────────┤
│                  💾 SQLite (Persistent)                    │
│                                                            │
│  ┌──────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │ OfflineStore │  │ Continuous       │  │ ProviderState│  │
│  │ (scans,      │  │ Learning System  │  │ Manager      │  │
│  │  findings,   │  │ (learning_store  │  │ (cooldown,   │  │
│  │  plans)      │  │  .db)            │  │  failures)   │  │
│  └──────────────┘  └──────────────────┘  └──────────────┘  │
├────────────────────────────────────────────────────────────┤
│                  📄 File-Based (Export/Import)             │
│                                                            │
│  ┌──────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │ Reports      │  │ ChatSession      │  │ Knowledge    │  │
│  │ (MD/HTML/    │  │ Exports          │  │ Graph JSON   │  │
│  │  JSON/SARIF) │  │ (JSONL tree fmt, │  │ Export       │  │
│  │              │  │  PDF, TXT, MD)   │  │              │  │
│  └──────────────┘  └──────────────────┘  └──────────────┘  │
│                                                            │
│  ┌──────────────┐  ┌──────────────────┐                    │
│  │ SessionKernel│  │ Tool Failure     │                    │
│  │ (JSON files) │  │ State            │                    │
│  │              │  │ (tool_failures   │                    │
│  │              │  │  .json)          │                    │
│  └──────────────┘  └──────────────────┘                    │
└────────────────────────────────────────────────────────────┘
```

🕸️ 1. KnowledgeGraph

Located at siyarix/knowledge_graph.py, the KnowledgeGraph is a dynamic, in-memory directed graph. It maps out all discovered infrastructure entities and their intricate relationships.

Tip

Think of this as the "brain's map" of the target environment. It allows Siyarix to understand that a specific vulnerability lives on a service, which in turn runs on a particular host.

🟢 Node Types

| Node | Attributes | Example |
| --- | --- | --- |
| `HOST` | IP, hostname, OS, MAC | `10.0.0.1` |
| `PORT` | Number, protocol, state | `80/tcp open` |
| `SERVICE` | Name, version, banner | `Apache 2.4.41` |
| `VULNERABILITY` | CVE ID, severity, CVSS | `CVE-2024-1234` |
| `DOMAIN` | FQDN, registrar, DNS | `example.com` |
| `CREDENTIAL` | Username, type, hash | `admin:$2y$10$...` |
| `FINDING` | Tool, description, ref | Nmap finding |

🔗 Edge Types

| Edge | Source → Target | Meaning |
| --- | --- | --- |
| `RUNS_ON` | Service → Host | Service runs on host |
| `HAS_PORT` | Host → Port | Host has open port |
| `HAS_VULN` | Service → Vulnerability | Service has vulnerability |
| `RESOLVES_TO` | Domain → Host | Domain resolves to IP |
| `USES_CRED` | Service → Credential | Service uses credential |
| `RELATED_TO` | Finding → Finding | Related findings |

🛠️ Key Operations

Pathfinding: BFS (Breadth-First Search) to find the shortest path between any two entities.
Advanced Querying: Extract subgraphs by node type, attribute, or relationship.
Real-time Parsing: Instantly inserts new nodes and edges directly from tool parser outputs.
Persistence: Easily export/import state via JSON (save_json / load_json) so no context is lost between sessions.
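Here is a minimal, self-contained sketch that makes the node/edge model and BFS pathfinding concrete. It is not the real `siyarix.knowledge_graph` API. The class name `KnowledgeGraphSketch`, its methods, and the string node IDs are hypothetical. The sketch stores typed nodes and directed, typed edges, finds a shortest path with breadth-first search, and round-trips through JSON in the spirit of `save_json` / `load_json`.

```python
from __future__ import annotations

import json
from collections import deque
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraphSketch:
    """Hypothetical minimal typed, directed graph (not the real siyarix API)."""

    nodes: dict[str, dict] = field(default_factory=dict)  # node_id -> {"type": ..., **attributes}
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, edge_type, target)

    def add_node(self, node_id: str, node_type: str, **attrs) -> None:
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, source: str, edge_type: str, target: str) -> None:
        self.edges.append((source, edge_type, target))

    def neighbors(self, node_id: str) -> list[str]:
        # Follow outgoing edges only, because the graph is directed.
        return [target for source, _, target in self.edges if source == node_id]

    def shortest_path(self, start: str, goal: str) -> list[str] | None:
        """Breadth-first search; returns node IDs from start to goal, or None."""
        previous: dict[str, str | None] = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                node: str | None = current
                while node is not None:  # walk back through predecessors
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for nxt in self.neighbors(current):
                if nxt not in previous:  # in BFS the first visit is the shortest
                    previous[nxt] = current
                    queue.append(nxt)
        return None

    def to_json(self) -> str:
        return json.dumps({"nodes": self.nodes, "edges": self.edges})

    @classmethod
    def from_json(cls, data: str) -> KnowledgeGraphSketch:
        raw = json.loads(data)
        # JSON has no tuples, so edges come back as lists and are converted here.
        return cls(nodes=raw["nodes"], edges=[tuple(edge) for edge in raw["edges"]])


kg = KnowledgeGraphSketch()
kg.add_node("example.com", "DOMAIN")
kg.add_node("10.0.0.1", "HOST", os="Linux")
kg.add_node("10.0.0.1:80/tcp", "PORT", number=80, protocol="tcp", state="open")
kg.add_node("apache", "SERVICE", name="Apache", version="2.4.41")
kg.add_edge("example.com", "RESOLVES_TO", "10.0.0.1")
kg.add_edge("10.0.0.1", "HAS_PORT", "10.0.0.1:80/tcp")
kg.add_edge("apache", "RUNS_ON", "10.0.0.1")

print(kg.shortest_path("example.com", "10.0.0.1:80/tcp"))
# ['example.com', '10.0.0.1', '10.0.0.1:80/tcp']
```

Edges are directed, so a path only exists in the direction they point. `RUNS_ON` goes from service to host, which means a BFS starting at a host never reaches the services running on it unless the search also follows edges in reverse.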

🧠 2. MemoryManager

Located at siyarix/memory.py, the MemoryManager handles our semantic, long-term memory utilizing vector embeddings.

Important

Semantic memory empowers Siyarix to recall past learnings contextually, rather than relying on exact keyword matches.

💡 Core Methods

```python
import asyncio

from siyarix.memory import MemoryManager


async def main() -> None:
    memory = MemoryManager()

    # Store a new memory with rich metadata
    await memory.store(
        content="Host 10.0.0.1 has Apache 2.4.41 running on port 80",
        metadata={"source": "nmap", "session_id": "sess-123"},
    )

    # Search for related concepts
    similar = await memory.search_similar("Apache versions", top_k=5)

    # Grab all relevant context for a specific target
    context = await memory.get_context(target="10.0.0.1")


asyncio.run(main())
```

| Method | Purpose |
| --- | --- |
| `store(content, metadata)` | Saves a new memory entry into the semantic vault. |
| `search_similar(query, top_k)` | Uses embeddings to find the most conceptually similar memories. |
| `get_context(target)` | Retrieves a consolidated background context for a given target. |
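The semantic search step can be illustrated with cosine similarity over vectors. The sketch below is hypothetical: `ToyMemory` is not the real `MemoryManager`. It also swaps in a bag-of-words term-frequency vector for a learned embedding so it runs with the standard library alone. As a result it only matches shared words, whereas real embeddings also match paraphrases.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field

TOKEN = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")  # keeps IPs and versions like 2.4.41 whole


def embed(text: str) -> Counter:
    """Stand-in 'embedding': a term-frequency vector (a real system would call an embedding model)."""
    return Counter(TOKEN.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ToyMemory:
    entries: list[tuple[str, dict, Counter]] = field(default_factory=list)

    def store(self, content: str, metadata: dict) -> None:
        self.entries.append((content, metadata, embed(content)))

    def search_similar(self, query: str, top_k: int = 5) -> list[tuple[float, str]]:
        q = embed(query)
        scored = [(cosine(q, vec), content) for content, _, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Drop zero-similarity entries: they share nothing with the query.
        return [pair for pair in scored[:top_k] if pair[0] > 0]


memory = ToyMemory()
memory.store("Host 10.0.0.1 has Apache 2.4.41 running on port 80", {"source": "nmap"})
memory.store("SSH on 10.0.0.2 rejects password logins", {"source": "hydra"})
results = memory.search_similar("Apache versions", top_k=5)
print(results)
```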

🗜️ 3. Context Manager

Located at siyarix/context.py, the Context Manager is the gatekeeper for the LLM. It intelligently builds, compresses, and optimizes the context window so the LLM gets precisely what it needs without overflowing its token budget.

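```python
from siyarix.context import ContextManager

# memory_manager, relevant_entities, available_tools and relevant_memories
# are assumed to come from earlier steps in the session.
context = ContextManager(memory=memory_manager)

# Log conversation history
context.add_history("User message", "user")
context.add_history("Assistant response", "assistant")

# Build a context payload that fits within the token budget
# (assigned to a new name so the ContextManager instance is not overwritten)
history = context.get_history()
payload = context.build_context(
    conversation_history=history,
    knowledge_subgraph=relevant_entities,
    session_state={"mode": "autonomous", "target": "10.0.0.1"},
    tool_availability=available_tools,
    memory_entries=relevant_memories,
    max_tokens=8192,
)
```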
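Here is a simplified picture of budget-aware context building. Sections are packed in priority order, and within each section the newest items are kept first until the token budget runs out. `pack_context` is a hypothetical helper, not `ContextManager.build_context`. Token counts use a rough four-characters-per-token heuristic rather than the model's real tokenizer.

```python
from __future__ import annotations


def approx_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token); a real tokenizer would be more accurate."""
    return max(1, len(text) // 4)


def pack_context(sections: list[tuple[str, list[str]]], max_tokens: int) -> list[str]:
    """Pack sections into a token budget.

    `sections` is ordered from most to least important; items inside a section
    are ordered oldest to newest.
    """
    budget = max_tokens
    packed: list[str] = []
    for label, items in sections:
        kept: list[str] = []
        for item in reversed(items):  # newest first, so recent entries survive
            cost = approx_tokens(item)
            if cost > budget:
                break  # stop here so the kept items stay a contiguous, recent run
            kept.append(item)
            budget -= cost
        packed.extend(f"[{label}] {item}" for item in reversed(kept))  # back to chronological order
    return packed


sections = [
    ("session", ["mode=autonomous target=10.0.0.1"]),
    ("history", ["user: scan 10.0.0.1", "assistant: running nmap", "user: anything on port 80?"]),
    ("memory", ["Host 10.0.0.1 has Apache 2.4.41 running on port 80"]),
]
for line in pack_context(sections, max_tokens=20):
    print(line)
# [session] mode=autonomous target=10.0.0.1
# [history] assistant: running nmap
# [history] user: anything on port 80?
```

The oldest history turn and the memory entry are dropped because the budget is already spent. A fuller implementation would compress them (see the CompactionEngine below) instead of discarding them outright.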

🗜️ Compression via CompactionEngine

When context gets too large, the CompactionEngine (siyarix/compaction.py) steps in to aggressively yet safely compress the payload.

Warning

Sending more context than the model's window allows can cause truncation errors, and overstuffed prompts make hallucinations more likely. The CompactionEngine is designed to keep the payload within budget.

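```python
from siyarix.compaction import CompactionEngine

compactor = CompactionEngine()
tokens = compactor.analyze_tokens(raw_context)  # raw_context: the assembled, uncompressed payload
compressed = compactor.compress_context(raw_context, target_tokens=4096)
```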

| Strategy | Description | Token Reduction |
| --- | --- | --- |
| Truncation | Drops the oldest, least relevant conversation turns. | 20–40% |
| Summarization | Uses the LLM to summarize older history blocks. | 40–60% |
| KG Pruning | Retains only high-severity or immediately related graph entities. | 30–50% |
| Memory Prioritization | Filters out memories falling below a calculated importance threshold. | 50–70% |
| Deduplication | Strips out redundant tool outputs. | 10–20% |
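Two of the strategies in the table, deduplication and oldest-first truncation, are simple enough to sketch directly. The `compact` function below is hypothetical and is not the real `CompactionEngine.compress_context`. Summarization, KG pruning, and memory prioritization are omitted. Tokens are estimated with the same rough four-characters-per-token heuristic.

```python
from __future__ import annotations


def approx_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token)."""
    return max(1, len(text) // 4)


def deduplicate(chunks: list[str]) -> list[str]:
    """Drop repeated chunks (e.g. identical tool output), keeping the first occurrence."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        key = chunk.strip()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def truncate_oldest(chunks: list[str], target_tokens: int) -> list[str]:
    """Drop chunks from the front (oldest first) until the total fits the target."""
    kept = list(chunks)
    total = sum(approx_tokens(c) for c in kept)
    while kept and total > target_tokens:
        total -= approx_tokens(kept.pop(0))
    return kept


def compact(chunks: list[str], target_tokens: int) -> list[str]:
    # Deduplicate first: removing exact repeats loses nothing, so truncation has to drop less.
    return truncate_oldest(deduplicate(chunks), target_tokens)


chunks = [
    "nmap: 80/tcp open http",
    "nmap: 80/tcp open http",  # duplicated tool output
    "nikto: /admin found",
    "user: summarize",
]
print(compact(chunks, target_tokens=8))
# ['nikto: /admin found', 'user: summarize']
```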

💬 4. ChatSession

Located at siyarix/chat/session.py, the ChatSession manages conversation state. Rather than a flat list, it natively supports branching via a JSONL tree structure.

🌿 Branching Model

Ever wanted to explore a different train of thought without breaking your current conversation? Siyarix supports conversation forks!

```text
Session Root
  ├── Branch A (main thread)
  │   ├── Message 1
  │   ├── Message 2
  │   │   └── Branch B (forked from message 2)
  │   │       ├── Message 3
  │   │       └── Message 4
  │   └── Message 5
  └── Branch C (forked from root)
      └── Message 6
```

⚙️ Session Configuration

Retains a rolling window of history (maxlen=300).
Each message is tracked with a unique id, its parent, role, content, timestamp, and a branch identifier.
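The branching model follows from each message storing a `parent` link. A branch is just the chain of parents from a leaf back to the root, and the whole tree serializes naturally as one JSON object per line. The `Message` and `MessageTree` classes below are a hypothetical sketch using the field names listed above. They are not the real `ChatSession` classes, and the 300-message rolling window is left out.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Message:
    role: str
    content: str
    parent: str | None = None  # id of the message this one follows; None for a root
    branch: str = "main"
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class MessageTree:
    def __init__(self) -> None:
        self.messages: dict[str, Message] = {}

    def add(self, role: str, content: str, parent: str | None = None, branch: str = "main") -> Message:
        msg = Message(role=role, content=content, parent=parent, branch=branch)
        self.messages[msg.id] = msg
        return msg

    def thread(self, leaf_id: str) -> list[Message]:
        """Follow parent links from a leaf to the root, returned root-first."""
        path: list[Message] = []
        current = self.messages.get(leaf_id)
        while current is not None:
            path.append(current)
            current = self.messages.get(current.parent) if current.parent else None
        return path[::-1]

    def to_jsonl(self) -> str:
        # One JSON object per line; the parent links preserve the tree shape.
        return "\n".join(json.dumps(asdict(m)) for m in self.messages.values())

    @classmethod
    def from_jsonl(cls, text: str) -> MessageTree:
        tree = cls()
        for line in text.splitlines():
            if line.strip():
                msg = Message(**json.loads(line))
                tree.messages[msg.id] = msg
        return tree


tree = MessageTree()
m1 = tree.add("user", "Scan 10.0.0.1")
m2 = tree.add("assistant", "Port 80 is open", parent=m1.id)
fork = tree.add("user", "Check Apache for CVEs instead", parent=m2.id, branch="B")
main_next = tree.add("user", "Enumerate web directories", parent=m2.id)

print([m.content for m in tree.thread(fork.id)])
# ['Scan 10.0.0.1', 'Port 80 is open', 'Check Apache for CVEs instead']
```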

📤 Export Formats

Exporting a session is as simple as calling ChatSession.export().

| Format | Description |
| --- | --- |
| `json` | Standard JSON array of messages. |
| `jsonl` | Advanced JSONL tree format (perfect for reloading). |
| `pdf` | A polished PDF document for reporting. |
| `txt` | A simple, raw plain-text transcript. |
| `md` | Markdown transcript for beautiful rendering. |
| `html` | An interactive HTML document. |

🎛️ 5. SessionKernel

Located at siyarix/compat.py, the SessionKernel is the master controller for overarching session state and operational tracking.

```python
from siyarix.compat import SessionKernel

kernel = SessionKernel()
session = kernel.start(
    objective="Scan target network",
    scope="10.0.0.0/24",
    identity="operator-1",
)

# Track tactical operations
op = kernel.add_operation(session, "scan 10.0.0.1", "scan", "medium")
kernel.update_operation(session, op.operation_id, state="completed")

# Persist and restore
path = kernel.save(session)
restored = kernel.load(session_id)  # session_id: the ID of a previously saved session
```

Note

Unlike the modules that persist to SQLite, the SessionKernel uses JSON files to track operation cards, state, mode, risk tier, and related artifacts.

Supports distinct persistence tiers: EPHEMERAL, WORKSPACE, and ORG_SHARED.
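To illustrate JSON-based session persistence, the sketch below saves a session and its operation cards to a `.json` file and then restores it. Everything here is a hypothetical stand-in for `SessionKernel`: `Session`, `Operation`, `save`, `load`, the field names, and storing the tier as a plain string. The tier names come from the text above, but what each tier actually does is not modeled.

```python
from __future__ import annotations

import json
import tempfile
import uuid
from dataclasses import asdict, dataclass, field
from enum import Enum
from pathlib import Path


class PersistenceTier(Enum):
    EPHEMERAL = "EPHEMERAL"
    WORKSPACE = "WORKSPACE"
    ORG_SHARED = "ORG_SHARED"


@dataclass
class Operation:
    command: str
    kind: str
    risk: str
    state: str = "pending"
    operation_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])


@dataclass
class Session:
    objective: str
    scope: str
    identity: str
    tier: PersistenceTier = PersistenceTier.WORKSPACE
    operations: list[Operation] = field(default_factory=list)
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex[:12])


def save(session: Session, directory: Path) -> Path:
    data = asdict(session)  # nested Operation dataclasses become plain dicts
    data["tier"] = session.tier.value  # enums are not JSON-serializable; store the string value
    path = directory / f"{session.session_id}.json"
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def load(path: Path) -> Session:
    raw = json.loads(path.read_text(encoding="utf-8"))
    raw["tier"] = PersistenceTier(raw["tier"])
    raw["operations"] = [Operation(**op) for op in raw["operations"]]
    return Session(**raw)


with tempfile.TemporaryDirectory() as tmp:
    session = Session(objective="Scan target network", scope="10.0.0.0/24", identity="operator-1")
    op = Operation(command="scan 10.0.0.1", kind="scan", risk="medium")
    session.operations.append(op)
    op.state = "completed"
    saved_path = save(session, Path(tmp))
    restored = load(saved_path)

print(restored == session)  # True
```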

⏱️ 6. CacheManager

Located at siyarix/cache_manager.py, the CacheManager speeds up operations by temporarily holding onto frequently accessed data.

```python
from siyarix.cache_manager import CacheManager

cache = CacheManager(
    max_size=1000,
    ttl=300,
    persist_path="~/.siyarix/cache.db",
)

# Monitor cache health
stats = cache.get_stats()
# Example result: CacheStats(hits=450, misses=30, hit_rate=0.94, size=200, evictions=15)
```

Implements LRU (Least Recently Used) paired with strict TTL (Time-To-Live).
Optionally persists to disk to survive reboots.
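LRU and TTL combine naturally on top of an ordered dictionary:

- every read moves an entry to the most-recent end
- every write stamps an expiry time
- overflow evicts from the least-recent end

`LRUTTLCache` below is a hypothetical in-memory sketch, not `CacheManager`. Disk persistence is omitted, and the clock is injectable so the example is deterministic.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CacheStatsSketch:
    hits: int = 0
    misses: int = 0
    evictions: int = 0

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


class LRUTTLCache:
    def __init__(self, max_size: int = 1000, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self.data: OrderedDict[str, tuple[float, Any]] = OrderedDict()  # key -> (expires_at, value)
        self.stats = CacheStatsSketch()

    def get(self, key: str) -> Any | None:
        entry = self.data.get(key)
        if entry is not None and self.clock() >= entry[0]:
            del self.data[key]  # expired: drop it and count a miss
            entry = None
        if entry is None:
            self.stats.misses += 1
            return None
        self.data.move_to_end(key)  # mark as most recently used
        self.stats.hits += 1
        return entry[1]

    def set(self, key: str, value: Any) -> None:
        self.data[key] = (self.clock() + self.ttl, value)
        self.data.move_to_end(key)
        while len(self.data) > self.max_size:
            self.data.popitem(last=False)  # evict the least recently used entry
            self.stats.evictions += 1


now = [0.0]  # fake clock so the example is deterministic
cache = LRUTTLCache(max_size=2, ttl=300, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # hit: "a" becomes the most recently used
cache.set("c", 3)   # over capacity: evicts "b"
cache.get("b")      # miss: it was evicted
now[0] = 301.0
cache.get("a")      # miss: older than the 300-second TTL
print(cache.stats, round(cache.stats.hit_rate, 2))
# CacheStatsSketch(hits=1, misses=2, evictions=1) 0.33
```

Expired entries are only removed when they are read or when LRU eviction reaches them. A production cache might also sweep them periodically.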

🎓 7. Continuous Learning System (CLS)

Located at siyarix/learning_system.py, the Continuous Learning System is how Siyarix gets smarter over time. It organically acquires new skills by observing operator behavior.

Caution

Privacy First Guarantee: Real targets are NEVER stored. Every hostname, IP, URL, email, or hash is strictly replaced with a {target} placeholder before any data is saved.
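A minimal regex-based anonymizer shows the idea behind the `{target}` placeholder. The patterns and their order are assumptions for illustration, not the CLS's actual rules. They deliberately over-match (for example, `report.md` looks like a hostname), because for privacy a false positive is cheaper than a leaked target. They do not cover IPv6.

```python
import re

# Order matters: URLs and emails contain hostnames, so they are replaced first.
PATTERNS = [
    re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*://\S+"),                         # URLs
    re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),                         # email addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b"),                 # IPv4, optional CIDR
    re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b"),  # SHA-256 / SHA-1 / MD5 hex
    re.compile(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b"),                     # hostnames / FQDNs
]


def anonymize(text: str) -> str:
    """Replace anything that looks like a target with the {target} placeholder."""
    for pattern in PATTERNS:
        text = pattern.sub("{target}", text)
    return text


print(anonymize("nmap -sS 10.0.0.1 && curl https://example.com/login"))
# nmap -sS {target} && curl {target}
```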

🏗️ Key Design Principles

Separate Store: Learning data is completely isolated inside learning_store.db.
Zero Dependencies: Relies purely on the Python standard library, employing a BM25-style Jaccard similarity engine over NLP token sets.
Bayesian Confidence: Skills are rated using a Bayesian-smoothed confidence formula that factors in time decay and operational complexity.
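The matching and scoring ideas can be sketched with plain set operations and a Beta-prior estimate. This is a simplified, hypothetical version with three simplifications:

- it uses unweighted Jaccard similarity, with no BM25-style term weighting
- it uses a uniform Beta(1, 1) prior, so confidence is `(successes + 1) / (uses + 2)`
- it applies an assumed exponential decay with a 30-day half-life

Operational complexity is not modeled.

```python
from __future__ import annotations

import re

STOPWORDS = {"a", "an", "the", "on", "of", "for", "to", "and", "in"}


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens minus a few stopwords; '{target}' stays a single token."""
    return {t for t in re.findall(r"\{target\}|[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over token sets."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def bayesian_confidence(successes: int, uses: int, age_days: float, half_life_days: float = 30.0) -> float:
    """Beta(1, 1)-smoothed success rate, scaled by an assumed exponential time decay."""
    smoothed = (successes + 1) / (uses + 2)  # a brand-new skill (0 of 0) starts at 0.5
    decay = 0.5 ** (age_days / half_life_days)  # halves every `half_life_days`
    return smoothed * decay


print(jaccard("scan {target} for open ports", "scan open ports on {target}"))  # 1.0
print(bayesian_confidence(successes=8, uses=10, age_days=0))  # 0.75
```

The smoothing keeps a skill that succeeded once out of one use at 0.67 rather than 1.0, so a single lucky run cannot push it straight past an auto-execution threshold.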

📦 Data Models

```python
from dataclasses import dataclass


@dataclass
class LearnedStep:
    tool: str
    command_template: str     # e.g., "nmap -sS {target}"
    description: str
    args: dict


@dataclass
class LearnedSkill:
    skill_id: str
    intent_pattern: str       # The anonymized intent
    steps: list[LearnedStep]
    confidence: float         # 0.0 to 1.0 (Bayesian-smoothed)
    usage_count: int
    success_count: int
    tokens: list[str]         # NLP tokens for rapid similarity matching
    source: str               # Origin: 'llm', 'offline', or 'inferred'
```

🔄 The Learning Flow

Observe: Functions like observe_llm_action() passively watch the execution.
Anonymize: Scour and scrub the data, replacing real endpoints with {target}.
Match: Run multi-tier similarity checks (≥0.60 is strong, <0.35 implies a brand new skill).
Learn: Adjust confidence, extract parameters, and merge overlapping steps.
Inject: High-confidence skills get promoted and can be executed automatically.
Maintain: Constantly prune, decay old skills, and merge redundancies.

🔌 Integration

Integrated Mode: Skills exceeding 80% confidence trigger automatic execution before the LLM is even consulted.
Offline Mode: Learned skills dramatically enhance the heuristic planner.
Synonyms: Maps human keywords to specific tools to beef up the NLP engine.
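Putting the thresholds together, the sketch below routes a request using the match thresholds from the learning flow (0.60 and 0.35) and the 80% auto-execution gate. The function name and the handling of the middle band (falling back to the LLM) are assumptions about how the pieces fit, not the documented control flow.

```python
STRONG_MATCH = 0.60        # similarity at or above this is a strong match
NEW_SKILL_BELOW = 0.35     # similarity below this means no existing skill fits
AUTO_EXECUTE_ABOVE = 0.80  # confidence must exceed this for automatic execution


def route(similarity: float, confidence: float) -> str:
    """Hypothetical routing decision for an incoming, already-anonymized intent."""
    if similarity < NEW_SKILL_BELOW:
        return "learn_new_skill"  # consult the LLM and observe its actions as a new skill
    if similarity >= STRONG_MATCH and confidence > AUTO_EXECUTE_ABOVE:
        return "auto_execute"  # run the learned steps before consulting the LLM
    return "consult_llm"  # partial match or not yet trusted: keep the LLM in the loop


for sim, conf in [(0.90, 0.85), (0.90, 0.80), (0.50, 0.95), (0.20, 0.99)]:
    print(sim, conf, route(sim, conf))
```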

♻️ State Lifecycle

Ever wonder what happens from the moment Siyarix boots up until it safely shuts down?

```text
🚀 Session Start
    │
    ├── Load config from ~/.siyarix/settings.toml
    ├── Initialize KnowledgeGraph (empty or restore from JSON)
    ├── Initialize MemoryManager (load persisted embeddings)
    ├── Initialize CacheManager (load disk cache)
    ├── Initialize Continuous Learning System (load skill library)
    ├── Open OfflineStore (SQLite WAL)
    ├── Open ProviderStateManager (JSON file)
    │
    ▼
🔥 Session Active
    │
    ├── KnowledgeGraph populated from tool outputs (real-time)
    ├── MemoryManager updated from tool outputs
    ├── Conversation history appended (deque maxlen=300)
    ├── Continuous Learning System passively observes execution
    ├── Findings continuously stored in OfflineStore
    ├── Commands meticulously tracked via SessionKernel
    ├── Provider state tracked (cooldowns, failures, API costs)
    ├── Cache populated/evicted via LRU + TTL strategies
    │
    ▼
🛑 Session End
    │
    ├── Save KnowledgeGraph to JSON (if configured)
    ├── Persist MemoryManager embeddings safely to disk
    ├── Save comprehensive session via SessionKernel
    ├── Flush CacheManager memory to disk
    ├── Generate polished post-session reports
    ├── Safely close all SQLite connections
    ├── Trigger CLS maintenance (prune, decay, merge)
    └── Clear ephemeral in-memory state gracefully
```
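The shutdown sequence needs every step to run even if an earlier one fails. Otherwise, for example, a crash during report generation could leave SQLite connections open. The sketch below shows that pattern with hypothetical step names that mirror the list above. It is not Siyarix's actual shutdown code.

```python
from __future__ import annotations

from typing import Callable

Step = tuple[str, Callable[[], None]]


def end_session(steps: list[Step]) -> list[tuple[str, Exception]]:
    """Run every shutdown step in order, collecting failures instead of stopping."""
    errors: list[tuple[str, Exception]] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # one failing step must not skip the rest
            errors.append((name, exc))
    return errors


def run_session(work: Callable[[], None], shutdown: list[Step]) -> list[tuple[str, Exception]]:
    try:
        work()
    finally:
        errors = end_session(shutdown)  # runs even if `work` raised
    return errors


log: list[str] = []


def make_step(name: str, fail: bool = False) -> Step:
    def run() -> None:
        log.append(name)
        if fail:
            raise RuntimeError(f"{name} failed")
    return (name, run)


shutdown_steps = [
    make_step("save_knowledge_graph"),
    make_step("persist_embeddings"),
    make_step("save_session"),
    make_step("flush_cache"),
    make_step("generate_reports", fail=True),  # simulated failure
    make_step("close_sqlite"),
    make_step("cls_maintenance"),
    make_step("clear_ephemeral_state"),
]

errors = run_session(lambda: log.append("work"), shutdown_steps)
print(log[-3:], [name for name, _ in errors])
# ['close_sqlite', 'cls_maintenance', 'clear_ephemeral_state'] ['generate_reports']
```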

🧩 Integration Points

Here’s a quick-reference cheat sheet for how everything connects:

| Component | Role |
| --- | --- |
| Context Manager | Curates and compresses the LLM context from the KG, memory, and history. |
| MemoryManager | Manages vector-based semantic memory. |
| KnowledgeGraph | Maps real-time entity relationships. |
| ChatSession | Houses branching conversation trees in JSONL. |
| SessionKernel | Masters JSON-based session persistence and restoration. |
| CacheManager | Disk-backed LRU + TTL caching. |
| OfflineStore | Persists offline scans and findings to SQLite. |
| OfflineQueue | Queues requests for disconnected execution. |
| CompactionEngine | Trims context payload to respect LLM token budgets. |
| Continuous Learning System | Siyarix's privacy-first evolving skill library. |
| ProviderStateManager | Tracks API provider health, cooldowns, and failures. |
| ToolCallTracker | Remembers tool failures to avoid repeated mistakes. |
| EventBus | Broadcasts state changes globally (e.g., `kg.updated`, `cache.evicted`). |
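The EventBus row describes a publish/subscribe pattern, and a minimal synchronous sketch follows. The class, its methods, and the use of `fnmatch` wildcards such as `kg.*` are assumptions. The real bus may be asynchronous or use different topic matching.

```python
from __future__ import annotations

from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Any, Callable

Handler = Callable[[str, dict[str, Any]], None]


class EventBus:
    """Hypothetical synchronous publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        # Patterns use shell-style wildcards, e.g. "kg.*" or "cache.evicted".
        self._subscribers[pattern].append(handler)

    def publish(self, topic: str, payload: dict[str, Any]) -> int:
        """Deliver to every matching handler; returns how many handlers ran."""
        delivered = 0
        for pattern, handlers in list(self._subscribers.items()):
            if fnmatchcase(topic, pattern):
                for handler in handlers:
                    handler(topic, payload)
                    delivered += 1
        return delivered


received: list[tuple[str, str]] = []
bus = EventBus()
bus.subscribe("kg.*", lambda topic, p: received.append((topic, p["node"])))
bus.subscribe("cache.evicted", lambda topic, p: received.append((topic, p["key"])))

bus.publish("kg.updated", {"node": "10.0.0.1"})
bus.publish("cache.evicted", {"key": "nmap:10.0.0.1"})
print(received)
# [('kg.updated', '10.0.0.1'), ('cache.evicted', 'nmap:10.0.0.1')]
```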

Note

👋 Welcome to Siyarix! This is a personal passion project built by a single developer. It's currently under active development and growing fast. Expect rough edges, but lots of love! ❤️

🗺️ Siyarix Documentation Map

Welcome to the Siyarix Documentation Map! This page serves as your master compass for navigating the extensive documentation we have built for the platform.

Whether you are a brand new user, a seasoned security operator, or a developer looking to contribute to the core engine, you can find exactly what you need here.

🧭 Quick Navigation

Not sure where to start? Pick the path that best describes you:

🌱 For New Users

Just getting started? We highly recommend following these guides in order:

Installation Guide — Get Siyarix running on your machine.
Onboarding Wizard — Let our interactive wizard help you set up your API keys and environment.
Setup & Configuration — A deeper dive into customizing your setup.
Your First Run — A gentle walkthrough of your very first Siyarix command.

🛡️ For Security Operators

Ready to put Siyarix to work? Dive into our operational guides:

Interactive Chat (REPL) — Learn how to use the powerful interactive terminal.
Security Workflows — Best practices for recon, vulnerability assessment, and incident response.
Cloud & IaC Scanning — How to secure your cloud environments and infrastructure code.
Compliance Frameworks — Map your scans to SOC 2, HIPAA, ISO 27001, and more.

💻 For Developers & Contributors

Looking under the hood or wanting to write some code? Start here:

Contribution Guide — Our workflow, standards, and how you can help!
Codebase Overview — A comprehensive map of our 82+ source modules.
Testing Standards — How we ensure reliability with pytest and CI/CD.
Module Architecture — Component design and responsibilities.

📂 The Complete Documentation Tree

If you prefer to browse the raw structure, here is a complete layout of the docs/ folder:

```text
docs/
├── 🚀 getting-started/       # Installation, onboarding, and configuration
│   ├── installation.md       # Multi-platform install (pip, brew, winget, docker)
│   ├── onboarding.md         # The interactive 11-step setup wizard
│   ├── setup.md              # Managing API keys, credentials, and settings
│   ├── first-run.md          # A walkthrough of your first session
│   ├── configuration.md      # A deep-dive into advanced settings
│   └── troubleshooting.md    # Common issues and how to fix them instantly
│
├── 📖 user/                  # Daily operations and workflows
│   ├── cli-commands.md       # Reference for 50+ CLI commands across 12 groups
│   ├── interactive-chat.md   # Mastering the AI REPL and 54+ slash commands
│   ├── security-workflows.md # Recon, vulnerability assessment, incident response
│   ├── cloud-scanning.md     # Multi-cloud security scanning (under development)
│   ├── compliance.md         # Framework mapping (SOC 2, NIST, GDPR, PCI-DSS)
│   ├── threat-intelligence.md # Integrations with OTX, NVD, and MITRE ATT&CK
│   ├── playbooks.md          # Building automated YAML-based IR playbooks
│   ├── workflow-files.md     # DAG workflow reference (programmatic API)
│   ├── reporting.md          # Multi-format report generation
│   ├── offline-registry.md   # Running without AI (Offline/Registry execution mode)
│   └── ai-workflows.md       # Advanced AI-driven autonomous operations
│
├── 💻 developer/             # Building, testing, and extending Siyarix
│   ├── codebase-overview.md  # Full module structure mapping
│   ├── contribution-guide.md # How to submit PRs and our coding standards
│   ├── module-architecture.md # Component design and responsibilities
│   ├── testing.md            # Writing tests (pytest), coverage, and CI/CD
│   └── building.md           # Packaging, distribution, and Docker builds
│
├── 🏗️ architecture/          # System design and core internals
│   ├── overview.md           # High-level data flow and layered orchestration
│   ├── ai-agent-pipeline.md  # The AgentCore reasoning and execution pipeline
│   ├── provider-abstraction.md # How we unify 26 different AI providers
│   ├── execution-engine.md   # Plan-based step orchestration
│   ├── memory-and-state.md   # Knowledge graph, session persistence, and learning
│   ├── security-model.md     # The Permission Gate, DLP, audit logging, and OPSEC
│   └── intent-routing.md     # Semantic intent classification and routing
│
├── 🧠 ai/                    # Deep dive into the AI provider & agent systems
│   ├── routing.md            # Managing 26 providers, failovers, and circuit breakers
│   ├── persona-system.md     # Overview of our 10 security personas
│   ├── agent-reasoning.md    # The Observe-Reason-Act loop and tool call repair
│   ├── tool-execution.md     # The tool registry, capability graph, and parsers
│   ├── ensemble.md           # Parallel LLM voting strategies
│   ├── multi-wave.md         # Iterative goal execution with context carry-over
│   ├── prompt-architecture.md # System prompt design and management
│   └── safety.md             # Our rigorous 8-layer hallucination mitigation system
│
├── 🛡️ security/              # Safety, ethics, and threat models
│   ├── reporting.md          # How to safely report vulnerabilities to us
│   ├── threat-model.md       # System threat model and our mitigations
│   ├── operational-security.md # TOR routing, stealth modes, and OPSEC controls
│   ├── ethical-policy.md     # Mandatory rules of engagement for all users
│   └── abuse-prevention.md   # How we prevent misuse of the AI engine
│
└── ⚖️ legal/                 # Licensing and governance
    ├── agpl-guide.md         # A plain-English overview of the AGPL-3.0-or-later license
    ├── why-agpl.md           # The philosophy behind our license choice
    ├── trademark-policy.md   # Branding and trademark guidelines
    ├── responsible-ai.md     # Our framework for ethical AI usage
    ├── disclaimer.md         # Important legal disclaimers
    └── plugin-exception.md   # The license exception for building custom plugins
```

📖 Key Terminology

As you read through the documentation, you might encounter some specific terms. Here is a quick cheat sheet:

| Term | What It Means |
| --- | --- |
| Provider | The backend AI engine powering Siyarix (e.g., OpenAI, Anthropic, Ollama). |
| Tool | A traditional security executable installed on your system (e.g., `nmap`, `nuclei`). |
| Plan | A step-by-step sequence of tool commands intelligently generated by the AI. |
| Workflow | A hardcoded, predefined execution path (usually defined in YAML/JSON) that doesn't require AI generation. |
| Persona | A specialized behavioral profile given to the AI (e.g., instructing it to act specifically as a "Network Recon Specialist"). |
| Knowledge Graph | Siyarix's internal memory where it stores findings (like IP addresses, open ports) to contextually inform future steps. |

Need help finding something specific? Feel free to use the search bar at the top of the documentation site, or open a discussion on our GitHub!
