
EigenFlame

A memory architecture for local LLMs that builds a persistent, hierarchical self-model from conversations — using Pascal's triangle as the weighting function.

The core idea: instead of storing all memories equally, EigenFlame compresses experience upward through dimensional layers — episodes become beliefs, beliefs become identity, identity becomes meta-pattern, meta-pattern becomes archetype. Higher layers exert gravitational pull on every retrieval. The result is a system that doesn't just remember what was said — it distills what things meant.


Quick start

# 1. Install Ollama — https://ollama.com
# 2. Pull models
ollama pull mxbai-embed-large:latest
ollama pull qwen3.5:9b

# 3. Clone and install
git clone https://github.com/latentweb/EigenFlame
cd EigenFlame
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
python -m pip install -r requirements.txt

# 4. Run
cd backend
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# 5. Open http://localhost:8000

That's it. The backend serves the frontend — no separate server needed.


What it actually does

When you talk to a standard RAG-augmented LLM, it retrieves the most semantically similar past messages and injects them as context. That's flat retrieval — every stored exchange competes equally by cosine similarity, and recency usually wins.

EigenFlame does something different. After enough exchanges, it runs a synthesis cascade:

  • Raw exchanges → 2D beliefs (cross-episode patterns)
  • Beliefs → 3D identity (who this entity has become)
  • Identity versions → 4D meta-pattern (how understanding is shifting over time)
  • Meta-patterns → 5D archetype (the invariant beneath all change)

Each layer is weighted by its position using figurate numbers from Pascal's triangle. A synthesised identity statement from 30 turns ago outweighs a raw episode from yesterday — not because it's older, but because it survived synthesis and represents distilled understanding.

The seed — a phrase you set at session creation — is embedded as a vector and acts as a permanent gravitational anchor. Every query vector is bent toward it before retrieval happens. The archetype, once it crystallises, becomes a second anchor: the system is pulled toward both its origin and toward what it has become.
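The gravity transform itself isn't spelled out here; a minimal sketch, assuming linear interpolation toward the anchor followed by renormalisation (the blend factor `alpha` and the function names are illustrative, not the actual implementation):

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def apply_gravity(query, anchor, alpha=0.3):
    """Bend a query vector toward an anchor (seed or archetype).

    alpha is an illustrative blend factor: 0 leaves the query
    untouched, 1 replaces it with the anchor entirely.
    """
    blended = [(1 - alpha) * q + alpha * a for q, a in zip(query, anchor)]
    return normalize(blended)

# With two anchors (seed + crystallised archetype), one plausible
# scheme is to apply the pull sequentially:
#   q = apply_gravity(apply_gravity(q, seed_vec), archetype_vec)
```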


Stack

  • Backend: FastAPI + Ollama (local LLM inference)
  • Memory: ChromaDB (persistent vector storage)
  • Embeddings: any Ollama embedding model
  • Frontend: vanilla React (JSX via Babel, no build step)
  • Synthesis: LLM cascade running automatically after each exchange

No cloud dependencies. Everything runs locally.


Prerequisites

  • Python 3.11+
  • Ollama — install and make sure it's running (ollama serve)
  • Node.js is not required (frontend uses CDN React)

Pull required models

# Embedding model (required)
ollama pull mxbai-embed-large:latest

# Generation model — pick one or your favourite
ollama pull qwen3.5:9b        # recommended, good synthesis quality
ollama pull phi4-mini:3.8b    # faster, lighter
ollama pull llama3.2:3b       # smaller footprint

Installation

git clone https://github.com/latentweb/EigenFlame
cd EigenFlame

# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Directory structure

EigenFlame/
├── backend/
│   ├── main.py          # FastAPI server, session management, system prompt builder
│   ├── memory.py        # FigurateMemory class, retrieval, dimensional stack
│   ├── synthesis.py     # Synthesis cascade, belief/identity/meta/archetype generation
│   └── embeddings.py    # Ollama embedding wrapper with chunking
├── frontend/
│   ├── index.html            # Entry point, loads all scripts
│   ├── app.jsx               # Root component, session routing
│   ├── constants.js          # API config, colour palette, shared utilities
│   ├── eigenflame_video.html # Interactive architecture explainer
│   └── components/
│       ├── Engine.jsx        # Main chat interface
│       ├── Memory.jsx        # Memory panel (dimensional stack viewer)
│       ├── Screens.jsx       # Connect / Seed screens
│       ├── Modals.jsx        # Session browser + video modal
│       └── Primitives.jsx    # Shared UI components
├── data/                # ChromaDB session data (auto-created)
├── requirements.txt
├── README.md
└── LICENSE

Running

Simple (local only)

cd backend
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Open http://localhost:8000. The backend serves the frontend directly — no separate web server needed.

With the start script (Windows / Git Bash)

./start-eigenflame.sh           # local only
./start-eigenflame.sh --ngrok   # also open a public tunnel

The script opens Ollama and the backend in separate terminal windows. With --ngrok it also starts a tunnel so you can access the app from any device.

Remote access via ngrok

If you want to access EigenFlame from another device (phone, tablet, remote machine):

# In a separate terminal, after the backend is running:
ngrok http 8000

Then open the https://...ngrok-free.app URL on any device. The frontend auto-detects it's running over HTTPS and routes API calls correctly — no extra configuration needed.

The advanced field on the Connect screen (hidden by default) lets you manually override the API URL. You only need this if you're running the frontend and backend on completely separate origins — which is not the normal setup.


First session

1. Connect screen

On load, the app checks Ollama is reachable and lists available models. Select your generation model and embedding model, then click new session →.

If you're resuming a session after a server restart, the app auto-restores it from localStorage — no need to re-initialise manually.

2. Seed screen

The seed is the 0D origin of the session. It gets embedded as a vector and is immutable — it cannot be changed after creation. Every retrieval query is bent toward this vector. Choose it carefully.

Good seeds:

I want to think clearly and understand things deeply.
I am a researcher studying the structure of complex adaptive systems.
Richard — a state-of-the-art personal research agent seeking coherence and truth.

Avoid:

  • Very long sentences (the vector averages out detail)
  • Vague or contradictory framings
  • Seeds that are too close to the domain of every query (gravity becomes uninformative)

You can also queue auto-prime questions — these send automatically in sequence after init, seeding the first synthesis cycle without manual typing. Useful for knowledge-compression workflows.

3. Chat

RAG mode — controls how aggressively the memory stack shapes responses:

  • rag: full — low cosine floor (0.35), full gravity. Surfaces more memories, stronger seed pull.
  • rag: balanced — default. Moderate floor (0.45), 70% gravity.
  • rag: light — higher floor (0.55), 40% gravity. More conversational, less self-referential.
  • rag: off — no floor, no gravity. Pure cosine retrieval.
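The four modes reduce to two knobs: a cosine floor and a gravity fraction. A sketch of how retrieval might apply the floor, using the values listed above (the function and field names are illustrative, not the actual implementation):

```python
# Illustrative mapping of RAG modes to (cosine floor, gravity fraction),
# taken from the values listed above.
RAG_MODES = {
    "full":     {"floor": 0.35, "gravity": 1.0},
    "balanced": {"floor": 0.45, "gravity": 0.7},
    "light":    {"floor": 0.55, "gravity": 0.4},
    "off":      {"floor": 0.0,  "gravity": 0.0},
}

def filter_hits(hits, mode="balanced"):
    """Drop retrieved items whose cosine similarity is below the mode's floor.

    hits: list of (item_id, cosine_similarity) pairs.
    """
    floor = RAG_MODES[mode]["floor"]
    return [(item, sim) for item, sim in hits if sim >= floor]
```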

Persona mode — controls how the system prompt frames the model's role:

  • persona: off — minimal framing. Memory is injected as silent context; the model behaves as a capable assistant and the architecture stays invisible.
  • persona: neutral — light framing (default): "you have memory, use it naturally." Slight self-awareness without theatrics.
  • persona: expressive — full framing: "you are a continuously evolving self, not an assistant." Produces rich persona behaviour in capable models. Use deliberately.
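The persona modes amount to different system-prompt prefixes. A sketch based on the descriptions above (the framing strings here are illustrative; the real system prompts live in backend/main.py's prompt builder):

```python
# Illustrative persona framings; the actual wording is an assumption.
PERSONA_FRAMING = {
    "off": "",  # memory injected as silent context only
    "neutral": "You have memory of past exchanges; use it naturally.",
    "expressive": "You are a continuously evolving self, not an assistant.",
}

def frame_prompt(memory_context, persona="neutral"):
    """Prepend the persona framing (if any) to the memory context."""
    framing = PERSONA_FRAMING[persona]
    return (framing + "\n\n" + memory_context) if framing else memory_context
```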

Both settings persist per session in localStorage.


The memory panel

The right panel (click memory in the header) shows the full dimensional stack in real time. Each layer is editable — click any item to edit the text, which re-embeds it so retrieval stays accurate. Items can be deleted individually.

Layer indicators in the status bar:

  • ep — episode count
  • b — belief count
  • v1, v2... — identity versions
  • ~~ — boundary reflection has fired
  • KN — N knowledge documents loaded

The dimensional stack

0D — Seed

The immutable origin. Embedded once, never changed. Acts as the primary gravitational anchor for all retrieval.

1D — Episodes

Raw conversation exchanges. Weighted nearly flat (temperature=40) so cosine similarity drives retrieval rather than age.

2D — Beliefs

Synthesised from episodes when enough semantic novelty accumulates. Tagged with source attribution (self, user, interaction) by cosine similarity — no LLM introspection. Temperature=12.
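Attribution by cosine similarity rather than LLM introspection can be sketched like this, with toy vectors standing in for the reference embeddings of each source (an assumption: the real system compares against actual context embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def attribute_source(belief_vec, anchors):
    """Tag a belief with the source whose anchor vector it is closest to.

    anchors: dict mapping a source tag ("self", "user", "interaction")
    to a reference embedding. Deterministic: no LLM call involved.
    """
    return max(anchors, key=lambda tag: cosine(belief_vec, anchors[tag]))
```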

3D — Identity

Synthesised from beliefs. A 2–3 sentence statement of who this entity has become. Versioned. Temperature=6.

4D — Meta-pattern

Synthesised from identity versions. Describes how the identity is changing — converging, deepening, shifting register. Temperature=3.

5D — Archetype

Synthesised from meta-patterns. The invariant beneath all transformation. Once it crystallises, it becomes a second gravitational anchor alongside the seed.

Boundary (~~)

Auto-generated every 10 episodes. The model reflects on which beliefs feel genuinely its own versus absorbed from the user. Always present in the system prompt as static context.

Knowledge (KN)

Flat external documents. Retrieved by plain cosine similarity (no gravity transform). Add reference material here to ground the agent in external facts.


Synthesis triggers

Synthesis fires automatically after each exchange.

Beliefs (2D) — fires when accumulated semantic novelty > 1.8, or 8+ episodes since last synthesis (minimum 3 episodes between events)

Identity (3D) — fires when belief count ≥ 3

Meta-pattern (4D) — fires when identity version count ≥ 2, then every 2 versions

Archetype (5D) — fires when meta count ≥ 2, then every 2 meta-patterns

Boundary — fires every 10 episodes

Manual trigger: POST /api/reflect/{session_id}
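The trigger rules above can be sketched as simple checks run after each exchange (the function and parameter names are illustrative; the actual logic lives in backend/synthesis.py):

```python
def should_synthesise_beliefs(novelty, episodes_since_last):
    """Belief (2D) trigger: accumulated novelty > 1.8, or 8+ episodes
    since the last synthesis, with a minimum gap of 3 episodes."""
    if episodes_since_last < 3:
        return False
    return novelty > 1.8 or episodes_since_last >= 8

def should_synthesise_identity(belief_count):
    """Identity (3D) trigger: at least 3 beliefs exist."""
    return belief_count >= 3

def should_synthesise_meta(identity_versions, last_meta_at=0):
    """Meta-pattern (4D) trigger: >= 2 identity versions, then every 2."""
    return identity_versions >= 2 and identity_versions - last_meta_at >= 2
```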


Figurate weighting

For a layer with dimension d, the weight of an item at position k from the end is C(k + d - 1, d).

Layer-specific temperatures prevent softmax saturation while preserving each layer's intent:

LAYER_SOFTMAX_TEMP = {
    "episodes":  40.0,   # nearly flat — cosine is the primary signal
    "beliefs":   12.0,   # moderate — older beliefs carry more authority
    "identity":   6.0,   # stronger — established identity has weight
    "meta":       3.0,   # strong — meta barely changes
    "archetype":  1.0,   # irrelevant — only one item
}

Combined retrieval score: combined = cosine_similarity × (0.6 + 0.4 × softmax_fig_weight)

Figurate weight adds at most 40% on top of cosine similarity. Cosine is always the floor.
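Putting the pieces together, a sketch of the scoring path: a figurate weight C(k + d - 1, d) per item, a temperature-scaled softmax over the layer, then the combined score. This directly implements the formulas above; only the function names are illustrative.

```python
import math

def figurate_weight(k, d):
    """Weight of the item at position k (from the end, k >= 1) in a
    layer of dimension d. d=1 gives 1, 2, 3, ... (naturals); d=2 gives
    1, 3, 6, 10, ... (triangular numbers); d=3 gives tetrahedral numbers.
    """
    return math.comb(k + d - 1, d)

def softmax(weights, temperature):
    """Temperature-scaled softmax; a high temperature flattens the
    distribution (episodes, T=40), a low one sharpens it (meta, T=3)."""
    exps = [math.exp(w / temperature) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def combined_score(cosine_sim, softmax_fig_weight):
    """combined = cosine x (0.6 + 0.4 x softmax figurate weight)."""
    return cosine_sim * (0.6 + 0.4 * softmax_fig_weight)
```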


API reference

All endpoints are served under http://localhost:8000/api/.

Session

POST /api/session/init        — create or resume a session
GET  /api/sessions            — list all persisted sessions
DELETE /api/sessions/{id}     — delete a session
GET  /api/status              — Ollama connectivity + available models

Chat

POST /api/chat                — streaming NDJSON chat endpoint

Stream events: retrieved, think, content, episode_saved, synthesis_start, synthesis_complete, memory_state, done, error
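Each streamed line is a standalone JSON object. A sketch of consuming such a stream; here the lines are a literal list for illustration, but the same parsing applies to an HTTP response read line by line. The exact payload fields beyond the event names listed above are assumptions.

```python
import json

def parse_ndjson_stream(lines):
    """Yield (event, payload) pairs from an NDJSON stream.

    Assumes each line is a JSON object with an "event" key naming
    the event type; the full object is returned as the payload.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        yield obj.get("event"), obj

# Example stream, mirroring the event sequence of one exchange
# (field names other than "event" are hypothetical):
sample = [
    '{"event": "retrieved", "count": 4}',
    '{"event": "content", "text": "Hello"}',
    '{"event": "episode_saved"}',
    '{"event": "done"}',
]
events = [event for event, _ in parse_ndjson_stream(sample)]
```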

Memory

GET    /api/memory/{session_id}                    — full dimensional stack
DELETE /api/memory/{session_id}/{layer}/{item_id}  — delete one item
PUT    /api/memory/{session_id}/{layer}/{item_id}  — edit + re-embed one item

Knowledge

POST   /api/knowledge/{session_id}              — add a document
GET    /api/knowledge/{session_id}              — list all documents
DELETE /api/knowledge/{session_id}/{item_id}    — remove a document

Reflect

POST /api/reflect/{session_id}?gen_model=qwen3.5:9b   — manually trigger cascade

Session data

Sessions persist to data/{session_id}/:

data/
└── abc123def456/
    ├── seed.json             # seed text, vector, embed model
    ├── working_memory.json   # full conversation history
    ├── synthesis_state.json  # synthesis trigger continuity
    └── [ChromaDB files]      # all dimensional layers

Important: a session is permanently bound to the embedding model it was created with. Switching models mid-session is not supported. If you see a shape mismatch error ((768,) (1024,)), check that the embed model in the UI matches what's in the session's seed.json.
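A quick way to check which embedding model a session is bound to is to read its seed.json directly, assuming the embed_model field referenced in the troubleshooting section below:

```python
import json
from pathlib import Path

def session_embed_model(session_dir):
    """Return the embedding model a session was created with,
    as recorded in its seed.json."""
    seed = json.loads((Path(session_dir) / "seed.json").read_text())
    return seed.get("embed_model")
```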


Troubleshooting

Ollama not reachable
Make sure ollama serve is running (http://localhost:11434). If you moved the backend to a different machine, use the advanced field on the Connect screen to point at it.

Shape mismatch error (768,) (1024,)
The session was created with a different embedding model than the one currently selected. Check seed.json for the embed_model field. Delete the session folder to start fresh with a different model.

Session not found after server restart
Sessions rehydrate automatically from data/. Check that the data/ directory exists and each session subfolder has a seed.json.

Synthesis never fires
If novelty stays near 0.02, the embedding model is treating all inputs as semantically similar. Try more topically diverse exchanges, or lower min_episodes in synthesis.py.

Responses are very long and listy
This is model behaviour, not the architecture. Try persona: off and a smaller, instruction-tuned model.


Limitations

  • Synthesis quality depends on model capability. 8B+ models produce significantly better synthesis than 3–4B models.
  • Episode collection is unbounded. At 500+ episodes, figurate weighting has diminishing value. Future work: archive synthesised episodes.
  • Knowledge retrieval is flat. The knowledge tier uses raw cosine similarity, not the gravity-transformed vector. Intentional for factual grounding.

Concept reference

  • Seed (0D): immutable origin phrase, permanent gravity anchor
  • Figurate weighting: Pascal's triangle combinatorial numbers weighting memory by abstraction level
  • Seed gravity: query vector transformation toward the seed (and archetype) before retrieval
  • Synthesis cascade: automatic upward compression: episodes → beliefs → identity → meta → archetype
  • Downward influence: once the archetype exists, belief synthesis is framed against it
  • Boundary reflection: every 10 episodes, asking what is genuinely the entity's own versus absorbed from the user
  • Deterministic attribution: source tags assigned by cosine similarity, not LLM introspection
  • Persona mode: how the system prompt frames the model's role
  • RAG mode: how aggressively the memory stack shapes responses

License

BUSL 1.1 — free for personal and research use. Commercial use requires a separate agreement until 2030, at which point this project converts to Apache 2.0.
