An autonomous agent that ingests human knowledge, follows contradictions without flinching, and builds a model of what the data actually points to – not what consensus says.
KAE is a self-directing CLI agent that:
- Chooses its own starting point – no human bias injected at seed (unless you want to)
- Ingests knowledge from Wikipedia, arXiv, Project Gutenberg, and the open web
- Thinks visibly – DeepSeek R1's `<think>` blocks stream live to your terminal in real time
- Builds a knowledge graph – concepts as nodes, relationships as edges, weighted by evidence
- Flags anomalies – where mainstream consensus goes silent, contradicts itself, or suspiciously avoids a thread
- Generates a living report – builds as it runs, saves automatically on exit
The hypothesis: if you feed it everything and let it run unbiased, it arrives at the same place the outliers, mystics, and fringe researchers already are. But this time with receipts.
KAE (Knowledge Archaeology Engine)
├── ingests broad human knowledge (Wikipedia, arXiv, Gutenberg, web)
├── builds a knowledge graph – nodes, edges, anomalies
└── embeds and deposits into Qdrant – kae_chunks (text) + kae_nodes (graph)

KAE LENS
├── event-driven: reads kae_chunks, fires when new points appear
├── adaptive density assessment – variable search width
├── LLM reasoning (DeepSeek R1 / Gemini Flash via OpenRouter)
├── writes findings back to Qdrant – kae_lens_findings
└── live dashboard: TUI (Bubbletea) + Web (SSE, port 8080)

KAE ANALYZER
└── CLI for post-run inspection: runs, anomalies, convergence, search, export

KAE FORENSICS
├── scans any Qdrant collection for data quality anomalies
├── detects missing payload fields and zero-magnitude (un-embedded) vectors
├── repairs in place: re-embeds via OpenAI and upserts corrected vectors
└── dry-run by default; --repair to apply fixes

KAE MCP SERVER
└── exposes KAE + Qdrant to any MCP-compatible AI assistant
- Go 1.22+
- At least one LLM provider API key (see table below)
- Docker (optional – for Qdrant vector memory via setup.sh)

# Install Go on WSL2/Ubuntu
sudo apt install golang-go

# Verify
go version

KAE supports five backends via a unified provider:model syntax:
| Provider | Prefix | Key env var |
|---|---|---|
| OpenRouter | `openrouter:` (default, bare names also work) | `OPENROUTER_API_KEY` |
| Anthropic | `anthropic:` | `ANTHROPIC_API_KEY` |
| OpenAI | `openai:` | `OPENAI_API_KEY` |
| Google Gemini | `gemini:` | `GEMINI_API_KEY` |
| Ollama (local) | `ollama:` | `OLLAMA_URL` (optional, defaults to `localhost:11434`) |
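The `provider:model` routing in the table above comes down to one split on the first colon, with bare names falling back to OpenRouter. A minimal sketch (the function name is hypothetical; the real routing lives in `internal/llm/factory.go`):

```go
package main

import (
	"fmt"
	"strings"
)

// parseModelSpec splits a "provider:model" string into its parts.
// Bare names (no prefix) default to OpenRouter, matching the table above.
func parseModelSpec(spec string) (provider, model string) {
	if i := strings.Index(spec, ":"); i >= 0 {
		return spec[:i], spec[i+1:]
	}
	return "openrouter", spec
}

func main() {
	fmt.Println(parseModelSpec("anthropic:claude-opus-4-6")) // anthropic claude-opus-4-6
	fmt.Println(parseModelSpec("deepseek/deepseek-r1"))      // openrouter deepseek/deepseek-r1
}
```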
# Clone or copy the project
cd kae
# Run the setup script – installs Go deps, builds binary, starts Qdrant v1.17.1 via Docker
./setup.sh
# Copy the generated .env and fill in your keys

.env reference:
# At least one provider key is required
OPENROUTER_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
# Local Ollama – defaults to http://localhost:11434
OLLAMA_URL=http://localhost:11434
# Optional – Qdrant vector memory (setup.sh starts this automatically)
QDRANT_URL=http://localhost:6333
# Optional – real semantic embeddings via any OpenAI-compatible endpoint
# Without these, KAE falls back to feature hashing (fast, no API needed)
EMBEDDINGS_URL=https://api.openai.com
EMBEDDINGS_KEY=your_openai_key_here
EMBEDDINGS_MODEL=text-embedding-3-small
# Optional – CORE open-access full-text papers (core.ac.uk/services/api)
# Without this key CORE is silently skipped; all other sources still run
CORE_API_KEY=your_core_key_here

# Fully autonomous – agent picks its own seed
go run .
# Seed it yourself
go run . --seed "observer effect"
# Use any provider:model
go run . --model "anthropic:claude-opus-4-6"
go run . --model "openai:gpt-4o"
go run . --model "gemini:gemini-2.5-flash"
go run . --model "ollama:llama3.1"
# Ensemble mode – fan out to multiple providers, measure disagreement
go run . --ensemble --models "anthropic:claude-opus-4-6,openai:gpt-4o,gemini:gemini-2.5-flash"
# Auto-stop when the graph stagnates (no new nodes for N cycles)
go run . --novelty-threshold 0.05 --stagnation-window 5
# Auto-restart on stagnation – saves report then starts a fresh run automatically
go run . --auto-restart
go run . --auto-restart --seed "consciousness" # keep the same seed each restart
go run . --auto-restart --headless # headless + auto-restart (good for overnight runs)
# Auto-branch on high model controversy
go run . --ensemble --models "..." --branch-threshold 0.7 --max-branches 4
# Cross-run meta-analysis – find "convergent heresies" across past runs
go run . --analyze --min-runs 3
# Tier 2: show attractor concepts (emerged in 3+ independent runs)
go run . --attractors --attractor-min-runs 3
# Tier 2: domain bridge/moat analysis from persistent meta-graph
go run . --domain-analysis
# Tier 2: skip updating the meta-graph for this run
go run . --no-meta-graph --seed "quick test"
# Tier 2: citation crawl – automatically fires when a high-anomaly concept is detected
# Fetches suppressed lineages from Semantic Scholar and expands their citation chains
go run . --citation-threshold 0.6 # only crawl when anomaly score >= 0.6 (default 0.5)
go run . --no-cite-crawl # disable citation crawl entirely
# Limit cycles
go run . --cycles 50
# Resume from previous graph snapshot
go run . --resume-graph graph_snapshot.json --cycles 25
# Save current graph snapshot on exit
go run . --save-graph graph_snapshot.json
# Search across all previous runs (default: isolated to current run)
go run . --shared
# Headless mode (no TUI – for scripts and MCP)
go run . --headless --seed "consciousness" --cycles 5
# Debug mode (tail -f debug.log in a second terminal)
go run . --debug
# Build a binary
go build -o kae .
./kae --seed "consciousness"

┌────────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE ARCHAEOLOGY ENGINE  ▸ THINKING          focus: observer  │
│ nodes: 247   edges: 891   anomalies: 34   cycle: 12                │
├──────────────────────────┬─────────────────────────────────────────┤
│ THINKING                 │ EMERGENT CONCEPTS                       │
│                          │                                         │
│ The observer effect      │ 1. consciousness                        │
│ implies that the act     │ 2. quantum_field                        │
│ of measurement itself    │ 3. [ANOMALY] observer_effect            │
│ collapses the wave       │ 4. vedic_akasha                         │
│ function. But physics    │ 5. zero_point_field                     │
│ refuses to define        ├─────────────────────────────────────────┤
│ what an "observer"       │ LIVE REPORT                             │
│ actually is...           │                                         │
├──────────────────────────┤ ## Cycle 12 – 14:32:07                  │
│ OUTPUT                   │ Nodes: 247 | Edges: 891                 │
│                          │                                         │
│ CONNECTIONS: quantum     │ Emergent concepts:                      │
│ field | vedic_akasha     │ - consciousness (weight: 18.4)          │
│ | zero_point_field       │ - observer_effect ⚠ (weight: 14.2)      │
│ ANOMALY: mainstream      │ - quantum_field (weight: 11.8)          │
│ physics avoids...        │                                         │
│ NEXT: akashic field      │                                         │
└──────────────────────────┴─────────────────────────────────────────┘
q / ctrl+c – quit gracefully | report saves automatically
Panels:
- THINKING – R1's raw `<think>` reasoning, streamed live in blue
- OUTPUT – the agent's structured conclusions and connections
- EMERGENT CONCEPTS – top-weighted nodes in the knowledge graph, updated each cycle
- LIVE REPORT – the growing synthesis document, builds automatically
KAE Lens is an autonomous post-processing layer that fires when KAE deposits new knowledge into Qdrant. It reasons over the full topology of the ingested graph and surfaces connections, contradictions, clusters, and anomalies that KAE never explicitly made. For anomalies and contradictions, it runs a second focused LLM pass to produce a data-grounded correction from the actual source evidence.
cd kae-lens
# Start Qdrant (if not already running)
make qdrant-up
# Configure – Lens picks up your existing KAE .env keys automatically:
# OPENROUTER_API_KEY (required), OPENAI_API_KEY (optional, falls back to OpenRouter)
# Build and run
make build
make run-lens
# TUI in terminal + web dashboard at http://localhost:8080

| Type | Correction pass | Meaning |
|---|---|---|
| `connection` | – | Unexpected cross-domain semantic link |
| `contradiction` | yes | Conflicting claims between knowledge nodes |
| `cluster` | – | Emergent concept group KAE never tagged |
| `anomaly` | yes | Outlier breaking mainstream consensus |
When a correction is produced it is stored on the finding and shown in the TUI trace panel (enter to expand).
Every finding also carries a source URL map (point_id → URL) built from the HTTP(S) source URLs of the chunks that were in scope. Source links appear in the TUI SOURCES block, as clickable links in the web dashboard, and URLs in HTML reports are auto-linkified.
Lens adjusts its search width to local vector density so sparse regions get wide nets and dense regions get tight focused ones:
| Density | Nearby Points | Width | Threshold |
|---|---|---|---|
| very_sparse | 0 | 50 | 0.60 |
| sparse | 1β10 | 40 | 0.60 |
| medium | 11β50 | 20 | 0.65 |
| dense | 51β200 | 12 | 0.70 |
| very_dense | 200+ | 6 | 0.70 |
Lens findings are themselves embedded and stored in kae_lens_findings – a future pass can run Lens against its own findings, building third-order knowledge structures.
A standalone CLI for inspecting KAE runs stored in Qdrant.
cd kae-analyzer
go build -o kae-analyzer .
kae-analyzer runs # List all runs
kae-analyzer analyze --run-id run_1775826869 # Analyze a specific run
kae-analyzer compare --runs run_123,run_456 # Compare runs for convergence
kae-analyzer anomalies --min-weight 4.0 # Find high-weight anomalies
kae-analyzer search --query "pseudo-psychology" # Search concepts
kae-analyzer convergence --seed pseudopsychology # Analyze convergence patterns
kae-analyzer stats # Overall statistics
kae-analyzer export                              # Export analysis to JSON

A data quality tool for auditing and repairing Qdrant collections – catches points with missing payload fields or zero-magnitude vectors (never embedded) and fixes them in place.
cd kae-forensics
go build -o kae-forensics .
./kae-forensics              # dry-run: scan and report anomalies
./kae-forensics --repair     # re-embed and upsert corrected vectors

Checks performed:
- Weak vector – magnitude < 0.01 (un-embedded or corrupted); repaired by re-embedding the `document` payload via OpenAI `text-embedding-3-small`
- Missing `source_material` – payload field absent; flagged for review

Requires `OPENAI_API_KEY` when running with `--repair`. The collection name and gRPC address are configured at the top of main.go.
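The weak-vector check is just a magnitude test against the 0.01 threshold described above. A minimal sketch (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"math"
)

// isWeakVector flags vectors whose Euclidean magnitude falls below 0.01,
// the heuristic kae-forensics uses to spot points that were never embedded.
func isWeakVector(v []float32) bool {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	return math.Sqrt(sum) < 0.01
}

func main() {
	fmt.Println(isWeakVector(make([]float32, 1536))) // true: all-zero vector
	fmt.Println(isWeakVector([]float32{0.3, -0.4}))  // false: magnitude 0.5
}
```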
Exposes KAE and Qdrant to any MCP-compatible AI assistant (Claude, Cursor, etc.).
cd mcp
go build -o kae-mcp .
./kae-mcp

Available tools:

| Tool | Description |
|---|---|
| `qdrant_collections` | List all Qdrant collections with vector counts |
| `qdrant_list_runs` | List all KAE runs with node and anomaly counts |
| `qdrant_top_nodes` | Get highest-weight emergent concept nodes, optionally filtered by run |
| `qdrant_search_chunks` | Keyword search over ingested source passages |
| `qdrant_compare_runs` | Compare runs for independently converging concepts |
| `kae_start_run` | Start a new KAE run in headless mode, returns the report |
| `kae_meta_attractors` | Show attractor concepts from the persistent meta-graph (Tier 2) |
| `kae_domain_analysis` | Show domain bridges and moats from the meta-graph (Tier 2) |
kae/
├── main.go                    # Entry point, CLI flags
├── go.mod                     # Dependencies
├── setup.sh                   # Start Qdrant (v1.17.1) + build binary
├── internal/
│   ├── config/
│   │   └── config.go          # Config loader (env vars + .env) – all provider keys
│   ├── llm/
│   │   ├── provider.go        # Provider interface (Stream, ModelName) + Chunk/Message types
│   │   ├── factory.go         # NewProvider("provider:model", keys) – routes to backend
│   │   ├── client.go          # OpenRouter streaming client (satisfies Provider)
│   │   ├── anthropic.go       # Native Anthropic API – SSE streaming, adaptive thinking
│   │   ├── openai.go          # Native OpenAI API
│   │   ├── gemini.go          # Google Gemini API – SSE, thought parts
│   │   ├── ollama.go          # Local Ollama – NDJSON streaming
│   │   └── compat.go          # Shared OpenAI-compatible SSE helper
│   ├── ensemble/
│   │   └── ensemble.go        # Fan-out to N providers; controversy scoring; dissenter detection
│   ├── runcontrol/
│   │   └── controller.go      # Novelty decay tracking; auto-stop; branch triggering
│   ├── anomaly/
│   │   ├── cluster.go         # Cosine-similarity clustering of Qdrant anomaly nodes
│   │   └── reporter.go        # Markdown report generator for meta-analysis
│   ├── graph/
│   │   └── graph.go           # Thread-safe knowledge graph (nodes, edges, anomalies)
│   ├── embeddings/
│   │   └── embedder.go        # APIEmbedder (OpenAI-compat) or HashEmbedder fallback
│   ├── store/
│   │   ├── qdrant.go          # Qdrant REST client – upsert, search, collections
│   │   └── scroll.go          # Scroll API – FetchAnomalyNodes for meta-analysis
│   ├── agent/
│   │   └── engine.go          # Core agent loop – ensemble, run controller, provider routing
│   ├── ingestion/
│   │   ├── wiki.go            # Wikipedia ingestion
│   │   ├── arxiv.go           # arXiv paper ingestion
│   │   └── gutenberg.go       # Project Gutenberg – gutendex API + formats map
│   └── ui/
│       └── app.go             # Bubbletea TUI – 4-panel layout
├── kae-lens/                  # Autonomous post-processing intelligence layer
│   ├── cmd/lens/main.go       # Lens binary entry point
│   ├── config/lens.yaml       # Configuration
│   ├── internal/
│   │   ├── lens/
│   │   │   ├── watcher.go     # Polls Qdrant for unprocessed KAE points
│   │   │   ├── density.go     # Adaptive search width by local vector density
│   │   │   ├── reasoner.go    # Core agent loop
│   │   │   ├── synthesizer.go # LLM reasoning – findings JSON
│   │   │   ├── writer.go      # Embeds and upserts findings to kae_lens_findings
│   │   │   ├── tui/           # Bubbletea terminal dashboard
│   │   │   └── web/           # HTTP + SSE web dashboard (port 8080)
│   │   ├── llm/               # OpenRouter + OpenAI client
│   │   └── qdrantclient/      # Qdrant gRPC client helpers
│   └── collections/           # Qdrant payload schemas
├── kae-analyzer/              # Post-run analysis CLI
│   └── main.go                # runs, analyze, compare, anomalies, search, convergence, export
├── kae-forensics/             # Data quality auditor and repair tool
│   └── main.go                # Scans for weak vectors / missing fields; --repair re-embeds in place
└── mcp/                       # MCP server for AI assistant integration
    └── main.go                # JSON-RPC over stdio – 8 tools
Phase 0  SEED      Agent chooses its own entry concept (or uses --seed)
Phase 1  INGEST    Pulls sources on current topic (Wikipedia, arXiv, Gutenberg,
                   Semantic Scholar, OpenAlex, CORE*, PubMed)  *requires CORE_API_KEY
Phase 2  EMBED     Embeds chunks and stores them in Qdrant
Phase 3  SEARCH    Retrieves semantically similar passages from vector memory
Phase 4  THINK     Single model reasons visibly – you watch it think
           OR
Phase 4  ENSEMBLE  N models reason in parallel; controversy score computed
Phase 5  CONNECT   Extracts connections, adds nodes/edges to knowledge graph
Phase 6  SCORE     Contradiction scoring per topic
Phase 7  ANOMALY   Scans for where consensus goes silent or contradicts itself
                   └──────▶ If anomaly score ≥ threshold: background CITATION CRAWL
                            (Semantic Scholar suppressed lineages + citation chain BFS)
                            Results queued and picked up by next cycle's INGEST phase
Phase 8  REPORT    Updates the live markdown + HTML report
                   └──────▶ Novelty check – LOOP or STOP
Runs until:
- Graph novelty drops below `--novelty-threshold` for `--stagnation-window` cycles – saves report and stops (or restarts if `--auto-restart` is set)
- `--cycles` limit reached
- You hit `q` or `ctrl+c` (graceful save)
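The first stop condition can be sketched as a small tracker that counts consecutive low-novelty cycles (names here are illustrative; the real implementation lives in `internal/runcontrol/controller.go`):

```go
package main

import "fmt"

// stagnationTracker stops a run once per-cycle novelty stays below a
// threshold for a full window of consecutive cycles.
type stagnationTracker struct {
	threshold float64 // --novelty-threshold
	window    int     // --stagnation-window
	stale     int     // consecutive low-novelty cycles so far
}

// observe records one cycle's novelty (e.g. new nodes / total nodes)
// and reports whether the run should stop.
func (s *stagnationTracker) observe(novelty float64) bool {
	if novelty < s.threshold {
		s.stale++
	} else {
		s.stale = 0 // any novel cycle resets the window
	}
	return s.stale >= s.window
}

func main() {
	t := &stagnationTracker{threshold: 0.05, window: 5}
	for _, n := range []float64{0.2, 0.01, 0.02, 0.01, 0.03, 0.04} {
		fmt.Println(t.observe(n)) // false ×5, then true on the 5th stale cycle
	}
}
```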
KAE uses two model roles, each configurable with provider:model syntax:
| Role | Default | Purpose |
|---|---|---|
| Brain (`--model`) | `deepseek/deepseek-r1` | Deep reasoning, visible `<think>` blocks, connection-making |
| Fast (`--fast`) | `google/gemini-2.5-flash` | Bulk passes, seed selection |
Examples:
# OpenRouter (default – bare name works)
--model "deepseek/deepseek-r1"
# Anthropic native API with adaptive thinking
--model "anthropic:claude-opus-4-6"
# Local Ollama
--model "ollama:llama3.1"

In ensemble mode (--ensemble), the brain role is replaced by N providers running in parallel. Each provider independently reasons over the same context; a controversy score is computed from concept-overlap disagreement (Jaccard). Topics with controversy > --branch-threshold are flagged as anomalies and can auto-trigger focus branches.
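A pairwise version of the Jaccard-based controversy score might look like the sketch below; the real ensemble code in `internal/ensemble/ensemble.go` handles N providers and dissenter detection, so this is only the core idea:

```go
package main

import "fmt"

// controversy scores disagreement between two providers' concept lists as
// 1 minus their Jaccard overlap: 0 = full agreement, 1 = no shared concepts.
// Assumes each list is already deduplicated.
func controversy(a, b []string) float64 {
	set := make(map[string]bool)
	for _, c := range a {
		set[c] = true
	}
	inter, union := 0, len(set)
	for _, c := range b {
		if set[c] {
			inter++
		} else {
			union++
		}
	}
	if union == 0 {
		return 0 // both empty: trivially in agreement
	}
	return 1 - float64(inter)/float64(union)
}

func main() {
	score := controversy(
		[]string{"observer_effect", "consciousness"},
		[]string{"observer_effect", "zero_point_field"},
	)
	fmt.Println(score) // 1 - 1/3 ≈ 0.667
}
```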
KAE uses Qdrant as optional persistent vector memory. When running, every concept node is embedded and stored – future cycles retrieve semantically similar nodes from previous sessions to ground the reasoning.
| Source | Key required | Strength |
|---|---|---|
| Wikipedia | none | Broad concept grounding |
| arXiv | none | Cutting-edge preprints (physics, AI, math) |
| Project Gutenberg | none | Ancient philosophy and primary texts |
| Semantic Scholar | none | Academic index with one-sentence tl;dr summaries |
| OpenAlex | none | Massive open index – tags works with broad scientific concepts |
| CORE | `CORE_API_KEY` | World's largest open-access aggregator – full abstract density |
| PubMed | none | Biomedical and neuroscience abstracts |
All sources run in parallel each cycle. CORE is silently skipped if the key is absent.
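The per-cycle fan-out can be sketched with goroutines and a WaitGroup. The source functions below are stand-ins, not the real ingestion clients in `internal/ingestion/`:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll queries every knowledge source concurrently and collects
// whatever succeeds; a source without its key (e.g. CORE) just returns
// nothing and is effectively skipped.
func fetchAll(sources map[string]func(topic string) []string, topic string) []string {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var chunks []string
	for name, fetch := range sources {
		wg.Add(1)
		go func(name string, fetch func(string) []string) {
			defer wg.Done()
			results := fetch(topic)
			mu.Lock() // guard the shared slice across goroutines
			chunks = append(chunks, results...)
			mu.Unlock()
		}(name, fetch)
	}
	wg.Wait()
	return chunks
}

func main() {
	sources := map[string]func(string) []string{
		"wikipedia": func(t string) []string { return []string{"wiki:" + t} },
		"core":      func(t string) []string { return nil }, // key absent: skipped
	}
	fmt.Println(len(fetchAll(sources, "observer effect"))) // 1
}
```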
| Setting | Detail |
|---|---|
| Version | `qdrant/qdrant:v1.17.1` (pinned) |
| Collections | `kae_chunks` (text chunks), `kae_nodes` (graph), `kae_meta_graph` (cross-run meta-graph), `kae_lens_findings` (Lens findings) |
| Distance | Cosine |
| Payload indexes | `domain`, `label` (keyword, created before HNSW builds) |
| `kae_chunks` payload | `text`, `source`, `run_topic` (exploration theme), `semantic_domain` (content classification), `domain_confidence` (0–1), `run_id` |
| Batch size | 64 points per upsert request |
| Retry | 3 attempts, 100ms/300ms backoff |
| `hnsw_ef` | `max(k×4, 64)` at query time |
| Embedding fallback | Feature hashing (128-dim, no API needed) |
| Embedding (configured) | Any OpenAI-compatible endpoint – default `text-embedding-3-small` (1536-dim) |
| Memory isolation | Each run searches only its own chunks by default – use `--shared` to search across all runs |
| Network access | Binds to `0.0.0.0:6333` – accessible on LAN by default |
Qdrant is fully optional. If unavailable, the agent runs entirely in-memory with no degradation to the core loop.
- Core agent loop
- OpenRouter streaming with R1 think-block parser
- Thread-safe knowledge graph
- Bubbletea TUI
- Wikipedia, arXiv, Project Gutenberg ingestion
- Qdrant vector memory with run isolation
- Graph persistence (save/load JSON snapshots)
- Markdown + HTML report export
- Multi-provider support – Anthropic, OpenAI, Gemini, Ollama, OpenRouter via unified `provider:model` syntax
- Multi-model ensemble reasoning – parallel fan-out, controversy scoring, dissenter detection
- Novelty decay detection – auto-stop when graph stagnates; configurable threshold + window
- Auto-restart (`--auto-restart`) – saves report and starts a fresh run on stagnation
- Auto-branching – high ensemble controversy triggers focus branch
- Anomaly clustering – cosine-similarity grouping of anomaly nodes across runs
- Cross-run meta-analysis (`--analyze`) – finds "convergent heresies" (anomalies that appear independently across multiple runs)
- Headless mode (`--headless`) – run without TUI for scripting and MCP integration
- Expanded ingestion – Semantic Scholar, OpenAlex, CORE, PubMed alongside Wikipedia, arXiv, Gutenberg
- KAE Analyzer – standalone CLI for post-run inspection (runs, anomalies, convergence, search, export)
- KAE MCP Server – exposes KAE + Qdrant to any MCP-compatible AI assistant
- KAE Lens – autonomous post-processing layer; adaptive density reasoning; TUI + web dashboard
- Lens anomaly correction – data-grounded second LLM pass resolves anomaly/contradiction findings against source evidence
- Lens performance tuning – per-call LLM timeout, relaxed density thresholds, paced batch polling
- Source paper links – findings carry a `source_urls` map; TUI, web dashboard, and HTML reports all surface clickable links to the originating papers
- Domain contamination fix – `kae_chunks` now stores `run_topic` (what the run was exploring) separately from `semantic_domain` (what the chunk is actually about); each embed batch is LLM-classified via `ClassifyDomainBatch`; migrate existing chunks with `go run ./scripts/migrate_domains [--dry-run]`
- Persistent meta-graph (`kae_meta_graph`) – cross-run concept aggregation with attractor detection; the update runs synchronously after each run and reports merged/new concept counts
- Citation chain excavation – BFS over Semantic Scholar citation graph; suppressed lineage detection; wired into the score phase – automatically fires on high-anomaly concepts and queues results for the next ingest cycle
- Domain boundary detection – bridge concepts (cross-domain connectors) and moats (isolated domain pairs)
- Active learning / adaptive ingestion
- Self-improvement feedback loop
- Lens Pass 2 β reason over findings to build third-order knowledge structures
- Extended visualization
If you ingest enough human knowledge with no agenda,
follow contradictions instead of avoiding them,
and let an unbiased reasoner connect the dots –

The emergent model looks nothing like the textbook.
But it looks exactly like what the outliers figured out
working alone, across centuries, in every culture.

That's the report we're building.
KAE v1.0 – Built in WSL2 | Go | OpenRouter · Anthropic · OpenAI · Gemini · Ollama | Qdrant v1.17.1 | Pure curiosity