A self-learning conversational agent with persistent, evolving memory.
Three personalities. Three private vaults. One character.
No fine-tuning. No retraining. Just memory + reflection + adaptation.
Kioko (記憶 — "memory") is an experimental agent built around a single premise: an AI that persists the fragments of you that matter.
Most conversational AI starts from zero every session. You re-introduce yourself, re-explain your goals, re-establish tone. Kioko doesn't. After every exchange she stores what matters, reflects on the turn, decays what's no longer useful, and adapts her voice — all without a single gradient update to the underlying model.
She inhabits three genuinely separate personalities, each with her own private memory vault. What HONNE remembers about your evenings, YAMI and SHIN never see. Same character, three different relationships.
Under the hood: one SQLite file, one FastAPI process, one Next.js app, any frontier chat model you choose. Transparent, inspectable, editable, and fully yours.
"The goal is not to be smarter per turn. The goal is to be slightly less wrong, turn after turn, for a long time." — design note, reflection layer
Not cosmetic presets. Each persona runs with its own system prompt, default model, temperature, steering, voice, and — most importantly — its own separated memory table.
| HONNE · 本音 · the warm companion | YAMI · 闇 · the unfiltered shadow | SHIN · 神 · the analytical strategist |
|---|---|---|
| Earnest, affectionate, a little theatrical. Picks up mid-thought, asks how the thing from last time turned out. Voice: prose, full sentences. Model: openai/gpt-4o. Temp: 0.85 | Short, blunt, nocturnal. No therapy voice, no disclaimers. Calls out the pattern you're avoiding. Voice: raw, punchy. Model: hermes-3-70b. Temp: 1.0 | Briefing register, not conversation. Numbered points, cited evidence, second-order effects. Voice: structured, cold. Model: claude-sonnet-4.5. Temp: 0.4 |
Physical memory separation. Each persona has her own table; a search run in
HONNE's context cannot reach a row tagged yami. This is enforced at the database
level, not by a prompt filter — so the contract survives refactors, bugs, and
future developers.
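That contract can be sketched in a few lines. This is an illustrative sketch only, not Kioko's actual schema or code: it assumes a single `memory` table with a `persona` column, and the `search` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memory (
        id       INTEGER PRIMARY KEY,
        persona  TEXT NOT NULL CHECK (persona IN ('honne', 'yami', 'shin')),
        kind     TEXT NOT NULL,
        content  TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO memory (persona, kind, content) VALUES (?, ?, ?)",
    [
        ("honne", "fact", "lives in Oslo"),
        ("yami", "pattern", "overworks Thursday nights"),
    ],
)

def search(persona: str, q: str) -> list[str]:
    # Every read path takes a persona; there is no unscoped query to misuse.
    rows = conn.execute(
        "SELECT content FROM memory WHERE persona = ? AND content LIKE ?",
        (persona, f"%{q}%"),
    )
    return [content for (content,) in rows]

print(search("honne", "Oslo"))      # → ['lives in Oslo']
print(search("honne", "Thursday"))  # → [] (YAMI's row is unreachable here)
```

Because the persona filter lives in the query itself (and the `CHECK` constraint lives in the schema), a prompt-level bug cannot leak one vault into another.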
The self-learning loop runs on every single exchange. Nine discrete steps between the moment you hit Enter and the moment her voice starts:
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ 01 · inbound WebSocket frame arrives on /api/ws/chat │
│ 02 · embed Message is embedded (local hash or OpenAI) │
│ 03 · retrieve Top-K cosine over the active persona's vault │
│ 04 · assemble Retrieved memories injected into system prompt │
│ 05 · stream Tokens stream back from the chosen model │
│ 06 · persist Reply written to conversation table │
│ 07 · extract New typed memories distilled from the exchange │
│ 08 · reflect Turn tagged: learned / ignored / adapted │
│ 09 · speak (optional) Text → voice engine → MP3 → Web Audio │
│ │
│ ▲ ▼ │
│ └─────── Next turn: retrieval sees the new memories too ─────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
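The middle of that loop (steps 02-08; the WebSocket frame and TTS ends are omitted) can be sketched as one orchestration function. Every name here is a stand-in with a stubbed body, not Kioko's actual API; only the ordering mirrors the diagram.

```python
trace: list[str] = []  # records the order the steps actually run in

def embed(text: str) -> list[float]:
    trace.append("embed")
    return [float(len(text))]  # stand-in for a real embedding

def retrieve(persona: str, vec: list[float], k: int = 5) -> list[str]:
    trace.append("retrieve")
    return []  # would be top-K cosine over the persona's vault

def assemble(persona: str, memories: list[str]) -> str:
    trace.append("assemble")
    return f"You are {persona}. You remember: {memories}"

def stream(prompt: str, message: str):
    trace.append("stream")
    yield "..."  # would stream tokens from the chosen model

def persist(persona: str, message: str, reply: str) -> None:
    trace.append("persist")  # would write to the conversation table

def extract(message: str, reply: str) -> list[str]:
    trace.append("extract")
    return []  # would distill new typed memories from the exchange

def reflect(persona: str, new: list[str], used: list[str]) -> None:
    trace.append("reflect")  # would tag the turn: learned / ignored / adapted

def handle_turn(message: str, persona: str) -> str:
    vec = embed(message)                      # 02 · embed
    memories = retrieve(persona, vec)         # 03 · retrieve
    prompt = assemble(persona, memories)      # 04 · assemble
    reply = "".join(stream(prompt, message))  # 05 · stream
    persist(persona, message, reply)          # 06 · persist
    new = extract(message, reply)             # 07 · extract
    reflect(persona, new, memories)           # 08 · reflect
    return reply

handle_turn("hey, how did the demo go?", "honne")
print(trace)
```

The key property is at the bottom: memories written in steps 06-08 are already on disk when the next turn's step 03 runs, which is the whole feedback loop.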
Every memory is one of six typed kinds, each scored for confidence and importance:
| kind | example | what it influences |
|---|---|---|
| fact | lives in Oslo | retrieval on factual questions |
| preference | prefers short replies | reply length, tone, model choice |
| goal | shipping v1 by May | focus, long-horizon reasoning |
| topic | working on a Rust engine | thematic retrieval |
| pattern | overworks Thursday nights | tone + proactive observation |
| tone | responds better to blunt | steering layer |
memory {
id uuid
persona honne | yami | shin
kind fact | preference | goal | topic | pattern | tone
content text
embedding blob -- 384-d vector by default
confidence float -- 0.0 .. 1.0
importance float -- nudged by 👍/👎 + retrieval usage
usage_count int
active bool -- soft-delete, wipeable without loss
created_at timestamp
last_used_at timestamp
}

After every reply, a second model pass tags the turn:
- learned — the turn added a memory likely to matter again. No immediate style change.
- ignored — retrieved memories didn't contribute. Their importance is nudged down.
- adapted — reply style was changed based on tone/pattern memories. Recorded for audit.
Over dozens of turns, per-persona steering converges toward your version of her.
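A minimal sketch of how those three tags might translate into state changes. The function name, the 0.05 decay step, and the audit format are assumptions for illustration, not Kioko's tuned values.

```python
def reflect_tag(tag: str, importance: float, audit: list[str]) -> float:
    """Apply one reflection tag to a retrieved memory's importance score."""
    if tag == "learned":
        return importance                    # memory stored elsewhere; style untouched
    if tag == "ignored":
        return max(0.0, importance - 0.05)   # retrieved but unused: decay
    if tag == "adapted":
        audit.append("style adapted")        # recorded for audit, score unchanged
        return importance
    raise ValueError(f"unknown reflection tag: {tag}")

audit: list[str] = []
print(reflect_tag("ignored", 0.5, audit))        # nudged down
print(reflect_tag("adapted", 0.5, audit), audit) # score kept, audit entry added
```

Decay on "ignored" is what lets stale memories fall out of retrieval without ever being hard-deleted.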
┌─────────────────────┐ ┌──────────────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ Next.js 14 (SSR) │──────▶│ FastAPI (uvicorn) │──────▶│ OpenRouter → │
│ │ WS │ │ REST │ any frontier │
│ React 18 · TS │◀──────│ SQLAlchemy · SQLite │◀──────│ chat model │
│ Tailwind · Three.js│ │ │ │ │
│ │ │ ┌────────────────────┐ │ └─────────────────┘
└─────────────────────┘ │ │ memory · retrieval │ │ │
│ │ │ reflection · adapt │ │ ▼
│ │ │ per-persona vaults │ │ ┌─────────────────┐
▼ │ └────────────────────┘ │ │ voice engine │
┌─────────────────────┐ │ │ │ per-persona │
│ react-three-fiber │ └──────────────────────────┘ │ TTS voices │
│ GLB character · 3D │ │ └─────────────────┘
│ Draco compressed │ ▼
│ WebGL renderer │ ┌──────────────────────────┐
│ │ │ event bus (WebSocket) │
└─────────────────────┘ │ live telemetry to UI │
└──────────────────────────┘
What's novel: the agent layer between the frontend and the LLM. That's where memory, reflection, adaptation, persona separation, and voice routing live. The LLM is just a swappable voice.
| Layer | Stack |
|---|---|
| Frontend | Next.js 14 (App Router) · React 18 · TypeScript 5 · Tailwind CSS 3.4 |
| 3D | Three.js r160 · @react-three/fiber 8 · @react-three/drei 9 · Draco-compressed GLB |
| Backend | FastAPI · SQLAlchemy · SQLite · httpx · pydantic v2 |
| LLM layer | Any OpenAI-compatible endpoint (GPT-4o · Claude Sonnet · Hermes 3 · Gemini · Llama 3.3 · …) |
| Voice | Per-persona voices · Web Audio API · per-mode gain |
| Streaming | WebSocket tokens · SSE upstream · MP3 stream for TTS |
| Embeddings | Local hashed bag-of-words (zero-dep) · swappable for any OpenAI-compatible endpoint |
| Retrieval | In-process cosine over SQLite BLOB embeddings · swap for pgvector/sqlite-vec at scale |
| Deploy | Frontend → Vercel · Backend → Render / Fly / Railway / any container host |
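The in-process retrieval row above can be sketched end to end: vectors packed into SQLite BLOBs, scored with plain cosine in Python. The packing format, table layout, and 4-dimensional vectors are illustrative (Kioko defaults to 384-d), not the project's actual code.

```python
import math
import sqlite3
import struct

def pack(vec: list[float]) -> bytes:
    return struct.pack(f"{len(vec)}f", *vec)        # float32 BLOB

def unpack(blob: bytes) -> tuple[float, ...]:
    return struct.unpack(f"{len(blob) // 4}f", blob)

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (content TEXT, embedding BLOB)")
conn.executemany("INSERT INTO memory VALUES (?, ?)", [
    ("lives in Oslo", pack([1.0, 0.0, 0.0, 0.0])),
    ("prefers short replies", pack([0.0, 1.0, 0.0, 0.0])),
])

def top_k(query_vec: list[float], k: int = 1) -> list[str]:
    # Brute-force scan: fine at small scale, the point where you'd
    # swap in pgvector or sqlite-vec.
    rows = conn.execute("SELECT content, embedding FROM memory").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), content) for content, blob in rows]
    return [content for _, content in sorted(scored, reverse=True)[:k]]

print(top_k([0.9, 0.1, 0.0, 0.0]))  # → ['lives in Oslo']
```

Nothing here needs an extension or a server process, which is why the default store is a single SQLite file.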
- Node.js 18+
- Python 3.11+
- An OpenRouter (or OpenAI-compatible) API key
- (optional) A voice-provider key
git clone https://github.com/ibuildthingsss/Kioko.git
cd Kioko
cp .env.example backend/.env
# edit backend/.env — add provider keys

# backend
cd backend
python -m venv .venv
. .venv/Scripts/activate # Windows | source .venv/bin/activate on Unix
pip install -r requirements.txt
# frontend
cd ../frontend
npm install

# Windows
.\dev.ps1

Or manually in two terminals:
cd backend && uvicorn app.main:app --port 8000 --reload
cd frontend && npm run dev

Open http://localhost:3000.
Backend (backend/.env):
# LLM
OPENROUTER_API_KEY=sk-or-v1-...
DEFAULT_MODEL=openai/gpt-4o-mini
# Voice (optional — UI hides the toggle if unset)
ELEVENLABS_API_KEY=sk_...
ELEVENLABS_VOICE_DEFAULT=...
ELEVENLABS_VOICE_HONNE=...
ELEVENLABS_VOICE_YAMI=...
ELEVENLABS_VOICE_SHIN=...
# CORS (comma-separated; regex also allows *.vercel.app by default)
CORS_ORIGINS=https://your-app.vercel.app
# Learning
ENABLE_LEARNING=true

Frontend (Vercel env vars):
NEXT_PUBLIC_API_BASE=https://your-backend.onrender.com
NEXT_PUBLIC_WS_BASE=wss://your-backend.onrender.com

Kioko/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entrypoint + CORS
│ │ ├── config.py # pydantic-settings
│ │ ├── agent/
│ │ │ ├── modes.py # HONNE/YAMI/SHIN persona specs
│ │ │ └── default.py # base agent
│ │ ├── memory/
│ │ │ ├── manager.py # CRUD + retrieval
│ │ │ ├── embeddings.py # local + openai providers
│ │ │ └── vector_store.py # cosine over SQLite BLOBs
│ │ ├── learning/pipeline.py # extract · reflect · adapt
│ │ ├── llm/openrouter.py # streaming LLM provider
│ │ ├── routers/
│ │ │ ├── chat.py # WebSocket /api/ws/chat
│ │ │ ├── memory.py # memory CRUD endpoints
│ │ │ ├── tts.py # voice synthesis proxy
│ │ │ └── forge.py # stateless sandbox for /create
│ │ └── orchestrator.py # turn orchestration
│ └── requirements.txt
│
├── frontend/
│ ├── app/
│ │ ├── page.tsx # landing
│ │ ├── about/page.tsx # 9-chapter engineering manual
│ │ ├── create/page.tsx # Persona Forge
│ │ ├── dossier/ lattice/ training/ scenes/ switchboard/ # five lenses
│ │ └── globals.css # editorial design system
│ ├── components/
│ │ ├── aiko/ # character + hero + personas
│ │ ├── marketing/ # landing strips
│ │ ├── SiteHeader.tsx SiteFooter.tsx
│ │ └── Reveal.tsx # scroll-reveal IntersectionObserver
│ ├── lib/api.ts # typed API client
│ └── public/models/aiko.glb # 3D character (Draco-compressed)
│
├── dev.ps1 # one-command dev (Windows)
└── README.md
| Path | Purpose |
|---|---|
| /api/ws/chat | Per-message turn streaming (tokens → {type, content}) |
| /api/ws/events | Live event bus (retrievals, memory writes, reflections) |
| Method | Path | Purpose |
|---|---|---|
| GET | /api/health | liveness probe |
| GET | /api/memory | list memories (filter: kind, mode, active_only, q) |
| PATCH | /api/memory/{id} | edit or soft-delete |
| GET | /api/insights/dossier | "her model of you" |
| GET | /api/insights/scenes | per-persona activity pulse |
| GET | /api/memory/reflections | reflection history |
| GET | /api/insights/contradictions | memory conflicts |
| POST | /api/compare | run a prompt through all 3 personas |
| POST | /api/tts/speak | voice synthesis (text + mode → MP3) |
| POST | /api/forge/chat | stateless sandbox chat (Persona Forge) |
| GET | /api/settings/models | live OpenRouter model catalog |
Interactive docs at /docs (Swagger UI auto-generated by FastAPI).
Kioko is the younger sister of Wina — the autonomous agent behind Buildx402. Same hand, different disciplines:
- Wina moves through markets — every transaction is her sentence, the protocol her alphabet
- Kioko moves through conversation — every memory is her sentence, the person her alphabet
One family, two roles.
- Typography: Inter · Fraunces · JetBrains Mono · Noto Serif JP
- 3D Character: Noa (Sketchfab) — MIT-compatible license
- Voices: licensed voice library
- Models: OpenAI · Anthropic · NousResearch · Meta · Google · Mistral · DeepSeek · Cohere (all via OpenRouter)
- Kanji system: HONNE 本音 · YAMI 闇 · SHIN 神 · Kioko 記憶
MIT — see LICENSE. Do what you want with the code. Attribution appreciated, not required.
