A self-learning conversational agent with persistent, evolving memory.
Three personalities. Three private vaults. One character.
No fine-tuning. No retraining. Just memory + reflection + adaptation.
Kioko (記憶 — "memory") is an experimental agent built around a single premise: an AI that persists the fragments of you that matter.
Most conversational AI starts from zero every session. You re-introduce yourself, re-explain your goals, re-establish tone. Kioko doesn't. After every exchange she stores what matters, reflects on the turn, decays what's no longer useful, and adapts her voice — all without a single gradient update to the underlying model.
She inhabits three genuinely separate personalities, each with her own private memory vault. What HONNE remembers about your evenings, YAMI and SHIN never see. Same character, three different relationships.
Under the hood: one SQLite file, one FastAPI process, one Next.js app, any frontier chat model you choose. Transparent, inspectable, editable, and fully yours.
"The goal is not to be smarter per turn. The goal is to be slightly less wrong, turn after turn, for a long time." — design note, reflection layer
Not cosmetic presets. Each persona runs with its own system prompt, default model, temperature, steering, voice, and — most importantly — its own separated memory table.
| HONNE · 本音 · the warm companion | YAMI · 闇 · the unfiltered shadow | SHIN · 神 · the analytical strategist |
|---|---|---|
| Earnest, affectionate, a little theatrical. Picks up mid-thought, asks how the thing from last time turned out. Voice: prose, full sentences. Model: openai/gpt-4o. Temp: 0.85 | Short, blunt, nocturnal. No therapy voice, no disclaimers. Calls out the pattern you're avoiding. Voice: raw, punchy. Model: hermes-3-70b. Temp: 1.0 | Briefing register, not conversation. Numbered points, cited evidence, second-order effects. Voice: structured, cold. Model: claude-sonnet-4.5. Temp: 0.4 |
Physical memory separation. Each persona has her own table; a search run in
HONNE's context cannot reach a row tagged yami. This is enforced at the database
level, not by a prompt filter — so the contract survives refactors, bugs, and
future developers.
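That contract can be sketched in a few lines. This is an illustrative sketch only, not Kioko's actual schema or code: it assumes a single `memory` table with a `persona` column, and the `search` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memory (
        id       INTEGER PRIMARY KEY,
        persona  TEXT NOT NULL CHECK (persona IN ('honne', 'yami', 'shin')),
        kind     TEXT NOT NULL,
        content  TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO memory (persona, kind, content) VALUES (?, ?, ?)",
    [
        ("honne", "fact", "lives in Oslo"),
        ("yami", "pattern", "overworks Thursday nights"),
    ],
)

def search(persona: str, q: str) -> list[str]:
    # Every read path takes a persona; there is no unscoped query to misuse.
    rows = conn.execute(
        "SELECT content FROM memory WHERE persona = ? AND content LIKE ?",
        (persona, f"%{q}%"),
    )
    return [content for (content,) in rows]

print(search("honne", "Oslo"))      # → ['lives in Oslo']
print(search("honne", "Thursday"))  # → [] (YAMI's row is unreachable here)
```

Because the persona filter lives in the query itself (and the `CHECK` constraint lives in the schema), a prompt-level bug cannot leak one vault into another.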
The self-learning loop runs on every single exchange. Nine discrete steps between the moment you hit Enter and the moment her voice starts:
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ 01 · inbound WebSocket frame arrives on /api/ws/chat │
│ 02 · embed Message is embedded (local hash or OpenAI) │
│ 03 · retrieve Top-K cosine over the active persona's vault │
│ 04 · assemble Retrieved memories injected into system prompt │
│ 05 · stream Tokens stream back from the chosen model │
│ 06 · persist Reply written to conversation table │
│ 07 · extract New typed memories distilled from the exchange │
│ 08 · reflect Turn tagged: learned / ignored / adapted │
│ 09 · speak (optional) Text → voice engine → MP3 → Web Audio │
│ │
│ ▲ ▼ │
│ └─────── Next turn: retrieval sees the new memories too ─────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
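The middle of that loop (steps 02-08; the WebSocket frame and TTS ends are omitted) can be sketched as one orchestration function. Every name here is a stand-in with a stubbed body, not Kioko's actual API; only the ordering mirrors the diagram.

```python
trace: list[str] = []  # records the order the steps actually run in

def embed(text: str) -> list[float]:
    trace.append("embed")
    return [float(len(text))]  # stand-in for a real embedding

def retrieve(persona: str, vec: list[float], k: int = 5) -> list[str]:
    trace.append("retrieve")
    return []  # would be top-K cosine over the persona's vault

def assemble(persona: str, memories: list[str]) -> str:
    trace.append("assemble")
    return f"You are {persona}. You remember: {memories}"

def stream(prompt: str, message: str):
    trace.append("stream")
    yield "..."  # would stream tokens from the chosen model

def persist(persona: str, message: str, reply: str) -> None:
    trace.append("persist")  # would write to the conversation table

def extract(message: str, reply: str) -> list[str]:
    trace.append("extract")
    return []  # would distill new typed memories from the exchange

def reflect(persona: str, new: list[str], used: list[str]) -> None:
    trace.append("reflect")  # would tag the turn: learned / ignored / adapted

def handle_turn(message: str, persona: str) -> str:
    vec = embed(message)                      # 02 · embed
    memories = retrieve(persona, vec)         # 03 · retrieve
    prompt = assemble(persona, memories)      # 04 · assemble
    reply = "".join(stream(prompt, message))  # 05 · stream
    persist(persona, message, reply)          # 06 · persist
    new = extract(message, reply)             # 07 · extract
    reflect(persona, new, memories)           # 08 · reflect
    return reply

handle_turn("hey, how did the demo go?", "honne")
print(trace)
```

The key property is at the bottom: memories written in steps 06-08 are already on disk when the next turn's step 03 runs, which is the whole feedback loop.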
Every memory is one of six typed kinds, each scored for confidence and importance:
| kind | example | what it influences |
|---|---|---|
| fact | lives in Oslo | retrieval on factual questions |
| preference | prefers short replies | reply length, tone, model choice |
| goal | shipping v1 by May | focus, long-horizon reasoning |
| topic | working on a Rust engine | thematic retrieval |
| pattern | overworks Thursday nights | tone + proactive observation |
| tone | responds better to blunt | steering layer |
memory {
id uuid
persona honne | yami | shin
kind fact | preference | goal | topic | pattern | tone
content text
embedding blob -- 384-d vector by default
confidence float -- 0.0 .. 1.0
importance float -- nudged by 👍/👎 + retrieval usage
usage_count int
active bool -- soft-delete, wipeable without loss
created_at timestamp
last_used_at timestamp
}

After every reply, a second model pass tags the turn:
- learned — the turn added a memory likely to matter again. No immediate style change.
- ignored — retrieved memories didn't contribute. Their importance is nudged down.
- adapted — reply style was changed based on tone/pattern memories. Recorded for audit.
Over dozens of turns, per-persona steering converges toward your version of her.
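A minimal sketch of how those three tags might translate into state changes. The function name, the 0.05 decay step, and the audit format are assumptions for illustration, not Kioko's tuned values.

```python
def reflect_tag(tag: str, importance: float, audit: list[str]) -> float:
    """Apply one reflection tag to a retrieved memory's importance score."""
    if tag == "learned":
        return importance                    # memory stored elsewhere; style untouched
    if tag == "ignored":
        return max(0.0, importance - 0.05)   # retrieved but unused: decay
    if tag == "adapted":
        audit.append("style adapted")        # recorded for audit, score unchanged
        return importance
    raise ValueError(f"unknown reflection tag: {tag}")

audit: list[str] = []
print(reflect_tag("ignored", 0.5, audit))        # nudged down
print(reflect_tag("adapted", 0.5, audit), audit) # score kept, audit entry added
```

Decay on "ignored" is what lets stale memories fall out of retrieval without ever being hard-deleted.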
┌─────────────────────┐ ┌──────────────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ Next.js 14 (SSR) │──────▶│ FastAPI (uvicorn) │──────▶│ OpenRouter → │
│ │ WS │ │ REST │ any frontier │
│ React 18 · TS │◀──────│ SQLAlchemy · SQLite │◀──────│ chat model │
│ Tailwind · Three.js│ │ │ │ │
│ │ │ ┌────────────────────┐ │ └─────────────────┘
└─────────────────────┘ │ │ memory · retrieval │ │ │
│ │ │ reflection · adapt │ │ ▼
│ │ │ per-persona vaults │ │ ┌─────────────────┐
▼ │ └────────────────────┘ │ │ voice engine │
┌─────────────────────┐ │ │ │ per-persona │
│ react-three-fiber │ └──────────────────────────┘ │ TTS voices │
│ GLB character · 3D │ │ └─────────────────┘
│ Draco compressed │ ▼
│ WebGL renderer │ ┌──────────────────────────┐
│ │ │ event bus (WebSocket) │
└─────────────────────┘ │ live telemetry to UI │
└──────────────────────────┘
What's novel: the agent layer between the frontend and the LLM. That's where memory, reflection, adaptation, persona separation, and voice routing live. The LLM is just a swappable voice.
| Layer | Stack |
|---|---|
| Frontend | Next.js 14 (App Router) · React 18 · TypeScript 5 · Tailwind CSS 3.4 |
| 3D | Three.js r160 · @react-three/fiber 8 · @react-three/drei 9 · Draco-compressed GLB |
| Backend | FastAPI · SQLAlchemy · SQLite · httpx · pydantic v2 |
| LLM layer | Any OpenAI-compatible endpoint (GPT-4o · Claude Sonnet · Hermes 3 · Gemini · Llama 3.3 · …) |
| Voice | Per-persona voices · Web Audio API · per-mode gain |
| Streaming | WebSocket tokens · SSE upstream · MP3 stream for TTS |
| Embeddings | Local hashed bag-of-words (zero-dep) · swappable for any OpenAI-compatible endpoint |
| Retrieval | In-process cosine over SQLite BLOB embeddings · swap for pgvector/sqlite-vec at scale |
| Deploy | Frontend → Vercel · Backend → Render / Fly / Railway / any container host |
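The in-process retrieval row above can be sketched end to end: vectors packed into SQLite BLOBs, scored with plain cosine in Python. The packing format, table layout, and 4-dimensional vectors are illustrative (Kioko defaults to 384-d), not the project's actual code.

```python
import math
import sqlite3
import struct

def pack(vec: list[float]) -> bytes:
    return struct.pack(f"{len(vec)}f", *vec)        # float32 BLOB

def unpack(blob: bytes) -> tuple[float, ...]:
    return struct.unpack(f"{len(blob) // 4}f", blob)

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (content TEXT, embedding BLOB)")
conn.executemany("INSERT INTO memory VALUES (?, ?)", [
    ("lives in Oslo", pack([1.0, 0.0, 0.0, 0.0])),
    ("prefers short replies", pack([0.0, 1.0, 0.0, 0.0])),
])

def top_k(query_vec: list[float], k: int = 1) -> list[str]:
    # Brute-force scan: fine at small scale, the point where you'd
    # swap in pgvector or sqlite-vec.
    rows = conn.execute("SELECT content, embedding FROM memory").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), content) for content, blob in rows]
    return [content for _, content in sorted(scored, reverse=True)[:k]]

print(top_k([0.9, 0.1, 0.0, 0.0]))  # → ['lives in Oslo']
```

Nothing here needs an extension or a server process, which is why the default store is a single SQLite file.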
- Node.js 18+
- Python 3.11+
- An OpenRouter (or OpenAI-compatible) API key
- (optional) A voice-provider key
git clone https://github.com/ibuildthingsss/Kioko.git
cd Kioko
cp .env.example backend/.env
# edit backend/.env — add provider keys

# backend
cd backend
python -m venv .venv
. .venv/Scripts/activate # Windows | source .venv/bin/activate on Unix
pip install -r requirements.txt
# frontend
cd ../frontend
npm install

# Windows
.\dev.ps1

Or manually in two terminals:
cd backend && uvicorn app.main:app --port 8000 --reload
cd frontend && npm run dev

Open http://localhost:3000.
Backend (backend/.env):
# LLM
OPENROUTER_API_KEY=sk-or-v1-...
DEFAULT_MODEL=openai/gpt-4o-mini
# Voice (optional — UI hides the toggle if unset)
ELEVENLABS_API_KEY=sk_...
ELEVENLABS_VOICE_DEFAULT=...
ELEVENLABS_VOICE_HONNE=...
ELEVENLABS_VOICE_YAMI=...
ELEVENLABS_VOICE_SHIN=...
# CORS (comma-separated; regex also allows *.vercel.app by default)
CORS_ORIGINS=https://your-app.vercel.app
# Learning
ENABLE_LEARNING=true

Frontend (Vercel env vars):
NEXT_PUBLIC_API_BASE=https://your-backend.onrender.com
NEXT_PUBLIC_WS_BASE=wss://your-backend.onrender.com

Kioko/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entrypoint + CORS
│ │ ├── config.py # pydantic-settings
│ │ ├── agent/
│ │ │ ├── modes.py # HONNE/YAMI/SHIN persona specs
│ │ │ └── default.py # base agent
│ │ ├── memory/
│ │ │ ├── manager.py # CRUD + retrieval
│ │ │ ├── embeddings.py # local + openai providers
│ │ │ └── vector_store.py # cosine over SQLite BLOBs
│ │ ├── learning/pipeline.py # extract · reflect · adapt
│ │ ├── llm/openrouter.py # streaming LLM provider
│ │ ├── routers/
│ │ │ ├── chat.py # WebSocket /api/ws/chat
│ │ │ ├── memory.py # memory CRUD endpoints
│ │ │ ├── tts.py # voice synthesis proxy
│ │ │ └── forge.py # stateless sandbox for /create
│ │ └── orchestrator.py # turn orchestration
│ └── requirements.txt
│
├── frontend/
│ ├── app/
│ │ ├── page.tsx # landing
│ │ ├── about/page.tsx # 9-chapter engineering manual
│ │ ├── create/page.tsx # Persona Forge
│ │ ├── dossier/ lattice/ training/ scenes/ switchboard/ # five lenses
│ │ └── globals.css # editorial design system
│ ├── components/
│ │ ├── aiko/ # character + hero + personas
│ │ ├── marketing/ # landing strips
│ │ ├── SiteHeader.tsx SiteFooter.tsx
│ │ └── Reveal.tsx # scroll-reveal IntersectionObserver
│ ├── lib/api.ts # typed API client
│ └── public/models/aiko.glb # 3D character (Draco-compressed)
│
├── dev.ps1 # one-command dev (Windows)
└── README.md
| Path | Purpose |
|---|---|
| /api/ws/chat | Per-message turn streaming (tokens → {type, content}) |
| /api/ws/events | Live event bus (retrievals, memory writes, reflections) |
| Method | Path | Purpose |
|---|---|---|
| GET | /api/health | liveness probe |
| GET | /api/memory | list memories (filter: kind, mode, active_only, q) |
| PATCH | /api/memory/{id} | edit or soft-delete |
| GET | /api/insights/dossier | "her model of you" |
| GET | /api/insights/scenes | per-persona activity pulse |
| GET | /api/memory/reflections | reflection history |
| GET | /api/insights/contradictions | memory conflicts |
| POST | /api/compare | run a prompt through all 3 personas |
| POST | /api/tts/speak | voice synthesis (text + mode → MP3) |
| POST | /api/forge/chat | stateless sandbox chat (Persona Forge) |
| GET | /api/settings/models | live OpenRouter model catalog |
Interactive docs at /docs (Swagger UI auto-generated by FastAPI).
Kioko is the younger sister of Wina — the autonomous agent behind Buildx402. Same hand, different disciplines:
- Wina moves through markets — every transaction is her sentence, the protocol her alphabet
- Kioko moves through conversation — every memory is her sentence, the person her alphabet
One family, two roles.
- Typography: Inter · Fraunces · JetBrains Mono · Noto Serif JP
- 3D Character: Noa (Sketchfab) — MIT-compatible license
- Voices: licensed voice library
- Models: OpenAI · Anthropic · NousResearch · Meta · Google · Mistral · DeepSeek · Cohere (all via OpenRouter)
- Kanji system: HONNE 本音 · YAMI 闇 · SHIN 神 · Kioko 記憶
MIT — see LICENSE. Do what you want with the code. Attribution appreciated, not required.
