AdelieAI

A persona engine that ships small enough to deploy. Self-hosted. OSS. Batteries included.

🚀 Try the persona style without training

Two flavors of the trained character live on HuggingFace — pull either and run.

Flavor	Size	Hardware	Repo
LoRA + FP16	~165 MB	GPU (≥14 GB VRAM)	`ramyun/adelie-qwen-roleplay-v2-lora`
GGUF q4_k_m	~4.4 GB	CPU laptop	`ramyun/adelie-qwen-roleplay-v2-gguf`

# GPU path
huggingface-cli download ramyun/adelie-qwen-roleplay-v2-lora \
    --local-dir models/ours/qwen-roleplay-v2

# CPU path — no GPU needed
huggingface-cli download ramyun/adelie-qwen-roleplay-v2-gguf \
    --local-dir models/ours/qwen-roleplay-v2-gguf

Both inherit Qwen2.5's Tongyi Qianwen License; commercial use is permitted under the terms there.

What you get

A pipeline that takes you from persona idea → deployable character:

Prepare 60–120 dialogue pairs in your character's style
Train a LoRA adapter on a Qwen 7B base (~80 seconds on a single 3090)
Compare the new character against base + previous versions (LLM-as-judge)
Pack everything into a .adelie persona pack — one self-contained artifact
(v0.2 · current) Quantize to GGUF q4_k_m — the same persona ships at 1/3 the size (AWQ track parked behind a Windows triton blocker, see experiments/05_awq_quantize/results.md)
Drop into a game NPC, Discord bot, customer-service worker, or CLI chat

Each .adelie pack is a single character with a consistent style, optional RAG-grounded knowledge, and a reproducible training recipe.

Choose your tier

A persona's tech needs depend on the use case. AdelieAI is built as a tiered stack so you can dial in the right depth without paying for what you don't use:

Tier	Use case	What's added
T1 — Toy	prototype chatbot	system prompt only
T2 — Standard NPC ✨	game NPCs, brand chat, companions	+ LoRA + vector RAG + quantization
T3 — Vertical Advisor	code helper, customer support	+ DPO, tool-use, retrieval-as-tool
T4 — Domain Expert	legal/medical advisor	+ RDF/OWL KG, OWL reasoner
T5 — Multi-agent Quest	game world, simulation	+ vLLM multi-LoRA, LangGraph orchestration

Three industry verticals showcase the tier ladder out of the box. Same engine, three industry-shaped faces — each captured below against the real Qwen2.5-7B-Instruct + qwen-roleplay-v2 on a single RTX 3090.

route	persona	tier	what it shows
`/demo/gaming`	💰 `cynical_merchant`	T2	RPG shop scene — JRPG dialogue HUD, inventory mock, gold counter, blunt merchant tone
`/demo/legal`	🔍 `cold_detective`	T3	Noir detective office — cork board with case summary, evidence memos, red string connectors, transcript paper, citation chips, `evidence_search` tool active
`/demo/knowledge`	🐉 `ancient_dragon`	T4	Ancient archive — inline-SVG KG with 8 nodes (asserted edges solid, OWL-inferred edges dashed flowing), parchment-scroll dialogue, side-panel SPARQL query + reasoner output ("☑ consistent" + inferred triples), backed by real `rdflib` + OWL-RL forward chaining over a Turtle corpus (transitive `descendantOf+`, subClassOf inference)
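The transitive `descendantOf+` inference in the T4 row can be illustrated without any RDF tooling. The sketch below is a plain-Python stand-in for forward chaining, not the project's `rdflib` + OWL-RL code; the lineage names and the `infer_transitive` helper are made up for illustration.

```python
def infer_transitive(asserted: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Forward-chain (a, b) and (b, c) => (a, c) until nothing new appears."""
    closure = set(asserted)
    while True:
        new_edges = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c and (a, d) not in closure
        }
        if not new_edges:
            return closure
        closure |= new_edges


# Hypothetical lineage: each pair reads "x descendantOf y".
asserted = {("ember", "cinder"), ("cinder", "ash"), ("ash", "first_flame")}
inferred = infer_transitive(asserted) - asserted
print(sorted(inferred))
```

The asserted pairs correspond to the solid edges in the demo's graph and the inferred pairs to the dashed ones.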

T2 — /demo/gaming · 💰 cynical_merchant · JRPG shop scene

T3 — /demo/legal · 🔍 cold_detective · noir detective office with evidence_search tool

T4 — /demo/knowledge · 🐉 ancient_dragon · ancient archive backed by rdflib + OWL-RL

/health introspects which tier the running build supports. Full framework + decision tree: docs/CAPABILITY_TIERS.md. Repo organization (7-area modular design): docs/ARCHITECTURE.md. Per-area docs (personas / retrieval / tools / agents / training / serving / evaluation): see docs/. Terminology cheat-sheet (Hybrid system prompt + Hybrid RAG terms): docs/GLOSSARY.md.

Don't have weights downloaded? StubLLMClient ships persona-aware canned replies — visiting the demos still shows in-character output (penguin / fish / knight / merchant / detective each have a small canned set), so OSS visitors get the shape without GPU.

Why a persona engine?

Most LLM toolkits give you "an assistant" — generic, hedged, breaks character. Game NPCs, brand personas, virtual companions, vertical-domain workers all need the opposite: a model that stays in character across long interactions and runs on hardware the user actually has.

AdelieAI ships:

Hybrid RAG for grounding personas in lore / docs / knowledge bases (BM25 + dense + RRF + cross-encoder rerank)
LangGraph 4-node agent so personas reason in steps (planner → retriever → reasoner → reporter)
TRL + PEFT LoRA training with reproducibility manifest (recipe.md + MANIFEST.json)
Adapter comparison harness with LLM-as-judge scoring
EvalGardener — agent-in-the-loop self-improving behavioral test suite (docs/eval/methods/iteration_loop.md); per-round markdown audit trail under docs/eval/iterations/
3-tier rating + dismiss → DPO export — one-click feedback under each turn (👍 good · ➖ fine · 👎 bad · ⊘ dismiss); scripts/export_dpo.py harvests (chosen, rejected) JSONL pairs from divergent ratings (RLHF-shaped, not 5-star reviewer)
/web/metrics dashboard — per-persona activity rollup (turns / tokens / avg latency / last activity) on top of the chat log
Improvement timeline — docs/MILESTONES.md records every decision (and N-th return to the same area), so the why survives across sessions
From-scratch nanoGPT for the curious — same architecture family as Qwen2 (RMSNorm + RoPE + SwiGLU)
HTMX + Jinja2 console so you can drive everything from a browser
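For readers new to the RRF step in the Hybrid RAG item above, here is a minimal sketch of Reciprocal Rank Fusion over a lexical and a dense ranking. It is an illustration, not the project's retriever: the `rrf_fuse` helper, the document ids, and the constant `k = 60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["lore_03", "lore_01", "lore_07"]   # hypothetical lexical ranking
dense_hits = ["lore_01", "lore_09", "lore_03"]  # hypothetical embedding ranking
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top. In a pipeline shaped like the one described, a cross-encoder reranker would then rescore only the head of this fused list.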

Three modes — Persona, Demo, Session

The web console exposes three top-level modes. They share infrastructure where it makes sense, but answer different questions:

Mode	Route	Answers	Backend
Persona	`/web/personas` · `/web/chat/{id}`	"What does this character say?" — open-ended chat with a persona, multi-turn, with the rating widget per turn	`core/personas/` (chat store, registry, grounding)
Demo	`/demo/{gaming, legal, knowledge}`	"What does this character look like in its native habitat?" — same chat backend, dressed up with a per-vertical UI skin	shares `core/personas/` chat backend; UI is `core/api/templates/demos/`
Session	`/web/sessions` · `/web/sessions/{id}`	"Given a research goal, what's the synthesized answer?" — a one-shot agentic run, not a chat. State machine + event log + final report with citations	`core/agent/` (4-node LangGraph) + `core/session/` (state machine)

How they relate

Persona = identity + style — one character, one chat thread.
Demo = vertical skin — same persona engine, different visual layer per industry. The merchant in /web/chat/cynical_merchant and /demo/gaming is the same underlying chat; only the rendering differs.
Session = workflow — orthogonal to personas. A goal goes in (e.g. "summarize Q3 finance risks"), a structured event log comes out, ending with a planner → retriever → reasoner → reporter chain. Optionally persona-flavored, but its primary purpose is reasoning task completion, not character interaction.

Persona  (who) ──────┐
                     ├─ Demo (where) — visual layer
                     └─ same chat infrastructure

Session  (what task) — separate track, agentic + RAG-grounded
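The Session track's planner → retriever → reasoner → reporter chain can be pictured as four functions that mutate a shared state and append to an event log. This is a plain-Python sketch of that shape only. It does not use LangGraph, and `SessionState`, the node bodies, and the fake passages are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    goal: str
    plan: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    report: str = ""
    events: list[str] = field(default_factory=list)  # append-only event log


def planner(state: SessionState) -> None:
    state.plan = [f"find sources for: {state.goal}"]


def retriever(state: SessionState) -> None:
    # Stand-in for a real RAG lookup: one fake passage per plan step.
    state.evidence = [f"[doc-{i}] passage about {step}" for i, step in enumerate(state.plan, 1)]


def reasoner(state: SessionState) -> None:
    # Keep the citation tag ("[doc-1]") so the report can point at its source.
    state.findings = [f"claim supported by {e.split()[0]}" for e in state.evidence]


def reporter(state: SessionState) -> None:
    state.report = "; ".join(state.findings)


def run_session(goal: str) -> SessionState:
    state = SessionState(goal=goal)
    for node in (planner, retriever, reasoner, reporter):
        node(state)
        state.events.append(node.__name__)  # record which node just ran
    return state


result = run_session("summarize Q3 finance risks")
print(result.events)
print(result.report)
```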

The screenshots below walk through each mode.

Live console

All frames captured against the real Qwen2.5-7B-Instruct + qwen-roleplay-v2 on a single RTX 3090 — note the llm: indicator in the top nav, the in-character Korean replies, and real per-turn latency (2–4 s).

Chat thread (real model output + rating widget + DPO badge)

Three turns with the merchant on the same prompt ("할인 좀 안 돼?", roughly "Can't I get a discount?") produce three distinct in-character replies (real-model sampling). Each turn shows its real latency and token count ({latency}s · {tokens} tok) and the 3-tier rating widget (good · fine · bad · dismiss). The turns are rated good/good/bad here, and each good reply pairs with the bad one, so the header surfaces DPO 2 harvest-ready pairs. Persona meta and the system prompt sit in the sidebar.
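To make the pair count concrete, here is a small sketch of how divergent ratings on one prompt could be turned into (chosen, rejected) pairs. It is not the logic of `scripts/export_dpo.py`. The `harvest_pairs` helper, the turn dictionaries, and the choice to ignore "fine" and "dismiss" ratings are assumptions.

```python
from collections import defaultdict
from itertools import product


def harvest_pairs(turns: list[dict]) -> list[dict]:
    """Pair every 'good' reply with every 'bad' reply to the same prompt."""
    by_prompt: dict[str, dict[str, list[str]]] = defaultdict(lambda: {"good": [], "bad": []})
    for turn in turns:
        if turn["rating"] in ("good", "bad"):  # assumed: other ratings carry no preference
            by_prompt[turn["prompt"]][turn["rating"]].append(turn["reply"])
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": rejected}
        for prompt, rated in by_prompt.items()
        for chosen, rejected in product(rated["good"], rated["bad"])
    ]


# Same prompt three times, rated good / good / bad (replies are placeholders).
turns = [
    {"prompt": "할인 좀 안 돼?", "reply": "reply A", "rating": "good"},
    {"prompt": "할인 좀 안 돼?", "reply": "reply B", "rating": "good"},
    {"prompt": "할인 좀 안 돼?", "reply": "reply C", "rating": "bad"},
]
pairs = harvest_pairs(turns)
print(len(pairs))  # 2
```

Two good replies crossed with one bad reply give two pairs, which matches the badge in the capture.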

Gallery — six personas + per-card rating rollup

Six characters — three general role-play (penguin / fish / knight) plus three vertical signatures (cynical_merchant / cold_detective / ancient_dragon). Each card shows tier badge + industry pill + base / adapter / RAG / turn-count meta. Cards with chat history additionally surface a rating-summary footer (good · fine · bad · dismiss) and the DPO N harvest-ready badge — so you see at a glance which character has accumulated training-quality preference data.

`/web/metrics` — per-persona activity rollup

Built from chat_turns; complementary to the agentic-flow event log under /web/sessions.
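A rollup of this kind is a single `GROUP BY` over the turn log. The sketch below runs against an in-memory SQLite table. The table name `chat_turns` comes from the text above, but the column names and sample rows are assumptions, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chat_turns (persona_id TEXT, tokens INTEGER, latency_s REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO chat_turns VALUES (?, ?, ?, ?)",
    [  # made-up sample rows
        ("cynical_merchant", 40, 2.0, "2025-01-01T10:00:00"),
        ("cynical_merchant", 60, 4.0, "2025-01-01T10:05:00"),
        ("cold_detective", 80, 3.0, "2025-01-01T11:00:00"),
    ],
)

rollup = conn.execute(
    """
    SELECT persona_id,
           COUNT(*)        AS turns,
           SUM(tokens)     AS tokens,
           AVG(latency_s)  AS avg_latency_s,
           MAX(created_at) AS last_activity
    FROM chat_turns
    GROUP BY persona_id
    ORDER BY persona_id
    """
).fetchall()

for row in rollup:
    print(row)
```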

Sessions + introspection (smaller frames)

Agentic sessions: RAG-grounded runs (planner → retriever → reasoner → reporter)	/health JSON: active backends, retriever, store
Docs fallback: graceful behavior when no embedder is mounted	Swagger UI: API surface at /docs

A note on file numbering — docs/screenshots/ files are clustered by feature track, not numbered sequentially. The gaps are intentional, leaving room to add captures inside a track without renaming the rest:

01–06 console basics (persona gallery, chat thread, sessions, health, docs, swagger)

20–24 industry vertical demos (/demo/gaming, /demo/legal, /demo/knowledge)

30–32 Step 6 features (rating widget, DPO badges, /web/metrics)

Regenerate any time with scripts/recapture_clean.py (resets DBs first, drives real-model conversations, snaps the six core frames) or scripts/capture_screenshots.py (legacy walker for 01–06 only).

Hardware footprint

What it actually costs to train / run AdelieAI on a single RTX 3090 (24 GB).

Training a LoRA adapter (`scripts/train_lora_roleplay.py`)

Resource	Usage	Why
Peak VRAM	~22–23 GB	7B base (bf16) + LoRA r=16 + AdamW optimizer state + KV cache during eval
System RAM	~10–15 GB	HF Transformers load + dataset cache
Disk (output)	~80 MB / adapter	LoRA r=16 → `models/ours/qwen-roleplay-v2/adapter.safetensors`
Wall-clock	~25–30 min	60 role-play + 60 general pairs × 4 epochs
Tricks	`bf16=True` · `gradient_checkpointing=True` · `per_device_batch=2` · `grad_accum=4` (effective batch 8)	Without checkpointing the run goes OOM at 24 GB

13B+ needs QLoRA (4-bit quantized base) to fit. 7B is the sweet spot for a 24 GB consumer card.
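The adapter size and step count in the table can be sanity-checked with a little arithmetic. This is a back-of-the-envelope sketch, not output from the training script. The layer shapes are Qwen2.5-7B's published config values, while applying LoRA to all seven linear projections in every layer is an assumption about the recipe.

```python
import math

# Qwen2.5-7B shapes (hidden size, MLP width, layer count).
HIDDEN, INTERMEDIATE, LAYERS = 3584, 18944, 28
KV_DIM = 4 * 128  # 4 KV heads x 128 head_dim

# (d_in, d_out) per projection; targeting all seven is an ASSUMPTION.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) per targeted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * LAYERS


def optimizer_steps(examples: int, per_device_batch: int, grad_accum: int, epochs: int) -> int:
    effective_batch = per_device_batch * grad_accum
    return math.ceil(examples / effective_batch) * epochs


params = lora_params(rank=16)
print(f"{params:,} trainable params")
print(f"{params * 2 / 1e6:.0f} MB at 2 bytes/param, {params * 4 / 1e6:.0f} MB at 4 bytes/param")
print(optimizer_steps(examples=120, per_device_batch=2, grad_accum=4, epochs=4), "optimizer steps")
```

Under these assumptions, r=16 gives about 40 M trainable parameters, so the on-disk size depends mainly on whether the adapter is saved at 2 or 4 bytes per parameter.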

Inference (serving)

Backend	VRAM	RAM	Disk
`StubLLMClient` (no model)	0	negligible	0
`TransformersClient` FP16/bf16 + LoRA	~14 GB	~6 GB	~14 GB
`GGUFClient` q4_k_m (CPU)	0	~5 GB	~4.4 GB
(planned) AWQ q4 (GPU)	~5 GB	—	~5 GB

KV cache — the inference memory cost that surprises people

Transformer self-attention reuses every prior token's K/V on each new step. The cache is what keeps each new token's attention cost at O(N) instead of recomputing all O(N²) pairs, but it costs memory that grows linearly with context length.

Qwen2.5-7B uses Grouped-Query Attention (only 4 KV heads × 128 dim, shared across the 28 query heads), which keeps the cache small for its size:

per_token_KV (fp16) = 2 (K+V) × num_kv_heads(4) × head_dim(128) × layers(28) × 2 bytes
                    = 57,344 bytes ≈ 56 KiB / token

Context length	KV cache (fp16)
4 K tokens	~224 MiB
16 K tokens	~896 MiB
32 K tokens	~1.75 GiB

This is on top of the weights (~14 GB FP16 / ~4.4 GB q4_k_m). With long contexts or many concurrent requests, the cache can grow past the weights themselves. vLLM's PagedAttention exists to pack that memory efficiently across requests, but for single-process generation the table above is the budget you live with.

A non-GQA model of the same shape, where every query head has its own K/V (as in older multi-head designs such as LLaMA-1), would be 7× larger per token. GQA is one of the quiet wins in modern model architecture.
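The per-token formula is easy to evaluate directly. The sketch below plugs in the Qwen2.5-7B shape quoted above (4 KV heads, head dim 128, 28 layers, 2 bytes per fp16 value) and compares it with a same-shape model without GQA; the helper names are illustrative.

```python
def kv_bytes_per_token(num_kv_heads: int, head_dim: int, layers: int, bytes_per_value: int = 2) -> int:
    """K and V each store num_kv_heads * head_dim values per layer."""
    return 2 * num_kv_heads * head_dim * layers * bytes_per_value


def kv_cache_bytes(context_tokens: int, **shape: int) -> int:
    return context_tokens * kv_bytes_per_token(**shape)


QWEN25_7B = {"num_kv_heads": 4, "head_dim": 128, "layers": 28}

per_token = kv_bytes_per_token(**QWEN25_7B)
print(f"per token: {per_token} bytes ({per_token / 1024:.0f} KiB)")
for ctx in (4096, 16384, 32768):
    print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx, **QWEN25_7B) / 2**20:.0f} MiB")

# Same shape without GQA: one K/V head per query head (28 instead of 4).
mha = kv_bytes_per_token(num_kv_heads=28, head_dim=128, layers=28)
print(f"without GQA: {mha // per_token}x larger per token")
```

These numbers cover one sequence. Serving several requests at once multiplies the cache by the number of live sequences.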

Gotchas (the ones that bit us in practice)

Symptom	Cause	Fix
`UnicodeDecodeError: 'cp949' codec` on Windows before training even starts	Korean strings in `dataset.py` / configs decoded under cp949	Always export `PYTHONUTF8=1` (or run `python -X utf8 …`). Both are baked into the script invocations in `docs/TRAINING.md`.
`CUDA out of memory` mid-step despite the table above	`gradient_checkpointing` flipped off, or `per_device_batch > 2`	Keep checkpointing on; the 24 GB budget requires it. Lower batch before raising LR.
`~/.cache/huggingface/hub` silently grows to 30+ GB	First-time downloads of `Qwen2.5-7B-Instruct`, `multilingual-e5-small`, `bge-reranker-v2-m3`, etc. accumulate	Watch with `du -sh ~/.cache/huggingface/hub`. Prune unused snapshots with `huggingface-cli delete-cache`.
`Segmentation fault` on `import trl`	TRL must be imported after PEFT to avoid a known C-extension load order bug	`core/training/trainer.py` enforces the order — don't reorder its imports.
LoRA v1 underperforms LoRA v2 on general questions	Single-register training → catastrophic forgetting	Mix general-domain pairs at ≥1:1 ratio. See `docs/MILESTONES.md` — `[training/lora] (1st cycle)` log.

Full pitfalls list: docs/training/README.md § Pitfalls, docs/serving/README.md § Pitfalls.

Detailed methodology

docs/TRAINING.md — full LoRA recipe, hyperparameter rationale, why bf16 over fp16, how the manifest is built
docs/training/README.md — area README + roadmap (DPO trainer, distillation, multi-GPU)
docs/serving/README.md — backend matrix + decision tree (Stub / Scripted / Transformers / GGUF)
docs/MILESTONES.md — why each step happened, including the dead-ends (e.g., 60-pair LoRA underperformed v2 → pivot to prompt-first)

Persona pack format

A .adelie persona pack is the unit of distribution:

penguin_relaxed.adelie/
├── MANIFEST.json
├── adapter.safetensors
├── system_prompt.md
├── rag_corpus/
└── recipe.md

Full spec: docs/PERSONA_PACK.md. The v0.2 roadmap adds merged.awq.safetensors and merged.q4_k_m.gguf so the same persona ships across GPU server / vLLM cluster / end-user CPU.
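A quick structural check of a pack directory might look like the sketch below. The entry names come from the tree above. Treating `rag_corpus/` as optional follows the "optional RAG-grounded knowledge" wording earlier, but which entries the spec actually requires is an assumption here, and `missing_entries` is a hypothetical helper, not part of the project.

```python
import tempfile
from pathlib import Path

# ASSUMPTION: these four are required; rag_corpus/ is treated as optional.
REQUIRED_ENTRIES = ("MANIFEST.json", "adapter.safetensors", "system_prompt.md", "recipe.md")


def missing_entries(pack_dir: Path) -> list[str]:
    """Return the required pack entries that are absent from pack_dir."""
    return [name for name in REQUIRED_ENTRIES if not (pack_dir / name).exists()]


# Demo: build an incomplete pack in a temp dir and check it.
with tempfile.TemporaryDirectory() as tmp:
    pack = Path(tmp) / "penguin_relaxed.adelie"
    pack.mkdir()
    (pack / "MANIFEST.json").write_text("{}", encoding="utf-8")
    (pack / "system_prompt.md").write_text("You are a relaxed penguin.", encoding="utf-8")
    missing = missing_entries(pack)

print(missing)
```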

Architecture in one table

Layer	Components
LLM serving	`transformers` + LoRA adapter auto-loader, SSE token streaming, sampling presets
RAG pipeline	Recursive splitter · multilingual-e5 (KO+EN) · ChromaDB · BM25 · RRF fusion · bge-reranker-v2-m3 cross-encoder
Agent loop	LangGraph 4-node graph (planner → retriever → reasoner → reporter)
Personas	Built-in registry, multi-turn chat store (SQLite default), per-turn token + latency telemetry
Sessions	Agentic-mode state machine · event sourcing · SQLAlchemy (SQLite default, Postgres swap) · IDOR guard
Evaluation	LLM-as-judge faithfulness / relevance / citation coverage; head-to-head adapter comparison
Console UI	HTMX + Jinja2 — persona gallery, chat thread, agentic sessions; single process, no JS framework
Training	TRL `SFTTrainer` LoRA, plus a pure-PyTorch nanoGPT for from-scratch experiments
Logging	Structured JSON + per-request id propagation
Quantization	GGUF q4_k_m via llama-cpp-python; merged adapter → 4.4 GB single file (~3.3× smaller)
Tests	221 unit + Playwright E2E walker

Design principles

Asset ownership. Every model lives under models/{upstream,ours}/<id>/MANIFEST.json listing source URL, revision sha, license, and exact update_command. HF Hub is a download channel, not a runtime dependency.
Protocol-first. LLMClient, Retriever, SessionStore, Reranker, Embedder, VectorStore, BM25Index, Chunker are all typing.Protocol — implementations are interchangeable.
Zero API spend. No call sites for Anthropic, OpenAI, or any hosted vendor. All inference is local.
Apache-2.0 OSS preferred. Qwen2.5 family · multilingual-e5 · bge-reranker. Mixed licenses are documented in MANIFEST and labelled in the model registry.
Shipping size matters. A persona is not "done" until it ships at deployable size. The v0.2 quantization track is first-class, not an afterthought.
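The protocol-first principle is easiest to see with two interchangeable backends behind one structural type. This sketch uses `typing.Protocol` as the text describes, but the `generate` signature and both client classes are invented for illustration and are not the project's actual `LLMClient` interface.

```python
from typing import Protocol


class LLMClient(Protocol):
    # Hypothetical method shape; any class with a matching method conforms.
    def generate(self, system_prompt: str, user_message: str) -> str: ...


class CannedClient:
    """Stub backend: returns a fixed in-character line, no model needed."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def generate(self, system_prompt: str, user_message: str) -> str:
        return self.reply


class EchoClient:
    """Second backend with the same shape, to show the swap."""

    def generate(self, system_prompt: str, user_message: str) -> str:
        return f"(echo) {user_message}"


def chat_once(client: LLMClient, user_message: str) -> str:
    # Depends only on the protocol, never on a concrete backend class.
    return client.generate("You are a blunt merchant.", user_message)


print(chat_once(CannedClient("No discounts."), "Any deals today?"))
print(chat_once(EchoClient(), "Any deals today?"))
```

Neither client inherits from `LLMClient`. A static type checker accepts both because their method signatures match, which is what makes implementations swappable.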

Install

git clone https://github.com/southglory/AdelieAI
cd AdelieAI
python -m venv .venv
.venv/Scripts/pip install -e ".[dev,train]"

# Torch with CUDA (e.g. RTX 3090)
.venv/Scripts/pip install --upgrade torch --index-url https://download.pytorch.org/whl/cu124

# Pull the default models (each takes a few minutes)
.venv/Scripts/huggingface-cli download \
    Qwen/Qwen2.5-7B-Instruct \
    --local-dir models/upstream/Qwen2.5-7B-Instruct
.venv/Scripts/huggingface-cli download \
    intfloat/multilingual-e5-small \
    --local-dir models/upstream/multilingual-e5-small
.venv/Scripts/huggingface-cli download \
    BAAI/bge-reranker-v2-m3 \
    --local-dir models/upstream/bge-reranker-v2-m3

(Optional) Pull our pre-trained Korean role-play adapter

If you want to skip the training run and just try the persona style, two flavors are published:

# FP16 + GPU path (LoRA adapter, ~165 MB); use with TransformersClient
.venv/Scripts/huggingface-cli download \
    ramyun/adelie-qwen-roleplay-v2-lora \
    --local-dir models/ours/qwen-roleplay-v2

# CPU path (q4_k_m GGUF, ~4.4 GB); use with GGUFClient, no GPU needed
.venv/Scripts/huggingface-cli download \
    ramyun/adelie-qwen-roleplay-v2-gguf \
    --local-dir models/ours/qwen-roleplay-v2-gguf

Both inherit the Tongyi Qianwen License v1 from the Qwen2.5-7B base. See the model cards on HF for full provenance + recipe.

Run

PYTHONUTF8=1 .venv/Scripts/uvicorn core.api.app:app --port 8770

Open http://localhost:8770/web/personas — pick a character and start chatting.

/health returns:

{
  "status": "ok",
  "llm": "Qwen/Qwen2.5-7B-Instruct",
  "embedder": "intfloat/multilingual-e5-small",
  "reranker": "BAAI/bge-reranker-v2-m3",
  "retriever": "HybridRetriever",
  "store": "SqlSessionStore"
}

When a persona pack is mounted, llm becomes base+persona-id.
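A client-side check of this endpoint needs only the standard library. The sketch below assumes the server is reachable on the port used above and that the payload has the keys shown in the sample. `fetch_health` and `describe` are illustrative helpers, not part of the project.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8770") -> dict:
    """GET /health from a running server and decode the JSON body."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def describe(health: dict) -> str:
    """One-line summary; keys are the ones shown in the sample payload."""
    if health.get("status") != "ok":
        return f"unhealthy: {health.get('status')}"
    return f"llm={health['llm']} retriever={health['retriever']} store={health['store']}"


# Offline demo using the sample payload from the text above.
sample = json.loads("""
{
  "status": "ok",
  "llm": "Qwen/Qwen2.5-7B-Instruct",
  "embedder": "intfloat/multilingual-e5-small",
  "reranker": "BAAI/bge-reranker-v2-m3",
  "retriever": "HybridRetriever",
  "store": "SqlSessionStore"
}
""")
print(describe(sample))
```

With the server running, `describe(fetch_health())` gives the same one-line summary for the live build.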

Chat with a persona

Three Korean role-play personas ship out of the box: 🐧 penguin_relaxed (Adelie penguin lazing on the ice), 🐟 fish_swimmer (a fish drifting through open water), and ⚔️ knight_brave (a sworn knight facing down dragons). The display names render in Korean (놀고 있는 펭귄 / 헤엄치는 물고기 / 용감한 기사) because the personas speak Korean — that's the style the LoRA was trained on. With or without a LoRA adapter mounted, the system prompt drives the character; with qwen-roleplay-v2 mounted, the LoRA additionally tilts the model toward the role-play register.

Open /web/personas — gallery of cards with base / adapter / RAG status / turn count.
Click a card → /web/chat/{persona_id} — chat thread with the character.
Send a message → reply streams back inline, with {latency}s · {tokens} tok mini-stat next to each turn.
Sidebar shows persona meta: model id, system prompt, turn count, adapter id.
Hit reset to clear the thread for that user/persona pair only.

Persistence: every turn is stored in data/chats.db (SQLite by default; swap via CHAT_DATABASE_URL).

The persona registry is hard-coded for v0.1.5; v0.2 swaps it for .adelie pack auto-discovery — see docs/PERSONA_PACK.md.

Quantize a persona

The v0.2 quantization recipe lives in the sibling differentia-llm repo (the private incubator). The same merged checkpoint shrinks from 14.5 GB → 4.36 GB (roughly a 3.3× compression) without losing the role-play style.

# 0. one-time: prebuilt CPU wheel + format library
.venv/Scripts/pip install llama-cpp-python --only-binary=:all: \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
.venv/Scripts/pip install gguf sentencepiece

# 1. merge LoRA adapter into base
python ../differentia-llm/experiments/05_awq_quantize/merge.py \
    --base models/upstream/Qwen2.5-7B-Instruct \
    --adapter models/ours/qwen-roleplay-v2 \
    --output models/ours/qwen-roleplay-v2-merged

# 2. convert + quantize
python ../differentia-llm/experiments/06_gguf_export/run.py \
    --merged models/ours/qwen-roleplay-v2-merged \
    --output models/ours/qwen-roleplay-v2-gguf \
    --quant q4_k_m

# 3. mount the .gguf file
MODEL_PATH=models/ours/qwen-roleplay-v2-gguf/qwen-roleplay-v2.q4_k_m.gguf \
PYTHONUTF8=1 .venv/Scripts/uvicorn core.api.app:app --port 8770

/health reports llm: qwen-roleplay-v2-gguf and the persona gallery serves the quantized character with the same UX as the FP16 path. CPU inference is slower than GPU FP16 (a few seconds per turn vs sub-second), so the GPU path remains canonical for production demos; the GGUF path is for shipping.

models/ours/qwen-roleplay-v2-gguf/recipe.md and docs/PERSONA_PACK.md document the full recipe.

Train a persona

PYTHONUTF8=1 .venv/Scripts/python -X utf8 \
    scripts/train_lora_roleplay.py \
    --dataset mixed --epochs 4 \
    --output models/ours/qwen-roleplay-v2

Outputs MANIFEST.json + recipe.md + adapter weights (~150 MB, gitignored). Mount at runtime:

LORA_PATH=models/ours/qwen-roleplay-v2 \
PYTHONUTF8=1 .venv/Scripts/uvicorn core.api.app:app --port 8770

docs/TRAINING.md — when to LoRA vs prompt vs full fine-tune, dataset rules, hyperparameter rationale (r=16, α=32, lr=2e-4, 4 epochs), v1 → v2 lessons, known traps.

Design a new persona

Want a new character? personas/_template/ is the starting point: duplicate it, fill in the sheet, write 60 + 60 dialogue pairs, train. docs/persona_design_guide.md walks through the design decisions, the good/bad pair examples, and seven traps from the v1 → v2 cycle. Five empty slots (personas/npc1/ … personas/npc5/) are pre-allocated for the v0.3 multi-persona work (experiments 09 · 11 · 12 in the differentia-llm sibling repo).

Compare personas

PYTHONUTF8=1 .venv/Scripts/python -X utf8 \
    scripts/compare_adapters.py \
    --adapter v1=models/ours/qwen-roleplay-v1 \
    --adapter v2=models/ours/qwen-roleplay-v2

Writes docs/compare/{ts}.json (full text + scores) and docs/compare/{ts}.md (table + per-prompt outputs). Default judge is the base model itself; production setups should plug in a stronger external judge.

From-scratch transformer

Pure-PyTorch decoder-only transformer (core/training/models/nano_gpt.py, ~250 lines, no transformers dependency at the model layer). RMSNorm + RoPE + SwiGLU — same architecture family as Qwen2 so LoRA-tuned and from-scratch results compare like-for-like.

PYTHONUTF8=1 .venv/Scripts/python -X utf8 \
    scripts/train_nano_gpt.py \
    --output models/ours/nano-gpt-v0 \
    --steps 1500

A 69M model trains end-to-end in ~5 minutes on an RTX 3090.
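Two of the building blocks named above, RMSNorm and SwiGLU, fit in a few lines. The sketch below is a dependency-free illustration on plain lists with made-up toy weights. It is not taken from `nano_gpt.py`, and RoPE is left out.

```python
import math


def rms_norm(x: list[float], gain: list[float], eps: float = 1e-6) -> list[float]:
    """Scale x by the reciprocal of its root-mean-square, then by a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def swiglu(x, w_gate, w_up, w_down):
    """down( silu(gate(x)) * up(x) ): the gated feed-forward block."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


x = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
print(x)

# Toy 2 -> 3 -> 2 feed-forward with made-up weights.
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(swiglu(x, w_gate, w_up, w_down))
```

Unlike LayerNorm, RMSNorm subtracts no mean and adds no bias, so after normalization the mean of the squared activations is approximately 1.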

Roadmap

version	adds
v0.1	Persona pack format spec, LoRA training, hybrid RAG, LangGraph agent, comparison harness
v0.1.5	Persona gallery + multi-turn chat UI with per-turn telemetry (token count + latency)
v0.2 (current)	GGUF q4_k_m quantization — same persona ships at 1/3 the size on Windows / CPU. Adds `GGUFClient`, `MODEL_PATH=*.gguf` dispatch, and a reproducible `experiments/06_gguf_export/` recipe. AWQ track is parked behind a Windows triton blocker (see `experiments/05_awq_quantize/results.md`) — re-opens on Linux/WSL.
v0.3	Distillation track (7B teacher → 1.5B student) — mobile-class personas
v0.4	vLLM serving — multiple personas concurrent on one GPU
v0.5	Tool-use personas — function calling per persona
v0.6	Multi-persona orchestration — N personas cooperating on a single quest

Testing

# Unit tests
.venv/Scripts/python -m pytest tests -q

# End-to-end Playwright walk
.venv/Scripts/python -m pytest tests/e2e -v --browser chromium -o addopts=

Mascot

Adelie penguin — small, sturdy, plays on the ice without making a fuss. The engine, in spirit: focused, self-reliant, plays in its own pond.

Contributing

Apache 2.0. Small PRs welcome. See CONTRIBUTING.md for the first-issue list.

Sibling project

differentia-llm — the private incubator AdelieAI was extracted from. Multi-agent orchestration experiments, mission notes, live training journals.

made with cold flippers 🐧
