diff --git a/apps/web/content/docs/developers/12.analytics.mdx b/apps/web/content/docs/developers/12.analytics.mdx index e113a195a3..3794ac4653 100644 --- a/apps/web/content/docs/developers/12.analytics.mdx +++ b/apps/web/content/docs/developers/12.analytics.mdx @@ -89,6 +89,7 @@ Events tracked when users interact with system notifications (`plugins/notificat | `expanded_accept` | User accepts expanded notification | - | | `dismiss` | User dismisses notification | - | | `collapsed_timeout` | Collapsed notification times out | - | +| `option_selected` | User selects an option from notification buttons | - | ### 8. Server-Side Proxy Events @@ -97,7 +98,7 @@ Events tracked by the API proxy layer. These use the `$` prefix following PostHo | Event | Description | Properties | Source | |-------|-------------|------------|--------| | `$stt_request` | Speech-to-text request processed | `$stt_provider`, `$stt_duration`, `user_id` | `crates/transcribe-proxy/src/analytics.rs` | -| `$ai_generation` | LLM generation request processed | `$ai_provider`, `$ai_model`, `$ai_input_tokens`, `$ai_output_tokens`, `$ai_latency`, `$ai_trace_id`, `$ai_http_status`, `$ai_base_url`, `$ai_total_cost_usd`, `user_id` | `crates/llm-proxy/src/analytics.rs` | +| `$ai_generation` | LLM generation request processed | `$ai_provider`, `$ai_model`, `$ai_input_tokens`, `$ai_output_tokens`, `$ai_latency`, `$ai_trace_id`, `$ai_http_status`, `$ai_base_url`, `$ai_total_cost_usd`, `$ai_task`, `user_id` | `crates/llm-proxy/src/analytics.rs` | ## User Properties diff --git a/apps/web/content/docs/developers/16.local-models.mdx b/apps/web/content/docs/developers/16.local-models.mdx index 0fceb7d262..98acb0cd71 100644 --- a/apps/web/content/docs/developers/16.local-models.mdx +++ b/apps/web/content/docs/developers/16.local-models.mdx @@ -32,6 +32,10 @@ Cactus is a new local STT backend that runs optimized Whisper models. Cactus mod Cactus models are downloaded and managed from the Settings UI. 
They are stored in a separate `cactus/` directory within the models folder. +### Custom Vocabulary + +Cactus supports custom vocabulary boosting to improve recognition of domain-specific terms. When configured, the `custom_vocabulary` list and `vocabulary_boost` weight are passed to the Cactus transcription engine for both streaming and batch modes. You can set these terms in Settings > Language & Vocabulary under the Memory tab. + ### Argmax Models Manual download: diff --git a/apps/web/content/docs/faq/10.ai-models-and-privacy.mdx b/apps/web/content/docs/faq/10.ai-models-and-privacy.mdx index 1f07bb101c..7d1ade5355 100644 --- a/apps/web/content/docs/faq/10.ai-models-and-privacy.mdx +++ b/apps/web/content/docs/faq/10.ai-models-and-privacy.mdx @@ -18,7 +18,8 @@ Char uses two base directories. See [Data](/docs/guides/data) for the full direc **Global base** (shared across stable and nightly builds): - `models/stt/` — downloaded speech-to-text model files (Whisper GGUF, Argmax tarballs) -- `store.json` — app state (onboarding status, pinned tabs, recently opened sessions, dismissed toasts, analytics preference, auth tokens) +- `store.json` — app state (onboarding status, pinned tabs, recently opened sessions, dismissed toasts) +- `auth.json` — authentication tokens (written atomically to prevent corruption) - `hyprnote.json` — vault configuration (custom vault path if set) - `search/` — Tantivy full-text search index @@ -58,13 +59,13 @@ When you start a recording session, Char spawns three actors in parallel: Here is the session supervisor that orchestrates these actors: - + -Audio is written to WAV files on your local disk. Here is the recorder handling incoming audio samples: +Audio is written to WAV or MP3 files on your local disk. Here is the recorder handling incoming audio samples: - + -Audio files are stored at `{vault}/sessions/{session_id}/audio.wav` — they never leave your device unless you explicitly use cloud transcription. 
+Audio files are stored at `{vault}/sessions/{session_id}/audio.wav` (or `audio.mp3` when MP3 encoding is enabled) — they never leave your device unless you explicitly use cloud transcription. ### Encryption @@ -96,11 +97,11 @@ For local model details and download instructions, see [Local Models](/docs/deve **Cloud models** send your audio to the selected provider for processing. Here is how the listener actor connects to your configured STT provider: - + The `ListenerArgs` passed to the STT adapter contain the following — this is all the data sent to the provider along with your audio stream: - + **What is sent:** - Your recorded audio (streamed in real-time or sent as a file for batch transcription) @@ -258,7 +259,7 @@ Char checks for updates using the Tauri updater system. When you sign in for Pro or cloud features, Char authenticates via Supabase. **What is stored locally:** -- Auth session tokens in the local Tauri store (`store.json`) +- Auth session tokens in a dedicated `auth.json` file (written atomically to prevent corruption) - Account info: user ID, email, full name, avatar URL **What is sent:** @@ -292,6 +293,7 @@ Char is open source. 
You can verify everything documented here by reading the co - [Analytics plugin](https://github.com/fastrepl/char/tree/main/plugins/analytics) - [Analytics crate](https://github.com/fastrepl/char/tree/main/crates/analytics) - [Listener plugin](https://github.com/fastrepl/char/tree/main/plugins/listener) +- [Listener core crate](https://github.com/fastrepl/char/tree/main/crates/listener-core) - [Local STT plugin](https://github.com/fastrepl/char/tree/main/plugins/local-stt) - [Database plugin](https://github.com/fastrepl/char/tree/main/plugins/db2) - [Network plugin](https://github.com/fastrepl/char/tree/main/plugins/network) diff --git a/apps/web/content/docs/faq/3.technical.mdx b/apps/web/content/docs/faq/3.technical.mdx index 5cac7bf77b..018d27a333 100644 --- a/apps/web/content/docs/faq/3.technical.mdx +++ b/apps/web/content/docs/faq/3.technical.mdx @@ -16,6 +16,14 @@ The app itself is about 200MB. Recording storage depends on your usage - a 1-hou Yes! Since Char uses local AI, it works completely offline. You don't need an internet connection to record, transcribe, or generate summaries. See [Local Models](/docs/developers/local-models) for available local STT models (including the new Cactus engine) and [Local LLM Setup](/docs/faq/local-llm-setup) for configuring local AI. +## Is there a CLI version? + +A CLI app (`apps/cli`) is in early development. It provides a TUI-based live transcription interface using the same `listener-core` engine as the desktop app. The CLI binary is named `char` and supports `auth` and `tui` subcommands. + +## What audio formats does Char record in? + +Char records audio as WAV files by default. MP3 encoding is also supported via an `AudioEncoder` trait, allowing recordings to be saved as compressed MP3 files. + ## What cloud STT providers does Char support? 
Char supports 9 cloud speech-to-text providers: Deepgram, AssemblyAI, Soniox, Fireworks, OpenAI, Gladia, ElevenLabs, DashScope (Alibaba Cloud's Qwen3-ASR), and Mistral (Voxtral). Each provider can be used via BYOK (Bring Your Own Key) in Settings > Transcription. See [Better Transcription](/docs/pro/better-transcription) for details. diff --git a/apps/web/content/docs/pro/1.better-transcription.mdx b/apps/web/content/docs/pro/1.better-transcription.mdx index 56de12c9bc..23d73e4107 100644 --- a/apps/web/content/docs/pro/1.better-transcription.mdx +++ b/apps/web/content/docs/pro/1.better-transcription.mdx @@ -81,13 +81,13 @@ When using cloud transcription, your recorded audio is sent to the selected prov Here is how Char selects the correct adapter for your configured provider — each provider has its own adapter that handles the audio stream: - + ### What Data Is Sent to the Provider **Sent alongside your audio stream:** - + - Raw audio (Linear PCM, 16kHz sample rate, mono or stereo) - Configuration: model name, language codes, optional keyword boost list, sample rate, channel count diff --git a/apps/web/content/docs/pro/2.cloud.mdx b/apps/web/content/docs/pro/2.cloud.mdx index 6a1a7b588b..0cee6e549a 100644 --- a/apps/web/content/docs/pro/2.cloud.mdx +++ b/apps/web/content/docs/pro/2.cloud.mdx @@ -21,9 +21,29 @@ Pro includes curated AI models that work out of the box. Your requests are proxi ### Which LLM Models Are Used -When you use Pro's curated intelligence, Char's server selects from these models automatically. You don't choose a specific model — the server decides which pool of models to use based on the type of request, then OpenRouter picks the fastest available model from that pool. +When you use Pro's curated intelligence, Char's server selects from these models automatically. 
You don't choose a specific model — the server decides which pool of models to use based on what the desktop app is doing, then OpenRouter picks the fastest available model from that pool. -There are two pools of models, and the server picks one based on a single condition: **does your request need tool calling?** +The desktop app sends an `x-char-task` header (`chat`, `enhance`, or `title`) with each request. The server uses this header, along with whether the request needs tool calling or contains audio input, to resolve the right model pool. + +#### Task-specific pools + +When the `x-char-task` header is present, the server picks a pool optimized for that task: + +**Chat** (AI assistant conversations): + +| Model | Provider | +|-------|----------| +| `anthropic/claude-haiku-4.5` | Anthropic (via OpenRouter) | +| `anthropic/claude-sonnet-4.6` | Anthropic (via OpenRouter) | +| `z-ai/glm-5` | Zhipu AI (via OpenRouter) | + +**Title** (auto-generating session titles): + +| Model | Provider | +|-------|----------| +| `moonshotai/kimi-k2-0905` | Moonshot AI (via OpenRouter) | +| `google/gemini-2.5-flash-lite` | Google (via OpenRouter) | +| `z-ai/glm-4.7-flash` | Zhipu AI (via OpenRouter) | #### When tool calling is needed @@ -34,17 +54,28 @@ If the desktop app sends tool definitions with the request (e.g., for web search | Model | Provider | |-------|----------| +| `anthropic/claude-sonnet-4.6` | Anthropic (via OpenRouter) | | `anthropic/claude-haiku-4.5` | Anthropic (via OpenRouter) | -| `openai/gpt-oss-120b:exacto` | OpenAI (via OpenRouter) | | `moonshotai/kimi-k2-0905:exacto` | Moonshot AI (via OpenRouter) | -#### When tool calling is not needed +#### When audio input is present + +If the request contains audio content (e.g., inline audio for multimodal models), the server uses the **audio** model pool: + +| Model | Provider | +|-------|----------| +| `google/gemini-2.5-flash-lite` | Google (via OpenRouter) | +| `mistralai/voxtral-small-24b-2507` | Mistral AI (via 
OpenRouter) | + +Audio input takes the highest priority — it overrides both task-specific and tool-calling pools. + +#### Default pool -For standard requests without tools — such as generating summaries, enhancing notes, or regular chat completions — the server uses the **default** model pool: +For requests without a task header, tool calling, or audio, the server falls back to the **default** pool: | Model | Provider | |-------|----------| -| `anthropic/claude-sonnet-4.5` | Anthropic (via OpenRouter) | +| `anthropic/claude-sonnet-4.6` | Anthropic (via OpenRouter) | | `openai/gpt-5.2-chat` | OpenAI (via OpenRouter) | | `moonshotai/kimi-k2-0905` | Moonshot AI (via OpenRouter) | @@ -52,13 +83,13 @@ For standard requests without tools — such as generating summaries, enhancing Within each pool, **you don't get a fixed model**. All models in the pool are sent to OpenRouter, which picks the one with the lowest latency at that moment. This means the actual model serving your request can vary between calls — if Anthropic's endpoint is fastest right now, you'll get Claude; if OpenAI responds faster, you'll get GPT. -Here is the routing condition in the server — it checks whether the request includes tool definitions: +Here is the routing logic in the server — it reads the task header and checks request properties to resolve the model pool: - + -And here are the two model pools defined in the server config: +And here are the model pools defined in the static resolver: - + ### How the Request Flows
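
Reviewer note: the pool-resolution behavior these doc changes describe (audio input wins, otherwise tool calling, otherwise the `x-char-task` header, otherwise the default pool) can be sketched as below. This is a minimal illustration, not the actual `crates/llm-proxy` code: the `Pool` enum and `resolve_pool` name are hypothetical, the relative priority of tool calling vs. the task header is an assumption, and `enhance` is assumed to fall through to the default pool since no enhance-specific pool is documented.

```rust
/// Hypothetical model-pool identifiers, one per pool described in the docs.
#[derive(Debug, PartialEq)]
enum Pool {
    Audio,   // requests containing inline audio content
    Tools,   // requests that include tool definitions
    Chat,    // x-char-task: chat
    Title,   // x-char-task: title
    Default, // everything else (including x-char-task: enhance, assumed here)
}

/// Resolve a pool from the `x-char-task` header value and request properties.
/// Priority order per the docs: audio overrides everything; the tools-vs-task
/// ordering below is an assumption, not confirmed by the diff.
fn resolve_pool(task: Option<&str>, has_audio: bool, has_tools: bool) -> Pool {
    if has_audio {
        // "Audio input takes the highest priority — it overrides both
        // task-specific and tool-calling pools."
        return Pool::Audio;
    }
    if has_tools {
        return Pool::Tools;
    }
    match task {
        Some("chat") => Pool::Chat,
        Some("title") => Pool::Title,
        // No task header, an unknown task, or `enhance`: default pool.
        _ => Pool::Default,
    }
}

fn main() {
    assert_eq!(resolve_pool(Some("title"), false, false), Pool::Title);
    assert_eq!(resolve_pool(Some("chat"), true, true), Pool::Audio); // audio wins
    assert_eq!(resolve_pool(None, false, true), Pool::Tools);
    assert_eq!(resolve_pool(Some("enhance"), false, false), Pool::Default);
}
```

Once resolved, all models in the chosen pool are handed to OpenRouter, which picks the lowest-latency endpoint, matching the "you don't get a fixed model" behavior described above.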