From 1ae91862801a2e4ac61a5453e7941f1ed89b2ab4 Mon Sep 17 00:00:00 2001 From: Classic298 <27028174+Classic298@users.noreply.github.com> Date: Wed, 22 Apr 2026 02:13:00 +0200 Subject: [PATCH] Docs improvements --- docs/faq.mdx | 10 + docs/features/channels/index.md | 4 + .../chat-features/chatshare.md | 2 - .../chat-features/code-execution/index.md | 2 +- .../chat-features/follow-up-prompts.md | 2 +- .../web-search/agentic-search.mdx | 4 +- .../plugin/development/events.mdx | 2 +- .../plugin/development/rich-ui.mdx | 13 + .../extensibility/plugin/functions/action.mdx | 2 +- .../extensibility/plugin/functions/pipe.mdx | 91 ++--- docs/features/extensibility/plugin/index.mdx | 10 + .../plugin/tools/development.mdx | 103 +++--- .../extensibility/plugin/tools/index.mdx | 83 +++-- docs/getting-started/essentials.mdx | 209 +++++++++++ docs/getting-started/index.md | 18 + .../starting-with-functions.mdx | 2 +- docs/getting-started/quick-start/index.mdx | 4 + docs/intro.mdx | 5 + docs/reference/env-configuration.mdx | 2 +- docs/troubleshooting/context-window.mdx | 330 ++++++++++++++++++ docs/troubleshooting/index.mdx | 2 + docs/troubleshooting/performance.md | 4 +- docs/troubleshooting/rag.mdx | 2 +- .../tutorials/integrations/libre-translate.md | 6 +- 24 files changed, 759 insertions(+), 153 deletions(-) create mode 100644 docs/getting-started/essentials.mdx create mode 100644 docs/troubleshooting/context-window.mdx diff --git a/docs/faq.mdx b/docs/faq.mdx index ffb76729a9..58ea258419 100644 --- a/docs/faq.mdx +++ b/docs/faq.mdx @@ -42,6 +42,16 @@ For more details on enterprise solutions and branding customizations, [click her **A:** You can access the **File Manager** by going to **Settings > Data Controls > Manage Files > Manage**. This dashboard allows you to search through all your uploaded documents, view their details, and delete them. Deleting a file here also automatically cleans up any associated Knowledge Base entries and vector embeddings. +### Q: I get "The prompt is too long" / "context length exceeded" after a while in a chat. How do I fix it? + +**A:** This error comes from the **model provider**, not from Open WebUI — the provider counts the tokens of everything you sent (system prompt + the *entire* chat history + attached files + tool calls + your new message) and rejects the request once it exceeds the model's context window. The "prompt" the model sees is the whole conversation, not just your latest message. + +Open WebUI intentionally does **not** ship a built-in context trimmer. Every model has a different tokenizer and a different context window, and every deployment wants a different truncation policy (by tokens, by turns, by message count, file-attachments-first, summarize-and-replace, per-model budgets, and so on). There is no single policy that is correct for every user, so we expose the hook instead of choosing one for you. + +Context management is done with [filter Functions](/features/extensibility/plugin/functions/filter): `inlet()` receives the full `body["messages"]` on every request and can modify it freely (drop old turns, enforce a turn limit, summarize, trim attachments, etc.). Many community-maintained context filters are already available one-click on [openwebui.com](https://openwebui.com/) — browse, install, and tune the valves. If none fits, copy the closest one into **Admin Panel → Functions** and edit it. + +For the full write-up with examples, see [Context Window / Prompt Too Long](/troubleshooting/context-window). 
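+
+As a quick orientation before you open that page, here is a minimal sketch of such a filter. It keeps the system prompt plus only the newest N messages; the valve name and default limit are illustrative, and real deployments usually want a smarter policy (token counting, summarization, attachment handling):
+
+```python
+from pydantic import BaseModel, Field
+
+
+class Filter:
+    class Valves(BaseModel):
+        max_messages: int = Field(
+            default=20,
+            description="Newest non-system messages to keep on each request.",
+        )
+
+    def __init__(self):
+        self.valves = self.Valves()
+
+    def inlet(self, body: dict) -> dict:
+        messages = body.get("messages", [])
+        # Keep the system prompt (if any) and only the newest turns of everything else.
+        system = [m for m in messages if m.get("role") == "system"]
+        rest = [m for m in messages if m.get("role") != "system"]
+        body["messages"] = system + rest[-self.valves.max_messages :]
+        return body
+```
+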
+ ### Q: Can I use Open WebUI offline, in air-gapped networks, or in extreme environments like outer space? **A:** **Yes.** Open WebUI is a self-hosted, **internet-independent AI platform** designed to work in **air-gapped networks**, **remote deployments**, and any environment where cloud-based systems are impractical or impossible. Whether you need to **run an LLM without internet**, deploy a **private AI with no cloud dependency**, or operate a **local AI chatbot offline**, Open WebUI supports all of these out of the box. It runs entirely on local hardware and does not make external calls by default. diff --git a/docs/features/channels/index.md b/docs/features/channels/index.md index 83a577a42e..f206a96ee8 100644 --- a/docs/features/channels/index.md +++ b/docs/features/channels/index.md @@ -94,6 +94,10 @@ With [**native function calling**](/features/extensibility/plugin/tools#tool-cal This removes the need to manually bridge information between private chats and shared channels. The AI does it for you. +:::tip Community action: Forward to Channel +If you want a one-click path from a chat message into a channel, the community **[Forward to Channel](https://openwebui.com/posts/b60c1f03-e29c-47c0-862c-3741a382616e)** action adds a button to each assistant message that posts the reply (or a selection) into a channel of your choice. Useful for promoting good answers from private chats into team-visible spaces without copy-paste. +::: + --- ## Getting Started diff --git a/docs/features/chat-conversations/chat-features/chatshare.md b/docs/features/chat-conversations/chat-features/chatshare.md index 3ef42fdf0b..bd7fbfde68 100644 --- a/docs/features/chat-conversations/chat-features/chatshare.md +++ b/docs/features/chat-conversations/chat-features/chatshare.md @@ -46,8 +46,6 @@ Note: You can change the permission level of your shared chats on the community ::: -Example of a shared chat to the community platform website: https://openwebui.com/c/iamg30/5e3c569f-905e-4d68-a96d-8a99cc65c90f - #### Copying a Share Link When you select `Copy Link`, a unique share link is generated that can be shared with others. diff --git a/docs/features/chat-conversations/chat-features/code-execution/index.md b/docs/features/chat-conversations/chat-features/code-execution/index.md index 6037062d33..86b7add586 100644 --- a/docs/features/chat-conversations/chat-features/code-execution/index.md +++ b/docs/features/chat-conversations/chat-features/code-execution/index.md @@ -7,7 +7,7 @@ Open WebUI offers powerful code execution capabilities directly within your chat ## Key Features -- **Code Interpreter Capability**: Enable models to autonomously write and execute Python code as part of their responses. Works with both Default Mode (XML-based) and Native Mode (tool calling via `execute_code`). +- **Code Interpreter Capability**: Enable models to autonomously write and execute Python code as part of their responses. Runs via the `execute_code` tool in Native (Agentic) Mode — the only supported tool-calling mode. An older XML-based integration exists for legacy Default Mode but is unsupported; new deployments should use Native Mode. - **Python Code Execution**: Run Python scripts directly in your browser using Pyodide, or on a server using Jupyter. Supports popular libraries like pandas and matplotlib with no setup required. 
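+
+  For a sense of what "no setup required" means in practice, the snippet below is the kind of code the interpreter runs as-is: ordinary pandas and matplotlib calls, no Open WebUI-specific APIs. The data is invented purely for illustration; in a real chat the model would build it from your request or an attached file.
+
+  ```python
+  import pandas as pd
+  import matplotlib.pyplot as plt
+
+  # Illustrative data frame; normally assembled by the model from the conversation.
+  df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "revenue": [120, 150, 170]})
+
+  # Render a simple bar chart; the resulting figure is displayed inline in the chat.
+  df.plot(x="month", y="revenue", kind="bar", legend=False)
+  plt.ylabel("Revenue")
+  plt.tight_layout()
+  plt.show()
+  ```
+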
diff --git a/docs/features/chat-conversations/chat-features/follow-up-prompts.md b/docs/features/chat-conversations/chat-features/follow-up-prompts.md index d85f29cc6d..cf354b12f2 100644 --- a/docs/features/chat-conversations/chat-features/follow-up-prompts.md +++ b/docs/features/chat-conversations/chat-features/follow-up-prompts.md @@ -44,4 +44,4 @@ Controls what happens when you click a follow-up prompt. ## Regenerating Follow-Ups -If you want to regenerate follow-up suggestions for a specific response, you can use the [Regenerate Followups](https://openwebui.com/f/silentoplayz/regenerate_followups) action button from the community. +If you want to regenerate follow-up suggestions for a specific response, you can use the [Regenerate Follow-ups](https://openwebui.com/posts/9b5ac6d6-dfd6-4cad-bc1d-5518b138f22d) action button from the community. diff --git a/docs/features/chat-conversations/web-search/agentic-search.mdx b/docs/features/chat-conversations/web-search/agentic-search.mdx index 37ea410d72..5003226134 100644 --- a/docs/features/chat-conversations/web-search/agentic-search.mdx +++ b/docs/features/chat-conversations/web-search/agentic-search.mdx @@ -37,9 +37,9 @@ To unlock these features, your model must support native tool calling and have s 5. **Use a Quality Model**: Ensure you're using a frontier model with strong reasoning capabilities for best results. :::tip Model Capability, Default Features, and Chat Toggle -In **Native Mode**, the `search_web` and `fetch_url` tools require both the **Web Search** capability to be enabled *and* **Web Search** to be checked under **Default Features** in the model settings (or toggled on in the chat). If either is missing, the tools will not be injected — even though other builtin tools may still appear. +In **Native Mode** (the supported mode), the `search_web` and `fetch_url` tools require both the **Web Search** capability to be enabled *and* **Web Search** to be checked under **Default Features** in the model settings (or toggled on in the chat). If either is missing, the tools will not be injected — even though other builtin tools may still appear. -In **Default Mode** (non-native), the chat toggle controls whether web search is performed via RAG-style injection. +Default Mode's RAG-style injection behavior is documented here only for legacy deployments. Default Mode is no longer supported; all models should be configured for Native Mode. **Important**: If you disable the `web_search` capability on a model but use Native Mode, the tools won't be available even if you manually toggle Web Search on in the chat. ::: diff --git a/docs/features/extensibility/plugin/development/events.mdx b/docs/features/extensibility/plugin/development/events.mdx index 395e136e2f..f22c3dd608 100644 --- a/docs/features/extensibility/plugin/development/events.mdx +++ b/docs/features/extensibility/plugin/development/events.mdx @@ -357,7 +357,7 @@ While this event can technically be emitted from any plugin type (tools, pipes, * **Chat Overview**: Favorited messages (pins) are highlighted in the conversation overview, making it easier for users to locate key information later. #### Example: "Pin Message" Action -For a practical implementation of this event in a real-world plugin, see the **[Pin Message Action on Open WebUI Community](https://openwebui.com/posts/pin_message_action_143594d1)**. This action demonstrates how to toggle the favorite status in the database and immediately sync the UI using the `chat:message:favorite` event. 
+For a practical implementation of this event in a real-world plugin, see the **[Pin Message Action on Open WebUI Community](https://openwebui.com/posts/143594d1-0838-4f9a-9af2-b94d2952f7ba)**. This action demonstrates how to toggle the favorite status in the database and immediately sync the UI using the `chat:message:favorite` event. --- diff --git a/docs/features/extensibility/plugin/development/rich-ui.mdx b/docs/features/extensibility/plugin/development/rich-ui.mdx index ae159a89cd..c3d78a9844 100644 --- a/docs/features/extensibility/plugin/development/rich-ui.mdx +++ b/docs/features/extensibility/plugin/development/rich-ui.mdx @@ -247,6 +247,19 @@ If your Rich UI embed needs to trigger downloads, interact with Open WebUI's fro As an alternative for ephemeral interactions that need full page access, consider using the [`execute` event](/features/extensibility/plugin/development/events#execute-works-with-both-__event_call__-and-__event_emitter__) instead, which runs unsandboxed in the main page context. ::: +:::tip Community Showcase: Streaming Rich UI with same-origin +If you want to see how far Rich UI can go when same-origin is enabled, take a look at the community **[Inline Visualizer v2](https://github.com/Classic298/open-webui-plugins)** tool (also on the community site via the [Show-and-tell discussion](https://github.com/open-webui/open-webui/discussions/23901)). + +It demonstrates patterns that aren't in the basic docs: + +- **Live streaming HTML/SVG.** The tool returns an empty wrapper; the model then emits markup inline between plain-text `@@@VIZ-START / @@@VIZ-END` markers in its normal response. A same-origin observer inside the iframe tails the parent chat's DOM, extracts the growing block, and reconciles new nodes into the iframe as tokens arrive — so dashboards and diagrams paint live, token-by-token, instead of popping in at the end of the stream. +- **Bidirectional bridges.** `sendPrompt(text)` turns any clickable node into a follow-up user message. `saveState(k, v)` / `loadState(k, fallback)` proxies parent `localStorage` scoped per-message so sliders and toggles survive reloads. `copyText`, `toast(msg, kind)`, and `openLink` round it out. +- **A shipped design system.** Theme-aware CSS variables, a 9-ramp color palette, SVG utility classes, auto light/dark adaptation, and 230 localized strings across 46 languages — all delivered from a single tool with no core changes. +- **Incremental DOM reconciliation.** A safe-cut HTML parser flushes the longest valid prefix on every tick; the reconciler only appends new nodes so existing elements never re-mount and animations never re-trigger during the stream. + +This is a useful reference when you're trying to decide whether a generative-UI / streaming-UI feature needs a core change or can live purely in plugin-land. (Spoiler: almost always the latter.) +::: + ## Rendering Position - **Tool embeds** inside a tool call result render **inline** at the tool call indicator (the "View Result from..." line) diff --git a/docs/features/extensibility/plugin/functions/action.mdx b/docs/features/extensibility/plugin/functions/action.mdx index b87a952631..308f265862 100644 --- a/docs/features/extensibility/plugin/functions/action.mdx +++ b/docs/features/extensibility/plugin/functions/action.mdx @@ -17,7 +17,7 @@ Action functions should always be defined as `async`. The backend is progressive Actions are admin-managed functions that extend the chat interface with custom interactive capabilities. 
When a message is generated by a model that has actions configured, these actions appear as clickable buttons above the message. -A scaffold of Action code can be found [in the community section](https://openwebui.com/f/hub/custom_action/). For more Action Function examples built by the community, visit [https://openwebui.com/search](https://openwebui.com/search). +A minimal scaffold is shown in the [Function Structure](#function-structure) section below. For real-world Action examples built by the community, browse [openwebui.com](https://openwebui.com/). An example of a graph visualization Action can be seen in the video below. diff --git a/docs/features/extensibility/plugin/functions/pipe.mdx b/docs/features/extensibility/plugin/functions/pipe.mdx index 9bc8fa3d14..7c7b5beee9 100644 --- a/docs/features/extensibility/plugin/functions/pipe.mdx +++ b/docs/features/extensibility/plugin/functions/pipe.mdx @@ -137,7 +137,8 @@ Let's dive into a practical example where we'll create a Pipe that proxies reque ```python from pydantic import BaseModel, Field -import requests +import httpx + class Pipe: class Valves(BaseModel): @@ -157,40 +158,37 @@ class Pipe: def __init__(self): self.valves = self.Valves() - def pipes(self): - if self.valves.OPENAI_API_KEY: - try: - headers = { - "Authorization": f"Bearer {self.valves.OPENAI_API_KEY}", - "Content-Type": "application/json", - } + async def pipes(self): + if not self.valves.OPENAI_API_KEY: + return [{"id": "error", "name": "API Key not provided."}] - r = requests.get( + headers = { + "Authorization": f"Bearer {self.valves.OPENAI_API_KEY}", + "Content-Type": "application/json", + } + + try: + async with httpx.AsyncClient() as client: + r = await client.get( f"{self.valves.OPENAI_API_BASE_URL}/models", headers=headers ) + r.raise_for_status() models = r.json() - return [ - { - "id": model["id"], - "name": f'{self.valves.NAME_PREFIX}{model.get("name", model["id"])}', - } - for model in models["data"] - if "gpt" in model["id"] - ] - - except Exception as e: - return [ - { - "id": "error", - "name": "Error fetching models. Please check your API Key.", - }, - ] - else: + + return [ + { + "id": model["id"], + "name": f'{self.valves.NAME_PREFIX}{model.get("name", model["id"])}', + } + for model in models["data"] + if "gpt" in model["id"] + ] + except Exception: return [ { "id": "error", - "name": "API Key not provided.", - }, + "name": "Error fetching models. 
Please check your API Key.", + } ] async def pipe(self, body: dict, __user__: dict): @@ -205,24 +203,35 @@ class Pipe: # Update the model id in the body payload = {**body, "model": model_id} - try: - r = requests.post( - url=f"{self.valves.OPENAI_API_BASE_URL}/chat/completions", - json=payload, - headers=headers, - stream=True, - ) - - r.raise_for_status() + url = f"{self.valves.OPENAI_API_BASE_URL}/chat/completions" + try: if body.get("stream", False): - return r.iter_lines() - else: + async def event_stream(): + async with httpx.AsyncClient(timeout=None) as client: + async with client.stream( + "POST", url, json=payload, headers=headers + ) as r: + r.raise_for_status() + async for line in r.aiter_lines(): + yield line + + return event_stream() + + async with httpx.AsyncClient(timeout=None) as client: + r = await client.post(url, json=payload, headers=headers) + r.raise_for_status() return r.json() except Exception as e: return f"Error: {e}" ``` +:::tip Use an async HTTP client +This example uses [`httpx.AsyncClient`](https://www.python-httpx.org/async/) instead of `requests` because both `pipes()` and `pipe()` run inside Open WebUI's async event loop. Calling the synchronous `requests` library from an `async def` method blocks the loop for the full duration of the HTTP request (and, for streaming, the entire stream), which starves every other concurrent request on the instance. `httpx` is async-native, already a dependency, and a drop-in replacement for the common patterns. + +If you must use a synchronous third-party library in an async handler, wrap the blocking call with `await anyio.to_thread.run_sync(...)` so it runs on a worker thread instead of the event loop. +::: + ### Detailed Breakdown #### Valves Configuration @@ -261,8 +270,8 @@ class Pipe: 1. **Prepare Headers**: Sets up the headers with the API key and content type. 2. **Extract Model ID**: Extracts the actual model ID from the selected model name. 3. **Prepare Payload**: Updates the body with the correct model ID. - 4. **Make API Request**: Sends a POST request to the OpenAI API's chat completions endpoint. - 5. **Handle Streaming**: If `stream` is `True`, returns an iterable of lines. + 4. **Make API Request**: Sends a POST request to the OpenAI API's chat completions endpoint via an `httpx.AsyncClient`. + 5. **Handle Streaming**: If `stream` is `True`, returns an async generator that yields SSE lines from the upstream response. 6. **Error Handling**: Catches exceptions and returns an error message. ### Extending the Proxy Pipe diff --git a/docs/features/extensibility/plugin/index.mdx b/docs/features/extensibility/plugin/index.mdx index 45fb2972ab..aabc13ecf0 100644 --- a/docs/features/extensibility/plugin/index.mdx +++ b/docs/features/extensibility/plugin/index.mdx @@ -31,6 +31,16 @@ title: "Tools & Functions (Plugins)" Getting started with Tools and Functions is easy because everything’s already built into the core system! You just **click a button** and **import these features directly from the community**, so there’s no coding or deep technical work required. +:::tip Plugins can do *way* more than you think — and way more than is shown here +The pages that follow document every capability the plugin system exposes: every class shape, every lifecycle method, every `__arg__`, every event type, every return contract, every hook that touches the pipeline. That surface is *complete*. + +What's **not** documented — because it can't be — is **what to use it for**. The ideas. The creative combinations. 
The "huh, I didn't realize you could do that with just an inlet filter and a `saveState` bridge" moments. Those live in the community's heads, not in these docs. + +These are **developer docs**. The primitives are all here; the creativity is on you (and on the thousands of community plugins that have already stretched the system into shapes nobody on the core team predicted — live-streaming HTML dashboards, per-user cost enforcement, summarize-and-replace context managers, bidirectional interactive UIs, entire embedded design systems, in-chat MCP apps, forensic watermarking, and so on). + +If you're weighing a feature request and thinking *"this needs a core change,"* ask *"can this be a plugin?"* first. Almost always the answer is yes. +::: + ## What are "Tools" and "Functions"? Let's start by thinking of **Open WebUI** as a "base" software that can do many tasks related to using Large Language Models (LLMs). But sometimes, you need extra features or abilities that don't come *out of the box*—this is where **tools** and **functions** come into play. diff --git a/docs/features/extensibility/plugin/tools/development.mdx b/docs/features/extensibility/plugin/tools/development.mdx index 651c6b9044..8c86128c01 100644 --- a/docs/features/extensibility/plugin/tools/development.mdx +++ b/docs/features/extensibility/plugin/tools/development.mdx @@ -123,24 +123,16 @@ class Tools: Event Emitters are used to add additional information to the chat interface. Similarly to Filter Outlets, Event Emitters are capable of appending content to the chat. Unlike Filter Outlets, they are not capable of stripping information. Additionally, emitters can be activated at any stage during the Tool. -**⚠️ CRITICAL: Function Calling Mode Compatibility** - -Event Emitter behavior is **significantly different** depending on your function calling mode. The function calling mode is controlled by the `function_calling` parameter: - -- **Default Mode**: Uses traditional function calling approach with wider model compatibility -- **Native Mode (Agentic Mode)**: Leverages model's built-in tool-calling capabilities for reduced latency and autonomous behavior - -Before using event emitters, you must understand these critical limitations: +:::danger Author tools for Native Mode +Default Mode is **legacy and no longer supported** — see the [Tool Calling Modes guide](/features/extensibility/plugin/tools#tool-calling-modes-default-vs-native) for the full policy. Write your tools to work correctly under **Native (Agentic) Mode**, which is the only supported mode going forward. The event-emitter compatibility matrix below still documents Default Mode behavior for historical reference and for maintainers of pre-existing tools that haven't been migrated yet — but new tools should not depend on Default-Mode-only event types. If your tool's UX fundamentally requires an event type that only works in Default Mode (`message`, `chat:message:delta`, `chat:message`, `replace` mid-stream), redesign the UX around Native-compatible events (`status`, `notification`, `citation`, `chat:message:files`, `confirmation`, `chat:message:follow_ups`) rather than requiring users to switch their model to legacy mode. 
+::: -- **Default Mode** (`function_calling = "default"`): Full event emitter support with all event types working as expected -- **Native Mode (Agentic Mode)** (`function_calling = "native"`): **Limited event emitter support** - many event types don't work properly due to native function calling bypassing Open WebUI's custom tool processing pipeline +Event Emitter behavior differs between the two function calling modes. The function calling mode is controlled by the `function_calling` parameter: -**When to Use Each Mode:** -For a comprehensive guide on choosing a function calling mode, including model requirements and administrator setup, refer to the [**Central Tool Calling Guide**](/features/extensibility/plugin/tools#tool-calling-modes-default-vs-native). +- **Native Mode (Agentic Mode)** (`function_calling = "native"`) — the only supported mode. Uses the model's structured tool-call API. **Limited event-emitter surface** — see the matrix below for exactly which event types are supported. +- **Default Mode** (`function_calling = "default"`) — legacy, prompt-injection-based. Full event-emitter surface, but the mode itself is unsupported and should not be selected for new deployments. -In general: -- **Use Default Mode** when you need full event emitter functionality, complex tool interactions, or real-time UI updates. -- **Use Native Mode (Agentic Mode)** when you have a quality model and need reduced latency, autonomous tool selection, and system-level tools (Agentic Research, Knowledge Base exploration, Memory) without complex custom emitter requirements. +For the full mode policy, model requirements, and configuration, see the [**Tool Calling Modes guide**](/features/extensibility/plugin/tools#tool-calling-modes-default-vs-native). In short: **use Native Mode**; the matrix below tells you which event types work there. #### Function Calling Mode Configuration @@ -1185,58 +1177,43 @@ await __event_emitter__({ #### Comprehensive Function Calling Mode Guide -Choosing the right function calling mode is crucial for your tool's functionality. This guide helps you make an informed decision based on your specific requirements. - -**Mode Comparison Overview:** - -| Aspect | Default Mode | Native Mode | -|--------|-------------|-------------| -| **Latency** | Higher - processes through Open WebUI pipeline | Lower - direct model handling | -| **Event Support** | ✅ Full - all event types work perfectly | ⚠️ Limited - many event types broken | -| **Complexity** | Handles complex tool interactions well | Best for simple tool calls | -| **Compatibility** | Works with all models | Requires models with native tool calling | -| **Streaming** | Perfect for real-time updates | Poor - content gets overwritten | -| **Citations** | ✅ Full support | ✅ Full support | -| **Status Updates** | ✅ Full support | ✅ Full support | -| **Message Events** | ✅ Full support | ❌ Broken - content disappears | - -**Decision Framework:** +:::danger Write tools for Native Mode — Default Mode is unsupported +Default Mode is **legacy and no longer supported**. New tools should be written to work under Native Mode. The mode-comparison details below are retained so that maintainers of legacy Default-Mode tools understand the gap they need to close when migrating — they are **not** a recommendation to choose Default Mode for new work. -1. 
**Do you need real-time content streaming, live updates, or dynamic message modification?** - - **Yes** → Use **Default Mode** (Native mode will break these features) - - **No** → Either mode works +If your tool design depends on a Default-Mode-only event type, redesign the UX around the Native-compatible surface (`status`, `notification`, `citation`, `chat:message:files`, `confirmation`, `chat:message:follow_ups`, plus returning a final content block from the tool method). Requiring users to switch their model to legacy mode to run your tool is not acceptable going forward. +::: -2. **Is your tool primarily for simple data retrieval or computation?** - - **Yes** → **Native Mode** is fine (lower latency) - - **No** → Consider **Default Mode** for complex interactions +**Mode Comparison Overview** *(for migration reference only — Native is the only supported target)*: -3. **Do you need maximum performance and minimal latency?** - - **Yes** → **Native Mode** (if compatible with your features) - - **No** → **Default Mode** provides more features +| Aspect | Default Mode (Legacy / Unsupported) | Native Mode (Supported) | +|--------|-------------|-------------| +| **Status** | ❌ Legacy, no longer supported | ✅ Required for new tools | +| **Latency** | Higher — full prompt-injection pipeline | Lower — direct tool-call API | +| **KV Cache** | ❌ Broken every turn | ✅ Preserved | +| **Event Support** | Full event-type surface | Limited surface (see matrix above) | +| **Streaming** | Mid-stream content rewrites work | Mid-stream rewrites are overwritten by completion snapshots | +| **Status / Notifications / Citations** | ✅ | ✅ | +| **Message content events (`message`, `chat:message`, `replace`)** | ✅ | ❌ Overwritten | +| **Built-in system tools (Memory, Notes, Knowledge, Web Search, Image Gen, Code Interpreter)** | ❌ | ✅ | + +**How to design for Native Mode:** + +- Deliver your tool's primary output as the **return value** of the tool method. That content is emitted by the model as part of its response and is not subject to the event-overwrite issue. +- Use `status` and `notification` for progress and user-facing alerts — they are fully supported in Native Mode. +- Use `citation` / `source` events for references — fully supported in Native Mode. +- Use `confirmation` / `input` via `__event_call__` for interactive flows — fully supported in Native Mode. +- Attach files via the `chat:message:files` event — fully supported in Native Mode. +- Avoid depending on `message`, `chat:message:delta`, `chat:message`, or `replace` mid-stream — these are the four events that break under Native Mode. If you need progressive output, emit a series of `status` updates and return the final content from the tool. -4. **Are you building interactive experiences, dashboards, or multi-step workflows?** - - **Yes** → **Default Mode** required - - **No** → Either mode works +
+Migrating a Default-Mode tool to Native Mode -**Recommended Usage Patterns:** +Common rewrites when moving a Default-Mode tool to Native: -
-🏆 Best Practices for Mode Selection - -**Choose Default Mode For:** -- Tools with progressive content updates -- Interactive dashboards or live data displays -- Multi-step workflows with visual feedback -- Complex tool chains with intermediate results -- Educational tools that show step-by-step processes -- Any tool that needs `message`, `replace`, or `chat:message` events - -**Choose Native Mode For:** -- Simple API calls or database queries -- Basic calculations or data transformations -- Tools that only need status updates and citations -- Performance-critical applications where latency matters -- Simple retrieval tools without complex UI requirements +- Replace mid-stream `message` events that append content with a single final return value containing the assembled content. +- Replace `replace`-based progress bars with a sequence of `status` events (`"description": "Step 3 of 5 — fetching..."`). +- Replace `chat:message`-based dashboards with a return value containing the rendered dashboard (Markdown / HTML is fine). +- Keep `citation`, `notification`, `confirmation`, and file attachments as-is — they work identically in both modes. **Universal Compatibility Pattern:** ```python @@ -1315,8 +1292,8 @@ async def mode_adaptive_tool( **Common Issues and Solutions:** **Issue: Content appears then disappears** -- **Cause**: Using message events in Native mode -- **Solution**: Switch to Default mode or use status events instead +- **Cause**: Using `message` / `chat:message` / `replace` events in Native Mode — completion snapshots overwrite them. +- **Solution**: Return the content from the tool method and use `status` events for progress instead. Do **not** switch to Default Mode — Default Mode is legacy and no longer supported. **Issue: Tool seems unresponsive** - **Cause**: Function calling not enabled for model diff --git a/docs/features/extensibility/plugin/tools/index.mdx b/docs/features/extensibility/plugin/tools/index.mdx index ac1b97db6f..44bbbdaa8d 100644 --- a/docs/features/extensibility/plugin/tools/index.mdx +++ b/docs/features/extensibility/plugin/tools/index.mdx @@ -85,7 +85,7 @@ While chatting, click the **➕ (plus)** icon in the input area. You’ll see a 4. ✅ Check the Tools you want this model to always have access to by default. 5. Click **Save**. -You can also let your LLM auto-select the right Tools using the [**AutoTool Filter**](https://openwebui.com/f/hub/autotool_filter/). +For models that support it, **Native tool calling mode** (see [Tool Calling Modes](#tool-calling-modes-default-vs-native) below) lets the model itself decide which of the attached tools to call on each turn. This replaces the older prompt-injection "auto-tool" filter approach and is the recommended way to let the model auto-select tools. :::caution Attached Tools Still Require User Access Attaching a workspace tool to a model does **not** bypass access control. When a user chats with the model, Open WebUI checks whether **that specific user** has read access to each attached tool. Tools the user cannot access are silently skipped — the model won't be able to call them. @@ -99,23 +99,30 @@ Attaching a workspace tool to a model does **not** bypass access control. When a ## Tool Calling Modes: Default vs. Native -Open WebUI offers two distinct ways for models to interact with tools: a standard **Default Mode** and a high-performance **Native Mode (Agentic Mode)**. Choosing the right mode depends on your model's capabilities and your performance requirements. 
+:::danger Default Mode Is Legacy and No Longer Supported — Use Native Mode +**All models should be configured for Native (Agentic) Mode.** Default Mode is a legacy prompt-injection fallback that predates widespread native function-calling support in LLMs. It is **no longer supported**: it will not receive feature work, bug fixes, or built-in system tools, and it is incompatible with modern Open WebUI features (Agentic Research, Interleaved Thinking, built-in Memory/Notes/Knowledge/Channels tools, web-search/image-gen/code-interpreter tool injection). -### 🟡 Default Mode (Prompt-based) — Legacy +Default Mode still exists in the dropdown only so that instances with older tool code paths keep running while admins migrate. If your deployment is still on Default Mode, switch every model to Native Mode (see [How to Enable Native Mode](#how-to-enable-native-mode-agentic-mode) below). If a specific model has trouble with Native Mode, the correct fix is to pick a better model for tool calling — not to fall back to Default. +::: + +Open WebUI exposes two modes in **Model Settings → Advanced Params → Function Calling**: **Native Mode (Agentic Mode)** — the only supported mode — and **Default Mode**, kept in the UI for backwards compatibility only. + +### 🔴 Default Mode (Prompt-based) — Legacy / Unsupported -:::warning Legacy Mode -Default Mode is maintained purely for **backwards compatibility** with older or smaller models that lack native function-calling support. It is considered **legacy** and should not be used when your model supports native tool calling. New deployments should use **Native Mode** exclusively. +:::warning Unsupported +Default Mode is **legacy and no longer supported**. It is documented here for reference only. New deployments must use Native Mode; existing deployments should migrate. Bug reports, feature requests, and support questions about Default Mode behavior will not be actioned. ::: -In Default Mode, Open WebUI manages tool selection by injecting a specific prompt template that guides the model to output a tool request. -- **Compatibility**: Works with **practically any model**, including older or smaller local models that lack native function-calling support. -- **Flexibility**: Highly customizable via prompt templates. -- **Caveats**: - - Can be slower (requires extra tokens) and less reliable for complex, multi-step tool chaining. - - **Breaks KV cache**: The injected prompt changes every turn, preventing LLM engines from reusing cached key-value pairs. This increases latency and cost for every message in the conversation. - - Does not support built-in system tools (memory, notes, channels, etc.). +In Default Mode, Open WebUI manages tool selection by injecting a long prompt template that guides the model to output a tool request in a bespoke format. It was a reasonable approach in 2023; it has been obsolete since mainstream providers and open-weights models gained proper function-calling APIs. -### 🟢 Native Mode (Agentic Mode / System Function Calling) — Recommended +Why it is legacy: +- **Breaks KV cache.** The injected prompt changes every turn, preventing LLM engines from reusing cached key-value pairs. Every message pays the full prefill cost again. +- **Higher latency and token cost.** Bulky tool-description prompts on every turn. +- **Unreliable for multi-step chaining.** Parsing natural-language tool requests is fragile compared to structured tool calls. 
+- **Cannot access built-in system tools.** Memory, Notes, Knowledge, Channels, Agentic Research, Interleaved Thinking, and the tool-injected Web Search / Image Generation / Code Interpreter features are **Native-only**. +- **Does not support modern capabilities.** Every new feature shipped since 2024 targets Native Mode. + +### 🟢 Native Mode (Agentic Mode / System Function Calling) — The Only Supported Mode Native Mode (also called **Agentic Mode**) leverages the model's built-in capability to handle tool definitions and return structured tool calls (JSON). This is the **recommended mode** for all models that support it — which includes the vast majority of modern models (2024+). :::warning Model Quality Matters @@ -133,13 +140,22 @@ Native Mode (also called **Agentic Mode**) leverages the model's built-in capabi #### How to Enable Native Mode (Agentic Mode) Native Mode can be enabled at two levels: -1. **Global/Administrator Level (Recommended)**: - * Navigate to **Admin Panel > Settings > Models**. - * Scroll to **Model Specific Settings** for your target model. - * Under **Advanced Parameters**, find the **Function Calling** dropdown and select `Native`. -2. **Per-Chat Basis**: +1. **Universal Default for Every Model (Fastest — Recommended)**: + * Navigate to **Admin Panel → Settings → Models**. + * Click the **gear icon** (⚙️) at the **top right** of the models list — this opens **global model parameters**, which apply to *every* model in your instance (current and future) unless a specific model overrides them. + * Under **Advanced Parameters**, set **Function Calling** to `Native`. + * Save. All existing models that haven't explicitly set their own value, and all models you add later, inherit `Native`. You do **not** need to edit them one by one. +2. **Per-Model Override**: + * In **Admin Panel → Settings → Models**, pick a specific model and click its edit button. + * Under **Advanced Parameters**, set **Function Calling** to `Native`. This value overrides the global default for that model only. + * Use this when a specific model needs different parameters — otherwise prefer the global setting. +3. **Per-Chat Override**: * Inside a chat, click the ⚙️ **Chat Controls** icon. - * Go to **Advanced Params** and set **Function Calling** to `Native`. + * Under **Advanced Params**, set **Function Calling** to `Native`. Applies to that chat only. + +:::tip Set Function Calling Globally — Once, For All Models +Tired of switching every model to Native one at a time? The **global model parameters** menu (the gear icon at the top right of **Admin Panel → Settings → Models**) lets you configure any advanced parameter — `function_calling`, temperature, top_p, max_tokens, etc. — **once, for every model in your Open WebUI instance**. Values set there become the default for every existing model that hasn't overridden them *and* every model you add later. Set `Function Calling = Native` there, save, done. +::: ![Chat Controls](/images/features/plugin/tools/chat-controls.png) @@ -157,8 +173,8 @@ For reliable agentic tool calling, use high-tier frontier models: These models excel at multi-step reasoning, proper JSON formatting, and autonomous tool selection. ::: -- **Large Local Models**: Some large local models (e.g., Qwen 3 32B, Llama 3.3 70B) can work with Native Mode, but results vary significantly by model quality. -- **Small Local Models Warning**: **Small local models** (under 30B parameters) often struggle with Native Mode. 
They may produce malformed JSON, fail to follow strict state management, or make poor tool selection decisions. For these models, **Default Mode** is usually more reliable. +- **Large Local Models**: Large local models (e.g., Qwen 3 32B, Llama 3.3 70B, DeepSeek V3/R1) work well with Native Mode; results scale with model quality. +- **Small Local Models**: Small local models (under ~30B parameters) often produce malformed JSON or fail multi-step tool chains even in Native Mode. **The fix is to use a stronger model for tool-calling workloads**, not to fall back to Default Mode — Default is legacy and unsupported. If your hardware forces you to use a small model, accept that tool calling will be unreliable at this tier, or offload only tool-using conversations to a cloud model. #### Known Model-Specific Issues @@ -178,23 +194,24 @@ These models excel at multi-step reasoning, proper JSON formatting, and autonomo - Complex multi-step workflows (15-30 tool calls) may cause "schema drift" where argument formats degrade **Workarounds**: -- **Use Default Mode** (prompt-based) instead of Native Mode for DeepSeek — this is the recommended approach -- Lower temperature when using tool calling -- Limit multi-round tool calling sessions -- Consider alternative models for agentic workflows +- **Use a different model for agentic workloads.** Claude 4.5 Sonnet, GPT-5, Gemini 3 Flash, and MiniMax M2.5 are all reliable in Native Mode and are the recommended choice when DeepSeek V3.2 misbehaves. +- Lower temperature when using tool calling with DeepSeek V3.2. +- Limit multi-round tool-calling sessions. -**This is a DeepSeek model/API issue**, not an Open WebUI issue. Open WebUI correctly sends tools in standard OpenAI format — the malformed output originates from DeepSeek's non-standard internal format. +Default Mode is **not** a supported workaround even for DeepSeek — it is legacy and will not be extended to cover this case. This is a DeepSeek model/API issue, not an Open WebUI issue. Open WebUI correctly sends tools in standard OpenAI format; the malformed output originates from DeepSeek's non-standard internal DSML format. 
::: -| Feature | Default Mode (Legacy) | Native Mode (Recommended) | +| Feature | Default Mode (Legacy / Unsupported) | Native Mode (The Only Supported Mode) | |:---|:---|:---| -| **Status** | Legacy / backwards compat | ✅ Recommended | +| **Status** | ❌ Legacy, no longer supported | ✅ Required — all models should use this | | **Latency** | Medium/High | Low | -| **KV Cache** | ❌ Can break cache | ✅ Cache-friendly | -| **Model Compatibility** | Universal | Requires Tool-Calling Support | -| **Logic** | Prompt-based (Open WebUI) | Model-native (API/Ollama) | -| **System Tools** | ❌ Not available | ✅ Full access | -| **Complex Chaining** | ⚠️ Limited | ✅ Excellent | +| **KV Cache** | ❌ Breaks cache on every turn | ✅ Cache-friendly | +| **Model Compatibility** | Any text model (obsolete concern) | Every mainstream model since 2024 | +| **Logic** | Prompt-injection parsed by Open WebUI | Structured tool calls via provider API | +| **System Tools** | ❌ Not available | ✅ Full access (Memory, Notes, Knowledge, Channels, Web Search, Image Gen, Code Interpreter) | +| **Agentic Research / Interleaved Thinking** | ❌ Unsupported | ✅ Supported | +| **Complex Chaining** | ⚠️ Unreliable | ✅ Excellent | +| **Future development** | ❌ None | ✅ All new features target this mode | ### Built-in System Tools (Native/Agentic Mode) diff --git a/docs/getting-started/essentials.mdx b/docs/getting-started/essentials.mdx new file mode 100644 index 0000000000..0d6a755199 --- /dev/null +++ b/docs/getting-started/essentials.mdx @@ -0,0 +1,209 @@ +--- +sidebar_position: 5 +title: "Essentials for New Users" +--- + +# Essentials for New Users + +So you've installed Open WebUI, connected a provider, and had your first conversation. Now what? + +This page walks through the handful of things that make the difference between *"a chat UI"* and *"a setup that actually works well day-to-day."* Nothing here is required — you can ignore it and keep chatting — but if you've ever wondered "can Open WebUI do X?" the answer is almost certainly **yes, with one of these pieces**. + +Work through in order, or jump to the part that matches your question: + +1. [**Plugins** — what they are and when to install one](#-plugins-the-extensibility-story) +2. [**Task models** — the invisible model behind the UI, and the hidden costs of the default](#-task-models-the-invisible-model-behind-the-ui) +3. [**Context management** — why long chats eventually error out](#-context-management-why-long-chats-break) +4. [**Tool calling** — letting the model do things, not just talk](#-tool-calling-letting-the-model-do-things) +5. [**Basic RAG** — chatting with your own documents](#-basic-rag-chatting-with-your-own-documents) +6. [**Open Terminal** — giving the model a real computer](#-open-terminal-giving-the-model-a-real-computer) + +--- + +## 🧩 Plugins: the extensibility story + +Open WebUI is intentionally small at the core. Most of the "wow" things people show in demos — auto-translation, token/cost tracking, image generation buttons, custom post-processing, provider integrations that aren't OpenAI or Ollama — are **plugins**, not built-in features. Knowing the plugin landscape is the biggest single unlock for a new user. 
+ +There are two plugin families, and the name of the family tells you what it does: + +| Family | What it does | Where it runs | Examples | +| :--- | :--- | :--- | :--- | +| **Tools** | Give the model new abilities ("call this function when you need X") | In Open WebUI's process, invoked by the model during a chat | Langfuse / OpenLit observability, Home Assistant control, arXiv / PubMed lookups, Wolfram Alpha, Jira / Linear ticket creation, SQL queries against your own DB | +| **Functions — Pipes** | Add a new "model" to the model picker, backed by custom code | Same | Model-routing pipes (cheap vs. expensive based on prompt), multi-step agent loops, proprietary corporate-LLM backends without an OpenAI-compatible endpoint | +| **Functions — Filters** | Modify every request and/or response as it passes through | Same, on every chat turn | Context trimming, PII scrubbing, token / cost counting, Langfuse tracing, response reformatting | +| **Functions — Actions** | Add a button under each message that runs custom code | Same, when the user clicks | "Regenerate follow-ups", "Translate reply", "Pin message", "Save to Knowledge" | + +**How you install them:** the [**Open WebUI Community site**](https://openwebui.com/) hosts the one-click catalog — pick one, click "Get", paste into **Admin Panel → Functions** (or **Tools**), flip it on, and configure its **valves** (the plugin's settings). + +**When to install one:** when you think "it would be nice if Open WebUI did X." It almost certainly already does — via a plugin. **Always browse the community site first** — there are thousands of Tools, Filters, Actions, and Pipes already written, and the one you need is usually already there. Even if nothing matches exactly, the closest hit is usually ~20 lines off from what you want and you can fork it from the admin panel. + +Reference reading: +- [Plugin overview](/features/extensibility) +- [Tools reference](/features/extensibility/plugin/tools) +- [Functions reference (Pipes/Filters/Actions)](/features/extensibility/plugin/functions) + +--- + +## 🤖 Task models: the invisible model behind the UI + +This is the highest-leverage change you can make right after install, because the default is silently costing you money, latency, and patience. + +Every time Open WebUI needs a short bit of "thinking" for a UI feature — writing a chat **title** for the sidebar, generating **tags**, suggesting **follow-up questions**, powering the **autocomplete** in the prompt box — it calls a **Task Model**. By default that task model is whatever main model you're currently chatting with, which means: + +- Your expensive flagship model gets hit every time you open a new chat just to write "Groceries list." +- On a slow local model, every keystroke feels laggy because autocomplete is blocking on a 30B-parameter model. +- A reasoning model (o1, r1, Claude with extended thinking) spends 5 seconds *thinking* before producing the three-word title. + +These costs are easy to miss because they happen in the background. Fix them first. + +**Fix:** in **Admin Panel → Settings → Interface**, set a dedicated Task Model. Two fields, because the right choice depends on what your main chat model is: + +- **Task Model (External)** — used when you are chatting with a cloud model (OpenAI, Anthropic, etc.). Set this to a fast, cheap, *non-reasoning* cloud model like `gpt-5-nano`, `gemini-2.5-flash-lite`, or `llama-3.1-8b-instant`. +- **Task Model (Local)** — used when you are chatting with a local model (Ollama, llama.cpp, vLLM). 
Set this to a tiny local model like `qwen3:1b`, `gemma3:1b`, or `llama3.2:3b`. + +The main chat experience doesn't change. The background chores just stop dragging. + +While you're in the Interface settings, if you are on a low-spec machine or simply don't want some of these features, you can also disable the chores entirely. Each one has both an **admin toggle** in the same page and an **environment variable** — use whichever fits your workflow: + +| Chore | Admin toggle (Settings → Interface) | Env var | +| :--- | :--- | :--- | +| Autocomplete *(the big one — fires on every keystroke)* | **Autocomplete Generation** | `ENABLE_AUTOCOMPLETE_GENERATION=False` | +| Follow-up suggestions | **Follow-up Generation** | `ENABLE_FOLLOW_UP_GENERATION=False` | +| Chat title generation | **Title Generation** | `ENABLE_TITLE_GENERATION=False` | +| Tag generation | **Tags Generation** | `ENABLE_TAGS_GENERATION=False` | + +Autocomplete is the single biggest "make it snappy" toggle on weak hardware — it fires on every keystroke, so a slow task model turns the whole prompt box into molasses. + +More detail: [Performance & RAM → Dedicated Task Models](/troubleshooting/performance#1-dedicated-task-models). + +--- + +## 🧠 Context management: why long chats break + +After enough back-and-forth you will eventually see: + +> `The prompt is too long: 207601, model maximum context length: 202751` + +This error does **not** come from Open WebUI — it comes from your model provider. Every time you send a new message, the *entire* conversation (system prompt + all previous turns + attached files + tool call results + your new message) is sent as the "prompt." When the sum exceeds the model's context window, the provider rejects the request. + +Open WebUI intentionally does not ship a built-in trimmer, because: + +- Every model uses a **different tokenizer** (GPT ≠ Claude ≠ Gemini ≠ GLM ≠ Llama). +- Every model has a **different context window** (8k → 1M). +- Every deployment wants a **different policy** (trim by tokens, by turns, by message count, drop attachments first, summarize older messages, etc.). + +There is no single correct answer. The supported approach is to install a **filter Function** that trims the conversation on your terms. Community filters for most of the common policies already exist and can be installed with one click; if none fits, the code is short enough to copy and adapt. + +➡️ Full guide, including a minimal "newest N turns" filter you can paste into your instance: [Troubleshooting → Context Window / Prompt Too Long](/troubleshooting/context-window). + +--- + +## 🔧 Tool calling: letting the model do things + +Tool calling is what turns an LLM from "a very smart text box" into "an assistant that can actually go look things up, run code, and take actions." You attach a **Tool** to your chat (or your model), and the model decides — mid-response — when to call it, with what arguments. Open WebUI runs it, returns the result, and the model continues. + +Two things every new user should configure: + +### 1. Turn on Native tool calling + +Open WebUI has two tool-calling modes in the UI: **Native** and **Default**. **Default is legacy and no longer supported** — it is kept in the dropdown only so existing deployments keep running during migration. **All models should be configured to use Native Mode.** Native is faster, preserves KV cache, supports built-in system tools (Memory, Notes, Knowledge, Web Search, Image Gen, Code Interpreter), and is the only mode that receives feature work going forward. 
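+
+If you have never seen a "structured tool call," the sketch below shows roughly what the exchange looks like under the hood, using the OpenAI-style schema that Native Mode relies on. It is written as Python literals, and every value (the id, the tool name, the arguments, the result) is invented for illustration:
+
+```python
+# The model's turn: instead of answering, it requests a tool with typed JSON arguments.
+assistant_turn = {
+    "role": "assistant",
+    "tool_calls": [
+        {
+            "id": "call_abc123",  # illustrative id
+            "type": "function",
+            "function": {
+                "name": "get_current_weather",      # illustrative tool name
+                "arguments": '{"city": "Berlin"}',  # arguments arrive as a JSON string
+            },
+        }
+    ],
+}
+
+# Open WebUI runs the matching tool and sends the result back as a "tool" message,
+# after which the model continues its reply with the result in hand.
+tool_turn = {
+    "role": "tool",
+    "tool_call_id": "call_abc123",
+    "content": "12 °C, light rain",
+}
+```
+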
+ +Every mainstream model supports it — OpenAI, Anthropic, Gemini, Llama 3.1+, Qwen 2.5+, DeepSeek, GLM, and essentially any other current model. Turn it on: + +- **Best — once, for every model:** in **Admin Panel → Settings → Models**, click the ⚙️ gear at the **top right** of the models list. That opens **global model parameters** — set **Function Calling = Native** there, save, and every current *and future* model in your instance inherits it. No per-model click-through required. +- Per-model override: **Admin Panel → Settings → Models → [your model] → Advanced Params → Function Calling = Native** +- Per-chat override: in a chat's **Chat Controls** (right sidebar) + +If a tool "isn't being called" on a capable model, 90% of the time Native Mode just needs flipping on. If a specific small model struggles with Native Mode, the fix is to use a stronger model for tool-using conversations — not to fall back to Default Mode. + +### 2. Install a few Tools + +A lot of what people reach for — **web search**, **code execution**, **image generation**, **memory**, **knowledge-base retrieval** — is already built in and doesn't need a community plugin. Turn those on in **Admin Panel → Settings** (Web Search, Code Interpreter, Images, etc.) and attach them to your models. You'll get them injected automatically as built-in system tools in Native Mode. + +Tools from the [community site](https://openwebui.com/) are for everything *not* built in. Good examples to explore: + +- **Observability / cost tracking** — Langfuse, OpenLit, Portkey tools that log every chat turn, token usage, and latency to your own observability stack. Essential once more than a handful of people use your instance. +- **Smart-home / automation integrations** — Home Assistant tools that let the model actually control devices, routines, and scenes from a conversation. +- **Research lookups** — arXiv, PubMed, Semantic Scholar, Wolfram Alpha — the model gets structured results it couldn't recall from training data, with real citations. +- **Issue / ticket / messaging integrations** — create Jira / Linear / GitHub issues, post to Slack or Discord, send an email — the model stops being a read-only assistant. +- **Database / API tools** — expose a read-only SQL query tool against your own database, or a tool that hits your internal API — the model starts answering questions grounded in your real data. +- **Domain tools** — weather, stocks, time, crypto prices, shipping-tracking, recipe APIs, whatever matches your work. + +Tools show up in the `+` menu in the chat input. Enable the ones you want for a given chat; the model only sees the tools you've enabled. + +:::tip Seriously — browse the community site +The list above is a *tiny* sample of what's out there. The [**Open WebUI Community**](https://openwebui.com/) has **thousands** of community-built Tools, Filters, Actions, and Pipes covering use cases nobody on the core team would have thought of. Before you write anything yourself, **browse the community site** — sort by popularity, filter by category, and skim a few pages. You'll almost always find something that does exactly what you need, or is two lines off. One-click install, configure the valves, done. + +This is the single biggest reason to treat Open WebUI as a platform rather than an app. The community is the feature set. 
+::: + +More detail: +- [Tools reference](/features/extensibility/plugin/tools) +- [Tool-calling modes (Default vs Native)](/features/extensibility/plugin/tools#tool-calling-modes-default-vs-native) +- [Open WebUI Community — Tools, Functions, Models](https://openwebui.com/) + +--- + +## 📚 Basic RAG: chatting with your own documents + +**RAG** (Retrieval-Augmented Generation) is the feature that lets you say "Here's a 400-page PDF, answer my questions about it" without the model having to (or being able to) read the whole thing every turn. Open WebUI splits your documents into chunks, embeds them as vectors, stores them in a vector DB, and at chat time retrieves just the relevant bits and passes those to the model. + +Two ways to use it, in order of simplicity: + +1. **One-off attachments.** Drag a file into any chat input and ask questions. The file is chunked and embedded just for that chat. +2. **Knowledge bases.** For documents you want to reuse across many chats (company handbook, codebase, research library, user manual), go to **Workspace → Knowledge** and create a knowledge base. You can then attach the *entire* knowledge base to a chat (via the `#` shortcut in the input), or bind it to a model in **Workspace → Models** so that model always has it available. + +The defaults are reasonable for getting started. When you outgrow them, there are three knobs that matter most: + +- **Embedding engine.** The default (SentenceTransformers `all-MiniLM-L6-v2`) runs locally on CPU and consumes ~500 MB of RAM per worker. For any multi-user deployment, point at an external embeddings API (OpenAI, or Ollama with `nomic-embed-text`) via `RAG_EMBEDDING_ENGINE`. +- **Content extraction engine.** The default uses `pypdf`, which leaks memory during heavy ingestion. For anything beyond casual use, switch to **Tika** or **Docling** via `CONTENT_EXTRACTION_ENGINE`. +- **Vector database.** The default ChromaDB (local SQLite-backed) does not tolerate multi-worker deployments. At scale, use **Milvus**, **Qdrant**, or **PGVector**. + +None of these matter for "a single user with a handful of PDFs." All of them start mattering the moment you have 100 documents or 10 concurrent users. + +More detail: +- [RAG overview](/features/chat-conversations/rag/) +- [Knowledge workspace](/features/workspace/knowledge) +- [Performance tuning for RAG](/troubleshooting/performance#embedding-engine) + +--- + +## ⚡ Open Terminal: giving the model a real computer + +If "run Python" is too restrictive and you want the model to actually work on your machine — clone repos, install packages, run test suites, spin up a local preview of a website, iterate on a data report against a real CSV — that's [**Open Terminal**](/features/open-terminal). It connects a real shell (sandboxed in a Docker container by default, or bare-metal if you want) as a tool the model can call the same way it calls any other tool. In-chat file browser, live web previews, and skill definitions are included. + +This is the biggest "aha" feature once you get past basic chat. It turns Open WebUI from a chat UI into a place where the model actually builds things for you. If Native Mode is on and you've given the model a capable terminal, ask it to build you a small app or run an analysis on a folder of files and watch it go. 
+ +More detail: +- [Open Terminal — give your AI a real computer](/features/open-terminal) +- [Use cases — software development, data reports, app builder, research assistant, …](/features/open-terminal/use-cases/advanced-workflows/) + +--- + +## What to do next + +You don't need all of the above on day one. A reasonable order for a new install: + +1. **Day one:** pick a good default model, have a few conversations, get a feel for the UI. +2. **First thing after that:** set a Task Model and decide which background chores you actually want enabled. This is the single biggest "feels better" change you can make, and it directly addresses hidden per-chat costs. +3. **Within the first week:** turn on Native Mode globally and install one or two Tools that match your work. +4. **When you hit it:** install a context filter the first time you see "prompt is too long." +5. **When you need it:** set up a Knowledge base the first time you want to ask questions across multiple documents. +6. **When you're ready to go big:** point the model at Open Terminal and let it actually build things for you. +7. **When you scale up:** revisit the RAG infrastructure section if you go beyond a single user. + +Everything else — enterprise SSO, multi-replica HA, Redis scaling, observability — is in [Advanced Topics](/getting-started/advanced-topics) and [Troubleshooting](/troubleshooting) when and if you need it. + +--- + +## Any questions? Think something's missing? Got stuck? + +This page is the condensed version — the real docs go much deeper. If you didn't find what you needed, try these, roughly in order: + +- 🔎 **[Search the docs](https://docs.openwebui.com/)** — use the search box at the top of any page. A lot more is in here than the Essentials overview covers. +- 💬 **[Ask on GitHub Discussions](https://github.com/open-webui/open-webui/discussions)** — best for open-ended questions, feature discussions, and "how would I do X?" threads. Searchable and visible to future users who hit the same thing. +- 🎮 **[Ask on Discord](https://discord.gg/5rJgQTnV4s)** — most active community. Try the `#questions` channel; there's also an experimental bot there with full docs + issue context that can answer most questions in a few seconds. +- 👽 **[Ask on Reddit](https://www.reddit.com/r/OpenWebUI/)** — good for broader discussion, deployment stories, and community showcases. +- 🐛 **[Report a bug](https://github.com/open-webui/open-webui/issues)** — only after you've confirmed it's a bug (reproducible, latest version, template filled in). "It doesn't work" issues get closed; "here's the exact repro, here are the logs" issues get fixed. + +Welcome aboard. 👋 diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md index 9449018a1e..4190a8a93a 100644 --- a/docs/getting-started/index.md +++ b/docs/getting-started/index.md @@ -30,6 +30,24 @@ Everything you need for a working setup. Choose Docker for the fastest path, Pyt --- +## ✨ Essentials for New Users + +**Installed and chatting — now what?** + +Five short sections that cover the things every new user eventually wishes they'd known on day one: what plugins are and how to install them, why long chats eventually error out (and how to fix it with a filter), the "invisible" Task Model that powers titles/autocomplete, getting started with RAG over your own documents, and turning on Native tool calling. 
+ +| | | +| :--- | :--- | +| 🧩 **Plugins** | Tools, Pipes, Filters, Actions — the extensibility story | +| 🧠 **Context management** | Why long chats hit a wall and how to handle it | +| 🤖 **Task models** | Keep titles, tags, and autocomplete off your main model | +| 📚 **Basic RAG** | Chatting with your own documents | +| 🔧 **Tool calling** | Native mode + first Tools to install | + +[**Read the essentials →**](/getting-started/essentials) + +--- + ## 🤖 Connect an Agent **Go beyond simple model providers. Connect an autonomous AI agent.** diff --git a/docs/getting-started/quick-start/connect-a-provider/starting-with-functions.mdx b/docs/getting-started/quick-start/connect-a-provider/starting-with-functions.mdx index cee3ad6b38..04b924c77e 100644 --- a/docs/getting-started/quick-start/connect-a-provider/starting-with-functions.mdx +++ b/docs/getting-started/quick-start/connect-a-provider/starting-with-functions.mdx @@ -7,7 +7,7 @@ title: "Functions" **Pipe Functions** are Python plugins that appear as selectable models in your chat sidebar. Behind the scenes, they can do anything Python can do: integrate a proprietary AI provider, control your smart home with natural language, query a database, run a search engine, build a multi-step agent, generate charts, automate workflows, or serve as a calculator. No LLM is even required. If you can write the logic in Python, it becomes a "model" users can chat with. -This guide walks you through importing and enabling your first Pipe Function, using the [Anthropic Pipe](https://openwebui.com/f/justinrahb/anthropic) as an example. +This guide walks you through importing and enabling your first Pipe Function, using the [Anthropic Pipe](https://openwebui.com/posts/60984ebf-cf4e-4822-98b8-0492463c2852) as an example. :::info Already have an OpenAI-compatible provider? If your provider supports the **OpenAI Chat Completions API protocol** (OpenAI, Google Gemini, Mistral, Groq, DeepSeek, and many more), you don't need a Function. Just [add a connection](/getting-started/quick-start/connect-a-provider/starting-with-openai-compatible). Anthropic also has native support via their OpenAI-compatible endpoint; see the [Anthropic guide](/getting-started/quick-start/connect-a-provider/starting-with-anthropic). Functions are for everything else: proprietary APIs, custom agents, or entirely novel interfaces. diff --git a/docs/getting-started/quick-start/index.mdx b/docs/getting-started/quick-start/index.mdx index fe0b94acba..79799e6c5c 100644 --- a/docs/getting-started/quick-start/index.mdx +++ b/docs/getting-started/quick-start/index.mdx @@ -191,6 +191,10 @@ Want more than a model? AI agents can execute terminal commands, read and write Learn more about how agents differ from providers in the [**Connect an Agent overview →**](/getting-started/quick-start/connect-an-agent) +### New to Open WebUI? + +If this is your first time with Open WebUI, read the [**Essentials for New Users**](/getting-started/essentials) guide next. It covers the five things every new user eventually needs to know — plugins, context management, task models, RAG, and tool calling — in one place. + ### Explore Features Once connected, explore what Open WebUI can do: [Features Overview →](/features) diff --git a/docs/intro.mdx b/docs/intro.mdx index 8609a27213..701568e8d0 100644 --- a/docs/intro.mdx +++ b/docs/intro.mdx @@ -67,12 +67,17 @@ The desktop app is a **work in progress** and is not yet stable. For production +:::tip Running? Read this next. 
+
+Installed Open WebUI but don't know where to start? The [**Essentials for New Users**](/getting-started/essentials) guide walks through the five things every new user eventually needs to know — plugins, context management, task models, RAG, and tool calling — plus Open Terminal, all in one page.
+:::
+
 ---

 ## Getting Started

 - [**Quick Start**](/getting-started/quick-start) — Docker, Python, Kubernetes install options
 - [**Connect a Provider**](/getting-started/quick-start/connect-a-provider) — Ollama, OpenAI, Anthropic, vLLM, and more
+- [**Essentials for New Users**](/getting-started/essentials) — Start here after your first install. Plugins, context management, task models, RAG, tool calling, Open Terminal.
 - [**Connect an Agent**](/getting-started/quick-start/connect-an-agent) — Hermes Agent, OpenClaw, and other autonomous AI agents
 - [**Updating**](/getting-started/updating) — Keep your instance current
 - [**Development Branch**](/getting-started/quick-start) — Help test the latest changes before stable release
diff --git a/docs/reference/env-configuration.mdx b/docs/reference/env-configuration.mdx
index b321ba2278..6c95fd04f5 100644
--- a/docs/reference/env-configuration.mdx
+++ b/docs/reference/env-configuration.mdx
@@ -3642,7 +3642,7 @@ This Tenant ID (also known as Directory ID) is required for the work/school inte

 - Type: `bool`
 - Default: `True`
-- Description: Only used in Default Function Calling mode - If True: An LLM generates optimized, distilled search queries from the conversation context. If False: The user's last message is used verbatim as the web search query
+- Description: Only applies to Default Function Calling mode, which is legacy and no longer supported. If True: an LLM generates optimized, distilled search queries from the conversation context. If False: the user's last message is used verbatim as the web search query. Native Mode (the supported mode) uses the model's own `search_web` tool call and does not consult this setting.
 - Persistence: This environment variable is a `PersistentConfig` variable.

 #### `WEB_SEARCH_TRUST_ENV`
diff --git a/docs/troubleshooting/context-window.mdx b/docs/troubleshooting/context-window.mdx
new file mode 100644
index 0000000000..291cf64968
--- /dev/null
+++ b/docs/troubleshooting/context-window.mdx
@@ -0,0 +1,330 @@
+---
+sidebar_position: 17
+title: "Context Window / Prompt Too Long"
+---
+
+# "The prompt is too long" / Model Context Length Exceeded
+
+## What you're seeing
+
+Errors like:
+
+- `The prompt is too long: 207601, model maximum context length: 202751`
+- `This model's maximum context length is 128000 tokens. However, your messages resulted in …`
+- `Input is too long for the model`
+- `context length exceeded`
+
+These come from the **model provider** (OpenAI, Anthropic, Google, your Ollama server, GLM-4/5.x, etc.), **not from Open WebUI**. The provider counted the tokens of everything you sent and rejected the request because it exceeds the model's context window.
+
+## Why it happens
+
+The "prompt" a model sees is the **entire conversation** — not just the message you just typed. Every time you send a new message, Open WebUI forwards:
+
+- Your system prompt
+- The **full chat history** (every previous user/assistant turn in that conversation)
+- Any **attached files** that are inlined into context (not retrieved via RAG)
+- Any **tool definitions** and prior **tool call results**
+- Any **inlet-injected context** (from filters, RAG, web search, memories, etc.)
+- Your newest user message + +As a chat grows, the history grows. Large attachments or long tool-call outputs can eat the entire window in a single turn. Once the sum of all of that exceeds the model's context window, the provider rejects the request. + +## Why Open WebUI doesn't auto-truncate for you + +Open WebUI intentionally does **not** ship a built-in context trimmer. This is a design choice, not an oversight, and it is unlikely to change. Here's why: + +1. **Every model uses a different tokenizer.** The token count for the same text differs between OpenAI (tiktoken), Anthropic, Gemini, GLM, Llama-family, Mistral, Qwen, and so on. A truly correct trimmer would need a per-model tokenizer for every provider in existence. Getting that wrong ships silent data corruption. +2. **Every model has a different context window.** 8k, 32k, 128k, 200k, 1M — and that's before you factor in reserved output tokens, provider-side overhead, and multimodal content. +3. **Everyone wants a different truncation policy.** We have seen users ask for all of the following, and all of them are reasonable: + - Trim by **token count**. + - Trim by **number of messages**. + - Trim by **number of conversational turns**. + - Trim only **non-system, non-assistant** messages. + - Trim **file attachments** first, keep the dialogue. + - Trim **tool-call results** first, keep everything else. + - Set a **hard ceiling** on chat length (block further messages beyond N turns). + - Summarize older messages instead of dropping them, and replace the dropped block with the summary. + - Per-model policies (keep 1M tokens for Gemini, 128k for GPT-4, 32k for smaller local models). + +There is no single policy that is correct for every deployment, every user, and every model. A built-in implementation would be wrong for most users by definition, and would hide the much better option: give the user the hook and let them pick. + +## The supported way: use a filter Function + +Context management in Open WebUI is done with [**filter Functions**](/features/extensibility/plugin/functions/filter). `inlet()` runs on every request before the payload is sent to the model — it receives the full `body` (including `body["messages"]`) and can modify it freely. That is the hook you use. + +Typical approaches, in increasing order of sophistication: + +1. **Hard chat-length cap.** Refuse or error if `len(body["messages"]) > N`. Simple and predictable; no tokenization needed. +2. **Newest-N-turns window.** Keep the system prompt and only the most recent N user/assistant turns; drop the older ones. +3. **Token-budget window, per model.** Estimate tokens per message (e.g., with `tiktoken` for OpenAI-family or a char/4 heuristic for others) and trim from the oldest non-system message until the total fits the model's window. +4. **Summarize-and-replace.** When the window is about to overflow, call a cheap model to summarize the oldest block of messages, then replace that block with a single assistant-authored summary message. Preserves long-running context without busting the window. +5. **Attachment- or tool-output-first trimming.** Strip large file contents or tool results from old turns before touching the dialogue. + +Community filters for most of these already exist on the [Open WebUI community site](https://openwebui.com/). Install one, configure its valves, and you're done. If none fits your policy exactly, copy the closest one into the **Functions** admin page and edit it — filters are pure Python and easy to tweak. 
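+
+A note on counting, since approach #3 above hinges on it: the difference between a strict tokenizer and the rough heuristic is only a few lines of code, but an off-the-shelf strict counter really only exists for OpenAI-family models. Here is a sketch of both; treat `tiktoken` being importable in your Open WebUI environment as an assumption, and anything non-OpenAI as heuristic territory.
+
+```python
+import tiktoken  # assumed available; if not, `pip install tiktoken` (OpenAI-family models only)
+
+
+def estimate_tokens_strict(text: str, model_id: str = "gpt-4o") -> int:
+    """Exact token count for OpenAI-family models."""
+    try:
+        enc = tiktoken.encoding_for_model(model_id)
+    except KeyError:
+        enc = tiktoken.get_encoding("cl100k_base")  # reasonable fallback for unrecognized ids
+    return len(enc.encode(text))
+
+
+def estimate_tokens_rough(text: str) -> int:
+    """Provider-agnostic ~4-chars-per-token heuristic (what the token-budget example below uses)."""
+    return max(1, len(text) // 4)
+```
+
+The token-budget example below sticks with the heuristic; swapping in the strict counter is the one-line change noted after it.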
+ +### Minimal example: "newest N turns" filter + +
+Show the full filter code (keeps the last N non-system messages) + +```python +from pydantic import BaseModel, Field + + +class Filter: + class Valves(BaseModel): + priority: int = Field( + default=0, + description="Run before other filters that depend on the final message list.", + ) + max_turns: int = Field( + default=20, + description="Maximum number of non-system messages to keep (older are dropped).", + ) + + def __init__(self): + self.valves = self.Valves() + + async def inlet(self, body: dict) -> dict: + messages = body.get("messages", []) + if not messages: + return body + + system_msgs = [m for m in messages if m.get("role") == "system"] + other_msgs = [m for m in messages if m.get("role") != "system"] + + if len(other_msgs) > self.valves.max_turns: + other_msgs = other_msgs[-self.valves.max_turns :] + + # Tool-call repair: after slicing, the new leading messages + # might be orphaned tool-call results or an assistant whose + # tool_calls reference tool messages that got dropped. + # Providers (OpenAI / Anthropic / …) 400 on those — so prune + # until the window starts on something the provider accepts. + while other_msgs and other_msgs[0].get("role") == "tool": + other_msgs.pop(0) + + if ( + other_msgs + and other_msgs[0].get("role") == "assistant" + and other_msgs[0].get("tool_calls") + ): + expected = {tc.get("id") for tc in other_msgs[0]["tool_calls"]} + seen = { + m.get("tool_call_id") + for m in other_msgs[1:] + if m.get("role") == "tool" + } + if not expected.issubset(seen): + other_msgs.pop(0) + + body["messages"] = system_msgs + other_msgs + return body +``` + +
+ +Enable this filter globally or attach it to specific models in **Admin Panel → Functions**. The `max_turns` valve is configurable per-model via the model card, so you can set a smaller window for local 8k models and a larger one for Gemini 1M. + +:::info Why the tool-call repair block? +With tool calling on, an `assistant` message that invokes tools is paired with one or more `tool` messages carrying results that share the same `tool_call_id`. If `max_turns` happens to slice the conversation in the middle of that pair — keeping the orphan half — the upstream provider returns a 400 because the tool call / result structure is invalid. The repair block drops the orphans so the window always starts on a clean boundary. This matches what production community filters for context management do; the rest of the filter is the generic trimming logic. +::: + +### Slightly more involved: per-model token budget + +Counting turns is easy to reason about but wrong in practice — 40 turns of one-liners fit in 8k tokens, five turns with a 200-page PDF attachment do not. The more useful policy is "keep everything until we're about to bust the model's context window, then drop the oldest non-system messages until we fit." + +This second example does that. It: + +- Estimates tokens from character length (cheap heuristic, no dependencies; swap in `tiktoken` or a real tokenizer if you want strict counts). +- Reads per-model budgets from a valve, so a single instance of the filter works for your 8k local model and your 1M Gemini at the same time. +- Leaves a configurable headroom for the response. +- Re-applies the tool-call repair from the first example after trimming. + +
+Show the full filter code (per-model token-budget trimmer) + +```python +import json +from pydantic import BaseModel, Field + + +class Filter: + class Valves(BaseModel): + priority: int = Field( + default=0, + description="Run before other filters that depend on the final message list.", + ) + default_budget_tokens: int = Field( + default=8000, + description="Fallback input-token budget for any model not listed in model_budgets.", + ) + response_headroom_tokens: int = Field( + default=2000, + description="Tokens to reserve for the model's reply. Trimmed from the budget before fitting.", + ) + model_budgets_json: str = Field( + default=( + '{\n' + ' "gpt-4o": 120000,\n' + ' "gpt-4o-mini": 120000,\n' + ' "claude-3-5-sonnet": 180000,\n' + ' "gemini-1.5-pro": 900000,\n' + ' "llama3.1:8b": 6000\n' + '}' + ), + description="JSON mapping of model id (or prefix) to input-token budget.", + ) + + def __init__(self): + self.valves = self.Valves() + + # ---- helpers ----------------------------------------------------------- + + @staticmethod + def _estimate_tokens(content) -> int: + """~4 chars per token is close enough for a trim budget. + For strict counts, replace with tiktoken or a provider tokenizer.""" + if content is None: + return 0 + if isinstance(content, str): + return max(1, len(content) // 4) + # Some providers deliver multimodal content as a list of parts. + if isinstance(content, list): + return sum( + Filter._estimate_tokens(part.get("text", "")) if isinstance(part, dict) else 0 + for part in content + ) + return 0 + + def _message_tokens(self, msg: dict) -> int: + # Content + a small per-message overhead for role/formatting. + tokens = self._estimate_tokens(msg.get("content")) + # Tool calls carry arguments in JSON; count them too. + for tc in msg.get("tool_calls") or []: + args = tc.get("function", {}).get("arguments", "") + tokens += self._estimate_tokens(args) + return tokens + 4 + + def _budget_for(self, model_id: str) -> int: + try: + budgets = json.loads(self.valves.model_budgets_json or "{}") + except Exception: + budgets = {} + if model_id in budgets: + return int(budgets[model_id]) + # Allow prefix match — "gpt-4o-2024-11-20" uses the "gpt-4o" budget. + # Sort by key length descending so more specific prefixes win: + # "gpt-4o-mini" must match before "gpt-4o". + for key, value in sorted(budgets.items(), key=lambda kv: -len(kv[0])): + if model_id.startswith(key): + return int(value) + return self.valves.default_budget_tokens + + @staticmethod + def _repair_tool_calls(other_msgs: list[dict]) -> list[dict]: + while other_msgs and other_msgs[0].get("role") == "tool": + other_msgs.pop(0) + if ( + other_msgs + and other_msgs[0].get("role") == "assistant" + and other_msgs[0].get("tool_calls") + ): + expected = {tc.get("id") for tc in other_msgs[0]["tool_calls"]} + seen = { + m.get("tool_call_id") + for m in other_msgs[1:] + if m.get("role") == "tool" + } + if not expected.issubset(seen): + other_msgs.pop(0) + return other_msgs + + # ---- inlet ------------------------------------------------------------- + + async def inlet(self, body: dict) -> dict: + messages = body.get("messages", []) + if not messages: + return body + + model_id = body.get("model", "") or "" + budget = self._budget_for(model_id) - self.valves.response_headroom_tokens + if budget <= 0: + return body # Misconfigured — don't mangle the request, let the provider reject. 
+ + system_msgs = [m for m in messages if m.get("role") == "system"] + other_msgs = [m for m in messages if m.get("role") != "system"] + + used = sum(self._message_tokens(m) for m in system_msgs + other_msgs) + + # Drop oldest non-system messages one at a time until we're under budget + # or nothing is left to drop. System messages stay put; if they alone + # already exceed the budget, the provider will reject the request and + # that's the right signal (the admin needs to shrink the system prompt). + while used > budget and other_msgs: + dropped = other_msgs.pop(0) + used -= self._message_tokens(dropped) + + other_msgs = self._repair_tool_calls(other_msgs) + + body["messages"] = system_msgs + other_msgs + return body +``` + +
+ +A few things worth noticing: + +- **Configure once, run everywhere.** Set this filter as a **global** filter in Admin Panel → Functions. The `model_budgets_json` valve lets you enumerate every model you care about; anything else falls back to `default_budget_tokens`. Admins can tune budgets at runtime without touching code. +- **Prefix match on model id, longest-first.** `gpt-4o-2024-11-20` transparently uses the `gpt-4o` budget, and `gpt-4o-mini-2024-07-18` correctly uses the `gpt-4o-mini` budget (more specific wins). The `_budget_for` helper sorts keys by length descending before the prefix loop — otherwise dict insertion order would decide, and `"gpt-4o"` would shadow `"gpt-4o-mini"` for anyone who listed it first. +- **Multimodal content is partially counted.** The estimator walks list-of-parts content and sums text parts. **Image / audio / file parts count as zero.** For a char/4 heuristic that's fine for a trim budget, but if you rely heavily on image inputs with small providers (e.g. a local 8k vision model), add a per-image allowance inside `_estimate_tokens` (something like 255 tokens per image is a reasonable start). +- **Same tool-call repair.** Reused from the first example. This is the block that keeps the request valid after trimming. +- **Fail-open when misconfigured.** If you somehow set the headroom larger than the budget, the filter passes the request through untouched rather than wiping the conversation. The provider's error is better than a silent delete. + +:::warning Check your model ids +Open WebUI doesn't always present the raw provider id to `body["model"]`. If an admin sets a connection `prefix_id`, every model is wrapped as `{prefix}.{raw_id}` (e.g. `openai.gpt-4o`). Pipe-function manifolds wrap their sub-models as `{pipe.id}.{sub_id}` (e.g. `anthropic.claude-3-5-sonnet-20241022`). Custom Workspace models can have arbitrary ids, often UUIDs. + +**Copy the exact id shown in the model picker** into `model_budgets_json` — not the upstream provider's id. If you get the format wrong, requests silently land on `default_budget_tokens` and you won't notice until a chat that fits a real budget fails to fit the fallback. +::: + +:::warning RAG and native-tool definitions are added AFTER this filter +This filter runs in `inlet()`, which is before Open WebUI's RAG retrieval (`chat_completion_files_handler`) and before native-tool definitions are attached to the payload. Both can add non-trivial bytes to the request **after** the filter has trimmed. If you rely on Knowledge bases or if your models have heavy built-in tool specs (web search + memory + code interpreter + MCP servers + …), **reserve extra headroom** by bumping `response_headroom_tokens` — it doubles as a general "leave room for post-filter additions" budget. +::: + +If you need higher-fidelity token counting, swap `_estimate_tokens` for `tiktoken.encoding_for_model(model_id).encode(text)` (OpenAI-family) or your provider's own tokenizer. For everything else — Anthropic, Gemini, local models — the char/4 heuristic is close enough to keep you safely under the limit, *as long as you've left enough headroom for the RAG / tool additions above*. + +## You almost certainly want a community filter, not this one + +The two examples on this page are deliberately minimal — they exist to show the shape of the `inlet()` hook and to teach the one non-obvious detail (tool-call repair). For a real deployment, **don't write your own from scratch** and don't ship these as-is. 
Go browse the [Open WebUI Community](https://openwebui.com/) and pick a context-management filter someone else has already battle-tested. + +Production-grade community filters typically handle things the minimal examples above skip: + +- **Real tokenizers per provider** — `tiktoken` for OpenAI, Anthropic's tokenizer for Claude, Gemini's for Google, `transformers` tokenizers for local models. Not char/4 heuristics. +- **Proper image / audio / file token accounting** — provider-specific allowances for every content-part type, not "zero." +- **Summarize-and-replace strategies** — when the window is about to overflow, call a cheap model to summarize the oldest block and replace it with one summary message, preserving long-running context instead of silently forgetting. +- **Per-user / per-role policies** — power users get larger budgets than free users; service accounts get different defaults than humans. +- **Per-model-family policies** — more intelligent than a prefix match (e.g. recognize all Claude 3.x Sonnet variants via a regex or metadata). +- **Tool-result-first or attachment-first trimming** — drop the giant scraped web pages and RAG citations from old turns before touching dialogue. +- **Sliding-window summarization with checkpoints** — keep running summaries stored in `__metadata__` across turns so you don't re-summarize on every request. +- **Hard message caps and user-facing errors** — refuse a request with a friendly "this chat is too long, please start a new one" event-emitter message instead of silently dropping context. +- **Observability hooks** — log every trim decision to Langfuse, OpenLit, or your stack of choice so you can audit what the filter actually did. +- **Configurable valves for everything** — admins tune everything at runtime without touching code. + +None of that is hard to do, but all of it together is a week of work if you're starting from one of the minimal examples above. Someone on the community site has almost certainly already done it. **Search first.** + +:::tip Really, search first +When you're shopping for a context-management filter, look for names like *context window*, *trim*, *summarize*, *conversation length*, *token budget*, *history limiter*, and the provider name of the models you use. Sort by popularity on the community site — the top-downloaded filters tend to be the ones that already solved the edge cases you haven't hit yet. +::: + +## What users will experience + +- With a filter in place, old turns are silently removed / summarized / replaced before the request reaches the model. The user keeps chatting as normal. The model simply "forgets" older history according to your policy. +- Without a filter, long conversations will eventually hit the provider's context limit and return the "prompt is too long" error. Users will need to start a new chat. + +Both are valid UX choices. Pick the one that matches your deployment. 
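+
+If you choose the second path but want it to fail gracefully, the cap doesn't have to come from the provider. A filter can refuse the request first, with a message the user can act on. This is the "hard chat-length cap" from the approach list earlier on this page, in its smallest form, and it is a sketch rather than a drop-in: it assumes an exception raised in `inlet()` is surfaced to the user as the chat error (use an event-emitter notification instead if you want something softer), and `max_messages` is an illustrative valve, not a built-in setting.
+
+```python
+from pydantic import BaseModel, Field
+
+
+class Filter:
+    class Valves(BaseModel):
+        max_messages: int = Field(
+            default=80,
+            description="Hard ceiling on non-system messages before the chat is refused.",
+        )
+
+    def __init__(self):
+        self.valves = self.Valves()
+
+    async def inlet(self, body: dict) -> dict:
+        # Only the conversation itself counts toward the cap; system prompts don't grow over time.
+        non_system = [m for m in body.get("messages", []) if m.get("role") != "system"]
+        if len(non_system) > self.valves.max_messages:
+            # Raising aborts the request; the text below is what the user sees
+            # instead of the provider's raw "prompt is too long" error.
+            raise Exception(
+                f"This chat has reached the {self.valves.max_messages}-message limit. "
+                "Please start a new chat, or ask for a summary to carry over."
+            )
+        return body
+```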
+ +## Related + +- [Filter Functions](/features/extensibility/plugin/functions/filter) — the full reference for `inlet()` / `stream()` / `outlet()` +- [Open WebUI Community](https://openwebui.com/) — browse and install community-built filters, including context-management ones +- [Chat Parameters](/features/chat-conversations/chat-features/chat-params) — per-chat, per-user, and per-model parameter precedence diff --git a/docs/troubleshooting/index.mdx b/docs/troubleshooting/index.mdx index 1fc258c70e..470c3f6bd7 100644 --- a/docs/troubleshooting/index.mdx +++ b/docs/troubleshooting/index.mdx @@ -30,6 +30,7 @@ Use this page to find the right guide for your issue. If you're unsure where to | Image not generating, ComfyUI workflow errors | [Image Generation](./image-generation) | | Web search returns empty content or proxy errors | [Web Search](./web-search) | | Slow performance, high RAM, OOM crashes | [Performance & RAM](./performance) | +| "The prompt is too long" / context length exceeded | [Context Window / Prompt Too Long](./context-window) | | Forgot admin password | [Reset Admin Password](./password-reset) | | `no such table` or `table already exists` on startup | [Database Migration](./manual-database-migration) | @@ -39,6 +40,7 @@ Use this page to find the right guide for your issue. If you're unsure where to | :--- | :--- | | [Connection Errors](./connection-error) | HTTPS, CORS, WebSocket, reverse proxy, Ollama, SSL, Podman, MCP | | [Performance & RAM](./performance) | Speed tuning, database optimization, scaling infrastructure, resource efficiency | +| [Context Window / Prompt Too Long](./context-window) | Why "prompt is too long" errors happen and how to manage chat history with filters | | [RAG](./rag) | Document ingestion, retrieval quality, embeddings, upload limits, worker crashes | | [SSO & OAuth](./sso) | OIDC, Microsoft, Google, Authentik, cookie & session issues | | [Audio](./audio) | Speech-to-Text, Text-to-Speech, microphone access, ElevenLabs | diff --git a/docs/troubleshooting/performance.md b/docs/troubleshooting/performance.md index 74dc0b6838..62d6663dc6 100644 --- a/docs/troubleshooting/performance.md +++ b/docs/troubleshooting/performance.md @@ -426,7 +426,7 @@ If resource usage is critical, disable automated features that constantly trigge 1. **Database**: **PostgreSQL** (Mandatory). 2. **Content Extraction**: **Tika** or **Docling** (Mandatory — default pypdf leaks memory). See [Content Extraction Engine](#content-extraction-engine). 3. **Embeddings**: **External** — `RAG_EMBEDDING_ENGINE=openai` or `ollama` (Mandatory — default SentenceTransformers consumes too much RAM at scale). See [Embedding Engine](#embedding-engine). -4. **Tool Calling**: **Native Mode** (strongly recommended — Default Mode is legacy and breaks KV cache). See [Tool Calling Modes](/features/extensibility/plugin/tools#tool-calling-modes-default-vs-native). +4. **Tool Calling**: **Native Mode** (mandatory — Default Mode is legacy, no longer supported, and breaks KV cache). All models should be configured for Native Mode. See [Tool Calling Modes](/features/extensibility/plugin/tools#tool-calling-modes-default-vs-native). 5. **Workers**: `THREAD_POOL_SIZE=2000` (Prevent timeouts). 6. **Streaming**: `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE=7` (Reduce CPU/Net/DB writes). 7. **Chat Saving**: `ENABLE_REALTIME_CHAT_SAVE=False`. 
@@ -459,7 +459,7 @@ These are real-world mistakes that cause organizations to massively over-provisi | **Running SentenceTransformers at scale** | Each worker loads ~500MB embedding model → RAM usage explodes → you add more machines | Use external embeddings (`RAG_EMBEDDING_ENGINE=openai` or `ollama`) | | **Redis Cluster when single Redis suffices** | Too many replicas → too many connections → Redis can't handle them → you deploy Redis Cluster to compensate | Fix the root cause (fewer replicas, `timeout 1800`, `maxclients 10000`) | | **Scaling replicas to mask memory leaks** | Leaky processes → OOM kills → auto-scaler adds more pods → more Redis connections → Redis overwhelmed | Fix the leaks first (content extraction, embedding engine), then right-size | -| **Using Default (prompt-based) tool calling** | Injected prompts may break KV cache → higher latency → more resources needed per request | Switch to Native Mode for all capable models | +| **Using Default (prompt-based) tool calling** | Legacy / no longer supported; injected prompts break KV cache → higher latency → more resources needed per request; cannot access built-in system tools | Switch every model to Native Mode | | **Not configuring Redis stale connection timeout** | Connections accumulate forever → Redis OOM → you deploy Redis Cluster | Add `timeout 1800` to redis.conf | | **Using base64-encoded icons in Actions/Filters** | Icon data is embedded in `/api/models` responses sent to the frontend on every page load for every model. A 500 KB base64 icon on 3 actions across 20 models = **30 MB of payload bloat** per request → slow frontend loads, high bandwidth usage, unnecessary backend memory pressure | Host icons as static files and reference them by URL in `icon_url` / `self.icon`. See [Action Function icon_url warning](/features/extensibility/plugin/functions/action#example---specifying-action-frontmatter) | diff --git a/docs/troubleshooting/rag.mdx b/docs/troubleshooting/rag.mdx index fa3f23036c..baf94edadf 100644 --- a/docs/troubleshooting/rag.mdx +++ b/docs/troubleshooting/rag.mdx @@ -406,7 +406,7 @@ If you have **Native Function Calling enabled**, the model needs both the **abil | **Default Mode** | Open WebUI auto-injects RAG results from the **attached KB(s) only** | No automatic RAG — user must manually add a knowledge base to the chat via `#` | | **Native Function Calling** | Model receives tools scoped to **attached KB(s) only** — must actively call them | Model receives tools with access to **all accessible KBs** (if Builtin Tools enabled) — must actively call them | -Key takeaway: in default mode, attaching a KB enables automatic RAG scoped to those KBs. In native mode, the model must use its tools regardless — attaching a KB only restricts *which* KBs are searchable. +Key takeaway: in Native Mode (the supported mode), the model must use its knowledge tools regardless — attaching a KB only restricts *which* KBs are searchable. Default Mode's auto-injection behavior is documented here for legacy deployments only; Default Mode is no longer supported and all models should be on Native Mode. :::tip Preventing Knowledge Base Access in Native Mode If you want to prevent a model from accessing **any** knowledge base in native mode, you don't need to disable Builtin Tools entirely. Instead, disable only the **Knowledge Base** category in **Workspace > Models > Edit > Builtin Tools**. This removes all knowledge-related tools while keeping other builtin tools (web search, memory, notes, etc.) active. 
See [Granular Builtin Tool Categories](/features/extensibility/plugin/tools#granular-builtin-tool-categories-per-model) for the full list of categories. diff --git a/docs/tutorials/integrations/libre-translate.md b/docs/tutorials/integrations/libre-translate.md index 25c471a8c6..c1e8ba98cb 100644 --- a/docs/tutorials/integrations/libre-translate.md +++ b/docs/tutorials/integrations/libre-translate.md @@ -78,9 +78,9 @@ This will start the LibreTranslate service in detached mode. Once you have LibreTranslate up and running in Docker, you can configure the integration within Open WebUI. There are several community integrations available, including: -- [LibreTranslate Filter Function](https://openwebui.com/f/iamg30/libretranslate_filter) -- [LibreTranslate Action Function](https://openwebui.com/f/jthesse/libretranslate_action) -- [MultiLanguage LibreTranslate Action Function](https://openwebui.com/f/iamg30/multilanguage_libretranslate_action) +- [LibreTranslate Filter Function](https://openwebui.com/posts/4993ae7e-bd2a-41dc-9e88-9941854495cc) +- [LibreTranslate Action Function](https://openwebui.com/posts/103a14c1-174a-4445-bb9b-d48640e43b07) +- [MultiLanguage LibreTranslate Action Function](https://openwebui.com/posts/f250971e-8163-4a0b-a30c-45fdfb2ba4f8) - [LibreTranslate Filter Pipeline](https://github.com/open-webui/pipelines/blob/main/examples/filters/libretranslate_filter_pipeline.py) Choose the integration that best suits your needs and follow the instructions to configure it within Open WebUI.