feat(openai): inline sources in content (opt-in) for clients ignoring extra #345
etiquet wants to merge 1 commit
… `extra`
Open WebUI 0.9.2 (and 0.8.x) silently drops the non-standard `extra`
field that OpenRAG returns at the chat-completion top-level — verified
by inspecting `backend/open_webui/utils/middleware.py` (the
`non_streaming_chat_response_handler` reads `choices`, `output`, and
`usage` only) and the Svelte/TS frontend (no `response.extra` ever
referenced for message rendering). Same behavior in LibreChat,
Continue, Cursor, etc. — the OpenAI contract has no notion of citations.
This means our deployed RAG aliases on Mirai-OWUI (PR mirai-open-webui#118)
display the LLM answer correctly but render no sources at all, even
though OpenRAG retrieves them and embeds them in `extra.sources`. The
end-user has the conclusion without the audit trail — a hard regression
for legal/compliance use cases.
Approach: opt-in flag `INLINE_SOURCES_IN_CONTENT` that appends a
markdown source block to the assistant `content` after stripping the
LLM's `[Sources: ...]` tag. The `extra` field is unchanged so clients
that already consume it (the Mirai pipe_openrag, MyRAG bridge,
programmatic agents) keep their structured access.
Default is `false` — no breaking change for existing deployments.
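As an illustration of the opt-in behavior described above, here is a minimal sketch. It is not the code from this PR: the helper names and the set of accepted truthy strings are assumptions; only the `INLINE_SOURCES_IN_CONTENT` variable and its `false` default come from the description.

```python
import os

# Assumption: which strings count as "enabled" is not stated in the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def inline_sources_enabled_sketch() -> bool:
    """Read the opt-in flag at call time (lazily); unset means disabled."""
    raw = os.environ.get("INLINE_SOURCES_IN_CONTENT", "false")
    return raw.strip().lower() in _TRUTHY


def build_message_content(cleaned_content: str, sources_markdown: str) -> str:
    """Append the markdown block to already-cleaned content when enabled.

    `cleaned_content` is assumed to have had the LLM's sources tag stripped
    by the caller; the structured `extra` field is not touched here.
    """
    if inline_sources_enabled_sketch() and sources_markdown:
        return cleaned_content.rstrip() + "\n\n" + sources_markdown
    return cleaned_content
```

With the variable unset, `build_message_content` returns the content unchanged, which is what keeps existing deployments unaffected.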
Implementation
--------------
- `components/utils.py`:
  - `inline_sources_enabled()`: env-var read, lazy.
  - `format_sources_as_markdown(sources)`: renders a deduplicated,
    score-filtered, ranked markdown source list. Dedup on `file_url`
    (OpenRAG returns several chunks of the same file with different
    scores; we show each file once with its best chunk). Ranks by
    `relevance_score` desc and caps at `INLINE_SOURCES_TOP_K` (default 5).
  - Three knobs: `INLINE_SOURCES_IN_CONTENT`, `INLINE_SOURCES_TOP_K`,
    `INLINE_SOURCES_MIN_SCORE`.
- `routers/openai.py`: three call sites, each appends the markdown
  block to the cleaned content right after `extract_and_strip_sources_block`:
  - non-streaming `/v1/chat/completions`
  - non-streaming `/v1/completions`
  - streaming via `stream_with_source_filtering` (in `components/utils.py`),
    which now emits one extra delta chunk carrying the markdown block
    before the finish-reason chunk. Sent before finish so clients that
    buffer until finish (most do) see it as part of the content.
- Tests:
  - `TestFormatSourcesAsMarkdown`: dedup, ranking, min_score, top_k,
    enabled/disabled, empty input.
  - `TestStreamWithInlineSources`: inline block appears when enabled,
    is omitted when disabled, `extra.sources` still reaches the finish
    chunk regardless.
- `deploy/.env.example.vm`: documents the three knobs and recommends
  enabling on the Mirai deployment (OWUI 0.9.2 ignores `extra`).
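
The dedup, filter, rank, and cap steps listed for `format_sources_as_markdown` can be sketched roughly as follows. This is a hedged illustration, not the PR's implementation: the function name, the `**Sources**` header, and the exact line format are assumptions; only the `file_url`, `filename`, and `relevance_score` keys and the default cap of 5 come from the text.

```python
from typing import Any


def format_sources_sketch(
    sources: list[dict[str, Any]], top_k: int = 5, min_score: float = 0.0
) -> str:
    """Render a deduplicated, score-filtered, ranked markdown source list."""

    def score_of(s: dict[str, Any]) -> float:
        return float(s.get("relevance_score") or 0.0)

    best: dict[str, dict[str, Any]] = {}
    for s in sources:
        key = s.get("file_url") or s.get("filename") or ""
        if not key or score_of(s) < min_score:
            continue  # skip unidentifiable or low-scoring chunks
        # Keep only the best-scoring chunk for each file.
        if key not in best or score_of(s) > score_of(best[key]):
            best[key] = s

    ranked = sorted(best.values(), key=score_of, reverse=True)[:top_k]
    if not ranked:
        return ""

    lines = ["**Sources**"]  # assumed header; the real block may differ
    for i, s in enumerate(ranked, start=1):
        label = s.get("filename") or s.get("file_url")
        url = s.get("file_url") or ""
        lines.append(f"{i}. [{label}]({url})" if url else f"{i}. {label}")
    return "\n\n" + "\n".join(lines)
```

Deduplicating before ranking means the cap counts distinct files rather than chunks, so one long document cannot crowd out the others.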
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough
The pull request introduces an opt-in feature that appends inline markdown-formatted sources to OpenAI-compatible API responses. Configuration parameters control enablement and source filtering via environment variables, with supporting utility functions and integration into both streaming and non-streaming response paths.
Actionable comments posted: 2
🧹 Nitpick comments (1)
openrag/components/test_source_filtering.py (1)
247-342: Test coverage LGTM for the document path; consider adding a web-source fixture.

The new tests faithfully cover dedup, ranking, top-k, min-score, and streaming on/off behavior using document-shaped sources. Once the web-source dedup gap in `format_sources_as_markdown` is addressed, please add a fixture with `{"source_type": "web", "url": "https://...", "title": "...", "snippet": "..."}` to `TestFormatSourcesAsMarkdown.SOURCES` to lock the behavior in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openrag/components/test_source_filtering.py` around lines 247 - 342, The tests cover document-style sources but miss web-source behavior; update format_sources_as_markdown to handle web sources by treating items with source_type == "web" using the "url" field for deduplication/sorting (use url like file_url), include title and snippet in the rendered markdown, and ensure min-score, top-k, and dedup logic (currently keyed on file_url) also apply to web entries; then add a web-source fixture to TestFormatSourcesAsMarkdown.SOURCES (e.g. {"source_type":"web","url":"https://...","title":"Web A","snippet":"...","relevance_score":0.85}) so tests lock the expected behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openrag/components/utils.py`:
- Around line 285-301: The label escaping only handles pipes and therefore
titles containing '[' or ']' break markdown links; update where label is
produced/used (the _label function output and the line setting label) to escape
'[' and ']' (and keep existing '|' escape) before embedding into the link—i.e.,
replace "[" and "]" with escaped versions (and preserve the current pipe escape)
so the link text like in the f"{i}. [{label}]({url}){score_suffix}" cannot
prematurely terminate the markdown link.
- Around line 269-302: The code is dropping web sources because the dedup key
and URL resolution only check file-related keys; update the dedup key expression
(where key = s.get("file_url") or s.get("filename") or s.get("source") or "") to
also consider s.get("url") and s.get("title")/s.get("source_type") as fallbacks
so web entries (source_type == "web") aren't skipped, and update the URL
resolution (where url = s.get("file_url") or s.get("chunk_url") or "") to
include s.get("url") and s.get("chunk_url")/s.get("file_url") fallbacks for web
entries; ensure _label(s) still uses
s.get("title")/s.get("filename")/s.get("file_id") and add a unit test fixture
with a source_type: "web" entry in the TestFormatSourcesAsMarkdown tests to lock
in the behavior.
📒 Files selected for processing (4)
- `.env.example`
- `openrag/components/test_source_filtering.py`
- `openrag/components/utils.py`
- `openrag/routers/openai.py`
Two bugs fixed (see comments below): web sources were silently dropped because the dedup key and URL resolution only checked file_url/filename; and [/] in titles weren't escaped, breaking markdown links.
format_sources_as_markdown logic is otherwise clean. Dedup on best-score chunk per URL is correct. The streaming extra-delta is emitted before finish_reason so clients that buffer until finish see it as part of the content.
Minor: page 1 is excluded from the page annotation (str(page) not in {"0", "1"}). Likely intentional to avoid noise on single-page docs, but could use a comment.
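
The chunk ordering this review refers to (content deltas, then one extra delta with the source block, then the finish chunk) might look like the sketch below. It is an assumption-laden illustration: the function name and simplified chunk dictionaries are hypothetical, and placing `extra.sources` on the finish chunk follows the commit message rather than the actual code.

```python
from typing import Any, Iterable, Iterator


def stream_with_inline_sources_sketch(
    deltas: Iterable[str],
    sources: list[dict[str, Any]],
    sources_markdown: str,
    enabled: bool,
) -> Iterator[dict[str, Any]]:
    """Yield simplified chat-completion chunks in the order described above."""
    # 1. Regular content deltas from the model.
    for text in deltas:
        yield {"choices": [{"delta": {"content": text}, "finish_reason": None}]}

    # 2. One extra delta carrying the markdown block, only when opted in.
    if enabled and sources_markdown:
        yield {
            "choices": [
                {"delta": {"content": sources_markdown}, "finish_reason": None}
            ]
        }

    # 3. Finish chunk; structured sources are attached regardless of the flag.
    yield {
        "choices": [{"delta": {}, "finish_reason": "stop"}],
        "extra": {"sources": sources},
    }
```

A client that concatenates every `delta.content` until it sees a non-null `finish_reason` therefore ends up with the source block as the tail of the message.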
Fixed both CodeRabbit issues: Web sources dropped (
Diff:

    # dedup key
    key = s.get("file_url") or s.get("url") or s.get("filename") or s.get("source") or ""
    # link URL
    url = s.get("file_url") or s.get("url") or s.get("chunk_url") or ""
    # label escaping
    label = _label(s).replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]").replace("|", "\\|")
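
To see why the order of replacements in the label escaping matters, here is a small standalone sketch. The function name `escape_link_label` is hypothetical; the replacement chain itself is the one quoted in the diff.

```python
def escape_link_label(label: str) -> str:
    """Escape characters that would break a markdown link label.

    Backslashes are escaped first; doing it last would double the
    backslashes that the bracket and pipe escapes just introduced.
    """
    return (
        label.replace("\\", "\\\\")
        .replace("[", "\\[")
        .replace("]", "\\]")
        .replace("|", "\\|")
    )


def render_source_line(index: int, title: str, url: str) -> str:
    """Build one numbered markdown link line from a raw title."""
    return f"{index}. [{escape_link_label(title)}]({url})"
```

For a title such as `Report [v2] | final`, the rendered label keeps its brackets and pipe as literal text instead of closing the link early.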
Reimplemented in #347 for editing rights.
Companion PR for #344.
TL;DR
OpenRAG returns its source list in a non-standard `extra` field at the top level of `/v1/chat/completions` responses. Vanilla OpenAI-compat clients (Open WebUI 0.9.2, LibreChat, Continue, Cursor, curl, …) silently drop it, so the user sees the answer without sources. This PR adds an opt-in flag `INLINE_SOURCES_IN_CONTENT` that also writes the source list into the assistant message `content` as a markdown block, so any OpenAI-compat client renders it natively. The structured `extra` field is unchanged for clients that already consume it.

Default = `false` → no breaking change. See #344 for full context.

Changes
- `components/utils.py`: `inline_sources_enabled()` (env-var read) + `format_sources_as_markdown(sources)` helper (dedup on `file_url`, ranked by `relevance_score`, capped by `INLINE_SOURCES_TOP_K`, filtered by `INLINE_SOURCES_MIN_SCORE`). Streaming path injects one extra delta chunk before `finish_reason`.
- `routers/openai.py`: … `extract_and_strip_sources_block` / `filter_sources_by_citations`.
- `components/test_source_filtering.py`
- `.env.example`: … (`INLINE_SOURCES_IN_CONTENT`, `INLINE_SOURCES_TOP_K`, `INLINE_SOURCES_MIN_SCORE`).

Knobs
Sample output (with flag on)
The block is deduplicated by `file_url` (OpenRAG can return multiple chunks of the same file with different scores; we show each file once with its best chunk), ranked by relevance score descending, and capped at `INLINE_SOURCES_TOP_K`.

Validation
Tested in production on a deployment serving the French Ministry of the Interior:
- `/v1/chat/completions`: block appears in `choices[0].message.content`, `extra.sources` unchanged.
- `/v1/chat/completions`: extra delta chunk emitted between content stream and `finish_reason`; OWUI 0.9.2 buffers it correctly and renders the full message with the source block.
- `/v1/completions`: block appears in `choices[0].text`.
- `pipe_openrag.py` (which emits OWUI native `citation` events) keeps working in parallel for operators who want richer UX; both can coexist.
- … (`pytest components/test_source_filtering.py`).

Why opt-in (not always-on)
- … `extra` (the existing pipe, MyRAG bridge, programmatic agents).

Alternatives considered & rejected
- … `extra` (PR upstream): … `open-webui/open-webui`), multi-month timeline, doesn't help LibreChat/Continue/curl.
- `choices[0].message.annotations` (OpenAI 2024+)

A trailing markdown block is the minimum delta that works in every OpenAI-compat client today, with zero client-side change.
Test plan
- `pytest components/test_source_filtering.py`: 5 new + existing tests green
- … `OPENAI_API_BASE_URLS`): sources visible in chat
- `extra.sources` always present in response (backward compatibility for `pipe_openrag.py` and MyRAG)

🤖 Generated with Claude Code