feat(openai): inline sources in content (opt-in) for clients ignoring extra #345

Closed
etiquet wants to merge 1 commit into linagora:dev from IA-Generative:feat/inline-sources-in-content-upstream
@etiquet (Contributor) commented Apr 26, 2026

Companion PR for #344.

TL;DR

OpenRAG returns its source list in a non-standard extra field at the top of /v1/chat/completions responses. Vanilla OpenAI-compat clients (Open WebUI 0.9.2, LibreChat, Continue, Cursor, curl, …) silently drop it — the user sees the answer without sources. This PR adds an opt-in flag INLINE_SOURCES_IN_CONTENT that also writes the source list into the assistant message content as a markdown block, so any OpenAI-compat client renders it natively. The structured extra field is unchanged for clients that already consume it.

Default = false → no breaking change. See #344 for full context.

Changes

| File | What |
| --- | --- |
| components/utils.py | inline_sources_enabled() (env-var read) + format_sources_as_markdown(sources) helper (dedup on file_url, ranked by relevance_score, capped by INLINE_SOURCES_TOP_K, filtered by INLINE_SOURCES_MIN_SCORE). Streaming path injects one extra delta chunk before finish_reason. |
| routers/openai.py | 3 call sites: append the markdown block right after extract_and_strip_sources_block/filter_sources_by_citations. |
| components/test_source_filtering.py | 5 new tests: on/off behavior, dedup, ranking, top_k, min_score, streaming with the extra delta chunk, no-inline when disabled. |
| .env.example | Documents the 3 knobs (INLINE_SOURCES_IN_CONTENT, INLINE_SOURCES_TOP_K, INLINE_SOURCES_MIN_SCORE). |

Knobs

INLINE_SOURCES_IN_CONTENT=true     # default false — opt-in
INLINE_SOURCES_TOP_K=5              # default 5 — max sources rendered
INLINE_SOURCES_MIN_SCORE=0.65       # default -inf — filter weak chunks

Sample output (with flag on)

…answer from the LLM…

---
**Sources :**

1. [Article L423-1 — Conditions de séjour](https://api.openrag.example.com/static/foo.md) — score 0.71
2. [Décret 2024-… — Procédure CESEDA](https://api.openrag.example.com/static/bar.pdf) — score 0.69
…

The block is deduplicated by file_url (OpenRAG can return multiple chunks of the same file with different scores; we show each file once with its best chunk), ranked by relevance score in descending order, and capped at INLINE_SOURCES_TOP_K.
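
The dedup, rank, and cap steps can be sketched as follows. This is a simplified illustration, not the PR's implementation: the real helper takes only `sources` and reads its knobs from the environment, whereas here `top_k` and `min_score` are explicit parameters so the example stays self-contained. Using `filename` as the link label is also an assumption.

```python
import math


def format_sources_as_markdown(sources, top_k=5, min_score=-math.inf):
    """Render a deduplicated, score-filtered, ranked markdown source list."""
    best = {}  # file_url -> best-scoring chunk seen so far for that file
    for s in sources:
        key = s.get("file_url")
        score = s.get("relevance_score", 0.0)
        if not key or score < min_score:
            continue  # no usable URL, or below the score threshold
        if key not in best or score > best[key]["relevance_score"]:
            best[key] = {**s, "relevance_score": score}

    # Highest score first, then keep at most top_k entries.
    ranked = sorted(best.values(), key=lambda s: s["relevance_score"], reverse=True)
    ranked = ranked[:top_k]
    if not ranked:
        return ""

    lines = ["", "---", "**Sources :**", ""]
    for i, s in enumerate(ranked, start=1):
        label = s.get("filename") or s["file_url"]  # assumption: label field
        # \u2014 is the dash used in the sample output above.
        lines.append(f"{i}. [{label}]({s['file_url']}) \u2014 score {s['relevance_score']:.2f}")
    return "\n".join(lines)


sources = [
    {"file_url": "https://example.com/a.md", "filename": "a.md", "relevance_score": 0.60},
    {"file_url": "https://example.com/a.md", "filename": "a.md", "relevance_score": 0.71},
    {"file_url": "https://example.com/b.pdf", "filename": "b.pdf", "relevance_score": 0.69},
]
print(format_sources_as_markdown(sources, top_k=5, min_score=0.65))
```

With these inputs, a.md is listed once with its best score and ranks above b.pdf; the 0.60 chunk is dropped by the threshold.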

Validation

Tested in production on a deployment serving the French Ministry of the Interior:

  • Non-streaming /v1/chat/completions: block appears in choices[0].message.content, extra.sources unchanged.
  • Streaming /v1/chat/completions: extra delta chunk emitted between the content stream and finish_reason; OWUI 0.9.2 buffers it correctly and renders the full message with the source block.
  • Non-streaming /v1/completions: block appears in choices[0].text.
  • With flag off (default): identical to current behavior, no regression.
  • Open WebUI 0.9.2 user-side: sources visible as clickable markdown links inside the answer card. No client modification needed. The ecosystem-specific pipe_openrag.py (which emits OWUI native citation events) keeps working in parallel for operators who want richer UX; both can coexist.
  • 5 new unit tests pass (pytest components/test_source_filtering.py).
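
The streaming behavior validated above (one extra delta chunk placed just before the finish chunk) might look roughly like this. It is a sketch under stated assumptions: chunks are modeled as already-parsed dicts rather than SSE lines, and `inject_sources_delta` is a hypothetical name, not the PR's `stream_with_source_filtering`.

```python
from typing import Iterable, Iterator


def inject_sources_delta(chunks: Iterable[dict], sources_md: str) -> Iterator[dict]:
    """Pass chunks through, inserting one content delta before the finish chunk."""
    injected = False
    for chunk in chunks:
        choices = chunk.get("choices") or [{}]
        is_finish = choices[0].get("finish_reason") is not None
        if is_finish and sources_md and not injected:
            # Emit the markdown block as ordinary assistant content so that
            # clients which concatenate deltas see it as part of the message.
            yield {
                "id": chunk.get("id"),
                "object": "chat.completion.chunk",
                "model": chunk.get("model"),
                "choices": [
                    {"index": 0, "delta": {"content": sources_md}, "finish_reason": None}
                ],
            }
            injected = True
        yield chunk  # the original chunk is always forwarded unchanged
```

Sending the block before the finish chunk matters because a client that stops reading at finish_reason would otherwise never see it.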

Why opt-in (not always-on)

  • Zero breaking change for current deployments using clients that consume extra (the existing pipe, MyRAG bridge, programmatic agents).
  • The decision belongs to the operator: some workflows want structured citations only (clean for downstream parsing), others want the visible-content fallback for vanilla clients.
  • Easy to A/B compare.

Alternatives considered & rejected

| Option | Rejected because |
| --- | --- |
| Push OWUI to read extra (PR upstream) | No traction (0 related PR on open-webui/open-webui), multi-month timeline, doesn't help LibreChat/Continue/curl. |
| Use choices[0].message.annotations (OpenAI 2024+) | OWUI doesn't render those either today. |
| Force every operator to install a Pipe Function | Non-trivial UX, doesn't scale to other OpenAI-compat clients. |
| Custom SSE event types | Format undocumented + version-dependent across OWUI versions. |

A trailing markdown block is the minimum delta that works in every OpenAI-compat client today, with zero client-side change.
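
To make the "trailing block" idea concrete for the non-streaming path, here is a hedged sketch of appending the block to the assistant message while leaving `extra` untouched. `append_sources_to_response` is a hypothetical helper, and the response is modeled as a plain dict; the actual call sites in routers/openai.py may be shaped differently.

```python
import copy


def append_sources_to_response(response: dict, sources_md: str, enabled: bool) -> dict:
    """Return the response with the markdown block appended to the message content."""
    if not enabled or not sources_md:
        # Flag off (or nothing to show): hand back the response untouched.
        return response
    out = copy.deepcopy(response)  # avoid mutating the caller's object
    message = out["choices"][0]["message"]
    message["content"] = (message.get("content") or "") + "\n" + sources_md
    # The structured `extra` field is copied as-is for clients that consume it.
    return out
```

Returning the original object when the flag is off is one simple way to keep the disabled path identical to current behavior.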

Test plan

  • pytest components/test_source_filtering.py — 5 new + existing tests green
  • Production VM (Open WebUI 0.9.2 talking to OpenRAG via vanilla OPENAI_API_BASE_URLS): sources visible in chat
  • With flag off: output byte-identical to current behavior
  • extra.sources always present in response (backward compatibility for pipe_openrag.py and MyRAG)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Optional inline markdown sources now appear directly in assistant responses
    • Sources are ranked, filtered, and deduplicated for improved relevance
    • Configurable behavior to control source inclusion and filtering thresholds
  • Tests

    • Added unit tests for source formatting and filtering functionality
  • Chores

    • Extended environment configuration documentation

… `extra`

Open WebUI 0.9.2 (and 0.8.x) silently drops the non-standard `extra`
field that OpenRAG returns at the chat-completion top-level — verified
by inspecting `backend/open_webui/utils/middleware.py` (the
`non_streaming_chat_response_handler` reads `choices`, `output`, and
`usage` only) and the Svelte/TS frontend (no `response.extra` ever
referenced for message rendering). Same behavior in LibreChat,
Continue, Cursor, etc. — the OpenAI contract has no notion of citations.

This means our deployed RAG aliases on Mirai-OWUI (PR mirai-open-webui#118)
display the LLM answer correctly but render no sources at all, even
though OpenRAG retrieves them and embeds them in `extra.sources`. The
end-user has the conclusion without the audit trail — a hard regression
for legal/compliance use cases.

Approach: opt-in flag `INLINE_SOURCES_IN_CONTENT` that appends a
markdown source block to the assistant `content` after stripping the
LLM's `[Sources: ...]` tag. The `extra` field is unchanged so clients
that already consume it (the Mirai pipe_openrag, MyRAG bridge,
programmatic agents) keep their structured access.

Default is `false` — no breaking change for existing deployments.

Implementation
--------------
- `components/utils.py`:
  - `inline_sources_enabled()` — env-var read, lazy.
  - `format_sources_as_markdown(sources)` — render a deduplicated,
    score-filtered, ranked markdown source list. Dedup on file_url
    (OpenRAG returns several chunks of the same file with different
    scores; we show each file once with its best chunk). Ranks by
    relevance_score desc and caps at INLINE_SOURCES_TOP_K (default 5).
  - Three knobs: `INLINE_SOURCES_IN_CONTENT`, `INLINE_SOURCES_TOP_K`,
    `INLINE_SOURCES_MIN_SCORE`.

- `routers/openai.py` — three call sites, each appends the markdown
  block to the cleaned content right after `extract_and_strip_sources_block`:
  - non-streaming `/v1/chat/completions`
  - non-streaming `/v1/completions`
  - streaming via `stream_with_source_filtering` (in `components/utils.py`),
    which now emits one extra delta chunk carrying the markdown block
    before the finish-reason chunk. Sent before finish so clients that
    buffer until finish (most do) see it as part of the content.

- Tests:
  - `TestFormatSourcesAsMarkdown`: dedup, ranking, min_score, top_k,
    enabled/disabled, empty input.
  - `TestStreamWithInlineSources`: inline block appears when enabled,
    is omitted when disabled, `extra.sources` still reaches the finish
    chunk regardless.

- `deploy/.env.example.vm`: documents the three knobs and recommends
  enabling on the Mirai deployment (OWUI 0.9.2 ignores `extra`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

The pull request introduces an opt-in feature that appends inline markdown-formatted sources to OpenAI-compatible API responses. Configuration parameters control enablement and source filtering via environment variables, with supporting utility functions and integration into both streaming and non-streaming response paths.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration (.env.example) | Added optional environment variables for inline markdown sources: INLINE_SOURCES_IN_CONTENT (enable/disable), INLINE_SOURCES_TOP_K (limit source count), INLINE_SOURCES_MIN_SCORE (filter threshold). |
| Core Utilities (openrag/components/utils.py) | Added helper functions: inline_sources_enabled(), _inline_sources_max_items(), _inline_sources_min_score(), and format_sources_as_markdown() to deduplicate, filter, and render sources as markdown blocks while preserving existing structured extra.sources field. |
| API Integration (openrag/routers/openai.py) | Modified /chat/completions and /completions endpoints to conditionally append formatted markdown sources to response content when feature is enabled, after stripping original embedded source tags. |
| Test Coverage (openrag/components/test_source_filtering.py) | Added unit tests validating markdown output behavior (omitted by default, injected when enabled), source deduplication by file_url with highest relevance score, filtering by minimum score threshold, truncation via top-K limit, and preservation of structured sources payload in streamed finish chunks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop, hop! Sources now align,
In markdown, fresh and so refined,
Deduplicated, ranked with care,
Top-K treasures floating there,
Content blooms more bright and whole! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.32% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding an opt-in feature to inline markdown sources directly into OpenAI-compatible response content for clients that ignore the extra field. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
openrag/components/test_source_filtering.py (1)

247-342: Test coverage LGTM for the document path; consider adding a web-source fixture.

The new tests faithfully cover dedup, ranking, top-k, min-score, and streaming on/off behavior using document-shaped sources. Once the web-source dedup gap in format_sources_as_markdown is addressed, please add a fixture with {"source_type": "web", "url": "https://...", "title": "...", "snippet": "..."} to TestFormatSourcesAsMarkdown.SOURCES to lock the behavior in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openrag/components/test_source_filtering.py` around lines 247 - 342, The
tests cover document-style sources but miss web-source behavior; update
format_sources_as_markdown to handle web sources by treating items with
source_type == "web" using the "url" field for deduplication/sorting (use url
like file_url), include title and snippet in the rendered markdown, and ensure
min-score, top-k, and dedup logic (currently keyed on file_url) also apply to
web entries; then add a web-source fixture to
TestFormatSourcesAsMarkdown.SOURCES (e.g.
{"source_type":"web","url":"https://...","title":"Web
A","snippet":"...","relevance_score":0.85}) so tests lock the expected behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openrag/components/utils.py`:
- Around line 285-301: The label escaping only handles pipes and therefore
titles containing '[' or ']' break markdown links; update where label is
produced/used (the _label function output and the line setting label) to escape
'[' and ']' (and keep existing '|' escape) before embedding into the link—i.e.,
replace "[" and "]" with escaped versions (and preserve the current pipe escape)
so the link text like in the f"{i}. [{label}]({url}){score_suffix}" cannot
prematurely terminate the markdown link.
- Around line 269-302: The code is dropping web sources because the dedup key
and URL resolution only check file-related keys; update the dedup key expression
(where key = s.get("file_url") or s.get("filename") or s.get("source") or "") to
also consider s.get("url") and s.get("title")/s.get("source_type") as fallbacks
so web entries (source_type == "web") aren't skipped, and update the URL
resolution (where url = s.get("file_url") or s.get("chunk_url") or "") to
include s.get("url") and s.get("chunk_url")/s.get("file_url") fallbacks for web
entries; ensure _label(s) still uses
s.get("title")/s.get("filename")/s.get("file_id") and add a unit test fixture
with a source_type: "web" entry in the TestFormatSourcesAsMarkdown tests to lock
in the behavior.

📥 Commits

Reviewing files that changed from the base of the PR and between 7fc1a57 and d401671.

📒 Files selected for processing (4)
  • .env.example
  • openrag/components/test_source_filtering.py
  • openrag/components/utils.py
  • openrag/routers/openai.py

@EnjoyBacon7 (Collaborator) left a comment

Two bugs fixed (see comments below): web sources were silently dropped because the dedup key and URL resolution only checked file_url/filename; and the characters [ and ] in titles weren't escaped, breaking markdown links.

format_sources_as_markdown logic is otherwise clean. Dedup on best-score chunk per URL is correct. The streaming extra-delta is emitted before finish_reason so clients that buffer until finish see it as part of the content.

Minor: page 1 is excluded from the page annotation (str(page) not in {"0", "1"}). Likely intentional to avoid noise on single-page docs, but could use a comment.

@EnjoyBacon7
Collaborator

Fixed both CodeRabbit issues:

Web sources dropped (format_sources_as_markdown): web entries carry url not file_url/filename, so the dedup key resolved to "" and they were skipped. Added url as a fallback in both the key and the link URL resolution. Also added a test fixture with source_type: "web" to lock this in.

[ and ] not escaped in label: only | was escaped; a title like Report [draft] would break the markdown link. Fixed to also escape [ and ].

Diff:

# dedup key
key = s.get("file_url") or s.get("url") or s.get("filename") or s.get("source") or ""

# link URL
url = s.get("file_url") or s.get("url") or s.get("chunk_url") or ""

# label escaping
label = _label(s).replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]").replace("|", "\\|")
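
The three one-liners in the diff above can be exercised in isolation. The wrapper function names below (`dedup_key`, `link_url`, `escape_label`) are hypothetical and exist only for this illustration; the expressions inside them are copied from the diff.

```python
def dedup_key(s: dict) -> str:
    """Key used to collapse chunks; `url` lets web sources participate."""
    return s.get("file_url") or s.get("url") or s.get("filename") or s.get("source") or ""


def link_url(s: dict) -> str:
    """Target of the markdown link, with the same `url` fallback."""
    return s.get("file_url") or s.get("url") or s.get("chunk_url") or ""


def escape_label(text: str) -> str:
    """Escape characters that would break a markdown link label or a table cell."""
    # Backslash goes first so the backslashes added by later steps are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("[", "\\[")
        .replace("]", "\\]")
        .replace("|", "\\|")
    )


web = {"source_type": "web", "url": "https://example.com/page", "title": "Report [draft]"}
print(dedup_key(web))               # https://example.com/page
print(escape_label(web["title"]))   # Report \[draft\]
```

Before the fix, a web entry like `web` would have produced an empty key and been skipped.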

@EnjoyBacon7
Collaborator

Reimplemented in #347 for editing rights.
