
feat(agents): surface KB source citations in RAG responses #10228

Merged
mudler merged 4 commits into mudler:master from shihyunhuang:feat/-source-citation-in-RAG
Jun 9, 2026

Conversation

@petechentw

Contributor

Summary

Closes #9331

When an agent answers a question using the Knowledge Base (RAG), the response now appends a `Sources:` block listing the original documents the answer was drawn from — with clickable links back to the raw files.

This addresses all three points raised in the issue:

  • The original file name (e.g. `How to use localai.pdf`) is included as a citation
  • Citations appear at the end of the answer under a `Sources:` section
  • Each file name is a clickable link that opens the original document

Changes

  • `core/services/agents/citations.go` (new) — thread-safe citation collector, `AppendKBCitations` to build a markdown Sources block, raw-file URL builder, and markdown link escaping
  • `core/services/agents/knowledge.go` — `KBAutoSearchPrompt` now returns a `KBSearchContext` (prompt + structured citations) instead of a plain string; adds `KBCitation`, `KBSearchContext`, and `KBCitationCollector` types; fixes URL encoding with `url.PathEscape` and `url.Values`
  • `core/services/agents/executor.go` — wires citation collection through the full chat flow; appends citations to the final response while saving the citation-free version to memory so citations don't pollute long-term KB storage
  • `core/services/agentpool/agent_pool.go` — integrates citation appending into the local chat path; falls back to a fresh auto-search if no citations were collected from tool calls
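The collector and Sources-block pieces listed above could look roughly like the sketch below. This is not the code from `citations.go`: the `citation`, `collector`, `escapeLinkText`, and `appendSources` names are hypothetical stand-ins, deduplication by URL is an assumption, and the real `KBCitation` and `AppendKBCitations` may have different fields and signatures.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// citation is a hypothetical stand-in for one KB source document.
type citation struct {
	Title string // original file name, e.g. "Q3-report.pdf"
	URL   string // already-encoded link to the raw document
}

// collector gathers citations from concurrent searches and keeps only
// the first citation seen for each URL (assumed dedup key).
type collector struct {
	mu    sync.Mutex
	seen  map[string]bool
	items []citation
}

func newCollector() *collector {
	return &collector{seen: make(map[string]bool)}
}

// Add records c unless a citation with the same URL is already stored.
func (k *collector) Add(c citation) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.seen[c.URL] {
		return
	}
	k.seen[c.URL] = true
	k.items = append(k.items, c)
}

// Snapshot returns a copy of the collected citations in insertion order.
func (k *collector) Snapshot() []citation {
	k.mu.Lock()
	defer k.mu.Unlock()
	return append([]citation(nil), k.items...)
}

// escapeLinkText backslash-escapes characters that would otherwise
// terminate or corrupt markdown link text.
func escapeLinkText(s string) string {
	return strings.NewReplacer(`\`, `\\`, `[`, `\[`, `]`, `\]`).Replace(s)
}

// appendSources returns answer unchanged when there are no citations;
// otherwise it appends a numbered Sources block of markdown links.
func appendSources(answer string, cites []citation) string {
	if len(cites) == 0 {
		return answer
	}
	var b strings.Builder
	b.WriteString(strings.TrimRight(answer, "\n"))
	b.WriteString("\n\nSources:\n")
	for i, c := range cites {
		fmt.Fprintf(&b, "[%d] [%s](%s)\n", i+1, escapeLinkText(c.Title), c.URL)
	}
	return b.String()
}
```

Under this sketch, the caller would store the original `answer` in memory and return only the `appendSources` result to the user, which matches the split described for `executor.go`.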

Example output

```
The quarterly revenue increased by 12% year-over-year...

Sources:
[1] Q3-report.pdf
[2] How to use localai.pdf
```

Notes

  • Citations are deduplicated per source document (multiple chunks from the same file produce a single entry)
  • The clean response without citations is saved to long-term memory to avoid polluting future KB searches
  • Collection names and entry keys with special characters are properly URL-encoded
  • Citations are collected from both auto-search results and `search_memory` tool calls during multi-step reasoning
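For the URL-encoding note, here is a minimal sketch of how `url.PathEscape` and `url.Values` from Go's `net/url` handle names with special characters. The `rawEntryURL` and `searchQuery` helpers, the `/collections/.../entries/.../raw` path layout, and the `collection` and `q` parameter names are placeholders for illustration only, not LocalAI's actual routes.

```go
package main

import "net/url"

// rawEntryURL builds a link to a stored document. Each path segment is
// escaped separately so spaces and slashes inside a name cannot change
// the shape of the path. The route layout here is a placeholder.
func rawEntryURL(base, collection, entryKey string) string {
	return base + "/collections/" + url.PathEscape(collection) +
		"/entries/" + url.PathEscape(entryKey) + "/raw"
}

// searchQuery encodes query parameters with url.Values instead of
// string concatenation. The parameter names are hypothetical.
func searchQuery(collection, query string) string {
	v := url.Values{}
	v.Set("collection", collection)
	v.Set("q", query)
	return v.Encode() // keys are emitted in sorted order
}
```

With these helpers, an entry key such as `How to use localai.pdf` becomes the segment `How%20to%20use%20localai.pdf`, and a `/` inside a collection name is emitted as `%2F` rather than being read as a path separator.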

shihyunhuang and others added 4 commits June 9, 2026 01:51
Signed-off-by: Pete Chen <petechentw@gmail.com>
Render structured KB citations as a Sources block after agent responses, linking each source to the existing raw collection entry endpoint.

Keep long-term memory writes on the original model response so citation blocks do not get stored back into the knowledge base.

Tested with: go test ./core/services/agents

Assisted-by: Codex:gpt-5
Signed-off-by: Pete Chen <petechentw@gmail.com>
Signed-off-by: Pete Chen <petechentw@gmail.com>
Apply the shared KB citation post-processing to standalone LocalAGI chat responses so the React agent chat receives the same clickable Sources block as the native executor path. Also fix the run target to use the current cmd/local-ai entrypoint.

Assisted-by: Codex:gpt-5
Signed-off-by: Pete Chen <petechentw@gmail.com>
@petechentw force-pushed the feat/-source-citation-in-RAG branch from cf34a8f to a621190 on June 9, 2026 08:52

@mudler mudler left a comment

Owner

nice! thanks @petechentw !

@mudler mudler merged commit d2e6b93 into mudler:master Jun 9, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

Support source citations / file links in Agent RAG responses

5 participants