feat: add RAG toolkit for knowledge base queries #1003

MkDev11 · 2026-01-21T14:32:32Z

Closes #410

Description

This PR adds RAG (Retrieval-Augmented Generation) capability to eigent using CAMEL's built-in vector retrieval infrastructure.

What it does:

Adds a new RAGToolkit that lets agents store and query knowledge bases
Uses QdrantStorage for local vector storage (no external database needed)
OpenAI embeddings for semantic search
Each task gets its own isolated collection

Tools provided:

add_document - Add text content to a knowledge base with optional metadata
query_knowledge_base - Search for relevant information using semantic similarity
list_knowledge_bases - Show available knowledge bases
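The three tools can be pictured with the minimal in-memory sketch below. It is not the PR's implementation. The OpenAI embeddings and QdrantStorage are replaced by a hypothetical word-overlap score so the sketch runs without network access or extra dependencies. The class name `InMemoryKnowledgeBase` is illustrative, and the confirmation string only imitates the wording in the test log.

```python
from __future__ import annotations

import uuid
from collections import defaultdict


class InMemoryKnowledgeBase:
    """Hypothetical stand-in for the toolkit's three tools (no Qdrant, no OpenAI)."""

    def __init__(self) -> None:
        # collection name -> list of (doc_id, text, metadata)
        self._collections: dict[str, list[tuple[str, str, dict]]] = defaultdict(list)

    def add_document(self, collection: str, content: str, metadata: dict | None = None) -> str:
        doc_id = uuid.uuid4().hex[:12]  # 12 hex chars, imitating the IDs in the test log
        self._collections[collection].append((doc_id, content, metadata or {}))
        return f"Successfully added document (ID: {doc_id}) to knowledge base '{collection}'"

    def query_knowledge_base(self, collection: str, query: str, top_k: int = 3) -> list[str]:
        # Stand-in for embedding similarity: count shared lowercase words.
        q_words = set(query.lower().split())
        scored = [
            (len(q_words & set(text.lower().split())), text)
            for _, text, _ in self._collections.get(collection, [])  # .get() creates no new key
        ]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [text for _, text in scored[:top_k]]

    def list_knowledge_bases(self) -> list[str]:
        return sorted(self._collections)
```

Querying a collection that was never written to returns an empty list and does not create it, so `list_knowledge_bases` only reports collections that actually hold documents.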

How to use:
When creating an agent, select rag_toolkit from the tools list. The agent can then store and retrieve information from its knowledge base during task execution.

Live Test Results

Storage: /tmp/rag_test_99hdzva0

1. Adding documents...
   Successfully added document (ID: 0b0539081e9e) to knowledge base 'task_test-123'
   Successfully added document (ID: 93fb3e5da69d) to knowledge base 'task_test-123'

2. Querying...
   Result:
[Result 1] (relevance: N/A)
No suitable information retrieved from Eigent is an AI automation platform using multi-agent systems. with similarity_threshold = 0.7.

3. List knowledge bases...
   Available knowledge bases:
- task_test-123

✅ RAG Toolkit works correctly!
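The query step in the log above returned CAMEL's fallback message instead of a match. This appears to be because VectorRetriever discards results whose similarity falls below `similarity_threshold` (0.7 here). The sketch below illustrates that filtering step with plain cosine similarity on hand-picked vectors. The vectors, the helper name `filter_by_threshold`, and the message wording are assumptions for illustration, not CAMEL's code.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_threshold(query_vec: list[float], docs: list[tuple[str, list[float]]], threshold: float = 0.7):
    """Keep (score, text) pairs at or above the threshold, best first."""
    hits = []
    for text, vec in docs:
        score = cosine(query_vec, vec)
        if score >= threshold:
            hits.append((score, text))
    hits.sort(reverse=True)
    if not hits:
        # Mirrors the shape of the fallback message seen in the log.
        return f"No suitable information retrieved with similarity_threshold = {threshold}."
    return hits
```

With the query `[1, 0]`, a document at `[1, 1]` scores about 0.707 and just clears a 0.7 cutoff, while one at `[0, 1]` scores 0 and is dropped. An empty result can therefore reflect the cutoff rather than missing data.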

Dependencies Added

qdrant-client - Local vector database
unstructured - Document parsing for CAMEL's VectorRetriever

What is the purpose of this pull request?

Bug fix
New Feature
Documentation update
Other

Closes eigent-ai#410
- Add RAGToolkit with document ingestion and retrieval capabilities
- Use CAMEL's VectorRetriever with QdrantStorage for local vector storage
- Provide add_document, query_knowledge_base, list_knowledge_bases tools
- Register RAG toolkit in agent.py toolkits dictionary
- Add 15 unit tests covering toolkit functionality

MkDev11 · 2026-01-22T14:48:58Z

@Wendong-Fan @4pmtong please review the PR and let me know your feedback. Thanks.

Wendong-Fan · 2026-01-23T01:01:25Z

Thanks @MkDev11 for the contribution! @jino and @a7m-1st will help review this.

MkDev11 · 2026-01-23T01:03:32Z

cool! I am open to your idea :)

JINO-ROHIT · 2026-01-23T06:09:31Z

Hello @MkDev11, thanks for the PR! I think we already have some overlapping features in the main CAMEL repo:

as a toolkit - https://github.com/camel-ai/camel/blob/master/camel/toolkits/retrieval_toolkit.py
a dedicated retriever section - https://github.com/camel-ai/camel/tree/master/camel/retrievers

Perhaps, if there's specific functionality missing in those modules, would you be open to raising a PR there?

thanks!

Based on feedback from JINO-ROHIT, refactored RAGToolkit to use CAMEL's infrastructure via composition.

Features:
- Uses CAMEL's RetrievalToolkit for file/URL retrieval
- Uses CAMEL's VectorRetriever for raw text support
- Task-based collection isolation
- Eigent AbstractToolkit integration

Tools provided:
- add_document: Add raw text to knowledge base
- query_knowledge_base: Query added documents
- information_retrieval: Query files/URLs (CAMEL's method)
- list_knowledge_bases: List available KBs

MkDev11 · 2026-01-23T06:55:25Z

Thanks for the feedback @JINO-ROHIT! You're right - I've refactored the PR to use CAMEL's existing infrastructure instead of duplicating functionality.

Changes Made

The RAGToolkit now uses composition to wrap CAMEL's components:

RetrievalToolkit - for file/URL retrieval (information_retrieval method)
VectorRetriever - for raw text document support (add_document + query_knowledge_base)
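The composition described above can be sketched as a thin wrapper that owns one backend per source type and delegates to it. The two backend classes here are hypothetical stand-ins for CAMEL's RetrievalToolkit and VectorRetriever. Their method names and return values are assumptions, not CAMEL's signatures. Only the three tool names come from the PR.

```python
from __future__ import annotations


class FileRetrievalBackend:
    """Hypothetical stand-in for CAMEL's RetrievalToolkit (files/URLs)."""

    def retrieve(self, query: str, source: str) -> str:
        return f"[file backend] {query!r} against {source}"


class RawTextBackend:
    """Hypothetical stand-in for CAMEL's VectorRetriever (raw text)."""

    def __init__(self) -> None:
        self.texts: list[str] = []

    def add(self, text: str) -> None:
        self.texts.append(text)

    def search(self, query: str) -> list[str]:
        # Substring match stands in for vector search.
        return [t for t in self.texts if query.lower() in t.lower()]


class ComposedRAGToolkit:
    """Owns its backends (has-a) instead of subclassing either one (is-a)."""

    def __init__(self) -> None:
        self._files = FileRetrievalBackend()
        self._raw = RawTextBackend()

    def add_document(self, content: str) -> None:
        self._raw.add(content)

    def query_knowledge_base(self, query: str) -> list[str]:
        return self._raw.search(query)

    def information_retrieval(self, query: str, contents: str) -> str:
        return self._files.retrieve(query, contents)
```

Because the wrapper only delegates, replacing either backend touches one attribute rather than a class hierarchy.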

Eigent-Specific Additions

These are the features that are specific to eigent and wouldn't belong in the main CAMEL repo:

Task-based collection isolation - Each task gets its own isolated knowledge base (task_{api_task_id})
AbstractToolkit integration - Compatibility with eigent's agent toolkit system
Convenience wrappers - add_document() and query_knowledge_base() for simpler raw text workflows

Why Not Contribute Upstream?

The task isolation and AbstractToolkit integration are eigent-specific concerns for multi-tenant agent orchestration. They depend on eigent's internal architecture and wouldn't be applicable to the general CAMEL framework.

Let me know if you have any other feedback!

a7m-1st · 2026-01-23T22:55:40Z

forwarding:

As for the RAG, though 🤔 We discussed it with @Wendong-Fan earlier, and I think he will leave comments.
We actually want to:

Include it as an optional toolkit (same as the Google Calendar, Notion, etc. toolkits)

I think Camel AI already has a similar toolkit; we need to integrate it in Camel AI first, then extend it in Eigent if required

But for a more definitive answer, I believe Wendong will leave comments 👍

backend/app/utils/toolkit/rag_toolkit.py

Wendong-Fan · 2026-01-24T01:35:45Z

Thanks @MkDev11 and @a7m-1st! I want to share some architectural feedback.

The Problem

Looking at the code, the RAGToolkit currently mixes generic RAG functionality with eigent-specific concerns:

  # eigent-specific: hardcoded task isolation pattern
  self._task_storage_path = self.storage_path / f"task_{api_task_id}"
  collection_name=f"task_{self.api_task_id}_raw"

The toolkit itself shouldn't know about eigent's task isolation strategy. This makes the code:

Not reusable upstream in CAMEL
Tightly coupled to eigent's multi-tenant architecture

Suggested Architecture

Contribute to CAMEL: If add_document() and query_knowledge_base() are useful abstractions, consider adding them to CAMEL's RetrievalToolkit. The toolkit should accept collection_name and storage_path as constructor parameters, keeping it generic.
Eigent orchestration layer: In eigent's get_toolkits() (in agent.py), we configure the toolkit with task-specific values:

  # In eigent's agent.py
  toolkit = RAGToolkit(
      collection_name=f"task_{api_task_id}",
      storage_path=f"~/.eigent/rag/{api_task_id}",
  )

This way:

The toolkit remains generic and portable
Task isolation is an eigent concern handled at the orchestration level
The same toolkit code can live upstream in CAMEL
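Here is a minimal sketch of this split, assuming a toolkit whose constructor takes `collection_name` and `storage_path` as suggested above. `GenericRAGToolkit` and `build_task_toolkit` are hypothetical names. The sketch also expands the `~` explicitly, because a literal `"~/.eigent/..."` string is not expanded by Python's path APIs unless something calls `expanduser()`.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class GenericRAGToolkit:
    """Knows nothing about tasks: what to store and where are injected."""

    collection_name: str
    storage_path: Path

    def __post_init__(self) -> None:
        if not self.collection_name:  # rejects None and ""
            raise ValueError("collection_name is required")
        # Path("~/...") is not expanded automatically, so do it here.
        self.storage_path = Path(self.storage_path).expanduser()


def build_task_toolkit(api_task_id: str, root: str = "~/.eigent/rag") -> GenericRAGToolkit:
    """Orchestration layer: maps a task ID onto the toolkit's generic parameters."""
    return GenericRAGToolkit(
        collection_name=f"task_{api_task_id}",
        storage_path=Path(root) / api_task_id,
    )
```

Only `build_task_toolkit` knows the `task_{api_task_id}` convention. The class itself could move upstream unchanged.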

Would you be open to refactoring along these lines? Happy to discuss further!

…on layer

Per Wendong-Fan's architectural feedback:
- RAGToolkit now accepts collection_name and storage_path as params
- Removed hardcoded task_* patterns from toolkit
- Task isolation handled in get_toolkits() in agent.py
- Toolkit is now generic and portable for upstream contribution

MkDev11 · 2026-01-24T15:29:08Z

@Wendong-Fan @a7m-1st Just refactored the PR! Please review again and let me know the result.

Wendong-Fan · 2026-01-25T00:38:16Z

Thanks @MkDev11! Could @a7m-1st and @bytecraftii help review this?

backend/app/utils/agent.py

backend/app/utils/toolkit/rag_toolkit.py

Changes:
- Add DEFAULT_RAG_STORAGE_PATH and DEFAULT_COLLECTION_NAME constants
- Add TODO comments for embedding model flexibility
- Add get_task_collection_name() helper function in agent.py
- Simplify query results format (numbered list, no scores)
- Remove list_knowledge_bases from exposed tools (not useful with task isolation)
- Update tests to expect 3 tools instead of 4
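The commit above switches query output to a plain numbered list without scores. A hypothetical formatter along those lines might look like the following. The function name and the empty-result message are assumptions, not taken from the PR.

```python
from __future__ import annotations


def format_results(texts: list[str]) -> str:
    """Render retrieved chunks as a numbered list, omitting similarity scores."""
    if not texts:
        return "No relevant information found."
    return "\n".join(f"{i}. {text}" for i, text in enumerate(texts, start=1))
```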

- Add RAW_TEXT_SUBDIR constant to make paths cleaner
- Fix docstring format with types (str), (int), etc.
- Change logger.debug to logger.warning for missing API key
- Add validation to raise ValueError if collection_name is None
- Update tests to pass collection_name and add validation test

MkDev11 · 2026-01-26T22:15:30Z

@Wendong-Fan @a7m-1st any update for me?

a7m-1st · 2026-01-27T13:53:13Z

I could test it after #999

MkDev11 added 3 commits January 21, 2026 15:29

fix: use correct CAMEL API parameter (extra_info instead of metadata)

16d6732

fix: handle score formatting and add required dependencies (qdrant-client, unstructured)

ba43d82

Merge branch 'main' into feature/rag-toolkit

e74c146

Wendong-Fan requested a review from a7m-1st January 23, 2026 00:59

MkDev11 and others added 2 commits January 23, 2026 01:29

Merge branch 'main' into feature/rag-toolkit

830097c

a7m-1st reviewed Jan 23, 2026

backend/app/utils/toolkit/rag_toolkit.py Outdated

MkDev11 requested a review from a7m-1st January 24, 2026 15:27

Wendong-Fan requested review from Zephyroam and bytecii and removed request for Zephyroam January 25, 2026 00:37

MkDev11 and others added 3 commits January 25, 2026 01:45

Merge upstream/main into feature/rag-toolkit

0e07de0

Merge branch 'main' into feature/rag-toolkit

098b61e

Merge branch 'main' into feature/rag-toolkit

36fb5ac

bytecii reviewed Jan 25, 2026

MkDev11 and others added 3 commits January 25, 2026 08:30

Merge branch 'main' into feature/rag-toolkit

0aa9641

MkDev11 requested a review from bytecii January 26, 2026 12:48

fix: resolve merge conflict with main in agent.py

a724d9b

Keep RAGToolkit task-specific isolation handling

Wendong-Fan mentioned this pull request Jan 28, 2026

[Feature Request] Add RAG & Graph RAG capability #410

Open

MkDev11 and others added 2 commits January 28, 2026 22:15

Merge branch 'main' into feature/rag-toolkit

5e4afd0

fix: resolve merge conflict with upstream/main, migrate RAG toolkit to new agent module structure

020e7ad

5 participants
