@MkDev11
Contributor

MkDev11 commented Jan 21, 2026

Closes #410

Description

This PR adds RAG (Retrieval-Augmented Generation) capability to eigent using CAMEL's built-in vector retrieval infrastructure.

What it does:

  • Adds a new RAGToolkit that lets agents store and query knowledge bases
  • Uses QdrantStorage for local vector storage (no external database needed)
  • Uses OpenAI embeddings for semantic search
  • Each task gets its own isolated collection

Tools provided:

  • add_document - Add text content to a knowledge base with optional metadata
  • query_knowledge_base - Search for relevant information using semantic similarity
  • list_knowledge_bases - Show available knowledge bases

How to use:
When creating an agent, select rag_toolkit from the tools list. The agent can then store and retrieve information from its knowledge base during task execution.
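
To make the shape of the three tools concrete, here is a minimal in-memory sketch. It is not the code in this PR: the class name `InMemoryRAGToolkit`, the method signatures, and the default threshold are assumptions for illustration, and a toy bag-of-words cosine similarity stands in for OpenAI embeddings and QdrantStorage so the example runs without any external service.

```python
import math
import uuid
from collections import Counter
from typing import Dict, List, Optional, Tuple


def _embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryRAGToolkit:
    """Hypothetical illustration of the three tools, not the PR's implementation."""

    def __init__(self) -> None:
        # knowledge base name -> list of (doc_id, text, metadata)
        self._kbs: Dict[str, List[Tuple[str, str, dict]]] = {}

    def add_document(
        self, kb: str, content: str, metadata: Optional[dict] = None
    ) -> str:
        """Store text in a knowledge base and return a short document ID."""
        doc_id = uuid.uuid4().hex[:12]
        self._kbs.setdefault(kb, []).append((doc_id, content, metadata or {}))
        return doc_id

    def query_knowledge_base(
        self,
        kb: str,
        query: str,
        top_k: int = 3,
        similarity_threshold: float = 0.3,  # assumed default for this sketch
    ) -> List[str]:
        """Return up to top_k documents whose similarity meets the threshold."""
        q = _embed(query)
        scored = [(_cosine(q, _embed(text)), text) for _, text, _ in self._kbs.get(kb, [])]
        hits = [pair for pair in scored if pair[0] >= similarity_threshold]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in hits[:top_k]]

    def list_knowledge_bases(self) -> List[str]:
        """Return the names of all knowledge bases, sorted."""
        return sorted(self._kbs)
```

In this sketch, any document scoring below `similarity_threshold` is dropped, so a strict threshold can make a query return an empty list even when a related document is stored.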

Live Test Results

Storage: /tmp/rag_test_99hdzva0

1. Adding documents...
   Successfully added document (ID: 0b0539081e9e) to knowledge base 'task_test-123'
   Successfully added document (ID: 93fb3e5da69d) to knowledge base 'task_test-123'

2. Querying...
   Result:
[Result 1] (relevance: N/A)
No suitable information retrieved from Eigent is an AI automation platform using multi-agent systems. with similarity_threshold = 0.7.

3. List knowledge bases...
   Available knowledge bases:
- task_test-123

✅ RAG Toolkit works correctly!

Dependencies Added

  • qdrant-client - Local vector database
  • unstructured - Document parsing for CAMEL's VectorRetriever

What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

Closes eigent-ai#410

- Add RAGToolkit with document ingestion and retrieval capabilities
- Use CAMEL's VectorRetriever with QdrantStorage for local vector storage
- Provide add_document, query_knowledge_base, list_knowledge_bases tools
- Register RAG toolkit in agent.py toolkits dictionary
- Add 15 unit tests covering toolkit functionality
@MkDev11
Contributor Author

MkDev11 commented Jan 22, 2026

@Wendong-Fan @4pmtong please review the PR and let me know your feedback. Thanks.

@Wendong-Fan Wendong-Fan requested a review from a7m-1st January 23, 2026 00:59
@Wendong-Fan
Contributor

Thanks @MkDev11 for the contribution! @jino and @a7m-1st will help review this.

@MkDev11
Contributor Author

MkDev11 commented Jan 23, 2026

cool! I am open to your idea :)

@JINO-ROHIT
Collaborator

JINO-ROHIT commented Jan 23, 2026

hello @MkDev11 thanks for the PR! i think we already have some overlapping features in the main camel repo here -

  1. as a toolkit - https://github.com/camel-ai/camel/blob/master/camel/toolkits/retrieval_toolkit.py
  2. a dedicated retriever section - https://github.com/camel-ai/camel/tree/master/camel/retrievers

perhaps, if there's specific functionality missing in those modules, would you be open to raising a PR there?

thanks!

MkDev11 and others added 2 commits January 23, 2026 01:29
Based on feedback from JINO-ROHIT, refactored RAGToolkit to use
CAMEL's infrastructure via composition.

Features:
- Uses CAMEL's RetrievalToolkit for file/URL retrieval
- Uses CAMEL's VectorRetriever for raw text support
- Task-based collection isolation
- Eigent AbstractToolkit integration

Tools provided:
- add_document: Add raw text to knowledge base
- query_knowledge_base: Query added documents
- information_retrieval: Query files/URLs (CAMEL's method)
- list_knowledge_bases: List available KBs
@MkDev11
Contributor Author

MkDev11 commented Jan 23, 2026

Thanks for the feedback @JINO-ROHIT! You're right - I've refactored the PR to use CAMEL's existing infrastructure instead of duplicating functionality.

Changes Made

The RAGToolkit now uses composition to wrap CAMEL's components:

  • RetrievalToolkit - for file/URL retrieval (information_retrieval method)
  • VectorRetriever - for raw text document support (add_document + query_knowledge_base)
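
The composition described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two `Protocol` classes are stand-ins for CAMEL's RetrievalToolkit and VectorRetriever with simplified signatures, and `ComposedRAGToolkit` and the fake classes are hypothetical names, not code from this PR.

```python
from typing import List, Protocol


class FileRetriever(Protocol):
    """Stand-in for a file/URL retriever (simplified, assumed signature)."""

    def information_retrieval(self, query: str, contents: str) -> str: ...


class TextRetriever(Protocol):
    """Stand-in for a raw-text vector retriever (simplified, assumed signature)."""

    def process(self, content: str) -> None: ...

    def query(self, query: str) -> List[str]: ...


class ComposedRAGToolkit:
    """Hypothetical wrapper that delegates to two injected components."""

    def __init__(self, files: FileRetriever, texts: TextRetriever) -> None:
        self._files = files
        self._texts = texts

    def add_document(self, content: str) -> None:
        # Raw text goes to the text retriever.
        self._texts.process(content)

    def query_knowledge_base(self, query: str) -> List[str]:
        return self._texts.query(query)

    def information_retrieval(self, query: str, contents: str) -> str:
        # File/URL queries are passed straight through to the file retriever.
        return self._files.information_retrieval(query, contents)


class FakeFileRetriever:
    """Test double: echoes its inputs instead of reading files."""

    def information_retrieval(self, query: str, contents: str) -> str:
        return f"retrieved '{query}' from {contents}"


class FakeTextRetriever:
    """Test double: substring match instead of vector search."""

    def __init__(self) -> None:
        self.docs: List[str] = []

    def process(self, content: str) -> None:
        self.docs.append(content)

    def query(self, query: str) -> List[str]:
        return [d for d in self.docs if query.lower() in d.lower()]
```

Because the wrapper only holds references to its components, the fakes can be swapped for the real retrievers without changing the wrapper's public methods.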

Eigent-Specific Additions

These are the features that are specific to eigent and wouldn't belong in the main CAMEL repo:

  1. Task-based collection isolation - Each task gets its own isolated knowledge base (task_{api_task_id})
  2. AbstractToolkit integration - Compatibility with eigent's agent toolkit system
  3. Convenience wrappers - add_document() and query_knowledge_base() for simpler raw text workflows

Why Not Contribute Upstream?

The task isolation and AbstractToolkit integration are eigent-specific concerns for multi-tenant agent orchestration. They depend on eigent's internal architecture and wouldn't be applicable to the general CAMEL framework.

Let me know if you have any other feedback!

@a7m-1st
Collaborator

a7m-1st commented Jan 23, 2026

forwarding:

as for the RAG though 🤔 We discussed with @Wendong-Fan earlier and I think he will leave comments
We actually want to:

  • Include it as an optional toolkit (same as the Google Calendar, Notion, etc. toolkits)
  • I think CAMEL already has a similar toolkit; we need to integrate it in CAMEL first, then extend it in Eigent if required

But more decisively I believe Wendong will leave comments 👍

@Wendong-Fan
Contributor

Thanks @MkDev11 and @a7m-1st ! I want to share some architectural feedback.

The Problem

Looking at the code, the RAGToolkit currently mixes generic RAG functionality with eigent-specific concerns:

  # eigent-specific: hardcoded task isolation pattern
  self._task_storage_path = self.storage_path / f"task_{api_task_id}"
  collection_name=f"task_{self.api_task_id}_raw"

The toolkit itself shouldn't know about eigent's task isolation strategy. This makes the code:

  1. Not reusable upstream in CAMEL
  2. Tightly coupled to eigent's multi-tenant architecture

Suggested Architecture

  1. Contribute to CAMEL: If add_document() and query_knowledge_base() are useful abstractions, consider adding them to CAMEL's RetrievalToolkit. The toolkit should accept collection_name and storage_path as constructor parameters - keeping it generic.
  2. Eigent orchestration layer: In eigent's get_toolkits() (in agent.py), we configure the toolkit with task-specific values:
  # In eigent's agent.py
  toolkit = RAGToolkit(
      collection_name=f"task_{api_task_id}",
      storage_path=f"~/.eigent/rag/{api_task_id}",
  )

This way:

  • The toolkit remains generic and portable
  • Task isolation is an eigent concern handled at the orchestration level
  • The same toolkit code can live upstream in CAMEL
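
A slightly fuller sketch of this split, under assumptions: `GenericRAGToolkit`, `build_rag_toolkit`, and the `base_dir` parameter are hypothetical names used only for illustration. The toolkit holds whatever it is given, and only the orchestration helper knows the `task_` naming pattern.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GenericRAGToolkit:
    """Hypothetical generic toolkit: it has no knowledge of tasks."""

    collection_name: str
    storage_path: Path


def build_rag_toolkit(api_task_id: str, base_dir: Path) -> GenericRAGToolkit:
    """Orchestration-layer helper (assumed name) that applies task isolation."""
    return GenericRAGToolkit(
        collection_name=f"task_{api_task_id}",
        storage_path=base_dir / api_task_id,
    )


if __name__ == "__main__":
    # Each task ID maps to its own collection and directory.
    base = Path.home() / ".eigent" / "rag"
    print(build_rag_toolkit("test-123", base))
```

Changing the isolation scheme (for example, per user instead of per task) would then touch only the helper, not the toolkit.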

Would you be open to refactoring along these lines? Happy to discuss further!

…on layer

Per Wendong-Fan's architectural feedback:
- RAGToolkit now accepts collection_name and storage_path as params
- Removed hardcoded task_* patterns from toolkit
- Task isolation handled in get_toolkits() in agent.py
- Toolkit is now generic and portable for upstream contribution
@MkDev11 MkDev11 requested a review from a7m-1st January 24, 2026 15:27
@MkDev11
Contributor Author

MkDev11 commented Jan 24, 2026

@Wendong-Fan @a7m-1st I've just refactored the PR. Please review it again and let me know what you think.

@Wendong-Fan Wendong-Fan requested review from Zephyroam and bytecii and removed request for Zephyroam January 25, 2026 00:37
@Wendong-Fan
Contributor

Thanks @MkDev11! Could @a7m-1st and @bytecraftii help review this?

MkDev11 and others added 3 commits January 25, 2026 08:30
Changes:
- Add DEFAULT_RAG_STORAGE_PATH and DEFAULT_COLLECTION_NAME constants
- Add TODO comments for embedding model flexibility
- Add get_task_collection_name() helper function in agent.py
- Simplify query results format (numbered list, no scores)
- Remove list_knowledge_bases from exposed tools (not useful with task isolation)
- Update tests to expect 3 tools instead of 4
- Add RAW_TEXT_SUBDIR constant for cleaner path handling
- Fix docstring format with types (str), (int), etc.
- Change logger.debug to logger.warning for missing API key
- Add validation to raise ValueError if collection_name is None
- Update tests to pass collection_name and add validation test
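
As a rough illustration of the helper and validation items above (a sketch, not the committed code): the value of `RAW_TEXT_SUBDIR`, the class name `ValidatedRAGToolkit`, the error message, and the `raw_text_path` attribute are all assumptions. Only the names `get_task_collection_name` and `RAW_TEXT_SUBDIR` and the `ValueError` behaviour come from the commit message.

```python
from pathlib import Path
from typing import Optional

RAW_TEXT_SUBDIR = "raw_text"  # assumed value; the commit only names the constant


def get_task_collection_name(api_task_id: str) -> str:
    """Build the per-task collection name used by the orchestration layer."""
    return f"task_{api_task_id}"


class ValidatedRAGToolkit:
    """Hypothetical toolkit constructor showing the collection_name check."""

    def __init__(self, collection_name: Optional[str], storage_path: Path) -> None:
        if collection_name is None:
            # Fail fast instead of silently falling back to a shared collection.
            raise ValueError("collection_name must be provided")
        self.collection_name = collection_name
        # Raw-text documents live in a dedicated subdirectory of the storage path.
        self.raw_text_path = Path(storage_path) / RAW_TEXT_SUBDIR
```

A test for the validation item can then assert that constructing the toolkit with `collection_name=None` raises `ValueError`.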
@MkDev11 MkDev11 requested a review from bytecii January 26, 2026 12:48
Keep RAGToolkit task-specific isolation handling
@MkDev11
Contributor Author

MkDev11 commented Jan 26, 2026

@Wendong-Fan @a7m-1st any update for me?

@a7m-1st
Collaborator

a7m-1st commented Jan 27, 2026

I could test it after #999

Development

Successfully merging this pull request may close these issues.

[Feature Request] Add RAG & Graph RAG capability