feat: add RAG toolkit for knowledge base queries #1003
base: main
Closes eigent-ai#410
- Add RAGToolkit with document ingestion and retrieval capabilities
- Use CAMEL's VectorRetriever with QdrantStorage for local vector storage
- Provide add_document, query_knowledge_base, list_knowledge_bases tools
- Register RAG toolkit in agent.py toolkits dictionary
- Add 15 unit tests covering toolkit functionality
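To make the three tools concrete, here is a minimal, self-contained sketch of what add_document, query_knowledge_base and list_knowledge_bases do. It is not the PR's code and does not touch CAMEL: the class name InMemoryRAGToolkit, the kb_name and top_k parameters, and the bag-of-words cosine similarity (standing in for a real embedding model plus QdrantStorage) are all assumptions made for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts: a toy stand-in for an embedding model (assumption).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryRAGToolkit:
    """Hypothetical toy that mirrors the tool names from the PR, not its implementation."""

    def __init__(self) -> None:
        # knowledge base name -> list of (text, metadata, vector)
        self._kbs: dict[str, list[tuple[str, dict, Counter]]] = {}

    def add_document(self, content: str, kb_name: str = "default",
                     metadata: dict | None = None) -> str:
        self._kbs.setdefault(kb_name, []).append((content, metadata or {}, _vectorize(content)))
        return f"Added document to knowledge base '{kb_name}'."

    def query_knowledge_base(self, query: str, kb_name: str = "default",
                             top_k: int = 3) -> list[str]:
        q = _vectorize(query)
        scored = [(_cosine(q, vec), text) for text, _meta, vec in self._kbs.get(kb_name, [])]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep only the top_k hits that share at least one term with the query.
        return [text for score, text in scored[:top_k] if score > 0]

    def list_knowledge_bases(self) -> list[str]:
        return sorted(self._kbs)
```

A real vector store replaces _vectorize and _cosine with dense embeddings and an approximate nearest-neighbour index, but the add/query/list surface an agent sees stays the same shape.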
…ient, unstructured)
@Wendong-Fan @4pmtong please review the PR and let me know your feedback. Thanks.
cool! I am open to your idea :)
hello @MkDev11, thanks for the PR! I think we already have some overlapping features in the main CAMEL repo here -
perhaps, if there's a specific functionality missing in those modules, would you be open to raising a PR there? Thanks!
Based on feedback from JINO-ROHIT, refactored RAGToolkit to use CAMEL's infrastructure via composition.
Features:
- Uses CAMEL's RetrievalToolkit for file/URL retrieval
- Uses CAMEL's VectorRetriever for raw text support
- Task-based collection isolation
- Eigent AbstractToolkit integration
Tools provided:
- add_document: Add raw text to knowledge base
- query_knowledge_base: Query added documents
- information_retrieval: Query files/URLs (CAMEL's method)
- list_knowledge_bases: List available KBs
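The composition approach described in this commit (wrap existing retrievers instead of reimplementing them) can be sketched roughly as below. Retriever, ComposedRAGToolkit and SubstringRetriever are hypothetical names, and the add/query interface is an assumption; this is not CAMEL's RetrievalToolkit or VectorRetriever API, only an illustration of delegating to an injected object.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Retriever(Protocol):
    """Hypothetical interface standing in for a CAMEL retriever (assumption)."""

    def add(self, text: str) -> None: ...
    def query(self, query: str, top_k: int) -> list[str]: ...


class ComposedRAGToolkit:
    """Delegates storage and search to an injected retriever instead of reimplementing them."""

    def __init__(self, retriever: Retriever) -> None:
        self._retriever = retriever

    def add_document(self, content: str) -> str:
        self._retriever.add(content)
        return "Document added."

    def query_knowledge_base(self, query: str, top_k: int = 3) -> list[str]:
        return self._retriever.query(query, top_k)

    def get_tools(self) -> list[Callable]:
        # Bound methods are what an agent framework would wrap into its own tool type.
        return [self.add_document, self.query_knowledge_base]


class SubstringRetriever:
    """Trivial test double: returns documents containing any query word."""

    def __init__(self) -> None:
        self._docs: list[str] = []

    def add(self, text: str) -> None:
        self._docs.append(text)

    def query(self, query: str, top_k: int) -> list[str]:
        words = query.lower().split()
        hits = [d for d in self._docs if any(w in d.lower() for w in words)]
        return hits[:top_k]
```

Because the toolkit only depends on the interface, swapping the test double for a real vector-backed retriever requires no change to the tool methods.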
Thanks for the feedback @JINO-ROHIT! You're right - I've refactored the PR to use CAMEL's existing infrastructure instead of duplicating functionality.
Changes Made
The
Eigent-Specific Additions
These are the features that are specific to eigent and wouldn't belong in the main CAMEL repo:
Why Not Contribute Upstream?
The task isolation and
Let me know if you have any other feedback!
forwarding:
But, more decisively, I believe Wendong will leave comments 👍
Thanks @MkDev11 and @a7m-1st! I want to share some architectural feedback.
The Problem
Looking at the code, the RAGToolkit currently mixes generic RAG functionality with eigent-specific concerns:
The toolkit itself shouldn't know about eigent's task isolation strategy. This makes the code:
Suggested Architecture
This way:
Would you be open to refactoring along these lines? Happy to discuss further! |
…on layer
Per Wendong-Fan's architectural feedback:
- RAGToolkit now accepts collection_name and storage_path as params
- Removed hardcoded task_* patterns from toolkit
- Task isolation handled in get_toolkits() in agent.py
- Toolkit is now generic and portable for upstream contribution
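The split described here (a generic toolkit, with task isolation decided by the caller) can be illustrated with a small sketch. GenericRAGToolkit and build_rag_toolkit are hypothetical names; in the PR this wiring lives inside get_toolkits() in agent.py, whose real signature is not reproduced, and the task_{id} format is only an assumption based on the task_* pattern mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class GenericRAGToolkit:
    """Knows nothing about tasks: it only receives a collection name and a storage path."""

    collection_name: str
    storage_path: Path


def build_rag_toolkit(task_id: str, base_path: Path) -> GenericRAGToolkit:
    """Hypothetical application-layer factory, standing in for logic inside get_toolkits()."""
    # The task-to-collection mapping lives here, not inside the toolkit (assumed naming scheme).
    return GenericRAGToolkit(collection_name=f"task_{task_id}", storage_path=base_path)
```

Two tasks can share one storage directory while writing to separate collections, and a non-eigent caller can construct GenericRAGToolkit directly with any collection name, which is what makes the toolkit portable upstream.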
@Wendong-Fan @a7m-1st I just refactored the PR! Please review again and let me know the result.
Thanks @MkDev11! Could @a7m-1st and @bytecraftii help review this?
Changes:
- Add DEFAULT_RAG_STORAGE_PATH and DEFAULT_COLLECTION_NAME constants
- Add TODO comments for embedding model flexibility
- Add get_task_collection_name() helper function in agent.py
- Simplify query results format (numbered list, no scores)
- Remove list_knowledge_bases from exposed tools (not useful with task isolation)
- Update tests to expect 3 tools instead of 4
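The "numbered list, no scores" result format could look something like the sketch below. format_query_results and its fallback message are hypothetical, and it assumes each retriever hit is a dict with a "text" key and the score stored under a separate key; that is roughly the shape CAMEL's VectorRetriever returns, but check the version in use.

```python
from __future__ import annotations


def format_query_results(results: list[dict]) -> str:
    """Render retriever hits as a numbered list, dropping similarity scores (hypothetical helper)."""
    # Assumes each hit stores its content under "text"; any score key is simply ignored.
    texts = [r.get("text", "") for r in results if r.get("text")]
    if not texts:
        return "No relevant information found in the knowledge base."
    return "\n".join(f"{i}. {t}" for i, t in enumerate(texts, start=1))
```

Dropping the scores keeps the tool output short and avoids the agent over-interpreting raw similarity values.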
- Add RAW_TEXT_SUBDIR constant for cleaner paths
- Fix docstring format with types (str), (int), etc.
- Change logger.debug to logger.warning for missing API key
- Add validation to raise ValueError if collection_name is None
- Update tests to pass collection_name and add validation test
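The RAW_TEXT_SUBDIR and ValueError changes can be sketched as a hypothetical constructor. ValidatedRAGToolkit, the "raw_text" value of the constant, and the storage/raw_text/collection directory layout are assumptions; only the constant name, the ValueError on a missing collection_name, and the (str)-style docstring types come from the commit message.

```python
from __future__ import annotations

from pathlib import Path

RAW_TEXT_SUBDIR = "raw_text"  # Assumed value; the commit only names the constant.


class ValidatedRAGToolkit:
    """Hypothetical toolkit showing constructor validation and raw-text path layout."""

    def __init__(self, collection_name: str | None, storage_path: str | Path) -> None:
        """Create the toolkit.

        Args:
            collection_name (str): Vector collection to read from and write to.
            storage_path (str): Root directory for local vector storage.

        Raises:
            ValueError: If collection_name is None.
        """
        if collection_name is None:
            raise ValueError("collection_name is required")
        self.collection_name = collection_name
        self.storage_path = Path(storage_path)

    def raw_text_dir(self) -> Path:
        # Keep raw-text uploads under one predictable subdirectory per collection (assumed layout).
        return self.storage_path / RAW_TEXT_SUBDIR / self.collection_name
```

Failing fast in the constructor surfaces a missing collection name at setup time rather than on the first add_document call.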
Keep RAGToolkit task-specific isolation handling
@Wendong-Fan @a7m-1st any update for me? |
I could test it after #999 |
…o new agent module structure
Closes #410
Description
This PR adds RAG (Retrieval-Augmented Generation) capability to eigent using CAMEL's built-in vector retrieval infrastructure.
What it does:
- RAGToolkit that lets agents store and query knowledge bases
Tools provided:
- add_document - Add text content to a knowledge base with optional metadata
- query_knowledge_base - Search for relevant information using semantic similarity
- list_knowledge_bases - Show available knowledge bases
How to use:
When creating an agent, select rag_toolkit from the tools list. The agent can then store and retrieve information from its knowledge base during task execution.
Live Test Results
Dependencies Added
- qdrant-client - Local vector database
- unstructured - Document parsing for CAMEL's VectorRetriever
What is the purpose of this pull request?