Document management for AI agents in Magento 2. Upload files that agents can retrieve as context before answering queries — enabling retrieval-augmented generation (RAG) without a vector database.
- Upload and manage documents (PDF, TXT) in the Magento admin
- Documents are stored and indexed so that agents can fetch relevant excerpts at query time
- Integrates with `Gtstudio_AiAgents`: assign a knowledge base to any agent

Requirements:
- Magento 2.4.4+
- PHP 8.1+
- `Gtstudio_AiConnector` enabled and configured
- `Gtstudio_AiAgents` enabled
- `smalot/pdfparser: ^2.12` (PDF text extraction)

```bash
php bin/magento module:enable Gtstudio_AiKnowledgeBase
php bin/magento setup:upgrade
php bin/magento setup:di:compile
php bin/magento setup:static-content:deploy -f --area adminhtml
php bin/magento cache:flush
```

Navigate to AI Studio → Agents & Tools → Knowledge Base. Click Add New and fill in:

| Field | Description |
|---|---|
| Title | Human-readable label (auto-populated from PDF metadata on upload) |
| Upload PDF Document | Upload a PDF file — text and metadata are extracted automatically |
| Content | Extracted text (editable; used for retrieval) |
| Tags | Comma-separated keywords (auto-populated from PDF metadata) |
| Agents | Associate this document with one or more agents |
| Is Active | Only active entries are searchable by agents |
When an agent that has knowledge base documents attached receives a question:
- The question is matched against document excerpts using keyword or semantic similarity
- Relevant excerpts are prepended to the agent's system prompt as context
- The agent responds with awareness of those excerpts
No full document text is sent to the LLM — only the most relevant excerpts, keeping token usage low.
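
The steps above can be sketched in a few lines of PHP. This is only an illustration of keyword-based matching, not the module's actual retrieval service: the function names `selectRelevantExcerpts` and `buildSystemPrompt`, the scoring rule (count of distinct question keywords found in a chunk), and the `Context:` prefix are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: score each chunk by how many distinct question
 * keywords it contains, and return up to $limit chunks with a score above 0.
 *
 * @param string[] $chunks
 * @return string[]
 */
function selectRelevantExcerpts(string $question, array $chunks, int $limit = 3): array
{
    // Lowercase the question and keep distinct words of 3+ characters.
    $words = preg_split('/[^a-z0-9]+/', strtolower($question), -1, PREG_SPLIT_NO_EMPTY);
    $keywords = array_unique(array_filter($words, fn (string $w): bool => strlen($w) >= 3));

    $scores = [];
    foreach ($chunks as $index => $chunk) {
        $haystack = strtolower($chunk);
        $score = 0;
        foreach ($keywords as $keyword) {
            // Plain substring match; a real service may tokenise or stem instead.
            if (str_contains($haystack, $keyword)) {
                $score++;
            }
        }
        if ($score > 0) {
            $scores[$index] = $score;
        }
    }

    // Highest score first; arsort() keeps the chunk indexes as keys.
    arsort($scores);

    $selected = [];
    foreach (array_slice(array_keys($scores), 0, $limit) as $index) {
        $selected[] = $chunks[$index];
    }
    return $selected;
}

/**
 * Hypothetical helper: prepend the selected excerpts to the system prompt.
 *
 * @param string[] $excerpts
 */
function buildSystemPrompt(string $systemPrompt, array $excerpts): string
{
    if ($excerpts === []) {
        return $systemPrompt;
    }
    return "Context:\n" . implode("\n---\n", $excerpts) . "\n\n" . $systemPrompt;
}
```

Calling `selectRelevantExcerpts($question, $chunks, 2)` and passing the result to `buildSystemPrompt()` yields a prompt that carries only the matching excerpts, which is the token-saving behaviour described above.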
The text extraction pipeline uses a registry pattern. Register a custom extractor for a new MIME type:
```xml
<!-- etc/di.xml -->
<type name="Gtstudio\AiKnowledgeBase\Model\Extractor\ExtractorPool">
    <arguments>
        <argument name="extractors" xsi:type="array">
            <item name="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                  xsi:type="object">Vendor\Module\Model\Extractor\DocxExtractor</item>
        </argument>
    </arguments>
</type>
```

Implement `Gtstudio\AiKnowledgeBase\Api\ExtractorInterface`:

```php
interface ExtractorInterface
{
    /**
     * Extract plain text from the given file path.
     */
    public function extract(string $filePath): string;
}
```

Override the retrieval service to use a vector database, OpenSearch k-NN, or any other similarity search:

```xml
<preference for="Gtstudio\AiKnowledgeBase\Api\RetrievalServiceInterface"
            type="Vendor\Module\Model\VectorRetrievalService"/>
```

Document chunking (splitting documents into excerpt-sized pieces) can be customised:

```xml
<type name="Gtstudio\AiKnowledgeBase\Model\Chunker\TextChunker">
    <arguments>
        <!-- Maximum characters per chunk -->
        <argument name="chunkSize" xsi:type="number">1500</argument>
        <!-- Overlap between consecutive chunks -->
        <argument name="overlap" xsi:type="number">200</argument>
    </arguments>
</type>
```

| Table | Purpose |
|---|---|
| `gtstudio_ai_knowledge_base` | Document metadata (name, description, file path, agent association) |
| `gtstudio_ai_knowledge_base_chunk` | Extracted text chunks ready for retrieval |
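
The rows in the chunk table come from splitting the extracted text. As a rough illustration of how a `chunkSize` / `overlap` pair could interact, here is a sliding-window sketch. It is not the module's `TextChunker` code: the function name `chunkText` is hypothetical, and the rule that each chunk starts `chunkSize - overlap` characters after the previous one is an assumption. It relies on the `mbstring` extension so that multibyte characters are not split.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: split text into chunks of at most $chunkSize characters,
 * each starting ($chunkSize - $overlap) characters after the previous one.
 *
 * @return string[]
 */
function chunkText(string $text, int $chunkSize = 1500, int $overlap = 200): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require chunkSize > 0 and 0 <= overlap < chunkSize.');
    }

    $length = mb_strlen($text);
    $step = $chunkSize - $overlap;
    $chunks = [];

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = mb_substr($text, $start, $chunkSize);
        // Stop once a chunk has reached the end of the text.
        if ($start + $chunkSize >= $length) {
            break;
        }
    }
    return $chunks;
}
```

With the values shown in the configuration (1500 and 200), consecutive chunks under this assumption would share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.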

| Resource | Controls |
|---|---|
| `Gtstudio_AiKnowledgeBase::management` | Access to the Knowledge Base admin section |
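
Returning to the custom extractor extension point described earlier, the following is a minimal sketch of what an `ExtractorInterface` implementation might look like. It uses HTML rather than DOCX only because that needs no extra extensions. The class name `HtmlExtractor` and the idea of registering it under the `text/html` MIME type are assumptions for illustration, and the interface is redeclared locally so the example runs on its own.

```php
<?php
declare(strict_types=1);

// Stand-in copy of the interface so the sketch is self-contained; a real
// module would import Gtstudio\AiKnowledgeBase\Api\ExtractorInterface instead.
interface ExtractorInterface
{
    public function extract(string $filePath): string;
}

/** Hypothetical extractor that turns an HTML file into plain text. */
final class HtmlExtractor implements ExtractorInterface
{
    public function extract(string $filePath): string
    {
        if (!is_readable($filePath)) {
            throw new RuntimeException("Cannot read file: {$filePath}");
        }
        $html = file_get_contents($filePath);
        if ($html === false) {
            throw new RuntimeException("Failed to load file: {$filePath}");
        }

        // Remove script and style elements together with their contents.
        $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

        // Replace the remaining tags with spaces so adjacent blocks do not run together.
        $text = preg_replace('/<[^>]+>/', ' ', $html) ?? $html;

        // Decode entities such as &amp; into their characters.
        $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

        // Collapse runs of whitespace into single spaces.
        return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
    }
}
```

Under these assumptions, the class would be registered in `etc/di.xml` in the same way as the DOCX example, with `text/html` as the item name.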