Gtstudio_AiKnowledgeBase

Document management for AI agents in Magento 2. Upload files that agents can retrieve as context before answering queries — enabling retrieval-augmented generation (RAG) without a vector database.

What It Does

Upload and manage documents (PDF, TXT) in the Magento admin
Documents are stored and indexed so that agents can fetch relevant excerpts at query time
Integrates with Gtstudio_AiAgents — assign a knowledge base to any agent

Requirements

Magento 2.4.4+
PHP 8.1+
Gtstudio_AiConnector enabled and configured
Gtstudio_AiAgents enabled
smalot/pdfparser: ^2.12 (PDF text extraction)

Installation

php bin/magento module:enable Gtstudio_AiKnowledgeBase
php bin/magento setup:upgrade
php bin/magento setup:di:compile
php bin/magento setup:static-content:deploy -f --area adminhtml
php bin/magento cache:flush

Usage

Uploading Documents

Navigate to AI Studio → Agents & Tools → Knowledge Base.

Click Add New and fill in the following fields:

Field	Description
Title	Human-readable label (auto-populated from PDF metadata on upload)
Upload PDF Document	Upload a PDF file — text and metadata are extracted automatically
Content	Extracted text (editable; used for retrieval)
Tags	Comma-separated keywords (auto-populated from PDF metadata)
Agents	Associate this document with one or more agents
Is Active	Only active entries are searchable by agents

How Retrieval Works

When an agent that has knowledge base documents attached receives a question:

The question is matched against document excerpts using keyword or semantic similarity
Relevant excerpts are prepended to the agent's system prompt as context
The agent responds with awareness of those excerpts

The full document text is never sent to the LLM. Only the most relevant excerpts are included, which keeps token usage low.
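The three steps above can be sketched in plain PHP. This is only an illustration of keyword-overlap matching, not the module's actual retrieval code: the function names `kbTokenize`, `kbSelectExcerpts`, and `kbBuildSystemPrompt`, the scoring rule, and the prompt layout are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Split text into distinct lowercase ASCII word tokens (hypothetical helper).
 *
 * @return string[]
 */
function kbTokenize(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? [] : array_values(array_unique($words));
}

/**
 * Return up to $limit excerpts that share at least one word with the
 * question, ordered by the number of shared words (highest first).
 *
 * @param string[] $excerpts
 * @return string[]
 */
function kbSelectExcerpts(string $question, array $excerpts, int $limit = 3): array
{
    $questionWords = kbTokenize($question);
    $scored = [];
    foreach ($excerpts as $excerpt) {
        $score = count(array_intersect($questionWords, kbTokenize($excerpt)));
        if ($score > 0) {
            $scored[] = ['score' => $score, 'text' => $excerpt];
        }
    }
    usort($scored, fn(array $a, array $b): int => $b['score'] <=> $a['score']);
    return array_column(array_slice($scored, 0, $limit), 'text');
}

/**
 * Prepend the selected excerpts to the agent's system prompt.
 * The "Context:" layout is an assumption, not the module's format.
 *
 * @param string[] $excerpts
 */
function kbBuildSystemPrompt(string $basePrompt, array $excerpts): string
{
    if ($excerpts === []) {
        return $basePrompt;
    }
    return "Context:\n- " . implode("\n- ", $excerpts) . "\n\n" . $basePrompt;
}
```

With a question such as "How long does shipping take?", only excerpts containing one of those words would be selected, and the agent's original prompt is left unchanged when nothing matches.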

Extensibility

Supporting Additional File Formats

The text extraction pipeline uses a registry pattern. To support a new MIME type, register a custom extractor in the ExtractorPool, keyed by that MIME type:

<!-- etc/di.xml -->
<type name="Gtstudio\AiKnowledgeBase\Model\Extractor\ExtractorPool">
    <arguments>
        <argument name="extractors" xsi:type="array">
            <item name="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xsi:type="object">Vendor\Module\Model\Extractor\DocxExtractor</item>
        </argument>
    </arguments>
</type>

Implement Gtstudio\AiKnowledgeBase\Api\ExtractorInterface:

namespace Gtstudio\AiKnowledgeBase\Api;

interface ExtractorInterface
{
    /**
     * Extract plain text from the given file path.
     */
    public function extract(string $filePath): string;
}
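As an illustration of the contract, here is a minimal sketch of an extractor for HTML files. The class name `HtmlExtractor` and the `text/html` use case are hypothetical, and the interface is redeclared locally only so the snippet runs on its own; a real module would implement `Gtstudio\AiKnowledgeBase\Api\ExtractorInterface` directly.

```php
<?php
declare(strict_types=1);

// Local stand-in for Gtstudio\AiKnowledgeBase\Api\ExtractorInterface so the
// sketch is self-contained. In a real module, import the interface instead.
interface ExtractorInterface
{
    public function extract(string $filePath): string;
}

/**
 * Hypothetical extractor that turns an HTML file into plain text.
 */
class HtmlExtractor implements ExtractorInterface
{
    public function extract(string $filePath): string
    {
        if (!is_file($filePath) || !is_readable($filePath)) {
            throw new RuntimeException("Cannot read file: {$filePath}");
        }
        $html = file_get_contents($filePath);
        if ($html === false) {
            throw new RuntimeException("Failed to load file: {$filePath}");
        }

        // Remove script and style elements, whose bodies strip_tags would keep as text.
        $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;
        // Insert a space before each tag so adjacent elements do not run together.
        $text = strip_tags(str_replace('<', ' <', $html));
        // Decode entities such as &amp; into their characters.
        $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        // Collapse whitespace runs into single spaces.
        return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
    }
}
```

Such a class would then be registered in the extractor pool under the `text/html` key, following the same di.xml pattern shown for the DOCX example.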

Custom Retrieval Strategy

Override the retrieval service to use a vector database, OpenSearch k-NN, or any other similarity search:

<preference for="Gtstudio\AiKnowledgeBase\Api\RetrievalServiceInterface"
            type="Vendor\Module\Model\VectorRetrievalService"/>
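The methods of RetrievalServiceInterface are not listed in this README, so the sketch below does not implement it. It only shows the ranking step a vector-based replacement might perform internally: scoring stored chunk embeddings against a query embedding by cosine similarity. The function names and the shape of the input arrays (chunk ID mapped to a list of floats) are assumptions, and producing the embeddings is out of scope here.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two equal-length numeric vectors.
 *
 * @param float[] $a
 * @param float[] $b
 */
function kbCosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction, treat it as unrelated
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Rank chunks by similarity to the query and keep the best $limit.
 *
 * @param float[] $queryVector
 * @param array<int, float[]> $chunkVectors chunk ID => embedding (assumed shape)
 * @return array<int, float> chunk ID => similarity, highest first
 */
function kbRankChunks(array $queryVector, array $chunkVectors, int $limit = 3): array
{
    $scores = [];
    foreach ($chunkVectors as $chunkId => $vector) {
        $scores[$chunkId] = kbCosineSimilarity($queryVector, $vector);
    }
    arsort($scores);
    return array_slice($scores, 0, $limit, true);
}
```

In practice a vector database or OpenSearch k-NN would perform this ranking server-side; the in-memory loop is only meant to make the idea concrete.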

Chunking Strategy

Document chunking (splitting documents into excerpt-sized pieces) can be customised:

<type name="Gtstudio\AiKnowledgeBase\Model\Chunker\TextChunker">
    <arguments>
        <!-- Maximum characters per chunk -->
        <argument name="chunkSize" xsi:type="number">1500</argument>
        <!-- Overlap between consecutive chunks -->
        <argument name="overlap" xsi:type="number">200</argument>
    </arguments>
</type>
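To make the effect of these two arguments concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the module's TextChunker; the function name `kbChunkText` is hypothetical, and it assumes each chunk starts `chunkSize - overlap` characters after the previous one. For simplicity it counts bytes, so multibyte text would need `mb_strlen` and `mb_substr` instead.

```php
<?php
declare(strict_types=1);

/**
 * Split text into chunks of at most $chunkSize bytes, where consecutive
 * chunks share $overlap bytes.
 *
 * @return string[]
 */
function kbChunkText(string $text, int $chunkSize = 1500, int $overlap = 200): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require chunkSize > 0 and 0 <= overlap < chunkSize.');
    }

    $chunks = [];
    $length = strlen($text);
    $step = $chunkSize - $overlap; // distance between chunk start offsets

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = substr($text, $start, $chunkSize);
        if ($start + $chunkSize >= $length) {
            break; // this chunk already reaches the end of the text
        }
    }
    return $chunks;
}
```

With the values shown above (1500 and 200), a 3000-character document would yield three chunks starting at offsets 0, 1300, and 2600. A larger overlap reduces the chance that a relevant sentence is split across a chunk boundary, at the cost of storing more duplicated text.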

Database Tables

Table	Purpose
`gtstudio_ai_knowledge_base`	Document metadata (name, description, file path, agent association)
`gtstudio_ai_knowledge_base_chunk`	Extracted text chunks ready for retrieval

ACL Resources

Resource	Controls
`Gtstudio_AiKnowledgeBase::management`	Access to the Knowledge Base admin section
