-
Notifications
You must be signed in to change notification settings - Fork 25
InferX Knowledge Base Toolkit How to
Jason edited this page May 20, 2026
·
1 revision
This toolkit converts source material into Markdown that is easier to use for retrieval, summarization, and prompt construction.
It provides a containerized document conversion workflow using inferx/knowledgebase:v0.10.
Use the container when you want to process files from /input and generate Markdown artifacts in /output.
Supported input types:
- DOCX
- PPTX
- HTML / HTM
- Markdown
- Text
The container walks /input recursively and processes all supported files it finds.
sudo docker run --rm \
-v /home/brad/test/input:/input \
-v /home/brad/test/output:/output \
-e "API_KEY=YOUR_API_KEY_HERE" \
inferx/knowledgebase:v0.10 \
base_url=https://model.inferx.net/funccall/tn-a3t79iogb2/endpoints/Qwen3-Coder-Next-FP8/v1 \
api_key=YOUR_API_KEY_HERE \
model=Qwen/Qwen3-Coder-Next-FP8-
merged.md: original output with boundaries and summaries -
optimized.md: compressed output for KV-cache-oriented usage -
llm.md: prompt-ready version with instructions and citation guidance
The llm.md output is intended to be used directly in prompts. It includes:
- a system instruction block
- citation rules
- section-preserving formatting
- contextual handling for diagrams and partially rendered formulas
Expected citation form:
[bitcoin.pdf, Section 4 - Proof-of-Work][bitcoin.pdf, Section 11][bitcoin.pdf, Section 5, Step 3]
Avoid citations that omit the filename or section reference.
-
base_url: LLM endpoint URL -
api_key: API key for authentication -
model: model identifier, defaulting toQwen/Qwen3-Coder-Next-FP8
-
API_KEY: alternative way to provide the API key
- Conversion is roughly 10 seconds per file
- Lossless compression is effectively immediate
- LLM-ready formatting is produced as part of normal output generation
- Lossless compression is deterministic and safe, but token reduction is small
- OCR models are preloaded into the Docker image
Use this image for local document sets that need Markdown conversion and prompt-ready output.