StructuredTextEngine

Fully local RAG backend — semantic retrieval, cosine reranking, and grounded LLM inference
Built from first principles · No framework abstraction

StructuredTextEngine is a fully local Retrieval-Augmented Generation (RAG) backend built from first principles. It loads documents from disk, indexes them into a persistent vector store, retrieves semantically relevant context, and generates grounded answers using a local LLM. Once the models have been downloaded, no internet connection is needed at query time.

What This Project Demonstrates

Building a RAG pipeline from scratch with clean modular architecture
Persistent vector storage using ChromaDB across sessions
Semantic retrieval + cosine similarity reranking for better relevance
Modular LLM provider design — swap providers without touching business logic
Dependency injection via Container — clean separation of concerns
Grounding enforcement — answers only from retrieved context
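
The provider-swapping and dependency-injection points above can be illustrated with a small sketch. Every name in it (`LLMProvider`, `EchoProvider`, the `Container` field, `TextService.answer`) is a hypothetical stand-in chosen for illustration; the project's actual wiring lives in app/core/container.py and may look different.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface: anything with generate() can act as a provider."""

    def generate(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for tests; a real one would call the local LLM."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class TextService:
    """Business logic depends only on the interface, not on a concrete provider."""

    def __init__(self, llm: LLMProvider) -> None:
        self._llm = llm

    def answer(self, question: str) -> str:
        return self._llm.generate(question)


@dataclass
class Container:
    """Builds the object graph in one place, so a provider is swapped here only."""

    llm: LLMProvider

    def text_service(self) -> TextService:
        return TextService(self.llm)


service = Container(llm=EchoProvider()).text_service()
print(service.answer("What is RAG?"))
```

Because `TextService` only sees the interface, replacing `EchoProvider` with another provider is a one-line change where the container is constructed.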

Features

Fully local pipeline — no API keys, no internet, runs entirely on-device
Persistent ChromaDB — documents indexed once, retrieved across sessions
Semantic search — SentenceTransformers embeddings for dense retrieval
Cosine similarity reranking — retrieved docs rescored against query before answering
Grounding enforcement — LLM answers only from retrieved context; returns "I don't know" otherwise
Modular architecture — Container wires all dependencies; each layer independently testable
Full pipeline logging — every stage logged for systematic debugging
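
As an illustration of the grounding rule listed above, a prompt builder might look roughly like the sketch below. The function name and the prompt wording are assumptions made for this example, not the templates in app/prompts/prompt_manager.py. The README also does not say whether grounding is enforced by the prompt alone or by an additional check.

```python
def build_grounded_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied context."""
    # One bullet per retrieved chunk keeps the context easy to read in logs.
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        'If the context does not contain the answer, reply exactly "I don\'t know".\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "What are embeddings?",
    ["Embeddings are numerical representations of text that capture semantic meaning."],
)
print(prompt)
```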

Tech Stack

Component	Technology
API	FastAPI, Uvicorn
LLM	Ollama (Mistral local)
Embeddings	SentenceTransformers (all-MiniLM-L6-v2)
Vector Store	ChromaDB (persistent)
Reranking	Cosine similarity (numpy)
Language	Python 3.11

Architecture

User Query (POST /process)
│
▼
FastAPI → Router → Controller
│
▼
TextService
├── VectorRetriever
│   ├── EmbeddingService → query embedding
│   ├── VectorStore (ChromaDB) → top-10 recall
│   └── Reranker → cosine similarity → top-3
├── PromptManager → build grounded prompt
└── LLMClient → Ollama/Mistral → answer
│
▼
TextResponse
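
The Reranker step in the diagram (top-10 recall narrowed to top-3 by cosine similarity) can be sketched as follows. The project lists numpy for this step; the version below uses only the standard library so it runs without dependencies, and the function names and the `(text, vector)` candidate shape are assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec, candidates, top_k=3):
    """candidates is a list of (text, vector); returns top_k (text, score), best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-D vectors stand in for real sentence embeddings.
query = [1.0, 0.0]
candidates = [
    ("orthogonal", [0.0, 1.0]),
    ("same direction", [2.0, 0.0]),
    ("diagonal", [1.0, 1.0]),
]
for text, score in rerank(query, candidates, top_k=2):
    print(f"{score:.3f}  {text}")
```

Cosine similarity ignores vector length, which is why the candidate pointing in the same direction as the query ranks first even though its magnitude differs.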

RAG Pipeline

Documents (docs/*.txt)
↓
DocumentLoader → chunks
↓
EmbeddingService → dense vectors
↓
ChromaDB (persistent storage)

Query
↓
EmbeddingService → query vector
↓
ChromaDB → top-10 recall
↓
Reranker → cosine similarity → top-3
↓
PromptManager → grounded prompt
↓
LLMClient → Ollama/Mistral
↓
Grounded Answer
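
The "DocumentLoader → chunks" step is described under Known Limitations as a split on double newlines with no overlap. A minimal sketch of that behaviour is shown below; the function name is hypothetical, and details such as whitespace stripping are assumptions rather than confirmed behaviour of document_loader.py.

```python
def chunk_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty chunks; chunks do not overlap."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


sample = "First paragraph.\n\nSecond paragraph,\nstill second.\n\n\n\nThird."
for i, chunk in enumerate(chunk_paragraphs(sample)):
    print(i, repr(chunk))
```

Single newlines stay inside a chunk, so a paragraph that wraps across lines is kept together.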

Example

Request

POST /process
{
  "text": "What are embeddings?"
}

Retrieved Context

Embeddings are numerical representations of text that capture semantic meaning.
Sentence Transformers are used to generate dense vector embeddings for semantic search.
Vector search finds semantically similar documents using cosine similarity.

Response

Embeddings are numerical representations of text that capture semantic meaning.

Evaluation

Five queries were tested against the local pipeline:

Query	Type	Result
What is Python?	factual	PASS ✅
What is RAG?	acronym query	PARTIAL ⚠️
What are embeddings?	conceptual	PASS ✅
What is the capital of France?	grounding test	PASS ✅
Who created FastAPI?	grounding test	PASS ✅

Score: 4/5 (four passes, one partial)
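
The README does not state how scripts/evaluate.py assigns PASS or PARTIAL. One hypothetical rubric, assuming a grounding test passes when the engine replies "I don't know" and other queries are graded on expected keywords, could be sketched like this. The function, the `kind` labels, and the keyword lists are all illustrative assumptions.

```python
def grade(answer: str, kind: str, keywords: tuple[str, ...] = ()) -> str:
    """Return PASS, PARTIAL, or FAIL for one answer (hypothetical rubric)."""
    text = answer.lower()
    if kind == "grounding":
        # Out-of-scope question: the only acceptable reply is a refusal.
        return "PASS" if "i don't know" in text else "FAIL"
    # In-scope question: count how many expected keywords appear.
    hits = sum(1 for word in keywords if word.lower() in text)
    if keywords and hits == len(keywords):
        return "PASS"
    return "PARTIAL" if hits else "FAIL"


print(grade("I don't know.", "grounding"))
print(grade("Embeddings are numerical representations.", "conceptual",
            ("numerical", "semantic")))
```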

How to Run

1. Clone and install dependencies

git clone https://github.com/solankinitish/structured-text-engine
cd structured-text-engine
pip install -r requirements.txt

2. Install Ollama and pull Mistral

ollama pull mistral

3. Start the API server

uvicorn app.api.server:app --reload

4. Run the evaluation

python -m scripts.evaluate

5. Test manually

curl -X POST http://localhost:8000/process \
  -H "Content-Type: application/json" \
  -d '{"text": "What are embeddings?"}'
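
The same request can be sent from Python using only the standard library. This is a sketch: the helper names are made up for the example, the response is assumed to be a JSON object, and the generous timeout reflects the local LLM latency noted under Known Limitations.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/process"  # same endpoint as the curl example


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(text: str, timeout: float = 120.0) -> dict:
    """Send the question and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the server from step 3 to be running):
# print(ask("What are embeddings?"))
```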

Project Structure

structured-text-engine/
├── app/
│   ├── api/
│   │   ├── server.py           — FastAPI app, middleware
│   │   ├── routes.py           — API endpoints
│   │   ├── middleware.py       — request logging
│   │   └── error_handlers.py  — exception handling
│   ├── core/
│   │   └── container.py        — dependency injection
│   ├── llm/
│   │   └── llm_client.py       — Ollama LLM wrapper
│   ├── retrieval/
│   │   ├── document_loader.py  — loads .txt files from docs/
│   │   ├── embedding_service.py — SentenceTransformers embeddings
│   │   ├── vector_store.py     — ChromaDB persistent store
│   │   ├── vector_retriever.py — semantic retrieval + reranking
│   │   └── reranker.py         — cosine similarity reranker
│   ├── prompts/
│   │   └── prompt_manager.py   — prompt templates
│   ├── services/
│   │   └── text_service.py     — orchestrates RAG pipeline
│   ├── models/
│   │   └── schemas.py          — Pydantic request/response models
│   └── utils/
│       └── logger.py           — logging
├── docs/
│   └── knowledge.txt           — document knowledge base
├── scripts/
│   └── evaluate.py             — evaluation benchmark
├── data/
│   └── vector_db/              — ChromaDB persistent storage
├── requirements.txt
└── .gitignore

Known Limitations

Acronym sensitivity: queries that use an acronym (e.g. "RAG") may fail to retrieve documents that use only the full term; query expansion would address this
Static knowledge base: documents are loaded once at startup; adding new documents requires a restart
Plain-text ingestion only: DocumentLoader reads .txt files; PDF and HTML are not supported
Paragraph chunking: splits on double newlines, with no overlap and no semantic chunking
Local LLM latency: Ollama/Mistral typically takes 10-40 s per query
No authentication: the API is open and has no rate limiting
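
The query expansion suggested for the acronym limitation is not implemented in the project. A minimal sketch of the idea, with a hypothetical hand-written glossary and function name, might be:

```python
import re

# Hypothetical glossary; a real one would be built from the knowledge base.
ACRONYMS = {
    "RAG": "Retrieval-Augmented Generation",
    "LLM": "large language model",
}


def expand_acronyms(query: str, glossary: dict[str, str] = ACRONYMS) -> str:
    """Append the long form of each known acronym that appears as a whole word."""
    expansions = [
        full
        for short, full in glossary.items()
        if re.search(rf"\b{re.escape(short)}\b", query)
    ]
    if not expansions:
        return query
    return f"{query} ({'; '.join(expansions)})"


print(expand_acronyms("What is RAG?"))
```

The expanded query would then be embedded in place of the original, so documents that spell out the full term have a better chance of being retrieved. Matching is case-sensitive here, which avoids expanding ordinary words such as "rag".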

Summary

StructuredTextEngine demonstrates a fully local production-style RAG backend combining persistent vector storage, semantic retrieval, cosine similarity reranking, modular dependency injection, and local LLM inference — built and understood from first principles.
