EZRAG

EZRAG is a lightweight Retrieval-Augmented Generation playground powered by Streamlit, Chroma DB, and SentenceTransformers. Drop Markdown or text files into documents/, fire up the UI, and start querying your knowledge base.

Project Structure

main.py – Streamlit UI that exposes query input, collection refresh, and results display.
chroma_impl.py – VectorStore wrapper responsible for loading documents, chunking, embedding, and talking to Chroma.
documents/ – Place .md, .markdown, or .txt sources here; everything inside is eligible for ingestion.
chroma_store/ – Created on first run to persist the Chroma collection locally.

Chroma Strategy

1. Chunking

Documents are chunked in chroma_impl.py using a tiered approach:

Split by Markdown headers (#-level 1–3) to preserve meaningful sections.
If the file lacks headers, fall back to paragraph splitting on blank lines.
As a final guard, split the text into sentences and keep those longer than 50 characters, so that even sparse text is captured.
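
The three tiers above can be sketched roughly as follows. This is an illustrative approximation, not the code in chroma_impl.py: the function name `chunk_text`, the regular expressions, and the fallback of returning the whole text when no sentence passes the length threshold are all assumptions.

```python
import re

# Zero-width match at the start of any level 1-3 Markdown header line.
HEADER_START = re.compile(r"^(?=#{1,3}[ \t]+\S)", re.MULTILINE)
BLANK_LINE = re.compile(r"\n\s*\n")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
MIN_SENTENCE_CHARS = 50  # threshold quoted in the text above


def chunk_text(text: str) -> list[str]:
    """Tiered chunking: headers first, then paragraphs, then long sentences."""
    text = text.strip()
    if not text:
        return []

    # Tier 1: split immediately before every level 1-3 header.
    if HEADER_START.search(text):
        return [p.strip() for p in HEADER_START.split(text) if p.strip()]

    # Tier 2: no headers, so split on blank lines.
    paragraphs = [p.strip() for p in BLANK_LINE.split(text) if p.strip()]
    if len(paragraphs) > 1:
        return paragraphs

    # Tier 3: a single undivided block, so keep only the long sentences.
    sentences = [s.strip() for s in SENTENCE_END.split(text)]
    long_sentences = [s for s in sentences if len(s) > MIN_SENTENCE_CHARS]
    # Assumption: fall back to the whole text rather than dropping the file.
    return long_sentences or [text]
```

For example, `chunk_text("# Title\nintro\n\n## Section\nbody")` yields one chunk per header section, each starting with its own header line.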

2. Embeddings & Storage

Each chunk is embedded with sentence-transformers/all-MiniLM-L6-v2 (384-dimensional cosine space).
Chunks are packaged with metadata (source, type, chunk_number, file_hash) and inserted into a persistent Chroma collection (chroma_store/).
Chroma IDs include the file hash so updates to document content automatically invalidate old entries.
Before inserting, the store inspects Chroma for existing entries that share the same filename and MD5 hash.
Unchanged files are skipped; modified files trigger a delete + reinsert cycle.
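
A minimal sketch of the skip / delete / reinsert logic described above, assuming `collection` behaves like a chromadb `Collection` (only its `get`, `delete`, and `add` methods are used). The names `file_hash` and `sync_file`, the `doc_type` parameter, and the `filename-hash-index` ID format are hypothetical; the real code in chroma_impl.py may differ.

```python
import hashlib


def file_hash(text: str) -> str:
    """MD5 of the file content; used for change detection, not security."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sync_file(collection, filename: str, text: str, chunks: list[str],
              doc_type: str = "markdown") -> str:
    """Store a file's chunks unless an identical version is already present.

    Returns "skipped", "inserted", or "updated".
    """
    digest = file_hash(text)

    # Look up every stored chunk that came from this filename.
    existing = collection.get(where={"source": filename})
    old_ids = existing["ids"]
    old_hashes = {m.get("file_hash") for m in (existing["metadatas"] or [])}

    if old_ids and old_hashes == {digest}:
        return "skipped"  # same filename, same hash: nothing to do

    if old_ids:
        collection.delete(ids=old_ids)  # drop chunks of the previous version

    if chunks:
        collection.add(
            # Assumed ID format: the hash in the ID changes whenever content does.
            ids=[f"{filename}-{digest}-{i}" for i in range(len(chunks))],
            documents=chunks,
            metadatas=[
                {"source": filename, "type": doc_type,
                 "chunk_number": i, "file_hash": digest}
                for i in range(len(chunks))
            ],
        )
    return "updated" if old_ids else "inserted"
```

The sketch leaves out precomputed vectors for brevity. With chromadb they could be supplied through the `embeddings=` argument of `add`; otherwise the collection's own embedding function is used.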

3. Search

Queries are encoded with the same model and sent to Chroma’s cosine-based similarity search.
Results include chunk metadata and raw text, which the Streamlit app surfaces with relevance hints and source details.
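
To make "cosine-based similarity" concrete, here is a brute-force sketch in plain Python. It is only an illustration: Chroma performs this ranking internally with its own index, and the function names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3):
    """Return (chunk_index, similarity) pairs for the k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In a cosine-space collection, Chroma reports a distance rather than a similarity (distance = 1 - similarity), so smaller values mean closer matches.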

Getting Started

Create & activate a virtual environment

```
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```
Add your documents
- Drop Markdown/text files into the documents/ folder.
Run the Streamlit interface
```
streamlit run main.py
```
