# RAG in Production: From Prototype to Deployment

[Fertiglobe](https://www.linkedin.com/company/fertiglobe/)

[medium: Building and Evaluating your First RAG](https://medium.com/henkel-data-and-analytics/building-and-evaluating-your-first-rag-90da2c55e6ff)

[repo: prod-rag](https://github.com/El-Moghazy2/prod-rag)


> **Based on:** ["Building and Evaluating your First RAG"](https://medium.com/henkel-data-and-analytics/building-and-evaluating-your-first-rag-90da2c55e6ff) by Abdelrhman ElMoghazy, Henkel Data & Analytics

This 3-hour hands-on workshop expands the original article into a complete, production-oriented RAG system. Instead of Azure OpenAI, all models run locally through **Ollama**, so the workshop incurs no API costs.

---

## Learning Objectives

By the end of this workshop, you will be able to:

1. **Build a RAG pipeline** from scratch using LangGraph, FAISS, and Ollama
2. **Add guardrails** for input validation, prompt injection detection, and output grounding
3. **Engineer prompts** using 4 different strategies and compare them quantitatively
4. **Optimize context** through chunk size tuning, k-value analysis, and re-ranking
5. **Evaluate automatically** using RAGAS with automated test set generation

---

## Prerequisites

- Python 3.10+
- [Ollama](https://ollama.com) installed and running
- Models pulled with `ollama pull`: `llama3.2`, `nomic-embed-text`, and `llama-guard3` (the last one is used for the Section 2 guardrails)
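
A quick way to confirm the Python prerequisite from inside a notebook or script is sketched below. The helper name `python_is_supported` and the `MIN_PYTHON` constant are illustrative and not part of the workshop repository.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here
    return tuple(version[:2]) >= tuple(minimum)


current = f"{sys.version_info.major}.{sys.version_info.minor}"
if python_is_supported():
    print(f"Python {current}: OK")
else:
    print(f"Python {current} is too old; 3.10+ is required")
```

Running this in the kernel you plan to use for the workshop catches the case where the notebook is attached to a different interpreter than the environment you created.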

---
## Section 0: Workshop Setup (~5 min)
---

### Environment Setup

Create and activate a conda environment before installing dependencies:

```bash
conda create -n rag-workshop python=3.10 -y
conda activate rag-workshop
```

**Or** using pip with a virtual environment:

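```bash
python -m venv rag-workshop

# Then activate it with ONE of the following, depending on your OS.

# Windows (PowerShell): allow local activation scripts, then activate
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
rag-workshop\Scripts\activate

# macOS / Linux
source rag-workshop/bin/activate
```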

**Or** using [Hatch](https://hatch.pypa.io):

```bash
pip install hatch
hatch env create
hatch shell
```

### Install Ollama

Download and install Ollama from [ollama.com/download](https://ollama.com/download), then pull the required models:

```bash
ollama serve        # start the Ollama server (keep running in a separate terminal)
ollama pull llama3.2           # LLM for generation
ollama pull nomic-embed-text   # embedding model for retrieval
ollama pull llama-guard3       # content safety classifier (Section 2 guardrails)
```

```python
# Install all dependencies
%pip install langchain langchain-ollama langchain-community langgraph faiss-cpu \
    PyMuPDF requests ragas python-dotenv openpyxl pandas matplotlib numpy pydantic rank-bm25 -q
```

Note: you may need to restart the kernel to use updated packages.

```python
# Verify Ollama is running and models are available
import requests

OLLAMA_BASE_URL = "http://localhost:11434"
required = ["llama3.2", "nomic-embed-text"]

try:
    # /api/tags lists the models that have been pulled locally
    response = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=5)
    response.raise_for_status()
    models = [m["name"] for m in response.json().get("models", [])]
    print(f"Ollama is running! Available models: {models}")

    for model in required:
        # Substring match, so "llama3.2" matches "llama3.2:latest"
        found = any(model in m for m in models)
        status = "FOUND" if found else "MISSING - run: ollama pull " + model
        print(f"  {model}: {status}")
except requests.ConnectionError:
    print("ERROR: Ollama is not running! Start it with: ollama serve")
except requests.RequestException as exc:
    # Timeouts and HTTP error statuses would otherwise surface as a raw traceback
    print(f"ERROR: Could not query Ollama at {OLLAMA_BASE_URL}: {exc}")
```

```
Ollama is running! Available models: ['llama-guard3:latest', 'nomic-embed-text:latest', 'llama3.2:latest']
  llama3.2: FOUND
  nomic-embed-text: FOUND
```
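
The substring test in the check above (`model in m`) would also accept a differently named model that happens to share a prefix, for example a tag such as `llama3.2-vision:11b`. A stricter comparison could look like the following sketch. `missing_models` is an illustrative helper, not part of the workshop code, and it assumes tags follow the `name:tag` form shown in the output above.

```python
def missing_models(available, required):
    """Return the required models that have no matching entry in `available`.

    A required name without a tag ("llama3.2") matches any tag of exactly
    that model ("llama3.2:latest"), but not a different model that merely
    shares the prefix ("llama3.2-vision:11b"). A required name that already
    includes a tag must match an available entry exactly.
    """
    missing = []
    for name in required:
        if ":" in name:
            found = name in available
        else:
            found = any(tag.split(":", 1)[0] == name for tag in available)
        if not found:
            missing.append(name)
    return missing


available = ["llama-guard3:latest", "nomic-embed-text:latest", "llama3.2:latest"]
print(missing_models(available, ["llama3.2", "nomic-embed-text"]))  # → []
print(missing_models(["llama3.2-vision:11b"], ["llama3.2"]))        # → ['llama3.2']
```

Because the function takes plain lists, it can be fed the `models` list from the verification cell and reused later to check for `llama-guard3` before the guardrails section.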
