In [None]:
from langchain_ollama import ChatOllama

llm = ChatOllama(model="ministral-3:3b")

In [4]:
response = llm.invoke("How does RAG work?")
print(response.content)

**Retrieval-Augmented Generation (RAG)** is a technique used with **large language models (LLMs)** to improve accuracy and context awareness and to reduce hallucinations by combining **retrieval** (finding relevant information) with **generation** (producing text). It is particularly useful when the model’s internal knowledge cutoff (e.g., 2023-10-01 for me) is insufficient for up-to-date or domain-specific queries.

---

### **Core Components of RAG**
RAG typically consists of **three main parts**:

1. **Retrieval Module**
   - Fetches relevant information from an external knowledge base (e.g., a database, corpus, or vector store).
   - Uses techniques like:
     - **Keyword matching** (simple term-based search, e.g., **BM25**).
     - **Semantic search** (e.g., embeddings + cosine similarity, with a vector index such as **FAISS**).
     - **Hybrid approaches** (combining keyword + semantic search).
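
The three retrieval styles listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how any particular RAG library implements retrieval: the corpus `DOCS`, the helper names (`keyword_score`, `embed`, `cosine`, `retrieve`), and the mixing weight `alpha` are all hypothetical, and the "embedding" is a simple term-count vector standing in for a learned embedding model.

```python
import math
from collections import Counter

# Toy corpus standing in for an external knowledge base (hypothetical data).
DOCS = [
    "RAG combines retrieval with text generation",
    "FAISS is a library for vector similarity search",
    "Bananas are rich in potassium",
]

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def keyword_score(query, doc):
    """Keyword matching: number of distinct query terms found in the document."""
    doc_terms = set(tokenize(doc))
    return sum(1 for term in set(tokenize(query)) if term in doc_terms)

def embed(text):
    """Stand-in embedding: a sparse term-count vector, NOT a learned model."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1, alpha=0.5):
    """Hybrid ranking: weighted mix of keyword overlap and cosine similarity."""
    q_vec = embed(query)
    q_terms = len(set(tokenize(query))) or 1
    scored = []
    for doc in docs:
        kw = keyword_score(query, doc) / q_terms  # normalized to [0, 1]
        sem = cosine(q_vec, embed(doc))
        scored.append((alpha * kw + (1 - alpha) * sem, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

top = retrieve("vector similarity search", DOCS, k=1)
print(top)
```

Setting `alpha=1.0` gives pure keyword ranking and `alpha=0.0` gives pure similarity ranking. In a real pipeline the `embed` function would typically call an embedding model, and the linear scan over `docs` would be replaced by a vector index.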

2. **Augmentation**
   - The retrieved documents are **contextualized** (e.g., concatenated with the user’s query) t