# Building a RAG Application with Ollama deepseek-r1:32b, Llama Index, and LangChain


## 1. Introduction

Welcome to this step-by-step guide on building a **Retrieval-Augmented Generation (RAG)** application! In this notebook, we will combine the power of retrieval methods with advanced language generation techniques. Our goal is to create a system that can retrieve relevant information from your data sources and use a state-of-the-art language model to generate insightful responses.

### What is RAG?

RAG stands for *Retrieval-Augmented Generation*. It is an approach that:
- **Retrieves**: Searches and fetches relevant documents or pieces of data.
- **Generates**: Leverages a large language model (LLM) to produce contextually accurate and insightful outputs based on the retrieved information.

This technique is especially useful for tasks such as:
- Question Answering
- Chatbots and Conversational Agents
- Document Summarization
- Knowledge-based Systems
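
The retrieve-then-generate loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not the pipeline built later in this notebook: the retriever simply counts shared words instead of using embeddings, the sample documents are made up, and the function names (`retrieve`, `build_prompt`) are hypothetical. The final prompt is what would be handed to the LLM.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many query words they share (toy retriever)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    # Keep only documents that share at least one word, best matches first.
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved passages and the question into one LLM prompt."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


# Made-up sample documents, for illustration only.
documents = [
    "Employees receive 20 days of paid vacation per year.",
    "The office cafeteria serves lunch from noon to 2 pm.",
    "Unused vacation days can be carried over to the next year.",
]
question = "How many vacation days do employees receive?"
retrieved = retrieve(question, documents)
prompt = build_prompt(question, retrieved)
print(prompt)
```

A real RAG system replaces the word-overlap score with embedding similarity and sends the prompt to a language model, but the two stages stay the same.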

### Key Components

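1. **Ollama deepseek-r1:32b**:  
   A reasoning-focused large language model served locally through Ollama. It handles the generation step of our pipeline; embeddings for retrieval come from a separate, smaller model introduced later. We'll explain how to set it up and use it effectively, even if you are starting from scratch.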

2. **Llama Index**:  
   A tool to efficiently build and manage indexes over your data. It simplifies organizing and querying your documents.

3. **LangChain**:  
   A versatile framework that helps integrate various components (like LLMs and indexes) into a coherent application. It provides a high-level interface to work with large language models.

### What to Expect

In the sections that follow, we will cover:
- **Setting Up Your Environment**:  
  How to install and configure Ollama deepseek-r1:32b, including a dedicated section for those who haven’t set it up yet.

- **Data Ingestion & Indexing with Llama Index**:  
  Step-by-step instructions on how to prepare your data and build an index to enable efficient retrieval.

- **Integrating with LangChain**:  
  How to tie everything together by interfacing the index with your language model for retrieval-augmented generation.

- **Example Use Cases & Exercises**:  
  Practical code snippets and exercises to help you apply what you’ve learned in real-world scenarios.

By the end of this notebook, you’ll have a clear understanding of how to build and deploy your own RAG application, empowering you to tackle complex information retrieval and generation tasks.

Let's get started!


## 2. Environment Setup

In this section, we will prepare our environment by installing the necessary Python libraries, setting up Ollama with the **deepseek-r1:32b** model, and verifying that our setup is working correctly.


### 2.1. Installing Required Libraries

Our RAG application will leverage the following key Python packages:
- **LangChain**: For integrating with large language models.
- **Llama Index**: For building and querying document indexes.
- **Requests**: For making HTTP calls (useful if you interact with an API).

Open your terminal or command prompt and run the following command to install these packages:

```bash
pip install langchain llama-index requests
```

For the latest installation instructions or updates, please refer to the official documentation:
- [LangChain GitHub Repository](https://github.com/hwchase17/langchain)
- [Llama Index Documentation](https://gpt-index.readthedocs.io/en/latest/)
- [Requests Documentation](https://requests.readthedocs.io/en/latest/)

---

### 2.2. Setting Up Ollama and deepseek-r1:32b

**Ollama** is a platform that enables you to run large language models locally. In our case, we will use it to host the **deepseek-r1:32b** model.

#### Steps to Set Up:

1. **Install Ollama**:  
   Visit the [Ollama website](https://ollama.com) and follow the installation instructions for your operating system.

2. **Download deepseek-r1:32b**:  
   If you haven’t already downloaded the model, you can pull it via the Ollama CLI:
   ```bash
   ollama pull deepseek-r1:32b
   ```

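3. **Start the Model**:  
   Ensure that the Ollama server is running on your machine. The desktop app starts it in the background; on other setups, run `ollama serve` in a terminal. The model itself is loaded on demand the first time you send it a prompt (for example with `ollama run deepseek-r1:32b`), and `ollama list` shows which models have been downloaded. Consult the [Ollama documentation](https://ollama.com/docs) for detailed guidance.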

> **Note:** Ollama also exposes a local HTTP API (by default at `http://localhost:11434`). If you prefer it over the CLI, see the optional HTTP check in Section 2.3.3.

Once you have completed these steps, your local deepseek-r1:32b model should be ready for use.
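
As a quick sanity check of the steps above, you can ask the local Ollama server which models it has pulled. The sketch below assumes the server listens on the default address `http://localhost:11434` and uses Ollama's `/api/tags` endpoint, which lists local models. It relies only on the standard library so it runs before any extra packages are installed; the helper names are hypothetical.

```python
import json
import urllib.request


def model_is_available(tags_payload: dict, model_name: str) -> bool:
    """Return True if model_name appears in an Ollama /api/tags payload."""
    models = tags_payload.get("models", [])
    return any(entry.get("name") == model_name for entry in models)


def fetch_local_models(base_url: str = "http://localhost:11434") -> dict:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        payload = fetch_local_models()
        found = model_is_available(payload, "deepseek-r1:32b")
        print("deepseek-r1:32b available:", found)
    except (OSError, ValueError) as error:
        # Raised when the server is not running or returns something unexpected.
        print("Could not reach the Ollama server:", error)
```

If the check prints `False`, the server is running but the model has not been pulled yet; rerun `ollama pull deepseek-r1:32b`.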


---


### 2.3. Verifying the Setup

Before proceeding, let’s verify that both our Python environment and deepseek-r1:32b are working as expected.

#### 2.3.1. Verify Python Package Installation

Run the following code snippet in a Python cell to ensure that all necessary packages are installed and importable:

In [4]:
import langchain
import llama_index
import requests

#### 2.3.2. Verify deepseek-r1:32b via the Ollama CLI

If you’re using the Ollama CLI to interact with deepseek-r1:32b, you can run a quick test. Create a helper function in your notebook that sends a prompt to the model:

In [5]:
import subprocess

def query_deepseek(prompt: str) -> str:
    """
    Sends a prompt to the deepseek-r1:32b model via the Ollama CLI.
    """
    command = ["ollama", "run", "deepseek-r1:32b", prompt]
    result = subprocess.run(command, capture_output=True, text=True)
    
    if result.returncode != 0:
        raise RuntimeError(f"Error calling deepseek-r1:32b: {result.stderr}")
    
    return result.stdout.strip()

# Test the function:
try:
    test_response = query_deepseek("Hello, deepseek-r1:32b! Please confirm you are running.")
    print("Model Response:", test_response)
except Exception as e:
    print(e)


Model Response: <think>
Okay, so I'm trying to figure out how to confirm if the system named deepseek-r1:32b is running. Hmm, first off, I need to understand what exactly this refers to. The name seems like it could be a specific instance or model of an AI or some kind of software. Maybe it's related to machine learning or artificial intelligence processing units?

I'm not entirely sure how to approach this. Perhaps I should start by checking if there are any status indicators or logs that show whether the system is active. If I have access to the server where deepseek-r1:32b is running, maybe I can look at its process list using commands like 'top' or 'htop' in Linux. That would help me see if it's currently using CPU or memory resources.

Another thought: Maybe there's a web interface or dashboard that monitors system instances. If I can log into such a portal, I might find status information about deepseek-r1:32b there. I should check if I have the credentials for any monitoring too

#### 2.3.3. (Optional) Verify deepseek-r1:32b via an HTTP API

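The Ollama server exposes an HTTP API, by default at `http://localhost:11434`, so you can also test the model with the `requests` library. Setting `"stream": False` returns the whole answer as a single JSON object instead of a stream of partial responses. Adjust the endpoint if your server runs on a different host or port: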

In [32]:
import requests

def query_deepseek_api(prompt: str) -> dict:
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": "deepseek-r1:32b",
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()

# Example usage
if __name__ == "__main__":
    result = query_deepseek_api("Why is the sky blue?")
    print("API Response:", result)


API Response: {'model': 'deepseek-r1:32b', 'created_at': '2025-02-02T07:37:41.133629Z', 'response': "<think>\n\n</think>\n\nThe sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight passes through Earth's atmosphere, it interacts with molecules and small particles in the air. Sunlight is made up of different colors, each with its own wavelength. Blue light has a shorter wavelength compared to other colors like red or orange.\n\nAs sunlight travels through the atmosphere, the shorter blue wavelengths are scattered in all directions by the gases and molecules in the air, especially nitrogen and oxygen molecules. This scattering is much more effective for blue light than for the longer wavelengths like red. So, when you look up at the sky, you see the scattered blue light coming from all over the sky.\n\nThis explains why the sky looks blue during the day. At sunrise or sunset, the light has to pass through more of the atmosphere, which scatters away the blue 
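
As the responses above show, deepseek-r1 wraps its reasoning in `<think>...</think>` tags before the final answer. For a RAG application you usually want to show only the answer, so a small helper can separate the two. This is a minimal sketch: `split_reasoning` is a hypothetical name, and the sample string is made up to mirror the format seen in the outputs.

```python
import re

# Matches the first complete <think>...</think> block, including newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a deepseek-r1 style response string."""
    match = THINK_BLOCK.search(raw)
    if match is None:
        # No complete <think>...</think> block: treat everything as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer


# Made-up sample in the same format as the model output above.
raw_response = (
    "<think>\nThe user asks about sky colour.\n</think>\n\n"
    "The sky appears blue because of Rayleigh scattering."
)
reasoning, answer = split_reasoning(raw_response)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

With the HTTP helper above, you would pass `result["response"]` to `split_reasoning`.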

In [33]:
import requests
import time
import json

def query_deepseek_api(prompt: str) -> dict:
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": "deepseek-r1:32b",
        "prompt": prompt,
        "stream": False
    }
    
    start_time = time.perf_counter()
    response = requests.post(url, json=payload)
    elapsed_time = time.perf_counter() - start_time
    response.raise_for_status()
    result = response.json()
    
    # Ollama reports usage statistics in the JSON response body rather than in
    # HTTP headers; durations are given in nanoseconds.
    usage_keys = (
        "total_duration", "load_duration",
        "prompt_eval_count", "prompt_eval_duration",
        "eval_count", "eval_duration",
    )
    usage_info = {key: result[key] for key in usage_keys if key in result}
    if not usage_info:
        usage_info = "No usage data provided."
    
    telemetry = {
        "elapsed_time_seconds": elapsed_time,
        "status_code": response.status_code,
        "response_size_bytes": len(response.content),
        "usage": usage_info
    }
    
    result["telemetry"] = telemetry
    return result

# Example usage
if __name__ == "__main__":
    prompt = "Why is the sky blue?"
    result = query_deepseek_api(prompt)
    print("API Response:", result)


API Response: {'model': 'deepseek-r1:32b', 'created_at': '2025-02-02T07:40:00.147784Z', 'response': "<think>\n\n</think>\n\nThe sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight reaches Earth's atmosphere, it interacts with molecules and small particles in the air. Sunlight is made up of different colors, each with its own wavelength. Blue light has a shorter wavelength compared to other colors like red or orange.\n\nAs sunlight travels through the atmosphere, the shorter blue wavelengths are scattered in all directions by the gases and particles in the air. This scattering is much more effective for blue light than for longer wavelengths like red. So, when you look up at the sky, you're seeing the scattered blue light coming from all over the sky.\n\nDuring sunrise or sunset, the sun is lower on the horizon, so its light has to pass through more of the Earth's atmosphere. This causes more scattering of the shorter blue wavelengths, allowing the longer 
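
Non-streaming responses from Ollama's `/api/generate` endpoint also carry generation statistics in the JSON body, including `eval_count` (number of generated tokens) and `eval_duration` (generation time in nanoseconds). A rough throughput figure can be derived from them, which is useful when deciding whether a 32B model is fast enough on your hardware. The helper name and the sample numbers below are hypothetical.

```python
from typing import Optional


def tokens_per_second(result: dict) -> Optional[float]:
    """Compute generation speed from an Ollama /api/generate response body."""
    eval_count = result.get("eval_count")        # number of generated tokens
    eval_duration = result.get("eval_duration")  # generation time in nanoseconds
    if not eval_count or not eval_duration:
        return None
    return eval_count / (eval_duration / 1e9)


# Hypothetical response fragment with made-up numbers, for illustration only.
sample_result = {"eval_count": 300, "eval_duration": 15_000_000_000}
print(f"{tokens_per_second(sample_result):.1f} tokens/s")
```

In the notebook you would call `tokens_per_second(result)` on the dictionary returned by `query_deepseek_api`.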

## 3. Data Ingestion and Document Preparation

In this section, we'll load our documents and prepare them for indexing. Documents can be in various formats (e.g., plain text, PDFs). `SimpleDirectoryReader` picks a parser based on each file's extension, so the same call works for a folder of plain text files or a mixed folder.

---

### 3.1. Uploading Documents

Place your documents in a folder (for example, a directory named `data/`). We first install the modular Llama Index packages used in the rest of this notebook (core, file readers, the Ollama LLM integration, and Hugging Face embeddings), then use Llama Index's built-in `SimpleDirectoryReader` to load the files:

In [29]:
%pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface

Collecting llama-index-llms-ollama
  Downloading llama_index_llms_ollama-0.5.0-py3-none-any.whl.metadata (3.8 kB)
Collecting llama-index-embeddings-huggingface
  Downloading llama_index_embeddings_huggingface-0.5.1-py3-none-any.whl.metadata (767 bytes)
Collecting sentence-transformers>=2.6.1 (from llama-index-embeddings-huggingface)
  Downloading sentence_transformers-3.4.1-py3-none-any.whl.metadata (10 kB)
Collecting minijinja>=1.0 (from huggingface-hub[inference]>=0.19.0->llama-index-embeddings-huggingface)
  Downloading minijinja-2.7.0-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.metadata (8.6 kB)


Downloading llama_index_llms_ollama-0.5.0-py3-none-any.whl (6.7 kB)
Downloading llama_index_embeddings_huggingface-0.5.1-py3-none-any.whl (8.9 kB)
Downloading sentence_transformers-3.4.1-py3-none-any.whl (275 kB)
Downloading minijinja-2.7.0-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.7 MB)
   1.7/1.7 MB 20.4 MB/s eta 0:00:00
Installing collected packages: minijinja, llama-index-llms-ollama, sentence-transformers, llama-index-embeddings-huggingface
Successfully installed llama-index-embeddings-huggingface-0.5.1 llama-index-llms-ollama-0.5.0 minijinja-2.7.0 sentence-transformers-3.4.1
Note: you may need to restart the kernel to use updated packages.


In [31]:
from llama_index.core import SimpleDirectoryReader

# Define path to the root directory containing subfolders
directory_path = "/Users/rileymete/Downloads/CompanyBenefits"

# Load all files, including those inside subfolders
documents = SimpleDirectoryReader(directory_path, recursive=True).load_data()

print(f"Loaded {len(documents)} documents from {directory_path}")

Ignoring wrong pointing object 8 0 (offset 0)
Ignoring wrong pointing object 10 0 (offset 0)
Ignoring wrong pointing object 12 0 (offset 0)
Ignoring wrong pointing object 14 0 (offset 0)
[... further warnings of the same form omitted ...]

Loaded 383 documents from /Users/rileymete/Downloads/CompanyBenefits
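
Because `SimpleDirectoryReader` chooses a parser per file extension, it can help to check which file types a folder contains before loading it. The sketch below is a standard-library helper, not part of Llama Index; `count_file_types` is a hypothetical name and `"data"` is a placeholder path. Note that the number of loaded documents can exceed the number of files, since some readers (the PDF reader, typically) produce one document per page.

```python
from collections import Counter
from pathlib import Path


def count_file_types(root: str) -> Counter:
    """Count files under root (recursively) by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # "data" is a placeholder; point this at your own documents folder.
    for extension, count in count_file_types("data").most_common():
        print(f"{extension}: {count}")
```

If the folder holds file types you do not want indexed, `SimpleDirectoryReader` accepts a `required_exts` list (for example `[".txt", ".pdf"]`) to restrict what is loaded.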


## 4. Indexing with Llama Index – Advanced Customization

In this step, we focus on how to index and customize the processing of the documents you have already uploaded (e.g., into a local folder). This section covers:

1. **Loading Your Uploaded Documents**  
   Using Llama Index’s built-in loaders to ingest files from a directory.

2. **Transformations**  
   Customizing how the documents are split into nodes (chunks) and adding metadata to improve retrieval.

3. **Indexing and Querying**  
   Building a vector index with your transformed documents and querying it for relevant information.

---

### 4.1. Loading Your Uploaded Documents

Assuming you have already uploaded your documents into a local folder (for example, `./data`), you can use the `SimpleDirectoryReader` to load them. This repeats the loading step from Section 3.1 so that this section can be run on its own:


In [None]:
from llama_index.core import SimpleDirectoryReader

# Define path to the root directory containing subfolders (replace with your own folder)
directory_path = "/Users/rileymete/Downloads/CompanyBenefits"

# Load all files, including those inside subfolders
documents = SimpleDirectoryReader(directory_path, recursive=True).load_data()

print(f"Loaded {len(documents)} documents from {directory_path}")

### 4.2. Transformations

Before indexing, you can customize how your documents are processed. This typically involves splitting them into smaller chunks (nodes) and adding metadata. You have two main options: set a text splitter globally through `Settings.text_splitter`, so that every index built afterwards uses it, or pass it to a single index through the `transformations` argument. The next cell configures the splitter and applies the first option; the index-building cell further down uses the second.


In [34]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings

# Customize the text splitter with desired parameters
text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=10)

# Option 1: Set the custom text splitter globally
Settings.text_splitter = text_splitter

print("Custom text splitter configured: chunk size 512 tokens with a 10-token overlap.")

Custom text splitter configured: chunk size 512 tokens with a 10-token overlap.
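
To see what `chunk_size` and `chunk_overlap` mean in practice, here is a simplified sliding-window chunker. It is only an illustration under stated assumptions: it treats whitespace-separated words as tokens, whereas `SentenceSplitter` counts tokenizer tokens and tries to break on sentence boundaries. The function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk repeats the last word of the previous one, which is the role the overlap plays in the real splitter: context that straddles a chunk boundary is not lost entirely.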


#### Build the Vector Index Using Your Custom Transformation

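Now, build your vector index by passing the loaded documents along with your custom text splitter as a transformation. We also pass a local Hugging Face embedding model; without `embed_model`, Llama Index falls back to its default OpenAI embeddings, which require an API key.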

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex

# Create a local embedding model using a Hugging Face model
embedding_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Build the index with the custom text splitter transformation and the local embedding model
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    embed_model=embedding_model
)
print("Custom index built successfully!")

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]
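
Conceptually, a vector index stores one embedding per chunk and answers a query by returning the chunks whose embeddings are most similar to the query embedding. The sketch below shows that idea with cosine similarity in plain Python. It is not how `VectorStoreIndex` is implemented internally; the vectors are made up, and the chunk names and the `top_k` helper are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], chunk_vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vectors,
        key=lambda chunk_id: cosine_similarity(query_vector, chunk_vectors[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional vectors; embeddings from all-MiniLM-L6-v2 have 384 dimensions.
chunk_vectors = {
    "vacation_policy": [0.9, 0.1, 0.0],
    "cafeteria_hours": [0.0, 0.2, 0.9],
    "sick_leave": [0.7, 0.6, 0.1],
}
query_vector = [1.0, 0.2, 0.0]
print(top_k(query_vector, chunk_vectors))
```

The retrieved chunks are then placed in the prompt that is sent to the language model.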

#### (Optional) Save the Index to Disk

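You can persist your index to disk to avoid rebuilding it every time. In current Llama Index versions this is done through the index's storage context, which writes several JSON files into a directory.

In [None]:
# Persist the vector store, document store, and index metadata to a directory
index.storage_context.persist(persist_dir="custom_index")
print("Index saved to disk in 'custom_index/'.")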


#### Query the Index
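Finally, test your index by running a query. The query engine retrieves the most relevant chunks from your indexed documents and passes them to deepseek-r1:32b, served by Ollama, to generate the answer.

In [None]:
from llama_index.llms.ollama import Ollama
# Use the local deepseek-r1:32b model (the default LLM would be OpenAI);
# the generous timeout is a guess, since a 32B model can be slow to answer.
llm = Ollama(model="deepseek-r1:32b", request_timeout=300.0)
query_engine = index.as_query_engine(llm=llm)
query = "What are the latest trends in sports news?"
response = query_engine.query(query)

print("Query Response:")
print(response)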
