# Notebook 1: Setup & Basics

**Difficulty:** Beginner | **Estimated Time:** 45-60 minutes

## Learning Objectives

By the end of this notebook, you will be able to:

1. ✅ Understand LlamaIndex architecture and modular ecosystem
2. ✅ Install and configure LlamaIndex with the latest modular packages
3. ✅ Configure the Settings object (LLM, embeddings, chunk size)
4. ✅ Create your first VectorStoreIndex from documents
5. ✅ Execute basic queries and analyze responses
6. ✅ Understand the Document → Node → Index flow

## Prerequisites

- Python 3.12+ installed
- Understanding of embeddings and vector similarity
- OpenAI API key (get from https://platform.openai.com/)

## Curriculum Coverage

- **Section 1.1:** Introduction to LlamaIndex
- **Section 1.2:** Installation and Configuration
- **Section 1.3.1:** Index Types (overview)
- **Section 1.3.2:** Document and Node Structure (basics)
- **Section 1.4:** Global Settings and Configuration

---

## 1. Introduction to LlamaIndex

### What is LlamaIndex?

LlamaIndex is a **data framework** for LLM-based applications, specifically designed to:

- **Ingest** data from various sources (PDFs, APIs, databases)
- **Index** data into optimized structures for retrieval
- **Query** data with natural language
- **Integrate** with LLMs for context-aware responses

### Why LlamaIndex Matters

For ML engineers, LlamaIndex solves key challenges:

1. **Context Window Limitations**: LLMs have token limits (~8k-128k). LlamaIndex retrieves only the most relevant chunks for each query, so the corpus can be far larger than the context window.
2. **Semantic Search**: Goes beyond keyword matching using embedding-based similarity.
3. **Source Attribution**: Tracks which documents contribute to responses.
4. **Production-Ready**: Modular architecture, extensive integrations, active development.

### Architecture Overview

```
Documents → Nodes → Index → Query Engine → LLM → Response
     ↓         ↓       ↓          ↓
   Load     Chunk   Embed     Retrieve
```

**Key Components:**

- **Documents**: Raw data sources (PDFs, text, APIs)
- **Nodes**: Chunked text with metadata
- **Embeddings**: Vector representations of nodes
- **Index**: Optimized storage for retrieval (VectorStoreIndex, SummaryIndex, etc.)
- **Query Engine**: Orchestrates retrieval and synthesis
- **Response Synthesis**: Combines retrieved context with LLM generation
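
To make the flow concrete, here is a minimal, standard-library-only sketch of the same pipeline. It is not how LlamaIndex is implemented: `ToyNode`, `toy_embed`, `build_index`, and `retrieve` are hypothetical names, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """A chunk of text plus the metadata inherited from its document."""
    text: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text: str, metadata: dict, max_words: int = 12) -> list:
    """Load -> Chunk: split a document into fixed-size word windows."""
    words = text.split()
    return [
        ToyNode(" ".join(words[i:i + max_words]), dict(metadata))
        for i in range(0, len(words), max_words)
    ]


def build_index(docs: list) -> list:
    """Chunk -> Embed: keep (vector, node) pairs in a plain list."""
    nodes = [node for text, meta in docs for node in chunk(text, meta)]
    return [(toy_embed(node.text), node) for node in nodes]


def retrieve(index: list, query: str, top_k: int = 1) -> list:
    """Retrieve: rank every node by similarity to the query, keep top_k."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [node for _, node in ranked[:top_k]]


docs = [
    ("LlamaIndex connects language models to external data sources.", {"source": "intro"}),
    ("Vector embeddings capture the semantic meaning of text.", {"source": "embeddings"}),
]
toy_index = build_index(docs)
top = retrieve(toy_index, "what do embeddings capture?")
print(top[0].metadata)  # {'source': 'embeddings'}
```

In the real pipeline, the retrieved nodes would then be passed to the LLM as context for response synthesis.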

---

## 2. Installation & Modular Architecture

### New Modular Package Structure

LlamaIndex has shifted to a **three-tier architecture**:

1. **llama-index-core**: Base abstractions (no integrations)
2. **Integration packages**: Specific LLMs, embeddings, vector stores
   - `llama-index-llms-openai`
   - `llama-index-embeddings-huggingface`
   - `llama-index-vector-stores-qdrant`
3. **llama-index (meta)**: Bundles core + default integrations

### Why Modular?

- **Cherry-pick** only what you need
- **Independent versioning** for each integration
- **Smaller dependencies** = faster installs
- **Future-proof** with active development

### Installation

If you followed the README setup, dependencies are already installed from `requirements.txt`. If not:

```bash
pip install llama-index-core llama-index-llms-openai llama-index-embeddings-openai llama-index-readers-file
```
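
If you are unsure whether the packages are present, a quick check with the standard library's `importlib.metadata` can help. This is only a sketch: `REQUIRED` mirrors the `pip install` line above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
REQUIRED = [
    "llama-index-core",
    "llama-index-llms-openai",
    "llama-index-embeddings-openai",
    "llama-index-readers-file",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:35s} {ver or 'NOT INSTALLED'}")
```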

---

## 3. Imports & API Key Configuration

In [1]:
# Core LlamaIndex
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings, Document
from llama_index.core.node_parser import SentenceSplitter

# LLM Integration
from llama_index.llms.openai import OpenAI

# Embedding Integration
from llama_index.embeddings.openai import OpenAIEmbedding

# Utilities
from dotenv import load_dotenv
import os
import warnings
warnings.filterwarnings('ignore')

print("✅ Imports successful!")

✅ Imports successful!


### Load API Keys from .env

In [2]:
# Load environment variables from .env file
load_dotenv()

# Verify OpenAI API key is set
openai_api_key = os.getenv("OPENAI_API_KEY")

if not openai_api_key:
    raise ValueError(
        "❌ OPENAI_API_KEY not found!\n"
        "Please create a .env file in the project root with:\n"
        "OPENAI_API_KEY=your_key_here"
    )

print(f"✅ OpenAI API key loaded (starts with: {openai_api_key[:8]}...)")

✅ OpenAI API key loaded (starts with: sk-proj-...)
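
For intuition about what `load_dotenv()` does, here is a heavily simplified sketch. `parse_env_lines` and `apply_env` are hypothetical helpers, not part of python-dotenv, and the real library handles more cases (quoting rules, `export` prefixes, variable interpolation). The one behavior worth remembering is that, by default, values already set in the environment are not overridden by the `.env` file.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and # comments (simplified)."""
    pairs = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip().strip("'\"")
    return pairs


def apply_env(pairs, environ, override=False):
    """Copy pairs into environ; existing keys win unless override is True."""
    for key, value in pairs.items():
        if override or key not in environ:
            environ[key] = value


# A plain dict stands in for os.environ so the sketch has no side effects.
fake_environ = {"OPENAI_API_KEY": "from-shell"}
pairs = parse_env_lines([
    "# local secrets",
    "OPENAI_API_KEY=from-dotenv-file",
    "OTHER='x'",
])
apply_env(pairs, fake_environ)
print(fake_environ)  # the shell value is kept, OTHER is added
```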


---

## 4. Global Settings Configuration

### Understanding the Settings Object

The `Settings` object is the **modern way** to configure LlamaIndex globally (replaces deprecated `ServiceContext`).

**Key Configuration Options:**

- `Settings.llm`: Default LLM for query engines
- `Settings.embed_model`: Default embedding model
- `Settings.chunk_size`: Default chunk size for text splitting
- `Settings.chunk_overlap`: Overlap between chunks
- `Settings.node_parser`: Default node parser

### Why Configure Settings?

- **Consistency**: All components use same LLM/embeddings
- **Convenience**: No need to pass parameters repeatedly
- **Best Practices**: Centralized configuration
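
The "global default, optional local override" idea can be sketched in a few lines of plain Python. This illustrates the pattern only and is not LlamaIndex's implementation; `ToySettings`, `TOY_SETTINGS`, and `make_query_engine` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySettings:
    """A tiny stand-in for a global configuration object."""
    llm: str = "default-llm"
    chunk_size: int = 1024


# One shared instance that every component reads from.
TOY_SETTINGS = ToySettings()


def make_query_engine(llm: Optional[str] = None) -> dict:
    """Use the explicit argument if given, otherwise fall back to the global."""
    return {"llm": llm if llm is not None else TOY_SETTINGS.llm}


TOY_SETTINGS.llm = "gpt-4o-mini"                          # configure once
default_engine = make_query_engine()                      # picks up the global
override_engine = make_query_engine(llm="another-model")  # local override wins

print(default_engine)   # {'llm': 'gpt-4o-mini'}
print(override_engine)  # {'llm': 'another-model'}
```

LlamaIndex follows the same spirit: components fall back to `Settings` unless you pass an explicit argument, for example `index.as_query_engine(llm=...)`.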

In [3]:
# Configure LLM
Settings.llm = OpenAI(
    model="gpt-4o-mini",  # Fast, cost-effective for most use cases
    temperature=0.1,      # Low temperature for consistent responses
)

# Configure Embedding Model
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # 1536 dimensions, good balance
    dimensions=1536,                 # Can be reduced for speed (e.g., 512)
)

# Configure Text Chunking
Settings.chunk_size = 1024           # Tokens per chunk (typical: 512-1024)
Settings.chunk_overlap = 200         # 20% overlap helps preserve context

# Configure Node Parser
Settings.node_parser = SentenceSplitter(
    chunk_size=Settings.chunk_size,
    chunk_overlap=Settings.chunk_overlap,
)

print("✅ Global Settings configured successfully!")
print(f"   LLM: {Settings.llm.model}")
print(f"   Embedding: {Settings.embed_model.model_name}")
print(f"   Chunk size: {Settings.chunk_size} tokens")
print(f"   Chunk overlap: {Settings.chunk_overlap} tokens")

✅ Global Settings configured successfully!
   LLM: gpt-4o-mini
   Embedding: text-embedding-3-small
   Chunk size: 1024 tokens
   Chunk overlap: 200 tokens
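
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window sketch. It is an assumption-laden illustration: `sliding_chunks` is a hypothetical helper, integers stand in for tokens, and the real `SentenceSplitter` prefers sentence boundaries, so its chunk edges will differ.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    stride = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


tokens = list(range(2500))  # pretend document of 2,500 tokens
chunks = sliding_chunks(tokens, chunk_size=1024, chunk_overlap=200)

print([len(c) for c in chunks])              # [1024, 1024, 852]
print(chunks[0][-200:] == chunks[1][:200])   # True: 200 shared tokens
```

With these settings each window advances by 824 tokens. A document shorter than `chunk_size` produces a single chunk, which is why the short sample documents later in this notebook each become exactly one node.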


### 🎯 ML Engineering Note: Embedding Dimensions

**text-embedding-3-small** supports variable dimensions:

- **1536 (default)**: Best quality, slowest similarity search, most storage
- **512**: Faster similarity search, 67% less storage, usually minimal quality loss
- **256**: Fastest similarity search, 83% less storage, noticeable quality loss

**Trade-off**: For this tutorial, we use 1536 for best quality. In production, benchmark with your data!
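
The storage percentages follow directly from the vector length. The sketch below assumes each dimension is stored as a 4-byte float32 and ignores metadata and index overhead; `index_size_bytes` and `storage_saving` are hypothetical helper names.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def index_size_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw vector storage, ignoring metadata and index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def storage_saving(dimensions: int, baseline: int = 1536) -> float:
    """Fraction of storage saved relative to the baseline dimension."""
    return 1 - dimensions / baseline


for dims in (1536, 512, 256):
    megabytes = index_size_bytes(1_000_000, dims) / 1e6
    print(f"{dims:>5} dims: {megabytes:>7.0f} MB per million vectors, "
          f"{storage_saving(dims):.0%} saved")
```

For one million vectors this gives 6144 MB, 2048 MB, and 1024 MB respectively.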

---

## 5. Loading Your First Document

### Creating Sample Data

For this example, we'll create a simple text document about LlamaIndex. In practice, you'd load from PDFs, APIs, databases, etc.

In [4]:
# Create sample documents (in practice, load from files)
documents = [
    Document(
        text="""
        LlamaIndex is a data framework for large language models (LLMs). 
        It provides tools to ingest, structure, and access private or domain-specific data.
        LlamaIndex was created to solve the problem of connecting LLMs to external data sources.
        The framework supports various data sources including PDFs, databases, APIs, and web pages.
        """,
        metadata={"source": "intro", "category": "overview"}
    ),
    Document(
        text="""
        Vector embeddings are numerical representations of text that capture semantic meaning.
        In LlamaIndex, embeddings enable semantic search - finding relevant content based on meaning,
        not just keyword matching. The default embedding model is OpenAI's text-embedding-3-small,
        which produces 1536-dimensional vectors. Other models like all-MiniLM-L6-v2 produce 384 dimensions.
        """,
        metadata={"source": "embeddings", "category": "technical"}
    ),
    Document(
        text="""
        The VectorStoreIndex is the most common index type in LlamaIndex. It stores document embeddings
        in a vector database and performs similarity search during queries. When you query the index,
        it retrieves the most semantically similar chunks and passes them to the LLM as context.
        This is the foundation of Retrieval-Augmented Generation (RAG).
        """,
        metadata={"source": "vector_index", "category": "technical"}
    ),
]

print(f"✅ Created {len(documents)} sample documents")
print(f"   Total characters: {sum(len(doc.text) for doc in documents)}")

✅ Created 3 sample documents
   Total characters: 1169


### Understanding Document Objects

**Document** is the base container in LlamaIndex:

```python
Document(
    text="...",           # The actual content
    metadata={...},       # Custom metadata (source, date, author, etc.)
    doc_id="...",        # Optional: explicit ID
)
```

**Metadata** is crucial for:
- Filtering during retrieval
- Source attribution in responses
- Provenance tracking
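
As a rough illustration of metadata filtering, the sketch below keeps only items whose metadata matches every requested key. It uses plain dictionaries and a hypothetical `filter_by_metadata` helper rather than LlamaIndex's own metadata filter classes; the metadata values mirror the sample documents in this notebook.

```python
def filter_by_metadata(items, **required):
    """Keep items whose metadata matches every required key/value pair."""
    return [
        item for item in items
        if all(item["metadata"].get(key) == value for key, value in required.items())
    ]


# Plain dicts standing in for Document/Node objects.
items = [
    {"text": "LlamaIndex is a data framework...", "metadata": {"source": "intro", "category": "overview"}},
    {"text": "Vector embeddings are...", "metadata": {"source": "embeddings", "category": "technical"}},
    {"text": "The VectorStoreIndex is...", "metadata": {"source": "vector_index", "category": "technical"}},
]

technical = filter_by_metadata(items, category="technical")
print([item["metadata"]["source"] for item in technical])  # ['embeddings', 'vector_index']
```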

---

## 6. Creating Your First VectorStoreIndex

### The Magic: from_documents()

This single method handles:
1. **Chunking**: Splits documents into nodes using `Settings.node_parser`
2. **Embedding**: Generates vectors using `Settings.embed_model`
3. **Indexing**: Stores in vector store (in-memory by default)

In [5]:
# Create index from documents
print("Creating VectorStoreIndex...")
print("This will:")
print("  1. Chunk documents into nodes")
print("  2. Generate embeddings for each node")
print("  3. Store in in-memory vector store\n")

index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,  # Display progress bar
)

print("\n✅ Index created successfully!")

Creating VectorStoreIndex...
This will:
  1. Chunk documents into nodes
  2. Generate embeddings for each node
  3. Store in in-memory vector store



Parsing nodes:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]


✅ Index created successfully!


### What Just Happened?

Behind the scenes:

1. **Document → Nodes**: Each document was split into smaller chunks (nodes)
2. **Nodes → Embeddings**: Each node was embedded using OpenAI's API
3. **Embeddings → Index**: Vectors were stored in `SimpleVectorStore` (in-memory)

**Index Types** (we'll explore others in later notebooks):
- `VectorStoreIndex`: Semantic similarity search (most common)
- `SummaryIndex`: Sequential scanning (good for summaries)
- `TreeIndex`: Hierarchical structure
- `KeywordTableIndex`: Keyword extraction
- `KnowledgeGraphIndex`: Entity relationships

---

## 7. Basic Querying

### Creating a Query Engine

In [6]:
# Create query engine from index
query_engine = index.as_query_engine(
    similarity_top_k=2,  # Retrieve top 2 most similar chunks
    response_mode="compact",  # Compact response synthesis
)

print("✅ Query engine created!")
print(f"   Top-K: {2}")
print(f"   Response mode: compact")

✅ Query engine created!
   Top-K: 2
   Response mode: compact


### Execute Your First Query

In [7]:
# Query the index
query = "What is LlamaIndex used for?"
print(f"Query: {query}\n")

response = query_engine.query(query)

print("Response:")
print("-" * 80)
print(response)
print("-" * 80)

Query: What is LlamaIndex used for?

Response:
--------------------------------------------------------------------------------
LlamaIndex is used as a data framework for large language models (LLMs) to ingest, structure, and access private or domain-specific data. It connects LLMs to various external data sources such as PDFs, databases, APIs, and web pages.
--------------------------------------------------------------------------------


### Analyzing the Response

Let's examine what was retrieved and how it was used:

In [8]:
# Inspect source nodes
print(f"Number of source nodes: {len(response.source_nodes)}\n")

for i, node in enumerate(response.source_nodes, 1):
    print(f"Source Node {i}:")
    print(f"  Score: {node.score:.4f}")  # Cosine similarity; higher means more similar
    print(f"  Metadata: {node.metadata}")
    print(f"  Text (first 200 chars): {node.text[:200]}...")
    print()

Number of source nodes: 2

Source Node 1:
  Score: 0.7579
  Metadata: {'source': 'intro', 'category': 'overview'}
  Text (first 200 chars): LlamaIndex is a data framework for large language models (LLMs). 
        It provides tools to ingest, structure, and access private or domain-specific data.
        LlamaIndex was created to solve th...

Source Node 2:
  Score: 0.5726
  Metadata: {'source': 'vector_index', 'category': 'technical'}
  Text (first 200 chars): The VectorStoreIndex is the most common index type in LlamaIndex. It stores document embeddings
        in a vector database and performs similarity search during queries. When you query the index,
  ...



### 🎯 ML Engineering Note: Similarity Scores

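**Score Interpretation** (cosine similarity ranges from -1 to 1, though scores for text embeddings are usually positive; the bands below are rough rules of thumb that vary by embedding model):
- **1.0**: Perfect match (vectors point in the same direction)
- **0.9-1.0**: Highly relevant
- **0.7-0.9**: Relevant
- **0.5-0.7**: Somewhat relevant
- **< 0.5**: Likely not relevant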

**Why this matters**: You can set similarity thresholds to filter out low-quality retrievals.
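
Here is a minimal sketch of how a cosine score is computed and how a threshold filter might work. `cosine_similarity` and `filter_by_score` are hypothetical helpers written for illustration; the two scores are the ones printed for the source nodes above, and the 0.7 cutoff is an arbitrary example value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


def filter_by_score(scored_nodes, cutoff):
    """Drop retrieved nodes whose similarity falls below the cutoff."""
    return [(name, score) for name, score in scored_nodes if score >= cutoff]


print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite direction)

retrieved = [("intro", 0.7579), ("vector_index", 0.5726)]
print(filter_by_score(retrieved, cutoff=0.7))  # [('intro', 0.7579)]
```

In LlamaIndex itself, the `SimilarityPostprocessor` node postprocessor with its `similarity_cutoff` parameter serves this purpose; check the current documentation for exact usage.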

---

## 8. Experimenting with Different Queries

### Query 1: Embedding-Specific Question

In [9]:
query1 = "How do embeddings work in LlamaIndex?"
response1 = query_engine.query(query1)

print(f"Query: {query1}\n")
print("Response:")
print(response1)
print("\nTop retrieved source:")
print(f"  Category: {response1.source_nodes[0].metadata.get('category')}")
print(f"  Score: {response1.source_nodes[0].score:.4f}")

Query: How do embeddings work in LlamaIndex?

Response:
Embeddings in LlamaIndex function as numerical representations of text that capture semantic meaning. They facilitate semantic search, allowing users to find relevant content based on meaning rather than relying solely on keyword matching. The framework utilizes a default embedding model, which generates 1536-dimensional vectors, while alternative models can produce vectors with fewer dimensions, such as 384. This capability enhances the connection between large language models and external data sources.

Top retrieved source:
  Category: technical
  Score: 0.7120


### Query 2: RAG-Specific Question

In [10]:
query2 = "What is Retrieval-Augmented Generation?"
response2 = query_engine.query(query2)

print(f"Query: {query2}\n")
print("Response:")
print(response2)

Query: What is Retrieval-Augmented Generation?

Response:
Retrieval-Augmented Generation (RAG) is a process that involves retrieving semantically similar chunks of information from a vector database and using them as context for generating responses. This approach enhances the generation of text by incorporating relevant information based on meaning rather than just keywords.


---

## 9. Understanding the Document → Node → Index Flow

### Inspecting Nodes Directly

In [11]:
# Parse documents into nodes manually to understand the flow
from llama_index.core.node_parser import SentenceSplitter

parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
nodes = parser.get_nodes_from_documents(documents)

print(f"Number of nodes created: {len(nodes)}\n")

for i, node in enumerate(nodes, 1):
    print(f"Node {i}:")
    print(f"  ID: {node.node_id}")
    print(f"  Text length: {len(node.text)} characters")
    print(f"  Metadata: {node.metadata}")
    print(f"  Relationships: {node.relationships}")
    print()

Number of nodes created: 3

Node 1:
  ID: fc9acd1e-5da7-4216-9e0e-b48877c19812
  Text length: 354 characters
  Metadata: {'source': 'intro', 'category': 'overview'}
  Relationships: {<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='b565cf69-bc9f-49e5-a398-a8e3c1275d0a', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'source': 'intro', 'category': 'overview'}, hash='6b67115e521a90d22245235f7a97b2808451a01b637bc146fc4c6b1b0126d392')}

Node 2:
  ID: 7d63512c-ce44-41da-b632-a3cd076a7e5f
  Text length: 395 characters
  Metadata: {'source': 'embeddings', 'category': 'technical'}
  Relationships: {<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='5b0c8e3e-de61-4805-a087-12679c0c674d', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'source': 'embeddings', 'category': 'technical'}, hash='c74e00722500b0b510e568d1fd63750c229430501577b8d0a9295eaa48b99fb3')}

Node 3:
  ID: 5e1995cb-eac9-4076-a302-556d411caa18
  Text length: 366 characters
  Metadata: {'source': 'vector_index', 'cat

### Node Relationships

Nodes track relationships:
- **SOURCE**: Link to original document
- **PREVIOUS/NEXT**: Sequential order
- **PARENT/CHILD**: Hierarchical structure

This enables advanced retrieval (covered in Notebook 4).
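
As a preview of why these links are useful, the sketch below models SOURCE, PREVIOUS, and NEXT links with a plain dataclass and uses them to pull in a retrieved chunk's neighbors. `LinkedChunk`, `link_chunks`, and `with_neighbors` are hypothetical names for illustration, not LlamaIndex classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedChunk:
    node_id: str
    text: str
    source_id: str                 # SOURCE: the originating document
    prev_id: Optional[str] = None  # PREVIOUS chunk in reading order
    next_id: Optional[str] = None  # NEXT chunk in reading order


def link_chunks(doc_id, texts):
    """Create chunks for one document and wire up prev/next links."""
    chunks = [LinkedChunk(f"{doc_id}-{i}", text, doc_id) for i, text in enumerate(texts)]
    for earlier, later in zip(chunks, chunks[1:]):
        earlier.next_id = later.node_id
        later.prev_id = earlier.node_id
    return {chunk.node_id: chunk for chunk in chunks}


def with_neighbors(store, node_id):
    """Return the hit plus its previous and next chunks, in reading order."""
    hit = store[node_id]
    ids = [hit.prev_id, hit.node_id, hit.next_id]
    return [store[i].text for i in ids if i is not None]


store = link_chunks("doc", ["first part.", "second part.", "third part."])
print(with_neighbors(store, "doc-1"))  # ['first part.', 'second part.', 'third part.']
print(with_neighbors(store, "doc-0"))  # ['first part.', 'second part.']
```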

---

## 10. Adjusting similarity_top_k

### Impact of Top-K on Responses

In [12]:
# Test with different top_k values
test_query = "Explain vector embeddings"

for k in [1, 2, 3]:
    engine = index.as_query_engine(similarity_top_k=k)
    response = engine.query(test_query)
    
    print(f"\nTop-K = {k}:")
    print(f"  Retrieved {len(response.source_nodes)} nodes")
    print(f"  Response length: {len(str(response))} characters")
    print(f"  First source score: {response.source_nodes[0].score:.4f}")


Top-K = 1:
  Retrieved 1 nodes
  Response length: 395 characters
  First source score: 0.5801

Top-K = 2:
  Retrieved 2 nodes
  Response length: 444 characters
  First source score: 0.5801

Top-K = 3:
  Retrieved 3 nodes
  Response length: 433 characters
  First source score: 0.5801
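
Notice that the first source score is identical for every value of k. Top-k retrieval ranks all nodes by similarity and then keeps the first k, so raising k only appends lower-ranked nodes. A minimal sketch follows; `top_k` is a hypothetical helper, and apart from the 0.5801 shown above, the scores are made up for illustration.

```python
import heapq


def top_k(scored_nodes, k):
    """Return the k highest-scoring (name, score) pairs, best first."""
    return heapq.nlargest(k, scored_nodes, key=lambda item: item[1])


# Only 0.5801 comes from the output above; the other two scores are invented.
scored = [("intro", 0.41), ("embeddings", 0.5801), ("vector_index", 0.47)]

for k in (1, 2, 3):
    hits = top_k(scored, k)
    print(k, [name for name, _ in hits])
```

The best hit stays in first place while additional, weaker hits are added behind it.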


### 🎯 ML Engineering Note: Choosing Top-K

**Trade-offs**:

- **Low K (1-2)**:
  - ✅ Faster queries
  - ✅ Lower LLM costs (fewer tokens)
  - ❌ May miss relevant context

- **Medium K (3-5)**:
  - ✅ Balanced retrieval
  - ✅ Good default for most use cases
  - ⚠️ Moderate cost/speed

- **High K (10+)**:
  - ✅ Comprehensive context
  - ❌ Slower queries
  - ❌ Higher LLM costs
  - ❌ Risk of context dilution

**Best Practice**: Start with k=3, tune based on your data and query complexity.
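
A quick back-of-the-envelope check helps when picking k: retrieved context costs at most `top_k * chunk_size` tokens. The sketch below uses hypothetical helper names, and the 1,000 tokens reserved for the prompt template and the answer is an assumed figure, not a LlamaIndex default.

```python
def context_tokens(top_k: int, chunk_size: int) -> int:
    """Upper bound on retrieved context; real chunks may be shorter."""
    return top_k * chunk_size


def fits_in_one_prompt(top_k, chunk_size, context_window, reserved_tokens=1000):
    """True if all retrieved chunks fit alongside the reserved tokens."""
    return context_tokens(top_k, chunk_size) + reserved_tokens <= context_window


def max_top_k(chunk_size, context_window, reserved_tokens=1000):
    """Largest k whose chunks fit in a single prompt."""
    return max((context_window - reserved_tokens) // chunk_size, 0)


print(context_tokens(3, 1024))            # 3072
print(fits_in_one_prompt(3, 1024, 8192))  # True
print(fits_in_one_prompt(10, 1024, 8192)) # False
print(max_top_k(1024, 8192))              # 7
```

When the retrieved chunks do not fit in one prompt, synthesis needs more than one LLM call, which is where the extra latency and cost of a high k come from.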

---

## 11. Response Modes

LlamaIndex supports different **response synthesis** strategies:

In [13]:
# Test different response modes
modes = ["compact", "tree_summarize", "simple_summarize"]
test_query = "What are the key features of LlamaIndex?"

for mode in modes:
    engine = index.as_query_engine(
        similarity_top_k=2,
        response_mode=mode
    )
    response = engine.query(test_query)
    
    print(f"\nMode: {mode}")
    print(f"Response: {response}")
    print("-" * 80)


Mode: compact
Response: LlamaIndex is a data framework designed for large language models, offering tools to ingest, structure, and access private or domain-specific data. It connects LLMs to various external data sources, including PDFs, databases, APIs, and web pages. A key feature is the VectorStoreIndex, which stores document embeddings in a vector database and enables similarity searches during queries, facilitating the retrieval of semantically similar chunks to provide context for the LLM. This functionality underpins Retrieval-Augmented Generation (RAG).
--------------------------------------------------------------------------------

Mode: tree_summarize
Response: LlamaIndex is a data framework designed for large language models, offering tools to ingest, structure, and access private or domain-specific data. It connects LLMs to various external data sources, including PDFs, databases, APIs, and web pages. A key feature is the VectorStoreIndex, which stores document embedding

### Response Mode Comparison

| Mode | How It Works | Best For |
|------|-------------|----------|
| **compact** | Packs chunks into as few prompts as fit, then refines across them | Balanced quality/speed |
| **tree_summarize** | Builds summary tree hierarchically | Large context, comprehensive answers |
| **simple_summarize** | Truncates chunks to fit a single prompt, single LLM call | Quick answers; detail may be lost |
| **refine** | Iteratively refines answer with each chunk | High quality, slower |
| **accumulate** | Generates separate answer per chunk | Multiple perspectives |

**Default**: `compact` (good balance for most use cases)
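
The practical difference between modes is mostly how many LLM calls they make. The sketch below imitates three of the strategies with a stub LLM that only counts calls. It is a simplified illustration, not LlamaIndex's implementation: `CountingLLM`, `pack`, and the character-based limit are all assumptions (real synthesizers budget by tokens).

```python
class CountingLLM:
    """Stub LLM: counts calls and returns a placeholder answer."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer after call {self.calls}"


def refine(chunks, query, llm):
    """One call per chunk; each call revises the previous answer."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Q: {query}\nExisting answer: {answer}\nNew context: {chunk}")
    return answer


def pack(chunks, max_chars):
    """Greedily merge chunks into as few prompts as fit under max_chars."""
    packed, current = [], ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def compact(chunks, query, llm, max_chars):
    """Pack first, then refine over the (fewer) packed chunks."""
    return refine(pack(chunks, max_chars), query, llm)


def accumulate(chunks, query, llm):
    """Answer the query separately for every chunk."""
    return [llm(f"Q: {query}\nContext: {chunk}") for chunk in chunks]


chunks = ["a" * 40, "b" * 40, "c" * 40]

refine_llm, compact_llm, accumulate_llm = CountingLLM(), CountingLLM(), CountingLLM()
refine(chunks, "q", refine_llm)
compact(chunks, "q", compact_llm, max_chars=100)
answers = accumulate(chunks, "q", accumulate_llm)

print(refine_llm.calls, compact_llm.calls, len(answers))  # 3 2 3
```

With three 40-character chunks and a 100-character limit, packing merges the first two chunks, so the compact strategy needs two calls where refine needs three.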

---

## 12. Summary: What You Learned

### ✅ Completed Learning Objectives

1. **LlamaIndex Architecture**: Understood the modular package structure and data flow
2. **Installation**: Set up latest modular packages (llama-index-core + integrations)
3. **Settings Configuration**: Configured global LLM, embeddings, and chunking parameters
4. **VectorStoreIndex**: Created your first index using `from_documents()`
5. **Query Execution**: Executed queries and analyzed responses with source attribution
6. **Document Flow**: Understood the Document → Node → Embedding → Index → Query pipeline

### Key Concepts Mastered

- **Settings object** replaces deprecated ServiceContext
- **VectorStoreIndex** is the most common index type
- **similarity_top_k** controls retrieval breadth
- **Response modes** affect synthesis strategy
- **Source nodes** enable provenance tracking
- **Cosine similarity** scores indicate relevance (higher is more relevant)

### Next Steps

In **Notebook 2: Documents & Chunking**, you'll learn:
- Loading documents from multiple sources (PDFs, web, databases)
- Advanced chunking strategies (sentence, token, semantic)
- Metadata management and filtering
- Node relationships and hierarchies
- Optimizing chunk size for your use case

---

## 🎯 Practice Exercises

1. **Experiment with chunk sizes**: Try `chunk_size=512` vs `chunk_size=2048`. How does it affect the number of nodes and query responses?

2. **Test with your own data**: Replace the sample documents with a PDF or text file from `data/sample_docs/`

3. **Tune top_k**: Query the same question with k=1, k=3, k=5. Compare response quality and source coverage.

4. **Embedding dimensions**: Set `dimensions=512` in `OpenAIEmbedding(...)`, rebuild the index so all nodes are re-embedded, and observe the speed differences.

5. **Response modes**: Test all response modes on a complex query. Which provides the best answer?

---

## Additional Resources

- **LlamaIndex Docs**: https://docs.llamaindex.ai/en/stable/
- **Settings API**: https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/settings/
- **VectorStoreIndex**: https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index/
- **OpenAI Embeddings**: https://platform.openai.com/docs/guides/embeddings