![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)

# Build a Question-Answering System using RAG with watsonx, Chroma, and LangChain

## What You'll Learn

In this tutorial, you'll learn how to build a **Retrieval-Augmented Generation (RAG)** system that can answer questions based on a document. By the end of this notebook, you will be able to:

- ✅ Understand what RAG is and why it's useful
- ✅ Load and prepare documents for question-answering
- ✅ Create a vector database using Chroma
- ✅ Use watsonx.ai's Granite models with LangChain
- ✅ Build an end-to-end RAG pipeline
- ✅ Ask questions and get accurate answers from your documents

## What is RAG (Retrieval-Augmented Generation)?

RAG is a powerful technique that combines:
- **Retrieval**: Finding relevant information from your documents
- **Generation**: Using an AI model to create natural language answers

**Why use RAG?**
- Answer questions about your specific documents
- Keep answers grounded in factual content
- Reduce AI "hallucinations" (made-up information) by grounding answers in retrieved text
- Work with documents that weren't in the AI's training data

## How RAG Works (Simple 3-Step Process)

1. **📚 Index**: Break documents into chunks and store them in a searchable database (one-time setup)
2. **🔍 Retrieve**: When asked a question, find the most relevant chunks from the database
3. **💬 Generate**: Feed the relevant chunks to an AI model to generate a natural answer
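
To make the three steps concrete before touching any real service, here is a minimal toy sketch in plain Python. It is not how the rest of this notebook works: word overlap stands in for embeddings, and `generate` is a placeholder where a real system would call a language model. All function names are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index(chunks):
    """Index: store each chunk next to its word counts."""
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]

def retrieve(question, indexed, k=1):
    """Retrieve: rank chunks by the number of words they share with the question."""
    q = Counter(tokenize(question))
    ranked = sorted(indexed, key=lambda item: sum((q & item[1]).values()), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def generate(question, context_chunks):
    """Generate: placeholder that echoes the context; a real system calls an LLM here."""
    return "Based on the document: " + " ".join(context_chunks)

chunks = [
    "Chroma stores embeddings for fast similarity search.",
    "Granite models generate text on watsonx.ai.",
    "LangChain connects retrievers and language models.",
]
indexed = index(chunks)                                   # one-time setup
top = retrieve("Which models generate text?", indexed)    # per question
answer = generate("Which models generate text?", top)
print(answer)
```

The real pipeline later in this tutorial follows the same shape, with Chroma doing the indexing and retrieval and a Granite model doing the generation.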

## Prerequisites

Before starting, you'll need:
- An IBM Cloud account (free tier available)
- A Watson Machine Learning service instance
- Basic familiarity with Python

## Tutorial Contents

- [Step 1: Install Dependencies](#install)
- [Step 2: Set Up watsonx.ai Connection](#setup)
- [Step 3: Load Your Document](#data)
- [Step 4: Build the Knowledge Base](#build_base)
- [Step 5: Configure the AI Model](#models)
- [Step 6: Ask Questions and Get Answers](#predict)
- [Summary and Next Steps](#summary)

Let's get started! 🚀

<a id="install"></a>
## Step 1: Install Dependencies

First, we need to install the necessary Python packages. Here's what each package does:

- **wget**: Downloads files from the internet
- **langchain**: Framework for building LLM applications
- **ibm_watsonx_ai**: IBM's watsonx.ai SDK
- **langchain_ibm**: Integration between LangChain and watsonx.ai
- **langchain_chroma**: LangChain integration for Chroma, the vector database that stores document embeddings
- **langchain_community**: Community integrations for LangChain
- **langchain_text_splitters**: Tools to split documents into manageable chunks

**Note**: The installation may take 1-2 minutes. You might see some dependency warnings - these are usually safe to ignore.

In [None]:
# Install all required packages
# Using | tail -n 1 to show only the last line of output (keeps the notebook clean)

!pip install wget | tail -n 1
!pip install -U "langchain>=0.3,<0.4" | tail -n 1
!pip install -U "ibm_watsonx_ai>=1.1.22" | tail -n 1
!pip install -U "langchain_ibm>=0.3,<0.4" | tail -n 1
!pip install -U "langchain_chroma>=0.1,<0.2" | tail -n 1
!pip install -U "langchain_community>=0.3,<0.4" | tail -n 1
!pip install -U "langchain_text_splitters>=0.3,<0.4" | tail -n 1

print("\n✅ All packages installed successfully!")
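
If you want to confirm what actually got installed (useful when you see dependency warnings), a small check like the following may help. This is a sketch using only the standard library; the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

packages = ["langchain", "ibm_watsonx_ai", "langchain_ibm", "langchain_chroma"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver or 'not installed'}")
```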

In [None]:
# Import necessary libraries
import os
import getpass

print("✅ Libraries imported successfully!")

<a id="setup"></a>
## Step 2: Set Up watsonx.ai Connection

To use watsonx.ai, you need two things:
1. **API Key**: Your personal access key
2. **Project ID**: The ID of your watsonx.ai project

### How to Get Your API Key

1. Go to [IBM Cloud](https://cloud.ibm.com)
2. Click on **Manage** → **Access (IAM)**
3. Select **API keys** from the left menu
4. Click **Create an IBM Cloud API key**
5. Give it a name (e.g., "watsonx-notebook")
6. Copy the API key (you won't be able to see it again!)

📖 [Detailed Instructions](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui)

### How to Get Your Project ID

1. Open [watsonx.ai](https://dataplatform.cloud.ibm.com/wx/home)
2. Open your project
3. Click on the **Manage** tab
4. Find **Project ID** in the **General** section
5. Copy the Project ID

**Security Tip**: Never hardcode your API key in the notebook. We'll use `getpass` to enter it securely.
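
One possible pattern, sketched below, is to look for the key in an environment variable first and fall back to a hidden prompt only if it is not set. The helper `read_secret` and the variable name `WATSONX_APIKEY` in the commented usage line are assumptions for this example, not something the notebook requires.

```python
import os
import getpass

def read_secret(env_var, prompt):
    """Return the value of env_var if set; otherwise prompt without echoing input."""
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass.getpass(prompt)

# Example usage (assumed variable name; uncomment to try it):
# api_key = read_secret("WATSONX_APIKEY", "Please enter your watsonx.ai API key: ")
```

Either way, the key never appears in the notebook source or in saved cell output.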

In [None]:
from ibm_watsonx_ai import Credentials

# Set up your credentials
# The API key will be hidden as you type it
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # watsonx.ai API endpoint
    api_key=getpass.getpass("Please enter your watsonx.ai API key: "),
)

print("\n✅ Credentials configured successfully!")

In [None]:
# Get your project ID
# This can come from environment variable or manual input
try:
    project_id = os.environ["PROJECT_ID"]
    print(f"✅ Found project ID from environment: {project_id[:8]}...")
except KeyError:
    project_id = input("Please enter your watsonx.ai project ID: ")
    print("✅ Project ID configured!")

In [None]:
from ibm_watsonx_ai import APIClient

# Create API client to interact with watsonx.ai
api_client = APIClient(credentials=credentials, project_id=project_id)

print("✅ API client created successfully!")
print("You're now connected to watsonx.ai! 🎉")

<a id="data"></a>
## Step 3: Load Your Document

For this tutorial, we'll use the **State of the Union** address as our example document. This is a speech given by the U.S. President.

### What's Happening Here?

We'll download a text file from the internet and save it locally. Then we'll be able to ask questions about its content!

**You can replace this with your own documents later** (PDFs, text files, etc.)

In [None]:
import wget

# Define the document to download
filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

# Download the file if it doesn't exist
if not os.path.isfile(filename):
    print(f"📥 Downloading document: {filename}...")
    wget.download(url, out=filename)
    print("\n✅ Document downloaded successfully!")
else:
    print(f"✅ Document already exists: {filename}")

# Check the file size
file_size = os.path.getsize(filename) / 1024  # Convert to KB
print(f"📄 File size: {file_size:.1f} KB")

<a id="build_base"></a>
## Step 4: Build the Knowledge Base

Now we'll create a **vector database** - a searchable knowledge base from our document.

### The Process (Broken Down):

1. **Load the document**: Read the text file
2. **Split into chunks**: Break the document into smaller pieces (about 1000 characters each)
   - Why? Large documents need to be split for better search results
   - Each chunk should contain complete thoughts/sentences
3. **Create embeddings**: Convert text chunks into numbers (vectors) that represent their meaning
4. **Store in Chroma**: Save these vectors in a database that can quickly find similar content
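
To see what chunk size and overlap mean in practice, here is a simplified sketch that cuts text into fixed-size character windows. It is only an illustration: LangChain's `CharacterTextSplitter`, used later in this step, splits on a separator and then merges pieces, so its chunk boundaries will differ. The function name `split_into_chunks` is made up for this example.

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=0):
    """Split text into fixed-size character windows that may overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_into_chunks(sample, chunk_size=10, chunk_overlap=2)
print(chunks)
# ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz', 'yz']
```

Notice how each chunk repeats the last two characters of the previous one. With real text, that overlap helps keep a sentence that straddles a boundary retrievable from either side.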

### What are Embeddings?

Embeddings are numerical representations of text that capture semantic meaning. Similar text has similar embeddings, allowing us to find relevant content even if the words are different!
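
A quick way to build intuition is to compare a few vectors by hand. The sketch below uses cosine similarity, one common way to measure how close two embeddings are. The three-dimensional vectors are invented for illustration; real embedding models produce vectors with hundreds of dimensions, and the distance measure a vector database uses depends on its configuration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made vectors, purely illustrative
vectors = {
    "economy": [0.9, 0.1, 0.0],
    "jobs and inflation": [0.8, 0.3, 0.1],
    "supreme court": [0.0, 0.2, 0.9],
}

query = vectors["economy"]
for text, vec in vectors.items():
    print(f"{text}: {cosine_similarity(query, vec):.2f}")
```

The "jobs and inflation" vector scores much closer to "economy" than "supreme court" does, even though the phrases share no words.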

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# Step 1: Load the document
print("📖 Loading document...")
loader = TextLoader(filename)
documents = loader.load()
print(f"✅ Loaded {len(documents)} document(s)")

# Step 2: Split the document into chunks
print("\n✂️ Splitting document into chunks...")
text_splitter = CharacterTextSplitter(
    chunk_size=1000,        # Each chunk will be ~1000 characters
    chunk_overlap=0         # No overlap between chunks (you can add overlap for better context)
)
texts = text_splitter.split_documents(documents)

print(f"✅ Created {len(texts)} text chunks")
print("\n📝 Example chunk (first 200 characters):")
print(f"{texts[0].page_content[:200]}...")

### Choose an Embedding Model

watsonx.ai provides several embedding models. Let's see what's available:

In [None]:
# Display available embedding models
print("🔍 Available embedding models in watsonx.ai:\n")

# EmbeddingModels is an enum, so iterate over it to list each model name and ID
for model in api_client.foundation_models.EmbeddingModels:
    print(f"  • {model.name}")
    print(f"    Model ID: {model.value}")
    print()

### Create Embeddings and Build Vector Database

We'll use the **all-minilm-l6-v2** model - it's fast, efficient, and works well for English text.

This step will:
1. Create embeddings for all text chunks
2. Store them in Chroma (a vector database)
3. Make the data searchable

**Note**: You might see some ChromaDB telemetry warnings - these are harmless and can be ignored.

In [None]:
from langchain_ibm import WatsonxEmbeddings
from langchain_chroma import Chroma

# Step 3: Create embeddings using watsonx.ai
print("🧮 Creating embeddings using watsonx.ai...")
embeddings = WatsonxEmbeddings(
    model_id="sentence-transformers/all-minilm-l6-v2",  # Fast and efficient model
    url=credentials.url,          # Attributes of the Credentials object created in Step 2
    apikey=credentials.api_key,
    project_id=project_id
)

# Step 4: Create the vector database
print("💾 Building vector database with Chroma...")
docsearch = Chroma.from_documents(texts, embeddings)

print("\n✅ Knowledge base created successfully!")
print(f"📊 Total chunks indexed: {len(texts)}")
print("\nYour document is now searchable! 🎉")

### Test the Search (Optional)

Let's verify that our vector database can find relevant content:

In [None]:
# Test the retrieval system
test_query = "What did the president say about the economy?"
print(f"🔍 Test search: '{test_query}'\n")

# Search for relevant chunks
relevant_docs = docsearch.similarity_search(test_query, k=2)  # Get top 2 matches

print(f"Found {len(relevant_docs)} relevant chunks:\n")
for i, doc in enumerate(relevant_docs, 1):
    print(f"Chunk {i}:")
    print(doc.page_content[:300] + "...\n")

<a id="models"></a>
## Step 5: Configure the AI Model

Now we'll set up the **Granite** language model from watsonx.ai. This model will read the retrieved chunks and generate natural language answers.

### About Granite Models

Granite is IBM's family of enterprise-ready AI models:
- Designed for business use cases
- Trained on high-quality data
- Optimized for accuracy and reliability

### Model Parameters Explained

- **DECODING_METHOD**: How the model generates text
  - `GREEDY`: Always picks the most likely next token (more predictable)
  - `SAMPLE`: Introduces randomness (more creative)
- **MIN_NEW_TOKENS**: Minimum number of tokens to generate (at least 1)
- **MAX_NEW_TOKENS**: Maximum number of tokens to generate (up to 100, which keeps answers concise). A token is a word or word fragment, so 100 tokens is usually fewer than 100 words.
- **STOP_SEQUENCES**: Strings that end generation when the model produces them (e.g., an end-of-text marker)
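
The difference between the two decoding methods is easier to see on a toy example. The sketch below picks the next token from a made-up probability table, first greedily and then by sampling with a temperature. It only illustrates the idea; the function names and probabilities are invented, and the real model works over a vocabulary of many thousands of tokens.

```python
import random

def greedy_pick(probs):
    """Greedy decoding: always return the most likely token."""
    return max(probs, key=probs.get)

def sample_pick(probs, temperature=1.0, rng=random):
    """Sampling: draw a token at random, weighted by temperature-adjusted probability."""
    tokens = list(probs)
    # Raising p to 1/temperature sharpens (T < 1) or flattens (T > 1) the distribution
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return rng.choices(tokens, weights=weights, k=1)[0]

next_token_probs = {"economy": 0.6, "nation": 0.3, "world": 0.1}

print(greedy_pick(next_token_probs))  # always "economy"

rng = random.Random(0)  # fixed seed so the demo is repeatable
samples = [sample_pick(next_token_probs, temperature=0.7, rng=rng) for _ in range(5)]
print(samples)
```

Greedy decoding returns the same token every time, which is why it suits question answering over documents. Sampling can return any of the candidates, with likelier tokens showing up more often.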

In [None]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

# Select the model to use
# Using Granite 3.2 8B Instruct - a powerful and efficient model
model_id = "ibm/granite-3-2-8b-instruct"
print(f"🤖 Selected model: {model_id}")

# Configure model parameters
parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,  # Deterministic output
    GenParams.MIN_NEW_TOKENS: 1,                         # Generate at least 1 token
    GenParams.MAX_NEW_TOKENS: 100,                       # Generate at most 100 tokens
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]          # Stop at end of text marker
}

print("✅ Model parameters configured!")
print("\nParameters:")
for key, value in parameters.items():
    print(f"  • {key}: {value}")

In [None]:
from langchain_ibm import WatsonxLLM

# Create the LangChain wrapper for watsonx.ai
print("🔧 Initializing Granite model with LangChain...")

watsonx_granite = WatsonxLLM(
    model_id=model_id,
    url=credentials.url,          # Attributes of the Credentials object created in Step 2
    apikey=credentials.api_key,
    project_id=project_id,
    params=parameters
)

print("✅ Granite model ready to use!")
print("\nYou can now ask questions about your document! 💬")

<a id="predict"></a>
## Step 6: Ask Questions and Get Answers

Now comes the exciting part! We'll create a **question-answering chain** that:

1. Takes your question
2. Searches the vector database for relevant chunks
3. Sends those chunks to the Granite model
4. Returns a natural language answer

### About RetrievalQA

RetrievalQA is a LangChain component that automates the RAG process:
- **llm**: The language model to use (our Granite model)
- **chain_type**: "stuff" means all retrieved chunks are sent together to the model
- **retriever**: The search component (our Chroma database)
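
To picture what "stuff" means, the sketch below joins every retrieved chunk into one context block and places the question after it. The template wording here is invented for illustration and is not LangChain's actual prompt; the function name `stuff_prompt` is also an assumption.

```python
def stuff_prompt(question, chunks):
    """Concatenate every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )

retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
prompt = stuff_prompt("What was said?", retrieved)
print(prompt)
```

Because all chunks go into a single prompt, the prompt grows with both the chunk size and the number of chunks retrieved, so very large values of either can exceed the model's context window.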

In [None]:
from langchain.chains import RetrievalQA

# Create the question-answering chain
print("🔗 Creating the RAG question-answering chain...")

qa = RetrievalQA.from_chain_type(
    llm=watsonx_granite,                    # Our Granite model
    chain_type="stuff",                     # Send all retrieved chunks to the model
    retriever=docsearch.as_retriever()      # Our vector database retriever
)

print("✅ RAG system is ready!")
print("\n🎉 You can now ask questions about the State of the Union address!")

### Example Questions

Let's try asking some questions about the document:

In [None]:
# Question 1: About Ketanji Brown Jackson
query = "What did the president say about Ketanji Brown Jackson?"

print(f"❓ Question: {query}")
print("\n🤔 Thinking...\n")

result = qa.invoke(query)

print("="*80)
print("💡 Answer:")
print("="*80)
print(result['result'])
print("="*80)

In [None]:
# Question 2: About the economy
query = "What economic policies were mentioned?"

print(f"❓ Question: {query}")
print("\n🤔 Thinking...\n")

result = qa.invoke(query)

print("="*80)
print("💡 Answer:")
print("="*80)
print(result['result'])
print("="*80)

In [None]:
# Question 3: About Ukraine
query = "What was said about Ukraine?"

print(f"❓ Question: {query}")
print("\n🤔 Thinking...\n")

result = qa.invoke(query)

print("="*80)
print("💡 Answer:")
print("="*80)
print(result['result'])
print("="*80)

### Try Your Own Questions!

Now it's your turn! Run the cell below and ask your own questions:

In [None]:
# Interactive Q&A
print("💬 Ask your own question about the State of the Union!")
print("   (Press Enter without typing anything to skip)\n")

user_query = input("Your question: ")

if user_query.strip():
    print(f"\n❓ Question: {user_query}")
    print("\n🤔 Thinking...\n")

    result = qa.invoke(user_query)

    print("="*80)
    print("💡 Answer:")
    print("="*80)
    print(result['result'])
    print("="*80)
else:
    print("No question entered. Try the next cell!")

## 💡 Tips for Better Results

### Ask Better Questions
- ✅ **Good**: "What did the president say about climate change?"
- ❌ **Vague**: "Tell me about stuff"

### Be Specific
- ✅ **Good**: "What economic policies were mentioned for small businesses?"
- ❌ **Too broad**: "What about the economy?"

### Understand the Limitations
- The answer is based ONLY on the document you provided
- If the information isn't in the document, the model may say "I don't know" or give a generic answer
- The model can only see the most relevant chunks (not the entire document at once)

## 🔧 Troubleshooting

### "No answer found" or irrelevant answers?
- Try rephrasing your question
- Make sure the information exists in the document
- Try increasing the chunk size or retrieval count

### ChromaDB warnings?
- These telemetry warnings are harmless and can be ignored
- They don't affect functionality

### Dependency conflicts?
- These usually don't cause issues in Colab
- If you encounter errors, try restarting the runtime and running all cells again

## 🚀 Next Steps & Customization

Now that you understand the basics, try these enhancements:

### 1. Use Your Own Documents
```python
# Replace the document with your own
filename = 'your_document.txt'
# Then run the same pipeline!
```

### 2. Adjust Chunk Size
```python
text_splitter = CharacterTextSplitter(
    chunk_size=500,      # Smaller chunks
    chunk_overlap=50     # Add overlap for better context
)
```

### 3. Get More Context
```python
# Retrieve more chunks per question
retriever = docsearch.as_retriever(search_kwargs={"k": 6})  # Get 6 chunks instead of the default 4
```

### 4. Try Different Models
```python
# Use a different Granite model
model_id = "ibm/granite-3-8b-instruct"
```

### 5. Adjust Generation Parameters
```python
parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.SAMPLE,  # More creative
    GenParams.MAX_NEW_TOKENS: 200,                       # Longer answers
    GenParams.TEMPERATURE: 0.7,                          # Add randomness
}
```

<a id="summary"></a>
## 🎓 Summary and Next Steps

### What You Learned

Congratulations! You've successfully built a complete RAG system! 🎉

You now know how to:

1. ✅ **Set up watsonx.ai** - Connect to IBM's AI platform
2. ✅ **Prepare documents** - Load and split text into searchable chunks
3. ✅ **Create embeddings** - Convert text to vector representations
4. ✅ **Build a vector database** - Store and search document embeddings with Chroma
5. ✅ **Configure LLMs** - Set up Granite models with custom parameters
6. ✅ **Build RAG pipelines** - Combine retrieval and generation with LangChain
7. ✅ **Ask questions** - Get accurate answers grounded in your documents

### Key Concepts Review

- **RAG (Retrieval-Augmented Generation)**: Combines document search with AI generation
- **Embeddings**: Numerical representations of text that capture meaning
- **Vector Database**: Searchable storage for embeddings (we used Chroma)
- **LangChain**: Framework for building LLM applications
- **Granite Models**: IBM's enterprise AI models on watsonx.ai

### Real-World Applications

You can use this RAG system for:

- 📚 **Document Q&A**: Answer questions about company documents, manuals, reports
- 💼 **Customer Support**: Build chatbots that answer based on your knowledge base
- 🔍 **Research**: Query large collections of papers or articles
- 📖 **Education**: Create study assistants for textbooks and course materials
- ⚖️ **Legal/Compliance**: Search and query legal documents and regulations

### Learn More

- 📘 [watsonx.ai Documentation](https://ibm.github.io/watsonx-ai-python-sdk/)
- 📗 [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction)
- 📙 [Chroma Documentation](https://docs.trychroma.com/)
- 📕 [More watsonx Samples](https://ibm.github.io/watsonx-ai-python-sdk/samples.html)

### What's Next?

Try these advanced topics:

1. **Multi-document RAG**: Query across multiple documents
2. **PDF Support**: Load and process PDF files
3. **Web Scraping**: Build RAG from website content
4. **Conversational RAG**: Add chat memory for multi-turn conversations
5. **Advanced Retrieval**: Use re-ranking and hybrid search

### Share Your Results!

Built something cool with this tutorial? We'd love to hear about it!

---

**Happy coding! 🚀**

---

### 📄 License

Copyright © 2023, 2024 IBM. This notebook and its source code are released under the terms of the MIT License.

---