# Automatically Building a RAG Pattern Using Watsonx.ai AutoAI RAG with Chroma


In the following Python notebook, I have built a **Retrieval-Augmented Generation (RAG) pipeline** to answer questions based on a collection of articles I’ve written about **IBM Db2’s machine learning features**.

To construct this pipeline, I used **AutoAI RAG**, an automated tool available as a service on **Watsonx.ai Cloud**. AutoAI RAG simplifies the process of building an **end-to-end RAG pipeline** by running experiments with multiple configurations to identify the best-performing RAG pattern.

## How AutoAI RAG Works

1. **Generates candidate patterns**  
   - Explores different **LLMs, embedding models, and retrieval strategies**.  
2. **Evaluates candidate patterns**  
   - Uses **sample question-answer pairs** to rank the candidates based on a predefined evaluation metric.  
3. **Automates complexity**  
   - Removes the need for manual design and optimization.  
4. **Deploys the best pattern**  
   - Once the optimal pattern is found, it is automatically deployed on **Watsonx.ai**, enabling seamless **question-answering** over a **private knowledge base**.

This automation makes it significantly easier to build and deploy **accurate RAG pipelines** without dealing with the underlying complexities.
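
The generate-and-rank steps above can be pictured as a small grid search. The sketch below is a toy illustration only, not AutoAI RAG's actual algorithm: the search-space values and the `score_pattern` rule are hypothetical stand-ins for a real evaluation against question-answer pairs.

```python
from itertools import product

# Hypothetical search space; AutoAI RAG chooses its own options internally.
SEARCH_SPACE = {
    "chunk_size": [512, 1024],
    "retrieval_method": ["simple", "window"],
    "number_of_chunks": [3, 5],
}


def generate_candidates(space):
    """Yield one configuration dict per combination of option values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def rank_candidates(candidates, score_fn, top_n):
    """Score every candidate and return the top_n (score, config) pairs, best first."""
    scored = [(score_fn(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def score_pattern(config):
    """Made-up scoring rule standing in for a real evaluation run."""
    score = 0.5
    if config["retrieval_method"] == "window":
        score += 0.3
    if config["chunk_size"] == 1024:
        score += 0.1
    return score


best = rank_candidates(generate_candidates(SEARCH_SPACE), score_pattern, top_n=4)
```

In the real service, the scoring step is the expensive part, which is why the number of evaluated patterns is capped.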

Check it out!


# **Prerequisites**  

Before running this notebook, ensure that you have:  

### **1. Set Up a Python Environment**  
- Create a Python environment with the required dependencies.  
- This notebook was developed using **Python 3.12.3** within a **virtual environment (venv)**.  
- The complete list of installed Python packages and their versions is available in the **[`requirements.txt`](requirements.txt)** file located in the same directory as this notebook.  

### **2. Provision Watsonx.ai Runtime and Create a Watsonx.ai Project**  
- You need an active **Watsonx.ai runtime** and a **Watsonx.ai project**.  
- Follow the instructions in the official documentation:  
  👉 [Coding an AutoAI RAG experiment with a Chroma vector store](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/autoai-rag-code-chroma.html?context=wx&audience=wdp)  

### **3. Configure API Credentials**  
- In the same directory as this notebook, create a file named **`.env`** (the filename must be exactly `.env`).  
- Add the following keys and replace the placeholders with your actual credentials:  

```ini
WATSONX_PROJECT=REPLACE_WITH_YOUR_WATSONX_AI_PROJECT_ID
WATSONX_APIKEY=REPLACE_WITH_YOUR_WATSONX_AI_RUNTIME_API_KEY
```

# Import Dependencies

In [30]:
import json
import os
import sqlite3

import pandas as pd
from dotenv import load_dotenv
from IPython.display import Markdown
from langchain_community.document_loaders import WebBaseLoader

from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.experiment import AutoAI
from ibm_watsonx_ai.helpers import DataConnection

# Create `watsonx.ai` APIClient

Load `watsonx.ai` API key and project id from `.env` file

In [31]:
load_dotenv(os.getcwd()+"/.env", override=True)

True
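
The cells that follow read both keys with an empty-string fallback, so a typo in `.env` would likely surface only later as an authentication or project error. A minimal sketch of an early check is shown here; the helper names `missing_env_keys` and `require_env` are my own and are not part of `python-dotenv` or the watsonx.ai SDK.

```python
import os

# The two keys this notebook expects to find in .env
REQUIRED_KEYS = ("WATSONX_PROJECT", "WATSONX_APIKEY")


def missing_env_keys(keys, environ=None):
    """Return the keys that are unset or blank in the given environment."""
    env = os.environ if environ is None else environ
    return [k for k in keys if not env.get(k, "").strip()]


def require_env(keys, environ=None):
    """Raise a clear error if any required key is missing."""
    missing = missing_env_keys(keys, environ)
    if missing:
        raise RuntimeError("Missing required settings in .env: " + ", ".join(missing))
```

Calling `require_env(REQUIRED_KEYS)` right after `load_dotenv(...)` would stop the notebook before any API call is attempted.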

Define `watsonx.ai` credentials and create an instance of `watsonx.ai` APIClient

In [32]:
credentials = Credentials(
                url = "https://us-south.ml.cloud.ibm.com",
                api_key = os.getenv("WATSONX_APIKEY", "")
                )

client = APIClient(credentials)

Set the watsonx.ai `project id`

In [33]:
project_id = os.getenv("WATSONX_PROJECT", "")
client.set.default_project(project_id)

'SUCCESS'

# Create and Upload Training Data to IBM Cloud COS bucket

View the list of files that I previously uploaded to COS

In [34]:
client.data_assets.list()

| | NAME | ASSET_TYPE | SIZE | ASSET_ID |
|---|---|---|---|---|
| 0 | ModelInference.txt | data_asset | 20198 | 56345f72-6f01-41b3-8e8c-bbd31011c164 |
| 1 | benchmarking_data_ModelInference.json | data_asset | 458 | cfe2836e-cebf-4c59-8c99-d042c7418021 |


Download the training content from the SDK documentation URL in the cell below and save it locally as `ModelInference.txt`.

In [35]:
url = "https://ibm.github.io/watsonx-ai-python-sdk/fm_model_inference.html"

docs = WebBaseLoader(url).load()

# Access the content of the loaded document
train_doc_content = docs[0].page_content
train_filename = "ModelInference.txt"

with open(train_filename, "w", encoding="utf-8") as file:
    # Write the content of the web page to the file
    file.write(train_doc_content)

If `ModelInference.txt` wasn't previously uploaded to COS, upload it now.

In [36]:
wx_assets = client.data_assets.list()

# If an asset named train_filename doesn't already exist, upload it to watsonx.ai
if train_filename not in wx_assets['NAME'].values:
    # Upload the training file
    document_asset_details = client.data_assets.create(name=train_filename, file_path=train_filename)
    print(f'Uploaded training file: {train_filename}')
    
    # Get the ID of the uploaded training file
    document_asset_id = client.data_assets.get_id(document_asset_details)
    
    # Define a connection to the training data
    train_data_references = [DataConnection(data_asset_id=document_asset_id)]
else:
    # Get the asset_id for the matching row
    document_asset_id = wx_assets.loc[wx_assets['NAME'] == train_filename, 'ASSET_ID'].iloc[0]
    print(f"Training file: {train_filename} was previously uploaded with asset ID: {document_asset_id}")
    
    # Define a connection to the previously uploaded training data
    train_data_references = [DataConnection(data_asset_id=document_asset_id)]

Training file: ModelInference.txt was previously uploaded with asset ID: 56345f72-6f01-41b3-8e8c-bbd31011c164


# Create and Upload Evaluation data to COS 
The `AutoAI RAG` experiment uses this evaluation data to score the candidate `RAG Pipelines` during the experiment.

In [37]:


benchmarking_data_IBM_page_content = [
    {
        "question": "What is path to ModelInference class?",
        "correct_answer": "ibm_watsonx_ai.foundation_models.inference.ModelInference",
        "correct_answer_document_ids": [
            "ModelInference.txt"
        ]
    },
    {
        "question": "What is method for get model inferance details?",
        "correct_answer": "get_details()",
        "correct_answer_document_ids": [
            "ModelInference.txt"
        ]
    }
]

test_filename = "benchmarking_data_ModelInference.json"

# Overwrite the file regardless of its existence
with open(test_filename, "w") as json_file:
    json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)
    print(f"File '{test_filename}' has been overwritten successfully.")

File 'benchmarking_data_ModelInference.json' has been overwritten successfully.
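
Since the experiment ranks patterns against these records, a malformed entry can quietly distort the scores. The sketch below checks only the three fields used in the cell above; the `validate_benchmark` helper is hypothetical, and I have not verified which other fields or formats AutoAI RAG accepts.

```python
import json

# Field names and types taken from the benchmarking records defined above.
REQUIRED_FIELDS = {
    "question": str,
    "correct_answer": str,
    "correct_answer_document_ids": list,
}


def validate_benchmark(records):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for i, record in enumerate(records):
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append(f"record {i}: missing '{field}'")
            elif not isinstance(record[field], expected_type):
                problems.append(f"record {i}: '{field}' should be {expected_type.__name__}")
            elif not record[field]:
                problems.append(f"record {i}: '{field}' is empty")
    return problems


def load_and_validate(path):
    """Read a benchmark JSON file and validate its records."""
    with open(path, encoding="utf-8") as f:
        return validate_benchmark(json.load(f))
```

For example, `load_and_validate(test_filename)` could be run before uploading the file.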


If an asset named `test_filename` doesn't already exist, upload it to watsonx.ai.

In [38]:
if test_filename not in wx_assets['NAME'].values:
    # Upload the test file
    document_asset_details = client.data_assets.create(name=test_filename, file_path=test_filename)
    print(f'Uploaded test file: {test_filename}')
    
    # Get the ID of the uploaded test file
    document_asset_id = client.data_assets.get_id(document_asset_details)
    
    # Define a connection to the test data
    test_data_references = [DataConnection(data_asset_id=document_asset_id)]
else:
    # Get the asset_id for the matching row
    document_asset_id = wx_assets.loc[wx_assets['NAME'] == test_filename, 'ASSET_ID'].iloc[0]
    print(f"Test file: {test_filename} was previously uploaded with asset ID: {document_asset_id}")
    
    # Define a connection to the previously uploaded test data
    test_data_references = [DataConnection(data_asset_id=document_asset_id)]

Test file: benchmarking_data_ModelInference.json was previously uploaded with asset ID: cfe2836e-cebf-4c59-8c99-d042c7418021
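
The training and evaluation uploads follow the same "reuse if present, otherwise create" logic, so the two cells could share one helper. This is a sketch under my own naming: `get_or_create_asset_id` is hypothetical, and the upload itself is passed in as a callable so the helper stays independent of the SDK.

```python
def get_or_create_asset_id(name, existing_ids, create_fn):
    """Return (asset_id, created) for the asset called `name`.

    existing_ids: mapping of asset name -> asset ID already in the project.
    create_fn:    callable that uploads the file and returns the new asset ID.
    """
    if name in existing_ids:
        return existing_ids[name], False
    return create_fn(name), True


# Possible wiring with the objects used in this notebook (not executed here):
# existing = dict(zip(wx_assets["NAME"], wx_assets["ASSET_ID"]))
# upload = lambda n: client.data_assets.get_id(
#     client.data_assets.create(name=n, file_path=n)
# )
# asset_id, created = get_or_create_asset_id(train_filename, existing, upload)
# train_data_references = [DataConnection(data_asset_id=asset_id)]
```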


# Set Up and Run `AutoAI RAG` Experiment

In [39]:
experiment = AutoAI(credentials, project_id=project_id)
rag_optimizer_name = 'DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation'

Fetch the list of past `AutoAI RAG` experiment runs

In [40]:
past_experiments = experiment.runs(filter='rag_optimizer').list()
past_experiments

| | timestamp | run_id | state | auto_pipeline_optimizer name |
|---|---|---|---|---|
| 0 | 2025-01-16T16:17:16.389Z | 596ddb0f-1bbc-492b-bec5-0a0b4f7bc599 | completed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |
| 1 | 2025-01-14T19:46:53.187Z | 98804827-00b4-4a08-af0a-38912903bab1 | completed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |
| 2 | 2025-01-14T17:27:17.666Z | a23c7ab2-903b-4cf8-92f6-a20986d74099 | failed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |
| 3 | 2025-01-14T16:18:52.696Z | ab3277bb-fe33-417d-b8ea-3b70c0744f2c | failed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |
| 4 | 2025-01-14T15:59:55.175Z | 4145dd5f-94da-425c-85b6-b00e24971cd5 | failed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |
| 5 | 2025-01-14T15:50:56.139Z | 85a2cab9-e863-412b-895b-daaebd2fd29d | failed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |
| 6 | 2025-01-14T14:45:58.774Z | 2ec7c054-970d-48e8-bdce-d5ecb91dbbae | failed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |
| 7 | 2025-01-10T21:26:42.846Z | 46f9ac00-3896-419f-97db-6c06b6fd1965 | failed | DEMO - AutoAI RAG ibm-watsonx-ai SDK documenta... |


Check if the list of past `AutoAI RAG` experiment runs includes a successful run of the `DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation` rag optimizer. 
- If a successful run is found, load this run from history. 
- If no successful run of the given rag optimizer is found, then start a new run of this rag optimizer.

In [41]:

# Ensure the timestamp column is in datetime format
past_experiments['timestamp'] = pd.to_datetime(past_experiments['timestamp'])

# Filter for completed runs of the given optimizer, so that a run that is
# still in progress or was canceled is not loaded as if it had succeeded
filtered_experiments = past_experiments[
    (past_experiments['auto_pipeline_optimizer name'] == rag_optimizer_name) &
    (past_experiments['state'] == 'completed')
]

if filtered_experiments.empty:
    print(f"No completed runs found for optimizer '{rag_optimizer_name}'.")
    
    print(f'create and run a new RAG Optimizer: {rag_optimizer_name}')
    # create a new experiment
    rag_optimizer = experiment.rag_optimizer(
        name=rag_optimizer_name,
        description="AutoAI RAG Optimizer for Db2 AI Blogs",
        max_number_of_rag_patterns=4,
        optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
    )
    
    rag_optimizer.run(
        input_data_references=train_data_references,
        test_data_references=test_data_references,
        background_mode=False
    )
    
    print(f'status of RAG Optimizer: {rag_optimizer_name} is {rag_optimizer.get_run_status()}')
else:
    # Sort the filtered dataframe by timestamp in descending order
    sorted_experiments = filtered_experiments.sort_values(by='timestamp', ascending=False)

    # Get the run_id of the most recent run
    most_recent_run_id = sorted_experiments.iloc[0]['run_id']
        
    # Report which previously completed run of rag_optimizer_name will be reused
    print(f'Retrieving previously created RAG Optimizer: {rag_optimizer_name}, runid: {most_recent_run_id}')
    
    # Get the historical rag_optimizer instance and training details
    rag_optimizer = experiment.runs.get_rag_optimizer(most_recent_run_id)

summary = rag_optimizer.summary(scoring="faithfulness")

Retrieving previously created RAG Optimizer: DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation, runid: 596ddb0f-1bbc-492b-bec5-0a0b4f7bc599


Print the list of `RAG patterns` from the successful run of rag optimizer: `DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation`

In [42]:
summary

| Pattern_Name | mean_faithfulness | mean_answer_correctness | mean_context_correctness | chunking.method | chunking.chunk_size | chunking.chunk_overlap | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pattern2 | 0.8654 | 1.0 | 1.0 | recursive | 1024 | 256 | intfloat/multilingual-e5-large | euclidean | window | 5 | meta-llama/llama-3-1-70b-instruct |
| Pattern5 | 0.8281 | 1.0 | 1.0 | recursive | 1024 | 512 | intfloat/multilingual-e5-large | cosine | window | 3 | meta-llama/llama-3-1-8b-instruct |
| Pattern4 | 0.8182 | 1.0 | 1.0 | recursive | 1024 | 256 | intfloat/multilingual-e5-large | cosine | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern1 | 0.5216 | 0.5 | 1.0 | recursive | 512 | 256 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern3 | 0.1837 | 0.5 | 1.0 | recursive | 1024 | 256 | intfloat/multilingual-e5-large | euclidean | simple | 5 | meta-llama/llama-3-1-70b-instruct |


# Print the details of the best `RAG` pattern

In [43]:
best_pattern_name = summary.index.values[0]
print('Best pattern is:', best_pattern_name)

print(summary.loc[best_pattern_name])

Best pattern is: Pattern2
mean_faithfulness                                          0.8654
mean_answer_correctness                                       1.0
mean_context_correctness                                      1.0
chunking.method                                         recursive
chunking.chunk_size                                          1024
chunking.chunk_overlap                                        256
embeddings.model_id                intfloat/multilingual-e5-large
vector_store.distance_metric                            euclidean
retrieval.method                                           window
retrieval.number_of_chunks                                      5
generation.model_id             meta-llama/llama-3-1-70b-instruct
Name: Pattern2, dtype: object
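
Three patterns tie at an answer correctness of 1.0, and the summary table appears to be ordered by `mean_faithfulness`, so taking the first row effectively picks the most faithful pattern. If you would rather make the tie-break explicit, a sketch like the following ranks by one metric and then another. The scores are copied from the summary above; `rank_patterns` is a hypothetical helper, not an SDK function.

```python
# Scores copied from the summary table above (deliberately listed out of order).
scores = {
    "Pattern1": {"answer_correctness": 0.5, "faithfulness": 0.5216},
    "Pattern4": {"answer_correctness": 1.0, "faithfulness": 0.8182},
    "Pattern3": {"answer_correctness": 0.5, "faithfulness": 0.1837},
    "Pattern2": {"answer_correctness": 1.0, "faithfulness": 0.8654},
    "Pattern5": {"answer_correctness": 1.0, "faithfulness": 0.8281},
}


def rank_patterns(scores, primary, secondary):
    """Order pattern names by the primary metric, breaking ties with the secondary."""
    return sorted(
        scores,
        key=lambda name: (scores[name][primary], scores[name][secondary]),
        reverse=True,
    )


ranking = rank_patterns(scores, "answer_correctness", "faithfulness")
print(ranking[0])  # Pattern2
```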


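Get the best pattern from the optimizer. The next two cells first swap the standard-library `sqlite3` module for `pysqlite3` and print the resulting SQLite version, presumably because Chroma needs a newer SQLite than the one bundled with this Python environment.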

In [45]:
# these three lines swap the stdlib sqlite3 lib with the pysqlite3 package
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [46]:
import sqlite3
print(sqlite3.sqlite_version)

3.46.1
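
Rather than eyeballing the printed version, the check can be made explicit. The sketch below assumes a minimum of SQLite 3.35.0, which is what I understand Chroma to require; treat that number as an assumption and confirm it against the Chroma documentation for your installed release. The helper names are my own.

```python
import sqlite3

# Assumed minimum SQLite version for Chroma; verify for your Chroma release.
MIN_SQLITE = (3, 35, 0)


def version_tuple(version):
    """Turn a dotted version string such as '3.46.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def sqlite_is_new_enough(version=None, minimum=MIN_SQLITE):
    """Compare a SQLite version (default: the loaded library) with the minimum."""
    return version_tuple(version or sqlite3.sqlite_version) >= minimum
```

With the value printed above, `sqlite_is_new_enough("3.46.1")` returns `True`.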


In [47]:
best_pattern = rag_optimizer.get_pattern(pattern_name=best_pattern_name)



3.34.1


# Create a Vector Index Using the best pattern

Download my three articles and split them into chunks using the chunker from the best `RAG` pattern

In [None]:
urls = [
    "https://community.ibm.com/community/user/datamanagement/blogs/shaikh-quader/2024/05/07/building-an-in-db-linear-regression-model-with-ibm",
    "https://www.ibm.com/blog/how-to-build-a-decision-tree-model-in-ibm-db2/",
    "https://community.ibm.com/community/user/datamanagement/blogs/shaikh-quader/2024/05/27/db2ai-pyudf"
]
docs_list = WebBaseLoader(urls).load()
doc_splits = best_pattern.chunker.split_documents(docs_list)

Print the number of chunks created from the above 3 articles

In [None]:
len(doc_splits)
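
To build intuition for the `chunk_size` and `chunk_overlap` settings reported for the best pattern (1024 and 256), here is a deliberately simplified fixed-window chunker. It is not the recursive method the pattern actually uses, and `chunk_text` is a hypothetical helper that works on characters only.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap repeats more context between neighboring chunks at the cost of producing more chunks to embed and store.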

Create an in-memory vector index, using the above chunks, with `Chromadb` and the best rag pattern

In [None]:
best_pattern.indexing_function(doc_splits)

# Ask Questions from Indexed Articles Using the Best RAG Pattern

`First Question`: `How to generate summary statistics for a Db2 table?`

In [None]:
questions = ["How to generate summary statistics for a Db2 table?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}

score_response = best_pattern.inference_function()(payload)
Markdown(score_response["predictions"][0]["values"][0][0])

`Second Question`: `How can one inference a Python model with Db2?`

In [None]:
questions = ["How can one inference a Python model with Db2?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}

score_response = best_pattern.inference_function()(payload)
Markdown(score_response["predictions"][0]["values"][0][0])

`Third Question`: `How to integrate a Python model with Db2?`

In [None]:
questions = ["How to integrate a Python model with Db2?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}

score_response = best_pattern.inference_function()(payload)
Markdown(score_response["predictions"][0]["values"][0][0])

`Fourth Question`: `What is Python UDF?`

In [None]:
questions = ["What is Python UDF?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}

score_response = best_pattern.inference_function()(payload)
Markdown(score_response["predictions"][0]["values"][0][0])
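
The four question cells above differ only in the question text, so the payload construction and answer extraction could be wrapped in small helpers. This is a sketch with hypothetical names (`build_payload`, `first_answer`, `ask`); the default key `"input_data"` is my assumption for the value of `client.deployments.ScoringMetaNames.INPUT_DATA`, so pass that constant explicitly if in doubt.

```python
def build_payload(questions, token, input_key="input_data"):
    """Build the scoring payload in the same shape as the cells above."""
    return {input_key: [{"values": list(questions), "access_token": token}]}


def first_answer(response):
    """Pull the first generated answer out of a scoring response."""
    return response["predictions"][0]["values"][0][0]


def ask(question, inference_fn, token, input_key="input_data"):
    """Send one question through an inference function and return the answer text."""
    return first_answer(inference_fn(build_payload([question], token, input_key)))


# Possible use with the objects from this notebook (not executed here):
# answer = ask(
#     "What is Python UDF?",
#     best_pattern.inference_function(),
#     client.service_instance._get_token(),
#     client.deployments.ScoringMetaNames.INPUT_DATA,
# )
# Markdown(answer)
```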

# Learn More

1. [Automating a RAG pattern with AutoAI (watsonx doc)](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/autoai-programming-rag.html?context=wx)
2. [AutoAI RAG Sample Notebooks (GitHub)](https://github.com/IBM/watson-machine-learning-samples/tree/master/cloud/notebooks/python_sdk/experiments/autoai_rag)
3. [AutoAI Python SDK](https://ibm.github.io/watsonx-ai-python-sdk/autoai.html)