# Quick RAG (Retrieval-Augmented Generation) Tutorial

This tutorial shows how to set up and use a RAG system with Azure OpenAI. We'll walk through loading the project's configuration and helper modules, defining and populating a search index, and generating responses grounded in a knowledge base.

Let's begin by loading the necessary libraries and configurations.

In [1]:
# Enable autoreload for automatically reloading modules before executing code
# This is helpful when we make changes to any modules and want those changes to reflect immediately.
%load_ext autoreload
%autoreload 2

# Import the required libraries
import os
import sys

from dotenv import load_dotenv

# Load environment variables from a .env file if it exists
load_dotenv()

# Add the project's code directory to the import path so that its modules can be imported
sys.path.append('code')

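# Import custom environment variables and utility functions from our project files.
# The wildcard imports supply names used later in this notebook, such as KB_INDEX_NAME,
# KB_TOPIC, CogSearchRestAPI, read_file, and get_embeddings.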
from env_vars import *
from openai import AzureOpenAI
from utils.cogsearch_rest import *
from utils.openai_helpers import *
from utils.general_helpers import *
from utils.llm_helpers import *


Success status: False. Reading file from full path: c:\Users\selhousseini\Documents\GitHub\code\prompts\generate_tags_prompt.txt
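
Because the setup cell relies on settings loaded from a `.env` file, it can help to confirm that they are present before going further. The snippet below is a minimal sketch of such a check. The variable names in `REQUIRED_VARS` are hypothetical placeholders, not the names this project actually reads; substitute whatever `env_vars.py` expects.

```python
import os

# Hypothetical setting names, used for illustration only.
# Replace them with the variables that env_vars.py actually reads.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "COG_SEARCH_ENDPOINT",
    "COG_SEARCH_ADMIN_KEY",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Report any settings that still need to be added to the .env file
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

Running a check like this right after `load_dotenv()` tends to surface configuration problems earlier than a failed API call would.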


### Creating a Search Index for RAG

With the libraries and helper modules loaded, we need to define a search index. This index will store the vector embeddings and related metadata for the information we want to retrieve during the RAG process.

In [2]:
# Define the index name that will be used for storing the documents and vectors
index_name = KB_INDEX_NAME

# Set the vector dimensions to match the output size of the embedding model in use
# (3072 is, for example, the default output size of text-embedding-3-large)
vector_dimensions = 3072

# Define the schema for the search index, which includes fields for id, vector, tags, and text
fields = [
    {"name": "id", "type": "Edm.String", "key": True, "searchable": True, "filterable": True, "retrievable": True, "sortable": True},
    {"name": "vector", "type": "Collection(Edm.Single)", "searchable": True, "retrievable": True, "dimensions": vector_dimensions, "vectorSearchProfile": "my-vector-profile"},
    {"name": "tags", "type": "Edm.String", "searchable": True, "filterable": False, "retrievable": True, "sortable": False, "facetable": False},
    {"name": "text", "type": "Edm.String", "searchable": True, "filterable": False, "retrievable": True, "sortable": False, "facetable": False},
]

# Create an instance of the CogSearchRestAPI to interact with the search index
index = CogSearchRestAPI(index_name, fields=fields)
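
The `vector` field refers to a profile named `my-vector-profile`, which has to be declared in the index definition for the index to be valid. How `CogSearchRestAPI` does this is not shown in this notebook, so the following is only a sketch of what a create-index request body might look like. It assumes the `vectorSearch` layout of recent Azure AI Search REST API versions (an `algorithms` list plus a `profiles` list); the helper names and the algorithm name `my-hnsw-config` are hypothetical.

```python
import json


def build_index_payload(index_name, fields,
                        profile_name="my-vector-profile",
                        algorithm_name="my-hnsw-config"):
    """Assemble a create-index request body with one HNSW vector search profile.

    The layout is an assumption based on recent Azure AI Search REST API
    versions; check it against the API version your service uses.
    """
    return {
        "name": index_name,
        "fields": fields,
        "vectorSearch": {
            "algorithms": [{"name": algorithm_name, "kind": "hnsw"}],
            "profiles": [{"name": profile_name, "algorithm": algorithm_name}],
        },
    }


def undeclared_profiles(payload):
    """Return names of vector fields whose profile is not declared in the payload."""
    declared = {profile["name"] for profile in payload["vectorSearch"]["profiles"]}
    return [
        field["name"]
        for field in payload["fields"]
        if "vectorSearchProfile" in field
        and field["vectorSearchProfile"] not in declared
    ]


# A trimmed-down version of the schema, for illustration
example_fields = [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "vector", "type": "Collection(Edm.Single)", "searchable": True,
     "dimensions": 3072, "vectorSearchProfile": "my-vector-profile"},
]

payload = build_index_payload("example-index", example_fields)
print(json.dumps(payload["vectorSearch"], indent=2))
print("Fields with undeclared profiles:", undeclared_profiles(payload))
```

A consistency check like `undeclared_profiles` is cheap to run locally and catches a mismatch between the field definition and the profile name before the request is sent.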


### Initializing and Populating the Search Index

Next, we check whether the index already exists and create it if it doesn't. We then prepare the synthetic data (Tesla Model S facts), either by processing the raw text file or by loading a previously cached result, and upload it to the index.

In [3]:
# Check whether the index exists and create it if it doesn't,
# so that the index is in place before any documents are uploaded.
if index.get_index() is None:
    print(f"No index {index_name} detected, creating one ... ")
    index.create_index()

# Paths to the cached, processed facts and to the raw Tesla Model S text file
pkl_file = './data/tesla_facts.pkl'
data = './data/Tesla_Model_S.txt'

# If the pickle cache does not exist, process the raw text to generate embeddings and metadata.
if not os.path.exists(pkl_file):
    text = read_file(data)
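    # Split on blank lines and skip empty chunks so that no empty string is embedded
    facts = [fact for fact in text.split('\n\n') if fact.strip()]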

    metadatas = []

    for fact in facts:
        metadata = {
            "text": fact,
            "vector": get_embeddings(fact),
            "tags": generate_tag_list(fact),
            "id": generate_uuid_from_string(fact)
        }

        metadatas.append(metadata)
        
        # Print progress every 10 items processed
        if len(metadatas) % 10 == 0:
            print(f"Processed {len(metadatas)} items")

    # Save the processed metadata to a pickle file for future use
    save_to_pickle(metadatas, pkl_file)
else:
    # Load previously saved metadata if it exists
    metadatas = load_from_pickle(pkl_file)

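# Upload the documents (text, tags, ids, and vectors) to the Azure Cognitive Search index.
# Note that this runs on every execution of the cell, whether or not the index was just created.
upload_output = index.upload_documents(metadatas)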


Get Index Error:  RetryError[<Future at 0x21c38659a90 state=finished raised HTTPError>]
No index tesla_facts_index detected, creating one ... 
Success status: True. Reading file from full path: c:\Users\selhousseini\Documents\GitHub\quick_rag\data\Tesla_Model_S.txt
Processed 10 items
Processed 20 items
Processed 30 items
Processed 40 items
Processed 50 items
Processed 60 items
Processed 70 items
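
The cell above combines three ideas: splitting the text into facts, deriving a document id from each fact's content, and caching the processed result in a pickle file so the embedding calls are not repeated. The sketch below illustrates those ideas in isolation. The helpers `stable_id`, `split_facts`, and `load_or_build` are hypothetical stand-ins, not the project's `generate_uuid_from_string`, `read_file`, or `save_to_pickle`, whose implementations are not shown here.

```python
import os
import pickle
import tempfile
import uuid


def stable_id(text):
    """Derive a deterministic UUID from text, so the same fact always gets the same id."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, text))


def split_facts(text):
    """Split on blank lines and drop empty chunks."""
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


def load_or_build(cache_path, build):
    """Return the cached object if cache_path exists; otherwise build, cache, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    result = build()
    with open(cache_path, "wb") as fh:
        pickle.dump(result, fh)
    return result


def build_records(text):
    """Turn raw text into a list of records (embeddings and tags omitted in this sketch)."""
    return [{"id": stable_id(fact), "text": fact} for fact in split_facts(text)]


sample = "Fact one.\n\nFact two.\n\n"
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "facts.pkl")
    first = load_or_build(cache, lambda: build_records(sample))   # computes and writes the cache
    second = load_or_build(cache, lambda: build_records(sample))  # reads the cache
    print(len(first), first == second)
```

Content-derived ids have a practical benefit when a cell like this is re-run: the same fact maps to the same key, so a repeated upload should update the existing document rather than add a duplicate.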


### Creating an Orchestrator and Generating Responses

Finally, we create an orchestrator that connects our search index to the LLM in order to answer questions. The orchestrator follows the RAG approach: it retrieves relevant information from the index and passes it to the model to generate a response. For a message that needs no lookup, such as a greeting, it can answer directly.

In [4]:
# Import the orchestrator module that ties together the RAG components
from orchestrator import Orchestrator

# Initialize the Orchestrator with the knowledge base index and topic of interest
o = Orchestrator(index_name, KB_TOPIC)

# Use the orchestrator to generate a response to a simple greeting
# The orchestrator uses the LLM to generate a response based on the system prompt and functions defined.
o.chat("hi how are you?")


Success status: True. Reading file from full path: c:\Users\selhousseini\Documents\GitHub\quick_rag\code\prompts\orchestrator_system_prompt.txt
Success status: True. Reading file from full path: c:\Users\selhousseini\Documents\GitHub\quick_rag\code\prompts\orchestrator_functions.json
Final Answer:
Hello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?


("Hello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?",
 [])
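
The internals of `Orchestrator` are not shown in this notebook. As a rough illustration of the retrieve-then-generate pattern it is described as following, here is a toy, in-memory sketch: rank stored passages by cosine similarity to a query vector, then place the best matches into a prompt. The three-dimensional vectors, the passages, and the prompt wording are all made up for the example; a real system would use embeddings from the model and a vector query against the search index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=2):
    """Return the k documents whose vectors are most similar to query_vec."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Combine retrieved passages and the user's question into a single prompt string."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy knowledge base with made-up 3-dimensional vectors
docs = [
    {"text": "Passage about charging.", "vector": [1.0, 0.0, 0.0]},
    {"text": "Passage about safety.", "vector": [0.0, 1.0, 0.0]},
    {"text": "Passage about pricing.", "vector": [0.0, 0.0, 1.0]},
]

query_vec = [0.1, 0.9, 0.0]  # stands in for the embedding of the user's question
hits = top_k(query_vec, docs, k=1)
prompt = build_prompt("What safety features exist?", hits)
print(prompt)
```

The resulting prompt string is what would be sent to the LLM in the generation step.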

### Querying the Knowledge Base

Now let's see the RAG system in action by querying the knowledge base for information. We will ask the orchestrator a specific question about the Tesla Model S and observe the response generated.

In [5]:
# Ask the orchestrator about the safety features of the Tesla Model S
# The orchestrator will retrieve relevant information from the index and use it to generate a detailed response.
o.chat("what are the safety features of the Tesla Model S?")


Arguments:  {"search_phrase":"Tesla Model S Facts"}
Success status: True. Reading file from full path: c:\Users\selhousseini\Documents\GitHub\quick_rag\code\prompts\orchestrator_rag_prompt.txt
Final Answer:
Answer from the RAG Database:
The knowledge base does not include information about this query. However, based on my knowledge, the Tesla Model S is known for its advanced safety features. Here are some of the key safety features typically found in the Tesla Model S:

1. **Autopilot and Full Self-Driving Capability**: These systems provide advanced driver assistance features, including automatic lane-keeping, adaptive cruise control, and traffic-aware cruise control.


3. **Airbags**: Multiple airbags, including front, side, and curtain airbags, provide protection in the event of a collision.

4. **Electronic Stability Control**: This system helps maintain vehicle stability by detecting and reducing loss of traction.

5. **Anti-lock Braking System (ABS)**: Helps maintain steering co

 [])
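
The `Arguments:  {"search_phrase":"Tesla Model S Facts"}` line in the output suggests that the model asked for a search through function calling, passing its arguments as a JSON string. The contents of `orchestrator_functions.json` are not shown, so the tool definition below is a hypothetical example in the OpenAI function-calling format, and `dispatch_tool_call` and `fake_search` are illustrative stand-ins rather than the project's code.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
# The real orchestrator_functions.json may differ.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Search the knowledge base for passages relevant to a phrase.",
        "parameters": {
            "type": "object",
            "properties": {
                "search_phrase": {
                    "type": "string",
                    "description": "The phrase to search for.",
                }
            },
            "required": ["search_phrase"],
        },
    },
}


def dispatch_tool_call(name, arguments_json, handlers):
    """Parse the model's JSON arguments and call the matching Python handler."""
    if name not in handlers:
        raise KeyError(f"No handler registered for tool '{name}'")
    arguments = json.loads(arguments_json)
    return handlers[name](**arguments)


def fake_search(search_phrase):
    """Stand-in for a real index query; returns a stub result."""
    return [f"stub result for: {search_phrase}"]


handlers = {SEARCH_TOOL["function"]["name"]: fake_search}
result = dispatch_tool_call(
    "search_knowledge_base",
    '{"search_phrase":"Tesla Model S Facts"}',
    handlers,
)
print(result)
```

Note that the logged search phrase is broader than the question that was asked. That may be one reason the retrieved passages did not cover safety features in this run, and it is worth inspecting when tuning the function description or the system prompt.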

### Conclusion

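In this tutorial, we demonstrated how to set up a simple RAG system using Azure OpenAI. We created a search index, populated it with data, and then used an orchestrator to answer questions against that data. In the run shown above, the answer to the safety question states that the knowledge base did not contain the requested information, so the model fell back on its own general knowledge. When retrieval does return relevant passages, this approach lets a large language model give detailed answers that are grounded in your own content.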