# Retrieval Augmented Generation (RAG) with Azure AI Search and OpenAI

This notebook demonstrates how to use RAG to give a language model (LLM or SLM) more context so that it produces more accurate answers. It uses Azure AI Search to index the documents and an Azure OpenAI embedding model to generate embeddings (vectors) for them.

+ Create an index schema
+ Load the sample data from a local folder
+ Embed the documents in-memory using an Azure OpenAI embedding model (`text-embedding-3-large` in this walkthrough)
+ Index the vector and non-vector fields on Azure AI Search
+ Run a series of vector and hybrid (text + vectors) queries, including metadata filtering

The code uses Azure OpenAI to generate an embedding for each document chunk (its title and content). You'll need access to Azure OpenAI to run this demo.

## Create the resources

What we just did :D 

## Install Python packages

In [2]:
%pip install python-dotenv
%pip install tiktoken
%pip install azure-search-documents
%pip install azure-identity
%pip install openai

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1

[notice] A new release of pip is available: 23.0.1 -> 24.3.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.
Collecting tiktoken
  Downloading tiktoken-0.8.0-cp39-cp39-macosx_11_0_arm64.whl (983 kB)
Collecting regex>=2022.1.18
  Downloading regex-2024.9.11-cp39-cp39-macosx_11_0_arm64.whl (284 kB)
Installing collected packages: regex

## Connect to Azure AI Search and Azure OpenAI

Load the environment variables from the `.env` file, create the Azure OpenAI client, and send a test request.

In [4]:
import os
import re
from openai import AzureOpenAI
from dotenv import load_dotenv
from dotenv import dotenv_values

if os.path.exists(".env"):
    load_dotenv(override=True)
    config = dotenv_values(".env")

azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
azure_openai_chat_completions_deployment_name = os.getenv("AZURE_OPENAI_CHAT_COMPLETIONS_DEPLOYMENT_NAME")

azure_openai_embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL")
embedding_vector_dimensions = os.getenv("EMBEDDING_VECTOR_DIMENSIONS")

azure_search_service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
azure_search_service_admin_key = os.getenv("AZURE_SEARCH_SERVICE_ADMIN_KEY")
search_index_name = os.getenv("SEARCH_INDEX_NAME")

openai_client = AzureOpenAI(
    azure_endpoint=azure_openai_endpoint,
    api_key=azure_openai_api_key,
    api_version="2024-06-01"
)

# Test the connection to the Azure OpenAI chat completions deployment
completion = openai_client.chat.completions.create(
    model=azure_openai_chat_completions_deployment_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you ?"}
    ])
print(completion.to_json())

{
  "id": "chatcmpl-AOF2oi6tH4I5NlWDdQpbkCYSWLfzq",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I am an AI developed by OpenAI called Assistant. I'm here to help answer your questions and provide information on a wide range of topics. How can I assist you today?",
        "role": "assistant"
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1730341066,
  "model": "gpt-4-turbo-2024-04-09",
  "object": "chat.completion",
  "system_fingerprint": "fp_5b26d85e12",
  "usage": {
    "completion_tokens": 36,
    "prompt_tokens": 2

## Count the number of tokens in a text

Like LLMs, embedding models define a maximum input size, measured in tokens. For `text-embedding-3-large` the maximum is 8191 tokens, so longer text must be split into chunks of 8191 tokens or fewer. To do that, you first need to count the tokens in a text string.

In [None]:
import tiktoken

def num_tokens_from_string(string: str) -> int:
    encoding = tiktoken.get_encoding(encoding_name="cl100k_base")
    num_tokens = len(encoding.encode(string, disallowed_special=()))
    return num_tokens

# Test the function
num_tokens_from_string("MangoChango is great!")

7

The OpenAI embedding model `text-embedding-3-large` has a limit of `8191` tokens per input.
Before sending the files to the model, we need to split the text into chunks of at most `8191` tokens.
The cell below counts the tokens in each sample file and lists the files with more than `8191` tokens.

In [8]:
input_directory = './data/azure-ai-docs/'

for filename in os.listdir(input_directory):
    if filename.endswith('.md'):
        with open(os.path.join(input_directory, filename), 'r', encoding='utf-8') as file:
            content = file.read()
            tokens = num_tokens_from_string(content)
            if tokens > 8191:
                print(f'File {filename} has {tokens} tokens which is more than 8191 (max) tokens')

File fine-tuning-python.md has 8869 tokens which is more than 8191 (max) tokens
File content-filter.md has 11481 tokens which is more than 8191 (max) tokens
File whats-new.md has 9884 tokens which is more than 8191 (max) tokens
File fine-tune copy.md has 10788 tokens which is more than 8191 (max) tokens
File use-your-data.md has 12120 tokens which is more than 8191 (max) tokens
File assistant.md has 8817 tokens which is more than 8191 (max) tokens
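
One way to bring the files listed above under the limit is to pack paragraphs greedily into chunks that stay within a token budget. The sketch below is only an illustration: `split_by_token_budget` and `count_words` are hypothetical names, and the word counter is a stand-in so the example runs without `tiktoken`. In the notebook you would pass `num_tokens_from_string` as the counter instead.

```python
def split_by_token_budget(text, max_tokens, count_tokens):
    """Greedily pack paragraphs into chunks whose token count stays within max_tokens."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if count_tokens(candidate) <= max_tokens:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph above the budget is kept whole here;
            # it would need a finer split (for example by sentence).
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def count_words(text):
    """Stand-in counter (whitespace-separated words), not a real tokenizer."""
    return len(text.split())


sample = "one two three\n\nfour five\n\nsix seven eight nine"
chunks = split_by_token_budget(sample, max_tokens=5, count_tokens=count_words)
print(chunks)
```

The notebook itself takes a different route further down: it splits each page on `##` headings and then checks each resulting chunk against the limit.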


## Transforming/cleaning the documents

We need to strip Markdown syntax that adds noise to the embeddings. The function `clean_markdown_content()` replaces links with their link text, removes images and bold markers, and joins the lines by removing newline characters.

In [9]:
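def clean_markdown_content(content):
    # Remove images first: the link pattern below would otherwise match the
    # "[alt](url)" part of "![alt](url)" and leave a stray "!alt" behind
    image_pattern = r'\!\[([^\[]*)\]\(([^\)]+)\)'
    content = re.sub(image_pattern, '', content)

    # Replace links with their link text
    link_pattern = r'\[([^\[]+)\]\(([^\)]+)\)'
    content = re.sub(link_pattern, r'\1', content)

    # Remove bold markers and newlines
    content = content.replace('**', '')
    content = content.replace('\n', '')

    return content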

## Get the vector embedding for an input text

In [None]:
def get_embeddings_vector(text):

    response = openai_client.embeddings.create(
        input=text,
        model=azure_openai_embedding_model,
    )

    embedding = response.data[0].embedding

    return embedding

vector = get_embeddings_vector("MangoChango techThursday sample text")
print(vector)

[-0.02353655, 0.014757044, -0.019707192, 0.02278936, -0.0043220394, 0.00936792, -0.050360747, 0.04808181, -0.002552128, 0.03775188, 0.05555373, -0.023443151, 0.016410206, -0.018829241, -0.015831131, -0.006883507, -0.02917785, 0.00074427336, 0.006122305, -0.0102738915, -0.008947625, 0.00065379305, -0.05002451, 0.04897844, -0.014355428, -0.0035211428, -0.010675507, -0.0042846794, 0.016438225, 0.0020220885, 0.013505497, 0.024171663, 0.011665536, -0.023218993, 0.023723349, 0.008228453, 0.025385851, 0.05383519, 0.03877927, -0.027963664, -0.0055245515, 0.0028229852, -0.023648629, 0.002930394, -0.023312394, 0.010955704, 0.0015317438, -0.016438225, -0.01603661, -0.021407053, 0.034651034, 0.023872787, -0.002001074, -0.0155789545, -0.0036215466, 0.018063368, 0.00035141376, -0.0027716155, 0.021874048, -0.006976906, -0.009797556, -0.0066733593, 0.011114482, 0.016298126, -0.02361127, -0.0031662264, 0.003030798, 0.021164216, 0.0049734972, 0.044009615, -0.03678053, 0.03100847, -0.009335231, -0.005239

## Create file chunks

Split the markdown files in folder `./data/azure-ai-docs` into chunks.

In [12]:
import uuid
import re
import json
import os

input_directory = './data/azure-ai-docs/'
output_directory = './data/chunks/'

if not os.path.exists(output_directory):
    os.makedirs(output_directory)

chunk_index=0

for filename in os.listdir(input_directory):
    
    if filename.endswith('.md'):
        
        with open(os.path.join(input_directory, filename), 'r', encoding='utf-8') as file:
            print(filename)
            
            content = file.read()
            
            # skip the file if the content doesn't contain title, description, ms.date or '##'
            if 'title:' not in content or 'description:' not in content or 'ms.date:' not in content or '##' not in content:
                print(f'File {filename} does not contain title, description, ms.date or ##')
                continue

            # Extract the title, description, and date
            page_title = re.search(r'title: (.*)', content).group(1).replace('"', '')
            page_description = re.search(r'description: (.*)', content).group(1)
            page_date = re.search(r'ms\.date: (.*)', content).group(1)
            
            # Split the content into chunks based on '##'
            chunks = content.split('\n## ')[1:]  # Skip the first chunk as it contains the title, description, and date
            
            # Add the chunks to the list along with the title, description, and date
            for chunk in chunks:
                chunk_index=chunk_index + 1
                chunk_content = clean_markdown_content(chunk.strip())
                
                # Skip only this oversized chunk and keep processing the rest of the file
                if num_tokens_from_string(chunk_content) > 8191:
                    print(f'Chunk {chunk_index} in file {filename} has more than 8191 tokens')
                    continue

                vector = get_embeddings_vector(chunk_content)
                
                chunk = {
                    "id": str(uuid.uuid4()),
                    'page_title': page_title,
                    'page_description': page_description,
                    'page_date': page_date,
                    'chunk_title': chunk.split('\n')[0],  # The first line after '##' is the title of the chunk
                    'chunk_content': chunk_content, 
                    'vector': vector
                }
                
                chunk_file_name = f'chunk_{chunk_index}_{page_title}.json'.replace('?', '').replace(':', '').replace("'", '').replace('|', '').replace('/', '').replace('\\', '')

                # write the chunk as a JSON file into the output directory
                with open(os.path.join(output_directory, chunk_file_name), 'w', encoding='utf-8') as f:
                    json.dump(chunk, f)

fine-tuning-python.md
chatgpt.md
File chatgpt.md does not contain title, description, ms.date or ##
prompt-completion.md
chat-markup-language.md
assistants-studio.md
advanced-prompt-engineering.md
File advanced-prompt-engineering.md does not contain title, description, ms.date or ##
whisper-rest.md
File whisper-rest.md does not contain title, description, ms.date or ##
use-your-data-javascript.md
File use-your-data-javascript.md does not contain title, description, ms.date or ##
deploy-web-app.md
File deploy-web-app.md does not contain title, description, ms.date or ##
work-with-code.md
spring.md
File spring.md does not contain title, description, ms.date or ##
use-your-data-spring-common-variables.md
File use-your-data-spring-common-variables.md does not contain title, description, ms.date or ##
red-teaming.md
customizing-llms.md
use-your-data-dotnet.md
File use-your-data-dotnet.md does not contain title, description, ms.date or ##
code-interpreter.md
reference.md
assistants-reference
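
The metadata extraction and heading-based split used in the cell above can be tried in isolation on a small made-up page. This is a minimal sketch, not the notebook's exact code: `parse_front_matter` and `split_sections` are hypothetical helpers, and the sample page is invented. Unlike the cell above, it returns `None` for a missing key instead of skipping the file.

```python
import re


def parse_front_matter(content):
    """Extract title, description and ms.date from 'key: value' lines (None when absent)."""
    metadata = {}
    for key in ("title", "description", "ms.date"):
        # re.escape keeps the dot in "ms.date" literal; MULTILINE anchors at line starts
        match = re.search(rf"^{re.escape(key)}:\s*(.*)$", content, flags=re.MULTILINE)
        metadata[key] = match.group(1).strip().strip('"') if match else None
    return metadata


def split_sections(content):
    """Split on level-2 headings and return (heading, body) pairs."""
    sections = []
    # The first element holds the front matter and introduction, so it is skipped
    for block in content.split("\n## ")[1:]:
        heading, _, body = block.partition("\n")
        sections.append((heading.strip(), body.strip()))
    return sections


sample = (
    '---\ntitle: "Sample page"\ndescription: A made-up page\nms.date: 01/02/2024\n---\n'
    "# Sample page\n\n## First\nAlpha\n\n## Second\nBeta\n"
)
print(parse_front_matter(sample))
print(split_sections(sample))
```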

By default, the embedding vector has `1536` dimensions for `text-embedding-3-small` and `3072` for `text-embedding-3-large`. You can request fewer dimensions by passing the `dimensions` parameter, and the shortened embedding still keeps its concept-representing properties.
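
As a rough sketch of that option, the helper below passes `dimensions` to `embeddings.create`. The name `get_reduced_embedding` is hypothetical, and the fake client exists only so the example runs offline; it mimics just the response shape that the helper reads. With the real `openai_client` you would pass it as the first argument.

```python
from types import SimpleNamespace


def get_reduced_embedding(client, text, model, dimensions):
    """Ask the embeddings endpoint for a vector shortened to `dimensions` values."""
    response = client.embeddings.create(input=text, model=model, dimensions=dimensions)
    return response.data[0].embedding


# Offline stand-in (assumption): returns a zero vector of the requested length.
def _fake_create(input, model, dimensions):
    return SimpleNamespace(data=[SimpleNamespace(embedding=[0.0] * dimensions)])


fake_client = SimpleNamespace(embeddings=SimpleNamespace(create=_fake_create))
vector = get_reduced_embedding(fake_client, "sample text", "text-embedding-3-large", 1024)
print(len(vector))
```

If you shorten the vectors this way, the `vector_search_dimensions` value of the index field has to match the length you request.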

## Create an index in Azure AI Search

In [13]:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    ComplexField,
    CorsOptions,
    SearchIndex,
    SearchField,
    ScoringProfile,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticSearch,
    SemanticField
)

credential = AzureKeyCredential(azure_search_service_admin_key)

# SearchIndexClient manages indexes at the service level, so it takes no index name
search_index_client = SearchIndexClient(
    endpoint=azure_search_service_endpoint,
    credential=credential
)

# create search index
fields = [
    SimpleField(
        name="id",
        type=SearchFieldDataType.String,
        key=True,
        sortable=True,
        filterable=True,
        facetable=True,
    ),
    SearchableField(name="page_title", type=SearchFieldDataType.String),
    SearchableField(name="page_description", type=SearchFieldDataType.String),
    SearchableField(name="page_date", type=SearchFieldDataType.String),
    SearchableField(name="chunk_title", type=SearchFieldDataType.String),
    SearchableField(name="chunk_content", type=SearchFieldDataType.String),
    SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=3072,  # 3072 for text-embedding-3-large; use 1536 for text-embedding-3-small
        vector_search_profile_name="myHnswProfile",
    ),
]

# Configure the vector search configuration  
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="myHnsw"
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="page_title"),
        # keywords_fields=[SemanticField(field_name="category")],
        content_fields=[SemanticField(field_name="chunk_content")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])
# Create the search index with the semantic settings
search_index = SearchIndex(name=search_index_name, fields=fields,
                    vector_search=vector_search, semantic_search=semantic_search)
result = search_index_client.create_or_update_index(search_index)
print(f' {result.name} created')

 index-doc created


In case you need to delete the index, you can use the following code.

In [None]:
# delete index
search_index_client.delete_index(search_index_name)

## Upload chunks/documents to Azure AI Search

In [14]:
import uuid
from azure.search.documents import SearchClient

search_client = SearchClient(endpoint=azure_search_service_endpoint, index_name=search_index_name, credential=credential)

# for each json file in ./data/chunks/ folder, load the json document and upload it to the search index

for filename in os.listdir(output_directory):
    if filename.endswith('.json'):
        with open(os.path.join(output_directory, filename), 'r') as file:
            document = json.load(file)

            result = search_client.upload_documents(documents=[document])  # the API expects a list of documents
            print(f"Upload of {filename} succeeded: { result[0].succeeded }")

Upload of chunk_497_Quickstart Use Azure OpenAI Service with the Java SDK.json succeeded: True
Upload of chunk_718_Text to speech with Azure OpenAI Service.json succeeded: True
Upload of chunk_561_How to migrate to OpenAI JavaScript v4.x.json succeeded: True
Upload of chunk_638_How to work with prompt engineering and the Chat Completion API.json succeeded: True
Upload of chunk_573_Use the Azure OpenAI web app.json succeeded: True
Upload of chunk_192_Azure OpenAI prompt transformation concepts.json succeeded: True
Upload of chunk_254_Quickstart Use the OpenAI Service image generation Go SDK.json succeeded: True
Upload of chunk_166_Quickstart - getting started with Azure OpenAI assistants (preview) in AI Studio.json succeeded: True
Upload of chunk_373_Quickstart Use Azure OpenAI Service with the Java SDK.json succeeded: True
Upload of chunk_378_Quickstart Use Azure OpenAI Service with the JavaScript SDK and the completions API.json succeeded: True
Upload of chunk_773_How to use Azure Ope
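
Uploading one document per request works for a demo but gets slow with many chunks. A possible alternative, sketched below, sends the documents in batches and collects the keys of any that failed. `upload_in_batches` and `FakeSearchClient` are hypothetical: the fake client only imitates the `key` and `succeeded` attributes of the results so the example runs offline, and the batch size of 100 is an arbitrary choice.

```python
from types import SimpleNamespace


def upload_in_batches(search_client, documents, batch_size=100):
    """Upload documents in groups and return the keys of any that failed."""
    failed_keys = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        results = search_client.upload_documents(documents=batch)
        failed_keys.extend(result.key for result in results if not result.succeeded)
    return failed_keys


class FakeSearchClient:
    """Offline stand-in (assumption) that records batch sizes and reports success."""

    def __init__(self):
        self.batch_sizes = []

    def upload_documents(self, documents):
        self.batch_sizes.append(len(documents))
        return [SimpleNamespace(key=doc["id"], succeeded=True) for doc in documents]


fake_client = FakeSearchClient()
sample_documents = [{"id": str(n)} for n in range(5)]
failed = upload_in_batches(fake_client, sample_documents, batch_size=2)
print(fake_client.batch_sizes, failed)
```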

## Perform a vector similarity search

This example shows a pure vector search. The query text is embedded client-side with `get_embeddings_vector()` and passed to Azure AI Search as a `VectorizedQuery`; setting `search_text=None` disables the keyword part of the search.

In [15]:
from azure.search.documents.models import VectorizedQuery

# Pure Vector Search
query = "rag"  

embedding = get_embeddings_vector(query)

vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="vector")
  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["page_title", "page_date", "chunk_title", "chunk_content"],
)  
  
for result in results:
    print(f"Page Date: {result['page_date']}")  
    print(f"Page Title: {result['page_title']}")  
    print(f"Chunk Title: {result['chunk_title']}")  
    print(f"Chunk Content: {result['chunk_content']}")
    print(f"Score: {result['@search.score']}")  


Page Date: 03/26/2024
Page Title: Azure OpenAI Service getting started with customizing a large language model (LLM)
Chunk Title: RAG (Retrieval Augmented Generation)
Chunk Content: RAG (Retrieval Augmented Generation)### Definition RAG (Retrieval Augmented Generation) is a method that integrates external data into a Large Language Model prompt to generate relevant responses. This approach is particularly beneficial when using a large corpus of unstructured text based on different topics. It allows for answers to be grounded in the organization’s knowledge base (KB), providing a more tailored and accurate response.RAG is also advantageous when answering questions based on an organization’s private data or when the public data that the model was trained on might have become outdated. This helps ensure that the responses are always up-to-date and relevant, regardless of the changes in the data landscape.### Illustrative use caseA corporate HR department is looking to provide an intellige
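
To build intuition for what `k_nearest_neighbors` asks for, here is a brute-force version of the same idea on toy two-dimensional vectors, assuming cosine similarity as the metric. It is only an illustration: the function names and the toy data are made up, the service uses an approximate HNSW index rather than an exhaustive scan, and the `@search.score` it returns is not necessarily the raw cosine value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vector, documents, k=3):
    """Score every document against the query and keep the k most similar."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    return sorted(scored, reverse=True)[:k]


toy_index = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
top = nearest_neighbors([1.0, 0.1], toy_index, k=2)
print([doc_id for _, doc_id in top])
```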

## Simulate a user query

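Here the chat completion request passes the Azure AI Search index as a data source through `extra_body`. Azure OpenAI retrieves the documents most relevant to the user query from the index and grounds its answer in them.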

In [16]:
response = openai_client.chat.completions.create(
    model=azure_openai_chat_completions_deployment_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant for an AI learner."},
        {"role": "user", "content": "What are the LLM models supported by Azure ?"}
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": azure_search_service_endpoint,
                    "index_name": search_index_name,
                    "authentication": {
                        "type": "api_key",
                        "key": azure_search_service_admin_key,
                    }
                }
            }
        ]
    }
)

print(response.to_json())

{
  "id": "1f7978ea-aa85-4a56-ba2f-08206b09ca81",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The Azure OpenAI Service supports a variety of large language models (LLMs) including OpenAI GPT-4, GPT-3, Codex, DALL-E, Whisper, and text to speech models. These models are part of the advanced language AI offerings provided by Azure OpenAI, ensuring compatibility and a smooth transition for users leveraging these technologies [doc5].",
        "role": "assistant",
        "end_turn": true,
        "context": {
          "citations": [
            {
              "content": "Why is RAI red teaming an important practice?Red teaming is a best practice in the responsible development of systems and features using LLMs. While not a replacement for systematic measurement and mitigation work, red teamers help to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations.While Mi

In [17]:
print(response.choices[0].message.content)

The Azure OpenAI Service supports a variety of large language models (LLMs) including OpenAI GPT-4, GPT-3, Codex, DALL-E, Whisper, and text to speech models. These models are part of the advanced language AI offerings provided by Azure OpenAI, ensuring compatibility and a smooth transition for users leveraging these technologies [doc5].
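
The same grounding can also be done by hand: run the search yourself, then place the retrieved chunks in the prompt. The sketch below shows only the prompt-assembly step. `build_grounded_messages` is a hypothetical helper, the prompt wording is an assumption, and the `[docN]` labels simply imitate the citation style seen in the answer above. The chunk dictionaries use the `chunk_title` and `chunk_content` fields from the index.

```python
def build_grounded_messages(question, chunks, max_chunks=3):
    """Build chat messages whose system prompt carries the retrieved chunks as sources."""
    sources = []
    for number, chunk in enumerate(chunks[:max_chunks], start=1):
        sources.append(f"[doc{number}] {chunk['chunk_title']}\n{chunk['chunk_content']}")
    system_prompt = (
        "Answer using only the sources below and cite them as [docN]. "
        "If the sources do not contain the answer, say so.\n\n" + "\n\n".join(sources)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


# Made-up chunks standing in for search results
sample_chunks = [
    {"chunk_title": "Definition", "chunk_content": "RAG adds external data to the prompt."},
    {"chunk_title": "Use case", "chunk_content": "An HR assistant answers policy questions."},
]
messages = build_grounded_messages("What is RAG?", sample_chunks)
print(messages[0]["content"])
```

The resulting list can be passed as `messages` to `openai_client.chat.completions.create` without the `extra_body` data source.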
