## RAG Microhack Challenge 3: Retrieve Data from Azure AI Search

__In this challenge, attendees will perform the following tasks:__
1. Connect to the Azure AI Search service
2. Query an Azure AI Search vector index
3. Understand vectors
4. Retrieve documents relevant to a search query
5. Understand the various search options (basic, semantic, vector & hybrid search)

__Pre-requisites:__
1. An Azure OpenAI Service resource is created with a text-embedding-ada-002 embedding model deployment.
2. An Azure AI Search service is created.
3. Sample data is indexed in the AI Search service. HR Benefits sample data is provided in this repo.

### Step 1: Update the ai_search.env file with your AI Search & AOAI service details

``Example ai_search.env file below.``

<img src="attachment:510cc22e-73d5-46a1-a20a-5791b889f38b.png" width=60% style="display: block; margin: auto;">


It is good practice in Python to keep service configuration in a dot env (`.env`) file. We will specify the AI Search service configuration (and the Azure OpenAI settings used in Step 5) in the ai_search.env file, and use it to connect to the AI Search service you created in the previous challenges.
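Because every later cell reads these settings by key name, a missing entry only surfaces as a `KeyError` partway through the notebook. The sketch below is a hypothetical helper (`find_missing_settings` is our own name, not part of python-dotenv) that checks a loaded configuration mapping against the key names this notebook reads. The placeholder values are illustrative assumptions, not real service details.

```python
# Key names read by the cells in this notebook (AI Search + Azure OpenAI).
REQUIRED_SETTINGS = [
    "AI_SEARCH_ENDPOINT",
    "AI_SEARCH_KEY",
    "AI_SEARCH_INDEX_NAME",
    "AI_SEARCH_SEMANTIC_CONFIG_NAME",
    "AOAI_ENDPOINT",
    "AOAI_KEY",
    "AOAI_EMBEDDING_DEPLOYMENT_NAME",
]


def find_missing_settings(config, required=REQUIRED_SETTINGS):
    """Hypothetical helper: return required keys that are absent or empty in `config`.

    `config` can be any mapping, e.g. the dictionary returned by dotenv_values().
    A key present without a value (None or "") is reported as missing too.
    """
    return [key for key in required if not config.get(key)]


# Illustrative placeholder values only; replace them with your own service details.
sample_config = {
    "AI_SEARCH_ENDPOINT": "https://<your-search-service>.search.windows.net",
    "AI_SEARCH_KEY": "<your-search-key>",
    "AI_SEARCH_INDEX_NAME": "<your-index-name>",
    "AI_SEARCH_SEMANTIC_CONFIG_NAME": "default",
    "AOAI_ENDPOINT": "https://<your-aoai-resource>.openai.azure.com/",
    "AOAI_KEY": "",  # left empty on purpose to show that it gets detected
    "AOAI_EMBEDDING_DEPLOYMENT_NAME": "<your-embedding-deployment>",
}

print(find_missing_settings(sample_config))  # ['AOAI_KEY']
```

In the notebook you could call `find_missing_settings(env_config)` right after loading the file and stop early if the list is not empty.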

In [None]:
from dotenv import dotenv_values

In [None]:
# Name of the .env file to load (default: ai_search.env)
env_filename = "ai_search.env"

# Load AI Search configuration (endpoint, key etc) from the environment file
env_config = dotenv_values(env_filename)

In [None]:
ai_search_endpoint = env_config["AI_SEARCH_ENDPOINT"]
ai_search_key = env_config["AI_SEARCH_KEY"]
ai_search_index_name = env_config["AI_SEARCH_INDEX_NAME"]
ai_search_semantic_config_name = env_config["AI_SEARCH_SEMANTIC_CONFIG_NAME"]

In [None]:
print("AI Search service endpoint is: ", ai_search_endpoint, " and index name is: ", ai_search_index_name)

AI Search service endpoint is:  https://search0-7szd5hx5xc74c.search.windows.net  and index name is:  rag-index2


### Step 2: Connect to AI Search service

First, make sure you have installed the necessary Azure libraries. You can do this using pip:

`pip install azure-search-documents`

In [None]:
# Uncomment the line below and install the Python library if you get a module-not-found error when running the next code block
# !pip install azure-search-documents

In [None]:
# Import required Python libraries for Azure AI Search and foundational packages
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
import numpy as np
import os

In [None]:
ai_search_credential = AzureKeyCredential(ai_search_key)  

# Instantiate AI Search service client
ai_search_client = SearchClient(endpoint=ai_search_endpoint,  
                      index_name=ai_search_index_name,  
                      credential=ai_search_credential)  

### Step 3: Perform Basic Search using Azure AI Search service
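Azure AI Search provides several options to customize your search queries. Here, we will explore basic, semantic, vector and hybrid search options. A basic search is a full-text keyword query: the service scores matching documents (with the BM25 ranking algorithm by default) and returns them in order of `@search.score`.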

In [None]:
# Run basic search against your AI Search and print results
simple_query_results = ai_search_client.search(search_text="does my plan cover dental exam", top=3)  
  
for i, result in enumerate(simple_query_results):  
    print("****** SEARCH RESULT# :", i+1, " CONTENT ******", "Search score: ", result["@search.score"])
    ## Only printing 1000 characters from each matching document. Adjust as needed
    print(result["content"][0:1000]) 
    print("Source Document: ", result["metadata_storage_path"], "\n" ) 

****** SEARCH RESULT# : 1  CONTENT ****** Search score:  7.3076386
allows you to submit a claim for reimbursement for any eligible expenses. Make sure to save 

all of your receipts and submit your claim as soon as possible to ensure that your costs are 

covered. 

At Northwind Health, we are committed to providing our members with quality coverage 

for medical treatments such as chemotherapy and radiation therapy. With the Northwind 

Standard plan, you can rest assured that any eligible treatments you receive will be covered. 

Clinical Trials 

COVERED SERVICES: Clinical Trials 

At Northwind Health, we understand the importance of access to clinical trials for our 

members. Clinical trials are research studies that look at new ways to prevent, detect, or 

treat diseases and conditions, and can give members access to treatments and therapies not 

yet available. Northwind Standard offers coverage for certain clinical trial services, 

including those related to common diseases a


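Each result returned by `SearchClient.search` can be indexed like a dictionary: the relevance score is under `@search.score`, and index fields such as `content` and `metadata_storage_path` are keyed by their field names. Indexing a field that a result does not contain raises `KeyError`, so the hypothetical helper below (`summarize_results` is our own name) uses `.get()` with defaults instead. It is shown with plain dictionaries standing in for real results; the field names are the ones this notebook's index uses and may differ in yours.

```python
def summarize_results(results, max_chars=1000, content_field="content",
                      source_field="metadata_storage_path"):
    """Hypothetical helper: turn search results into (rank, score, snippet, source) tuples.

    Uses .get() so a field missing from a result yields None instead of raising KeyError.
    """
    summary = []
    for rank, result in enumerate(results, start=1):
        content = result.get(content_field) or ""
        summary.append((
            rank,
            result.get("@search.score"),
            content[:max_chars],       # truncate long chunks for display
            result.get(source_field),  # None if this result lacks the field
        ))
    return summary


# Plain dicts standing in for SDK results (values are illustrative only).
fake_results = [
    {"@search.score": 7.3, "content": "allows you to submit a claim...",
     "metadata_storage_path": "https://example.blob.core.windows.net/docs/a.pdf"},
    {"@search.score": 4.2, "content": "Contoso Electronics Plan and Benefit Packages"},
]

for rank, score, snippet, source in summarize_results(fake_results, max_chars=20):
    print(f"#{rank} score={score} source={source}\n{snippet}\n")
```

With the live service you could pass `ai_search_client.search(search_text=..., top=3)` directly instead of `fake_results`.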
### Step 4: Perform Semantic Search using Azure AI Search service

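Semantic search (semantic ranking) uses Microsoft's language-understanding models to rerank the top results of a keyword query by how well they match the meaning of the query, not just its terms. It can also return extractive answers, and it requires a semantic configuration on the index, which the code below passes by name via `semantic_configuration_name`.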

In [None]:
print("Running Semantic search using config: ", ai_search_semantic_config_name)
semantic_query_results = list(
        ai_search_client.search(
            search_text="does my plan cover dental exam",
            query_type="semantic",
            query_answer="extractive",
            query_answer_count=3,
            top=3,
            semantic_configuration_name=ai_search_semantic_config_name,
        )
    )

Running Semantic search using config:  default


In [None]:
for i, result in enumerate(semantic_query_results):  
    print("****** SEARCH RESULT#:", i+1, " CONTENT ******", "Search score:", 
          result["@search.score"], " Semantic Reranker Score:", result["@search.reranker_score"])
    ## Only printing the first 100 characters from each matching document. Adjust as needed
    print(result["content"][0:100]) 
    print("Source Document: ", result["metadata_storage_path"], "\n" ) 

****** SEARCH RESULT#: 1  CONTENT ****** Search score: 4.2739983  Semantic Reranker Score: 2.8656136989593506

 

 

 

Contoso Electronics  
Plan and Benefit Packages 

 
 

 

 

 

  



This document contai
****** SEARCH RESULT#: 2  CONTENT ****** Search score: 7.3076386  Semantic Reranker Score: 2.622802257537842
allows you to submit a claim for reimbursement for any eligible expenses. Make sure to save 

all of
****** SEARCH RESULT#: 3  CONTENT ****** Search score: 3.8690808  Semantic Reranker Score: 2.5361344814300537
• Ask your dentist about any additional costs that may not be covered by your plan  

• Have a list 
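To see what reranking changed, the sketch below copies the two scores printed in the semantic search output above and orders the same three results both ways. `@search.score` is the keyword relevance score and `@search.reranker_score` is the semantic ranker's score; the short labels are our own shorthand for each returned chunk, not index fields.

```python
# Scores copied from the semantic search output above (query: "does my plan cover dental exam").
# Labels are informal shorthand for each returned chunk, not index fields.
semantic_results = [
    {"label": "Contoso plan overview", "@search.score": 4.2739983,
     "@search.reranker_score": 2.8656136989593506},
    {"label": "Claims / clinical trials", "@search.score": 7.3076386,
     "@search.reranker_score": 2.622802257537842},
    {"label": "Dental procedure checklist", "@search.score": 3.8690808,
     "@search.reranker_score": 2.5361344814300537},
]


def order_by(results, score_key):
    """Return result labels sorted by the given score, highest first."""
    ranked = sorted(results, key=lambda r: r[score_key], reverse=True)
    return [r["label"] for r in ranked]


keyword_order = order_by(semantic_results, "@search.score")
reranked_order = order_by(semantic_results, "@search.reranker_score")

print("Keyword order:          ", keyword_order)
print("Semantic reranker order:", reranked_order)
```

The keyword score would have put the claims chunk first, while the reranker promotes the plan overview. When `query_type="semantic"` is used, results come back sorted by reranker score, which is why the printed `@search.score` values are not in descending order.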


### Step 5: Perform Vector Search using Azure AI Search service

``What Are Vectors?``
Vectors are mathematical representations of data points in a multi-dimensional space. In the context of Azure AI Search and machine learning, vectors (embeddings) represent text, images, or other data types in a way that captures the relationships and similarities between data points: items with similar meaning end up close together in the vector space. Each vector is typically a list of numbers (floats) that define the position of a data point in that space.
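"Close together" is usually measured with a similarity metric such as cosine similarity: the cosine of the angle between two vectors, 1.0 when they point the same way and 0.0 when they are orthogonal. The minimal sketch below uses tiny made-up 3-dimensional vectors purely for illustration; real text-embedding-ada-002 embeddings have 1536 dimensions, and the metric your index actually uses is whatever its vector search configuration specifies (cosine is a common choice).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (same length, non-zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings", made up for illustration only.
query = [0.9, 0.1, 0.0]
dental_chunk = [0.8, 0.2, 0.1]
unrelated_chunk = [0.0, 0.1, 0.9]

print("query vs dental chunk:   ", round(cosine_similarity(query, dental_chunk), 3))
print("query vs unrelated chunk:", round(cosine_similarity(query, unrelated_chunk), 3))
```

Vector search returns the stored chunks whose vectors score highest against the query vector, which is how a query about a "dental exam" can match a chunk that never contains that exact phrase.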

In [None]:
from openai import AzureOpenAI
import requests
import json

In [None]:
aoai_api_key = env_config["AOAI_KEY"]
aoai_endpoint = env_config["AOAI_ENDPOINT"]
aoai_embedding_deployment_name = env_config["AOAI_EMBEDDING_DEPLOYMENT_NAME"]

In [None]:
# Initialize the Azure OpenAI client
aoai_client = AzureOpenAI(
  api_key = aoai_api_key,  
  api_version = "2024-02-01",
  azure_endpoint = aoai_endpoint
)

In [None]:
# 1. Create a vector embedding for a query using Azure OpenAI embedding API
search_text = "does my plan cover dental exam"
search_text_embedding = aoai_client.embeddings.create(input = [search_text], model=aoai_embedding_deployment_name).data[0].embedding

In [None]:
# Here is how the query "does my plan cover dental exam" looks when converted to a vector
print("Length of the vector embeddings: ", len(search_text_embedding))
print(search_text_embedding)

Length of the vector embeddings:  1536
[-0.00823073647916317, 0.008439992554485798, 0.02119065821170807, -0.02466430701315403, -0.04332994297146797, 0.035964131355285645, -0.022083483636379242, -0.04617582634091377, -0.01927945390343666, 0.022934457287192345, 0.009618801064789295, 0.03978653997182846, -0.024329498410224915, 0.012339129112660885, -0.013078500516712666, -0.006138176191598177, 0.017549604177474976, -0.006971712689846754, 0.013706267811357975, -0.01584765501320362, -0.0037700957618653774, -0.012143823318183422, 0.00036467218887992203, -0.00856554601341486, 0.01721479371190071, -0.0005148569471202791, 0.014522366225719452, -0.01131377462297678, -0.038810014724731445, 0.01922365091741085, 0.03211382031440735, -0.004983780439943075, -0.036103636026382446, -0.03024446777999401, 0.011802038177847862, 0.01978166773915291, -0.009779230691492558, 0.003182434942573309, 0.03007706254720688, -0.011216121725738049, -0.004907053429633379, -0.011090568266808987, 0.008453942835330963, -0

#### Note on embedding size
The size of an embedding vector depends on the model used to create it. We are using the OpenAI "text-embedding-ada-002" model, which produces vectors with 1536 dimensions. As the screenshot below shows, the Azure AI Search index has a "contentVector" field with the same dimension.

<img src="attachment:65bc7ef2-6e39-41de-a869-52a8fa99fd41.png" width=70% style="display: block; margin: auto;">
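A query vector whose length differs from the field's dimension is expected to be rejected by the service, so it can help to check the length before sending a request. The hypothetical helper below (`build_vector_query_payload` is our own name, not an SDK function) does that check and returns a request body with the same shape as the vector search payload used in this step. The default of 1536 assumes text-embedding-ada-002, and `contentVector` is this notebook's field name.

```python
import json


def build_vector_query_payload(embedding, k=5, field="contentVector",
                               expected_dim=1536, exhaustive=True):
    """Hypothetical helper: validate the embedding size, then build a vector query body."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, index field expects {expected_dim}"
        )
    return {
        "count": True,
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(embedding),
                "k": k,                    # number of nearest neighbours to return
                "fields": field,           # vector field to search
                "exhaustive": exhaustive,  # True = search all vectors instead of the approximate index
            }
        ],
    }


# Dummy vector standing in for a real ada-002 embedding.
dummy_embedding = [0.0] * 1536
payload_preview = build_vector_query_payload(dummy_embedding, k=3)
print(json.dumps(payload_preview)[:80], "...")

try:
    build_vector_query_payload([0.1, 0.2, 0.3])
except ValueError as err:
    print("Rejected:", err)
```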


##### Perform vector search using the search query embedding

In [None]:
endpoint = f"{ai_search_endpoint}/indexes/{ai_search_index_name}/docs/search?api-version=2023-11-01"

# Headers for the request
headers = {
    "Content-Type": "application/json",
    "api-key": ai_search_key,
}

# Payload for the vector search
payload = {
        "count": True,
        "vectorQueries": [
            {
                "vector": search_text_embedding,
                "k": 5,
                "fields": "contentVector",
                "kind": "vector",
                "exhaustive": True
            }
        ]
    }

In [None]:
# Perform vector search - Note that you are just sending vector embeddings as the search data
response = requests.post(endpoint, headers=headers, data=json.dumps(payload))

In [None]:
# Check for successful response
if response.status_code == 200:
    results = response.json()
    print("Search results:")
    for result in results.get("value", []):
        print(result)
else:
    print(f"Error: {response.status_code}, {response.text}")

Search results:
{'@search.score': 0.8367705, 'chunk_id': '151f626421fc_aHR0cHM6Ly9zdHJhZzA3c3pkNWh4NXhjNzRjLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2N1bWVudHMvTm9ydGh3aW5kX0hlYWx0aF9QbHVzX0JlbmVmaXRzX0RldGFpbHMucGRm0_pages_14', 'parent_id': 'aHR0cHM6Ly9zdHJhZzA3c3pkNWh4NXhjNzRjLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2N1bWVudHMvTm9ydGh3aW5kX0hlYWx0aF9QbHVzX0JlbmVmaXRzX0RldGFpbHMucGRm0', 'content': '• Ask your dentist about any additional costs that may not be covered by your plan  \n\n• Have a list of any medications or allergies that may affect the anesthesia  \n\n• Make sure that the dentist is aware of any medical conditions that might increase the risk \n\nof complications  \n\n• Ask your dentist about any special instructions that you may need to follow before or after \n\nthe procedure  \n\n• Make sure that you understand the risks and benefits of the procedure  \n\n• Ask your dentist about any follow-up care that may be needed after the procedure  \n\n• Make sure that you have a plan for transp
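The task list also mentions hybrid search. In the REST API, a hybrid query is a single request that carries both a `search` text string and `vectorQueries`; the service runs both and merges the two ranked lists with Reciprocal Rank Fusion (RRF). The sketch below builds such a body (assuming the same `contentVector` field as above) and illustrates RRF locally on hypothetical document IDs. The constant k=60 is the value commonly cited for RRF and is an assumption here, not something this notebook verifies against the service.

```python
def build_hybrid_payload(search_text, embedding, k=5, field="contentVector", top=5):
    """Build a hybrid query body: keyword text plus a vector query in one request."""
    return {
        "search": search_text,  # full-text (keyword) part of the query
        "top": top,
        "vectorQueries": [
            {"kind": "vector", "vector": list(embedding), "k": k, "fields": field}
        ],
    }


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical document IDs returned by each half of a hybrid query.
keyword_ranking = ["doc_claims", "doc_overview", "doc_dental"]
vector_ranking = ["doc_dental", "doc_overview", "doc_vision"]

print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
```

Documents that rank reasonably well in both lists (here `doc_dental` and `doc_overview`) rise above documents that appear in only one. To try a hybrid query against your index, you could post `build_hybrid_payload(search_text, search_text_embedding)` to the same `endpoint` with `requests.post`, as in the vector search cell.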

### [OPTIONAL] Step 6: Semantic Search via REST API to get @search.answers

In [None]:
import requests
import json

url = f"{ai_search_endpoint}/indexes/{ai_search_index_name}/docs"
headers = {
    "Content-Type": "application/json",
    "api-key": ai_search_key
}
params = {
    "api-version": "2024-09-01-preview",
    "search": "does my plan cover eye exams",
    "queryType": "semantic",
    "answers": "extractive|count-3",
    "semanticConfiguration": ai_search_semantic_config_name,  
    "top": 3  
}

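response = requests.get(url, headers=headers, params=params)
response.raise_for_status()  # Raise an HTTPError on 4xx/5xx responses instead of parsing an error body
data = response.json()
print(json.dumps(data, indent=4))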

{
    "@odata.context": "https://search0-7szd5hx5xc74c.search.windows.net/indexes('rag-index2')/$metadata#docs(*)",
    "@search.answers": [
        {
            "key": "151f626421fc_aHR0cHM6Ly9zdHJhZzA3c3pkNWh4NXhjNzRjLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2N1bWVudHMvTm9ydGh3aW5kX0hlYWx0aF9QbHVzX0JlbmVmaXRzX0RldGFpbHMucGRm0_pages_12",
            "text": "Additionally, it i... Chemotherapy And Radiation Therapy   Chemotherapy and Radiation Therapy: Covered Services   At Contoso, we are proud to offer our employees the Northwind Health Plus plan, which   provides comprehensive coverage for medical, vision, and dental services. This plan also   includes coverage for chemotherapy and radiation therapy.",
            "highlights": "Additionally, it i... Chemotherapy And Radiation Therapy   Chemotherapy and Radiation Therapy: Covered Services   At Contoso, we are proud to offer our employees the Northwind Health Plus plan,<em> which   provides comprehensive coverage for medical, vision, and </e

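Each answer carries a `score`, the answer `text`, and a `highlights` string in which the service wraps the most relevant phrases in `<em>` tags. The hypothetical helper below (`extract_answers` is our own name) pulls those pieces out of a parsed response and lists the emphasized phrases. It is shown on a trimmed-down dictionary with the same keys as the response above, with placeholder text, rather than on live data.

```python
import re


def extract_answers(response_json):
    """Hypothetical helper: return (score, text, emphasized_phrases) per semantic answer.

    Returns an empty list if '@search.answers' is absent (e.g. answers were not requested).
    """
    extracted = []
    for answer in response_json.get("@search.answers") or []:
        highlights = answer.get("highlights") or ""
        # Phrases the service wrapped in <em>...</em>, with surrounding whitespace trimmed.
        phrases = [p.strip() for p in re.findall(r"<em>(.*?)</em>", highlights, flags=re.DOTALL)]
        extracted.append((answer.get("score"), answer.get("text", ""), phrases))
    # Highest-scoring answer first (a missing score sorts as 0.0).
    return sorted(extracted, key=lambda item: item[0] or 0.0, reverse=True)


# Trimmed-down stand-in for a REST response; text values are placeholders.
sample_response = {
    "@search.answers": [
        {"key": "chunk_b", "score": 0.5, "text": "Placeholder answer text.", "highlights": None},
        {"key": "chunk_a", "score": 0.97,
         "text": "This plan provides comprehensive coverage for medical, vision, and dental services.",
         "highlights": "This plan<em> provides comprehensive coverage for medical, vision, and </em>dental<em> services.</em>"},
    ]
}

for score, text, phrases in extract_answers(sample_response):
    print(score, "|", text, "|", phrases)
```

With the real response you could call `extract_answers(data)` on the `data` dictionary from the previous cell.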
In [None]:
# Print answers provided by the semantic search
data["@search.answers"]

[{'key': '151f626421fc_aHR0cHM6Ly9zdHJhZzA3c3pkNWh4NXhjNzRjLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2N1bWVudHMvTm9ydGh3aW5kX0hlYWx0aF9QbHVzX0JlbmVmaXRzX0RldGFpbHMucGRm0_pages_12',
  'text': 'Additionally, it i... Chemotherapy And Radiation Therapy   Chemotherapy and Radiation Therapy: Covered Services   At Contoso, we are proud to offer our employees the Northwind Health Plus plan, which   provides comprehensive coverage for medical, vision, and dental services. This plan also   includes coverage for chemotherapy and radiation therapy.',
  'highlights': 'Additionally, it i... Chemotherapy And Radiation Therapy   Chemotherapy and Radiation Therapy: Covered Services   At Contoso, we are proud to offer our employees the Northwind Health Plus plan,<em> which   provides comprehensive coverage for medical, vision, and </em>dental<em> services.</em> This plan also   includes coverage for chemotherapy and radiation therapy.',
  'score': 0.9700000286102295},
 {'key': '151f626421fc_aHR0cHM6Ly9zdHJhZzA3c