### 1. Set Up Environment Variables
To store credentials securely, rename the `.env.sample` file to `.env` in the same directory as the notebook and update the variables with the required connection information.

### 2. Install Dependencies


In [None]:
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.core.credentials import AzureKeyCredential
from openai import AzureOpenAI
from azure.search.documents import SearchClient
import dotenv
from azure.search.documents.models import VectorizedQuery, VectorizableTextQuery

dotenv.load_dotenv()

### 3. Load environment variables and instantiate your OpenAI and AI Search clients

In [None]:
# Load Azure OpenAI environment variables
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME = os.getenv("AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME")
AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME")

# Load Azure Search environment variables
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_ENDPOINT")
AZURE_SEARCH_INDEX_NAME = os.getenv("AZURE_SEARCH_INDEX_NAME")
AZURE_SEARCH_ADMIN_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")

# 🔹 Initialize Azure OpenAI Client (API Key or Managed Identity)
if AZURE_OPENAI_API_KEY:
    openai_client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
        api_version="2024-10-21"
    )
else:
    azure_credential = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(azure_credential, "https://cognitiveservices.azure.com/.default")
    openai_client = AzureOpenAI(
        azure_ad_token_provider=token_provider,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
        api_version="2024-10-21"
    )

# 🔹 Initialize Azure AI Search Client (API Key or Managed Identity)
if AZURE_SEARCH_ADMIN_KEY:
    search_client = SearchClient(
        endpoint=AZURE_SEARCH_ENDPOINT,
        index_name=AZURE_SEARCH_INDEX_NAME,
        credential=AzureKeyCredential(AZURE_SEARCH_ADMIN_KEY)
    )
else:
    azure_credential = DefaultAzureCredential()
    search_client = SearchClient(
        endpoint=AZURE_SEARCH_ENDPOINT,
        index_name=AZURE_SEARCH_INDEX_NAME,
        credential=azure_credential
    )

def get_embedding(text):
    return openai_client.embeddings.create(
        model=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME"),
        input=text
    ).data[0].embedding

In [None]:
# Verify that you are using the right endpoint and embedding deployment
print(f"Azure OpenAI Endpoint: {AZURE_OPENAI_ENDPOINT}")
print(f"AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME: {AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME}")

### 4. Prepare a question

Define a sample question and convert it into an embedding vector:

In [None]:
user_question = "What is included in my Northwind Health Plus plan that is not in standard?"
user_question_vector = get_embedding(user_question)
print(user_question_vector)

### 5. Retrieve matching documents

Perform a vector search in Azure AI Search to retrieve relevant document chunks:

In [None]:
# Materialize the paged results into a list so later cells can iterate over them again
search_results = list(search_client.search(
    None,
    top=3,
    vector_queries=[
        VectorizableTextQuery(
            text=user_question, k_nearest_neighbors=3, fields="text_vector"
        )
    ],
))

# Print Results
for result in search_results:
    print("Chunk ID:", result["chunk_id"])
    print("Title:", result["title"])
    print("Text:", result["chunk"])
    print()
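
To build intuition for what a vector query does, here is a minimal sketch of nearest-neighbour ranking by cosine similarity on toy three-dimensional vectors. It is an illustration only: the vectors, chunk names, and the helper names `cosine_similarity` and `top_k` are made up, and a real search service will use whatever algorithm and similarity metric the index is configured with rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=3):
    """Return the ids of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy "embeddings"; real embeddings have many more dimensions.
toy_documents = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]

print(top_k(toy_query, toy_documents, k=2))  # ['chunk_a', 'chunk_b']
```

The `k` argument plays the same role as `k_nearest_neighbors` in the query above: it caps how many of the closest chunks are returned.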

### 6. RAG TIME! Generate a Response

Using the retrieved documents, construct a **system prompt** and generate a response with Azure OpenAI:

In [None]:
# First, let's collect the context from search results
context = ""
for result in search_results:
    context += result["chunk"] + "\n\n"

SYSTEM_MESSAGE = f"""
You are an AI Assistant.
Be brief in your answers. Answer ONLY with the facts listed in the retrieved text.

Context:
{context}
"""

USER_MESSAGE = user_question

response = openai_client.chat.completions.create(
    model=os.getenv("AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME"),
    temperature=0.7,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_MESSAGE},
    ],
)

answer = response.choices[0].message.content
print(answer)
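
The grounding step above concatenates the retrieved chunks into the system prompt. As a minimal sketch, the same idea can be isolated in a small helper that works on plain dictionaries shaped like the search results (with `title` and `chunk` keys, as printed earlier). The helper name `build_system_message`, the title-labelling format, and the sample data are assumptions for illustration, not part of the notebook.

```python
def build_system_message(results):
    """Build a grounded system prompt from search-result-like dicts."""
    # Label each chunk with its title so the model can tell sources apart.
    context = "\n\n".join(f"[{r['title']}]\n{r['chunk']}" for r in results)
    return (
        "You are an AI Assistant.\n"
        "Be brief in your answers. "
        "Answer ONLY with the facts listed in the retrieved text.\n\n"
        f"Context:\n{context}\n"
    )


# Hypothetical sample data standing in for real search results.
sample_results = [
    {"title": "doc_one.pdf", "chunk": "First example chunk."},
    {"title": "doc_two.pdf", "chunk": "Second example chunk."},
]

print(build_system_message(sample_results))
```

Keeping prompt construction in a pure function like this makes it easy to inspect exactly what context the model receives before any tokens are spent.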




### 7. Let's have a look at the metrics
Now that we have used both the embedding model and the chat completion model, let's look at the consumption.
First, let's check how many tokens the previous answer generation (chat completion) with the RAG pattern required:

In [None]:
print("\nToken Usage of last call:")
print(f"Prompt Tokens: {response.usage.prompt_tokens}")
print(f"Completion Tokens: {response.usage.completion_tokens}")
print(f"Total Tokens: {response.usage.total_tokens}")
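
The figures above cover a single call. To keep a running total across several questions, one option is a small accumulator like the sketch below. The class name `TokenTally` is hypothetical, and the `SimpleNamespace` objects with made-up numbers only stand in for the `usage` object of a chat completion response, which exposes `prompt_tokens` and `completion_tokens` as used above.

```python
from types import SimpleNamespace


class TokenTally:
    """Accumulates token usage over several chat completion calls."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add(self, usage):
        # `usage` is any object with prompt_tokens and completion_tokens.
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


tally = TokenTally()

# Made-up numbers standing in for two real `response.usage` objects.
tally.add(SimpleNamespace(prompt_tokens=120, completion_tokens=30))
tally.add(SimpleNamespace(prompt_tokens=200, completion_tokens=45))

print(tally.prompt_tokens, tally.completion_tokens, tally.total_tokens)  # 320 75 395
```

In the notebook, calling `tally.add(response.usage)` after each chat completion would give a local total. It does not include tokens consumed by embedding calls.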

### 8. Challenge
Now let's experiment a little with our current RAG setup.

#### Task 1
Find out which health plans Northwind Health offers in general. Use the same RAG setup as shown above.

In [None]:
user_question = "What health plans does Northwind Health offer?"
## TODO: Your code goes here

#### Bonus Task 2:
This is a lot of code to repeat every time a new question comes in. Create a function that takes the user question as input and returns the answer so you can print it to the screen.

In [None]:
def get_answer_from_question(user_question):
    ## TODO: your code goes here
    return answer

#### Task 3:
Now find the answer to the following question: Is there a limit on how much can be expensed with PerksPlus?

In [None]:
user_question = "Is there a limit on how much can be expensed with PerksPlus?"
## TODO: Your code goes here


#### Task 4:
Let's test whether the system avoids hallucinating or answering with irrelevant information when it shouldn't answer at all. Think of a question that has nothing to do with the content of the search index and test your system's ability to handle it!

In [None]:
user_question = "your question here"
## TODO: Your code goes here



#### Task 5:
Find out how many tokens you used up overall for these few questions.
Hint: You can solve this via the Azure Portal.

## Troubleshooting

1. **Environment Variables Not Loaded:** Ensure the `.env` file is set up correctly, or export the variables manually in your terminal before starting the notebook.
1. **Authentication Issues:** If using Managed Identity, make sure your Azure identity has proper role assignments.
1. **Search Results Are Empty:** Ensure your Azure AI Search index contains vectorized data.
1. **OpenAI API Errors:** Verify your deployment name and API key.
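
For the first item, a quick check like the sketch below can report which variables are missing before any client is created. The helper name `missing_env_vars` and the `REQUIRED_VARS` list are assumptions for illustration. The two API key variables are deliberately left out, because the notebook falls back to `DefaultAzureCredential` when they are not set.

```python
import os

# Variables the notebook reads that have no fallback.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_INDEX_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Run it after `dotenv.load_dotenv()` so that values from the `.env` file are taken into account.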

## Summary

This notebook demonstrates a **vector-based RAG pipeline** using Azure OpenAI and Azure AI Search. It retrieves relevant documents using vector search and generates responses using GPT-based chat completions. The approach improves the accuracy of AI responses by grounding them in real data.