# Azure AI Foundry

<center><img src="../../../images/Azure-AI-Foundry_1600x900.jpg" alt="Azure AI Foundry" width="600"></center>

## Lab 5 - RAG (Retrieval-Augmented Generation)

In this lab, we will implement a RAG (Retrieval-Augmented Generation) system using Azure AI Search and Azure OpenAI. RAG combines information retrieval with text generation, so a language model can answer from specific external knowledge that is not part of its training data.

We will explore:
1. **Query without RAG**: How the LLM responds without additional knowledge
2. **Azure AI Search Setup**: Preparing the vector search index
3. **RAG Implementation**: Integrating search with response generation
4. **Results Comparison**: Demonstrating the difference between responses with and without RAG

The first step is to validate the environment variable configuration in the `.env` file at the repository root.

Fill in the variable values as requested, including the Azure AI Search credentials.
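
One way to catch a missing value early is to check the variables before any client is created. The sketch below is a minimal example, not part of the original lab: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and the variable list is simply the set of names this lab reads with `os.getenv`.

```python
from typing import Iterable, List, Mapping

# Names read by this lab. AZURE_SEARCH_INDEX is left out because the lab
# falls back to "rag-index" when it is unset.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_EMBEDDING_MODEL",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_KEY",
]


def missing_env_vars(names: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: return the names that are unset or blank in `env`."""
    return [name for name in names if not (env.get(name) or "").strip()]


# Demo with a fake environment so the sketch runs without a real .env file.
demo_env = {"AZURE_OPENAI_ENDPOINT": "https://example.invalid", "AZURE_SEARCH_KEY": "   "}
print(missing_env_vars(["AZURE_OPENAI_ENDPOINT", "AZURE_SEARCH_KEY"], demo_env))
# prints ['AZURE_SEARCH_KEY']
```

After `load_dotenv(...)` has run in Exercise 1, calling `missing_env_vars(REQUIRED_VARS, os.environ)` should return an empty list if the `.env` file is complete.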

### Exercise 1 - Configuration and Library Import

Let's import the necessary libraries for the RAG lab.

In [None]:
%pip install azure-search-documents openai python-dotenv numpy

In [None]:
import json
import os
from openai import AzureOpenAI
from dotenv import load_dotenv
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.models import VectorizedQuery
import numpy as np

load_dotenv(dotenv_path="../../../.env")

Let's load the credentials into variables to facilitate use in the lab.

In [None]:
# Azure OpenAI Configuration
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
api_version = os.getenv("API_VERSION")
deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT")
embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL")

# Azure AI Search Configuration
search_endpoint = os.getenv("AZURE_SEARCH_ENDPOINT")
search_key = os.getenv("AZURE_SEARCH_KEY")
search_index = os.getenv("AZURE_SEARCH_INDEX", "rag-index")

print("Configurations loaded:")
print(f"Azure OpenAI Endpoint: {azure_endpoint}")
print(f"Azure Search Endpoint: {search_endpoint}")
print(f"Search Index: {search_index}")

Now let's initialize the Azure OpenAI and Azure AI Search clients.

In [None]:
# Initialize Azure OpenAI client
openai_client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    api_key=api_key,
    api_version=api_version
)

# Initialize Azure AI Search client
search_client = SearchClient(
    endpoint=search_endpoint,
    index_name=search_index,
    credential=AzureKeyCredential(search_key)
)

print("Clients initialized successfully!")

### Exercise 2 - Query Without RAG (Baseline)

First, let's ask a specific question about a topic that is probably missing from the LLM's training data, or outdated in it. This gives us a baseline to compare against the responses produced after RAG is implemented.

In [None]:
# Specific question about a topic that may not be in the LLM's knowledge
question = "What are the main features of Azure AI Foundry launched in 2024?"

def query_without_rag(question):
    """Makes a direct query to the model without using RAG"""
    response = openai_client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": "You are a helpful assistant specialized in Azure technologies."},
            {"role": "user", "content": question}
        ],
        max_tokens=500,
        temperature=0.7
    )
    return response.choices[0].message.content

# Query without RAG
print("=== RESPONSE WITHOUT RAG ===")
response_without_rag = query_without_rag(question)
print(response_without_rag)
print("\n" + "="*50 + "\n")

### Exercise 3 - Data Preparation for RAG

Let's create some sample documents about Azure AI Foundry to simulate a knowledge base. In a real scenario, this data would come from official documentation, manuals, or other knowledge repositories.

In [None]:
# Sample data about Azure AI Foundry
sample_documents = [
    {
        "id": "1",
        "title": "Azure AI Foundry - Overview",
        "content": """Azure AI Foundry is a unified platform for developing AI applications. 
        Launched in 2024, it offers integrated tools to build, train and deploy AI models. 
        Includes native support for RAG, function calling, and integration with Azure AI Search. 
        The platform enables team collaboration and centralized model governance."""
    },
    {
        "id": "2", 
        "title": "Azure AI Foundry 2024 Features",
        "content": """The main features of Azure AI Foundry include: 
        1) Model Catalog with over 200 pre-trained models
        2) Prompt Flow for AI workflow orchestration
        3) Integrated Azure AI Search for RAG implementations
        4) Evaluation tools for quality metrics
        5) Responsible AI dashboard for ethical monitoring
        6) Multi-cloud deployment support
        7) Real-time inference endpoints with auto-scaling"""
    },
    {
        "id": "3",
        "title": "RAG in Azure AI Foundry", 
        "content": """Azure AI Foundry offers native support for Retrieval-Augmented Generation (RAG). 
        Allows easy connection with Azure AI Search, Cosmos DB, and other data sources. 
        Includes visual tools to configure RAG pipelines without code. 
        Supports custom embeddings and multiple retrieval types like hybrid and semantic."""
    },
    {
        "id": "4",
        "title": "Azure AI Search Integration",
        "content": """Integration with Azure AI Search enables advanced vector search with support for:
        - Hybrid search (keyword + semantic)
        - Metadata filters
        - Semantic re-ranking 
        - Multiple similarity algorithms (cosine, dot product, euclidean)
        - Automatic document indexing
        - Support for over 300 file formats"""
    }
]

print(f"Prepared {len(sample_documents)} sample documents")
for doc in sample_documents:
    print(f"- {doc['title']}")

### Exercise 4 - Embeddings Generation

Now let's generate embeddings for our documents using Azure OpenAI. Embeddings are vector representations of documents that enable semantic search.

In [None]:
def get_embedding(text):
    """Generates embedding for a text using Azure OpenAI"""
    response = openai_client.embeddings.create(
        input=text,
        model=embedding_model
    )
    return response.data[0].embedding

# Generate embeddings for all documents
print("Generating embeddings for documents...")
for doc in sample_documents:
    # Combine title and content for embedding
    full_text = f"{doc['title']} {doc['content']}"
    doc['embedding'] = get_embedding(full_text)
    print(f"✓ Embedding generated for: {doc['title']}")

print(f"\nEmbeddings generated! Dimension: {len(sample_documents[0]['embedding'])}")
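
The loop above sends one request per document. The embeddings endpoint also accepts a list of inputs, so a larger knowledge base could be embedded in batches to cut round trips. The sketch below shows only the batching logic. `embed_in_batches` and `fake_embed_batch` are hypothetical names, and the fake function stands in for a real call so the example runs offline.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 16,
) -> List[Vector]:
    """Embed `texts` in order, calling `embed_batch` once per slice."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError("embed_batch must return one vector per input")
        vectors.extend(result)
    return vectors


def fake_embed_batch(batch: List[str]) -> List[Vector]:
    """Offline stand-in: [character count, vowel count] per text."""
    return [
        [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]
        for text in batch
    ]


vectors = embed_in_batches(["Azure", "RAG", "Search"], fake_embed_batch, batch_size=2)
print(len(vectors))  # prints 3
```

With the real client, `embed_batch` could wrap `openai_client.embeddings.create(input=batch, model=embedding_model)` and return `[item.embedding for item in response.data]`. The batch size of 16 is an arbitrary placeholder, not a documented limit.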

### Exercise 5 - Semantic Search (Simulated)

Since we don't have a real Azure AI Search index configured, let's simulate semantic search by computing the cosine similarity between the query embedding and each document embedding in memory. In a real deployment, Azure AI Search would run this vector search against the index for us.

In [None]:
def cosine_similarity(vec1, vec2):
    """Calculates cosine similarity between two vectors"""
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

def semantic_search(query, documents, top_k=2):
    """Performs semantic search on documents"""
    # Generate query embedding
    query_embedding = get_embedding(query)
    
    # Calculate similarities
    similarities = []
    for doc in documents:
        similarity = cosine_similarity(query_embedding, doc['embedding'])
        similarities.append({
            'document': doc,
            'similarity': similarity
        })
    
    # Sort by similarity (highest first)
    similarities.sort(key=lambda x: x['similarity'], reverse=True)
    
    return similarities[:top_k]

# Test semantic search
print("=== SEMANTIC SEARCH ===")
results = semantic_search(question, sample_documents)

for i, result in enumerate(results, 1):
    doc = result['document']
    similarity = result['similarity']
    print(f"{i}. {doc['title']} (Similarity: {similarity:.3f})")
    print(f"   Content: {doc['content'][:100]}...")
    print()
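
To make the similarity scores easier to read, here is the cosine formula worked on tiny two-dimensional vectors using only the standard library. It is a sketch for intuition, not a replacement for the NumPy version above. `cosine_similarity_safe` is a hypothetical name, and returning `0.0` for a zero-length vector is a design choice made here to avoid dividing by zero.

```python
import math
from typing import Sequence


def cosine_similarity_safe(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|), with 0.0 when either norm is zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
print(cosine_similarity_safe(query, [2.0, 0.0]))  # same direction: 1.0
print(round(cosine_similarity_safe(query, [1.0, 1.0]), 3))  # 45 degrees apart: 0.707
print(cosine_similarity_safe(query, [0.0, 3.0]))  # orthogonal: 0.0
```

The score depends only on direction, not length, which is why `[2.0, 0.0]` scores the same as `[1.0, 0.0]` would.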

### Exercise 6 - RAG Implementation

Now let's implement the complete RAG system, combining semantic search with response generation by the language model.

In [None]:
def query_with_rag(question, documents):
    """Implements RAG: search relevant documents and generate response based on context"""
    
    # 1. Retrieval: Search relevant documents
    search_results = semantic_search(question, documents, top_k=2)
    
    # 2. Build context with retrieved documents
    context_parts = []
    for result in search_results:
        doc = result['document']
        context_parts.append(f"Document: {doc['title']}\nContent: {doc['content']}")
    
    context = "\n\n".join(context_parts)
    
    # 3. Build prompt with context
    system_message = """You are an assistant specialized in Azure technologies. 
    Use ONLY the information provided in the context to answer the question. 
    If the information is not in the context, say you don't have that specific information.
    Be precise and cite specific information from the context when possible."""
    
    user_message = f"""Context:
{context}

Question: {question}

Answer based on the provided context:"""
    
    # 4. Augmented Generation: Generate response with context
    response = openai_client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message}
        ],
        max_tokens=500,
        temperature=0.3  # Lower temperature for more precise responses
    )
    
    return {
        'answer': response.choices[0].message.content,
        'sources': [result['document']['title'] for result in search_results],
        'context': context
    }

# Execute RAG with the same question
print("=== RESPONSE WITH RAG ===")
rag_result = query_with_rag(question, sample_documents)

print("Answer:")
print(rag_result['answer'])
print(f"\nSources used: {', '.join(rag_result['sources'])}")
print("\n" + "="*50 + "\n")
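
`query_with_rag` always passes the top two documents to the model, even if they are only weakly related to the question. A possible refinement, sketched below under stated assumptions, is to drop low-similarity results and cap the context size before building the prompt. `build_context`, `min_similarity`, and `max_chars` are hypothetical names, and the input uses the same result shape that `semantic_search` returns.

```python
from typing import Any, Dict, List


def build_context(
    results: List[Dict[str, Any]],
    min_similarity: float = 0.0,
    max_chars: int = 4000,
) -> str:
    """Join retrieved documents into one context string.

    `results` is assumed to be sorted by similarity, highest first.
    Results below `min_similarity` are skipped, and building stops once
    adding another document would exceed `max_chars`.
    """
    parts: List[str] = []
    used = 0
    for result in results:
        if result["similarity"] < min_similarity:
            continue
        doc = result["document"]
        part = f"Document: {doc['title']}\nContent: {doc['content']}"
        separator_len = 2 if parts else 0  # length of the "\n\n" joiner
        if used + separator_len + len(part) > max_chars:
            break
        parts.append(part)
        used += separator_len + len(part)
    return "\n\n".join(parts)


demo_results = [
    {"document": {"title": "A", "content": "alpha"}, "similarity": 0.9},
    {"document": {"title": "B", "content": "beta"}, "similarity": 0.2},
]
print(build_context(demo_results, min_similarity=0.5))
```

An empty return value signals that nothing relevant was retrieved, in which case the caller could answer "I don't have that information" without calling the model. Suitable threshold values depend on the embedding model and would need to be tuned on real queries.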

### Exercise 7 - Results Comparison

Let's directly compare the responses obtained without RAG and with RAG to highlight the differences and benefits of using specific knowledge.

In [None]:
print("🔍 COMPARATIVE ANALYSIS: RAG vs Without RAG")
print("="*60)

print("\n📝 QUESTION:")
print(f'"{question}"')

print("\n❌ RESPONSE WITHOUT RAG (limited knowledge):")
print("-" * 50)
print(response_without_rag)

print("\n✅ RESPONSE WITH RAG (augmented knowledge):")
print("-" * 50)
print(rag_result['answer'])

print("\n📚 SOURCES USED IN RAG:")
for source in rag_result['sources']:
    print(f"• {source}")

print("\n💡 OBSERVED BENEFITS OF RAG:")
benefits = [
    "✓ More specific and up-to-date information",
    "✓ Responses based on reliable sources",
    "✓ Greater precision in technical details", 
    "✓ Information traceability (sources)",
    "✓ Reduction of model hallucinations"
]

for benefit in benefits:
    print(benefit)

### Exercise 8 - RAG with Azure AI Search (Real Example)

Exercises 5 and 6 simulated semantic search in memory. The code below shows what the same flow could look like against a real Azure AI Search index.

In [None]:
def query_with_azure_ai_search(question, search_client, openai_client):
    """
    Real RAG implementation using Azure AI Search
    Note: This code requires a configured index in Azure AI Search
    """
    try:
        # 1. Generate question embedding
        query_embedding = get_embedding(question)
        
        # 2. Create vectorized query for Azure AI Search
        vector_query = VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=3,  # Top 3 most similar results
            fields="content_vector"  # Field containing embeddings
        )
        
        # 3. Execute search in Azure AI Search
        search_results = search_client.search(
            search_text=question,  # Hybrid search: text + vector
            vector_queries=[vector_query],
            top=3,
            include_total_count=True
        )
        
        # 4. Extract context from results
        context_parts = []
        sources = []
        
        for result in search_results:
            context_parts.append(f"Title: {result['title']}\nContent: {result['content']}")
            sources.append(result['title'])
        
        context = "\n\n".join(context_parts)
        
        # 5. Generate response with context
        system_message = """You are an assistant specialized in Azure. 
        Use ONLY the information from the provided context to answer."""
        
        user_message = f"""Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"""
        
        response = openai_client.chat.completions.create(
            model=deployment_name,
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": user_message}
            ],
            max_tokens=500,
            temperature=0.3
        )
        
        return {
            'answer': response.choices[0].message.content,
            'sources': sources,
            'context': context
        }
        
    except Exception as e:
        return {
            'answer': f"Error running RAG with Azure AI Search: {e}",
            'sources': [],
            'context': ""
        }

# Usage example (commented out because it requires a configured index)
print("📋 CODE FOR REAL AZURE AI SEARCH:")
print("# To use this code, you need:")
print("# 1. Create an index in Azure AI Search")
print("# 2. Configure embedding fields") 
print("# 3. Index your documents")
print("# 4. Execute: query_with_azure_ai_search(question, search_client, openai_client)")

# result = query_with_azure_ai_search(question, search_client, openai_client)
# print(result['answer'])
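
For step 3 ("Index your documents"), the in-memory documents need to be reshaped to match the index schema. The sketch below assumes an index with the fields `id`, `title`, `content`, and `content_vector`, which are the names the query function above reads. `to_search_documents` is a hypothetical helper, and the actual schema of your index may differ.

```python
from typing import Any, Dict, List


def to_search_documents(
    docs: List[Dict[str, Any]],
    vector_field: str = "content_vector",
) -> List[Dict[str, Any]]:
    """Map lab documents (with an 'embedding' key) to the assumed index schema."""
    payload = []
    for doc in docs:
        if "embedding" not in doc:
            raise ValueError(f"document {doc.get('id')!r} has no embedding yet")
        payload.append({
            "id": doc["id"],
            "title": doc["title"],
            "content": doc["content"],
            vector_field: list(doc["embedding"]),  # copy so the source stays untouched
        })
    return payload


demo_docs = [{"id": "1", "title": "Overview", "content": "text", "embedding": [0.1, 0.2]}]
print(to_search_documents(demo_docs)[0].keys())
```

Once the index exists, the payload could be sent with `search_client.upload_documents(documents=payload)`. The vector field's configured dimension must match the embedding length printed in Exercise 4.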

### Exercise 9 - Interactive Testing

Now you can test the RAG system with your own questions! Try different types of queries to see how RAG behaves.

In [None]:
# Test your own questions here!
test_questions = [
    "How does Azure AI Foundry support RAG?",
    "How many models are available in the Model Catalog?",
    "What types of search are supported by Azure AI Search?",
    "What is Prompt Flow in Azure AI Foundry?"
]

print("🧪 INTERACTIVE TEST - Try different questions:")
print("="*60)

for test_q in test_questions:
    print(f"\n❓ Question: {test_q}")
    
    # Response without RAG
    no_rag = query_without_rag(test_q)
    print(f"❌ Without RAG: {no_rag[:150]}...")
    
    # Response with RAG
    with_rag = query_with_rag(test_q, sample_documents)
    print(f"✅ With RAG: {with_rag['answer'][:150]}...")
    print(f"📚 Sources: {', '.join(with_rag['sources'])}")
    print("-" * 40)

print("\n💡 Try creating your own questions by modifying the 'test_questions' list!")

## Conclusion and Next Steps

In this lab, we explored the concept and implementation of RAG (Retrieval-Augmented Generation), from a baseline query through retrieval to a grounded answer.

### ✅ What we learned:
1. **Fundamental difference**: How grounding answers in retrieved context changes responses compared with the baseline
2. **Embedding Generation**: Converting text into vector representations
3. **Semantic Search**: Finding relevant documents using embedding similarity
4. **Augmented Generation**: Combining retrieved context with text generation
5. **Practical implementation**: Working code for an end-to-end RAG pipeline

### 🚀 Demonstrated RAG benefits:
- **Updated knowledge**: Access to specific and recent information
- **Reduced hallucinations**: Responses based on reliable sources  
- **Traceability**: Ability to identify information sources
- **Specialization**: More precise responses about specific topics
- **Flexibility**: Easy updating of knowledge base

### 🔧 For production implementation:
1. **Configure Azure AI Search** with optimized vector indexes
2. **Implement chunking** for large documents (512-1024 tokens)
3. **Use hybrid search** (keyword + vector) for better recall
4. **Add re-ranking** to improve result relevance
5. **Monitor performance** and adjust parameters (top_k, temperature, etc.)
6. **Implement caching** for frequent queries
7. **Configure governance** for source validation
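
As an illustration of item 2, here is a minimal chunking sketch with overlap between consecutive chunks. It counts whitespace-separated words as a rough stand-in for tokens, which is an assumption: real token counts come from the model's tokenizer and will differ. `chunk_words` and its default sizes are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split `text` into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk would then be embedded and indexed as its own document, usually with a reference back to the source file so answers remain traceable.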

### 📚 Additional resources:
- [Azure AI Search Vector Search](https://learn.microsoft.com/en-us/azure/search/vector-search-overview)
- [Azure OpenAI RAG Patterns](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/use-your-data)
- [Azure AI Foundry Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/)
- [RAG Best Practices](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-usage)