# Challenge 04-A - Retrieval Augmented Generation (RAG) for Structured Data


## Introduction

In this notebook, we will explore the practical application of RAG with a more manageable type of data, i.e., structured data such as relational data or text data stored in CSV files. The main objective is to introduce a specific use case. It demonstrates how Azure Cognitive Search can retrieve relevant documents, and how ChatGPT can then focus on the relevant portions of those documents to produce concise summaries based on user prompts. It shows how Azure OpenAI's ChatGPT capabilities can be adapted to your summarization needs, and it guides you through the setup and evaluation of summarization results. This method can be customized for various summarization use cases and applied to diverse datasets.

This notebook leverages **Semantic Kernel** as an orchestration framework to coordinate multiple AI services and manage the RAG workflow. Semantic Kernel provides:

- **Service Management**: Centralized registration and access to Azure OpenAI services (chat completion and embeddings)
- **Execution Orchestration**: Coordinated execution of embedding generation, similarity search, and response generation
- **Configuration Management**: Unified handling of model parameters and execution settings
- **Async Operations**: Efficient handling of concurrent AI service calls

The kernel acts as the central hub that orchestrates the interaction between Azure Cognitive Search for document retrieval and Azure OpenAI for embeddings and completions.

## Student Tasks
Your goals for this challenge are to read through this notebook and complete the code wherever there is a TODO comment. Use GitHub Copilot to write the code! Run each code block, observe the results, and make sure you can answer the questions posed in the student guide.

## Use Case

This use case consists of three sections:
- Document Search - The process of extracting relevant documents based on the query from a corpus of documents.
- Document Zone Search - The process of finding the relevant part of the document extracted from document search.
- Downstream AI tasks such as question answering and text summarization - Text summarization is the process of creating summaries from large volumes of data while preserving the key information and meaning of the content.

This use case can help subject matter experts find relevant information in a large document corpus.

**Example:** In the drug discovery process, scientists in the pharmaceutical industry read through a corpus of documents to find specific information related to concepts, experiment results, etc. This use case lets them ask questions of the document corpus and get back a succinct answer, which speeds up the drug discovery process.
 
Benefits of the solution:
1. Shortens reading time
2. Improves the effectiveness of searching for information
3. Reduces the subjectivity and inconsistency of manual summarization
4. Increases bandwidth for humans to focus on more in-depth analysis 


Document summarization can be applied to any subject matter (legal, financial, journalism, medical, academic, etc.) that requires summarizing long documents. This notebook focuses on journalism: we will walk through news articles.


### CNN & Daily Mail Dataset
For this walkthrough, we will be using the CNN/Daily Mail dataset, a common dataset for text summarization and question answering tasks. It pairs news stories from the CNN and Daily Mail websites with human-written abstractive summary bullets.


### Data Description
The relevant schema for our work today consists of:

- `id`: a string containing the hexadecimal-formatted SHA1 hash of the URL the story was retrieved from
- `article`: a string containing the body of the news article
- `highlights`: a string containing the highlights of the article, as written by the article author
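The `id` format can be reproduced with Python's standard `hashlib` module. The sketch below assumes the URL is UTF-8 encoded before hashing. It hashes a made-up URL, not one from the dataset, so that you can see the shape of a key: 40 lowercase hexadecimal characters. Azure Cognitive Search only accepts letters, digits, `_`, `-` and `=` in a document key. Hex digests stay within that set, which is why `id` can serve as the index key later on. The helper name `url_to_id` is hypothetical.

```python
import hashlib
import re

def url_to_id(url: str) -> str:
    """Return the hex SHA1 digest of a URL (UTF-8 encoding assumed), matching the `id` format."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

# Hypothetical URL, used only to illustrate the format of the key
example_id = url_to_id("https://www.example.com/news/story.html")
print(example_id, len(example_id))

# Search document keys may only contain letters, digits, '_', '-' and '='
KEY_PATTERN = re.compile(r"^[A-Za-z0-9_\-=]+$")
print(bool(KEY_PATTERN.match(example_id)))  # True
```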


In [None]:
%pip install -r ../requirements.txt

In [None]:
# Import Azure Cognitive Search, Semantic Kernel, and other python modules

import os, json, requests, sys, re
import asyncio
from pprint import pprint
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Azure Cognitive Search imports
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient 
from azure.search.documents import SearchClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
    SemanticSettings
)

# Semantic Kernel imports for core services
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, AzureTextEmbedding
from semantic_kernel.contents import ChatHistory
from semantic_kernel.connectors.ai.open_ai import AzureChatPromptExecutionSettings

from dotenv import load_dotenv
load_dotenv()

In [None]:
# Load environment variables and initialize Semantic Kernel services
chat_model = os.environ['CHAT_MODEL_NAME']
embedding_model = os.environ['EMBEDDING_MODEL_NAME']

# Initialize Semantic Kernel
kernel = sk.Kernel()

# Add Azure OpenAI Chat Completion service
chat_service = AzureChatCompletion(
    deployment_name=chat_model,
    endpoint=os.environ['OPENAI_API_BASE'],
    api_key=os.environ['OPENAI_API_KEY']
)
kernel.add_service(chat_service)

# Add Azure OpenAI Text Embedding service  
embedding_service = AzureTextEmbedding(
    deployment_name=embedding_model,
    endpoint=os.environ['OPENAI_API_BASE'],
    api_key=os.environ['OPENAI_API_KEY']
)
kernel.add_service(embedding_service)

print("Semantic Kernel services initialized successfully!")

**NOTE:** The path in the code cell below refers to the `cnn_dailymail_data.csv` file in the `/data/structured/` folder. You may need to update this path if you are running this notebook from a different location.

In [None]:
# read the CNN dailymail dataset in pandas dataframe
df = pd.read_csv('../data/structured/cnn_dailymail_data.csv') #path to CNN daily mail dataset
df.head()

In [None]:
# Create a Cognitive Search Index client
service_endpoint = os.getenv("AZURE_AI_SEARCH_ENDPOINT")   
key = os.getenv("AZURE_AI_SEARCH_KEY")
credential = AzureKeyCredential(key)

index_name = "news-index"

index_client = SearchIndexClient(
    endpoint=service_endpoint, credential=credential)
index_client

### Define Index Fields and Create a Semantic Configuration

A *semantic configuration* specifies how fields are used in semantic ranking. It gives the underlying models hints about which index fields are most important for semantic ranking, captions, highlights, and answers.

You can add or update a semantic configuration at any time without rebuilding your index. When you issue a query, you specify which semantic configuration to use (one per query).

Review the properties you'll need to specify. A semantic configuration has a name and at least one of the following properties:

* Title field - A title field should be a concise description of the document, ideally a string that is under 25 words. This field could be the title of the document, name of the product, or item in your search index. If you don't have a title in your search index, leave this field blank.
* Content fields - Content fields should contain text in natural language form. Common examples of content are the body of a document, the description of a product, or other free-form text.
* Keyword fields - Keyword fields should be a list of keywords, such as the tags on a document, or a descriptive term, such as the category of an item.

You can only specify one title field but you can specify as many content and keyword fields as you like. For content and keyword fields, list the fields in priority order because lower priority fields may get truncated.

In [None]:
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="highlights", type=SearchFieldDataType.String,
                searchable=True, retrievable=True),
    SearchableField(name="article", type=SearchFieldDataType.String,
                filterable=True, searchable=True, retrievable=True),
]

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=PrioritizedFields(
        #title_field=SemanticField(field_name=""), # title field is not present in the dataset. We can use OpenAI to generate title
        #prioritized_keywords_fields=[SemanticField(field_name="")], # keywords are not present in the dataset. We can use OpenAI to generate keywords
        prioritized_content_fields=[SemanticField(field_name="article"), SemanticField(field_name="highlights")]
    )
)

# Create the semantic settings with the configuration
semantic_settings = SemanticSettings(configurations=[semantic_config])

# Create the search index with the semantic settings
index = SearchIndex(name=index_name, fields=fields, semantic_settings=semantic_settings)
result = index_client.create_or_update_index(index)
print(f' {result.name} created')

In [None]:
documents = df.to_dict('records')
documents[0]

In [None]:
len(documents)

In [None]:
search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(documents)  
print(f"Uploaded and Indexed {len(result)} documents") 
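The cell above sends every document in one `upload_documents` request. Azure Cognitive Search caps a single indexing batch at 1,000 documents (and 16 MB). If your copy of the CSV is larger than that, a sketch like the one below can replace that call and upload in batches. The helper names `upload_in_batches` and `clean_record` are hypothetical. The sketch relies only on `upload_documents` returning one result per document with a `succeeded` attribute, which is how the SDK's `IndexingResult` behaves. It also replaces pandas `NaN` values (pandas' marker for missing cells) with `None`, on the assumption that `NaN` is not valid JSON and may be rejected by the service.

```python
import math
from typing import Any, Dict, List

def clean_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Replace float NaN values with None so the record serializes to valid JSON."""
    return {
        k: (None if isinstance(v, float) and math.isnan(v) else v)
        for k, v in record.items()
    }

def upload_in_batches(client, docs: List[Dict[str, Any]], batch_size: int = 1000) -> int:
    """Upload docs in batches of at most batch_size; return how many succeeded."""
    succeeded = 0
    for start in range(0, len(docs), batch_size):
        batch = [clean_record(d) for d in docs[start:start + batch_size]]
        results = client.upload_documents(documents=batch)
        # One IndexingResult per document; count the successful ones
        succeeded += sum(1 for r in results if r.succeeded)
    return succeeded
```

Usage in this notebook would be `print(f"Uploaded and indexed {upload_in_batches(search_client, documents)} documents")`.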

## Section 1: Leveraging Cognitive Search to extract the relevant article based on the query

In [None]:
# Semantic Kernel helper functions
# These functions demonstrate the orchestration pattern where the kernel
# manages service discovery and execution, providing a clean abstraction
# over the underlying Azure OpenAI services

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from semantic_kernel.connectors.ai.open_ai import AzureTextEmbedding, AzureChatCompletion

# Student Task: Complete the get_embedding function
async def get_embedding(kernel, text):
    # Get the embedding service by type
    #TODO: use Semantic Kernel and the AzureTextEmbedding service to generate embeddings and return them. Specifically, use the generate_embeddings function
    return embeddings

async def get_completion(kernel, prompt, temperature):
    # Get the chat service by type
    chat_service = kernel.get_service(type=AzureChatCompletion)
    
    # Create execution settings
    settings = AzureChatPromptExecutionSettings(
        temperature=temperature,
        max_tokens=500
    )
    
    # Create a proper ChatHistory object with the user prompt
    chat_history = ChatHistory()
    chat_history.add_user_message(prompt)
    
    # Generate completion
    response = await chat_service.get_chat_message_content(
        chat_history=chat_history,
        settings=settings
    )
    
    return response.content

def search_similar_chunks(query_embedding, chunks_df, top_k=3):
    similarities = []
    query_embedding = np.array(query_embedding).reshape(1, -1)
    
    for idx, embedding in enumerate(chunks_df['embedding']):
        embedding_array = np.array(embedding).reshape(1, -1)
        similarity = cosine_similarity(query_embedding, embedding_array)[0][0]
        similarities.append((similarity, idx))
    
    similarities.sort(reverse=True)
    results = []
    for similarity, idx in similarities[:top_k]:
        results.append({
            'text': chunks_df.iloc[idx]['text'],
            'score': similarity
        })
    return results

In [None]:
# Search for a document about Laurene Jobs and Hillary Clinton
search_query = "Laurene Jobs Hillary Clinton"
results = search_client.search(search_text=search_query, top=1)

# Take the article text of the top hit; fall back to an empty string if nothing matched
hits = list(results)
document = hits[0]['article'] if hits else ""
print(f"Retrieved document for query: '{search_query}'")
document

In [None]:
#length of article extracted from Azure Cognitive search
len(document) 

## Section 2: Document Zone Search
### Document Zone: Semantic Kernel Orchestrated Embeddings
Now that we have narrowed our knowledge base down to a single document using Azure Cognitive Search, we can dive deeper into that document. This lets us refine our initial query to a more specific section, or "zone", of the article.

To do this, we will utilize Semantic Kernel's orchestrated Azure OpenAI Embeddings service.

### **Embeddings Overview**
An embedding is a special format of data representation that machine learning models and algorithms can easily use. It is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers. The distance between two embeddings in the vector space is correlated with the semantic similarity between the two original inputs. For example, if two texts are similar, their vector representations should also be similar.

Different Azure OpenAI embedding models are specifically created to be good at a particular task:

- Similarity embeddings are good at capturing semantic similarity between two or more pieces of text.
- Text search embeddings help measure whether long documents are relevant to a short query.
- Code search embeddings are useful for embedding code snippets and natural-language search queries.

These task-specific models are the first generation of OpenAI embeddings. Newer models such as `text-embedding-ada-002` handle all three tasks with a single model.

Embeddings make machine learning on large inputs easier by capturing semantic similarity in a vector space. We can therefore use embeddings to determine whether two text chunks are semantically related. The similarity between their vectors also gives us a score for how related they are.

### **Cosine Similarity**
An earlier approach to matching similar documents was to count the number of words they have in common. This is flawed because, as documents get longer, the number of common words grows even when the topics differ. Cosine similarity is a better approach.

Mathematically, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. This is useful because two documents can be far apart in Euclidean distance, for example because they differ greatly in length, and still have a small angle between them. A small angle means a high cosine similarity.
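For two vectors $\mathbf{a}$ and $\mathbf{b}$ of dimension $n$, the quantity described above is:

$$
\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
$$

Multiplying either vector by a positive constant scales the numerator and the denominator by the same factor, so the value does not change. For word-count vectors, this means a document repeated twice has exactly the same cosine similarity to a query as the original document.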

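Azure OpenAI recommends cosine similarity for comparing its embeddings, and this notebook uses it to score document chunks against a query. Because these embeddings are normalized to length 1, cosine similarity gives the same result as a plain dot product.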
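The snippet below is a quick numeric check of the two properties above. It uses small made-up vectors rather than real embeddings. It shows that cosine similarity ignores vector length (unlike Euclidean distance), and that once vectors are scaled to unit length, cosine similarity equals a plain dot product.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 0.5])

# Scaling a changes its length but not its direction:
# Euclidean distance to b changes, cosine similarity does not.
print(np.linalg.norm(a - b), np.linalg.norm(10 * a - b))
print(cosine(a, b), cosine(10 * a, b))

# For unit-length vectors, cosine similarity is just the dot product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(np.dot(a_unit, b_unit))
```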

### **Chunking**

Let's start with chunking. Why is chunking important when working with LLMs?

Chunking helps overcome the challenges associated with processing long sequences and ensures optimal performance when working with LLMs.

**Mitigating Token Limitations:** LLMs have a maximum token limit for each input sequence. If a document or input exceeds this limit, it needs to be divided into chunks that fit within the token constraints. Chunking allows the LLM to handle long documents or inputs by splitting them into multiple chunks that fall within the token limit. This ensures that the model can effectively process the entire content while adhering to the token constraints.

**Memory and Computational Efficiency:** LLMs are computationally expensive and require substantial memory resources to process long sequences of text. Chunking involves breaking down long documents or input into smaller, manageable chunks, allowing the LLM to process them efficiently within its memory limitations. By dividing the input into smaller parts, chunking helps avoid memory errors or performance degradation that may occur when processing lengthy sequences.

**Contextual Coherence:** Chunking helps maintain contextual coherence in the generated outputs. Instead of treating the entire input as a single sequence, breaking it into smaller chunks allows the model to capture local context more effectively. This improves the model's understanding of the relationships and dependencies within the text, leading to more coherent and meaningful generated responses.

**Improved Parallelism:** Chunking enables parallel processing, which is essential for optimizing the performance of LLMs. By dividing the input into chunks, multiple chunks can be processed simultaneously, taking advantage of parallel computing capabilities. This leads to faster inference times and enhances overall efficiency when working with LLMs.

We will be leveraging a basic splitter for this notebook. However, it's important to note that there are more advanced splitters available, which may better suit your specific use case. 
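A common refinement over a basic splitter is to let consecutive chunks overlap by a sentence or two. A fact stated across a chunk boundary then still appears whole in at least one chunk. The sketch below is a hypothetical `split_with_overlap` helper; it is not part of this notebook's code or of any library. Like the basic splitter, it naively treats `". "` as the sentence boundary.

```python
from typing import List

def split_with_overlap(text: str, sentences_per_chunk: int = 5, overlap: int = 1) -> List[str]:
    """Group sentences into chunks, repeating `overlap` sentences between neighbouring chunks."""
    if not 0 <= overlap < sentences_per_chunk:
        raise ValueError("overlap must be >= 0 and smaller than sentences_per_chunk")
    sentences = [s for s in text.split(". ") if s]
    step = sentences_per_chunk - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(". ".join(sentences[start:start + sentences_per_chunk]))
        if start + sentences_per_chunk >= len(sentences):
            break  # this window already reached the end of the text
    return chunks

print(split_with_overlap("A. B. C. D. E. F", sentences_per_chunk=3, overlap=1))
# ['A. B. C', 'C. D. E', 'E. F']
```

Larger overlaps improve the odds of keeping related sentences together. The cost is more chunks to embed and some duplicated text in the retrieved context.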

### Orchestrated Multi-Step Processing

The following section demonstrates a multi-step RAG process in which Semantic Kernel supplies the AI services:
1. **Document Chunking**: Breaking down retrieved documents into manageable pieces
2. **Embedding Generation**: Using the kernel's embedding service to create vector representations
3. **Similarity Search**: Ranking the embedded chunks against the query embedding (plain NumPy/scikit-learn code, outside the kernel)
4. **Response Generation**: Using the kernel's chat service to generate final answers

### Orchestration Benefits
The Semantic Kernel orchestration pattern used in the helper functions above provides:
- **Service Discovery**: Lookup of registered services by type (`kernel.get_service(type=...)`)
- **Lifecycle Management**: A single place where AI services are created and registered
- **Configuration Consistency**: Centralized management of model parameters through execution settings
- **Error Handling**: A single point where service errors can be caught, since every AI call goes through the same helpers
- **Async Coordination**: Async service APIs that allow concurrent calls, although the cells below await them one at a time
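Because the services are async, chunk embeddings could be requested concurrently with `asyncio.gather` instead of one at a time. The sketch below is a hypothetical helper that caps the number of in-flight requests with a semaphore, which helps stay under rate limits. It accepts any async `embed(text)` function. The demo uses a stand-in embedder (`fake_embed`, also hypothetical) so that it runs without Azure.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

async def embed_all(
    embed: Callable[[str], Awaitable[List[float]]],
    texts: Sequence[str],
    max_concurrency: int = 4,
) -> List[List[float]]:
    """Embed all texts concurrently, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> List[float]:
        async with semaphore:
            return await embed(text)

    # gather returns results in input order, regardless of completion order
    return list(await asyncio.gather(*(embed_one(t) for t in texts)))

async def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call: sleeps briefly, returns a tiny vector."""
    await asyncio.sleep(0.01)
    return [float(len(text)), float(text.count(" "))]

vectors = asyncio.run(embed_all(fake_embed, ["first chunk", "a second chunk", "third"]))
print(vectors)  # [[11.0, 1.0], [14.0, 2.0], [5.0, 0.0]]
```

Inside a notebook cell, use a top-level `await embed_all(...)` instead of `asyncio.run(...)`, since the notebook already runs an event loop. Once `get_embedding` is complete, you could pass `lambda t: get_embedding(kernel, t)` as `embed`. The result then holds whatever `get_embedding` returns for each chunk.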

In [None]:
# Text processing functions
import re
import asyncio
import nest_asyncio
nest_asyncio.apply()

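def normalize_text(text):
    # Collapse every run of whitespace (including newlines) into a single space
    text = re.sub(r'\s+', ' ', text).strip()
    # Tidy up doubled periods left over from the source text
    text = text.replace("..", ".").replace(". .", ".")
    return text

def split_into_chunks(text, sentences_per_chunk=5):
    # Naive sentence split on ". ", then group every `sentences_per_chunk` sentences into one chunk
    sentences = text.split(". ")
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        chunk = ". ".join(sentences[i:i+sentences_per_chunk])
        chunks.append(chunk)
    return chunks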

# Create document chunks
if document:
    #TODO: add code to normalize text in document and split normalized document to chunks and
    # store it in variable called document_chunks

    print(f"Created {len(document_chunks)} chunks from document")
else:
    document_chunks = ["Sample text chunk for testing"]
    print("Using sample chunks - no document found")

document_chunks

In [None]:
# Create embeddings for document chunks
embeddings = []
for chunk in document_chunks:
    #TODO: Generate embedding using get_embedding function and append to embeddings
    

# Create DataFrame with text and embeddings
import pandas as pd
chunks_df = pd.DataFrame({
    'text': document_chunks,
    'embedding': embeddings
})

print(f"Generated embeddings for {len(chunks_df)} chunks")

# Demo search
user_query = "What did Laurene Jobs say about Hillary Clinton?"
query_embedding = await get_embedding(kernel, user_query)
search_results = search_similar_chunks(query_embedding, chunks_df, top_k=3)

print(f"\nQuery: {user_query}")
for i, result in enumerate(search_results, 1):
    print(f"\nResult {i} (Score: {result['score']:.3f}):")
    print(result['text'][:200] + "...")

In [None]:
# Generate RAG response
context = "\n\n".join([result['text'] for result in search_results])

prompt = f"""Based on the following context, answer the question: {user_query}

Context:
{context}

Answer:"""

response = await get_completion(kernel, prompt, temperature=0)
print("RAG Response:")
print(response)
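The prompt above simply concatenates the retrieved chunks. A common RAG variant goes further in three ways. It numbers each chunk, tells the model to answer only from the supplied context, and caps the context length so the prompt stays within the model's token limit. The helper below is a hypothetical sketch. Its character budget is a rough stand-in for a real token count, and it assumes the chunks arrive ranked best-first.

```python
from typing import List

def build_rag_prompt(question: str, chunks: List[str], max_context_chars: int = 6000) -> str:
    """Build a grounded prompt from ranked chunks, stopping before the character budget is exceeded."""
    parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk.strip()}"
        if used + len(block) > max_context_chars:
            break  # chunks are assumed ranked best-first, so drop the tail
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt_text = build_rag_prompt("Who spoke at the event?", ["First chunk.", "Second chunk."], max_context_chars=20)
print(prompt_text)
```

In this notebook it could be used as `await get_completion(kernel, build_rag_prompt(user_query, [r['text'] for r in search_results]), temperature=0)`.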

In [None]:
# Alternative embedding DataFrame for compatibility
embed_df = pd.DataFrame({
    'chunks': document_chunks,
    'embeddings': embeddings
})

embed_df.head()

In [None]:
# Document search function
def search_docs(df, user_query, top_n=3):
    query_embedding = asyncio.get_event_loop().run_until_complete(get_embedding(kernel, user_query))
    query_embedding = np.array(query_embedding).reshape(1, -1)
    
    similarities = []
    for idx, embedding in enumerate(df['embeddings']):
        embedding_array = np.array(embedding).reshape(1, -1)
        similarity = cosine_similarity(query_embedding, embedding_array)[0][0]
        similarities.append((similarity, idx))
    
    similarities.sort(reverse=True)
    
    results = []
    for similarity, idx in similarities[:top_n]:
        results.append({
            'chunks': df.iloc[idx]['chunks'],
            'similarities': similarity
        })
    
    return pd.DataFrame(results)

# Search for specific content
query = "trouble so far in clinton campaign"
results = search_docs(embed_df, query, top_n=2)
results
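Both `search_similar_chunks` and `search_docs` compute one similarity per row in a Python loop. With more than a handful of chunks, it is simpler and faster to stack the embeddings into a matrix and score them all with one matrix-vector product. The sketch below is a hypothetical `top_k_chunks` helper. It assumes every entry in the embedding column has the same length, as either a 1-D list/array or a `(1, dim)` array.

```python
import numpy as np
import pandas as pd

def top_k_chunks(query_embedding, df: pd.DataFrame, k: int = 3,
                 text_col: str = "chunks", emb_col: str = "embeddings") -> pd.DataFrame:
    """Return the k rows of df whose embeddings are most cosine-similar to the query."""
    matrix = np.vstack(df[emb_col].to_list()).astype(float)    # (n_chunks, dim)
    query = np.asarray(query_embedding, dtype=float).ravel()    # (dim,)
    # Cosine similarity for every row at once: dot products divided by the norms
    scores = (matrix @ query) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    best = np.argsort(scores)[::-1][:k]                         # highest score first
    return pd.DataFrame({
        text_col: df[text_col].iloc[best].to_list(),
        "similarities": scores[best],
    })

toy = pd.DataFrame({
    "chunks": ["about cats", "about dogs", "about taxes"],
    "embeddings": [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
})
print(top_k_chunks([1.0, 0.1], toy, k=2))
```

The output columns match those of `search_docs`, so `top_k_chunks(query_embedding, embed_df, k=2)` could stand in for it once a query embedding is available.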

## Section 3: Text Summarization

This section will cover the end-to-end flow of using the GPT-3 and ChatGPT models for summarization tasks. 
The Azure OpenAI service uses generative completion calls. The model reads natural language instructions to identify the task being asked and the skill required, which is the basis of prompt engineering. With this approach, the first part of the prompt includes natural language instructions and/or examples of the specific task desired. The model then completes the task by predicting the most probable next text. This technique is known as "in-context" learning.

There are three main approaches to adapting the model to a task. Zero-shot and few-shot are in-context learning; fine-tuning instead changes the model's weights. These approaches vary in the amount of task-specific data given to the model:

**Zero-shot**: In this case, no examples are provided to the model and only the task request is provided. 

**Few-shot**: In this case, a user includes several examples in the call prompt that demonstrate the expected answer format and content. 

**Fine-Tuning**: Fine-tuning lets you tailor models to your own datasets. This customization step lets you get more out of the service by providing:
- With lots of data (at least 500 examples), traditional optimization techniques with backpropagation are used to re-adjust the model's weights. This enables higher-quality results than zero-shot or few-shot prompting alone.
- A customized model improves on the few-shot approach by training the model weights on your specific prompts and structure. This lets you achieve better results on a wider range of tasks without needing to provide examples in the prompt. The result is less text sent and fewer tokens used.
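Zero-shot and few-shot prompting differ only in what goes into the prompt. The hypothetical helper below builds either form for a summarization task. With no examples it produces a zero-shot prompt. With (text, summary) pairs it produces a few-shot prompt that shows the model the expected format. The example pair is invented for illustration; in this notebook, the `article` and `highlights` columns could supply real pairs.

```python
from typing import List, Optional, Tuple

def build_summary_prompt(text: str, examples: Optional[List[Tuple[str, str]]] = None) -> str:
    """Zero-shot prompt when examples is empty or None, few-shot prompt otherwise."""
    lines = ["Summarize the text in one or two sentences.", ""]
    # Each example demonstrates the expected input/output format
    for source, summary in examples or []:
        lines += [f"Text: {source}", f"Summary: {summary}", ""]
    lines += [f"Text: {text}", "Summary:"]
    return "\n".join(lines)

zero_shot = build_summary_prompt("The council approved the new park budget on Monday.")
few_shot = build_summary_prompt(
    "The council approved the new park budget on Monday.",
    examples=[("Heavy rain closed three roads in the city overnight.",
               "Overnight rain closed three city roads.")],
)
print(zero_shot)
print("---")
print(few_shot)
```

Either string can be passed to `get_completion(kernel, prompt, temperature=...)`. Each example adds tokens to every call, which is the cost that fine-tuning avoids.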


In [None]:
# Create summarization prompt
result_1 = results.iloc[0]['chunks']
result_2 = results.iloc[1]['chunks']
print(f"Selected chunks for summarization:\n1. {result_1}\n2. {result_2}")

prompt = f"""Summarize the content about the Clinton campaign from the following text:

Text 1: {normalize_text(result_1)}

Text 2: {normalize_text(result_2)}

Summary:"""

# Generate summary
summary = await get_completion(kernel, prompt, temperature=0.5)
print(f"Summary (Temperature=0.5):")
print(summary)

In [None]:
# Complete RAG demonstration: Search + Summarize

#TODO: Change this test query and see how it affects the results
test_query = "What are the key points about Clinton's campaign events?"

# Search for relevant documents
search_results_df = search_docs(embed_df, test_query, top_n=3)

# Create context from search results
context = "\n\n".join(search_results_df['chunks'].tolist())

# Generate RAG response using the context
rag_prompt = f"""Based on the following context, answer the question: {test_query}

Context:
{context}

Answer:"""

final_response = await get_completion(kernel, rag_prompt, temperature=0.2)

print(f"Query: {test_query}")
print(f"\nRAG Answer: {final_response}")