## 🚀 Motivation

Learn how to evaluate relevance and optimize your search using the Azure Cognitive Search Python SDK.

## ❗ Problem Statement


##### Understand quantitative measures of relevance


- **NDCG@10**: Normalized Discounted Cumulative Gain at 10 rates how well a retrieval system finds and correctly orders the top 10 documents. The score ranges from 0 to 1 (often reported on a 0 to 100 scale) and reflects how closely the system's ordered list matches the ideal order of documents. NDCG@10 is widely used because it evaluates both the precision of the results and their sequencing.

- **NDCG@3**: The same metric restricted to the top 3 documents. It is particularly relevant where accuracy in the topmost results is crucial, as in generative AI applications. It scores the system's ability to identify and correctly rank the three most relevant documents.

- **Recall@50**: The proportion of high-quality documents identified within the top 50 results. It is calculated by counting the documents in the top 50 that a scoring prompt rates as high quality and dividing by the total number of known good documents for the query. It is useful for assessing the system's ability to retrieve a broad set of relevant documents from a large pool.
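The definitions above can be made concrete with a small from-scratch sketch. It assumes the common formulation of DCG with linear gain and a `log2(rank + 1)` discount; the relevance labels are hypothetical, and the helper names (`dcg_at_k`, `ndcg_at_k`, `recall_at_k`) are illustrative rather than part of any library.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance labels (in rank order)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG of the given order divided by the DCG of the ideal (descending) order."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(relevances, k, total_relevant):
    """Share of all known relevant documents that appear in the top k."""
    found = sum(1 for rel in relevances[:k] if rel > 0)
    return found / total_relevant if total_relevant else 0.0


# Hypothetical binary relevance labels, listed in the order the system ranked the documents
ranked_relevance = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

print(f"NDCG@3:   {ndcg_at_k(ranked_relevance, 3):.4f}")
print(f"NDCG@10:  {ndcg_at_k(ranked_relevance, 10):.4f}")
print(f"Recall@3: {recall_at_k(ranked_relevance, 3, total_relevant=3):.4f}")
```

The third relevant document sits at rank 5, so it lowers NDCG@3 and Recall@3 but still contributes (with a discount) to NDCG@10.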

##### Limitations of embedding-based and keyword search

- Limitations of Embedding-Based Search

    - Weakness in Keyword Precision: Embedding-based search excels at understanding overall context and semantic meaning but may falter in identifying specific keywords or phrases. It can miss documents containing exact terms if those terms are not semantically aligned with the rest of the content or query.
    - Contextual Misinterpretation: Embeddings can overgeneralize or misinterpret context, retrieving documents that are broadly relevant but miss specific nuances or details. They may struggle to distinguish subtle differences in meaning, especially in specialized or technical domains.
    - Dependency on Training Data: The effectiveness of embeddings depends heavily on the data they were trained on. If the training data lacks diversity or depth in certain topics, the embeddings may not capture those areas well.

- Limitations of Keyword Search

    - Struggles with Synonyms and Paraphrasing: Traditional keyword search matches terms rigidly. It may not recognize synonyms or different ways of expressing the same idea, which limits its ability to retrieve all relevant documents.
    - Limited Understanding of Context: Keyword search is effective at finding documents with specific terms but may not grasp the broader context or the intent behind a query. This limitation becomes pronounced in complex queries where the context or the relationship between terms is crucial.
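The rigid term matching described above can be illustrated with a toy sketch. It is a deliberately simplified stand-in for a real full-text engine: the `tokenize` and `term_overlap` helpers and the sample texts are hypothetical, and production keyword search uses more refined scoring than counting shared terms.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_overlap(query, document):
    """Number of distinct query terms that literally appear in the document."""
    return len(tokenize(query) & tokenize(document))


query = "car repair"
doc_exact = "Guide to car repair and servicing"
doc_synonym = "Automobile maintenance handbook"

print(term_overlap(query, doc_exact))    # shares "car" and "repair"
print(term_overlap(query, doc_synonym))  # shares no terms despite a similar meaning
```

An embedding-based comparison would be expected to place the second document close to the query, which is the gap that combining both approaches aims to close.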



## 💡 Solution


Hybrid Search as a Winner: Hybrid search combines keyword and vector search methods, capitalizing on the strengths of both. Keyword search excels in identifying specific terms, while vector search excels in understanding semantic similarities. This combination ensures a more comprehensive and accurate retrieval of documents, making it especially effective for diverse and complex search queries.
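One common way to merge a keyword ranking and a vector ranking is Reciprocal Rank Fusion (RRF), which Azure's documentation describes for hybrid queries. The sketch below is illustrative only and is not the service's implementation; the constant `k=60`, the function name, and the document IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list of (doc_id, score).

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from the two retrievers
keyword_ranking = ["a", "b", "c"]
vector_ranking = ["b", "d", "a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents `a` and `b` appear in both lists, so they outrank `c` and `d`, which each appear in only one.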

Re-Ranking and L2 in Cognitive Search: The L2 layer in cognitive search improves upon the initial retrieval (L1) results by applying advanced ranking algorithms. It reorders the top documents, focusing on enhancing relevance and contextual accuracy. This is particularly important in scenarios where the initial retrieval might miss subtle nuances. L2 uses more sophisticated techniques, often leveraging deep learning models, to ensure the most relevant results are prioritized. 

In more detail: The semantic ranker runs the query and document text simultaneously through transformer models that use a cross-attention mechanism to produce a ranker score. The score for each query and document chunk pair is calibrated to a range that is consistent across all indexes and queries: a score of 0 represents a very irrelevant chunk, and a score of 4 represents an excellent one. In the chart below, Hybrid + Semantic ranking finds the best content for the LLM at each result set size.
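Because the reranker score is calibrated to a fixed 0 to 4 range, it can be used as a cut-off before passing chunks to an LLM. The sketch below works on plain dictionaries shaped like search hits; the threshold of 2.0, the function name, and the sample data are assumptions chosen for illustration, not recommended values.

```python
def filter_by_reranker_score(docs, min_score=2.0):
    """Keep hits whose reranker score meets the threshold, best first.

    Hits without a reranker score are treated as scoring 0 and dropped
    for any positive threshold.
    """
    kept = [d for d in docs if (d.get("@search.reranker_score") or 0.0) >= min_score]
    return sorted(kept, key=lambda d: d["@search.reranker_score"], reverse=True)


# Hypothetical hits; only the score field matters for this sketch
sample_docs = [
    {"id": "chunk-1", "@search.reranker_score": 2.15},
    {"id": "chunk-2", "@search.reranker_score": 0.8},
    {"id": "chunk-3", "@search.reranker_score": 3.4},
    {"id": "chunk-4"},
]

selected = filter_by_reranker_score(sample_docs, min_score=2.0)
print([d["id"] for d in selected])
```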

## 📝 How-to

### 🌐 Azure Hybrid Search with Semantic Reranker

This section covers the implementation of a hybrid search system that combines traditional Azure search with a semantic reranker for improved results.

### 📊 Implementation of Evaluation Metrics using Scikit-learn

We will use the `ndcg_score` function from the `sklearn.metrics` module to evaluate our search system. This function calculates the Normalized Discounted Cumulative Gain (NDCG), a commonly used metric for evaluating the quality of a ranked list of items.

```python
from sklearn.metrics import ndcg_score
import numpy as np

# Sample data: Predicted scores and true relevance scores for a set of documents
predicted_scores = np.array([[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]])
true_relevance = np.array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0]])  # Assuming binary relevance (1 for relevant, 0 for not relevant)

# NDCG@10
ndcg_at_10 = ndcg_score(true_relevance, predicted_scores, k=10)
print(f"NDCG@10: {ndcg_at_10}")

# NDCG@3
ndcg_at_3 = ndcg_score(true_relevance, predicted_scores, k=3)
print(f"NDCG@3: {ndcg_at_3}")

# Recall@50: relevant documents found in the top 50, divided by all known relevant documents.
# This sample has only 10 documents, so the top 50 covers all of them and the result is 1.0.
k = 50
top_k_indices = np.argsort(-predicted_scores[0])[:k]
recall_at_50 = true_relevance[0][top_k_indices].sum() / true_relevance[0].sum()
print(f"Recall@50: {recall_at_50}")
```

### 🎯 Evaluation Process

1. **🔍 Gather Azure Cognitive Search Results**: Retrieve the search results from Azure Cognitive Search.

2. **🎯 Define Ground Truth Relevance Scores**: Establish a ground truth set of relevance scores for the search results.

3. **📈 Calculate NDCG@3**: Use the `ndcg_score` function with `k=3` to calculate NDCG over the top 3 results. This gives us a measure of the quality of the results a user or an LLM is most likely to see.
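The steps above can be sketched as a small helper that lines up search hits with ground-truth labels in the shape `ndcg_score` expects. The document IDs, scores, and labels are hypothetical, and treating unjudged documents as relevance 0 is an assumption. Note also that judged documents the search did not return (here `doc-1`) are left out, which can make the resulting NDCG look better than it is.

```python
def build_ndcg_inputs(results, ground_truth):
    """Turn search hits and ground-truth labels into (y_true, y_score).

    results: list of (doc_id, score) pairs returned by the search
    ground_truth: dict mapping doc_id to a relevance label
    Both outputs are nested lists of shape (1, n_docs), one row per query.
    """
    y_true = [[ground_truth.get(doc_id, 0) for doc_id, _ in results]]
    y_score = [[score for _, score in results]]
    return y_true, y_score


# Step 1: hypothetical search results as (doc_id, score) pairs
search_results = [("doc-7", 2.15), ("doc-2", 2.09), ("doc-9", 1.40), ("doc-4", 0.95)]

# Step 2: hypothetical ground-truth relevance labels (higher is more relevant)
ground_truth = {"doc-7": 3, "doc-9": 2, "doc-1": 1}

y_true, y_score = build_ndcg_inputs(search_results, ground_truth)
print(y_true)
print(y_score)

# Step 3 (with scikit-learn installed): ndcg_score(y_true, y_score, k=3)
```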

### Prerequisites

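Before running the script, install the Python packages it depends on. Besides the Azure Cognitive Search SDK, the notebook imports `openai` (the pre-1.0 API, which provides `openai.embeddings_utils`), `python-dotenv`, `scikit-learn`, and `numpy`; install those as well if they are missing.

Here's the command to install the Azure Cognitive Search SDK: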

```python
%pip install azure-search-documents==11.4.0b10
```

This script requires several environment variables to be set. Here's a list of these variables and what they're used for:

- `AZURE_SEARCH_SERVICE_ENDPOINT`: This is the endpoint for your Azure Cognitive Search service. It should be in the form `https://<your-search-service-name>.search.windows.net`.

- `AZURE_SEARCH_ADMIN_KEY`: This is the admin key for your Azure Cognitive Search service. You can find this in the "Keys" section of your search service in the Azure portal.

- `OPENAI_API_KEY`: This is the API key for your Azure OpenAI resource. You can find it in the Azure portal, under "Keys and Endpoint" for the resource.

- `OPENAI_ENDPOINT`: This is the endpoint for your Azure OpenAI resource. It should be in the form `https://<your unique endpoint identifier>.openai.azure.com`.

- `AZURE_OPENAI_API_VERSION`: This is the version of the Azure OpenAI API that you're using. For example, `2023-05-15`.

- `AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_ID`: This is the name of your Azure OpenAI embeddings deployment (for example, a deployment of `text-embedding-ada-002`). You can find it under the deployments of your Azure OpenAI resource.

Remember to store these variables in a `.env` file and never commit them to version control to keep your credentials secure.
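A quick check that every variable listed above is set can save a confusing failure later in the notebook. This is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical, and it assumes the `.env` file has already been loaded into the process environment.

```python
import os

# Variable names taken from the list above
REQUIRED_VARS = [
    "AZURE_SEARCH_SERVICE_ENDPOINT",
    "AZURE_SEARCH_ADMIN_KEY",
    "OPENAI_API_KEY",
    "OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_ID",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```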




In [1]:
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
import openai
from azure.search.documents import SearchClient
from openai.embeddings_utils import cosine_similarity, get_embedding
from azure.search.documents.models import RawVectorQuery

In [2]:
# Load environment variables from .env file
load_dotenv()

# Set up Azure Cognitive Search credentials
service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
credential = AzureKeyCredential(key)

# Set up OpenAI credentials and settings
openai.api_type = "azure"
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_base = os.getenv('OPENAI_ENDPOINT')
openai.api_version = os.getenv("AZURE_OPENAI_API_VERSION")

# Set up model details
model_name = "text-embedding-ada-002"
ADA_DEPLOYMENT = os.getenv('AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_ID')

# Validate OpenAI API key and endpoint
assert openai.api_key, "ERROR: Azure OpenAI Key is missing"
assert openai.api_base, "ERROR: Azure OpenAI Endpoint is missing"
assert "openai.azure.com" in openai.api_base.lower(), "ERROR: Azure OpenAI Endpoint should be in the form: \n\n\t<your unique endpoint identifier>.openai.azure.com"

In [3]:
# Define the name of the Azure Search index
# This is the index where your data is stored in Azure Search
index_name = 'index-teradyne-web'

# Set up the Azure Search client with the specified index
# This prepares the client to interact with the Azure Search service
search_client = SearchClient(service_endpoint, index_name, credential=credential)

In [4]:
search_query = "On UltraFLEX platform, design an external instrument that can fast charge a smart phone."
search_vector = get_embedding(search_query, engine=ADA_DEPLOYMENT)  # deployment name loaded from the environment above

In [None]:
# Pure vector Search

r = search_client.search(None, top=5, vector_queries=[RawVectorQuery(vector=search_vector, k=50, fields="content_vector")])
for doc in r:
    content = doc["content"].replace("\n", " ")[:1000]
    print(f"score: {doc['@search.score']}. {content}")

In [12]:
# Keyword (full-text) search
r = search_client.search(search_query, top=5)
for doc in r:
    # Print only the hits whose content contains the exact term "Jesus"
    if "Jesus" in doc["content"]:
        content = doc["content"].replace("\n", " ")[:1000]
        print(f"score: {doc['@search.score']}. {content}")

score: 8.379741. 18.12.1. Who Performs the OrdinanceOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders perform ordinances and blessings, they follow the Savior’s example of blessing others.
score: 8.292201. 18.10.4. Who Performs the OrdinanceOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders perform ordinances and blessings, they follow the Savior’s example of blessing others.
score: 8.288818. 18.6.1. Who Gives the BlessingOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders perform ordinances and blessings, they follow the Savior’s example of blessing others.   18.6.2. InstructionsOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders pe

In [13]:
# hybrid retrieval
search_query = "Who is Jesus Christ?"
search_vector = get_embedding(search_query, engine=ADA_DEPLOYMENT)  # deployment name loaded from the environment above
r = search_client.search(search_query, top=5, vector_queries=[RawVectorQuery(vector=search_vector, k=50, fields="content_vector")])
for doc in r:
    content = doc["content"].replace("\n", " ")[:1000]
    print(f"score: {doc['@search.score']}. {content}")

score: 0.027973394840955734. 17.1. Principles of Christlike Teaching   17.1. Principles of Christlike TeachingEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.   17.1.1. Love Those You TeachEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.   17.1.2. Teach by the SpiritEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.
score: 0.026012461632490158. God’s Work of Salvation and Exaltation   Living the Gospel of Jesus Christ   16. Living the Gospel of Jesus ChristWe live the gospel as we exercise faith in Jesus Christ, repent daily, make covenants with God as we receive the ordinances of salvation and exaltation, and endure to the end by keeping those covenants.  17. Teaching the Gospel   17. Teaching the GospelEffective gospel teaching helps people grow in their testimonies and their faith 

In [5]:
# hybrid retrieval + rerank 
r = search_client.search(
        search_query,
        top=5, 
        vector_queries=[RawVectorQuery(vector=search_vector, k=50, fields="content_vector")],
        query_type="semantic",
        semantic_configuration_name="config",
        query_language="en-us")

for doc in r:
    content = doc["content"].replace("\n", " ")[:1000]
    print(f"score: {doc['@search.score']}, reranker: {doc['@search.reranker_score']}. {content}")

score: 0.026916220784187317, reranker: 2.1510417461395264. The UltraWaveMX20-D16 is Teradyne’s lower frequency band instrument. It is also an extension to the UltraWave24 to provide additional test coverage up to 20 GHz. This frequency band targets devices that have WiFi-6, Cellular 5G, Ultra-wideband, and 5G-NR IF interfaces. Each UltraWaveMX20-D16 provides 16 mmWave ports in the test system with the ability to scale to 64 mmWave ports for high site count testing. The UltraWaveMX8 adds mid-band RF test capabilities to support WiFi-6E & 5G test needs. Adding the MX8 as a 1-slot upgrade to any UltraFLEX system with UltraWave24 extends the frequency range to 7.5GHz. Up to 160 MHz of bandwidth is available for supporting 802.11 modulated tests, and 16 RF ports with 8 channels are provided for high parallelism and efficiency. The MX8 provides industry-leading WiFi-6E test capabilities with faster time to market, higher yields, and lower costs.
score: 0.026511315256357193, reranker: 2.09096

In [19]:
import logging
import json
from typing import List, Dict, Union

from azure.search.documents import SearchItemPaged  # public import path instead of the private _paging module

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


In [20]:
# hybrid retrieval + rerank
# Run the query again: the loop in the earlier cell already consumed the previous result iterator
r = search_client.search(
        search_query,
        top=5, 
        vector_queries=[RawVectorQuery(vector=search_vector, k=50, fields="content_vector")],
        query_type="semantic",
        semantic_configuration_name="config",
        query_language="en-us")


In [21]:
def process_search_results(result: SearchItemPaged) -> List[Dict[str, str]]:
    """
    Process the results from a search query in an Azure Cognitive Search index.

    Args:
        result (SearchItemPaged): The results from a search query.

    Returns:
        List[Dict[str, str]]: A list of dictionaries, where each dictionary contains the content and source of a document.
    """
    results = []
    for doc in result:
        content = doc["content"].replace("\n", " ")[:1000]
        metadata = json.loads(doc["metadata"])
        source = metadata.get("source", "")

        logger.info(
            f"score: {doc['@search.score']}, reranker: {doc['@search.reranker_score']}. {content}"
        )
        logger.info(f"source: {source}")

        results.append({
            "content": content,
            "source": source,
        })

    return results

In [22]:
test = process_search_results(r)

INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'https://azure-ai-search-dev-eastus-001.search.windows.net/indexes('index-teradyne-web')/docs/search.post.search?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '34698'
    'api-key': 'REDACTED'
    'Accept': 'application/json;odata.metadata=none'
    'x-ms-client-request-id': 'f9d5c540-8a24-11ee-9f6c-f43bd8cfe846'
    'User-Agent': 'azsdk-python-search-documents/11.4.0b10 Python/3.9.18 (Windows-10-10.0.22631-SP0)'
A body is sent with the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json; odata.metadata=none; odata.streaming=true; charset=utf-8'
    'Content-Encoding': 'REDACTED'
    'Vary': 'REDACTED'
    'Server': 'Microsoft-IIS/10.0'
    'Strict-Transport-Security': 'REDACTED'
    'Preference-Applied': 'REDACTED'
    'ODa

In [25]:
test

[{'content': 'The UltraWaveMX20-D16 is Teradyne’s lower frequency band instrument. It is also an extension to the UltraWave24 to provide additional test coverage up to 20 GHz. This frequency band targets devices that have WiFi-6, Cellular 5G, Ultra-wideband, and 5G-NR IF interfaces. Each UltraWaveMX20-D16 provides 16 mmWave ports in the test system with the ability to scale to 64 mmWave ports for high site count testing. The UltraWaveMX8 adds mid-band RF test capabilities to support WiFi-6E & 5G test needs. Adding the MX8 as a 1-slot upgrade to any UltraFLEX system with UltraWave24 extends the frequency range to 7.5GHz. Up to 160 MHz of bandwidth is available for supporting 802.11 modulated tests, and 16 RF ports with 8 channels are provided for high parallelism and efficiency. The MX8 provides industry-leading WiFi-6E test capabilities with faster time to market, higher yields, and lower costs.',
  'source': 'https://www.teradyne.com/products/ultraflex/'},
 {'content': 'UVI264 – High 