# Azure AI Search: Relevance

## Setup API clients

In [9]:
import os

import azure.identity
import dotenv
import openai
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

dotenv.load_dotenv()

azure_credential = azure.identity.AzureDeveloperCliCredential(tenant_id=os.getenv("AZURE_TENANT_ID"))

# Initialize Azure OpenAI client
AZURE_OPENAI_SERVICE = os.getenv("AZURE_OPENAI_SERVICE")
AZURE_OPENAI_ADA_DEPLOYMENT = os.getenv("AZURE_OPENAI_ADA_DEPLOYMENT")

token_provider = azure.identity.get_bearer_token_provider(azure_credential, "https://cognitiveservices.azure.com/.default")
openai_client = openai.AzureOpenAI(
    api_version="2023-07-01-preview",
    azure_endpoint=f"https://{AZURE_OPENAI_SERVICE}.openai.azure.com",
    azure_ad_token_provider=token_provider)

def get_embedding(text):
    get_embeddings_response = openai_client.embeddings.create(model=AZURE_OPENAI_ADA_DEPLOYMENT, input=text)
    return get_embeddings_response.data[0].embedding

# Initialize Azure search client
AZURE_SEARCH_SERVICE = os.getenv("AZURE_SEARCH_SERVICE")
AZURE_SEARCH_ENDPOINT = f"https://{AZURE_SEARCH_SERVICE}.search.windows.net"

AZURE_SEARCH_FULL_INDEX = "gptkbindex"
search_client = SearchClient(AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_FULL_INDEX, credential=azure_credential)


#### Vector search is not always optimal

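For example, consider a search for an exact string such as a price. Embeddings capture semantic similarity rather than literal character matches, so the nearest neighbors may not contain the string at all.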

In [10]:
search_query = "$45.00"
search_vector = get_embedding(search_query)
r = search_client.search(None, top=3, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="embedding")])
for doc in r:
    found_content = "Not found"
    if search_query.lower() in doc['content'].lower():
        found_content = doc['content'][doc['content'].lower().find(search_query.lower()):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.52195	Matching text: Not found
Score: 0.52063	Matching text: Not found
Score: 0.52031	Matching text: Not found


Compare that to a text search for the same query:

In [11]:
r = search_client.search(search_query, top=3)
for doc in r:
    found_content = "Not found"
    if search_query.lower() in doc['content'].lower():
        found_content = doc['content'][doc['content'].lower().find(search_query.lower()):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 4.82338	Matching text: $45.00</td><td>$55.00</td></tr><tr><td>Employee +1</td><td>$65.00</td><td>$71.00</td></tr><tr><td>Employee +2 or more</td><td>$78.00</td><td>$89.00</td></tr></table> Next Steps We hope that this information has been helpful in understanding the differences 
Score: 3.98890	Matching text: Not found
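
The match-and-print logic is repeated in each cell of this section. One way to factor it out is a small helper. This is only a sketch: `matching_text` is a hypothetical name, not part of the Azure SDK, and the sample documents are made up for illustration rather than taken from the index.

```python
def matching_text(content, query, not_found="Not found"):
    """Return the tail of `content` starting at the first case-insensitive
    match of `query`, with newlines flattened, or `not_found` if absent."""
    position = content.lower().find(query.lower())
    if position == -1:
        return not_found
    return content[position:].replace("\n", " ")


# Stand-in documents (made up for illustration, not from the index).
sample_docs = [
    {"content": "Employee only\n$45.00 per pay period"},
    {"content": "No prices are listed here"},
]
for doc in sample_docs:
    print(matching_text(doc["content"], "$45.00"))
```

With a helper like this, each result loop would reduce to a single `print` call that passes `doc['content']` and `search_query`.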


#### Hybrid retrieval

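Hybrid retrieval runs the text query and the vector query together, then uses Reciprocal Rank Fusion (RRF) to merge the two result lists into a single ranking.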

In [12]:
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=15, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="embedding")])
for doc in r:
    found_content = "Not found"
    if search_query in doc['content']:
        found_content = doc['content'][doc['content'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.01667	Matching text: $45.00</td><td>$55.00</td></tr><tr><td>Employee +1</td><td>$65.00</td><td>$71.00</td></tr><tr><td>Employee +2 or more</td><td>$78.00</td><td>$89.00</td></tr></table> Next Steps We hope that this information has been helpful in understanding the differences 
Score: 0.01667	Matching text: Not found
Score: 0.01639	Matching text: Not found
Score: 0.01639	Matching text: Not found
Score: 0.01613	Matching text: Not found
Score: 0.01587	Matching text: Not found
Score: 0.01562	Matching text: Not found
Score: 0.01538	Matching text: Not found
Score: 0.01515	Matching text: Not found
Score: 0.01493	Matching text: Not found
Score: 0.01471	Matching text: Not found
Score: 0.01449	Matching text: Not found
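
To five decimals, the scores above equal 1/60, 1/61, 1/62, and so on, which is the pattern a reciprocal-rank formula produces. The following is a minimal sketch of textbook RRF, assuming a constant `k=60` and 1-based ranks. The exact constant and rank offset used by the service are assumptions here, and `rrf_fuse` and the document ids are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank is its 1-based position in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical document ids, for illustration only.
text_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d"]
for doc_id, score in rrf_fuse([text_ranking, vector_ranking]):
    print(f"{doc_id}\t{score:.5f}")
```

Because RRF only looks at rank positions, a document that appears in both lists accumulates two contributions, while a document that tops only one list gets a single contribution. That may explain why the exact text match and the top vector hit tie at the same score above.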


#### Hybrid ranking is not always optimal

In [13]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=5, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="embedding")])
for doc in r:
    content = doc["content"].replace("\n", " ")[:150]
    print(f"Score: {doc['@search.score']:.5f}\t\tContent: {content}")

Score: 0.02361		Content:  Ask Northwind Health about any limitations or conditions that may be placed on the prior authorization. 5. Understand that certain services may not r
Score: 0.02205		Content: · Inpatient and respite care · Grief counseling and bereavement services In addition, Northwind Health Plus covers the costs of services that are rela
Score: 0.02020		Content: services and supplies, such as home dialysis machines, dialyzers, and other necessary supplies. In-network dialysis services are covered at 80% of the
Score: 0.01889		Content:  Members should be aware that some medical services may require pre- authorization before they can be approved, and that the clinical review process m
Score: 0.01667		Content: plans · Oversee all aspects of marketing, including digital marketing, public relations, advertising, and events · Monitor and analyze market trends, 


#### Hybrid + semantic reranking 🎉

In [14]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="embedding")],
    query_type="semantic",
    semantic_configuration_name="default")

for doc in r:
    content = doc["content"].replace("\n", " ")[:150]
    print(f"Score: {doc['@search.score']:.5f}\tReranker: {doc['@search.reranker_score']:.5f}\tContent: {content}")

Score: 0.01266	Reranker: 1.04293	Content:  Exceptions: In certain circumstances, Northwind Health Plus may use or disclose your PHI without your authorization. These exceptions include: • Disc
Score: 0.01471	Reranker: 0.92170	Content:  This includes any use or disclosure that is not for treatment, payment, or health care operations. Examples of non- health care uses and disclosures 
Score: 0.01389	Reranker: 0.82174	Content:  It is designed to help individuals with disabilities learn the skills they need to live as independently as possible, including communication, proble
Score: 0.01667	Reranker: 0.80048	Content: plans · Oversee all aspects of marketing, including digital marketing, public relations, advertising, and events · Monitor and analyze market trends, 
Score: 0.01515	Reranker: 0.64876	Content:  Some of the goals of habilitation services may include improving the individual's ability to communicate, learning how to use adaptive equipment, imp
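
The reranker scores above are low: Azure AI Search documents the semantic reranker score as ranging from 0 to 4, and the best result here scores about 1.04. One possible follow-up is to drop results below a minimum reranker score. The sketch below works on plain dicts shaped like the results above. `filter_by_reranker_score` and the threshold of 1.5 are hypothetical choices, not SDK features or recommended values.

```python
def filter_by_reranker_score(docs, min_score=1.5):
    """Keep only results whose reranker score is at least `min_score`.

    Results without a reranker score (e.g. non-semantic queries) are dropped.
    """
    kept = []
    for doc in docs:
        score = doc.get("@search.reranker_score")
        if score is not None and score >= min_score:
            kept.append(doc)
    return kept


# Stand-in results: two reranker scores from the output above plus one made-up
# higher score, so that the filter has something to keep.
results = [
    {"@search.reranker_score": 1.04293, "content": "result A"},
    {"@search.reranker_score": 0.92170, "content": "result B"},
    {"@search.reranker_score": 2.50000, "content": "made-up result C"},
]
for doc in filter_by_reranker_score(results):
    print(f"Reranker: {doc['@search.reranker_score']:.5f}\tContent: {doc['content']}")
```

With this threshold, every result from the "underwater activities" query would be filtered out, which signals that the index likely has no good answer for it.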
