### Setup

In [1]:
import os

import azure.identity
import dotenv
import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Load service names and keys from a local .env file into the environment
dotenv.load_dotenv()

# Initialize Azure OpenAI client
AZURE_OPENAI_SERVICE = os.getenv("AZURE_OPENAI_SERVICE")
AZURE_OPENAI_ADA_DEPLOYMENT = os.getenv("AZURE_OPENAI_ADA_DEPLOYMENT")
AZURE_OPENAI_SERVICE_KEY = os.getenv("AZURE_OPENAI_SERVICE_KEY")
AZURE_SEARCH_SERVICE_KEY = os.getenv("AZURE_SEARCH_SERVICE_KEY")

azure_search_credential = AzureKeyCredential(AZURE_SEARCH_SERVICE_KEY)

openai_client = openai.AzureOpenAI(
    api_version="2023-07-01-preview",
    api_key=AZURE_OPENAI_SERVICE_KEY,
    azure_endpoint=f"https://{AZURE_OPENAI_SERVICE}.openai.azure.com")

def get_embedding(text):
    """Return the embedding vector for `text` from the Azure OpenAI embedding deployment."""
    get_embeddings_response = openai_client.embeddings.create(model=AZURE_OPENAI_ADA_DEPLOYMENT, input=text)
    return get_embeddings_response.data[0].embedding

# Initialize Azure search client
AZURE_SEARCH_SERVICE = os.getenv("AZURE_SEARCH_SERVICE")
AZURE_SEARCH_ENDPOINT = f"https://{AZURE_SEARCH_SERVICE}.search.windows.net"

AZURE_SEARCH_FULL_INDEX = "vector-1715911082937"
search_client = SearchClient(AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_FULL_INDEX, credential=azure_search_credential)


#### Vector search is not always optimal

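For example, consider a search for an exact string such as a price. Embeddings capture meaning rather than literal characters, so the nearest neighbors may not contain the string at all.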

In [2]:
search_query = "$45.00"
search_vector = get_embedding(search_query)
r = search_client.search(None, top=3, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="vector")])

for doc in r:
    found_content = "Not found"
    if search_query in doc['chunk']:
        found_content = doc['chunk'][doc['chunk'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.81094	Matching text: Not found
Score: 0.80699	Matching text: Not found
Score: 0.80669	Matching text: Not found


Compare this to a text search for the same query:

In [3]:
r = search_client.search(search_query, top=3)
for doc in r:
    found_content = "Not found"
    if search_query in doc['chunk']:
        found_content = doc['chunk'][doc['chunk'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 2.31820	Matching text: $45.00 $55.00 Employee +1 $65.00 $71.00 Employee +2 or $78.00 $89.00 more


#### Hybrid retrieval

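Hybrid retrieval runs the text query and the vector query together and uses Reciprocal Rank Fusion (RRF) to combine the two result lists into a single ranking.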

In [4]:
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=15, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="vector")])

for doc in r:
    found_content = "Not found"
    if search_query in doc['chunk']:
        found_content = doc['chunk'][doc['chunk'].find(search_query):].replace("\n", " ")
    print(f"Score: {doc['@search.score']:.5f}\tMatching text: {found_content}")

Score: 0.03116	Matching text: $45.00 $55.00 Employee +1 $65.00 $71.00 Employee +2 or $78.00 $89.00 more
Score: 0.01667	Matching text: Not found
Score: 0.01639	Matching text: Not found
Score: 0.01613	Matching text: Not found
Score: 0.01587	Matching text: Not found
Score: 0.01562	Matching text: Not found
Score: 0.01538	Matching text: Not found
Score: 0.01515	Matching text: Not found
Score: 0.01493	Matching text: Not found
Score: 0.01471	Matching text: Not found
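
To make the fusion step concrete, here is a minimal sketch of RRF in plain Python. It is an illustration only, not Azure AI Search's internal implementation: the function name `reciprocal_rank_fusion`, the document ids, and the constant `k=60` with 1-based ranks are all assumptions. Each document receives `1 / (k + rank)` from every list it appears in, so a document found by both text and vector search accumulates two contributions. That is consistent with the top result above scoring roughly twice as high as the others.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list of (id, score)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # rank is 1-based: the top document of each list contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ids: "a" is the only text match and the last vector match
text_ranking = ["a"]
vector_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

In this toy example "a" ranks first even though it is last in the vector list, because it is the only document present in both lists.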


#### Hybrid ranking is not always optimal

In [5]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(search_query, top=5, vector_queries=[
    VectorizedQuery(vector=search_vector, k_nearest_neighbors=10, fields="vector")])
for doc in r:
    content = doc["chunk"].replace("\n", " ")[:250]
    print(f"Score: {doc['@search.score']:.5f}\t\tContent: {content}")

Score: 0.03089		Content: in a Human Resources role.   • Excellent written and verbal communication skills.   • Demonstrated ability to lead and manage change.   • Proficiency with Microsoft Office Suite.   • Knowledge of applicable labor laws and regulations.   • Ability to 
Score: 0.03080		Content: treatment of dental injuries,   such as root canals, crowns, fillings, extractions, and periodontal services. This coverage   also includes dental anesthesia used during a dental procedure, such as local anesthesia,   sedation, and general anesthesia
Score: 0.02871		Content: Yoga and Pilates classes   • Fitness equipment purchases   • Sports team fees   • Health retreats and spas   • Outdoor adventure activities (such as rock climbing, hiking, and kayaking)   • Group fitness classes (such as dance, martial arts, and cycl
Score: 0.02814		Content: product life-cycle management processes.   • Monitor and analyze industry trends to identify opportunities for new products.   • Develop product ma

#### Hybrid + semantic reranking 🎉

In [6]:
search_query = "learning about underwater activities"
search_vector = get_embedding(search_query)
r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        VectorizedQuery(vector=search_vector, k_nearest_neighbors=50, fields="vector")],
    query_type="semantic",
    semantic_configuration_name="vector-1715911082937-semantic-configuration")

for doc in r:
    content = doc["chunk"].replace("\n", " ")[:550]
    print(f"Score: {doc['@search.score']:.5f}\tReranker: {doc['@search.reranker_score']:.5f}\tContent: {content}")

Score: 0.02871	Reranker: 2.05587	Content: Yoga and Pilates classes   • Fitness equipment purchases   • Sports team fees   • Health retreats and spas   • Outdoor adventure activities (such as rock climbing, hiking, and kayaking)   • Group fitness classes (such as dance, martial arts, and cycling)   • Virtual fitness programs (such as online yoga and workout classes)   In addition to the wide range of fitness activities covered by PerksPlus, the program also covers a variety   of lessons and experiences that promote health and wellness. Some of the lessons covered under   PerksPlus inclu
Score: 0.02546	Reranker: 0.92402	Content: - Lead the design of Contoso’s overall marketing efforts   - Monitor competitive landscape, consumer trends, and industry best practices   - Manage and develop marketing team   - Ensure all marketing activities are in line with budget      Qualifications:   - Bachelor’s degree in marketing, communications, or related field   - 5+ years of experience in marketing,

#### Hybrid + semantic reranking + integrated vectorization 🎉

In [7]:
from azure.search.documents.models import VectorizableTextQuery

search_query = "learning about underwater activities"

r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        VectorizableTextQuery(text=search_query, k_nearest_neighbors=3, fields="vector", exhaustive=True)],
    query_type="semantic",
    semantic_configuration_name="vector-1715911082937-semantic-configuration")

for doc in r:
    content = doc["chunk"].replace("\n", " ")[:550]
    print(f"Score: {doc['@search.score']:.5f}\tReranker: {doc['@search.reranker_score']:.5f}\tContent: {content}")

Score: 0.02871	Reranker: 2.05587	Content: Yoga and Pilates classes   • Fitness equipment purchases   • Sports team fees   • Health retreats and spas   • Outdoor adventure activities (such as rock climbing, hiking, and kayaking)   • Group fitness classes (such as dance, martial arts, and cycling)   • Virtual fitness programs (such as online yoga and workout classes)   In addition to the wide range of fitness activities covered by PerksPlus, the program also covers a variety   of lessons and experiences that promote health and wellness. Some of the lessons covered under   PerksPlus inclu
Score: 0.01075	Reranker: 0.92402	Content: - Lead the design of Contoso’s overall marketing efforts   - Monitor competitive landscape, consumer trends, and industry best practices   - Manage and develop marketing team   - Ensure all marketing activities are in line with budget      Qualifications:   - Bachelor’s degree in marketing, communications, or related field   - 5+ years of experience in marketing,
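
The reranker score separates the relevant chunk from the unrelated one far more clearly than the fused score does. One way to use that, sketched below, is to drop results under a minimum reranker score before passing them on. The helper `filter_by_reranker_score` and the threshold `1.5` are hypothetical and would need tuning on real queries. The sample dictionaries are stand-ins that mirror the two results printed above, so the snippet runs without a search service.

```python
def filter_by_reranker_score(docs, min_score=1.5):
    """Keep only results whose semantic reranker score reaches min_score."""
    # Treat a missing or None reranker score as 0.0 so the result is dropped
    return [doc for doc in docs if (doc.get("@search.reranker_score") or 0.0) >= min_score]


# Stand-in dicts mirroring the two results printed above (content shortened)
results = [
    {"@search.reranker_score": 2.05587, "chunk": "Yoga and Pilates classes ..."},
    {"@search.reranker_score": 0.92402, "chunk": "- Lead the design of Contoso's overall marketing efforts ..."},
]
kept = filter_by_reranker_score(results)
for doc in kept:
    print(f"Reranker: {doc['@search.reranker_score']:.5f}\tContent: {doc['chunk']}")
```

With this threshold only the PerksPlus chunk remains. The same function can be applied to `list(r)` from the cells above.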