### 1. Overview
Advanced Search Techniques with Azure AI Search: Keyword, Vector, and Hybrid Methods

This notebook demonstrates how to perform different types of searches using Azure AI Search, including keyword search, vector search, hybrid search, semantic ranking, and query rewriting.

### 2. Set Up Environment Variables
As in Journey 1, create a `.env` file in the same directory as this notebook and set the variables.
You can use the `.env.sample` file to see which variables are needed.

After setting up, the notebook will automatically load these values using dotenv.
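
Before running the remaining cells, it can help to confirm that the expected variables are actually set. The following is a minimal sketch using only the standard library. The helper name `missing_env_vars` is hypothetical, and the variable list is an assumption based on the names this notebook reads (`AZURE_SEARCH_ADMIN_KEY` is left out because the notebook falls back to `DefaultAzureCredential` when it is absent).

```python
import os

# Variables this notebook reads unconditionally (assumed from the cells below).
REQUIRED_VARS = ["AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_INDEX_NAME"]

def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All required variables are set.")
```

If the check reports missing names after `load_dotenv` has run, the `.env` file is probably in a different directory or uses different variable names.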

### 3. Load Environment Variables

Run the following cell to load the environment variables from the `.env` file:

In [None]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv(override=True) # take environment variables from .env.

endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
index_name = os.environ["AZURE_SEARCH_INDEX_NAME"]
# Use the admin key if one is set; otherwise fall back to DefaultAzureCredential.
admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
credential = AzureKeyCredential(admin_key) if admin_key else DefaultAzureCredential()

This makes the endpoint, index name, and credential available before the API client is set up.

### 4. Set Up API Client and Define the Display Function

Initialize the Azure AI Search client used to query the index, and define a helper function that formats the search results so they are easier to read:

In [None]:
from azure.search.documents import SearchClient
import pandas as pd

search_client = SearchClient(endpoint, index_name, credential)

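def display_results(results):
    # Materialize the results and drop columns that are empty for every row.
    df = pd.json_normalize(list(results)).dropna(axis=1, how='all')
    if df.empty:
        # Without this guard, the column lookups below raise a KeyError.
        print("No results found.")
        return df

    # Truncate long chunks so the table stays readable.
    df["chunk"] = df["chunk"].apply(lambda c: c[:300] + '...' if len(c) > 300 else c)
    # Show the most important columns first, followed by any remaining ones.
    first_cols = ['title', 'chunk', '@search.score']
    df = df[first_cols + [col for col in df.columns if col not in first_cols]]

    # Return a styled table with wrapped, left-aligned text and no index column.
    return df.style.set_properties(**{
        'max-width': '500px',
        'text-align': 'left',
        'white-space': 'normal',
        'word-wrap': 'break-word'
    }).hide(axis="index")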


### 5. Perform Different Search Methods

#### Keyword Search

Execute a traditional keyword-based search:

In [None]:
results = search_client.search(search_text="What is Contoso", top=5, select=["title", "chunk"])

display_results(results)
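
To build intuition for how keyword relevance is scored, the sketch below implements a basic BM25 formula in plain Python. It is a toy illustration, not the service's implementation: the whitespace tokenizer, the sample documents, and the function name `bm25_scores` are assumptions made for this example, and `k1=1.2` and `b=0.75` are simply commonly used defaults.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with a basic BM25 formula."""
    if not docs:
        return []
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term frequency saturates and is normalized by document length.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical sample documents, not taken from the index.
docs = [
    "Contoso is a fictional company",
    "PerksPlus reimburses fitness expenses",
    "The Contoso handbook describes Contoso benefits",
]
scores = bm25_scores("What is Contoso", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The second document shares no terms with the query, so it scores zero. This is the typical weakness of keyword search when users phrase a question differently from the source text.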


#### Vector Search

Retrieve documents using vector similarity search:

In [None]:
from azure.search.documents.models import VectorizableTextQuery

results = search_client.search(vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")], top=5, select=["title", "chunk"])

display_results(results)
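
Conceptually, vector search ranks documents by how close their embeddings are to the query embedding. The sketch below shows a brute-force version of that idea with cosine similarity on toy three-dimensional vectors. The vectors, document ids, and function names are made up for illustration. The similarity metric actually used depends on how the index is configured, and the service may use an approximate nearest-neighbor algorithm rather than comparing against every vector.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors `a` and `b`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_vec, doc_vecs, k):
    """Return the k (doc_id, similarity) pairs most similar to `query_vec`."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings; real embeddings have hundreds or thousands of dimensions.
doc_vecs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
top = nearest_neighbors([1.0, 0.1, 0.0], doc_vecs, k=2)
print(top)
```

In the query above, `k_nearest_neighbors=50` plays the role of `k` here, while `top=5` limits how many of those candidates are returned.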

#### Hybrid Search (Keyword + Vector Search)

Combine keyword and vector searches for better accuracy:

In [None]:
results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"]
)

display_results(results)
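
Hybrid search has to merge two separately ranked lists into one. The Azure AI Search documentation describes Reciprocal Rank Fusion (RRF) for this step. The sketch below is a minimal, standalone version of RRF. The document ids are made up, the function name is hypothetical, and the constant `k=60` is an assumption taken from common usage rather than a value confirmed by this notebook.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one list of (doc_id, score) pairs."""
    fused = {}
    for ranking in rankings:
        # Ranks start at 1; each list contributes 1 / (k + rank) per document.
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rankings from a keyword query and a vector query.
keyword_ranking = ["doc-a", "doc-b", "doc-c"]
vector_ranking = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear near the top of both lists, such as `doc-b` here, end up ahead of documents that rank well in only one list.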

#### Hybrid Search + Semantic Ranker

Enhance search results using a semantic ranker:

In [None]:
# The semantic configuration name is assumed to be the index name + "-semantic-configuration".
# If you run into an error, verify the name of your semantic configuration.
semantic_configuration_name = index_name + "-semantic-configuration"

results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"],
    query_type="semantic",
    semantic_configuration_name=semantic_configuration_name
)

display_results(results)

#### Hybrid Search + Semantic Ranker + Query Rewriting

Use semantic ranking and query rewriting for improved relevance.

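**Note**: Query rewriting is currently in public preview and is only available for search services on the Basic tier or higher in **North Europe** or **Southeast Asia**. The cell below is therefore commented out; uncomment it if your service supports the feature.
For more information, see the [query rewriting documentation](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-rewrite).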

In [None]:
# results = search_client.search(
#     search_text="What is Contoso",
#     vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
#     top=5,
#     select=["title", "chunk"],
#     query_type="semantic",
#     semantic_configuration_name=semantic_configuration_name,
#     query_rewrites="generative",
#     query_language="en"
# )

# display_results(results)

### 6. Challenge
Let's look at the data in our search index and think about how users might phrase their questions, and which search query type would retrieve the relevant chunks best.

1. Review the content of PerksPlus.pdf.
2. Formulate two questions that users might ask about this content.
3. Make an assumption about which search method will perform better (focus on keyword search vs. vector search).
4. Test the assumption by executing both searches and comparing the retrieved results.
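
For step 4, a small helper can make the comparison more concrete than eyeballing two tables. The sketch below is one possible approach: `compare_results` is a hypothetical helper, and it assumes each result is a dictionary with a `chunk` field, as in the queries above. The object returned by `search_client.search` may only be iterable once, so it is safest to convert it with `list(...)` before passing it to both this helper and `display_results`.

```python
def compare_results(results_a, results_b, key="chunk"):
    """Summarize which values of `key` appear in both result lists or in only one."""
    values_a = [doc[key] for doc in results_a]
    values_b = [doc[key] for doc in results_b]
    set_a, set_b = set(values_a), set(values_b)
    return {
        "shared": [v for v in values_a if v in set_b],
        "only_first": [v for v in values_a if v not in set_b],
        "only_second": [v for v in values_b if v not in set_a],
    }

# Hypothetical results standing in for list(search_client.search(...)).
sample_keyword = [{"title": "a.pdf", "chunk": "chunk 1"}, {"title": "a.pdf", "chunk": "chunk 2"}]
sample_vector = [{"title": "a.pdf", "chunk": "chunk 2"}, {"title": "b.pdf", "chunk": "chunk 3"}]

summary = compare_results(sample_keyword, sample_vector)
print(summary)
```

A large `shared` list suggests both methods agree for that question, while long `only_first` or `only_second` lists point to questions where the choice of method matters.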



In [None]:
# First question
question = "..."
# TODO: your code goes here (define results_keyword and results_vector)

print("Keyword search results")
display_results(results_keyword)

print("Vector search results")
display_results(results_vector)


In [None]:
# Second question
question = "..."
# TODO: your code goes here (define results_keyword and results_vector)

print("Keyword search results")
display_results(results_keyword)

print("Vector search results")
display_results(results_vector)

## Troubleshooting

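- **Environment Variables Not Loaded:** Ensure the `.env` file is in the same directory as this notebook and contains the required variables, or export them manually in your terminal before starting the notebook.
- **Authentication Issues:** If no admin key is set, the notebook falls back to `DefaultAzureCredential`. In that case (for example, when using Managed Identity), make sure your Azure identity has the proper role assignments.
- **Search Results Are Empty:** Ensure your Azure AI Search index contains vectorized data and that `AZURE_SEARCH_INDEX_NAME` points to that index.
- **Query Rewriting Issues:** Ensure your search service has a semantic configuration and supports generative query rewrites (see the tier and region note in the query rewriting section).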

## Summary

This notebook demonstrates different search techniques using Azure AI Search, including keyword search, vector search, hybrid search, semantic ranking, and query rewriting. The approach enhances search accuracy by leveraging vector embeddings and semantic understanding to retrieve the most relevant documents.

