# Advanced Search Techniques with Azure AI Search: Keyword, Vector, and Hybrid Methods

This notebook demonstrates how to perform different types of searches using Azure AI Search, including keyword search, vector search, hybrid search, semantic ranking, and query rewriting.

## Prerequisites

Before running the notebook, ensure you have the following: 

- [Fork](https://github.com/microsoft/rag-time/fork) the repository, then clone your fork to your local machine, replacing `your-org` with your GitHub username or organization:

    ```bash
    git clone https://github.com/your-org/rag-time.git
    cd rag-time
    ```

- An [Azure account](https://portal.azure.com) with proper permissions to access the following services:
    - An **Azure AI Search** service with an index that contains vectorized text data. Follow the instructions in the [Quickstart](https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors?tabs=sample-data-storage%2Cmodel-aoai%2Cconnect-data-storage) to index the [MSFT_cloud_architecture_contoso.pdf](MSFT_cloud_architecture_contoso.pdf) file. 
- Python 3.8 or later, available from [python.org](https://python.org).

## Steps to Use the Notebook

### 1. Install Required Libraries

Run the first code cell to install the required Python libraries:

In [None]:
%pip install azure-search-documents==11.6.0b9 azure-identity python-dotenv pandas jinja2 --quiet

### 2. Set Up Environment Variables

To keep credentials out of the notebook itself, rename the `.env.sample` file to `.env` in the same directory as the notebook and set the following variables:

In [None]:
AZURE_SEARCH_SERVICE_ENDPOINT="<your_search_service_endpoint>"
AZURE_SEARCH_INDEX="<your_search_index_name>"
AZURE_SEARCH_ADMIN_KEY="<your_search_admin_key>"  # Leave blank if using Managed Identity

The notebook loads these values with `python-dotenv` in the next step.

### 3. Load Environment Variables

Run the following cell to load the environment variables from the `.env` file:

In [None]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

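# Load variables from .env; override=True lets .env values replace any already set in the environment.
load_dotenv(override=True)

endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
index_name = os.environ["AZURE_SEARCH_INDEX"]

# Use key-based auth when an admin key is provided; otherwise fall back to DefaultAzureCredential.
admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
credential = AzureKeyCredential(admin_key) if admin_key else DefaultAzureCredential()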

This cell raises a `KeyError` if the endpoint or index name is missing, so configuration problems surface before the API client is created.
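
To report every missing variable at once rather than stopping at the first failed lookup, a small check can run before the client is created. The following is a minimal sketch, not part of the original notebook: the helper name `find_missing_settings` is hypothetical, and it treats the admin key as optional on the assumption that `DefaultAzureCredential` is the intended fallback.

```python
import os

# Variables the notebook reads unconditionally. The admin key is left out
# because the notebook falls back to DefaultAzureCredential when it is blank.
REQUIRED_SETTINGS = ("AZURE_SEARCH_SERVICE_ENDPOINT", "AZURE_SEARCH_INDEX")


def find_missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or blank in `env` (hypothetical helper)."""
    return [name for name in required if not (env.get(name) or "").strip()]


missing = find_missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing the mapping in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.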

### 4. Set Up API Client and Define the Display Function

Create a `SearchClient` for the index, then define a helper that formats search results as a styled table so they are easier to read:

In [None]:
from azure.search.documents import SearchClient
import pandas as pd

search_client = SearchClient(endpoint, index_name, credential)

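def display_results(results):
    """Return search results as a styled table with long chunks truncated."""
    # Flatten each result dict into a row and drop columns that are entirely empty.
    df = pd.json_normalize(list(results)).dropna(axis=1, how='all')
    if df.empty:
        # No matches: return the empty frame instead of failing on missing columns.
        return df

    # Truncate long chunks so the table stays readable.
    df["chunk"] = df["chunk"].apply(lambda c: c[:300] + '...' if len(c) > 300 else c)

    # Show title, chunk, and score first, followed by any remaining columns.
    first_cols = ['title', 'chunk', '@search.score']
    df = df[first_cols + [col for col in df.columns if col not in first_cols]]

    # Wrap long text and hide the row index in the rendered table.
    return df.style.set_properties(**{
        'max-width': '500px',
        'text-align': 'left',
        'white-space': 'normal',
        'word-wrap': 'break-word'
    }).hide(axis="index")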


### 5. Perform Different Search Methods

#### Keyword Search

Execute a traditional keyword-based search:

In [None]:
results = search_client.search(search_text="What is Contoso", top=5, select=["title", "chunk"])

display_results(results)


#### Vector Search

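Retrieve documents by vector similarity. `VectorizableTextQuery` sends the query text to the service, which embeds it with the vectorizer configured on the index and compares the result against the `text_vector` field. Here the 50 nearest neighbors are considered and the top 5 are returned: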

In [None]:
from azure.search.documents.models import VectorizableTextQuery

results = search_client.search(
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"]
)

display_results(results)
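
To make "vector similarity" concrete, the sketch below ranks a few documents by cosine similarity to a query vector. It is an illustration under stated assumptions, not what the service runs: the metric an index actually uses depends on its vector configuration, the service typically performs an approximate nearest-neighbor search rather than an exhaustive scan, and the three-dimensional vectors and document names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real embeddings are far larger.
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}

# Rank document names from most to least similar to the query.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)
```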

#### Hybrid Search (Keyword + Vector Search)

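Run the keyword and vector queries in a single request. The service merges both result sets into one ranking, so documents that match on exact terms and documents that match on meaning can both surface: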

In [None]:
results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"]
)

display_results(results)
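
Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses to merge the keyword and vector rankings of a hybrid query. The sketch below is a plain-Python illustration of RRF, not the service's implementation: the document ids are hypothetical, and the constant `k=60` is a commonly cited value assumed here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list of (id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every document it contains.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from the keyword and vector halves of a hybrid query.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print([doc_id for doc_id, _ in fused])
```

Because RRF works on ranks rather than raw scores, it can combine keyword and vector results even though their scores are on different scales.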

#### Hybrid Search + Semantic Ranker

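Rerank the hybrid results with the semantic ranker. Replace `ragtime2-semantic-configuration` with the name of the semantic configuration defined on your index: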

In [None]:
results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"],
    query_type="semantic",
    semantic_configuration_name="ragtime2-semantic-configuration"
)

display_results(results)
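
Semantic queries add an `@search.reranker_score` value to each result alongside `@search.score`. One possible use, sketched below, is to drop results that fall under a relevance threshold before showing them. The helper name `filter_by_reranker_score`, the threshold of `2.0`, and the sample rows are all hypothetical; a suitable cutoff depends on your data.

```python
def filter_by_reranker_score(results, min_score):
    """Keep results whose semantic reranker score is at least `min_score` (hypothetical helper)."""
    kept = []
    for result in results:
        score = result.get("@search.reranker_score")
        # Results without a reranker score (for example, from non-semantic queries) are skipped.
        if score is not None and score >= min_score:
            kept.append(result)
    return kept


# Hypothetical rows shaped like the dictionaries a semantic query returns.
sample_results = [
    {"title": "a.pdf", "@search.reranker_score": 3.1},
    {"title": "b.pdf", "@search.reranker_score": 1.2},
    {"title": "c.pdf", "@search.reranker_score": None},
]

relevant = filter_by_reranker_score(sample_results, min_score=2.0)
print([r["title"] for r in relevant])
```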

#### Hybrid Search + Semantic Ranker + Query Rewriting

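Add generative query rewriting, which has the service generate alternative phrasings of the query before retrieval. The combined results are then reranked by the semantic ranker: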

In [None]:
results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"],
    query_type="semantic",
    semantic_configuration_name="ragtime2-semantic-configuration",
    query_rewrites="generative",
    query_language="en"
)

display_results(results)

## Troubleshooting

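- **Environment Variables Not Loaded:** Confirm that the `.env` file is in the same directory as the notebook and defines the endpoint and index name, or export the variables in your terminal before starting the notebook.
- **Authentication Issues:** If using Managed Identity, make sure role-based access is enabled on the search service and that your identity has a role that allows querying, such as Search Index Data Reader.
- **Search Results Are Empty:** Ensure your Azure AI Search index contains vectorized data and that `AZURE_SEARCH_INDEX` points to it. The queries also assume the index has `title`, `chunk`, and `text_vector` fields.
- **Query Rewriting Issues:** Ensure the semantic ranker is enabled on your search service and that the semantic configuration name in the query matches the one on your index. Generative query rewriting may not be available on every service or in every region.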

## Summary

This notebook demonstrated five query patterns in Azure AI Search: keyword search, vector search, hybrid search, hybrid search with semantic ranking, and hybrid search with semantic ranking and query rewriting. Each pattern builds on the previous one, combining exact term matching, vector embeddings, and semantic reranking to bring the most relevant documents to the top.

