# Azure AI Search Index Creation

This notebook creates an Azure AI Search index for a multilingual car troubleshooting system. The index supports both traditional keyword search and vector-based semantic search for better query understanding across languages.

## Overview
- **Purpose**: Create a search index to store and query car troubleshooting information
- **Features**: Multilingual support, vector search, faceted navigation
- **Use Case**: Enable users to search for car problems and solutions in their native language

## Step 1: Import Required Libraries

Import the necessary Python libraries for:
- Environment variable management (dotenv)
- Azure AI Search client and authentication
- Index schema definitions and vector search configuration

In [1]:
from dotenv import load_dotenv
from azure.search.documents.indexes.aio import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes.models import (
    SearchField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SearchIndex,
    SearchFieldDataType
)
import os

## Step 2: Load Azure AI Search Configuration

Load the Azure AI Search endpoint and API key from environment variables. These credentials are required to authenticate and connect to your Azure AI Search service.

**Required Environment Variables:**
- `SEARCH_ENDPOINT`: Your Azure AI Search service URL (e.g., https://your-service.search.windows.net)
- `SEARCH_API_KEY`: Admin API key for full access to create/modify indexes

In [2]:
load_dotenv(override=True)

search_endpoint = os.getenv('SEARCH_ENDPOINT')
search_api_key = os.getenv('SEARCH_API_KEY')
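
`os.getenv` returns `None` when a variable is unset, and that would only surface later as a connection or authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper name and is not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Possible usage in this cell (assumes load_dotenv() has already run):
# search_endpoint = require_env("SEARCH_ENDPOINT")
# search_api_key = require_env("SEARCH_API_KEY")
```

Raising here keeps the error close to its cause instead of deferring it to the first service call.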

## Step 3: Create the Search Index

This cell performs the following operations:

### 3.1 Delete Existing Index (if present)
Ensures a clean slate by removing any previously created index with the same name.

### 3.2 Define Index Schema
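Creates fields for the following. The code cell currently defines only `id`, `brand`, `model`, `fault`, `fix`, and `vector`; the `original_language` and `*_en` definitions are present but commented out.
- **id**: Unique document identifier (key field)
- **original_language**: Language code of the source content (commented out)
- **brand / brand_en**: Car manufacturer name (original and English); `brand_en` is commented out
- **model / model_en**: Car model (original and English), filterable and facetable; `model_en` is commented out
- **fault / fault_en**: Problem description (original and English); `fault_en` is commented out
- **fix**: Solution description (searchable)
- **vector**: 1536-dimensional embedding vector for semantic search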
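
To make the schema concrete, here is a sketch of a document whose keys match the field names that the code cell actually defines (`id`, `brand`, `model`, `fault`, `fix`, `vector`). `make_document` and `INDEX_FIELDS` are hypothetical names, and all values are placeholders rather than real data.

```python
from typing import Any

# Field names defined in the index creation cell (commented-out fields excluded).
INDEX_FIELDS = {"id", "brand", "model", "fault", "fix", "vector"}


def make_document(doc_id: str, brand: str, model: str, fault: str,
                  fix: str, vector: list[float]) -> dict[str, Any]:
    """Build a dictionary whose keys match the index field names."""
    return {
        "id": doc_id,
        "brand": brand,
        "model": model,
        "fault": fault,
        "fix": fix,
        "vector": vector,
    }


# Illustrative values only; the vector is a zero placeholder, not a real embedding.
doc = make_document("1", "ExampleBrand", "ExampleModel",
                    "Engine does not start", "Check the battery", [0.0] * 1536)
assert set(doc) == INDEX_FIELDS
```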

### 3.3 Configure Vector Search
Uses the HNSW (Hierarchical Navigable Small World) algorithm for:
- Fast approximate nearest neighbor search
- Efficient semantic similarity matching
- Scalable vector search across large datasets
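
HNSW is approximate: it searches a graph of vectors instead of comparing the query with every stored vector. For intuition, the exact search it approximates can be sketched in plain Python. This is a brute-force illustration, not how the service is implemented. It assumes cosine similarity as the metric (the notebook does not set one explicitly), and `cosine_similarity` and `nearest` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], vectors: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query (exact search)."""
    ranked = sorted(
        vectors,
        key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional vectors, for illustration only.
stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(nearest([0.9, 0.1], stored, k=2))  # prints ['a', 'c']
```

Exact search costs one comparison per stored vector, which is the cost an approximate index avoids on large datasets.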

### 3.4 Create Index
Deploys the index configuration to the Azure AI Search service.

**Note**: The vector field's dimension count must equal the embedding model's output size. 1536 matches OpenAI's text-embedding-ada-002 model.
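
Because a vector whose length differs from the field's `vector_search_dimensions` value should be rejected by the service, it can help to check lengths locally before uploading. A minimal sketch follows; `check_dimensions` is a hypothetical helper, and the default of 1536 is an assumption taken from the note above that should be replaced with whatever value the index actually uses.

```python
def check_dimensions(vector: list[float], expected: int = 1536) -> None:
    """Raise ValueError if the vector length differs from the expected dimension count."""
    if len(vector) != expected:
        raise ValueError(f"Expected {expected} dimensions, got {len(vector)}")


# A placeholder vector of the right length passes silently.
check_dimensions([0.0] * 1536)

# A vector of the wrong length is reported before any upload is attempted.
try:
    check_dimensions([0.0] * 1024)
except ValueError as exc:
    print(exc)  # prints Expected 1536 dimensions, got 1024
```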

In [5]:
index_name = "cartroubleshooting"

# Initialize the search index client
index_client = SearchIndexClient(endpoint=search_endpoint, credential=AzureKeyCredential(search_api_key))

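# Delete existing index if it exists to start fresh.
# Catch only "not found" so that authentication or network errors are not
# silently reported as a missing index.
from azure.core.exceptions import ResourceNotFoundError

try:
    await index_client.get_index(index_name)
    await index_client.delete_index(index_name)
    print(f"Deleted existing index '{index_name}'")
except ResourceNotFoundError:
    print("No Index found")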

# Define the index schema with fields for multilingual content
# Fields ending in "_en" contain English translations
fields = [
    SearchField(name="id", type=SearchFieldDataType.String, key=True),
    #SearchField(name="original_language", type=SearchFieldDataType.String, searchable=False, sortable=False, facetable=False, filterable=False),
    #SearchField(name="brand_en", type=SearchFieldDataType.String, searchable=True, sortable=False, facetable=False, filterable=False),
    SearchField(name="brand", type=SearchFieldDataType.String, searchable=True, sortable=False, facetable=False, filterable=False),
    #SearchField(name="model_en", type=SearchFieldDataType.String, searchable=True, sortable=False, facetable=True, filterable=True),
    SearchField(name="model", type=SearchFieldDataType.String, searchable=True, sortable=False, facetable=True, filterable=True),
    #SearchField(name="fault_en", type=SearchFieldDataType.String, searchable=True, sortable=False, facetable=False, filterable=False),
    SearchField(name="fault", type=SearchFieldDataType.String, searchable=True, sortable=False, facetable=False, filterable=False),
    # Vector field for semantic search using embeddings (1536 dimensions for OpenAI text-embedding-ada-002).
    # vector_search_dimensions must equal the embedding model's output size.
    SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1536, vector_search_profile_name="vector-profile-1", searchable=True, sortable=False, facetable=False, filterable=False),
    SearchField(name="fix", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False)    
]    

# Configure vector search using HNSW (Hierarchical Navigable Small World) algorithm
# This enables efficient approximate nearest neighbor search for semantic similarity
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="vector-profile-1",  
            algorithm_configuration_name="myHnsw"
        )
    ]
)

# Create the search index with the defined schema and vector search configuration
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
result = await index_client.create_or_update_index(index)
print(f"{result.name} created")

# Clean up: close the index client connection
await index_client.close()

cartroubleshooting created


## Next Steps

After running this notebook, you can:
1. **Populate the index** with car troubleshooting documents
2. **Perform searches** using keywords or semantic queries
3. **Filter results** by car model or other facetable fields
4. **Test multilingual search** capabilities
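
For the filtering step, `model` is the only field marked filterable in the schema. A sketch of building an OData filter expression for it follows; the resulting string is the kind of value that could be passed as the `filter` argument of `SearchClient.search`. `model_filter` is a hypothetical helper, and the model names are only examples.

```python
def model_filter(model: str) -> str:
    """Build an OData filter that matches one exact value of the `model` field."""
    escaped = model.replace("'", "''")  # OData escapes a single quote by doubling it
    return f"model eq '{escaped}'"


print(model_filter("Golf"))   # prints model eq 'Golf'
print(model_filter("Cee'd"))  # prints model eq 'Cee''d'
```

Escaping the quote keeps a value such as `Cee'd` from terminating the string literal early and producing an invalid filter.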

The index is now ready to receive data and handle search queries!