# 🚗 Azure AI Search Index Creation for Multilingual Car Troubleshooting

Welcome! This notebook will guide you through creating a powerful search index that helps users find car troubleshooting solutions in multiple languages. 

## 🎯 What We'll Build
This notebook creates Azure AI Search indexes designed specifically for a multilingual car troubleshooting system. Each index combines traditional keyword search with vector-based semantic search, so queries can be matched by meaning as well as by exact words across different languages.

## ✨ Key Features
- 🌍 **Multilingual Support**: Search in your native language
- 🔍 **Vector Search**: Semantic understanding using AI embeddings
- 🏷️ **Faceted Navigation**: Filter and facet by car model (brand is searchable, but not filterable in these schemas)
- ⚡ **Fast Performance**: HNSW algorithm for efficient vector search

## 📋 What You'll Learn
By the end of this notebook, you'll have created three different index configurations to test various multilingual search strategies!

---

## 📚 Prerequisites

Before running this notebook, make sure you have:
- ✅ An Azure AI Search service created
- ✅ Environment variables configured (`.env` file)
- ✅ Python packages installed: `azure-search-documents`, `python-dotenv`, and `aiohttp` (the async client imported in Step 1 needs an async HTTP transport)
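
If you want to confirm the package prerequisite before going further, a quick check along these lines may help. This is only a sketch: `missing_modules` and `REQUIRED_MODULES` are illustrative names, and listing `aiohttp` assumes it is the async transport used in your environment.

```python
import importlib.util

# Import names (not pip names) this notebook relies on.
# `aiohttp` is an assumption: it is the usual async transport for the Azure SDK.
REQUIRED_MODULES = ["azure.search.documents", "dotenv", "aiohttp"]


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (for example `azure`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

Anything reported as missing can usually be installed with `pip` before continuing.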

Let's get started! 🚀

## Step 1: Import Required Libraries 📦

First, we'll import all the necessary Python libraries. These tools will help us:
- 🔐 Manage environment variables securely (dotenv)
- 🔗 Connect to Azure AI Search (authentication)
- 📊 Define the index structure (schema definitions)
- 🎯 Configure vector search capabilities

In [2]:
from dotenv import load_dotenv
from azure.search.documents.indexes.aio import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes.models import (
    SearchField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SearchIndex,    
    SearchFieldDataType
)
import os

✅ **Libraries loaded!** We're ready to connect to Azure.

## Step 2: Load Azure AI Search Configuration 🔑

Now we'll load your Azure AI Search credentials from environment variables. This keeps your sensitive information secure and out of the code.

**Required Environment Variables:**
- `SEARCH_ENDPOINT`: Your Azure AI Search service URL 
  - Example: `https://your-service.search.windows.net`
- `SEARCH_API_KEY`: Admin API key for full access to create/modify indexes

💡 **Tip**: Make sure your `.env` file is in the project root directory!

In [3]:
load_dotenv(override=True)

search_endpoint = os.getenv('SEARCH_ENDPOINT')
search_api_key = os.getenv('SEARCH_API_KEY')

🔌 **Configuration loaded!** The endpoint and API key are now in memory. The actual connection to your Azure AI Search service happens later, when we create the index client.
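
Because `os.getenv` silently returns `None` for a missing variable, it can be worth checking the two settings before moving on. The sketch below is one possible check, not part of the original notebook: `find_config_problems` is a hypothetical helper, and the `https://` rule is an assumption based on the example endpoint shown above.

```python
import os

REQUIRED_SETTINGS = ("SEARCH_ENDPOINT", "SEARCH_API_KEY")


def find_config_problems(env):
    """Return a list of human-readable problems found in the given mapping."""
    problems = []
    for name in REQUIRED_SETTINGS:
        if not env.get(name):
            problems.append(f"{name} is missing or empty")
    endpoint = env.get("SEARCH_ENDPOINT") or ""
    # Assumption: service endpoints use HTTPS, as in the example URL.
    if endpoint and not endpoint.startswith("https://"):
        problems.append("SEARCH_ENDPOINT should start with https://")
    return problems


for problem in find_config_problems(os.environ):
    print("Configuration problem:", problem)
```

If nothing is printed, both variables are set and the endpoint looks plausible.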

## Step 3: Define Index Schemas 🏗️

This is where the magic happens! We'll define **three different index configurations** to test various multilingual search approaches:

### 🌐 Index 1: "multilanguage"
- **Strategy**: Vectorize content in its original language
- **Vector Size**: 1024 dimensions (Cohere embeddings)
- **Use Case**: Test native language embedding performance

### 🔄 Index 2: "translated" 
- **Strategy**: Translate everything to English before vectorizing
- **Vector Size**: 1536 dimensions (OpenAI embeddings)
- **Use Case**: Leverage English-trained models for all languages

### 🎭 Index 3: "translated_dual"
- **Strategy**: Store both original language AND English vectors
- **Vector Sizes**: 1024 (original) + 1536 (English) dimensions
- **Use Case**: Best of both worlds - hybrid search capability

### 📋 Common Fields
All indexes include:
- **id** 🔑: Unique identifier
- **original_language** 🌍: Source language code
- **brand** 🚗: Car manufacturer
- **model** 🏷️: Car model (filterable)
- **fault** ⚠️: Problem description
- **fix** 🔧: Solution instructions

In [12]:
indexes = [
    {
        # This index vectorizes content in its original language using Cohere.
        # This affects search, because the query embedding is also computed
        # in the user's current language.
        'name': 'multilanguage',
        'fields': [
                SearchField(name="id", type=SearchFieldDataType.String,key=True),   
                SearchField(name="original_language", type=SearchFieldDataType.String, searchable=False,sortable=False, facetable=False, filterable=False),                
                SearchField(name="brand", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False),                      
                SearchField(name="model", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=True, filterable=True),                  
                SearchField(name="fault", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False),                
                SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1024, vector_search_profile_name="vector-profile-1",searchable=True,sortable=False, facetable=False, filterable=False),
                SearchField(name="fix", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False)    
        ]    
    },
    {
        # This index vectorizes content in English, translating it first when the language is not English
        'name': 'translated',
        'fields': [
                SearchField(name="id", type=SearchFieldDataType.String,key=True),   
                SearchField(name="original_language", type=SearchFieldDataType.String, searchable=False,sortable=False, facetable=False, filterable=False),                
                SearchField(name="brand", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False),                      
                SearchField(name="model", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=True, filterable=True),                  
                SearchField(name="fault", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False),                
                SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1536, vector_search_profile_name="vector-profile-1",searchable=True,sortable=False, facetable=False, filterable=False),
                SearchField(name="fix", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False)    
        ]            
    },
    {
        # This index stores two vectors: one for the English translation and one for the original language
        'name': 'translated_dual',
        'fields': [
                SearchField(name="id", type=SearchFieldDataType.String,key=True),   
                SearchField(name="original_language", type=SearchFieldDataType.String, searchable=False,sortable=False, facetable=False, filterable=False),                
                SearchField(name="brand", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False),                      
                SearchField(name="model", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=True, filterable=True),                  
                SearchField(name="fault", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False),                
                SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1024, vector_search_profile_name="vector-profile-1",searchable=True,sortable=False, facetable=False, filterable=False),
                SearchField(name="vector_english", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1536, vector_search_profile_name="vector-profile-1",searchable=True,sortable=False, facetable=False, filterable=False),
                SearchField(name="fix", type=SearchFieldDataType.String, searchable=True,sortable=False, facetable=False, filterable=False)    
        ]            
    }    
]

📐 **Index schemas defined!** Three different strategies ready to deploy.

---

## 🚀 Creating the Indexes

Now we'll create all three indexes in your Azure AI Search service. This process will:

1. 🗑️ **Clean up**: Delete any existing indexes with the same names
2. ⚙️ **Configure**: Set up HNSW vector search algorithm
3. 📤 **Deploy**: Push the index definitions to Azure
4. ✅ **Verify**: Confirm successful creation

The HNSW (Hierarchical Navigable Small World) algorithm provides:
- ⚡ Fast approximate nearest neighbor search
- 🎯 Efficient semantic similarity matching
- 📈 Scalability for large datasets
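
To see what "approximate nearest neighbor search" is approximating, the sketch below runs an exact, brute-force search over a few toy vectors. This is deliberately not HNSW: it scores every vector, which is what HNSW avoids by navigating a graph. Cosine similarity is used here as an assumed metric, and the three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k=2):
    """Exact search: score every stored vector, then keep the k best keys."""
    scored = [(cosine_similarity(query, v), key) for key, v in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Toy "embeddings"; real ones have 1024 or 1536 dimensions.
toy_vectors = {
    "battery": [0.9, 0.1, 0.0],
    "starter": [0.8, 0.3, 0.1],
    "tires": [0.0, 0.2, 0.9],
}

print(nearest_neighbors([1.0, 0.2, 0.0], toy_vectors, k=2))
```

The cost of this exact approach grows linearly with the number of documents. HNSW trades a small amount of recall for much faster lookups on large indexes.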

Let's create them! 🎬

In [13]:
# Initialize the search index client
index_client = SearchIndexClient(endpoint=search_endpoint,credential=AzureKeyCredential(search_api_key))

# Configure vector search using HNSW (Hierarchical Navigable Small World) algorithm
# This enables efficient approximate nearest neighbor search for semantic similarity
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="vector-profile-1",  
            algorithm_configuration_name="myHnsw"
        )
    ]
)

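from azure.core.exceptions import ResourceNotFoundError

for index in indexes:
    # Delete the existing index, if any, to start fresh
    try:
        await index_client.get_index(index['name'])
        await index_client.delete_index(index['name'])
    except ResourceNotFoundError:
        # Only a missing index is expected here; other errors
        # (invalid key, network failure) should surface instead of being hidden
        print("No Index found")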

    # Create the search index with the defined schema and vector search configuration
    index_definition = SearchIndex(name=index['name'], fields=index['fields'], vector_search=vector_search)
    result = await index_client.create_or_update_index(index_definition)
    print(f"{result.name} created")

# Clean up: close the index client connection
await index_client.close()

multilanguage created
translated created
No Index found
translated created
No Index found
translated_dual created
translated_dual created


🎉 **Success!** All three indexes have been created and are ready to use!
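
For the "Verify" step, one option is to list the index names on the service and compare them with the three expected names. The sketch below assumes the client exposes `list_index_names()` as an async iterable, as the async `SearchIndexClient` does. `missing_indexes` and `verify_indexes` are hypothetical helper names.

```python
EXPECTED_INDEXES = ["multilanguage", "translated", "translated_dual"]


def missing_indexes(expected, existing):
    """Return the expected index names that are absent from `existing`, in order."""
    existing_set = set(existing)
    return [name for name in expected if name not in existing_set]


async def verify_indexes(index_client, expected=EXPECTED_INDEXES):
    """Collect the service's index names and report which expected ones are missing.

    Assumes `index_client.list_index_names()` returns an async iterable of names.
    """
    existing = [name async for name in index_client.list_index_names()]
    return missing_indexes(expected, existing)
```

In the notebook this could be called as `await verify_indexes(index_client)`, using a new `SearchIndexClient` since the one above has already been closed. An empty list means all three indexes exist.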

---