# Getting Started: Existing Azure AI Search Index Knowledge Source

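This notebook shows how to create a knowledge source from an existing Azure AI Search index. Use this approach when your content is already indexed and you want a knowledge base to retrieve from that index directly. To keep the example self-contained, the notebook first creates and populates a small sample index (Steps 2 and 3). If you already have an index, skip those steps, point `SEARCH_INDEX_NAME` at your index, and adjust the field names and semantic configuration in Step 4 to match it.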

## What You'll Learn

- Create an Azure AI Search index with semantic configuration
- Upload sample documents to the index
- Create a knowledge source from an existing search index
- Configure search fields and source data fields
- Query the existing index through the knowledge base
- Apply filters for targeted retrieval

## Prerequisites

- Azure subscription
- Existing Azure AI Foundry project (see notebook 01)
- Existing Azure AI Search service (see notebook 01)
- Python environment with the `requests` library

## Architecture Overview

```
Existing Azure AI Search Index → Knowledge Source → Knowledge Base → Retrieval API
                    ↓
          Semantic Search + Reranking
```

**Note:** This approach does NOT ingest or re-index documents. It queries your existing index directly.

## Step 1: Configure Resources

In [None]:
import requests
import json

# Existing resources (from notebook 01 or your own)
EXISTING_SEARCH_ENDPOINT = "https://<your-search-service>.search.windows.net"
EXISTING_SEARCH_API_KEY = "<your-search-api-key>"
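# Used as the knowledge base model's resourceUri in Step 5. This typically needs to be the
# Azure OpenAI endpoint of the resource hosting the chat deployment (e.g. https://<your-resource>.openai.azure.com)
EXISTING_FOUNDRY_ENDPOINT = "https://<your-foundry-project>.services.ai.azure.com/api/projects/<project-name>"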
EXISTING_AZURE_OPENAI_KEY = "<your-api-key>"
EXISTING_CHAT_DEPLOYMENT = "gpt-4o-mini"

# API version
API_VERSION = "2025-11-01-preview"

# Index configuration
SEARCH_INDEX_NAME = "products-index"
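
Every cell below builds a URL with the same `{endpoint}/{collection}/{name}?api-version=...` pattern. The sketch below shows one way to keep those URLs consistent. `search_url` is a hypothetical helper, not part of any Azure SDK. It only formats strings and makes no network calls.

```python
from urllib.parse import quote


def search_url(endpoint: str, *segments: str, api_version: str = "2025-11-01-preview") -> str:
    """Build an Azure AI Search REST URL from path segments (hypothetical helper)."""
    # Drop a trailing slash so the result never contains '//' after the host.
    base = endpoint.rstrip("/")
    # Percent-encode each segment; the resource names here are plain, but this stays safe.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base}/{path}?api-version={api_version}"


print(search_url("https://example.search.windows.net/", "indexes", "products-index"))
```

With this helper, the upload URL in Step 3 would be `search_url(EXISTING_SEARCH_ENDPOINT, "indexes", SEARCH_INDEX_NAME, "docs", "index")`.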

## Step 2: Create Azure AI Search Index

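Create a search index with a semantic configuration. Knowledge base retrieval uses the semantic ranker, which relies on this configuration to identify the title, content, and keyword fields.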

In [None]:
# Create search index with semantic configuration
url = f"{EXISTING_SEARCH_ENDPOINT}/indexes/{SEARCH_INDEX_NAME}?api-version={API_VERSION}"

headers = {
    "api-key": EXISTING_SEARCH_API_KEY,
    "Content-Type": "application/json"
}

index_schema = {
    "name": SEARCH_INDEX_NAME,
    "fields": [
        {
            "name": "id",
            "type": "Edm.String",
            "key": True,
            "filterable": True
        },
        {
            "name": "title",
            "type": "Edm.String",
            "searchable": True,
            "filterable": False,
            "sortable": False
        },
        {
            "name": "content",
            "type": "Edm.String",
            "searchable": True,
            "filterable": False,
            "sortable": False
        },
        {
            "name": "category",
            "type": "Edm.String",
            "searchable": False,
            "filterable": True,
            "sortable": True,
            "facetable": True
        },
        {
            "name": "price",
            "type": "Edm.Double",
            "searchable": False,
            "filterable": True,
            "sortable": True,
            "facetable": True
        },
        {
            "name": "inStock",
            "type": "Edm.Boolean",
            "searchable": False,
            "filterable": True,
            "sortable": False
        },
        {
            "name": "tags",
            "type": "Collection(Edm.String)",
            "searchable": True,
            "filterable": True,
            "facetable": True
        }
    ],
    "semantic": {
        "defaultConfiguration": "default",
        "configurations": [
            {
                "name": "default",
                "prioritizedFields": {
                    "titleField": {
                        "fieldName": "title"
                    },
                    "prioritizedContentFields": [
                        {
                            "fieldName": "content"
                        }
                    ],
                    "prioritizedKeywordsFields": [
                        {
                            "fieldName": "tags"
                        }
                    ]
                }
            }
        ]
    }
}

response = requests.put(url, headers=headers, json=index_schema)
print(f"Status: {response.status_code}")
if response.status_code in [200, 201]:
    print("✅ Search index created successfully!")
else:
    print(response.json())
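
A typo in the semantic configuration only shows up when the service rejects the schema. The sketch below runs a local check before the `PUT`. It confirms that each prioritized field exists, is a string field, and is searchable. The searchable requirement is an assumption based on semantic ranking operating over text. `semantic_field_problems` is a hypothetical helper that takes the same dict layout as `index_schema` above.

```python
STRING_TYPES = {"Edm.String", "Collection(Edm.String)"}


def semantic_field_problems(schema: dict) -> list:
    """List problems with fields referenced by an index's semantic configurations."""
    fields = {f["name"]: f for f in schema.get("fields", [])}
    problems = []
    for config in schema.get("semantic", {}).get("configurations", []):
        prioritized = config.get("prioritizedFields", {})
        # Gather every field name the configuration refers to.
        names = []
        if "titleField" in prioritized:
            names.append(prioritized["titleField"]["fieldName"])
        for key in ("prioritizedContentFields", "prioritizedKeywordsFields"):
            names.extend(entry["fieldName"] for entry in prioritized.get(key, []))
        for name in names:
            field = fields.get(name)
            if field is None:
                problems.append(f"{config['name']}: '{name}' is not in the index")
            elif field.get("type") not in STRING_TYPES:
                problems.append(f"{config['name']}: '{name}' is not a string field")
            # String fields are searchable by default when the attribute is omitted.
            elif not field.get("searchable", True):
                problems.append(f"{config['name']}: '{name}' is not searchable")
    return problems


schema = {
    "fields": [
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "price", "type": "Edm.Double"},
    ],
    "semantic": {"configurations": [{
        "name": "default",
        "prioritizedFields": {
            "titleField": {"fieldName": "title"},
            "prioritizedContentFields": [{"fieldName": "body"}],
            "prioritizedKeywordsFields": [{"fieldName": "price"}],
        },
    }]},
}
print(semantic_field_problems(schema))
```

Running `semantic_field_problems(index_schema)` on the schema from this step should return an empty list.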

## Step 3: Upload Sample Documents to Index

Populate the index with sample product data.

In [None]:
# Upload documents to the search index
url = f"{EXISTING_SEARCH_ENDPOINT}/indexes/{SEARCH_INDEX_NAME}/docs/index?api-version={API_VERSION}"

documents = {
    "value": [
        {
            "@search.action": "upload",
            "id": "1",
            "title": "Premium Laptop Pro 15",
            "content": "High-performance laptop featuring 15-inch 4K display, Intel Core i9 processor, 32GB DDR5 RAM, 1TB NVMe SSD, NVIDIA RTX 4060 graphics. Perfect for content creators, developers, and power users. Includes Thunderbolt 4 ports and Wi-Fi 6E.",
            "category": "electronics",
            "price": 2499.99,
            "inStock": True,
            "tags": ["laptop", "high-performance", "gaming", "professional"]
        },
        {
            "@search.action": "upload",
            "id": "2",
            "title": "Wireless Ergonomic Mouse",
            "content": "Ergonomic wireless mouse designed for all-day comfort. Features precision optical sensor with adjustable DPI (800-3200), 6 programmable buttons, and rechargeable battery with 60-day battery life. Compatible with Windows, Mac, and Linux.",
            "category": "accessories",
            "price": 79.99,
            "inStock": True,
            "tags": ["mouse", "wireless", "ergonomic", "productivity"]
        },
        {
            "@search.action": "upload",
            "id": "3",
            "title": "USB-C Docking Station Pro",
            "content": "Professional USB-C docking station with dual 4K monitor support, 100W power delivery, Gigabit Ethernet, SD card reader, and multiple USB 3.2 ports. Ideal for home office and hybrid work setups. Supports Windows, Mac, and Chrome OS.",
            "category": "accessories",
            "price": 249.99,
            "inStock": False,
            "tags": ["dock", "usb-c", "dual-monitor", "workstation"]
        },
        {
            "@search.action": "upload",
            "id": "4",
            "title": "4K Webcam with AI Features",
            "content": "Professional 4K webcam with AI-powered auto-framing, background blur, and low-light correction. Built-in dual microphones with noise cancellation. Perfect for video conferencing, streaming, and content creation. Plug-and-play USB connection.",
            "category": "electronics",
            "price": 199.99,
            "inStock": True,
            "tags": ["webcam", "4k", "video-conferencing", "streaming"]
        },
        {
            "@search.action": "upload",
            "id": "5",
            "title": "Mechanical Keyboard RGB",
            "content": "Premium mechanical keyboard with Cherry MX switches, customizable RGB backlighting, programmable macro keys, and detachable USB-C cable. Features aluminum frame, dedicated media controls, and included wrist rest. Available in multiple switch types.",
            "category": "accessories",
            "price": 159.99,
            "inStock": True,
            "tags": ["keyboard", "mechanical", "rgb", "gaming", "typing"]
        },
        {
            "@search.action": "upload",
            "id": "6",
            "title": "27-inch 4K Monitor",
            "content": "Professional 27-inch 4K IPS monitor with 99% sRGB color accuracy, HDR400, 60Hz refresh rate, and USB-C connectivity with 65W power delivery. Includes height-adjustable stand and VESA mount compatibility. Ideal for creative professionals.",
            "category": "electronics",
            "price": 449.99,
            "inStock": True,
            "tags": ["monitor", "4k", "display", "professional"]
        }
    ]
}

response = requests.post(url, headers=headers, json=documents)
print(f"Status: {response.status_code}")
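if response.status_code in (200, 207):
    # 207 means some documents failed; each item in "value" reports its own status
    result = response.json()
    succeeded = [item["key"] for item in result["value"] if item["status"]]
    print(f"✅ Indexed {len(succeeded)} of {len(result['value'])} documents")
else:
    print(response.json())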
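
A 200 response means every document in the batch was indexed. The service can also return 207 (Multi-Status) when only some documents succeeded. In that case, each item in `value` carries its own `key`, `status`, `errorMessage`, and `statusCode`. The sketch below splits those items into successes and failures. `split_index_results` is hypothetical, and the sample body is hand-written, not real service output.

```python
def split_index_results(result: dict) -> tuple:
    """Split a docs/index response body into succeeded keys and {failed key: reason}."""
    succeeded, failed = [], {}
    for item in result.get("value", []):
        if item.get("status"):
            succeeded.append(item["key"])
        else:
            # errorMessage may be null; fall back to the per-item HTTP status code.
            failed[item["key"]] = item.get("errorMessage") or f"HTTP {item.get('statusCode')}"
    return succeeded, failed


# Hand-written sample body for illustration only.
sample = {"value": [
    {"key": "1", "status": True, "errorMessage": None, "statusCode": 201},
    {"key": "2", "status": False, "errorMessage": "Document is malformed", "statusCode": 400},
    {"key": "3", "status": False, "errorMessage": None, "statusCode": 503},
]}
ok, bad = split_index_results(sample)
print(ok, bad)
```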

## Step 4: Create Knowledge Source from Existing Index

Create a knowledge source that references the existing search index.

In [None]:
KNOWLEDGE_SOURCE_NAME = "products-index-source"

url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeSources/{KNOWLEDGE_SOURCE_NAME}?api-version={API_VERSION}"

body = {
    "name": KNOWLEDGE_SOURCE_NAME,
    "kind": "searchIndex",
    "description": "Knowledge source from existing products search index",
    "searchIndexParameters": {
        "searchIndexName": SEARCH_INDEX_NAME,
        "searchFields": [  # Fields to search for relevant content
            {
                "name": "content"
            },
            {
                "name": "title"
            },
            {
                "name": "tags"
            }
        ],
        "sourceDataFields": [  # Additional fields to return in results
            {
                "name": "id"
            },
            {
                "name": "category"
            },
            {
                "name": "price"
            },
            {
                "name": "inStock"
            }
        ],
        "semanticConfigurationName": "default"  # Use the semantic config we created
    }
}

response = requests.put(url, headers=headers, json=body)
print(f"Status: {response.status_code}")
print(json.dumps(response.json(), indent=2))
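
The knowledge source refers to index fields by name only, so a typo in `searchFields` or `sourceDataFields` is easy to miss. The sketch below cross-checks the knowledge source body against the index schema. It assumes, as the names suggest, that `searchFields` entries must be searchable and `sourceDataFields` entries must be retrievable. `knowledge_source_field_problems` is a hypothetical helper, not a documented service rule. It uses trimmed copies of the dict layouts above so the snippet runs on its own.

```python
STRING_TYPES = ("Edm.String", "Collection(Edm.String)")


def knowledge_source_field_problems(index_schema: dict, source_body: dict) -> list:
    """Cross-check a searchIndex knowledge source body against an index schema."""
    fields = {f["name"]: f for f in index_schema.get("fields", [])}
    params = source_body.get("searchIndexParameters", {})
    problems = []
    for entry in params.get("searchFields", []):
        field = fields.get(entry["name"])
        if field is None:
            problems.append(f"searchFields: '{entry['name']}' is not in the index")
        # String fields default to searchable when the attribute is omitted.
        elif not field.get("searchable", field.get("type") in STRING_TYPES):
            problems.append(f"searchFields: '{entry['name']}' is not searchable")
    for entry in params.get("sourceDataFields", []):
        field = fields.get(entry["name"])
        if field is None:
            problems.append(f"sourceDataFields: '{entry['name']}' is not in the index")
        # Fields are retrievable unless explicitly marked otherwise.
        elif not field.get("retrievable", True):
            problems.append(f"sourceDataFields: '{entry['name']}' is not retrievable")
    return problems


schema = {"fields": [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "content", "type": "Edm.String", "searchable": True},
    {"name": "price", "type": "Edm.Double", "searchable": False},
    {"name": "secret", "type": "Edm.String", "retrievable": False},
]}
source = {"searchIndexParameters": {
    "searchFields": [{"name": "content"}, {"name": "price"}],
    "sourceDataFields": [{"name": "id"}, {"name": "secret"}, {"name": "colour"}],
}}
print(knowledge_source_field_problems(schema, source))
```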

## Step 5: Create Knowledge Base

Create a knowledge base using the search index source.

In [None]:
KNOWLEDGE_BASE_NAME = "products-kb"

url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"

body = {
    "name": KNOWLEDGE_BASE_NAME,
    "description": "Product catalog knowledge base",
    "knowledgeSources": [
        {
            "name": KNOWLEDGE_SOURCE_NAME
        }
    ],
    "models": [
        {
            "kind": "azureOpenAI",
            "azureOpenAIParameters": {
                "resourceUri": EXISTING_FOUNDRY_ENDPOINT,
                "deploymentId": EXISTING_CHAT_DEPLOYMENT,
                "modelName": EXISTING_CHAT_DEPLOYMENT,
                "apiKey": EXISTING_AZURE_OPENAI_KEY
            }
        }
    ],
    "outputMode": "answerSynthesis",
    "retrievalInstructions": "Provide accurate product information including specifications and pricing.",
    "answerInstructions": "Give helpful product recommendations with specific details and prices."
}

response = requests.put(url, headers=headers, json=body)
print(f"Status: {response.status_code}")
print(json.dumps(response.json(), indent=2))

## Step 6: Query the Knowledge Base

Test the knowledge base with various product queries.
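
Each query below wraps its text in the same chat-style `messages` shape. In that shape, each message has a `role` and a list of `content` parts. The sketch below builds that structure from `(role, text)` pairs. `build_messages` is a hypothetical helper. Restricting roles to `user` and `assistant` is an assumption, so adjust it if your API version accepts others.

```python
def build_messages(turns: list) -> list:
    """Convert (role, text) pairs into the retrieve request's messages shape."""
    # Assumption: only user and assistant roles are sent.
    allowed_roles = {"user", "assistant"}
    messages = []
    for role, text in turns:
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role!r}")
        messages.append({"role": role, "content": [{"type": "text", "text": text}]})
    return messages


query_body = {
    "messages": build_messages([("user", "What laptops do you have available?")]),
    "includeActivity": True,
}
print(query_body)
```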

In [None]:
# Simple product query
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}/retrieve?api-version={API_VERSION}"

query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What laptops do you have available?"
                }
            ]
        }
    ],
    "includeActivity": True
}

response = requests.post(url, headers=headers, json=query_body)
response.raise_for_status()
result = response.json()

print("Answer:")
print(result["response"][0]["content"][0]["text"])
print("\nReferences:")
for ref in result.get("references", []):
    print(f"- docKey={ref.get('docKey')} (rerankerScore={ref.get('rerankerScore')})")
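
The retrieve response returns the synthesized answer in a `response` array of messages and the cited documents in a top-level `references` array. The sketch below pulls both out in one place. `extract_answer_and_references` is hypothetical. Its field names follow that response layout, and the sample values are hand-written for illustration, not real output.

```python
def extract_answer_and_references(result: dict) -> tuple:
    """Return the answer text and a simplified list of references from a retrieve response."""
    # Join every text part of every message in the `response` array.
    answer = "\n".join(
        part.get("text", "")
        for message in result.get("response", [])
        for part in message.get("content", [])
        if part.get("type") == "text"
    )
    references = [
        {
            "docKey": ref.get("docKey"),
            "rerankerScore": ref.get("rerankerScore"),
            # sourceData is only populated when includeReferenceSourceData is true.
            "sourceData": ref.get("sourceData") or {},
        }
        for ref in result.get("references", [])
    ]
    return answer, references


# Hand-written sample shaped like a retrieve response; values are illustrative.
sample = {
    "response": [{"role": "assistant", "content": [{"type": "text", "text": "We carry the Premium Laptop Pro 15."}]}],
    "references": [{"type": "searchIndex", "docKey": "1", "rerankerScore": 2.5, "sourceData": {"id": "1", "price": 2499.99}}],
}
answer, refs = extract_answer_and_references(sample)
print(answer)
print(refs)
```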

In [None]:
# Query for accessories
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "I need ergonomic accessories for my home office. What do you recommend?"
                }
            ]
        }
    ],
    "includeActivity": True
}

response = requests.post(url, headers=headers, json=query_body)
result = response.json()

print("Answer:")
print(result["response"][0]["content"][0]["text"])

## Step 7: Query with Filters

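Apply OData filters through `filterAddOn` to restrict which documents a knowledge source can return. A filter can only reference fields marked `filterable` in the index schema. Here those fields are `id`, `category`, `price`, `inStock`, and `tags`.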
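
The two filtered queries below repeat the same `knowledgeSourceParams` entry, and only the filter and threshold change. The sketch below builds that entry in one place. `search_index_params` is a hypothetical builder, and its parameter names are copied from those cells. The 0 to 4 range check assumes the semantic reranker's score scale.

```python
from typing import Optional


def search_index_params(
    source_name: str,
    filter_add_on: Optional[str] = None,
    reranker_threshold: Optional[float] = None,
) -> dict:
    """Build one knowledgeSourceParams entry for a searchIndex knowledge source."""
    params = {
        "knowledgeSourceName": source_name,
        "kind": "searchIndex",
        "includeReferences": True,
        "includeReferenceSourceData": True,
        "alwaysQuerySource": True,
    }
    # Send optional keys only when supplied, so service defaults apply otherwise.
    if filter_add_on:
        params["filterAddOn"] = filter_add_on
    if reranker_threshold is not None:
        # Semantic reranker scores range from 0 to 4.
        if not 0 <= reranker_threshold <= 4:
            raise ValueError("rerankerThreshold must be between 0 and 4")
        params["rerankerThreshold"] = reranker_threshold
    return params


print(search_index_params("products-index-source", "category eq 'electronics'", 0.5))
```

A query body would then use `"knowledgeSourceParams": [search_index_params(KNOWLEDGE_SOURCE_NAME, "price lt 200 and inStock eq true", 0.4)]`.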

In [None]:
# Query with category filter (only electronics)
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Show me high-end electronics for content creation"
                }
            ]
        }
    ],
    "includeActivity": True,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": KNOWLEDGE_SOURCE_NAME,
            "kind": "searchIndex",
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True,
            "rerankerThreshold": 0.5,
            "filterAddOn": "category eq 'electronics'"  # OData filter
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)
result = response.json()

print("Answer:")
print(result["response"][0]["content"][0]["text"])
print("\nFiltered to electronics category only")

In [None]:
# Query with price range filter
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What products are available under $200?"
                }
            ]
        }
    ],
    "includeActivity": True,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": KNOWLEDGE_SOURCE_NAME,
            "kind": "searchIndex",
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True,
            "rerankerThreshold": 0.4,
            "filterAddOn": "price lt 200 and inStock eq true"  # Price and availability filter
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)
result = response.json()

print("Answer:")
print(result["response"][0]["content"][0]["text"])
print("\nFiltered to: price < $200 AND in stock")

## OData Filter Examples

Common OData filter patterns for search index knowledge sources:

```python
# Equality
"category eq 'electronics'"

# Comparison
"price lt 200"  # less than
"price le 200"  # less than or equal
"price gt 100"  # greater than
"price ge 100"  # greater than or equal

# Logical operators
"category eq 'electronics' and inStock eq true"
"price lt 200 or category eq 'accessories'"
"not (category eq 'discontinued')"

# Range queries
"price ge 100 and price le 500"

# Collection contains
"tags/any(t: t eq 'gaming')"
"tags/any(t: search.in(t, 'wireless,bluetooth'))"

# Complex filters
"(category eq 'electronics' and price lt 1000) or (category eq 'accessories' and inStock eq true)"
```
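
When filter values come from user input, string literals need escaping: OData represents a single quote inside a string literal by doubling it. Azure AI Search matches against a list of values with `search.in`, which by default splits the list on spaces and commas. Passing an explicit delimiter as the third argument keeps multi-word values intact. The sketch below wraps both rules. `odata_literal`, `eq_filter`, and `any_in_filter` are hypothetical helpers, and the `|` delimiter is an arbitrary choice.

```python
def odata_literal(value: str) -> str:
    """Quote a string as an OData literal; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def eq_filter(field: str, value: str) -> str:
    """Build `field eq 'value'` with the value safely escaped."""
    return f"{field} eq {odata_literal(value)}"


def any_in_filter(collection_field: str, values: list, delimiter: str = "|") -> str:
    """Match documents whose collection contains any of `values`, via search.in."""
    # An explicit delimiter keeps values that contain spaces or commas intact.
    if any(delimiter in value for value in values):
        raise ValueError(f"a value contains the delimiter {delimiter!r}")
    joined = delimiter.join(values)
    return (
        f"{collection_field}/any(t: search.in(t, "
        f"{odata_literal(joined)}, {odata_literal(delimiter)}))"
    )


print(eq_filter("category", "kids' toys"))
print(any_in_filter("tags", ["wireless", "bluetooth"]))
```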

## Step 8: Understanding Search vs. Source Data Fields

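- **searchFields**: Fields the knowledge source queries to find relevant documents (`title`, `content`, `tags`)
- **sourceDataFields**: Additional fields returned with each reference (`id`, `category`, `price`, `inStock`)

The knowledge base matches queries against `searchFields`. When `includeReferenceSourceData` is `true`, each reference also carries the `sourceDataFields` values in its `sourceData` property.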
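
Returned source data is useful for structured comparisons that the answer text may only paraphrase. The sketch below turns references that carry a price into a cheapest-first list. `price_comparison` is hypothetical. It assumes `sourceData` uses the index field names as keys. The sample references are hand-written and mirror the documents uploaded in Step 3.

```python
def price_comparison(references: list) -> list:
    """Format references that carry a price in sourceData, cheapest first."""
    rows = []
    for ref in references:
        data = ref.get("sourceData") or {}
        if "price" in data:
            rows.append((data["price"], data.get("id", ref.get("docKey")), data.get("inStock")))
    # Sort on price only, so ties never compare the other tuple members.
    rows.sort(key=lambda row: row[0])
    return [f"id={doc_id} price=${price:,.2f} inStock={in_stock}" for price, doc_id, in_stock in rows]


# Hand-written sample references; a reference without sourceData is skipped.
sample_refs = [
    {"docKey": "1", "sourceData": {"id": "1", "category": "electronics", "price": 2499.99, "inStock": True}},
    {"docKey": "5", "sourceData": {"id": "5", "category": "accessories", "price": 159.99, "inStock": True}},
    {"docKey": "9", "sourceData": None},
]
for line in price_comparison(sample_refs):
    print(line)
```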

In [None]:
# Query that demonstrates source data field usage
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Compare the prices of gaming accessories"
                }
            ]
        }
    ],
    "includeActivity": True,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": KNOWLEDGE_SOURCE_NAME,
            "kind": "searchIndex",
            "includeReferences": True,
            "includeReferenceSourceData": True,  # Include price, category, etc.
            "alwaysQuerySource": True,
            "filterAddOn": "tags/any(t: t eq 'gaming')"
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)
result = response.json()

print("Answer:")
print(result["response"][0]["content"][0]["text"])
print("\nSource data returned with each reference:")
for ref in result.get("references", []):
    print(f"- {ref.get('sourceData')}")

## Cleanup

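Clean up resources when done. Delete the knowledge base first, then the knowledge source it references. Deleting the search index is optional.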
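
Deleting in dependency order and treating 404 as "already gone" makes cleanup safe to rerun. The sketch below takes the HTTP call as a function, so it can be exercised without a live service. `cleanup` is a hypothetical helper. With `requests`, you might pass `lambda u: requests.delete(u, headers=headers).status_code` together with the knowledge base, knowledge source, and index URLs, in that order.

```python
from typing import Callable, Dict, List


def cleanup(delete: Callable[[str], int], urls: List[str]) -> Dict[str, str]:
    """Call `delete` on each URL in order and classify the HTTP status codes."""
    outcome = {}
    for url in urls:
        status = delete(url)
        if 200 <= status < 300:
            outcome[url] = "deleted"
        elif status == 404:
            # Already gone, e.g. on a rerun; treat as success.
            outcome[url] = "not found"
        else:
            outcome[url] = f"failed ({status})"
    return outcome


# Fake delete function for a dry run; the keys stand in for real URLs.
fake_statuses = {"kb": 204, "ks": 404, "index": 403}
print(cleanup(lambda u: fake_statuses[u], ["kb", "ks", "index"]))
```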

In [None]:
# Delete knowledge base
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"
response = requests.delete(url, headers=headers)
print(f"Delete knowledge base: {response.status_code}")

In [None]:
# Delete knowledge source
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeSources/{KNOWLEDGE_SOURCE_NAME}?api-version={API_VERSION}"
response = requests.delete(url, headers=headers)
print(f"Delete knowledge source: {response.status_code}")

In [None]:
# Optional: Delete the search index
url = f"{EXISTING_SEARCH_ENDPOINT}/indexes/{SEARCH_INDEX_NAME}?api-version={API_VERSION}"
response = requests.delete(url, headers=headers)
print(f"Delete search index: {response.status_code}")

## Summary

In this notebook, you learned how to:

1. Create an Azure AI Search index with semantic configuration
2. Upload documents to the search index
3. Create a knowledge source from an existing search index
4. Configure search fields and source data fields
5. Query the existing index through a knowledge base
6. Apply OData filters for targeted retrieval
7. Use source data fields to enrich responses

## When to Use Search Index Knowledge Sources

Search Index knowledge sources are ideal when:

- You already have indexed data in Azure AI Search
- You need complex filtering and faceting capabilities
- You want to combine structured metadata with text search
- You need near-real-time updates: documents you push to the index become searchable without a separate ingestion step
- You're building product catalogs, inventory systems, or directories

## Comparison: Search Index vs. Ingested Sources

| Feature | Search Index | Blob/SharePoint/OneLake |
|---------|-------------|-------------------------|
| **Setup** | Create index + upload docs | Configure ingestion |
| **Latency** | Near real-time | Ingestion delay |
| **Control** | Full schema control | Auto-generated schema |
| **Filtering** | Rich OData filters | Limited |
| **Updates** | As soon as you push changes | Scheduled/on-demand |
| **Best For** | Structured data | Document libraries |

## Next Steps

- Combine multiple knowledge sources (blob + search index)
- Implement hybrid search with vectors and keywords
- Add custom analyzers for domain-specific terminology
- Set up synonyms for better recall
- Explore vector search with embeddings