# Azure AI Search - Async Search Operations

This notebook demonstrates asynchronous search operations in Azure AI Search for high-performance applications.

## ⚠️ Status: Under Development

This notebook is currently being tested and refined. Please use the main `basic_search.ipynb` for stable functionality.

## Learning Objectives
- Understand when to use async operations
- Perform concurrent search operations
- Handle async errors and SSL issues
- Implement proper async client lifecycle management
- Optimize performance with async patterns

## Prerequisites
- Completed the main `basic_search.ipynb` notebook
- Azure AI Search service configured
- Environment variables set (AZURE_SEARCH_SERVICE_ENDPOINT, AZURE_SEARCH_API_KEY, AZURE_SEARCH_INDEX_NAME)
- Sample data indexed in your search service
- Additional requirement: `aiohttp` package installed

## Setup and Imports

First, let's import the necessary libraries for async operations.

In [None]:
import os
import asyncio
import time
import ssl
import logging
from typing import List, Dict, Any

# The async client's HTTP transport requires aiohttp; fail early with a clear install hint
try:
    import aiohttp
except ImportError as exc:
    raise ImportError("aiohttp is required for async Azure SDK clients: pip install aiohttp") from exc

from azure.search.documents.aio import SearchClient as AsyncSearchClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError
from azure.core.pipeline.transport import AioHttpTransport

# Setup logging for better debugging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

print("✅ Async imports completed successfully!")

## Configuration

Load your Azure AI Search configuration:

In [None]:
# Configuration - Update these with your actual values or set environment variables
SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT", "https://your-service.search.windows.net")
SEARCH_API_KEY = os.getenv("AZURE_SEARCH_API_KEY", "your-api-key")
SEARCH_INDEX_NAME = os.getenv("AZURE_SEARCH_INDEX_NAME", "your-index-name")

print(f"📋 Configuration loaded:")
print(f"   Endpoint: {SEARCH_ENDPOINT}")
print(f"   Index: {SEARCH_INDEX_NAME}")
print(f"   API Key: {'*' * (len(SEARCH_API_KEY) - 4) + SEARCH_API_KEY[-4:] if len(SEARCH_API_KEY) > 4 else '****'}")
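
Because the `os.getenv` defaults above are placeholders, a misspelled or unset environment variable lets the notebook continue with fake values and fail later with a confusing connection or authentication error. A minimal sketch of an early check follows. The helper name `missing_search_settings` is hypothetical (not part of the Azure SDK), and it only checks that the three variables are present and non-blank, not that they are valid.

```python
import os
from typing import List, Mapping, Optional

# The three variables this notebook reads
REQUIRED_SETTINGS = (
    "AZURE_SEARCH_SERVICE_ENDPOINT",
    "AZURE_SEARCH_API_KEY",
    "AZURE_SEARCH_INDEX_NAME",
)

def missing_search_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return the required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]

missing = missing_search_settings()
if missing:
    print(f"⚠️ Missing settings (placeholder values will be used): {', '.join(missing)}")
else:
    print("✅ All Azure AI Search settings are present")
```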

## When to Use Async Operations

Async operations pay off in some scenarios and only add complexity in others:

### ✅ **Use Async For:**
- **Web Applications**: Handle multiple user search requests concurrently
- **API Services**: Improve throughput in REST API endpoints
- **Batch Processing**: Process multiple search operations simultaneously
- **High-Performance Apps**: Maximize resource utilization
- **I/O Bound Operations**: When you're waiting for network responses

### ❌ **Use Sync For:**
- **Simple Scripts**: Single-user, sequential operations
- **Data Analysis**: Jupyter notebooks with step-by-step analysis
- **Learning/Prototyping**: When simplicity is more important than performance
- **Single Operations**: When you only need to do one search at a time
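
The I/O-bound point above is the core of the trade-off: while one request waits on the network, the event loop can start others. The sketch below makes that visible offline by using `asyncio.sleep` as a stand-in for network latency. `fake_search` and its 0.2-second delay are assumptions for illustration, not real Azure timings.

```python
import asyncio
import time

async def fake_search(query: str, latency: float = 0.2) -> str:
    """Stand-in for a network-bound search call (no Azure request is made)."""
    await asyncio.sleep(latency)  # yields to the event loop, like awaiting an HTTP response
    return f"results for {query!r}"

async def run_sequential(queries):
    # Each await finishes before the next request starts
    return [await fake_search(q) for q in queries]

async def run_concurrent(queries):
    # All coroutines are scheduled together; total time is roughly the slowest one
    return await asyncio.gather(*(fake_search(q) for q in queries))

async def compare(queries):
    start = time.perf_counter()
    await run_sequential(queries)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    await run_concurrent(queries)
    concurrent = time.perf_counter() - start
    return sequential, concurrent

# In Jupyter: sequential, concurrent = await compare(["a", "b", "c", "d"])
```

With four queries, the sequential run takes roughly four times the simulated latency, while the concurrent run takes roughly one. CPU-bound work would not benefit this way, because nothing awaits.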

## Async Client Management

Proper client lifecycle management is crucial for async operations:

In [None]:
# Function to create fresh async search client
def create_async_search_client():
    """Create a fresh async search client for each operation"""
    return AsyncSearchClient(
        endpoint=SEARCH_ENDPOINT,
        index_name=SEARCH_INDEX_NAME,
        credential=AzureKeyCredential(SEARCH_API_KEY)
    )

print("✅ Async search client factory created successfully!")
print("📋 Note: Each operation will create a fresh client to avoid transport issues")

## Basic Async Search Example

Here's the recommended pattern for async searches:

In [None]:
async def simple_async_search(query: str, top: int = 5):
    """Simple, reliable async search pattern"""
    print(f"🔍 Async searching for: '{query}'")
    print("-" * 50)
    
    try:
        # Always create a fresh client for each operation
        async with create_async_search_client() as client:
            # Perform async search
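            results = await client.search(
                search_text=query,
                top=top,
                include_total_count=True
            )

            # include_total_count=True makes the total number of matches available
            total_count = await results.get_count()
            print(f"Total matching documents: {total_count}")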
            
            # Process results (note: async iteration)
            result_count = 0
            async for result in results:
                result_count += 1
                title = result.get('title', 'No title')
                score = result['@search.score']
                author = result.get('author', 'Unknown')
                
                print(f"{result_count}. {title}")
                print(f"   Score: {score:.3f}")
                print(f"   Author: {author}")
                
                # Show content preview
                content = result.get('content', '')
                if content:
                    preview = content[:150] + '...' if len(content) > 150 else content
                    print(f"   Preview: {preview}")
                print()
            
            if result_count == 0:
                print("No results found.")
            else:
                print(f"✅ Successfully displayed {result_count} results")
                
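    except HttpResponseError as e:
        # Service-side failures (wrong index name, invalid key, throttling) carry an HTTP status code
        print(f"❌ Search service error (HTTP {e.status_code}): {e.message}")
    except Exception as e:
        print(f"❌ Async search error: {str(e)}")
        print("💡 If you see transport or SSL errors, check the troubleshooting sections below")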

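# Test the basic async search. Top-level await works in Jupyter/IPython;
# in a plain Python script, wrap the call in asyncio.run(...) instead.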
await simple_async_search("python programming", top=3)
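
The `async for` loop above is needed because the async client's results arrive page by page over the network. When you only need the documents as a list, an async comprehension collects them in one expression. The sketch below uses a hypothetical async generator, `fake_search_results`, in place of the object returned by `client.search(...)` so it runs offline. With the real client you would pass the awaited `results` object instead, inside the `async with` block.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List

async def fake_search_results(n: int) -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in for the paged results of client.search()."""
    for i in range(n):
        await asyncio.sleep(0)  # pretend each item may require awaiting I/O
        yield {"title": f"Document {i + 1}", "@search.score": round(1.0 - i * 0.1, 2)}

async def collect(results: AsyncIterator[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Async comprehension: equivalent to appending inside an `async for` loop
    return [item async for item in results]

async def top_titles(results: AsyncIterator[Dict[str, Any]], min_score: float) -> List[str]:
    # Filtering works inside async comprehensions too
    return [r["title"] async for r in results if r["@search.score"] >= min_score]
```

A regular list comprehension or `for` loop over the same object would fail, since it is an async iterator rather than a regular one.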

## 🚨 Troubleshooting Section

This section contains solutions for common async issues. Only run these if you encounter problems.

### SSL Certificate Issues

If you encounter SSL certificate verification errors:

In [None]:
# SSL Diagnostics - run this if you have SSL issues
def diagnose_ssl_issue():
    """Diagnose SSL certificate issues"""
    print("🔍 SSL Diagnostics:")
    print("-" * 40)
    
    # Check Python version
    import sys
    print(f"Python version: {sys.version}")
    
    # Check SSL module
    import ssl
    print(f"SSL version: {ssl.OPENSSL_VERSION}")
    
    # Check certificates location
    try:
        import certifi
        print(f"Certificates location: {certifi.where()}")
    except ImportError:
        print("Certifi not installed - run: pip install certifi")
    
    # Check if we can resolve the hostname (urlparse drops the scheme, port, and any trailing slash)
    import socket
    from urllib.parse import urlparse
    try:
        hostname = urlparse(SEARCH_ENDPOINT).hostname
        ip = socket.gethostbyname(hostname)
        print(f"✅ DNS resolution: {hostname} -> {ip}")
    except Exception as e:
        print(f"❌ DNS resolution failed: {e}")

# Uncomment the line below if you have SSL issues
# diagnose_ssl_issue()

print("🔧 SSL Troubleshooting Steps:")
print("1. Update certificates: pip install --upgrade certifi")
print("2. Update aiohttp: pip install --upgrade aiohttp")
print("3. Update Azure SDK: pip install --upgrade azure-search-documents")
print("4. Restart your Jupyter kernel after updates")
print("5. For corporate networks: Check proxy/firewall settings")

### Transport Closure Issues

If you see "HTTP transport has already been closed" errors:

In [None]:
print("🔧 Transport Closure Error Solutions:")
print("")
print("✅ DO THIS:")
print("# Create fresh client for each operation")
print("async with create_async_search_client() as client:")
print("    results = await client.search('query')")
print("    async for result in results:")
print("        print(result['title'])")
print("")
print("❌ AVOID THIS:")
print("# Don't reuse clients across operations")
print("client = AsyncSearchClient(...)  # Global client")
print("# ... later, after an earlier 'async with client:' block has exited ...")
print("async with client:  # Fails: the first block closed the HTTP transport")
print("    results = await client.search('query')")
print("")
print("🎯 Why This Works:")
print("- Fresh Transport: Each client gets a new HTTP transport")
print("- Clean Lifecycle: Proper setup and teardown")
print("- No State Issues: Avoids shared state problems")
print("- Reliable: Works consistently across multiple operations")
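
To see why re-entering a client fails while several calls inside one block succeed, here is an offline sketch with a hypothetical `FakeAsyncClient` that mimics the lifecycle described above: exiting `async with` closes its transport, and any later use raises. It is not the Azure SDK class, and the exception type and message are illustrative rather than the SDK's exact error.

```python
import asyncio

class FakeAsyncClient:
    """Hypothetical stand-in that mimics an async client's transport lifecycle."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "FakeAsyncClient":
        if self.closed:
            raise RuntimeError("HTTP transport has already been closed")
        return self

    async def __aexit__(self, *exc_info) -> None:
        self.closed = True  # leaving the block closes the transport for good

    async def search(self, query: str) -> str:
        if self.closed:
            raise RuntimeError("HTTP transport has already been closed")
        await asyncio.sleep(0)  # stands in for the network round trip
        return f"results for {query!r}"

async def reuse_after_exit() -> list:
    client = FakeAsyncClient()
    outcomes = []
    async with client:  # first block: several calls on one open client are fine
        outcomes.append(await client.search("python"))
        outcomes.append(await client.search("azure"))
    try:
        async with client:  # second block: the transport is already closed
            outcomes.append(await client.search("again"))
    except RuntimeError as exc:
        outcomes.append(f"error: {exc}")
    return outcomes
```

The practical rule follows: either create a fresh client for each `async with` block (the factory approach used in this notebook), or keep a single `async with` block open around all the work that shares one client.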

## Concurrent Search Operations

The real power of async comes from running multiple operations concurrently:

In [None]:
async def concurrent_search_demo():
    """Demonstrate concurrent async search operations"""
    print("🚀 Demonstrating Concurrent Async Searches")
    print("=" * 60)
    
    # Define multiple search queries
    search_queries = [
        "python programming",
        "machine learning",
        "web development",
        "data science"
    ]
    
    async def single_search(query: str, search_id: int):
        """Single async search operation with timing"""
        start_time = time.time()
        
        try:
            async with create_async_search_client() as client:
                results = await client.search(search_text=query, top=2)
                
                result_list = []
                async for result in results:
                    result_list.append({
                        'title': result.get('title', 'No title'),
                        'score': result['@search.score']
                    })
                
                execution_time = time.time() - start_time
                
                return {
                    'search_id': search_id,
                    'query': query,
                    'results_count': len(result_list),
                    'execution_time': execution_time,
                    'top_result': result_list[0] if result_list else None,
                    'success': True
                }
                
        except Exception as e:
            return {
                'search_id': search_id,
                'query': query,
                'error': str(e),
                'execution_time': time.time() - start_time,
                'success': False
            }
    
    # Record start time for total operation
    total_start_time = time.time()
    
    # Create concurrent tasks
    tasks = [
        single_search(query, i+1)
        for i, query in enumerate(search_queries)
    ]
    
    # Execute all searches concurrently
    results = await asyncio.gather(*tasks, return_exceptions=True)
    
    total_execution_time = time.time() - total_start_time
    
    # Display results
    print(f"\n📊 Concurrent Search Results:")
    print(f"Total execution time: {total_execution_time:.3f} seconds")
    print(f"Number of concurrent searches: {len(search_queries)}")
    print("-" * 60)
    
    successful_results = []
    for result in results:
        if isinstance(result, Exception):
            print(f"❌ Exception occurred: {result}")
            continue
            
        if result['success']:
            successful_results.append(result)
            print(f"✅ Search {result['search_id']}: '{result['query']}'")
            print(f"   Results: {result['results_count']}, Time: {result['execution_time']:.3f}s")
            
            if result['top_result']:
                print(f"   Top: {result['top_result']['title']} (Score: {result['top_result']['score']:.3f})")
            print()
        else:
            print(f"❌ Search {result['search_id']}: {result.get('error', 'Unknown error')}")
    
    # Calculate performance benefits
    if successful_results:
        avg_individual_time = sum(r['execution_time'] for r in successful_results) / len(successful_results)
        sequential_time = sum(r['execution_time'] for r in successful_results)
        
        print(f"📈 Performance Summary:")
        print(f"   Average individual search time: {avg_individual_time:.3f}s")
        # Per-search times were measured while the searches overlapped, so their sum is only an estimate
        print(f"   Estimated sequential time (sum of individual times): {sequential_time:.3f}s")
        print(f"   Concurrent execution took: {total_execution_time:.3f}s")
        print(f"   Estimated time savings: {(sequential_time - total_execution_time):.3f}s")
        print(f"   Approximate speedup: {(sequential_time / total_execution_time):.1f}x")

# Run concurrent searches demonstration
await concurrent_search_demo()
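
The demo above waits as long as its slowest search takes, so if one request hangs, `asyncio.gather` hangs with it. A sketch of a per-search deadline using `asyncio.wait_for` follows. The helper `with_timeout`, the failure-record shape (borrowed from `single_search` above), and the stand-in `slow_search` are assumptions for illustration. When the deadline passes, `wait_for` cancels the wrapped coroutine before raising the timeout.

```python
import asyncio
from typing import Any, Awaitable, Dict

async def with_timeout(coro: Awaitable[Dict[str, Any]], seconds: float) -> Dict[str, Any]:
    """Await `coro`, but give up after `seconds` and return a failure record instead."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:  # wait_for has already cancelled the inner coroutine
        return {"success": False, "error": f"timed out after {seconds}s"}

async def slow_search(query: str, delay: float) -> Dict[str, Any]:
    """Hypothetical stand-in for single_search() with a controllable delay."""
    await asyncio.sleep(delay)
    return {"query": query, "success": True}

async def timed_batch() -> list:
    # One fast and one slow search, each with a 0.5-second budget
    return await asyncio.gather(
        with_timeout(slow_search("fast query", 0.05), seconds=0.5),
        with_timeout(slow_search("slow query", 5.0), seconds=0.5),
    )
```

Inside `concurrent_search_demo`, each task could be wrapped the same way, for example `with_timeout(single_search(query, i+1), seconds=10)`; the 10-second budget is an arbitrary example, not a service recommendation.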

## Async Best Practices Summary

Key takeaways for using async Azure AI Search operations:

### ✅ **Do's:**
- **Create a fresh client** for each operation (or batch of operations), so every `async with` block gets its own transport
- **Use `async with`** context manager for proper resource cleanup
- **Use `await`** for all async operations
- **Use `async for`** when iterating over search results
- **Implement proper error handling** with specific exception types (for example, `HttpResponseError` for service-side failures)
- **Use `asyncio.gather()`** for concurrent operations
- **Install `aiohttp`** for async transport support

### ❌ **Don'ts:**
- **Don't reuse a client** after its `async with` block has exited (its HTTP transport is already closed)
- **Don't forget to await** async operations
- **Don't use regular `for` loops** with async iterators
- **Don't ignore proper resource cleanup**
- **Don't create too many concurrent connections** (respect service limits)

### 🚀 **Performance Benefits:**
- **Concurrent Operations**: Handle multiple searches simultaneously
- **Better Resource Utilization**: Don't block on I/O operations
- **Improved Throughput**: Especially beneficial for web applications
- **Scalability**: Handle more users with the same resources

## Next Steps

After mastering async operations:

1. **🔙 Return to Main Notebook**: Complete the sync operations in `basic_search.ipynb`
2. **🏗️ Build Applications**: Integrate async search into web applications or APIs
3. **📈 Monitor Performance**: Measure the benefits in your specific use case
4. **🔧 Optimize Further**: Explore advanced async patterns and optimizations

### Useful Resources:
- [Azure AI Search Documentation](https://docs.microsoft.com/en-us/azure/search/)
- [Python Async Programming Guide](https://docs.python.org/3/library/asyncio.html)
- [Azure SDK for Python Async Patterns](https://docs.microsoft.com/en-us/python/api/overview/azure/)

---

**⚠️ Remember**: This notebook is under development. Use the main `basic_search.ipynb` for stable functionality while we refine the async examples.