# Managing Analyzers in Your Resource

This notebook demonstrates how to create a simple analyzer and manage its lifecycle: listing the analyzers in your resource, retrieving one by ID, and deleting it.

## Prerequisites
1. Ensure your Azure AI service is configured following the [configuration steps](../README.md#configure-azure-ai-service-resource).
2. Install the required packages to run this sample.

In [None]:
%pip install -r ../requirements.txt

## Create the Azure AI Content Understanding Client

> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that provides functions to interact with the Content Understanding API. Until the official Content Understanding SDK is released, this client serves as a lightweight SDK.

> Set **AZURE_AI_ENDPOINT** and, if you use key-based authentication, **AZURE_AI_API_KEY** to your Azure AI Service details (for example, in your `.env` file). The API version is set in code through the **API_VERSION** constant.

> ⚠️ Important:
Update the code below to match your Azure authentication method.
Look for the `# IMPORTANT` comments and modify those sections accordingly.
If you skip this step, the sample might not run correctly.

> ⚠️ Note: Using a subscription key works, but using Azure Active Directory (AAD) token-based authentication is more secure and highly recommended for production environments.

In [None]:
from datetime import datetime
import logging
import os
import sys
from typing import Any, Optional
from dotenv import find_dotenv, load_dotenv

# Add the sibling "python" directory to the Python path so the content_understanding_client module can be imported
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'python'))
from content_understanding_client import AzureContentUnderstandingClient
from azure.identity import DefaultAzureCredential

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

# For authentication, you can use either token-based auth or subscription key; only one is required
AZURE_AI_ENDPOINT = os.getenv("AZURE_AI_ENDPOINT")
# IMPORTANT: Replace with your actual subscription key or set it in your ".env" file if not using token authentication
AZURE_AI_API_KEY = os.getenv("AZURE_AI_API_KEY")
API_VERSION = "2025-11-01"

# Create token provider for Azure AD authentication
def token_provider():
    credential = DefaultAzureCredential()
    token = credential.get_token("https://cognitiveservices.azure.com/.default")
    return token.token

# Create the Content Understanding client
try:
    client = AzureContentUnderstandingClient(
        endpoint=AZURE_AI_ENDPOINT,
        api_version=API_VERSION,
        subscription_key=AZURE_AI_API_KEY,
        token_provider=token_provider if not AZURE_AI_API_KEY else None,
        # The user agent is used for tracking sample usage and does not provide identity information.
        # You can change this if you want to opt out of tracking.
        x_ms_useragent="azure-ai-content-understanding-python-sample-ga",
    )
    credential_type = "Subscription Key" if AZURE_AI_API_KEY else "Azure AD Token"
    print("✅ Client created successfully")
    print(f"   Endpoint: {AZURE_AI_ENDPOINT}")
    print(f"   Credential: {credential_type}")
    print(f"   API Version: {API_VERSION}")
except Exception as e:
    credential_type = "Subscription Key" if AZURE_AI_API_KEY else "Azure AD Token"
    print("❌ Failed to create client")
    print(f"   Endpoint: {AZURE_AI_ENDPOINT}")
    print(f"   Credential: {credential_type}")
    print(f"   Error: {e}")
    raise
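
The cell above picks between key-based and token-based authentication depending on whether `AZURE_AI_API_KEY` is set. As a minimal sketch of that decision, the hypothetical helper below (`choose_credential` is not part of the sample client) checks the endpoint before any client is created and reports which credential type would be used. The `https://` check is an assumption about how Azure endpoints are normally written, not a rule taken from the client.

```python
from typing import Optional


def choose_credential(endpoint: Optional[str], api_key: Optional[str]) -> str:
    """Return the credential type the client cell would use.

    Hypothetical helper: it mirrors the subscription_key / token_provider
    choice made when the client is constructed and fails early when the
    endpoint is missing.
    """
    if not endpoint or not endpoint.strip():
        raise ValueError("AZURE_AI_ENDPOINT is not set; add it to your .env file")
    # Assumption: Azure AI endpoints are written as https:// URLs.
    if not endpoint.startswith("https://"):
        raise ValueError(f"AZURE_AI_ENDPOINT should start with https://, got {endpoint!r}")
    # An empty or missing key means the token provider is used instead.
    return "Subscription Key" if api_key else "Azure AD Token"


# Placeholder endpoint for illustration only.
print(choose_credential("https://example.cognitiveservices.azure.com/", None))
```

Running a check like this before constructing the client turns a missing `.env` entry into an immediate, readable error rather than a failure on the first API call.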

## Configure Model Deployments for Prebuilt Analyzers

> **💡 Note:** This step is only required **once per Azure Content Understanding resource**, unless the GPT deployment has been changed. You can skip this section if:
> - This configuration has already been run once for your resource, or
> - Your administrator has already configured the model deployments for you

Before using prebuilt analyzers, you need to configure the default model deployment mappings. This tells Content Understanding which model deployments to use.

**Model Requirements:**
- **GPT-4.1** - Required for most prebuilt analyzers (e.g., `prebuilt-invoice`, `prebuilt-receipt`, `prebuilt-idDocument`)
- **GPT-4.1-mini** - Required for RAG analyzers (e.g., `prebuilt-documentSearch`, `prebuilt-audioSearch`, `prebuilt-videoSearch`)
- **text-embedding-3-large** - Required for all prebuilt analyzers that use embeddings

**Prerequisites:**
1. Deploy **GPT-4.1**, **GPT-4.1-mini**, and **text-embedding-3-large** models in Azure AI Foundry
2. Set `GPT_4_1_DEPLOYMENT`, `GPT_4_1_MINI_DEPLOYMENT`, and `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` in your `.env` file with the deployment names

In [None]:
# Get model deployment names from environment variables
GPT_4_1_DEPLOYMENT = os.getenv("GPT_4_1_DEPLOYMENT")
GPT_4_1_MINI_DEPLOYMENT = os.getenv("GPT_4_1_MINI_DEPLOYMENT")
TEXT_EMBEDDING_3_LARGE_DEPLOYMENT = os.getenv("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT")

# Check if required deployments are configured
missing_deployments = []
if not GPT_4_1_DEPLOYMENT:
    missing_deployments.append("GPT_4_1_DEPLOYMENT")
if not GPT_4_1_MINI_DEPLOYMENT:
    missing_deployments.append("GPT_4_1_MINI_DEPLOYMENT")
if not TEXT_EMBEDDING_3_LARGE_DEPLOYMENT:
    missing_deployments.append("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT")

if missing_deployments:
    print("⚠️  Warning: Missing required model deployment configuration(s):")
    for deployment in missing_deployments:
        print(f"   - {deployment}")
    print("\n   Prebuilt analyzers require GPT-4.1, GPT-4.1-mini, and text-embedding-3-large deployments.")
    print("   Please:")
    print("   1. Deploy all three models in Azure AI Foundry")
    print("   2. Add the following to notebooks/.env:")
    print("      GPT_4_1_DEPLOYMENT=<your-gpt-4.1-deployment-name>")
    print("      GPT_4_1_MINI_DEPLOYMENT=<your-gpt-4.1-mini-deployment-name>")
    print("      TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=<your-text-embedding-3-large-deployment-name>")
    print("   3. Restart the kernel and run this cell again")
else:
    print("📋 Configuring default model deployments...")
    print(f"   GPT-4.1 deployment: {GPT_4_1_DEPLOYMENT}")
    print(f"   GPT-4.1-mini deployment: {GPT_4_1_MINI_DEPLOYMENT}")
    print(f"   text-embedding-3-large deployment: {TEXT_EMBEDDING_3_LARGE_DEPLOYMENT}")
    
    try:
        # Update defaults to map model names to your deployments
        result = client.update_defaults({
            "gpt-4.1": GPT_4_1_DEPLOYMENT,
            "gpt-4.1-mini": GPT_4_1_MINI_DEPLOYMENT,
            "text-embedding-3-large": TEXT_EMBEDDING_3_LARGE_DEPLOYMENT
        })
        
        print("✅ Default model deployments configured successfully")
        print("   Model mappings:")
        for model, deployment in result.get("modelDeployments", {}).items():
            print(f"     {model} → {deployment}")
    except Exception as e:
        print(f"❌ Failed to configure defaults: {e}")
        print(f"   This may happen if:")
        print(f"   - One or more deployment names don't exist in your Azure AI Foundry project")
        print(f"   - You don't have permission to update defaults")
        raise


## Get Default Settings

You can retrieve the default model deployment mappings configured for your Content Understanding resource.

In [None]:
try:
    defaults = client.get_defaults()
    print("✅ Retrieved default settings")
    
    model_deployments = defaults.get("modelDeployments", {})
    if model_deployments:
        print("\n📋 Model Deployments:")
        for model_name, deployment_name in model_deployments.items():
            print(f"   {model_name}: {deployment_name}")
    else:
        print("\n   No model deployments configured")
        
except Exception as e:
    print(f"⚠️  Error retrieving defaults: {e}")
    print("   This is expected if no defaults have been configured yet.")
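
Because the model deployment configuration only needs to run once per resource, it can help to compare the mappings you want against the ones already stored before updating anything. The sketch below is one way to do that with plain dictionaries shaped like the `modelDeployments` mapping read above. `deployments_to_update` is a hypothetical helper and the deployment names are made-up examples.

```python
from typing import Dict, Mapping


def deployments_to_update(
    current: Mapping[str, str], desired: Mapping[str, str]
) -> Dict[str, str]:
    """Return the desired model-to-deployment entries that differ from `current`."""
    return {
        model: deployment
        for model, deployment in desired.items()
        if current.get(model) != deployment
    }


# Example data; the deployment names are placeholders.
current = {"gpt-4.1": "my-gpt-4-1", "gpt-4.1-mini": "my-gpt-4-1-mini"}
desired = {
    "gpt-4.1": "my-gpt-4-1",
    "gpt-4.1-mini": "my-gpt-4-1-mini",
    "text-embedding-3-large": "my-embedding",
}
print(deployments_to_update(current, desired))
```

With the real client, `current` would come from `client.get_defaults().get("modelDeployments", {})`. An empty result suggests the update step can be skipped. If it is not empty, passing the full `desired` mapping to `client.update_defaults` is the safer choice, since this notebook does not state whether that call merges or replaces existing entries.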

## Create a Simple Analyzer

> **💡 Note:** This section demonstrates analyzer creation for learning purposes only. For common scenarios such as invoice field extraction, we recommend using a prebuilt analyzer (for example, **`prebuilt-invoice`**, which is optimized for invoice processing). See the `field_extraction.ipynb` notebook for examples of using prebuilt analyzers.

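First, we create a custom analyzer from a template. The template below builds on `prebuilt-callCenter` and extracts a summary, topics, companies, people, sentiment, and categories from call recordings.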

In [None]:
import time
analyzer_id = f"notebooks_sample_management_{int(time.time())}"

# Create a custom analyzer using dictionary format
print(f"🔧 Creating custom analyzer '{analyzer_id}'...")

call_analyzer = {
    "baseAnalyzerId": "prebuilt-callCenter",
    "description": "Sample call recording analytics",
    "config": {
        "returnDetails": True,
        "locales": ["en-US"]
    },
    "fieldSchema": {
        "fields": {
            "Summary": {
                "type": "string",
                "method": "generate",
                "description": "A one-paragraph summary"
            },
            "Topics": {
                "type": "array",
                "method": "generate",
                "description": "Top 5 topics mentioned",
                "items": {
                    "type": "string"
                }
            },
            "Companies": {
                "type": "array",
                "method": "generate",
                "description": "List of companies mentioned",
                "items": {
                    "type": "string"
                }
            },
            "People": {
                "type": "array",
                "method": "generate",
                "description": "List of people mentioned",
                "items": {
                    "type": "object",
                    "properties": {
                        "Name": {
                            "type": "string",
                            "description": "Person's name"
                        },
                        "Role": {
                            "type": "string",
                            "description": "Person's title/role"
                        }
                    }
                }
            },
            "Sentiment": {
                "type": "string",
                "method": "classify",
                "description": "Overall sentiment",
                "enum": [
                    "Positive",
                    "Neutral",
                    "Negative"
                ]
            },
            "Categories": {
                "type": "array",
                "method": "classify",
                "description": "List of relevant categories",
                "items": {
                    "type": "string",
                    "enum": [
                        "Agriculture",
                        "Business",
                        "Finance",
                        "Health",
                        "Insurance",
                        "Mining",
                        "Pharmaceutical",
                        "Retail",
                        "Technology",
                        "Transportation"
                    ]
                }
            }
        }
    },
    "models": {"completion": "gpt-4.1"}
}

# Start the analyzer creation operation
response = client.begin_create_analyzer(
    analyzer_id=analyzer_id,
    analyzer_template=call_analyzer,
)

# Wait for the analyzer to be created
print("⏳ Waiting for analyzer creation to complete...")
client.poll_result(response)
print(f"✅ Analyzer '{analyzer_id}' created successfully!")
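
Creating an analyzer is a long-running operation, so a typo in the template is only reported after a round trip to the service. A minimal sketch of a local sanity check is shown below. The rules are inferred from the shape of the template in this notebook (array fields carry `items`, `classify` fields carry an `enum`), and the set of method names is an assumption. Neither is an authoritative description of what the service accepts, and `check_field_schema` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumption: these are the methods a field schema may use.
KNOWN_METHODS = {"generate", "classify", "extract"}


def check_field_schema(template: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in `template`."""
    problems: List[str] = []
    fields = template.get("fieldSchema", {}).get("fields", {})
    if not fields:
        problems.append("fieldSchema.fields is empty")
    for name, spec in fields.items():
        if "type" not in spec:
            problems.append(f"{name}: missing 'type'")
        method = spec.get("method")
        if method is not None and method not in KNOWN_METHODS:
            problems.append(f"{name}: unknown method {method!r}")
        if spec.get("type") == "array" and "items" not in spec:
            problems.append(f"{name}: array field has no 'items'")
        if method == "classify":
            # The enum may sit on the field itself or on its array items.
            enum = spec.get("enum") or spec.get("items", {}).get("enum")
            if not enum:
                problems.append(f"{name}: classify field has no 'enum'")
    return problems


# A deliberately incomplete template to show what gets reported.
sample = {
    "fieldSchema": {
        "fields": {
            "Sentiment": {"type": "string", "method": "classify"},
            "Topics": {"type": "array", "method": "generate"},
        }
    }
}
for problem in check_field_schema(sample):
    print(problem)
```

Calling `check_field_schema(call_analyzer)` on the template defined in this section should return an empty list under these rules.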

## List All Analyzers in Your Resource

After successfully creating an analyzer, you can use it to analyze your input files. You can also list all analyzers available in your resource, both prebuilt and custom.

In [None]:
response = client.get_all_analyzers()
analyzers = response.get("value", [])

print(f"✅ Found {len(analyzers)} analyzers")

# Display detailed information about each analyzer
for i, analyzer in enumerate(analyzers, 1):
    print(f"🔍 Analyzer {i}:")
    print(f"   ID: {analyzer.get('analyzerId')}")
    print(f"   Description: {analyzer.get('description')}")
    print(f"   Status: {analyzer.get('status')}")
    print(f"   Created at: {analyzer.get('createdAt')}")

    # Check if it's a prebuilt analyzer
    if analyzer.get('analyzerId', '').startswith("prebuilt-"):
        print(f"   Type: Prebuilt analyzer")
    else:
        print(f"   Type: Custom analyzer")

    # Show tags if available
    tags = analyzer.get("tags")
    if tags:
        print(f"   Tags: {tags}")
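
In a resource with many prebuilt analyzers, the per-analyzer printout above can get long. The sketch below groups analyzer IDs into prebuilt and custom using the same `prebuilt-` prefix test as the loop above. `split_analyzers` is a hypothetical helper, and the sample list only imitates the shape of the `value` array.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_analyzers(
    analyzers: Iterable[Dict[str, Any]],
) -> Tuple[List[str], List[str]]:
    """Return (prebuilt_ids, custom_ids), each sorted alphabetically."""
    prebuilt: List[str] = []
    custom: List[str] = []
    for analyzer in analyzers:
        analyzer_id = analyzer.get("analyzerId", "")
        if analyzer_id.startswith("prebuilt-"):
            prebuilt.append(analyzer_id)
        else:
            custom.append(analyzer_id)
    return sorted(prebuilt), sorted(custom)


# Example entries; only the analyzerId key is used here.
sample_analyzers = [
    {"analyzerId": "prebuilt-invoice"},
    {"analyzerId": "notebooks_sample_management_1700000000"},
    {"analyzerId": "prebuilt-callCenter"},
]
prebuilt_ids, custom_ids = split_analyzers(sample_analyzers)
print(f"Prebuilt ({len(prebuilt_ids)}): {prebuilt_ids}")
print(f"Custom ({len(custom_ids)}): {custom_ids}")
```

With the real response, pass `response.get("value", [])` in place of `sample_analyzers`.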

## Get Analyzer Details by ID

Keep track of the analyzer ID when you create it. Use the ID to retrieve detailed analyzer definitions later.

In [None]:
import json

retrieved_analyzer = client.get_analyzer_detail_by_id(analyzer_id=analyzer_id)
print(f"✅ Analyzer '{analyzer_id}' retrieved successfully!")
print(f"   Description: {retrieved_analyzer.get('description')}")
print(f"   Status: {retrieved_analyzer.get('status')}")
print(f"   Created at: {retrieved_analyzer.get('createdAt')}")

# Print the full analyzer response
print("\n📄 Full Analyzer Details:")
print(json.dumps(retrieved_analyzer, indent=2))
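
The full JSON dump is complete but hard to scan. As a rough sketch, the hypothetical helper below condenses an analyzer definition into one row per field. It assumes the retrieved analyzer echoes the `fieldSchema` structure used at creation time, which this notebook does not confirm. The sample dictionary stands in for a real response.

```python
from typing import Any, Dict, List, Tuple


def summarize_fields(analyzer: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (field name, type, method) rows from an analyzer definition."""
    fields = analyzer.get("fieldSchema", {}).get("fields", {})
    rows: List[Tuple[str, str, str]] = []
    for name, spec in fields.items():
        field_type = spec.get("type", "?")
        if field_type == "array":
            # Show the element type for arrays, for example array[string].
            item_type = spec.get("items", {}).get("type", "?")
            field_type = f"array[{item_type}]"
        rows.append((name, field_type, spec.get("method", "-")))
    return rows


# Stand-in for a retrieved analyzer; assumes fieldSchema is returned as sent.
sample_analyzer = {
    "fieldSchema": {
        "fields": {
            "Summary": {"type": "string", "method": "generate"},
            "Topics": {"type": "array", "method": "generate", "items": {"type": "string"}},
            "Sentiment": {"type": "string", "method": "classify"},
        }
    }
}
for name, field_type, method in summarize_fields(sample_analyzer):
    print(f"{name:<12} {field_type:<16} {method}")
```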

## Delete an Analyzer
If you no longer need an analyzer, delete it using its ID.

In [None]:
# Clean up: delete the analyzer
# Note: You can leave the analyzer for later use if desired
print(f"🗑️  Deleting analyzer '{analyzer_id}'...")
client.delete_analyzer(analyzer_id=analyzer_id)
print(f"✅ Analyzer '{analyzer_id}' deleted successfully!")
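
If the notebook is interrupted before this cell runs, sample analyzers can accumulate in the resource. Since the IDs created here end in a Unix timestamp (`notebooks_sample_management_<seconds>`), leftover ones can be found by name alone. The sketch below only selects candidate IDs and does not delete anything. `stale_sample_ids` and the one-hour threshold are illustrative choices, not part of the sample client.

```python
from typing import Iterable, List

SAMPLE_PREFIX = "notebooks_sample_management_"


def stale_sample_ids(ids: Iterable[str], now: int, max_age_seconds: int) -> List[str]:
    """Return sample analyzer IDs whose timestamp suffix is older than the limit."""
    stale: List[str] = []
    for analyzer_id in ids:
        if not analyzer_id.startswith(SAMPLE_PREFIX):
            continue
        suffix = analyzer_id[len(SAMPLE_PREFIX):]
        if not (suffix.isascii() and suffix.isdigit()):
            continue  # Does not follow this notebook's naming scheme.
        if now - int(suffix) > max_age_seconds:
            stale.append(analyzer_id)
    return stale


# Example IDs with a fixed "now" so the result is reproducible.
example_ids = [
    "prebuilt-invoice",
    "notebooks_sample_management_1700000000",
    "notebooks_sample_management_1700007000",
    "my_custom_analyzer",
]
print(stale_sample_ids(example_ids, now=1700007200, max_age_seconds=3600))
```

In practice, `ids` could be built from the `analyzerId` values returned by `client.get_all_analyzers()` and `now` from `int(time.time())`. Each returned ID can then be passed to `client.delete_analyzer` after you have reviewed the list.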