# Local Development Setup

This notebook guides you through setting up your local development environment for the Azure Document Intelligence pipeline.

## Prerequisites

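- Azure Functions Core Tools v4
- Python 3.10+
- uv package manager
- Azure CLI, signed in with `az login` (used to retrieve values from deployed resources)
- A PowerShell kernel for running this notebook's cells
- Deployed Azure resources (see deployment notebooks)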
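Before running anything, it can help to confirm the command-line tools are on `PATH`. The sketch below is a hypothetical helper, not part of the project. It assumes the relevant executables are `func` (Core Tools), `python`, `uv` and `az`; `az` is included because later cells call the Azure CLI.

```powershell
# Hypothetical helper: report which prerequisite tools cannot be found on PATH.
# The default tool list is an assumption based on the commands this notebook runs.
function Test-Prerequisites {
    param(
        [string[]] $Tools = @('func', 'python', 'uv', 'az')
    )
    $missing = @()
    foreach ($tool in $Tools) {
        if (-not (Get-Command $tool -ErrorAction SilentlyContinue)) {
            $missing += $tool
        }
    }
    # The leading comma returns the array as a single object, even when empty
    return ,$missing
}

$missing = Test-Prerequisites
if ($missing.Count -eq 0) {
    Write-Host "All prerequisite tools found on PATH." -ForegroundColor Green
} else {
    Write-Host "Missing tools: $($missing -join ', ')" -ForegroundColor Red
}
```

This only checks that each command resolves. It does not check versions, so Core Tools v4 still needs to be confirmed with `func --version`.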

## Step 1: Install Dependencies

In [None]:
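# Navigate to project root. $PSScriptRoot is empty when this runs as a notebook cell,
# so walk up from the current directory to the first folder containing pyproject.toml.
$ProjectRoot = (Get-Location).Path
while ($ProjectRoot -and -not (Test-Path (Join-Path $ProjectRoot "pyproject.toml"))) {
    $ProjectRoot = Split-Path -Parent $ProjectRoot
}
if (-not $ProjectRoot) { throw "pyproject.toml not found above $(Get-Location)" }
Set-Location -Path $ProjectRoot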

Write-Host "Installing Python dependencies with UV..." -ForegroundColor Cyan
uv sync

Write-Host "`nDependencies installed!" -ForegroundColor Green
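To confirm that the interpreter uv selected for the project meets the Python 3.10+ prerequisite, you could parse the output of `uv run python --version`. `Test-PythonVersion` is a hypothetical helper. It assumes the version text looks like `Python 3.11.4`.

```powershell
# Hypothetical helper: true when text such as "Python 3.11.4" is at least the minimum version.
function Test-PythonVersion {
    param(
        [string] $VersionText,
        [version] $Minimum = [version]"3.10"
    )
    # Anything that does not look like "Python <major>.<minor>" counts as a failed check
    if ($VersionText -match 'Python (\d+)\.(\d+)') {
        $actual = [version]"$($Matches[1]).$($Matches[2])"
        return $actual -ge $Minimum
    }
    return $false
}

# Only run the check when uv is available
if (Get-Command uv -ErrorAction SilentlyContinue) {
    $versionText = uv run python --version | Out-String
    if (Test-PythonVersion -VersionText $versionText) {
        Write-Host "Project Python OK: $($versionText.Trim())" -ForegroundColor Green
    } else {
        Write-Host "Project Python does not meet 3.10+: $($versionText.Trim())" -ForegroundColor Red
    }
}
```

The comparison uses `[version]` rather than plain strings, so 3.9 correctly sorts below 3.10.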

## Step 2: Configure Environment Variables

The pipeline reads its settings from two files, and the cells below create both:

1. **`.env`** - used by development and testing scripts
2. **`local.settings.json`** - required by the Azure Functions runtime

In [None]:
# ============================================
# CONFIGURATION - UPDATE THESE VALUES
# ============================================

$SUBSCRIPTION_ID = "363ef5d1-0e77-4594-a530-f51af23dbf8c"

# Get these from Azure Portal or deployment outputs
$DOC_INTEL_ENDPOINT = "https://docservendpointdev.cognitiveservices.azure.com/"
$DOC_INTEL_API_KEY = "<YOUR_DOC_INTEL_KEY>"

$COSMOS_ENDPOINT = "https://cosmosdb-dlz-east2-sandbox.documents.azure.com:443/"
$COSMOS_DATABASE = "DocumentsDB"
$COSMOS_CONTAINER = "ExtractedDocuments"

$STORAGE_CONNECTION_STRING = "<YOUR_STORAGE_CONNECTION_STRING>"

$KEY_VAULT_NAME = "aiml-stack-keyvault-dev"

# ============================================

Write-Host "Configuration values set. Run next cell to create config files." -ForegroundColor Green
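A value left as an empty string or as a `<YOUR_...>` placeholder is easy to miss until a request fails. The sketch below uses a hypothetical helper to list such values before any files are written. The placeholder pattern is an assumption that matches the placeholders used in the cell above.

```powershell
# Hypothetical helper: return the names of values that are empty or still "<YOUR_...>" placeholders.
function Find-UnsetConfigValues {
    param([hashtable] $Config)
    $unset = @()
    foreach ($name in ($Config.Keys | Sort-Object)) {
        $value = [string]$Config[$name]
        if ([string]::IsNullOrWhiteSpace($value) -or $value -match '^<YOUR_.*>$') {
            $unset += $name
        }
    }
    return ,$unset
}

# Check the variables set in the configuration cell
$unset = Find-UnsetConfigValues @{
    SUBSCRIPTION_ID           = $SUBSCRIPTION_ID
    DOC_INTEL_ENDPOINT        = $DOC_INTEL_ENDPOINT
    DOC_INTEL_API_KEY         = $DOC_INTEL_API_KEY
    COSMOS_ENDPOINT           = $COSMOS_ENDPOINT
    COSMOS_DATABASE           = $COSMOS_DATABASE
    COSMOS_CONTAINER          = $COSMOS_CONTAINER
    STORAGE_CONNECTION_STRING = $STORAGE_CONNECTION_STRING
    KEY_VAULT_NAME            = $KEY_VAULT_NAME
}
if ($unset.Count -gt 0) {
    Write-Host "Still unset: $($unset -join ', ')" -ForegroundColor Yellow
} else {
    Write-Host "All configuration values are set." -ForegroundColor Green
}
```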

In [None]:
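# Create .env file in the project root. $PSScriptRoot is empty in notebook cells,
# so walk up from the current directory to the first folder containing pyproject.toml.
$ProjectRoot = (Get-Location).Path
while ($ProjectRoot -and -not (Test-Path (Join-Path $ProjectRoot "pyproject.toml"))) {
    $ProjectRoot = Split-Path -Parent $ProjectRoot
}
if (-not $ProjectRoot) { throw "pyproject.toml not found above $(Get-Location)" }
Set-Location -Path $ProjectRoot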

$envContent = @"
# Azure Subscription
AZURE_SUBSCRIPTION_ID=$SUBSCRIPTION_ID

# Document Intelligence
DOC_INTEL_ENDPOINT=$DOC_INTEL_ENDPOINT
DOC_INTEL_API_KEY=$DOC_INTEL_API_KEY

# Cosmos DB
COSMOS_ENDPOINT=$COSMOS_ENDPOINT
COSMOS_DATABASE=$COSMOS_DATABASE
COSMOS_CONTAINER=$COSMOS_CONTAINER

# Storage
AZURE_STORAGE_CONNECTION_STRING=$STORAGE_CONNECTION_STRING

# Key Vault
KEY_VAULT_NAME=$KEY_VAULT_NAME

# Optional settings
DEFAULT_MODEL_ID=prebuilt-layout
MAX_CONCURRENT_REQUESTS=10
LOG_LEVEL=INFO
"@

$envContent | Out-File -FilePath ".env" -Encoding utf8
Write-Host "Created .env file" -ForegroundColor Green
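Scripts in a PowerShell session do not read `.env` automatically. Here is a minimal sketch of a hypothetical `Import-DotEnv` helper that copies `KEY=VALUE` lines into the session's environment variables. It assumes the simple unquoted format written above. It splits each line only at the first `=`, because connection strings contain `=` themselves.

```powershell
# Hypothetical helper: load KEY=VALUE lines from a .env file into process environment variables.
# Skips blank lines and # comments; quoted values and multi-line values are not handled.
function Import-DotEnv {
    param([string] $Path = ".env")
    $loaded = @{}
    foreach ($line in Get-Content -Path $Path -Encoding utf8) {
        $trimmed = $line.Trim()
        if ($trimmed -eq '' -or $trimmed.StartsWith('#')) { continue }
        # Split at the first "=" only, so values like connection strings stay intact
        $parts = $trimmed -split '=', 2
        if ($parts.Count -ne 2) { continue }
        $name = $parts[0].Trim()
        $value = $parts[1].Trim()
        [Environment]::SetEnvironmentVariable($name, $value, 'Process')
        $loaded[$name] = $value
    }
    return $loaded
}
```

From the project root, `Import-DotEnv` followed by `$env:DOC_INTEL_ENDPOINT` should print the endpoint written above.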

In [None]:
# Create local.settings.json for Azure Functions
$localSettings = @{
    IsEncrypted = $false
    Values = @{
        AzureWebJobsStorage = $STORAGE_CONNECTION_STRING
        FUNCTIONS_WORKER_RUNTIME = "python"
        DOC_INTEL_ENDPOINT = $DOC_INTEL_ENDPOINT
        DOC_INTEL_API_KEY = $DOC_INTEL_API_KEY
        COSMOS_ENDPOINT = $COSMOS_ENDPOINT
        COSMOS_DATABASE = $COSMOS_DATABASE
        COSMOS_CONTAINER = $COSMOS_CONTAINER
        KEY_VAULT_NAME = $KEY_VAULT_NAME
        DEFAULT_MODEL_ID = "prebuilt-layout"
        MAX_CONCURRENT_REQUESTS = "10"
        LOG_LEVEL = "INFO"
    }
}

$localSettingsPath = "src/functions/local.settings.json"
$localSettings | ConvertTo-Json -Depth 3 | Out-File -FilePath $localSettingsPath -Encoding utf8

Write-Host "Created $localSettingsPath" -ForegroundColor Green
Write-Host "`nIMPORTANT: .env and local.settings.json contain secrets. Make sure both are listed in .gitignore and never commit them." -ForegroundColor Yellow
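You could read the generated file back to confirm that it parses and that no required setting is blank or still a placeholder. The sketch below is a hypothetical helper. Its required list is an assumption: the settings the Configuration Reference marks as required, plus `FUNCTIONS_WORKER_RUNTIME`.

```powershell
# Hypothetical helper: list required settings in local.settings.json that are missing or placeholders.
function Test-LocalSettings {
    param(
        [string] $Path = "src/functions/local.settings.json",
        [string[]] $Required = @(
            'AzureWebJobsStorage', 'FUNCTIONS_WORKER_RUNTIME',
            'DOC_INTEL_ENDPOINT', 'DOC_INTEL_API_KEY',
            'COSMOS_ENDPOINT', 'COSMOS_DATABASE', 'COSMOS_CONTAINER'
        )
    )
    $settings = Get-Content -Path $Path -Raw | ConvertFrom-Json
    $values = $settings.Values
    $missing = @()
    foreach ($name in $Required) {
        $value = $values.$name
        if ([string]::IsNullOrWhiteSpace([string]$value) -or $value -like '<YOUR_*') {
            $missing += $name
        }
    }
    return ,$missing
}
```

Run from the project root, `Test-LocalSettings` should return an empty list once every required value is filled in.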

## Alternative: Get Values from Deployed Resources

If you've already deployed resources, you can retrieve the values automatically.

In [None]:
# Retrieve values from deployed resources
$SUBSCRIPTION_ID = "363ef5d1-0e77-4594-a530-f51af23dbf8c"
az account set --subscription $SUBSCRIPTION_ID

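# Resource names and the resource groups that contain them
$DOC_INTEL_NAME = "docservendpointdev"
$DOC_INTEL_RG = "rg-dlz-aiml-stack-dev"
$COSMOS_NAME = "cosmosdb-dlz-east2-sandbox"
$COSMOS_RG = "rg-dlz-cosmosdb-east2-sandbox"
$STORAGE_NAME = "aimldatastore"
$STORAGE_RG = "rg-dlz-aiml-stack-dev"

# Values the CLI calls below do not return; keep them in sync with the CONFIGURATION cell
$COSMOS_DATABASE = "DocumentsDB"
$COSMOS_CONTAINER = "ExtractedDocuments"
$KEY_VAULT_NAME = "aiml-stack-keyvault-dev"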

Write-Host "Retrieving values from deployed resources..." -ForegroundColor Cyan

# Get Document Intelligence values
$DOC_INTEL_ENDPOINT = az cognitiveservices account show `
    --name $DOC_INTEL_NAME `
    --resource-group $DOC_INTEL_RG `
    --query properties.endpoint -o tsv

$DOC_INTEL_API_KEY = az cognitiveservices account keys list `
    --name $DOC_INTEL_NAME `
    --resource-group $DOC_INTEL_RG `
    --query key1 -o tsv

# Get Cosmos DB endpoint
$COSMOS_ENDPOINT = az cosmosdb show `
    --name $COSMOS_NAME `
    --resource-group $COSMOS_RG `
    --query documentEndpoint -o tsv

# Get Storage connection string
$STORAGE_CONNECTION_STRING = az storage account show-connection-string `
    --name $STORAGE_NAME `
    --resource-group $STORAGE_RG `
    --query connectionString -o tsv

Write-Host "`nRetrieved values:" -ForegroundColor Green
Write-Host "  Doc Intel Endpoint: $DOC_INTEL_ENDPOINT"
Write-Host "  Cosmos Endpoint: $COSMOS_ENDPOINT"
Write-Host "  Storage Account: $STORAGE_NAME"

Write-Host "`nNow rerun the cells that create .env and local.settings.json. Skip the CONFIGURATION cell, which would overwrite these values." -ForegroundColor Yellow
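The cell above stores whatever `az` prints. If a call fails (for example, because you are not signed in), the variable is silently left empty and the problem only appears later. Here is a sketch of a hypothetical `Invoke-Checked` wrapper for native commands that throws on a non-zero exit code or empty output.

```powershell
# Hypothetical helper: run a native command (such as az) and fail loudly on error or empty output.
# Intended for native executables only, since it relies on $LASTEXITCODE.
function Invoke-Checked {
    param(
        [Parameter(Mandatory)] [scriptblock] $Command,
        [string] $Description = "command"
    )
    $output = & $Command
    if ($LASTEXITCODE -ne 0) {
        throw "$Description failed with exit code $LASTEXITCODE"
    }
    if ([string]::IsNullOrWhiteSpace(($output | Out-String))) {
        throw "$Description returned no output"
    }
    return $output
}
```

For example: `$DOC_INTEL_ENDPOINT = Invoke-Checked -Description "Doc Intel endpoint" -Command { az cognitiveservices account show --name $DOC_INTEL_NAME --resource-group $DOC_INTEL_RG --query properties.endpoint -o tsv }`.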

## Step 3: Run Functions Locally

In [None]:
# Start Azure Functions locally
Write-Host "Starting Azure Functions..." -ForegroundColor Cyan
Write-Host "Press Ctrl+C to stop" -ForegroundColor Yellow
Write-Host "`nEndpoints:" -ForegroundColor Green
Write-Host "  Health: http://localhost:7071/api/health"
Write-Host "  Process: http://localhost:7071/api/process"
Write-Host "  Status: http://localhost:7071/api/status/{blob_name}"

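# $PSScriptRoot is empty in notebook cells, so locate the project root (the first folder
# above the current directory containing pyproject.toml) and move into src/functions.
$ProjectRoot = (Get-Location).Path
while ($ProjectRoot -and -not (Test-Path (Join-Path $ProjectRoot "pyproject.toml"))) {
    $ProjectRoot = Split-Path -Parent $ProjectRoot
}
if (-not $ProjectRoot) { throw "pyproject.toml not found above $(Get-Location)" }
Set-Location -Path (Join-Path $ProjectRoot "src/functions")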
func start

## Step 4: Test Local Endpoints

`func start` blocks the cell above while the host runs, so run the following tests from a separate terminal or notebook.
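The host can take a while to start. The tests can first wait until the health endpoint answers. `Wait-ForEndpoint` is a hypothetical helper, and its default URL is the health endpoint printed by the Step 3 cell.

```powershell
# Hypothetical helper: poll an endpoint until it responds or the timeout expires.
function Wait-ForEndpoint {
    param(
        [string] $Uri = "http://localhost:7071/api/health",
        [int] $TimeoutSeconds = 60,
        [int] $IntervalSeconds = 2
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        try {
            Invoke-RestMethod -Uri $Uri -Method Get -TimeoutSec 5 | Out-Null
            return $true
        } catch {
            # Host not ready yet (connection refused or error status); try again shortly
            Start-Sleep -Seconds $IntervalSeconds
        }
    }
    return $false
}
```

`if (Wait-ForEndpoint) { ... }` then runs the tests only after the host is up. Note that an endpoint returning an error status is treated as "not ready".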

In [None]:
# Test health endpoint
Write-Host "Testing health endpoint..." -ForegroundColor Cyan

try {
    $response = Invoke-RestMethod -Uri "http://localhost:7071/api/health" -Method Get
    Write-Host "Health check passed!" -ForegroundColor Green
    $response | ConvertTo-Json
} catch {
    Write-Host "Health check failed. Is the function running?" -ForegroundColor Red
    Write-Host "Error: $_" -ForegroundColor Red
}
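The process test in the next cell needs a SAS URL for a PDF that is already in storage. Here is a sketch that uploads a local file and prints a read-only SAS URL using `az storage blob upload` and `az storage blob generate-sas`. These assumptions apply: the `pdfs` container already exists, the blob name mirrors `blobName` in the sample request, and `$STORAGE_CONNECTION_STRING` was set by an earlier cell. `New-TestBlobSasUrl` and `Get-SasExpiry` are hypothetical helper names.

```powershell
# Hypothetical helper: UTC expiry timestamp in the form az storage expects (e.g. 2024-01-31T18:00Z).
function Get-SasExpiry {
    param([int] $Hours = 1)
    # Invariant culture keeps ":" from being replaced by a locale-specific time separator
    $invariant = [System.Globalization.CultureInfo]::InvariantCulture
    return (Get-Date).ToUniversalTime().AddHours($Hours).ToString("yyyy-MM-dd'T'HH:mm'Z'", $invariant)
}

# Hypothetical helper: upload a local PDF, then return a read-only, HTTPS-only SAS URL for it.
# Assumes $STORAGE_CONNECTION_STRING is set and the container already exists.
function New-TestBlobSasUrl {
    param(
        [Parameter(Mandatory)] [string] $LocalPdf,
        [string] $Container = "pdfs",
        [string] $BlobName = "incoming/test.pdf"
    )
    az storage blob upload --connection-string $STORAGE_CONNECTION_STRING `
        --container-name $Container --name $BlobName --file $LocalPdf --overwrite | Out-Null
    az storage blob generate-sas --connection-string $STORAGE_CONNECTION_STRING `
        --container-name $Container --name $BlobName `
        --permissions r --expiry (Get-SasExpiry) --https-only --full-uri -o tsv
}
```

For example, `$sasUrl = New-TestBlobSasUrl -LocalPdf "./test.pdf"`, then use `$sasUrl` as `blobUrl` in the request below.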

In [None]:
# Test process endpoint with a document
# NOTE: You need a blob URL with a SAS token for a PDF in your storage account

$testRequest = @{
    blobUrl = "https://yourstorage.blob.core.windows.net/pdfs/incoming/test.pdf?sv=..."
    blobName = "incoming/test.pdf"
    modelId = "prebuilt-layout"
}

Write-Host "Testing process endpoint..." -ForegroundColor Cyan
Write-Host "Request:" -ForegroundColor Yellow
$testRequest | ConvertTo-Json

try {
    $response = Invoke-RestMethod `
        -Uri "http://localhost:7071/api/process" `
        -Method Post `
        -ContentType "application/json" `
        -Body ($testRequest | ConvertTo-Json)
    
    Write-Host "`nResponse:" -ForegroundColor Green
    $response | ConvertTo-Json
} catch {
    Write-Host "Process failed: $_" -ForegroundColor Red
}
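The status endpoint listed in Step 3 can be checked the same way. This sketch assumes the route takes the same `blobName` that was sent to `/api/process`, URL-encoded so that the `/` in `incoming/test.pdf` is kept within one path segment. Whether the function expects that exact form is an assumption, so adjust it to match the route definition. `Get-StatusUri` is a hypothetical helper.

```powershell
# Hypothetical helper: build the local status URL for a blob name (URL-encoded).
function Get-StatusUri {
    param(
        [string] $BlobName,
        [string] $BaseUri = "http://localhost:7071/api"
    )
    return "$BaseUri/status/$([uri]::EscapeDataString($BlobName))"
}

$statusUri = Get-StatusUri -BlobName "incoming/test.pdf"
Write-Host "Checking $statusUri" -ForegroundColor Cyan
try {
    Invoke-RestMethod -Uri $statusUri -Method Get | ConvertTo-Json
} catch {
    Write-Host "Status check failed: $_" -ForegroundColor Red
}
```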

## Configuration Reference

| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| `DOC_INTEL_ENDPOINT` | Document Intelligence endpoint URL | Yes | - |
| `DOC_INTEL_API_KEY` | Document Intelligence API key | Yes | - |
| `COSMOS_ENDPOINT` | Cosmos DB endpoint URL | Yes | - |
| `COSMOS_DATABASE` | Cosmos DB database name | Yes | - |
| `COSMOS_CONTAINER` | Cosmos DB container name | Yes | - |
| `AzureWebJobsStorage` | Storage connection string (`AZURE_STORAGE_CONNECTION_STRING` in `.env`) | Yes | - |
| `FUNCTIONS_WORKER_RUNTIME` | Functions language worker; must be `python` (`local.settings.json` only) | Yes | - |
| `KEY_VAULT_NAME` | Key Vault name for secrets | No | - |
| `MAX_CONCURRENT_REQUESTS` | Maximum concurrent Document Intelligence requests | No | `10` |
| `DEFAULT_MODEL_ID` | Default Document Intelligence model | No | `prebuilt-layout` |
| `LOG_LEVEL` | Logging level | No | `INFO` |

## Next Steps

- **Run tests** - See `04-Testing-Linting.ipynb`
- **Deploy to Azure** - See deployment notebooks
- **Configure Synapse pipeline** - See `05-Synapse-Pipeline.ipynb`