# Token Rate Limiting Lab

This lab shows how to implement **token-based rate limiting** in front of Azure OpenAI using the `llm-token-limit` policy in Azure API Management (APIM).

## What You'll Learn
- Deploy Azure APIM (BasicV2 SKU) as an AI Gateway
- Deploy Azure OpenAI with the GPT-4o-mini model
- Configure token-based rate limiting at 500 tokens per minute (TPM)
- Monitor and test rate limiting behavior
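
To make the idea of a tokens-per-minute budget concrete, here is a minimal sketch of a fixed-window token counter. It only illustrates the concept and is not how APIM implements `llm-token-limit`. The class `TokenWindowLimiter`, its methods, and the `simulate` helper are hypothetical names, and the 120 tokens per request is an assumed figure.

```python
import time


class TokenWindowLimiter:
    """Hypothetical fixed-window counter illustrating a tokens-per-minute budget."""

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def _roll_window(self):
        # Start a fresh window once 60 seconds have passed.
        now = self.clock()
        if now - self.window_start >= 60:
            self.window_start = now
            self.used = 0

    def allow(self):
        """Return True while the current window still has budget left."""
        self._roll_window()
        return self.used < self.limit

    def record(self, tokens):
        """Add the tokens a completed request consumed."""
        self._roll_window()
        self.used += tokens

    def retry_after(self):
        """Seconds until the current window resets."""
        return max(0.0, 60 - (self.clock() - self.window_start))


def simulate(limit=500, tokens_per_request=120, requests=10):
    """Send requests back to back (frozen clock) and report which were admitted."""
    limiter = TokenWindowLimiter(limit, clock=lambda: 0.0)
    outcomes = []
    for _ in range(requests):
        if limiter.allow():
            limiter.record(tokens_per_request)
            outcomes.append(True)
        else:
            outcomes.append(False)
    return outcomes
```

Under these assumptions, `simulate()` admits requests until the running total passes 500 tokens and rejects the rest of the window, which is the pattern Step 6 looks for against the real gateway.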

## Prerequisites

- Azure subscription with Contributor access
- Azure CLI installed and configured
- Python 3.8+ with pip (Step 6 uses the `requests` package)

## Step 1: Log in to Azure

In [None]:
!az login

## Step 2: Set Configuration Variables

In [None]:
import os
import random
import string

# Generate unique suffix
suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))

# Configuration
RESOURCE_GROUP = "lab-token-rate-limiting"
LOCATION = "swedencentral"
APIM_NAME = f"apim-tokenratelimit-{suffix}"
OPENAI_NAME = f"openai-tokenratelimit-{suffix}"
MODEL_NAME = "gpt-4o-mini"
TPM_LIMIT = 500

print(f"APIM Name: {APIM_NAME}")
print(f"OpenAI Name: {OPENAI_NAME}")

## Step 3: Create Resource Group

In [None]:
!az group create --name {RESOURCE_GROUP} --location {LOCATION}

## Step 4: Deploy Infrastructure

The `main.bicep` template deploys APIM (BasicV2) and Azure OpenAI with a 500 TPM rate limit.

In [None]:
!az deployment group create --resource-group {RESOURCE_GROUP} --template-file main.bicep --parameters apimName={APIM_NAME} openAiName={OPENAI_NAME} location={LOCATION}
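
The contents of `main.bicep` are not shown in this lab, so the following is only a sketch of what an APIM policy document using `llm-token-limit` might look like, built in Python so that it can be inspected from the notebook. The helper name `build_token_limit_policy` is hypothetical, and keying the counter on the subscription ID is an assumption; the actual template may use a different key or additional attributes.

```python
import xml.etree.ElementTree as ET


def build_token_limit_policy(tokens_per_minute=500):
    """Return an APIM policy document (as a string) with an llm-token-limit rule.

    Assumption: the counter is tracked per APIM subscription.
    """
    policies = ET.Element("policies")

    inbound = ET.SubElement(policies, "inbound")
    ET.SubElement(inbound, "base")
    ET.SubElement(inbound, "llm-token-limit", {
        "counter-key": "@(context.Subscription.Id)",
        "tokens-per-minute": str(tokens_per_minute),
        "estimate-prompt-tokens": "false",
    })

    # The remaining sections only inherit from the parent scope.
    for section in ("backend", "outbound", "on-error"):
        ET.SubElement(ET.SubElement(policies, section), "base")

    return ET.tostring(policies, encoding="unicode")


policy_xml = build_token_limit_policy(500)
print(policy_xml)
```

Comparing this output with the policy shown for the API in the Azure portal is a quick way to confirm which limit the deployment applied.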

## Step 5: Get the APIM Gateway URL and Subscription Key

In [None]:
import subprocess, json

def az(*args):
    """Run an az CLI command and return its stdout, raising if the command fails."""
    result = subprocess.run(["az", *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()

# Gateway URL of the APIM instance
GATEWAY_URL = az("apim", "show", "--name", APIM_NAME, "--resource-group", RESOURCE_GROUP, "--query", "gatewayUrl", "-o", "tsv")

# Primary key of the subscription created by the deployment
keys_json = az("apim", "subscription", "keys", "list", "--resource-group", RESOURCE_GROUP, "--service-name", APIM_NAME, "--subscription-id", "aoai-subscription", "-o", "json")
API_KEY = json.loads(keys_json).get("primaryKey", "")

print(f"Gateway URL: {GATEWAY_URL}")
# Print only the last four characters so the key is not stored in the notebook output
print(f"API Key: ...{API_KEY[-4:]}")

## Step 6: Test Rate Limiting

In [None]:
import requests, time

endpoint = f"{GATEWAY_URL}/openai/deployments/{MODEL_NAME}/chat/completions?api-version=2024-02-01"
headers = {"api-key": API_KEY, "Content-Type": "application/json"}
payload = {"messages": [{"role": "user", "content": "Say hello in 50 words"}], "max_tokens": 100}

for i in range(10):
    r = requests.post(endpoint, headers=headers, json=payload, timeout=30)
    if r.status_code == 200:
        print(f"Request {i+1}: ✅ Success")
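    elif r.status_code == 429:
        # Show how long the gateway asks the client to wait, if it says so
        print(f"Request {i+1}: ⚠️ Rate limited (Retry-After: {r.headers.get('Retry-After', 'n/a')})")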
    else:
        print(f"Request {i+1}: ❌ Error {r.status_code}")
    time.sleep(0.5)
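
Once the loop above starts returning 429, a client would normally wait before trying again. The following is a minimal sketch of that pattern, assuming the gateway sends a `Retry-After` header with a number of seconds; the helper `post_with_retry` and its parameters are hypothetical and not part of the lab's code.

```python
import time


def post_with_retry(send, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    `send` is any zero-argument callable that returns an object with
    `status_code` and `headers`, for example a lambda wrapping requests.post.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts, hand the last 429 back to the caller
        raw = response.headers.get("Retry-After", "")
        try:
            wait = float(raw)
        except ValueError:
            # Header missing or not a plain number of seconds
            wait = default_wait
        sleep(max(wait, 0.0))
    return response
```

With the variables from the cell above, it could be used as `r = post_with_retry(lambda: requests.post(endpoint, headers=headers, json=payload, timeout=30))`. Passing the request as a callable keeps the helper independent of `requests`, which also makes it easy to exercise with a fake response.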

## Step 7: Run Dashboard

```bash
pip install -r dashboard/requirements.txt
cd dashboard && streamlit run app.py
```

The dashboard opens at http://localhost:8501 (Streamlit's default port).

## Cleanup

In [None]:
# Uncomment to delete the resource group and every resource in it
# !az group delete --name {RESOURCE_GROUP} --yes --no-wait