# APIM ❤️ OpenAI

## Model routing lab
![flow](../../images/model-routing.gif)

Playground to try routing to a backend based on Azure OpenAI model and version.

### TOC
- [0️⃣ Initialize notebook variables](#0)
- [1️⃣ Create the Azure Resource Group](#1)
- [2️⃣ Create deployment using 🦾 Bicep](#2)
- [3️⃣ Get the deployment outputs](#3)
- [🧪 Test the API using a direct HTTP call](#requests)
- [🧪 Test the API using the Azure OpenAI Python SDK](#sdk)
- [🔍 Open the workbook in the Azure Portal](#portal)
- [🗑️ Clean up resources](#clean)

### Backlog
- Improve the notebook

### Prerequisites
- [Python 3.8 or later version](https://www.python.org/) installed
- [Pandas Library](https://pandas.pydata.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) installed
- [An Azure Subscription](https://azure.microsoft.com/en-us/free/) with Contributor permissions
- [Access granted to Azure OpenAI](https://aka.ms/oai/access)
- [Sign in to Azure with Azure CLI](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli-interactively)

<a id='0'></a>
### 0️⃣ Initialize notebook variables

- Resources will be suffixed by a unique string based on your subscription id
- Adjust the location parameters according to your preferences and the [product availability by Azure region](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management).
- Adjust the OpenAI model and version according to the [availability by region](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models).
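
The cell below derives each deployment name and backend pool name from a model name and version. A minimal sketch of that naming convention, assuming a hypothetical helper `build_model_config` (not part of the lab), could look like this:

```python
from typing import Any, Dict, List

def build_model_config(model_name: str, model_version: str, locations: List[str]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the naming convention used in this notebook."""
    deployment_name = f"{model_name}-{model_version}"
    return {
        "deployment_name": deployment_name,
        "backend_pool": f"oai-backend-pool-{deployment_name}",
        # One Azure OpenAI resource per region, named oai-<location>
        "resources": [{"name": f"oai-{loc}", "location": loc} for loc in locations],
    }

config = build_model_config("gpt-35-turbo", "1106", ["francecentral", "swedencentral"])
print(config["deployment_name"])
print(config["backend_pool"])
```

Keeping the name, version, and regions together in one place may make it easier to swap a model when regional availability changes.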

In [2]:
import os
import json
import datetime
import requests

deployment_name = os.path.basename(os.path.dirname(globals()['__vsc_ipynb_file__']))
index = '1'  # Set a matching value here and in clean-up-resources.ipynb if you want separate instances of the lab. This is helpful when tearing down resources and mitigating API Management's soft delete.
resource_group_name = f"lab-{deployment_name}{index}" # change the name to match your naming style
resource_group_location = "westeurope"

# Define three OpenAI model and version combinations
# https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-35
# Please note that availability of models and versions is variable and that you may need to adjust the model and version names to match the available models and versions in your Azure subscription.
# For this lab, we are using the following combinations based on PayGo availability on June 21, 2024:
#
#   1) GPT-3.5 Turbo 1106: France Central, Sweden Central
#   2) GPT-3.5 Turbo 0125: North Central US, South Central US
#   3) GPT-4o 2024-05-13: East US, West US

openai_model_gpt_35_turbo_0125 = { "name": "gpt-35-turbo", "version": "0125" }
openai_model_gpt_35_turbo_1106 = { "name": "gpt-35-turbo", "version": "1106" }
openai_model_gpt_4o_20240513 = { "name": "gpt-4o", "version": "2024-05-13" }

openai_model_1_name = "gpt-35-turbo"
openai_model_1_version = "1106"
openai_deployment_1_name = f"{openai_model_1_name}-{openai_model_1_version}"
openai_resources_1 = [ {"name": "oai-francecentral", "location": "francecentral"}, {"name": "oai-swedencentral", "location": "swedencentral"} ]

openai_model_2_name = "gpt-35-turbo"
openai_model_2_version = "0125"
openai_deployment_2_name = f"{openai_model_2_name}-{openai_model_2_version}"
openai_resources_2 = [ {"name": "oai-northcentralus", "location": "northcentralus"}, {"name": "oai-southcentralus", "location": "southcentralus"} ]

openai_model_3_name = "gpt-4o"
openai_model_3_version = "2024-05-13"
openai_deployment_3_name = f"{openai_model_3_name}-{openai_model_3_version}"
openai_resources_3 = [ {"name": "oai-eastus", "location": "eastus"}, {"name": "oai-westus", "location": "westus"} ]

# Define Azure OpenAI resources
openai_resources_sku = "S0"
openai_api_version = "2024-02-01"
openai_specification_url='https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/' + openai_api_version + '/inference.json'

# Define Azure API Management
apim_resource_name = "apim"
apim_resource_location = "westeurope"
apim_resource_sku = "Basicv2"

# Define the Azure OpenAI backends and backend pools per Azure OpenAI model and version
openai_backend_pool_1 = f"oai-backend-pool-{openai_deployment_1_name}"
openai_backend_pool_2 = f"oai-backend-pool-{openai_deployment_2_name}"
openai_backend_pool_3 = f"oai-backend-pool-{openai_deployment_3_name}"

log_analytics_name = "workspace"
app_insights_name = 'insights'


<a id='1'></a>
### 1️⃣ Create the Azure Resource Group
All resources deployed in this lab will be created in the specified resource group. Skip this step if you want to use an existing resource group.

In [None]:
# type: ignore
resource_group_stdout = ! az group create --name {resource_group_name} --location {resource_group_location} 

if resource_group_stdout.n.startswith("ERROR"):
    print(resource_group_stdout)
else:
    print("✅ Azure Resource Group ", resource_group_name, " created ⌚ ", datetime.datetime.now().time())

<a id='2'></a>
### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources that will be deployed. Change the parameters or the [main.bicep](main.bicep) directly to try different configurations.
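
The deployment cell below fills the `{backend-id-N}` placeholders in `policy.xml` before calling Bicep. As a rough sketch of that substitution step, assuming a hypothetical `render_policy` helper and a made-up template fragment (the real `policy.xml` is not reproduced here), the logic could be written so that any unfilled placeholder is reported early:

```python
import re
from typing import Dict

def render_policy(template: str, backend_ids: Dict[str, str]) -> str:
    """Replace {backend-id-N} placeholders; hypothetical helper, not part of the lab."""
    rendered = template
    for placeholder, backend_id in backend_ids.items():
        rendered = rendered.replace("{" + placeholder + "}", backend_id)
    # Fail early if a placeholder was left unfilled
    leftover = re.findall(r"\{backend-id-\d+\}", rendered)
    if leftover:
        raise ValueError(f"Unfilled placeholders: {leftover}")
    return rendered

# Made-up template fragment, for illustration only
template = '<set-backend-service backend-id="{backend-id-1}" />'
print(render_policy(template, {"backend-id-1": "oai-backend-pool-gpt-35-turbo-1106"}))
```

Rendering into a string rather than editing the file in place would also avoid having to restore the template afterwards, at the cost of passing the policy to the deployment some other way.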

In [None]:
# Route to the backend pool when a model has several resources, otherwise to its single backend.
# Each resource list must contain at least one entry, or the matching backend_id stays undefined.
if len(openai_resources_1) > 0:
    backend_id_1 = openai_backend_pool_1 if len(openai_resources_1) > 1 else openai_resources_1[0].get("name")
if len(openai_resources_2) > 0:
    backend_id_2 = openai_backend_pool_2 if len(openai_resources_2) > 1 else openai_resources_2[0].get("name")
if len(openai_resources_3) > 0:
    backend_id_3 = openai_backend_pool_3 if len(openai_resources_3) > 1 else openai_resources_3[0].get("name")

# Read the policy template and fill in the backend placeholders
with open("policy.xml", 'r') as policy_xml_file:
    policy_template_xml = policy_xml_file.read()
policy_xml = policy_template_xml.replace("{backend-id-1}", backend_id_1).replace("{backend-id-2}", backend_id_2).replace("{backend-id-3}", backend_id_3)
with open("policy.xml", 'w') as policy_xml_file:
    policy_xml_file.write(policy_xml)

bicep_parameters = {
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "openAIBackendPoolName_1": { "value": openai_backend_pool_1 },
    "openAIBackendPoolName_2": { "value": openai_backend_pool_2 },
    "openAIBackendPoolName_3": { "value": openai_backend_pool_3 },
    "openAIConfig_1": { "value": openai_resources_1 },
    "openAIConfig_2": { "value": openai_resources_2 },
    "openAIConfig_3": { "value": openai_resources_3 },
    "openAIDeploymentName_1": { "value": openai_deployment_1_name },
    "openAIDeploymentName_2": { "value": openai_deployment_2_name },
    "openAIDeploymentName_3": { "value": openai_deployment_3_name },
    "openAISku": { "value": openai_resources_sku },
    "openAIModelName_1": { "value": openai_model_1_name },
    "openAIModelName_2": { "value": openai_model_2_name },
    "openAIModelName_3": { "value": openai_model_3_name },
    "openAIModelVersion_1": { "value": openai_model_1_version },
    "openAIModelVersion_2": { "value": openai_model_2_version },
    "openAIModelVersion_3": { "value": openai_model_3_version },
    "openAIModelCapacity": { "value": 2 },
    "openAIAPISpecURL": { "value": openai_specification_url },
    "apimResourceName": { "value": apim_resource_name},
    "apimResourceLocation": { "value": apim_resource_location},
    "apimSku": { "value": apim_resource_sku},
    "logAnalyticsName": { "value": log_analytics_name },
    "applicationInsightsName": { "value": app_insights_name },
    "index": { "value": index}
  }
}
with open('params.json', 'w') as bicep_parameters_file:
    bicep_parameters_file.write(json.dumps(bicep_parameters))

! az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file "main.bicep" --parameters "params.json"

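# Restore the policy template so the placeholders are available for the next run
with open("policy.xml", 'w') as policy_xml_file:
    policy_xml_file.write(policy_template_xml)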


<a id='3'></a>
### 3️⃣ Get the deployment outputs

We now only need to retrieve the gateway URL and the subscription key before we are ready for testing.
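
The cell below runs one `az deployment group show` call per output. As an alternative sketch, assuming the outputs are fetched once with `--query properties.outputs -o json`, the resulting JSON (an object mapping each output name to a `type` and a `value`) could be flattened in Python. The `flatten_outputs` helper and the sample values are hypothetical:

```python
import json
from typing import Any, Dict

def flatten_outputs(raw_json: str) -> Dict[str, Any]:
    """Map each deployment output name to its value; hypothetical helper."""
    outputs = json.loads(raw_json)
    return {name: entry.get("value") for name, entry in outputs.items()}

# Sample payload with placeholder values, for illustration only
sample = (
    '{"apimResourceGatewayURL": {"type": "String", "value": "https://example.azure-api.net"},'
    ' "apimSubscriptionKey": {"type": "String", "value": "<redacted>"}}'
)
flat = flatten_outputs(sample)
print(flat["apimResourceGatewayURL"])
```

In the notebook, the input would presumably be the `.n` string of the captured CLI output, which would reduce four CLI round trips to one.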

In [4]:
# type: ignore
deployment_stdout = ! az deployment group show --name {deployment_name} -g {resource_group_name} --query properties.outputs.apimSubscriptionKey.value -o tsv
apim_subscription_key = deployment_stdout.n

# type: ignore
deployment_stdout = ! az deployment group show --name {deployment_name} -g {resource_group_name} --query properties.outputs.apimResourceGatewayURL.value -o tsv
apim_resource_gateway_url = deployment_stdout.n
print("👉🏻 API Gateway URL: ", apim_resource_gateway_url)

# type: ignore
deployment_stdout = ! az deployment group show --name {deployment_name} -g {resource_group_name} --query properties.outputs.logAnalyticsWorkspaceId.value -o tsv
workspace_id = deployment_stdout.n
print("👉🏻 Workspace ID: ", workspace_id)

# type: ignore
deployment_stdout = ! az deployment group show --name {deployment_name} -g {resource_group_name} --query properties.outputs.applicationInsightsAppId.value -o tsv
app_id = deployment_stdout.n
print("👉🏻 App ID: ", app_id)



👉🏻 API Gateway URL:  https://apim-qdp4k6fhzarv4.azure-api.net
👉🏻 Workspace ID:  f71c1365-1228-4263-8118-40404d5f1ac3
👉🏻 App ID:  46a7cd8e-e5a2-4585-94bb-65510d239a34


<a id='requests'></a>
### 🧪 Test the API using a direct HTTP call
Requests is an elegant and simple HTTP library for Python that will be used here to make raw API requests and inspect the responses.
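
The cell below prints the `x-ms-region` response header, which indicates the region of the backend that served the request. Counting its values over several runs gives a quick view of how requests were spread. A minimal sketch, assuming the headers of each response were collected in a list (the `tally_regions` helper and the sample data are hypothetical):

```python
from collections import Counter
from typing import Iterable, Mapping

def tally_regions(headers_per_response: Iterable[Mapping[str, str]]) -> Counter:
    """Count responses per backend region; hypothetical helper."""
    return Counter(h.get("x-ms-region", "unknown") for h in headers_per_response)

# Sample headers, for illustration only
sample_headers = [
    {"x-ms-region": "Sweden Central"},
    {"x-ms-region": "France Central"},
    {"x-ms-region": "Sweden Central"},
    {},  # e.g. an error response without the header
]
for region, count in tally_regions(sample_headers).most_common():
    print(f"{region}: {count}")
```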

In [15]:
import time
runs = 10
sleep_time_ms = 1000

# Initialize a session for connection pooling
session = requests.Session()

def make_api_request(deployment_name, openai_resources, apim_subscription_key, apim_resource_gateway_url, openai_api_version, sleep_time_ms):
    if len(openai_resources) > 0:
        print("APIM Subscription Key:", apim_subscription_key)
        
        messages = {
            "messages": [
                {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
                {"role": "user", "content": "Can you tell me the time, please?"}
            ]
        }
        url = f"{apim_resource_gateway_url}/openai/deployments/{deployment_name}/chat/completions?api-version={openai_api_version}"
        print("url: ", url)

        start_time = time.time()
        response = session.post(url, headers={'api-key': apim_subscription_key}, json=messages)
        response_time = time.time() - start_time
        
        print(f"⌚ {response_time:.2f} seconds")
        # Check the response status code and apply formatting
        if 200 <= response.status_code < 300:
            status_code_str = '\x1b[1;32m' + str(response.status_code) + " - " + response.reason + '\x1b[0m'  # Bold and green
        elif response.status_code >= 400:
            status_code_str = '\x1b[1;31m' + str(response.status_code) + " - " + response.reason + '\x1b[0m'  # Bold and red
        else:
            status_code_str = str(response.status_code)  # No formatting

        # Print the response status with the appropriate formatting
        print("Response status:", status_code_str)
    
        print("Response headers:", response.headers)
        print("x-ms-region:", response.headers.get("x-ms-region"))  # Useful to determine the region of the backend
        if response.status_code == 200:
            data = response.json()
            print("\nresponse: ", data.get("choices")[0].get("message").get("content"))
            time.sleep(sleep_time_ms / 1000)
        elif response.status_code == 503:
            time.sleep(sleep_time_ms * 5 / 1000)
        else:
            if response.text:
                print(response.text)
        

# Define your deployments and resources
deployments = [
    {"name": openai_deployment_1_name, "resources": openai_resources_1},    # GPT-3.5 Turbo 1106
    {"name": openai_deployment_2_name, "resources": openai_resources_2},    # GPT-3.5 Turbo 0125
    {"name": openai_deployment_3_name, "resources": openai_resources_3},    # GPT-4o 2024-05-13
]

apim_subscription_keys = [
    'e29c474acb7e403792d027cf9a08e4ea',
    #'e5765d85094341e5aae9865c0c7794ab',
    #'00eec5eb3e67490f83c2cd9f8a548959'
]

# Loop through each deployment and make the API request
for deployment in deployments:
    for i in range(runs):
        print(f"\n▶️ Run: {i+1}/{runs} for deployment {deployment['name']}")
        make_api_request(deployment['name'], deployment['resources'], apim_subscription_keys[i % len(apim_subscription_keys)], apim_resource_gateway_url, openai_api_version, sleep_time_ms)
        


▶️ Run: 1/10 for deployment gpt-35-turbo-1106
APIM Subscription Key: e29c474acb7e403792d027cf9a08e4ea
url:  https://apim-qdp4k6fhzarv4.azure-api.net/openai/deployments/gpt-35-turbo-1106/chat/completions?api-version=2024-02-01
⌚ 0.83 seconds
Response status: 200 - OK
Response headers: {'Content-Length': '918', 'Content-Type': 'application/json', 'Date': 'Fri, 26 Jul 2024 17:49:43 GMT', 'Access-Control-Allow-Origin': '*', 'Cache-Control': 'no-cache, must-revalidate', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'apim-request-id': '8d6ff6f5-3c0b-49ab-a9ff-d1f04fcbf671', 'X-Content-Type-Options': 'nosniff', 'x-ms-region': 'Sweden Central', 'x-ratelimit-remaining-requests': '1', 'x-ratelimit-remaining-tokens': '1340', 'x-accel-buffering': 'no', 'x-ms-rai-invoked': 'true', 'X-Request-ID': 'ab7e32a3-a186-45a9-a4ff-ddf24d936c59', 'x-ms-client-request-id': 'Not-Set', 'azureml-model-session': 'd111-20240710164001', 'remaining-tokens': '1839', 'consumed

<a id='sdk'></a>
### 🧪 Test the API using the Azure OpenAI Python SDK
OpenAI provides a widely used [Python library](https://github.com/openai/openai-python). The library includes type definitions for all request params and response fields. The goal of this test is to assert that APIM can seamlessly proxy requests to OpenAI without disrupting its functionality.
- Note: run ```pip install openai``` in a terminal before executing this step.
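
As far as we can tell, the SDK's Azure client uses the `model` argument as the deployment segment of the request path, so SDK calls should reach the same APIM route as the direct HTTP test. A small sketch of the URL shape used in the HTTP test above (the `chat_completions_url` helper and the example gateway host are hypothetical):

```python
def chat_completions_url(gateway_url: str, deployment_name: str, api_version: str) -> str:
    """Build the chat completions URL for a deployment; hypothetical helper."""
    base = gateway_url.rstrip("/")
    return f"{base}/openai/deployments/{deployment_name}/chat/completions?api-version={api_version}"

print(chat_completions_url("https://example.azure-api.net", "gpt-4o-2024-05-13", "2024-02-01"))
```

Because the deployment name is part of the path, passing a different deployment as `model` is what lets the gateway pick a different backend.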

In [None]:
import time
runs = 2
sleep_time_ms = 1000

def make_openaisdk_request(deployment_name, openai_resources, apim_subscription_key, apim_resource_gateway_url, openai_api_version, sleep_time_ms):
    from openai import AzureOpenAI
    if len(openai_resources) > 0:
        # The SDK expects the list of messages directly, not wrapped in a dict
        messages = [
            {"role": "system", "content": "You are a sarcastic unhelpful assistant."},
            {"role": "user", "content": "Can you tell me the time, please?"}
        ]
        client = AzureOpenAI(
            azure_endpoint=apim_resource_gateway_url,
            api_key=apim_subscription_key,
            api_version=openai_api_version
        )
        # Use the deployment passed in so each model and version is exercised
        response = client.chat.completions.create(model=deployment_name, messages=messages)
        print(response.choices[0].message.content)
        time.sleep(sleep_time_ms/1000)

# Define your deployments and resources
deployments = [
    {"name": openai_deployment_1_name, "resources": openai_resources_1},    # GPT-3.5 Turbo 1106
    {"name": openai_deployment_2_name, "resources": openai_resources_2},    # GPT-3.5 Turbo 0125
    {"name": openai_deployment_3_name, "resources": openai_resources_3},    # GPT-4o 2024-05-13
]

# Loop through each deployment and make the request through the SDK
for deployment in deployments:
    for i in range(runs):
        print(f"\n▶️ Run: {i+1} for deployment {deployment['name']}")
        make_openaisdk_request(deployment['name'], deployment['resources'], apim_subscription_key, apim_resource_gateway_url, openai_api_version, sleep_time_ms)


<a id='portal'></a>
### 🔍 Open the workbook in the Azure Portal

Open the workbook resource and review the usage analysis to confirm the model routing and other metrics.

<a id='clean'></a>
### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered.
Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.