# APIM ❤️ FinOps

## FinOps Framework lab
![flow](../../images/finops-framework.gif)

This playground leverages the [FinOps Framework](https://www.finops.org/framework/) and Azure API Management to control AI costs. It uses the [token limit](https://learn.microsoft.com/en-us/azure/api-management/azure-openai-token-limit-policy) policy for each [product](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-add-products?tabs=azure-portal&pivots=interactive) and integrates [Azure Monitor alerts](https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-overview) with [Logic Apps](https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-logic-apps?tabs=send-email) to automatically disable APIM [subscriptions](https://learn.microsoft.com/en-us/azure/api-management/api-management-subscriptions) that exceed cost quotas.

### Result
![result](result.png)

### Prerequisites

- [Python 3.12 or later](https://www.python.org/) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Python environment](https://code.visualstudio.com/docs/python/environments#_creating-environments) with the packages from [requirements.txt](../../requirements.txt) installed (for example, run `pip install -r requirements.txt` in your terminal)
- [An Azure Subscription](https://azure.microsoft.com/free/) with [Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#contributor) + [RBAC Administrator](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#role-based-access-control-administrator) or [Owner](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/privileged#owner) roles
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed and [signed in to your Azure subscription](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)

▶️ Click `Run All` to execute all steps sequentially, or execute them `Step by Step`...


<a id='0'></a>
### 0️⃣ Initialize notebook variables

- Resources are suffixed with a unique string based on your subscription ID.
- Adjust the location parameters according to your preferences and the [product availability by Azure region.](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?cdn=disable&products=cognitive-services,api-management)
- Adjust the OpenAI model and version according to the [availability by region.](https://learn.microsoft.com/azure/ai-services/openai/concepts/models)

In [1]:
import os, sys, json
sys.path.insert(1, '../../shared')  # add the shared directory to the Python path
import utils

deployment_name = os.path.basename(os.path.dirname(globals()['__vsc_ipynb_file__']))
resource_group_name = f"ENG-GWU-APIM-{deployment_name}" # change the name to match your naming style
resource_group_location = "eastus2" 

apim_sku = 'Basicv2'
apim_products_config = [{"name": "platinum", "displayName": "Platinum Product", "tpm": 2000, "tokenQuota": 1000000, "tokenQuotaPeriod": "Monthly", "costQuota": 5 },
                    {"name": "gold", "displayName": "Gold Product", "tpm": 1000, "tokenQuota": 1000000, "tokenQuotaPeriod": "Monthly", "costQuota": 5}, 
                    {"name": "silver", "displayName": "Silver Product", "tpm": 500, "tokenQuota": 1000000, "tokenQuotaPeriod": "Monthly", "costQuota": 5}]
apim_users_config = [ ]
apim_subscriptions_config = [{"name": "subscription1", "displayName": "Subscription 1", "product": "platinum" },
                    {"name": "subscription2", "displayName": "Subscription 2", "product": "gold" },
                    {"name": "subscription3", "displayName": "Subscription 3", "product": "silver" },
                     {"name": "subscription4", "displayName": "Subscription 4", "product": "silver" } ]

openai_resource_location = "eastus2"

openai_deployments = [ { "name": "gpt-4o-mini", "model": "gpt-4o-mini", "capacity": 200, "version": "2024-07-18", "sku": "GlobalStandard", "inputTokensMeterSku": "gpt-4o-mini-0718-Inp-glbl", "outputTokensMeterSku": "gpt-4o-mini-0718-Outp-glbl" }, 
            { "name": "gpt-4o", "model": "gpt-4o", "capacity": 200, "version": "2024-11-20", "sku": "GlobalStandard", "inputTokensMeterSku": "gpt-4o-0806-Inp-glbl", "outputTokensMeterSku": "gpt-4o-0806-Outp-glbl" },
            { "name": "o1-mini", "model": "o1-mini", "capacity": 200, "version": "2024-09-12", "sku": "GlobalStandard", "inputTokensMeterSku": "o1 mini input glbl", "outputTokensMeterSku": "o1 mini output glbl"} ]

openai_api_version = "2024-10-21"
currency_code = 'USD'

utils.print_ok('Notebook initialized')

✅ Notebook initialized ⌚ 11:58:57.579055
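
Each entry in `apim_subscriptions_config` points at a product by name, so a typo there only surfaces later in the deployment. The following is a minimal sketch of a local sanity check, not part of the lab itself; `find_unknown_products` is a hypothetical helper and the sample data only mimics the shape of the configuration above.

```python
def find_unknown_products(subscriptions, products):
    """Return the names of subscriptions whose 'product' is not a defined product."""
    known = {product["name"] for product in products}
    return [sub["name"] for sub in subscriptions if sub.get("product") not in known]


# Sample data shaped like apim_products_config / apim_subscriptions_config.
products = [{"name": "platinum"}, {"name": "gold"}, {"name": "silver"}]
subscriptions = [
    {"name": "subscription1", "product": "platinum"},
    {"name": "subscription2", "product": "bronze"},  # deliberate mismatch for the demo
]

print(find_unknown_products(subscriptions, products))
```

In the notebook, the same call could be made with `apim_subscriptions_config` and `apim_products_config` before starting the deployment.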


<a id='1'></a>
### 1️⃣ Verify the Azure CLI and the connected Azure subscription

The following commands confirm that the Azure CLI is connected to your Azure subscription and retrieve the signed-in user's object ID, which the deployment needs.

In [2]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")

if output.success and output.json_data:
    current_user = output.json_data['user']['name']
    tenant_id = output.json_data['tenantId']
    subscription_id = output.json_data['id']

    utils.print_info(f"Current user: {current_user}")
    utils.print_info(f"Tenant ID: {tenant_id}")
    utils.print_info(f"Subscription ID: {subscription_id}")

output = utils.run("az ad signed-in-user show", "Retrieved az ad signed-in-user", "Failed to get az ad signed-in-user")
if output.success and output.json_data:
    current_user_object_id = output.json_data['id']

    

⚙️ Running: az account show
✅ Retrieved az account ⌚ 11:59:08.191265 [0m:0s]
👉🏽 Current user: admin@MngEnvMCAP986157.onmicrosoft.com
👉🏽 Tenant ID: 06e268d3-585c-4ab5-9219-8b48220b8fd6
👉🏽 Subscription ID: e1f7b502-6ec4-4a45-be74-59f2587c55ec
⚙️ Running: az ad signed-in-user show
✅ Retrieved az ad signed-in-user ⌚ 11:59:08.838378 [0m:0s]


<a id='2'></a>
### 2️⃣ Create deployment using 🦾 Bicep

This lab uses [Bicep](https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview?tabs=bicep) to declaratively define all the resources that will be deployed in the specified resource group. Change the parameters or edit [main.bicep](main.bicep) directly to try different configurations.

⚠️ Retry this step if the deployment fails with the error `workspace not active`.

In [3]:
# Create the resource group if it doesn't exist
utils.create_resource_group(resource_group_name, resource_group_location)

# Define the Bicep parameters
bicep_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "currentUserObjectId": { "value": current_user_object_id },
        "apimSku": { "value": apim_sku },
        "apimUsersConfig": { "value": apim_users_config },
        "apimSubscriptionsConfig": { "value": apim_subscriptions_config },
        "apimProductsConfig": { "value": apim_products_config },
        "openAIResourceLocation": { "value": openai_resource_location },
        "openAIDeployments": { "value": openai_deployments },
        "openAIAPIVersion": { "value": openai_api_version }
    }
}

# Write the parameters to the params.json file
with open('params.json', 'w') as bicep_parameters_file:
    bicep_parameters_file.write(json.dumps(bicep_parameters))

# Run the deployment
output = utils.run(f"az deployment group create --name {deployment_name} --resource-group {resource_group_name} --template-file main.bicep --parameters params.json",
    f"Deployment '{deployment_name}' succeeded", f"Deployment '{deployment_name}' failed")

⚙️ Running: az group show --name ENG-GWU-APIM-finops-framework
👉🏽 Resource group ENG-GWU-APIM-finops-framework does not yet exist. Creating the resource group now...
⚙️ Running: az group create --name ENG-GWU-APIM-finops-framework --location eastus2 --tags source=ai-gateway
✅ Resource group 'ENG-GWU-APIM-finops-framework' created ⌚ 12:08:36.300713 [0m:1s]
⚙️ Running: az deployment group create --name finops-framework --resource-group ENG-GWU-APIM-finops-framework --template-file main.bicep --parameters params.json
✅ Deployment 'finops-framework' succeeded ⌚ 12:10:25.282614 [1m:48s]
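
The parameters file written in this step wraps every value in a `{"value": ...}` object under `parameters`. As a minimal sketch of that shape, the hypothetical helper `to_bicep_parameters` below builds the same structure from a plain dictionary; it is an illustration and not how the lab's `utils` module works.

```python
import json


def to_bicep_parameters(values):
    """Wrap plain values in the deployment-parameters file structure."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        # Each parameter becomes {"value": <original value>}.
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


# Two of the parameters from the cell above, with literal sample values.
params = to_bicep_parameters({"apimSku": "Basicv2", "apimUsersConfig": []})
print(json.dumps(params, indent=2))
```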


<a id='3'></a>
### 3️⃣ Get the deployment outputs

Retrieve the required outputs from the Bicep deployment.

In [4]:
# Obtain all of the outputs from the deployment
output = utils.run(f"az deployment group show --name {deployment_name} -g {resource_group_name}", f"Retrieved deployment: {deployment_name}", f"Failed to retrieve deployment: {deployment_name}")

if output.success and output.json_data:
    apim_resource_gateway_url = utils.get_deployment_output(output, 'apimResourceGatewayURL', 'APIM API Gateway URL')
    pricing_dcr_endpoint = utils.get_deployment_output(output, 'pricingDCREndpoint', 'Pricing DCR Endpoint')
    pricing_dcr_immutable_id = utils.get_deployment_output(output, 'pricingDCRImmutableId', 'Pricing DCR ImmutableId')
    pricing_dcr_stream = utils.get_deployment_output(output, 'pricingDCRStream', 'Pricing DCR Stream')
    subscription_quota_dcr_endpoint = utils.get_deployment_output(output, 'subscriptionQuotaDCREndpoint', 'Subscription Quota DCR Endpoint')
    subscription_quota_dcr_immutable_id = utils.get_deployment_output(output, 'subscriptionQuotaDCRImmutableId', 'Subscription Quota DCR ImmutableId')
    subscription_quota_dcr_stream = utils.get_deployment_output(output, 'subscriptionQuotaDCRStream', 'Subscription Quota DCR Stream')
    
    apim_subscriptions = json.loads(utils.get_deployment_output(output, 'apimSubscriptions').replace("\'", "\""))
    for subscription in apim_subscriptions:
        subscription_name = subscription['name']
        subscription_key = subscription['key']
        utils.print_info(f"Subscription Name: {subscription_name}")
        utils.print_info(f"Subscription Key: ****{subscription_key[-4:]}")


⚙️ Running: az deployment group show --name finops-framework -g ENG-GWU-APIM-finops-framework
✅ Retrieved deployment: finops-framework ⌚ 12:25:27.796140 [0m:0s]
👉🏽 APIM API Gateway URL: https://apim-zkkb2uxidyxii.azure-api.net
👉🏽 Pricing DCR Endpoint: https://dcr-pricing-zkkb2uxidyxii-4h0s-eastus2.logs.z1.ingest.monitor.azure.com
👉🏽 Pricing DCR ImmutableId: dcr-45fcb987667c456f98428cb81367fff0
👉🏽 Pricing DCR Stream: Custom-Json-PRICING_CL
👉🏽 Subscription Quota DCR Endpoint: https://dcr-quota-zkkb2uxidyxii-cwvx-eastus2.logs.z1.ingest.monitor.azure.com
👉🏽 Subscription Quota DCR ImmutableId: dcr-c86d4719732d45dca51c980db65ed698
👉🏽 Subscription Quota DCR Stream: Custom-Json-SUBSCRIPTION_QUOTA_CL
👉🏽 Subscription Name: subscription1
👉🏽 Subscription Key: ****96d8
👉🏽 Subscription Name: subscription2
👉🏽 Subscription Key: ****5d11
👉🏽 Subscription 
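
For orientation, the JSON returned by `az deployment group show` keeps each output under `properties.outputs.<name>.value`. The sketch below reads one output from a hand-written sample and masks a key the same way the cell above does. `read_output` and `mask_key` are hypothetical helpers, and the internals of `utils.get_deployment_output` are not shown in this lab, so this is an assumption about the general shape rather than a copy of it.

```python
def read_output(deployment, name):
    """Return properties.outputs[name].value, or None when the output is absent."""
    outputs = deployment.get("properties", {}).get("outputs") or {}
    return outputs.get(name, {}).get("value")


def mask_key(key, visible=4):
    """Show only the last `visible` characters of a secret."""
    return "****" + key[-visible:]


# Hand-written sample, not real deployment data.
sample = {
    "properties": {
        "outputs": {
            "apimResourceGatewayURL": {"type": "String", "value": "https://example.azure-api.net"}
        }
    }
}

print(read_output(sample, "apimResourceGatewayURL"))
print(mask_key("abcdef123456"))
```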

<a id='pricing'></a>
### 🔍 Display retail pricing info based on the [pricing API](https://learn.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices)



In [5]:
%pip install tabulate

import requests
from tabulate import tabulate

def build_pricing_table(json_data, table_data):
    for item in json_data['Items']:
        # retailPrice is per 1K tokens (unitOfMeasure eq '1K'); x1000 gives the price per 1M tokens
        table_data.append([item['armRegionName'], item['armSkuName'], item['retailPrice']*1000])

table_data = []
table_data.append(['Region', 'SKU', 'Retail Price'])
prices = requests.get(f"https://prices.azure.com/api/retail/prices?currencyCode='{currency_code}'&$filter=productName eq 'Azure OpenAI' and unitOfMeasure eq '1K' and armRegionName eq '{openai_resource_location}'")
if prices.status_code == 200:
    prices_json = prices.json()
    build_pricing_table(prices_json, table_data)
print(tabulate(table_data, headers='firstrow', tablefmt='psql'))


+----------+-----------------------------------------+----------------+
| Region   | SKU                                     |   Retail Price |
|----------+-----------------------------------------+----------------|
| eastus2  | gpt4omini-rt-aud1217 Inp glbl           |         10     |
| eastus2  | gpt-35-turbo4K-Outp-glbl                |          2     |
| eastus2  | o3 mini 0131 cached input regnl         |          0.605 |
| eastus2  | gpt-4-turbo-Vision-128K Input-regional  |         10     |
| eastus2  | gpt4omini-rt-aud1217 Outp regnl         |         22     |
| eastus2  | gpt4o realtime cached audio inp regn    |         22     |
| eastus2  | gpt-4o-rt-txt-1217 Inp DZone            |          5.5   |
| eastus2  | gpt-4o-rt-txt-1217 cchd Inp rgnl        |        
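
The cell above reads only the first page of results. The Retail Prices API returns a `NextPageLink` field alongside `Items` when more results are available, so a fuller listing needs to follow that link. Here is a minimal sketch of that loop; `iter_price_items` is a hypothetical helper, and the fetch function is injected so the example runs against fake pages without a network call.

```python
def iter_price_items(first_url, fetch_json):
    """Yield every item across pages by following NextPageLink until it is empty."""
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("Items", [])
        url = page.get("NextPageLink")


# Fake two-page response keyed by URL; the values are made up.
fake_pages = {
    "page1": {"Items": [{"skuName": "a", "retailPrice": 0.001}], "NextPageLink": "page2"},
    "page2": {"Items": [{"skuName": "b", "retailPrice": 0.002}], "NextPageLink": None},
}

items = list(iter_price_items("page1", fake_pages.__getitem__))
print([item["skuName"] for item in items])
```

With `requests`, the fetch function could be `lambda url: requests.get(url, timeout=30).json()`.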

<a id='4'></a>
### 4️⃣ Load the pricing data into Azure Monitor custom table

👉 This script uses retail price information. Please adjust it to apply a discount or to use a flat rate with PTUs.   
👉 The API returns the retail price per 1K tokens (`unitOfMeasure eq '1K'`); we multiply it by 1000 to store the price per 1M tokens.   
👉 Deploy this script as a [job](https://learn.microsoft.com/en-us/azure/container-apps/jobs?tabs=azure-cli) to run automatically on a predefined schedule.

In [6]:
%pip install azure-identity azure-monitor-ingestion azure-core

import requests
from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient
from azure.core.exceptions import HttpResponseError
from datetime import datetime, timezone

credential = DefaultAzureCredential()
client = LogsIngestionClient(endpoint=pricing_dcr_endpoint, credential=credential, logging_enable=False)

prices = requests.get(f"https://prices.azure.com/api/retail/prices?currencyCode='{currency_code}'&$filter=productName eq 'Azure OpenAI' and unitOfMeasure eq '1K' and armRegionName eq '{openai_resource_location}'")
if prices.status_code == 200:
    prices_json = prices.json()
    if prices_json and 'Items' in prices_json:
        for deployment in openai_deployments:
            input_tokens_price = next((item['retailPrice'] * 1000 for item in prices_json['Items'] if item.get('skuName') == deployment.get("inputTokensMeterSku")), None)
            output_tokens_price = next((item['retailPrice'] * 1000 for item in prices_json['Items'] if item.get('skuName') == deployment.get("outputTokensMeterSku")), None)
            utils.print_info(f"Adding model {deployment.get('name')} with input / output tokens price {input_tokens_price} / {output_tokens_price}")
            body = [{ "TimeGenerated": str(datetime.now(timezone.utc)),
                    "Model": deployment.get("name"),
                    "InputTokensPrice": input_tokens_price,
                    "OutputTokensPrice": output_tokens_price }]
            try:
                client.upload(rule_id=pricing_dcr_immutable_id, stream_name=pricing_dcr_stream, logs=body)
                utils.print_ok(f"Upload succeeded for model {deployment.get('name')}")
            except HttpResponseError as e:
                utils.print_error(f"Upload failed: {e}")            


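
To see how the stored prices translate into a cost figure, here is a minimal sketch that assumes prices are expressed per one million tokens (the retail price per 1K tokens multiplied by 1000). `estimate_cost` and its `discount` parameter are hypothetical, and the prices in the example are made up rather than taken from the pricing API.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_1m, output_price_per_1m, discount=0.0):
    """Estimate the cost of a request from token counts and per-1M-token prices."""
    gross = (prompt_tokens * input_price_per_1m
             + completion_tokens * output_price_per_1m) / 1_000_000
    # discount is a fraction, e.g. 0.25 for a 25% negotiated discount.
    return gross * (1.0 - discount)


# Made-up prices: 1.0 per 1M input tokens, 4.0 per 1M output tokens.
cost = estimate_cost(2_000_000, 500_000, input_price_per_1m=1.0, output_price_per_1m=4.0)
discounted = estimate_cost(2_000_000, 500_000, 1.0, 4.0, discount=0.25)
print(cost, discounted)
```

Token counts for a real call are available on the SDK response as `response.usage.prompt_tokens` and `response.usage.completion_tokens`.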

<a id='5'></a>
### 5️⃣ Load the Subscription Quota into Azure Monitor custom table


In [7]:
import requests
from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient
from azure.core.exceptions import HttpResponseError
from datetime import datetime, timezone

credential = DefaultAzureCredential()
client = LogsIngestionClient(endpoint=subscription_quota_dcr_endpoint, credential=credential, logging_enable=False)

for subscription in apim_subscriptions_config:
    for product in apim_products_config:
        if product.get("name") == subscription.get("product"):
            cost_quota = product.get("costQuota")
            utils.print_info(f"Adding {subscription.get('name')} with cost quota {cost_quota}")
            body = [{ 
                "TimeGenerated": str(datetime.now(timezone.utc)),
                "Subscription": subscription.get('name'),
                "Email": subscription.get("email"),
                "CostQuota": cost_quota
            }]
            try:
                client.upload(rule_id=subscription_quota_dcr_immutable_id, stream_name=subscription_quota_dcr_stream, logs=body)
                utils.print_ok(f"Upload succeeded for {subscription.get('name')}")
            except HttpResponseError as e:
                utils.print_error(f"Upload failed: {e}")            


👉🏽 Adding subscription1 with cost quota 5
✅ Upload succeeded for subscription1 ⌚ 12:29:54.835964
👉🏽 Adding subscription2 with cost quota 5
✅ Upload succeeded for subscription2 ⌚ 12:29:55.044660
👉🏽 Adding subscription3 with cost quota 5
✅ Upload succeeded for subscription3 ⌚ 12:29:55.284005
👉🏽 Adding subscription4 with cost quota 5
✅ Upload succeeded for subscription4 ⌚ 12:29:55.392847
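
The nested loop above matches each subscription to its product to find the cost quota. The same mapping can be sketched with a dictionary lookup that builds all records up front; `build_quota_records` is a hypothetical helper, and it uses `isoformat()` for the timestamp, which is a choice made for this sketch rather than something the lab requires.

```python
from datetime import datetime, timezone


def build_quota_records(subscriptions, products, now=None):
    """Build one quota record per subscription whose product is defined."""
    now = now or datetime.now(timezone.utc)
    quota_by_product = {p["name"]: p.get("costQuota") for p in products}
    records = []
    for sub in subscriptions:
        product = sub.get("product")
        if product not in quota_by_product:
            continue  # like the nested loop, unmatched subscriptions are skipped
        records.append({
            "TimeGenerated": now.isoformat(),
            "Subscription": sub.get("name"),
            "Email": sub.get("email"),  # None when the config has no email
            "CostQuota": quota_by_product[product],
        })
    return records


# Sample data shaped like the lab configuration.
sample_products = [{"name": "gold", "costQuota": 5}]
sample_subscriptions = [
    {"name": "subscription2", "product": "gold"},
    {"name": "orphan", "product": "unknown"},
]
fixed_now = datetime(2025, 1, 1, tzinfo=timezone.utc)
records = build_quota_records(sample_subscriptions, sample_products, now=fixed_now)
print(records)
```

The resulting list could be passed to a single `client.upload(...)` call, since `logs` already accepts a list of records.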


<a id='sdk'></a>
### 🧪 Execute multiple runs using the Azure OpenAI Python SDK

👉 Each request uses a randomly chosen subscription and model. Adjust `sleep_time_ms` and the number of `runs` to suit your test scenario.


In [13]:
import time, random
from openai import AzureOpenAI

runs = 12
sleep_time_ms = 150

for i in range(runs):
    apim_subscription = random.choice(apim_subscriptions)
    openai_model = random.choice(openai_deployments)
    client = AzureOpenAI(
        azure_endpoint = apim_resource_gateway_url,
        api_key = apim_subscription.get("key"),
        api_version = openai_api_version
    )
    try:
        response = client.chat.completions.create(
            model = str(openai_model.get('name')),
            messages = [
                {"role": "user", "content": "Can you tell me the time, please?"}
            ],
            extra_headers = {"x-user-id": "alex"}
        )
        print(f"▶️ Run {i+1}/{runs}: [{apim_subscription.get('name')} w/ {openai_model.get('name')}] 💬 {response.choices[0].message.content}")
    except Exception as e:
        print(f"❌ Run {i+1}/{runs}: [{apim_subscription.get('name')} w/ {openai_model.get('name')}] Error: {e}")
    time.sleep(sleep_time_ms/1000)


▶️ Run 1/12: [subscription1 w/ o1-mini] 💬 I'm sorry, but I can't access real-time information. Please check the current time on your device or a reliable time source.
▶️ Run 2/12: [subscription3 w/ gpt-4o] 💬 I don't have access to real-time data or a clock, but you can check the time on your device or nearby clock! 😊 Let me know if there's anything else I can help with.
▶️ Run 3/12: [subscription4 w/ o1-mini] 💬 I'm sorry, but I can't provide the current time. Please check the time on your device or another reliable source.
▶️ Run 4/12: [subscription2 w/ o1-mini] 💬 I'm sorry, but I can't provide real-time information. Please check your device's clock or another reliable source for the current time.
▶️ Run 5/12: [subscription3 w/ o1-mini] 💬 I'm sorry, but I can't provide the current time. Please check the time on your device or another reliable source.
▶️ Run 6/12: [subscription1 w/ gpt-4o] 💬 I can't provide real-time information, like the current time, as I don’t have access to live dat
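
When a subscription exceeds its product's tokens-per-minute limit, the token limit policy rejects the call with a `429 Too Many Requests` response, which the loop above simply prints as an error. A minimal sketch of retrying with exponential backoff follows; `call_with_backoff` is a hypothetical helper, and the demo uses a fake exception and a recorded `sleep` so it runs without calling the gateway.

```python
import time


def call_with_backoff(call, is_retryable, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**n and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as error:
            # Give up on the last attempt or on errors that should not be retried.
            if attempt == max_attempts or not is_retryable(error):
                raise
            sleep(base_delay * 2 ** (attempt - 1))


class FakeRateLimit(Exception):
    """Stand-in for a 429 error in this demo."""


attempts = {"count": 0}


def flaky():
    # Fails twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit("429")
    return "ok"


delays = []
result = call_with_backoff(flaky, lambda e: isinstance(e, FakeRateLimit), sleep=delays.append)
print(result, delays)
```

In the notebook, `is_retryable` could be `lambda e: isinstance(e, openai.RateLimitError)`. Note that the OpenAI client also retries some failures on its own through its `max_retries` setting.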

<a id='workbooks'></a>
### 🔍 Open the dashboard and workbooks in the Azure Portal

👉 The Cost Analysis workbook contains information on the total costs and quotas for each subscription.  
👉 The [Azure OpenAI Insights workbook](https://github.com/dolevshor/Azure-OpenAI-Insights) provides comprehensive details about service and model usage. Credits to [Dolev Shor](https://github.com/dolevshor/Azure-OpenAI-Insights).  
👉 The [Alerts workbook](https://github.com/microsoft/AzureMonitorCommunity/tree/master/Azure%20Services) provides information about the alerts triggered by Azure Monitor.  
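
The comparison behind the cost view, accumulated cost against quota per subscription, can be illustrated offline. In the lab this check is performed by Azure Monitor alerts and the Logic App, so the sketch below is only an illustration; `subscriptions_over_quota` is a hypothetical helper and the figures are made up.

```python
def subscriptions_over_quota(cost_by_subscription, quota_by_subscription):
    """Return the sorted names of subscriptions whose cost exceeds their quota."""
    return sorted(
        name
        for name, cost in cost_by_subscription.items()
        if name in quota_by_subscription and cost > quota_by_subscription[name]
    )


# Made-up costs in the same currency as the quotas.
costs = {"subscription1": 6.2, "subscription2": 1.4, "subscription3": 5.0}
quotas = {"subscription1": 5, "subscription2": 5, "subscription3": 5}

print(subscriptions_over_quota(costs, quotas))
```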

<a id='clean'></a>
### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered.
Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.