## Azure OpenAI GPT Inferencing with APIM

![GPT-4o inferencing through APIM](./Assets/GPT-4o-inferencing.gif)


In [None]:
%pip install openai requests tabulate

In [None]:
%pip install python-dotenv

### 🧪 Test the API using a direct HTTP call

Use the code below to send a test request to the Azure OpenAI endpoint exposed through Azure API Management (APIM) and inspect the response.  
This verifies connectivity, authentication, and basic functionality before you integrate the API into larger workflows.

- **API Key:** 🔑 `apim_subscription_key`
- **Endpoint:** 🌐 `url`
- **Sample Request:**  
    - Role-based messages for the GPT model

> 📝 **Tip:** Check the response headers and status code for troubleshooting and rate limit information!
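
As a minimal sketch of what to look for, the helper below pulls the status code and a few throttling-related headers out of a response. `summarize_response` is a hypothetical helper, not part of `requests`, and the header names `x-ratelimit-remaining-requests`, `x-ratelimit-remaining-tokens`, and `retry-after` are assumptions about what your backend and APIM policies return.

```python
def summarize_response(status_code, headers):
    """Collect the status code and any throttling-related headers that are present."""
    # Assumed header names; adjust to whatever your APIM policies actually emit.
    interesting = (
        "x-ms-region",
        "x-ratelimit-remaining-requests",
        "x-ratelimit-remaining-tokens",
        "retry-after",
    )
    # Normalize to lowercase so the lookup works with a plain dict as well as
    # the case-insensitive mapping that requests returns.
    lowered = {key.lower(): value for key, value in headers.items()}
    summary = {"status": status_code}
    for name in interesting:
        if name in lowered:
            summary[name] = lowered[name]
    return summary


# Hand-written headers standing in for response.headers
print(summarize_response(429, {"Retry-After": "12", "Content-Type": "application/json"}))
```

With a real response, the call would be `summarize_response(response.status_code, response.headers)`.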

In [None]:
import os
from dotenv import load_dotenv
import time 
import requests
import json
from tabulate import tabulate
runs = 1
sleep_time_ms = 1000

load_dotenv()

url = os.getenv("APIM_ENDPOINT")
apim_subscription_key = os.getenv("APIM_SUBSCRIPTION_KEY")


for i in range(runs):
    print("▶️ Run: ", i+1)
    messages={"messages":[
        {"role": "system", "content": "You are a sarcastic, unhelpful assistant."},
        {"role": "user", "content": "Can you tell me the time, please?"}
    ]}
    response = requests.post(url, headers = {'api-key':apim_subscription_key}, json = messages)
    # Print all response headers in a pretty table
    headers_list = [(key, value) for key, value in response.headers.items()]
    print("Response Headers:")
    print(tabulate(headers_list, headers=["Header", "Value"], tablefmt="fancy_grid"))
    print("x-ms-region: ", response.headers.get("x-ms-region")) # this header is useful to determine the region of the backend that served the request
    if response.status_code == 200:
        data = response.json()
        print("💬 ", data["choices"][0]["message"]["content"])
    else:
        print(response.text)
    time.sleep(sleep_time_ms/1000)

### ⚡️ API Rate Limiting Demo

In the next section, we intentionally exceed the API rate limit to observe how the Azure OpenAI endpoint responds.  
You will see a `429` status code and a message indicating that the rate limit has been reached.  
This is useful for understanding throttling behavior and planning for robust error handling in production scenarios. 🚦

- **API Key Used:** 🔑 `apim_subscription_key`
- **Endpoint:** 🌐 `url`
- **Sample Response Headers:** 📨  
    - `Content-Length`, `Content-Type`, `Retry-After`, etc.

> 💡 **Tip:** Always implement retry logic and respect the `Retry-After` header to avoid service disruptions!
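
One way to honor `Retry-After` is sketched below. `call_with_retry` is a hypothetical helper: it takes any zero-argument callable that returns an object with `status_code` and `headers`, and it assumes the header carries a number of seconds. If the header is missing or in another format (such as an HTTP date), it falls back to `default_wait`.

```python
import time


def call_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Retry-After is assumed to be seconds; fall back if absent or unparsable.
        try:
            wait = float(response.headers.get("Retry-After", default_wait))
        except (TypeError, ValueError):
            wait = default_wait
        sleep(wait)
    return response
```

In this notebook, `send` could be `lambda: requests.post(url, headers={'api-key': apim_subscription_key}, json=messages)`. If every attempt is throttled, the last `429` response is returned so the caller can decide what to do next.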


In [None]:
# Exhaust the rate limit
import time
for i in range(200):  # Adjust as needed to exceed your rate limit
    messages = {
        "messages": [
            {"role": "system", "content": "You are a sarcastic, unhelpful assistant."},
            {"role": "user", "content": "Can you tell me the time, please?"}
        ]
    }
    response = requests.post(url, headers={'api-key': apim_subscription_key}, json=messages)
    print(f"Request {i+1}: Status {response.status_code}")
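    if response.status_code == 429:
        print("Rate limit exceeded!")
        # Retry-After tells the caller how long to wait before trying again
        print("Retry-After: ", response.headers.get("Retry-After"))
        print(response.text)
        break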
    # Optional: sleep to avoid flooding too quickly
    time.sleep(0.5)

### 🧪 Test the API using the Azure OpenAI Python SDK

Use the code below to send a test request to your Azure OpenAI endpoint using the official Python SDK.  
This approach provides a more streamlined and Pythonic way to interact with the API compared to raw HTTP requests.

- **API Key:** 🔑 `apim_subscription_key`
- **Endpoint:** 🌐 `apim_resource_gateway_url`
- **Model Deployment:** 🤖 `openai_model_name`
- **API Version:** 📅 `openai_api_version`
- **Sample Request:**  
    - Role-based messages for the GPT model

> 📝 **Tip:** The SDK sets the authentication header, serializes requests, and raises typed exceptions such as `openai.RateLimitError` for you, making integration easier and more robust!
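
The values listed above are read from environment variables, and a missing one only surfaces later as a confusing client error. A small check up front can catch that early. The sketch below assumes the variable names used in this notebook's cells; `load_settings` and `REQUIRED_VARS` are hypothetical names introduced for illustration.

```python
import os


# Variable names as used by the os.getenv calls in this notebook
REQUIRED_VARS = (
    "APIM_GATEWAY_URL",
    "APIM_SUBSCRIPTION_KEY",
    "MODEL_DEPLOYMENT_NAME",
    "OPENAI_API_VERSION",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Call `load_settings()` after `load_dotenv()` has run so that values from the `.env` file are visible.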

In [None]:
import time
from openai import AzureOpenAI
runs = 1
sleep_time_ms = 1000

apim_resource_gateway_url = os.getenv("APIM_GATEWAY_URL")
openai_model_name = os.getenv("MODEL_DEPLOYMENT_NAME")
openai_api_version = os.getenv("OPENAI_API_VERSION")
print("APIM Gateway URL: ", apim_resource_gateway_url)
print("OpenAI Model Name: ", openai_model_name)
print("OpenAI API Version: ", openai_api_version)

# Create the client once and reuse it for every request (later cells use it too)
client = AzureOpenAI(
    azure_endpoint=apim_resource_gateway_url,
    api_key=apim_subscription_key,
    api_version=openai_api_version
)

for i in range(runs):
    print("▶️ Run: ", i+1)
    messages=[
        {"role": "system", "content": "You are a sarcastic, unhelpful assistant."},
        {"role": "user", "content": "Can you tell me the time, please?"}
    ]
    response = client.chat.completions.create(model=openai_model_name, messages=messages)
    print("💬 ", response.choices[0].message.content)
    time.sleep(sleep_time_ms/1000)

### 📝 Image Processing

In this section, we use the Azure OpenAI Python SDK to send a text question together with an image URL to the GPT model, then render the response directly as Markdown in the notebook.

- **API Endpoint:** 🌐 `apim_resource_gateway_url`
- **Model:** 🤖 `openai_model_name`
- **API Version:** 📅 `openai_api_version`
- **Client Instance:** 🔑 `client`

The code below demonstrates how to send a prompt and display the assistant's answer with rich formatting using `IPython.display.Markdown`. This is especially useful for math, code, and structured content!

Below, the referenced image is also displayed for context:

![Triangle Diagram](https://upload.wikimedia.org/wikipedia/commons/e/e2/The_Algebra_of_Mohammed_Ben_Musa_-_page_82b.png)

> 💡 **Tip:** Rendering responses as Markdown makes explanations, formulas, and code snippets easier to read and more visually appealing.
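
The user message in this section's request combines a text part and an image part in a single `content` list. As a rough sketch, that structure can be wrapped in a small builder. `build_image_message` is a hypothetical helper, not an SDK function, and it only mirrors the payload shape used in this notebook.

```python
def build_image_message(question, image_url):
    """Build a user message that pairs a text question with one image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The returned dict can be placed in the `messages` list after the system message.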

In [None]:
from IPython.display import display, Markdown
response = client.chat.completions.create(
    model=openai_model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"},
        {"role": "user", "content": [
            {"type": "text", "text": "What's the area of the triangle?"},
            {"type": "image_url", "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/e/e2/The_Algebra_of_Mohammed_Ben_Musa_-_page_82b.png"}
            }
        ]}
    ],
    temperature=0.0,
)

display(Markdown(response.choices[0].message.content))

### 🖼️ Base64 Image Processing Demo

In this section, we'll demonstrate how to send a local image (`meme.png`) as a base64-encoded string to the Azure OpenAI API for analysis. This is useful for scenarios where you want to process images that aren't hosted online.

- **Image Used:** `meme.png`
- **Encoding:** Base64
- **API:** Azure OpenAI GPT-4o
- **Response:** The assistant interprets the meme and explains the humor!

> 💡 **Tip:** Base64 encoding allows you to transmit image data directly in your API requests, making it easy to work with local files in notebooks.

![Meme Example](meme.png)
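
The base64 string is sent as a data URL of the form `data:<mime type>;base64,<data>`. The sketch below generalizes this beyond PNG by guessing the MIME type from the file extension with the standard `mimetypes` module. `to_data_url` is a hypothetical helper, and whether the model accepts a given image format is not checked here.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Read a local image and return it as a base64 data URL."""
    # Guess the MIME type from the file extension; refuse anything that is not an image.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The result can be passed as the `url` value of an `image_url` content part.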

In [None]:
from IPython.display import display, Markdown, HTML
import base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
base64_image = encode_image("meme.png")
response = client.chat.completions.create(
    model=openai_model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that helps interpret memes!"},
        {"role": "user", "content": [
            {"type": "text", "text": "What's funny about this picture?"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{base64_image}"}
            }
        ]}
    ],
    temperature=0.0,
)
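import html  # escape the model output so HTML-like characters in it are shown literally
HTML(f'<font size="5">{html.escape(response.choices[0].message.content)}</font>')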