## 📚 Prerequisites

Ensure that your Azure Services are properly set up, your Conda environment is created, and your environment variables are configured as per the instructions in the [SETTINGS.md](SETTINGS.md) file.

## 📋 Table of Contents

This notebook shows how to call the Azure OpenAI API and inspect the headers it returns. It covers the following sections:

1. [**Setting Up Azure OpenAI Client**](#setting-up-azure-openai-client): Outlines the process of initializing the Azure OpenAI client.

2. [**Calling Azure OpenAI API**](#calling-azure-openai-api): Discusses how to make API calls to Azure OpenAI.

3. [**Extracting Headers and Payload Metadata**](#extracting-headers-and-payload-metadata): Explores how to extract headers and payload metadata from the API response.

4. [**Analyzing Rate Limit Info**](#analyzing-rate-limit-info): Details the steps to analyze the rate limit information from the API response.

For additional information, refer to the following resources:
- [AOAI API Documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)

In [27]:
import os

# Define the target directory
target_directory = (
    r"C:\Users\pablosal\Desktop\gbbai-azure-aoai-faq"  # change your directory here
)

# Check if the directory exists
if os.path.exists(target_directory):
    # Change the current working directory
    os.chdir(target_directory)
    print(f"Directory changed to {os.getcwd()}")
else:
    print(f"Directory {target_directory} does not exist.")

Directory changed to C:\Users\pablosal\Desktop\gbbai-azure-aoai-faq


In [28]:
# Import the libraries
from dotenv import load_dotenv
from src.aoai.azure_openai import AzureOpenAIManager
from typing import Dict, Optional
from requests import Response
import requests

from utils.ml_logging import get_logger

# Load environment variables from .env file
load_dotenv()
# Set up logger
logger = get_logger()

## Setting Up Azure OpenAI Client

In [29]:
# Create an instance of the client. You can find it in src/aoai/azure_openai.py.
# It is essentially a wrapper using dependency injection to automate the initialization
# and most used API calls.
azure_openai_client = AzureOpenAIManager()

## Calling Azure OpenAI API

### Creating Helper Function
This function is designed to extract and build the metadata in JSON format. It's particularly useful for extracting rate limit and usage information.

In [30]:
def extract_rate_limit_and_usage_info(response: Response) -> Dict[str, Optional[int]]:
    """
    Extracts rate limiting information from the Azure OpenAI API response headers and usage information from the payload.

    :param response: The response object returned by a requests call.
    :return: A dictionary containing the remaining requests, remaining tokens, and usage information
             including prompt tokens, completion tokens, and total tokens.
    """
    headers = response.headers
    usage = response.json().get("usage", {})
    return {
        "remaining-requests": headers.get("x-ratelimit-remaining-requests"),
        "remaining-tokens": headers.get("x-ratelimit-remaining-tokens"),
        "prompt-tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
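Header values arrive as strings (for example `'9'`), while the usage counts in the JSON payload are integers. The sketch below shows one way to normalize both to integers. `to_int_or_none` and `normalize_rate_limit_info` are hypothetical helper names introduced here for illustration; they are not part of the notebook's code.

```python
from typing import Dict, Mapping, Optional


def to_int_or_none(value: object) -> Optional[int]:
    """Convert a header or payload value to int; return None if missing or malformed."""
    if value is None:
        return None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def normalize_rate_limit_info(info: Mapping[str, object]) -> Dict[str, Optional[int]]:
    """Return a copy of the metadata dictionary with every value coerced to int or None."""
    return {key: to_int_or_none(value) for key, value in info.items()}


# Example with the kind of mixed values the helper above can return.
raw_info = {"remaining-requests": "9", "remaining-tokens": "9629", "total_tokens": 357}
print(normalize_rate_limit_info(raw_info))
```

Coercing once at the boundary keeps later arithmetic on the values free of string handling.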

### Encapsulating Completions API Calls into a Function

In [31]:
def call_azure_openai_chat_completions_api(
    deployment_id: str, method: str, body: Optional[dict] = None, api_version: str = "2023-11-01"
):
    """
    Calls the Azure OpenAI API with the given parameters.

    :param deployment_id: The ID of the deployment to access.
    :param method: The HTTP method to use ("get" or "post").
    :param body: The body of the request for "post" method. Defaults to None.
    :param api_version: The API version to use. Defaults to "2023-11-01".

    :return: The status code and response from the API call, along with rate limit headers.
    """
    if method.lower() not in ["get", "post"]:
        logger.error("Invalid HTTP method. Expected 'get' or 'post'.")
        return None, None, {}

    url = f"{azure_openai_client.azure_endpoint}/openai/deployments/{deployment_id}/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": azure_openai_client.api_key,
        "x-ms-useragent": "aoai-benchmark"
    }

    with requests.Session() as session:
        session.headers.update(headers)

        try:
            if method.lower() == "get":
                response = session.get(url)
            else:  # method.lower() == "post"
                response = session.post(url, json=body)
            response.raise_for_status()  # Raises HTTPError for bad responses
        except requests.HTTPError as http_err:
            logger.error(f"HTTP error occurred: {http_err}")
            return response.status_code, http_err.response.json(), {}
        except Exception as err:
            logger.error(f"An error occurred: {err}")
            return None, None, {}

    # Extract rate limit headers and usage details
    rate_limit_headers = extract_rate_limit_and_usage_info(response)
    print(response.headers)
    return response.status_code, response.json(), rate_limit_headers
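When a call is rejected with HTTP 429, the caller usually needs to know how long to wait before trying again. The sketch below reads the standard HTTP `Retry-After` header for that purpose. It assumes the header is present on the 429 response and holds a number of seconds, which is worth verifying against your own responses. `retry_delay_seconds` is a hypothetical helper, not part of the notebook's code.

```python
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return the wait suggested by a Retry-After header, or `default` if absent or unparsable."""
    # Normalize header names so the lookup also works with a plain dict.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default
    try:
        delay = float(raw)
    except ValueError:
        # Retry-After may also be an HTTP date; this sketch does not parse that form.
        return default
    return max(delay, 0.0)


print(retry_delay_seconds({"Retry-After": "2"}))
print(retry_delay_seconds({}))
```

The returned value could be passed to `time.sleep` before the next attempt instead of a fixed pause.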

## Extracting Headers and Payload Metadata

#### Constructing the Request

- **max_tokens**: Optional. Integer specifying the maximum number of tokens to generate. This example uses 24.
- **temperature**: Optional. Number between 0 and 2 indicating the sampling temperature. Default is 1.
- **top_p**: Optional. Nucleus sampling parameter as a number between 0 and 1. Default is 1.
- **user**: Optional. A unique identifier for the end-user to help monitor and detect abuse.
- **n**: Optional. Integer for the number of completions to generate for each prompt. Default is 1.
- **presence_penalty**: Optional. Number between -2.0 and 2.0 to penalize new tokens based on presence in the text so far. Default is 0.
- **frequency_penalty**: Optional. Number between -2.0 and 2.0 to penalize new tokens based on frequency in the text so far. Default is 0.
- **messages**: Required. An array of message objects, each with a `role` and a `content`.

You can learn more about the Azure OpenAI API in the [official documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference).
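The ranges listed above can be checked locally before a request is sent. The following is a minimal sketch under that assumption. `PARAMETER_RANGES` and `find_body_problems` are hypothetical names, and the sketch treats `messages` as mandatory because a chat completion needs at least one message.

```python
from typing import Any, Dict, List

# Allowed numeric ranges, taken from the parameter list above.
PARAMETER_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def find_body_problems(body: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a request body (empty if none)."""
    problems = []
    if not body.get("messages"):
        problems.append("messages must be a non-empty list")
    for name, (low, high) in PARAMETER_RANGES.items():
        value = body.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    for name in ("max_tokens", "n"):
        value = body.get(name)
        if value is not None and (not isinstance(value, int) or value < 1):
            problems.append(f"{name} must be a positive integer")
    return problems


example_body = {
    "max_tokens": 24,
    "temperature": 1,
    "messages": [{"role": "user", "content": "What is serverless computing?"}],
}
print(find_body_problems(example_body))
```

Returning a list of problems rather than raising on the first one lets the caller log every issue in a single pass.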

In [32]:
body = {
    "max_tokens": 24,
    "temperature": 1,
    "top_p": 1,
    "user": "",
    "n": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {
            "role": "assistant",
            "content": "Yes, customer managed keys are supported by Azure OpenAI.",
        },
        {"role": "user", "content": "Do other Azure AI services support this too?"},
        {
            "role": "assistant",
            "content": "Yes, other Azure AI services also support customer managed keys.",
        },
        {"role": "user", "content": "Can you tell me more about these services?"},
        {
            "role": "assistant",
            "content": "Sure, Azure AI services include Azure Cognitive Services, Azure Machine Learning, and more.",
        },
        {"role": "user", "content": "What is Azure Cognitive Services?"},
        {
            "role": "assistant",
            "content": "Azure Cognitive Services is a collection of APIs and services for building intelligent applications.",
        },
        {"role": "user", "content": "What is Azure Machine Learning?"},
        {
            "role": "assistant",
            "content": "Azure Machine Learning is a cloud-based service for building, training, and deploying machine learning models.",
        },
        {"role": "user", "content": "Thank you for the information."},
        {
            "role": "assistant",
            "content": "You're welcome! If you have any other questions, feel free to ask.",
        },
        {"role": "user", "content": "What other services does Azure offer?"},
        {
            "role": "assistant",
            "content": "Azure offers a wide range of services including computing, analytics, storage, and networking.",
        },
        {
            "role": "user",
            "content": "Can you tell me more about Azure's computing services?",
        },
        {
            "role": "assistant",
            "content": "Azure's computing services include virtual machines, container services, and serverless computing.",
        },
        {"role": "user", "content": "What is serverless computing?"},
        {
            "role": "assistant",
            "content": "Serverless computing is a cloud computing model where the cloud provider automatically manages the provisioning and scaling of servers.",
        },
        {
            "role": "user",
            "content": "That's interesting. Thank you for the information.",
        },
        {
            "role": "assistant",
            "content": "You're welcome! If you have any other questions, feel free to ask.",
        },
    ],
}

In [33]:
status_code, response, rate_limit_info = call_azure_openai_chat_completions_api(
    deployment_id=azure_openai_client.chat_model_name,
    method="post",
    body=body,
    api_version="2023-05-15",
)

# Print the status code, response, and rate limit info
print("Status Code:", status_code)
print("Response:", response)
print("Rate Limit Info:", rate_limit_info)

{'Cache-Control': 'no-cache, must-revalidate', 'Content-Length': '428', 'Content-Type': 'application/json', 'access-control-allow-origin': '*', 'apim-request-id': 'a4c6189e-8b0f-4a49-bf03-343bfe7c87f8', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'x-ms-region': 'Canada East', 'x-ratelimit-remaining-requests': '9', 'x-ratelimit-remaining-tokens': '9629', 'x-accel-buffering': 'no', 'x-request-id': '6e7347f6-69db-480e-bf5b-934cf5130d7b', 'x-ms-client-request-id': 'a4c6189e-8b0f-4a49-bf03-343bfe7c87f8', 'azureml-model-session': 'd018-20240201032837', 'Date': 'Fri, 02 Feb 2024 05:13:43 GMT'}
Status Code: 200
Response: {'id': 'chatcmpl-8ng9sFt76UntMD7xLTfd29Ql4dbJA', 'object': 'chat.completion', 'created': 1706850820, 'model': 'gpt-4', 'choices': [{'finish_reason': 'length', 'index': 0, 'message': {'role': 'assistant', 'content': "You're welcome! If you have any more questions or need further clarification on any topic, fe

## Analyzing Rate Limit Info

The rate limit information provides details about the usage of the API:

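- **Remaining Requests**: The number of API calls that can still be made before the request rate limit is reached. In this case, there are 9 requests left.
- **Remaining Tokens**: The number of tokens that can still be consumed before the token rate limit is reached. In this case, the `x-ratelimit-remaining-tokens` header reports 9629 tokens left.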
- **Prompt Tokens**: The number of tokens used in the prompt for this API call. In this case, 333 tokens were used.
- **Completion Tokens**: The number of tokens generated in the completion for this API call. In this case, 24 tokens were generated.
- **Total Tokens**: The total number of tokens used in this API call. This is the sum of the prompt tokens and the completion tokens. In this case, 357 tokens were used.
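The arithmetic behind these figures can be sketched as follows, using the usage counts listed above and the `x-ratelimit-*` values from the response headers printed earlier. The estimate of remaining calls is an assumption for illustration only: the service's own token accounting may differ from the reported usage, so treat the result as a rough upper bound.

```python
# Usage counts reported in the payload for this call.
prompt_tokens = 333
completion_tokens = 24
total_tokens = prompt_tokens + completion_tokens

# Values from the x-ratelimit-* response headers printed earlier.
remaining_requests = 9
remaining_tokens = 9629

# Rough estimate: how many more calls of this size fit into each limit.
calls_by_tokens = remaining_tokens // total_tokens
estimated_calls_left = min(remaining_requests, calls_by_tokens)

print("Total tokens:", total_tokens)
print("Calls allowed by token budget:", calls_by_tokens)
print("Estimated calls left:", estimated_calls_left)
```

Here the request limit is the tighter of the two constraints, so it determines the estimate.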

### Bonus: Programmatically Stop Call if Input Tokens Exceed Limit

In [50]:
MAX_RETRY_SECONDS = 5.0

In [51]:
from typing import Tuple, Dict, Any, Optional, Union
import backoff
import time

In [53]:
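def _terminal_http_code(e) -> bool:
    """
    Determine whether to give up retrying based on the HTTP status code.
    """
    # e is an exception instance from the requests library. Connection errors
    # and timeouts carry no response, so they are treated as retryable here.
    response = getattr(e, "response", None)
    if response is None:
        logger.info(f"No HTTP response received, will retry. Error: {e}")
        return False
    status_code = response.status_code
    if status_code == 429:
        logger.info(f"429 status code received, will retry. Error: {e}")
    else:
        logger.info(f"Non-retryable status code {status_code} received. Error: {e}")
    return status_code != 429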

def backoff_hdlr(details):
    """
    Log details when a backoff occurs.
    """
    logger.warning(f"Backing off {details['wait']:0.1f} seconds after {details['tries']} tries calling function {details['target']} with args {details['args']} and kwargs {details['kwargs']}")

def giveup_hdlr(details):
    """
    Log details when giving up on retries.
    """
    logger.warning(f"Giving up after {details['tries']} tries calling function {details['target']} with args {details['args']} and kwargs {details['kwargs']}")

@backoff.on_exception(
    backoff.expo,
    requests.exceptions.RequestException,
    jitter=backoff.full_jitter,
    max_time=MAX_RETRY_SECONDS,
    giveup=_terminal_http_code,
    on_backoff=backoff_hdlr,
    on_giveup=giveup_hdlr,
)
def call_azure_openai_chat_completions_api_with_pre_check(
    deployment_id: str,
    method: str,
    body: Optional[Dict[str, Any]] = None,
    api_version: str = "2023-11-01",
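) -> Tuple[Optional[int], Optional[Union[Dict[str, Any], str]], Dict[str, str]]:
    """
    Calls the chat completions API and retries for up to MAX_RETRY_SECONDS
    while the service answers with HTTP 429.

    The wrapped helper catches request exceptions itself, so the backoff decorator
    only acts on exceptions that escape it; 429 responses are handled by the loop below.

    :return: The status code, the response (or an error message), and the rate limit info.
    """
    if method.lower() not in ["get", "post"]:
        logger.error("Invalid HTTP method. Expected 'get' or 'post'.")
        return None, "Invalid HTTP method. Expected 'get' or 'post'.", {}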

    start_time = time.time()
    while time.time() - start_time < MAX_RETRY_SECONDS:
        (
            status_code,
            response,
            rate_limit_headers,
        ) = call_azure_openai_chat_completions_api(
            deployment_id, method=method, body=body, api_version=api_version)
        if status_code != 429:
            break
        time.sleep(1)  # sleep for a while before retrying

    return status_code, response, rate_limit_headers
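The section title refers to stopping a call when the input is too large. A rough pre-check along those lines could be sketched as follows. The four-characters-per-token ratio is only a rule of thumb for English text, not a tokenizer, and `estimate_prompt_tokens`, `exceeds_token_budget`, and `CHARS_PER_TOKEN` are hypothetical names. A real tokenizer such as `tiktoken` would give exact counts.

```python
from typing import Any, Dict, List

CHARS_PER_TOKEN = 4  # rough heuristic for English text; an assumption, not a tokenizer


def estimate_prompt_tokens(messages: List[Dict[str, Any]]) -> int:
    """Estimate the prompt size from the total character count of all message contents."""
    total_chars = sum(len(str(message.get("content", ""))) for message in messages)
    # Ceiling division so that a partial token still counts as one.
    return -(-total_chars // CHARS_PER_TOKEN)


def exceeds_token_budget(body: Dict[str, Any], max_input_tokens: int) -> bool:
    """Return True if the estimated prompt size is larger than the allowed budget."""
    return estimate_prompt_tokens(body.get("messages", [])) > max_input_tokens


sample_body = {"messages": [{"role": "user", "content": "What is serverless computing?"}]}
if exceeds_token_budget(sample_body, max_input_tokens=1000):
    print("Skipping call: estimated input exceeds the token budget.")
else:
    print("Input is within the token budget.")
```

Running this check before the request avoids spending a rate-limited call on input that is known to be too large.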

In [54]:
import asyncio
from concurrent.futures import ThreadPoolExecutor

def call_api():
    status_code, response, rate_limit_info = call_azure_openai_chat_completions_api_with_pre_check(
        deployment_id=azure_openai_client.chat_model_name,
        method="post",
        body=body,
        api_version="2023-05-15",
    )
    return status_code, response, rate_limit_info

async def main():
    with ThreadPoolExecutor(max_workers=10) as executor:
        loop = asyncio.get_running_loop()
        tasks = []
        for _ in range(10):
            tasks.append(loop.run_in_executor(executor, call_api))

        responses = await asyncio.gather(*tasks)

        for response in responses:
            print(response)

# Run the main function (top-level await works in a Jupyter/IPython cell)
for _ in range(10):
    await main()
    await asyncio.sleep(1)

{'Cache-Control': 'no-cache, must-revalidate', 'Content-Length': '440', 'Content-Type': 'application/json', 'access-control-allow-origin': '*', 'apim-request-id': '31f1547c-6acb-4166-bb2a-bf9a97b997bf', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'x-ms-region': 'Canada East', 'x-ratelimit-remaining-requests': '6', 'x-ratelimit-remaining-tokens': '8516', 'x-accel-buffering': 'no', 'x-request-id': 'f1274dd1-f361-442b-b488-1725f9b63cda', 'x-ms-client-request-id': '31f1547c-6acb-4166-bb2a-bf9a97b997bf', 'azureml-model-session': 'd017-20240201023444', 'Date': 'Fri, 02 Feb 2024 05:35:26 GMT'}
{'Cache-Control': 'no-cache, must-revalidate', 'Content-Length': '428', 'Content-Type': 'application/json', 'access-control-allow-origin': '*', 'apim-request-id': '2ea9f880-60b3-4899-8120-f6455f64f71e', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'x-ms-region': 'Can

2024-02-01 23:35:31,644 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:35:31,646 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:35:31,653 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02

(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}, {})
(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}, {})
(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you w

2024-02-01 23:35:43,553 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:35:43,562 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:35:43,565 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02

(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}, {})
(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}, {})
(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you w

2024-02-01 23:35:54,752 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:35:54,769 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:35:54,778 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02

(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}, {})
(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}, {})
(429, {'error': {'code': '429', 'message': 'Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you w

2024-02-01 23:36:01,176 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:36:01,202 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02-01 23:36:01,353 - micro - MainProcess - ERROR    HTTP error occurred: 429 Client Error: Too Many Requests for url: https://ml-workspace-dev-canadaeast-001-aoai.openai.azure.com/openai/deployments/foundational-canadaeast-gpt4/chat/completions?api-version=2023-05-15 (2920182161.py:call_azure_openai_chat_completions_api:35)
2024-02

CancelledError: 