## 📚 Prerequisites

Ensure that your Azure Services are properly set up, your Conda environment is created, and your environment variables are configured as per the instructions in the [SETTINGS.md](SETTINGS.md) file.

## 📋 Table of Contents

This notebook shows how to call the Azure OpenAI API and inspect the headers it returns. It covers the following sections:

1. [**Setting Up Azure OpenAI Client**](#setting-up-azure-openai-client): Outlines the process of initializing the Azure OpenAI client.

2. [**Calling Azure OpenAI API**](#calling-azure-openai-api): Discusses how to make API calls to Azure OpenAI.

3. [**Extracting Headers and Payload Metadata**](#extracting-headers-and-payload-metadata): Explores how to extract headers and payload metadata from the API response.

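4. [**Analyzing Rate Limit Info**](#analyzing-rate-limit-info): Details the steps to analyze the rate limit information from the API response.

5. [**Checking Current Rate Limitations Before Execution**](#checking-current-rate-limitations-before-execution): Shows how to track the remaining quota and decide whether a request can be sent.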

For additional information, refer to the following resources:
- [AOAI API Documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)


In [1]:
import os

# Define the target directory
target_directory = (
    r"C:\Users\pablosal\Desktop\gbbai-azure-aoai-faq"  # change your directory here
)

# Check if the directory exists
if os.path.exists(target_directory):
    # Change the current working directory
    os.chdir(target_directory)
    print(f"Directory changed to {os.getcwd()}")
else:
    print(f"Directory {target_directory} does not exist.")

Directory changed to C:\Users\pablosal\Desktop\gbbai-azure-aoai-faq


In [2]:
# Import the libraries
from dotenv import load_dotenv
from src.aoai.azure_openai import AzureOpenAIManager
from typing import Dict, Optional
from requests import Response
import requests

from utils.ml_logging import get_logger

# Load environment variables from .env file
load_dotenv()
# Set up logger
logger = get_logger()

## Setting Up Azure OpenAI Client

In [3]:
# Create an instance of the client. You can find it in src/aoai/azure_openai.py.
# It is essentially a wrapper using dependency injection to automate the initialization
# and most used API calls.
azure_openai_client = AzureOpenAIManager()
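
Before instantiating the client, it can help to confirm that the expected settings are present. The sketch below assumes the wrapper reads its configuration from environment variables; the variable names are copied from the explicit-constructor cell later in this notebook and may differ in your `.env` file. `find_missing_env_vars` is a hypothetical helper, not part of the repository.

```python
import os
from typing import Iterable, List, Mapping, Optional


def find_missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumed variable names; adjust them to match your own .env file.
required = [
    "AZURE_OPENAI_KEY_TEST",
    "AZURE_OPENAI_API_ENDPOINT_TEST",
    "AZURE_AOAI_CHAT_MODEL_NAME_DEPLOYMENT_ID_TEST",
]
missing = find_missing_env_vars(required)
if missing:
    print(f"Missing environment variables: {missing}")
```

Passing `env` explicitly makes the check easy to exercise without touching the real process environment.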

## Calling Azure OpenAI API

### Creating Helper Function
This function is designed to extract and build the metadata in JSON format. It's particularly useful for extracting rate limit and usage information.

In [4]:
def extract_rate_limit_and_usage_info(response: Response) -> Dict[str, Optional[int]]:
    """
    Extracts rate limiting information from the Azure OpenAI API response headers and usage information from the payload.

    Header values are returned as received (strings); usage values come from the JSON payload (integers).

    :param response: The response object returned by a requests call.
    :return: A dictionary containing the remaining requests, remaining tokens, and usage information
             including prompt tokens, completion tokens, and total tokens.
    """
    headers = response.headers
    usage = response.json().get("usage", {})
    return {
        "remaining-requests": headers.get("x-ratelimit-remaining-requests"),
        "remaining-tokens": headers.get("x-ratelimit-remaining-tokens"),
        "retry-after": headers.get("retry-after", None),
        "prompt-tokens": usage.get("prompt_tokens", None),
        "completion_tokens": usage.get("completion_tokens", None),
        "total_tokens": usage.get("total_tokens", None),
    }
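
Because HTTP header values arrive as strings, it is often convenient to cast the rate limit values to integers before comparing them. The following is a minimal sketch of that step; `to_int` and `parse_rate_limit_headers` are hypothetical helpers introduced here for illustration, and the sample values are copied from the response headers shown later in this notebook. Note that a plain `dict` is case-sensitive, unlike the header mapping that `requests` returns.

```python
from typing import Dict, Mapping, Optional


def to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int, returning None when it is missing or not numeric."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Pick the rate limit headers out of a header mapping and cast them to integers."""
    return {
        "remaining-requests": to_int(headers.get("x-ratelimit-remaining-requests")),
        "remaining-tokens": to_int(headers.get("x-ratelimit-remaining-tokens")),
    }


# Header values copied from the sample response in this notebook.
sample_headers = {
    "x-ratelimit-remaining-requests": "449",
    "x-ratelimit-remaining-tokens": "446929",
}
print(parse_rate_limit_headers(sample_headers))
```

Returning `None` for a missing header keeps "unknown" distinct from "zero remaining", which matters when the result is used to decide whether to send a request.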

### Encapsulating Completions API Calls into a Function

In [5]:
def call_azure_openai_chat_completions_api(
    deployment_id: str, body: Optional[dict] = None, api_version: str = "2023-11-01"
):
    """
    Calls the Azure OpenAI chat completions API with the given parameters.

    :param deployment_id: The ID of the deployment to access.
    :param body: The JSON body of the POST request. Defaults to None.
    :param api_version: The API version to use. Defaults to "2023-11-01".

    :return: A tuple of the status code, the parsed JSON response, and the rate limit and usage info.
    """

    url = f"{azure_openai_client.azure_endpoint}/openai/deployments/{deployment_id}/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": azure_openai_client.api_key,
        "x-ms-useragent": "aoai-faq"
    }

    # Send the request inside the `with` block so the session is still open when it is used
    with requests.Session() as session:
        session.headers.update(headers)

        try:
            response = session.post(url, json=body)
            response.raise_for_status()  # Raises HTTPError for bad responses
        except requests.HTTPError as http_err:
            logger.error(f"HTTP error occurred: {http_err}")
            print(f"HEADERS -> {http_err.response.headers}")
            return http_err.response.status_code, http_err.response.json(), {}
        except Exception as err:
            # No response exists here (e.g. a connection error), so there are no headers to print
            logger.error(f"An error occurred: {err}")
            return None, None, {}

    # Extract rate limit headers and usage details
    rate_limit_headers = extract_rate_limit_and_usage_info(response)
    print(f"HEADERS -> {response.headers}")
    print(f"STATUS CODE -> {response.status_code}")
    return response.status_code, response.json(), rate_limit_headers
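
The request URL above is assembled with an f-string, which produces a double slash if the configured endpoint ends with `/`. As a rough sketch of a more forgiving approach, the hypothetical helper below strips the trailing slash and URL-encodes the variable parts; the endpoint and deployment name in the example are placeholders, not real resources.

```python
from urllib.parse import quote, urlencode


def build_chat_completions_url(endpoint: str, deployment_id: str, api_version: str) -> str:
    """Assemble the chat completions URL, tolerating a trailing slash on the endpoint."""
    base = endpoint.rstrip("/")
    path = f"/openai/deployments/{quote(deployment_id, safe='')}/chat/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


# Placeholder endpoint and deployment name, used only for illustration.
print(
    build_chat_completions_url(
        "https://example.openai.azure.com/", "my-deployment", "2024-02-15-preview"
    )
)
```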

## Extracting Headers and Payload Metadata

### Constructing the Request

- **max_tokens**: Optional. Integer specifying the maximum number of tokens to generate. Default is 24.
- **temperature**: Optional. Number between 0 and 2 indicating the sampling temperature. Default is 1.
- **top_p**: Optional. Nucleus sampling parameter as a number between 0 and 1. Default is 1.
- **user**: Optional. A unique identifier for the end-user to help monitor and detect abuse.
- **n**: Optional. Integer for the number of completions to generate for each prompt. Default is 1.
- **presence_penalty**: Optional. Number between -2.0 and 2.0 to penalize new tokens based on presence in the text so far. Default is 0.
- **frequency_penalty**: Optional. Number between -2.0 and 2.0 to penalize new tokens based on frequency in the text so far. Default is 0.
- **messages**: Required. An array of message objects, each with a `role` and `content`, that make up the conversation so far.

You can learn more about these parameters in the [official Azure OpenAI API documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference).
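
The ranges listed above can be checked locally before a request is sent, which avoids spending a call on a body the service would reject. The sketch below is one possible pre-flight check under those stated ranges; `validate_body` and `PARAMETER_RANGES` are hypothetical names, and the function also treats `messages` as mandatory, since the chat completions endpoint requires it.

```python
from typing import Any, Dict, List

# Allowed (min, max) ranges taken from the parameter list above.
PARAMETER_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def validate_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for name, (low, high) in PARAMETER_RANGES.items():
        value = body.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    for name in ("max_tokens", "n"):
        value = body.get(name)
        if value is not None and (not isinstance(value, int) or value < 1):
            problems.append(f"{name}={value} must be a positive integer")
    if not body.get("messages"):
        problems.append("messages must be a non-empty list")
    return problems


print(validate_body({"temperature": 2.5, "messages": []}))
```

This only mirrors the documented ranges; the service remains the authority on what it accepts.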

In [6]:
body = {
    "max_tokens": 3000,
    "temperature": 1,
    "top_p": 1,
    "user": "",
    "n": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer-managed keys?"},
        {
            "role": "assistant",
            "content": '''Yes, Azure OpenAI supports customer-managed keys, allowing customers 
            to control encryption keys and ensuring that data remains secure.'''
        },
        {"role": "user", "content": "Do other Azure AI services support this as well?"},
    ],
}


In [7]:
status_code, response, rate_limit_info = call_azure_openai_chat_completions_api(
    deployment_id=azure_openai_client.chat_model_name,
    body=body,
    api_version="2024-02-15-preview",
)

# Print the status code, response, and rate limit info
print("Status Code:", status_code)
print("Response:", response)
print("Rate Limit Info:", rate_limit_info)

HEADERS -> {'Content-Length': '1658', 'Content-Type': 'application/json', 'x-ms-region': 'East US 2', 'apim-request-id': '23258c34-4768-4207-875e-4f03c5b33e7b', 'x-ratelimit-remaining-requests': '449', 'x-accel-buffering': 'no', 'x-ms-rai-invoked': 'true', 'x-request-id': 'd044fe06-d994-4b72-9041-37a8711e8599', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'azureml-model-session': 'd065-20240607042906', 'x-content-type-options': 'nosniff', 'x-envoy-upstream-service-time': '4734', 'x-ms-client-request-id': '23258c34-4768-4207-875e-4f03c5b33e7b', 'x-ratelimit-remaining-tokens': '446929', 'Date': 'Fri, 02 Aug 2024 23:12:41 GMT'}
STATUS CODE -> 200
Status Code: 200
Response: {'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'stop', 'index': 0, 'lo

## Analyzing Rate Limit Info

In [8]:
print("Rate Limit Info:", rate_limit_info)

Rate Limit Info: {'remaining-requests': '449', 'remaining-tokens': '446929', 'retry-after': None, 'prompt-tokens': 69, 'completion_tokens': 147, 'total_tokens': 216}


- **Remaining Requests**: The number of API calls that can still be made before the request rate limit is reached. In this case, there are 449 requests left.
- **Remaining Tokens**: The number of tokens that can still be consumed before the token rate limit is reached. In this case, there are 446,929 tokens left.
- **Prompt Tokens**: The number of tokens used in the prompt for this API call. In this case, 69 tokens were used.
- **Completion Tokens**: The number of tokens generated in the completion for this API call. In this case, 147 tokens were generated.
- **Total Tokens**: The total number of tokens used in this API call. This is the sum of the prompt tokens and the completion tokens. In this case, 69 + 147 = 216 tokens were used.
- **Retry-After**: The time to wait before making another request. In this case, the header was not returned, so the value is `None`.
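
When a `retry-after` value is present, a caller would typically pause for that long before trying again. The sketch below shows one way to act on the field; it assumes the value is expressed in seconds, and both `seconds_to_wait` and `wait_if_needed` are hypothetical helpers rather than part of this repository. The HTTP specification also allows a date in this header, which this sketch does not parse and instead replaces with a default delay.

```python
import time
from typing import Callable, Optional


def seconds_to_wait(retry_after: Optional[str], default: float = 1.0) -> float:
    """Interpret a Retry-After value given in seconds; fall back to `default` otherwise."""
    if retry_after is None:
        return 0.0
    try:
        return max(0.0, float(retry_after))
    except ValueError:
        # Retry-After may also be an HTTP date; this sketch does not parse that form.
        return default


def wait_if_needed(rate_limit_info: dict, sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for the advertised delay, if any, and return the number of seconds waited."""
    delay = seconds_to_wait(rate_limit_info.get("retry-after"))
    if delay > 0:
        sleep(delay)
    return delay


# With the sample output above, retry-after is None, so no wait happens.
print(wait_if_needed({"retry-after": None}))
```

Injecting the `sleep` callable keeps the helper testable without actually pausing.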

## Checking Current Rate Limitations Before Execution

In [31]:
from utils.ml_logging import get_logger
from src.aoai.azure_openai import AzureOpenAIManager
from src.aoai.tokenizer import AzureOpenAITokenizer
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
# Set up logger
logger = get_logger()

azure_openai_client = AzureOpenAIManager(
    api_key=os.getenv('AZURE_OPENAI_KEY_TEST'), 
    azure_endpoint=os.getenv('AZURE_OPENAI_API_ENDPOINT_TEST'), 
    chat_model_name=os.getenv('AZURE_AOAI_CHAT_MODEL_NAME_DEPLOYMENT_ID_TEST'), 
    api_version="2024-05-13"
)
tokenizer = AzureOpenAITokenizer()

In [39]:
import requests
import time
from typing import Dict, Optional, Tuple, List
from requests import Response
from threading import Lock

# Global state holding the most recently observed rate limits.
# The initial values are placeholders standing in for limits returned by previous calls.
current_rate_limits = {
    "remaining_requests": 5,  
    "remaining_tokens": 300
}

rate_limit_lock = Lock()  # To safely update rate limits across threads

def update_rate_limits(headers: Dict[str, str]) -> None:
    """
    Update the global rate limit tracker based on the headers from the response.
    """
    global current_rate_limits
    with rate_limit_lock:
        current_rate_limits["remaining_requests"] = int(headers.get("x-ratelimit-remaining-requests", 0))
        current_rate_limits["remaining_tokens"] = int(headers.get("x-ratelimit-remaining-tokens", 0))
        logger.info(f"Updated rate limits: {current_rate_limits}")

def extract_rate_limit_and_usage_info(response: Response) -> Dict[str, Optional[int]]:
    """
    Extracts rate limiting information from the Azure OpenAI response headers and usage information from the payload.

    :param response: The response object returned by a requests call.
    :return: A dictionary containing the remaining requests, remaining tokens, and usage information.
    """
    headers = response.headers
    usage = response.json().get("usage", {})
    update_rate_limits(headers)
    return {
        "remaining-requests": int(headers.get("x-ratelimit-remaining-requests", 0)),
        "remaining-tokens": int(headers.get("x-ratelimit-remaining-tokens", 0)),
        "retry-after": headers.get("retry-after"),
        "prompt-tokens": usage.get("prompt_tokens"),
        "completion-tokens": usage.get("completion_tokens"),
        "total-tokens": usage.get("total_tokens"),
    }

def call_azure_openai_chat_completions_api(
    deployment_id: str, body: Dict, api_version: str = "2023-11-01"
) -> Tuple[Optional[int], Optional[Dict], Optional[Dict[str, Optional[int]]]]:
    """
    Calls the Azure OpenAI API with the given parameters.

    :param deployment_id: The ID of the deployment to access.
    :param body: The JSON body of the POST request.
    :param api_version: The API version to use. Defaults to "2023-11-01".

    :return: The status code and response from the API call, along with rate limit headers.
    """
    url = f"{azure_openai_client.azure_endpoint}/openai/deployments/{deployment_id}/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": azure_openai_client.api_key,
        "x-ms-useragent": "aoai-faq"
    }

    with requests.Session() as session:
        session.headers.update(headers)
        response = session.post(url, json=body)
        response.raise_for_status()

    rate_limit_headers = extract_rate_limit_and_usage_info(response)
    logger.debug(f"Rate limit headers: {rate_limit_headers}")
    return response.status_code, response.json(), rate_limit_headers

def can_make_request(body: Dict, model: str) -> bool:
    """
    Checks if the current request can be made based on the remaining rate limits.

    :param body: The body of the request to be sent.
    :param model: The model name used for estimating tokens.
    :return: True if the request can be made, False otherwise.
    """
    global current_rate_limits
    required_tokens = tokenizer.estimate_tokens_azure_openai(messages=body["messages"], model=model)

    with rate_limit_lock:
        logger.info(f"Current rate limits before check: {current_rate_limits}")
        logger.info(f"Required tokens for the request: {required_tokens}")

        # Use a local flag that does not shadow the function name
        allowed = True

        if current_rate_limits["remaining_requests"] is not None:
            if current_rate_limits["remaining_requests"] <= 0:
                logger.info("Cannot make request: No remaining requests.")
                allowed = False
            else:
                logger.info(f"Remaining requests: {current_rate_limits['remaining_requests']}")

        if current_rate_limits["remaining_tokens"] is not None:
            if current_rate_limits["remaining_tokens"] < required_tokens:
                logger.info("Cannot make request: Not enough remaining tokens.")
                allowed = False
            else:
                logger.debug(f"Remaining tokens: {current_rate_limits['remaining_tokens']}")

        if allowed:
            if current_rate_limits["remaining_requests"] is not None:
                current_rate_limits["remaining_requests"] -= 1
                logger.info(f"Decremented remaining requests: {current_rate_limits['remaining_requests']}")
            if current_rate_limits["remaining_tokens"] is not None:
                current_rate_limits["remaining_tokens"] -= required_tokens
                logger.info(f"Decremented remaining tokens: {current_rate_limits['remaining_tokens']}")
            logger.info(f"After decrement, rate limits: {current_rate_limits}")
            return True

    logger.info("Request cannot be made due to rate limits.")
    return False
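
The cell above follows a reserve-then-resync pattern: capacity is deducted optimistically before the call, then overwritten with the values the service reports afterwards. As an illustration only, the same idea can be expressed without module-level globals. `RateLimitBudget` below is a hypothetical class, not part of the repository, and the starting numbers mirror the placeholder limits used above.

```python
from threading import Lock
from typing import Optional


class RateLimitBudget:
    """Thread-safe tracker that reserves capacity before a call and resyncs after it."""

    def __init__(self, remaining_requests: Optional[int], remaining_tokens: Optional[int]):
        self._lock = Lock()
        self.remaining_requests = remaining_requests
        self.remaining_tokens = remaining_tokens

    def try_reserve(self, required_tokens: int) -> bool:
        """Reserve one request and `required_tokens` tokens; return False if either is short."""
        with self._lock:
            if self.remaining_requests is not None and self.remaining_requests <= 0:
                return False
            if self.remaining_tokens is not None and self.remaining_tokens < required_tokens:
                return False
            if self.remaining_requests is not None:
                self.remaining_requests -= 1
            if self.remaining_tokens is not None:
                self.remaining_tokens -= required_tokens
            return True

    def sync(self, remaining_requests: int, remaining_tokens: int) -> None:
        """Overwrite the local estimate with the values reported by the service."""
        with self._lock:
            self.remaining_requests = remaining_requests
            self.remaining_tokens = remaining_tokens


budget = RateLimitBudget(remaining_requests=5, remaining_tokens=300)
print(budget.try_reserve(69), budget.remaining_requests, budget.remaining_tokens)
```

Checking and decrementing under a single lock means two threads cannot both claim the last available request.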

In [40]:
if can_make_request(body, "gpt-4-0613"):
    status_code, response, rate_limit_info = call_azure_openai_chat_completions_api(
        deployment_id=azure_openai_client.chat_model_name,
        body=body,
        api_version="2024-02-15-preview",
    )
    print("Status Code:", status_code)
    print("Response:", response)
    print("Rate Limit Info:", rate_limit_info)
else:
    print("Rate limit exceeded, cannot make the request.")

2024-08-02 19:15:13,011 - micro - MainProcess - INFO     Current rate limits before check: {'remaining_requests': 5, 'remaining_tokens': 300} (2257159956.py:can_make_request:85)
2024-08-02 19:15:13,014 - micro - MainProcess - INFO     Required tokens for the request: 69 (2257159956.py:can_make_request:86)
2024-08-02 19:15:13,015 - micro - MainProcess - INFO     Remaining requests: 5 (2257159956.py:can_make_request:95)
2024-08-02 19:15:13,016 - micro - MainProcess - INFO     Decremented remaining requests: 4 (2257159956.py:can_make_request:107)
2024-08-02 19:15:13,017 - micro - MainProcess - INFO     Decremented remaining tokens: 231 (2257159956.py:can_make_request:110)
2024-08-02 19:15:13,017 - micro - MainProcess - INFO     After decrement, rate limits: {'remaining_requests': 4, 'remaining_tokens': 231} (2257159956.py:can_make_request:111)
2024-08-02 19:15:15,390 - micro - MainProcess - INFO     Updated rate limits: {'remaining_requests': 9, 'remaining_tokens': 6932} (2257159956.py:up

Status Code: 200
Response: {'choices': [{'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_code': {'filtered': False, 'detected': False}, 'protected_material_text': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': 'Yes, several other Azure AI services also support customer-managed keys (CMKs) to enhance security and give customers more control over their data. These services include but are not limited to:\n\n1. **Azure Cognitive Services:** Many of the individual services within Azure Cognitive Services support customer-managed keys. This allows you to control the encryption of your data processed by these services.\n   \n2. **Azure Machine Learning:** Azure Machine Learning supports customer-managed keys for the en