## 📚 Prerequisites

Ensure that your Azure Services are properly set up, your Conda environment is created, and your environment variables are configured as per the instructions in the [SETTINGS.md](SETTINGS.md) file.

## 📋 Table of Contents

This notebook shows how to call Azure OpenAI and inspect the response headers and payload metadata. It covers the following sections:

1. [**Setting Up Azure OpenAI Client**](#setting-up-azure-openai-client): Outlines the process of initializing the Azure OpenAI client.

2. [**Calling Azure OpenAI API**](#calling-azure-openai-api): Discusses how to make API calls to Azure OpenAI.

3. [**Extracting Headers and Payload Metadata**](#extracting-headers-and-payload-metadata): Explores how to extract headers and payload metadata from the API response.

4. [**Analyzing Rate Limit Info**](#analyzing-rate-limit-info): Details the steps to analyze the rate limit information from the API response.

For additional information, refer to the following resources:
- [AOAI API Documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)

In [1]:
import os

# Define the target directory
target_directory = (
    r"C:\Users\pablosal\Desktop\gbbai-azure-aoai-faq"  # change your directory here
)

# Check if the directory exists
if os.path.exists(target_directory):
    # Change the current working directory
    os.chdir(target_directory)
    print(f"Directory changed to {os.getcwd()}")
else:
    print(f"Directory {target_directory} does not exist.")

Directory changed to C:\Users\pablosal\Desktop\gbbai-azure-aoai-faq


In [2]:
# Import the libraries
from dotenv import load_dotenv
from src.aoai.azure_openai import AzureOpenAIManager
from typing import Dict, Optional
from requests import Response
import requests

from utils.ml_logging import get_logger

# Load environment variables from .env file
load_dotenv()
# Set up logger
logger = get_logger()

## Setting Up Azure OpenAI Client

In [3]:
# Create an instance of the client. You can find it in src/aoai/azure_openai.py.
# It is essentially a wrapper using dependency injection to automate the initialization
# and most used API calls.
azure_openai_client = AzureOpenAIManager()

## Calling Azure OpenAI API

### Creating Helper Function
This function collects response metadata into a single dictionary: the rate limit values from the response headers and the token usage from the JSON payload.

In [4]:
def extract_rate_limit_and_usage_info(response: Response) -> Dict[str, Optional[int]]:
    """
    Extracts rate limiting information from the Azure OpenAI API response headers and usage information from the payload.

    :param response: The response object returned by a requests call.
    :return: A dictionary containing the remaining requests, remaining tokens, and usage information
             including prompt tokens, completion tokens, and total tokens.
    """
    headers = response.headers
    usage = response.json().get("usage", {})
    return {
        "remaining-requests": headers.get("x-ratelimit-remaining-requests"),
        "remaining-tokens": headers.get("x-ratelimit-remaining-tokens"),
        "prompt-tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
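
The header values arrive as strings (for example `'9'`), while the usage counts in the payload are integers, so the dictionary mixes both types. A minimal sketch of normalizing every value to `Optional[int]`, which is what the return annotation promises; the helper names `to_optional_int` and `normalize_rate_limit_info` are hypothetical and not part of the repository:

```python
from typing import Any, Dict, Optional


def to_optional_int(value: Any) -> Optional[int]:
    """Convert a header or payload value to int, or None if missing or non-numeric."""
    if value is None:
        return None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def normalize_rate_limit_info(info: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Return a copy of the metadata dictionary with every value coerced to int or None."""
    return {key: to_optional_int(value) for key, value in info.items()}


# Illustrative input shaped like the dictionary built by the helper above.
raw = {"remaining-requests": "9", "remaining-tokens": None, "prompt-tokens": 333}
print(normalize_rate_limit_info(raw))
```

Normalizing once at the boundary means later comparisons do not need to wrap each lookup in `int(...)`.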

### Encapsulating Completions API Calls into a Function

In [5]:
azure_openai_client.api_key

'<redacted>'

In [6]:
def call_azure_openai_chat_completions_api(
    deployment_id: str, method: str, body: Optional[dict] = None, api_version: str = "2023-11-01"
):
    """
    Calls the Azure OpenAI API with the given parameters.

    :param deployment_id: The ID of the deployment to access.
    :param method: The HTTP method to use ("get" or "post").
    :param body: The body of the request for "post" method. Defaults to None.
    :param api_version: The API version to use. Defaults to "2023-11-01".

    :return: The status code and response from the API call, along with rate limit headers.
    """
    if method.lower() not in ["get", "post"]:
        logger.error("Invalid HTTP method. Expected 'get' or 'post'.")
        return None, None, {}

    url = f"{azure_openai_client.azure_endpoint}/openai/deployments/{deployment_id}/chat/completions?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": azure_openai_client.api_key,
    }

    with requests.Session() as session:
        session.headers.update(headers)

        try:
            if method.lower() == "get":
                response = session.get(url)
            else:  # method.lower() == "post"
                response = session.post(url, json=body)
            response.raise_for_status()  # Raises HTTPError for bad responses
        except requests.HTTPError as http_err:
            logger.error(f"HTTP error occurred: {http_err}")
            return response.status_code, http_err.response.json(), {}
        except Exception as err:
            logger.error(f"An error occurred: {err}")
            return None, None, {}

    # Extract rate limit headers and usage details
    rate_limit_headers = extract_rate_limit_and_usage_info(response)
    return response.status_code, response.json(), rate_limit_headers
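
When the rate limit is exhausted, the call above returns the error status without saying how long to wait. A small sketch of deriving a pause from the response follows, assuming the service signals throttling with HTTP 429 and may send the standard `Retry-After` header in seconds. The function name `seconds_to_wait` and the one-second fallback are hypothetical choices, not something the notebook defines:

```python
from typing import Mapping, Optional


def seconds_to_wait(
    status_code: int, headers: Mapping[str, str], default: float = 1.0
) -> Optional[float]:
    """Return how long to pause before retrying, or None when no retry is indicated."""
    if status_code != 429:
        return None
    # requests exposes headers as a case-insensitive mapping, so the lowercase key
    # works there; a plain dict needs the key spelled exactly like this.
    value = headers.get("retry-after")
    try:
        return float(value)
    except (TypeError, ValueError):
        # Header missing, or given as an HTTP date rather than seconds.
        return default


print(seconds_to_wait(429, {"retry-after": "7"}))
```

A caller could sleep for the returned value and retry, ideally with an upper bound on the number of attempts.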

## Extracting Headers and Payload Metadata

### Constructing the Request

- **max_tokens**: Optional. Integer specifying the maximum number of tokens to generate. Set to 24 in the request below.
- **temperature**: Optional. Number between 0 and 2 indicating the sampling temperature. Default is 1.
- **top_p**: Optional. Nucleus sampling parameter as a number between 0 and 1. Default is 1.
- **user**: Optional. A unique identifier for the end-user to help monitor and detect abuse.
- **n**: Optional. Integer for the number of completions to generate for each prompt. Default is 1.
- **presence_penalty**: Optional. Number between -2.0 and 2.0 to penalize new tokens based on presence in the text so far. Default is 0.
- **frequency_penalty**: Optional. Number between -2.0 and 2.0 to penalize new tokens based on frequency in the text so far. Default is 0.
- **messages**: Required. An array of message objects, each with a `role` and `content`.

You can learn more about the Azure OpenAI API in the [official documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference).
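
The ranges listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of the repository: `PARAMETER_RANGES` and `validate_request_body` are hypothetical names, and the rule that `messages` must be non-empty reflects the fact that a chat request without messages has nothing to complete.

```python
from typing import Any, Dict, List

# Ranges taken from the parameter list above; keys are request-body field names.
PARAMETER_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def validate_request_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for name, (low, high) in PARAMETER_RANGES.items():
        if name in body and not (low <= body[name] <= high):
            problems.append(f"{name}={body[name]} is outside [{low}, {high}]")
    for name in ("max_tokens", "n"):
        if name in body and (not isinstance(body[name], int) or body[name] < 1):
            problems.append(f"{name} must be a positive integer")
    if not body.get("messages"):
        problems.append("messages must be a non-empty list")
    return problems


print(validate_request_body({"temperature": 3, "messages": []}))
```

Returning a list of problems rather than raising on the first one lets a caller log everything that is wrong with a request in a single pass.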

In [7]:
body = {
    "max_tokens": 24,
    "temperature": 1,
    "top_p": 1,
    "user": "",
    "n": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {
            "role": "assistant",
            "content": "Yes, customer managed keys are supported by Azure OpenAI.",
        },
        {"role": "user", "content": "Do other Azure AI services support this too?"},
        {
            "role": "assistant",
            "content": "Yes, other Azure AI services also support customer managed keys.",
        },
        {"role": "user", "content": "Can you tell me more about these services?"},
        {
            "role": "assistant",
            "content": "Sure, Azure AI services include Azure Cognitive Services, Azure Machine Learning, and more.",
        },
        {"role": "user", "content": "What is Azure Cognitive Services?"},
        {
            "role": "assistant",
            "content": "Azure Cognitive Services is a collection of APIs and services for building intelligent applications.",
        },
        {"role": "user", "content": "What is Azure Machine Learning?"},
        {
            "role": "assistant",
            "content": "Azure Machine Learning is a cloud-based service for building, training, and deploying machine learning models.",
        },
        {"role": "user", "content": "Thank you for the information."},
        {
            "role": "assistant",
            "content": "You're welcome! If you have any other questions, feel free to ask.",
        },
        {"role": "user", "content": "What other services does Azure offer?"},
        {
            "role": "assistant",
            "content": "Azure offers a wide range of services including computing, analytics, storage, and networking.",
        },
        {
            "role": "user",
            "content": "Can you tell me more about Azure's computing services?",
        },
        {
            "role": "assistant",
            "content": "Azure's computing services include virtual machines, container services, and serverless computing.",
        },
        {"role": "user", "content": "What is serverless computing?"},
        {
            "role": "assistant",
            "content": "Serverless computing is a cloud computing model where the cloud provider automatically manages the provisioning and scaling of servers.",
        },
        {
            "role": "user",
            "content": "That's interesting. Thank you for the information.",
        },
        {
            "role": "assistant",
            "content": "You're welcome! If you have any other questions, feel free to ask.",
        },
    ],
}

In [8]:
status_code, response, rate_limit_info = call_azure_openai_chat_completions_api(
    deployment_id=azure_openai_client.chat_model_name,
    method="post",
    body=body,
    api_version="2023-05-15",
)

# Print the status code, response, and rate limit info
print("Status Code:", status_code)
print("Response:", response)
print("Rate Limit Info:", rate_limit_info)

Status Code: 200
Response: {'id': 'chatcmpl-8m1ZcQYr6DQai7vYeqaG6vNFJurmm', 'object': 'chat.completion', 'created': 1706456484, 'model': 'gpt-4', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'role': 'assistant', 'content': "If you have any more questions or need further assistance, just let me know! I'm here to help."}}], 'usage': {'prompt_tokens': 333, 'completion_tokens': 22, 'total_tokens': 355}, 'system_fingerprint': 'fp_6d044fb900'}
Rate Limit Info: {'remaining-requests': '9', 'remaining-tokens': '9629', 'prompt-tokens': 333, 'completion_tokens': 22, 'total_tokens': 355}


## Analyzing Rate Limit Info

The rate limit information provides details about the usage of the API:

- **Remaining Requests**: The number of API calls that can still be made. In this case, there are 9 requests left.
- **Remaining Tokens**: The number of tokens still available under the current rate limit. In this case, there are 9629 tokens left.
- **Prompt Tokens**: The number of tokens used in the prompt for this API call. In this case, 333 tokens were used.
- **Completion Tokens**: The number of tokens generated in the completion for this API call. In this case, 22 tokens were generated.
- **Total Tokens**: The total number of tokens used in this API call. This is the sum of the prompt tokens and the completion tokens. In this case, 355 tokens were used.
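
The relationship between the three usage fields can be verified directly. This short sketch uses the values printed in the `Rate Limit Info` output above and the key names produced by `extract_rate_limit_and_usage_info`; the function name `usage_totals_match` is hypothetical:

```python
from typing import Any, Dict


def usage_totals_match(info: Dict[str, Any]) -> bool:
    """Check that prompt tokens plus completion tokens equals the reported total."""
    prompt = int(info["prompt-tokens"])
    completion = int(info["completion_tokens"])
    total = int(info["total_tokens"])
    return prompt + completion == total


# Values copied from the printed output of the call above.
rate_limit_info = {
    "remaining-requests": "9",
    "remaining-tokens": "9629",
    "prompt-tokens": 333,
    "completion_tokens": 22,
    "total_tokens": 355,
}
print(usage_totals_match(rate_limit_info))
```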

### Bonus: Programmatically Stop Call if Input Tokens Exceed Limit

In [9]:
from typing import Union, Optional, List, Dict
import tiktoken


def num_tokens_from_input(
    input: Union[str, List[Dict[str, str]]],
    encoding_name: Optional[str] = "cl100k_base",
) -> int:
    """
    Returns the total number of tokens in the input using the specified encoding.

    This function uses the Tiktoken library to count the number of tokens in the input.
    The input can be either a string or a list of dictionaries representing a conversation.
    Each message in the conversation should be a dictionary with a "content" key containing the message text.
    The encoding used for tokenization can be specified. If no encoding is specified,
    "cl100k_base" is used by default.

    Parameters:
    input (Union[str, List[Dict[str, str]]]): The input to count tokens in.
    encoding_name (Optional[str]): The name of the encoding to use for tokenization.
                                   Defaults to "cl100k_base".

    Returns:
    int: The total number of tokens in the input.
    """

    encoding = tiktoken.get_encoding(encoding_name)

    if isinstance(input, str):
        total_tokens = len(encoding.encode(input))
    elif isinstance(input, list):
        total_tokens = sum(
            len(encoding.encode(message["content"])) for message in input
        )
    else:
        raise ValueError(
            "Input must be either a string or a list of dictionaries representing a conversation."
        )

    return total_tokens

In [10]:
from typing import Tuple, Dict, Any, Optional, Union


def call_azure_openai_chat_completions_api_with_pre_check(
    deployment_id: str,
    method: str,
    body: Optional[Dict[str, Any]] = None,
    api_version: str = "2023-11-01",
) -> Tuple[Optional[int], Optional[Union[Dict[str, Any], str]], Dict[str, str]]:
    """
    Calls the Azure OpenAI API after checking that enough rate-limit tokens remain.

    This function first sends a minimal one-token request to read the current rate limit headers.
    It then uses the Tiktoken library to count the tokens in the request body's messages and compares this with the remaining tokens.
    If there are not enough tokens remaining, the function aborts the request and returns an error message.

    :param deployment_id: The ID of the deployment to access.
    :param method: The HTTP method to use ("get" or "post").
    :param body: The body of the request for "post" method. Defaults to None.
    :param api_version: The API version to use. Defaults to "2023-11-01".

    :return: The status code and response from the API call, along with rate limit headers.
    """
    if method.lower() not in ["get", "post"]:
        logger.error("Invalid HTTP method. Expected 'get' or 'post'.")
        return None, "Invalid HTTP method. Expected 'get' or 'post'.", {}
    # Pre-Check
    body_pre = {
        "max_tokens": 1,
        "messages": [{"role": "assistant", "content": ""}],
    }
    _, _, rate_limit_info = call_azure_openai_chat_completions_api(
        deployment_id, method=method, body=body_pre, api_version=api_version
    )
    logger.info(f"Rate Limit Info: {rate_limit_info}")

    tokens_needed_for_request = (
        num_tokens_from_input(body["messages"], "cl100k_base") if body else 0
    )
    logger.info(f"Tokens Needed for Request: {tokens_needed_for_request}")

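    # The pre-check call returns an empty dict on failure, so guard the lookup.
    if rate_limit_info.get("remaining-tokens") is None:
        logger.error("Pre-check failed: remaining-token header unavailable.")
        return None, "Could not determine remaining tokens, aborting the request.", {}

    if int(rate_limit_info["remaining-tokens"]) >= tokens_needed_for_request: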
        logger.info(
            f"Enough tokens remaining, proceed with the request. Remaining tokens: {rate_limit_info['remaining-tokens']}, Tokens needed for request: {tokens_needed_for_request}"
        )
        (
            status_code,
            response,
            rate_limit_headers,
        ) = call_azure_openai_chat_completions_api(
            deployment_id, method=method, body=body, api_version=api_version
        )
        return status_code, response, rate_limit_headers
    else:
        logger.error(
            f"Not enough tokens remaining for the request. Remaining tokens: {rate_limit_info['remaining-tokens']}, Tokens needed for request: {tokens_needed_for_request}"
        )
        return (
            None,
            "Not enough tokens remaining, aborting or delaying the request.",
            {},
        )
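
The check above compares the remaining tokens with the prompt size only. A more conservative variant, sketched here on the assumption that the completion may consume up to `max_tokens` additional tokens, reserves that headroom as well. `has_token_budget` is a hypothetical helper, not part of the repository:

```python
from typing import Any, Dict, Optional


def has_token_budget(
    remaining_tokens: Optional[str],
    prompt_tokens: int,
    body: Optional[Dict[str, Any]] = None,
) -> bool:
    """Return True if the remaining budget covers the prompt plus the requested completion."""
    if remaining_tokens is None:
        # No header available: treat the budget as unknown and refuse.
        return False
    max_completion = (body or {}).get("max_tokens", 0)
    return int(remaining_tokens) >= prompt_tokens + max_completion


print(has_token_budget("270", 246, {"max_tokens": 24}))
```

Treating a missing header as "no budget" is a deliberate fail-closed choice; a caller that prefers to proceed when the header is absent could return `True` in that branch instead.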

In [11]:
(
    status_code,
    response,
    rate_limit_info,
) = call_azure_openai_chat_completions_api_with_pre_check(
    deployment_id=azure_openai_client.completion_model_name,
    method="post",
    body=body,
    api_version="2023-05-15",
)

# Print the status code, response, and rate limit info
print("Status Code:", status_code)
print("Response:", response)
print("Rate Limit Info:", rate_limit_info)

2024-01-28 09:41:27,433 - micro - MainProcess - INFO     Rate Limit Info: {'remaining-requests': '119', 'remaining-tokens': '119999', 'prompt-tokens': 7, 'completion_tokens': 1, 'total_tokens': 8} (1966967700.py:call_azure_openai_chat_completions_api_with_pre_check:35)
2024-01-28 09:41:27,690 - micro - MainProcess - INFO     Tokens Needed for Request: 246 (1966967700.py:call_azure_openai_chat_completions_api_with_pre_check:40)
2024-01-28 09:41:27,690 - micro - MainProcess - INFO     Enough tokens remaining, proceed with the request. Remaining tokens: 119999, Tokens needed for request: 246 (1966967700.py:call_azure_openai_chat_completions_api_with_pre_check:43)


Status Code: 200
Response: {'id': 'chatcmpl-8m1ZgxWEW7Wp1FPUJqSsktID80hfc', 'object': 'chat.completion', 'created': 1706456488, 'model': 'gpt-35-turbo-16k', 'choices': [{'finish_reason': 'length', 'index': 0, 'message': {'role': 'assistant', 'content': 'Sure! Azure OpenAI supports customer managed keys. With customer managed keys, you have control over the encryption keys used to'}}], 'usage': {'prompt_tokens': 333, 'completion_tokens': 24, 'total_tokens': 357}}
Rate Limit Info: {'remaining-requests': '118', 'remaining-tokens': '119975', 'prompt-tokens': 333, 'completion_tokens': 24, 'total_tokens': 357}
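
The log reports 246 tokens needed, while the service reports 333 prompt tokens for the same body. The gap arises because `num_tokens_from_input` counts only the `content` strings, whereas a chat request also spends tokens on each message's role and framing. Below is a rough sketch of an overhead-aware estimate. The constants (3 framing tokens per message and 3 tokens priming the reply) are assumptions taken from a commonly cited heuristic for this model generation and may differ for other deployments. The function name and the `count_tokens` parameter are hypothetical; any callable works, for example one built on `tiktoken.get_encoding("cl100k_base")`.

```python
from typing import Callable, Dict, List


def estimate_chat_prompt_tokens(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
    tokens_per_message: int = 3,  # assumed framing overhead per message
    reply_priming_tokens: int = 3,  # assumed tokens that prime the assistant reply
) -> int:
    """Estimate prompt tokens for a chat request, including per-message overhead."""
    total = reply_priming_tokens
    for message in messages:
        total += tokens_per_message
        # Count every string field (role, content, ...), not only the content.
        total += sum(count_tokens(value) for value in message.values())
    return total


def word_count(text: str) -> int:
    """Stand-in tokenizer for illustration only: one token per whitespace-separated word."""
    return len(text.split())


demo = [{"role": "user", "content": "What is serverless computing?"}]
print(estimate_chat_prompt_tokens(demo, word_count))
```

Under these assumptions, the 21 messages in the request body would add 63 framing tokens, about 21 role tokens (if each role encodes to a single token), and 3 priming tokens to the 246 content tokens, which comes to the reported 333.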
