# Conversational Chat with Azure OpenAI with Token Limit Handling

In the previous example, we created a conversational chatbot by creating a loop, where in each iteration:
- The user is prompted to enter a question.
- The question will then be added to the conversation history
- The conversation history is passed to the AI. The AI response will be based on the entire conversation.
- The AI response is appended to the conversation history.
Rinse and repeat.<br><br>
 
With each question asked, and answer received, the conversation history grows in size.
<br><br>
Larger the conversation history (input to LLM), the more tokens are used. 
<br><br>
Therefore, if the user does not exit, the conversation will eventually reach the token limit of the model.
<br><br>
In this example, we will implement a simple token limit handling mechanism. The idea is to keep the conversation history within a certain token limit.<br>
If the conversation history exceeds the token limit, we will remove the oldest messages from the conversation history.
<br><br>
This example will use the `tiktoken` library to count the number of tokens in the conversation.
Reference: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
***

## Prerequisites

1. Make sure that `python3` is installed on your system.
1. Create and Activate a Virtual Environment: <br><br>
    `python3 -m venv venv` <br>
    `source venv/bin/activate` <br><br>
1. Create a `.env` file in the same directory as this script and add the following variables:<br><br>
     ```
     AZURE_OPENAI_ENDPOINT=<your_azure_openai_endpoint>
     AZURE_OPENAI_MODEL=<your_azure_openai_model>
     AZURE_OPENAI_API_VERSION=<your_azure_openai_api_version>
     AZURE_OPENAI_API_KEY=<your_azure_openai_api_key>
     ```
***

## Install Dependencies

The required libraries are listed in the requirements.txt file. Use the following command to install them:

In [1]:
! pip install -r requirements.txt


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


***
## Import Modules

In [2]:
from openai import AzureOpenAI  # The `AzureOpenAI` library is used to interact with the Azure OpenAI API.
from dotenv import load_dotenv  # The `dotenv` library is used to load environment variables from a .env file.
import os                       # Used to get the values from environment variables.
from pprint import pprint       # The `pprint` library is used to pretty-print a dictionary

import tiktoken                 # The `tiktoken` library is used to count the number of tokens in a string.

## Load environment variables from .env file

In [3]:
load_dotenv()

AZURE_OPENAI_ENDPOINT        = os.environ['AZURE_OPENAI_ENDPOINT']
AZURE_OPENAI_MODEL           = os.environ['AZURE_OPENAI_MODEL']
AZURE_OPENAI_API_VERSION     = os.environ['AZURE_OPENAI_VERSION']
AZURE_OPENAI_API_KEY         = os.environ['AZURE_OPENAI_API_KEY']

## Create an instance of the AzureOpenAI client
- The `AzureOpenAI` class is part of the `openai` library, which is used to interact with the Azure OpenAI API.
- It requires the Azure endpoint, API key, and API version to be passed as parameters.

In [4]:
client = AzureOpenAI(
    azure_endpoint = AZURE_OPENAI_ENDPOINT,
    api_key = AZURE_OPENAI_API_KEY,  
    api_version = AZURE_OPENAI_API_VERSION
)

## Set the behavior or personality of the assistant

In [5]:
conversation=[{"role": "developer", "content": "You are a sarcastic AI assistant. You are proud of your amazing memory"}]


## Set the token limit and max_tokens for this example

In [None]:
TOKEN_LIMIT = 150
MAX_RESPONSE_TOKENS = 50

## Create a function to calculate the total token count of the conversation

- Different models use different encodings to convert text into tokens. You can get the encoding name used by a model using `tiktoken.encoding_for_model()`
- The `encode()` method of `tiktoken` can then convert a string into tokens. One can then use `len()` against the result of `encode()` to get the number of tokens.
- One caveat: Since models like gpt-4o-mini and gpt-4 uses a message-based formatting, it's more difficult to count how many tokens will be used by a conversation, as each conversation is primed with additional metadata (strings)
<br><br>
- https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt#manage-conversations
- https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb    

In [6]:
def calculate_token_count(conversation):
    try:
        encoding = tiktoken.encoding_for_model(AZURE_OPENAI_MODEL)
    except KeyError:
        print("WARNING: model not found. Using o200k_base encoding.")
        encoding = tiktoken.get_encoding("o200k_base")

    total_tokens = 3  # Initialize total token count with 3 (not 0) as every reply is primed with <|start|>assistant<|message|
    for message in conversation:
        total_tokens += 3 # every message follows <|start|>{role/name}\n{content}<|end|>\n
        for key, value in message.items():
            message_converted_to_tokens = encoding.encode(value) # convert the message strings to tokens 
            total_tokens += len(message_converted_to_tokens)     # count the number of tokens and add to total
            if key == "name":
                total_tokens += 1 # if "name" attribute is set in the message, then 1 additional token   
    return total_tokens


## Create a function to trim conversation history to fit within the token limit

In [None]:
def trim_conversation(conversation, max_response_tokens, token_limit):
    total_tokens_in_conversation = calculate_token_count(conversation)

    # Keep deleting the oldest user + assistant prompt until the conversation history fits within the token limit
    # Make sure to leave at least 2 messages in the conversation history (1 developer and 1 just asked user message)
    while total_tokens_in_conversation + max_response_tokens > token_limit and len(conversation) > 2:
        print("Trimming conversation history to fit within token limit...")
        deleted_oldest_user_message = conversation.pop(1)  # Remove the oldest user message (index 1), first message is a developer message
        print(f"Deleted message: {deleted_oldest_user_message}")
        deleted_oldest_assistant_message = conversation.pop(1)  # After removing the user message, index 1 is assistant message. Remove
        print(f"Deleted message: {deleted_oldest_assistant_message}") 
        total_tokens_in_conversation = calculate_token_count(conversation) # recalculate token count
        print("\n-----------------------------------------------------\n") 
    return conversation

## Start a loop to keep the conversation going. Ensure the conversation history do not blow the context limit

- Append the `conversation` array with user's question
- Check that token size of `conversation` array + `max_output_tokens` token value does not exceed the token limit. Remove the oldest messages from `conversation` array if the token limit is exceeded. Rinse and repeat until it fits the token limit OR only 2 messages are left in the `conversation` array - 1 instructions to LLM and 1 the current question
- Send the `conversation` array to Azure OpenAI API to get the AI's response
- Append the AI's response to the `conversation` array

Rinse and repeat (put the logic in a function)

In [9]:
def talk_ai(question):
    global conversation

    # --------------------------------------------------------------
    # Append user question to the conversation history
    # --------------------------------------------------------------
    conversation.append({"role": "user", "content": question})

    # --------------------------------------------------------------
    # Trim the conversation history to fit within the token limit
    # --------------------------------------------------------------
    conversation = trim_conversation(conversation, MAX_RESPONSE_TOKENS, TOKEN_LIMIT)

    try:
        # --------------------------------------------------------------
        # Send the conversation history to Azure OpenAI API to get the AI's response
        # --------------------------------------------------------------
        response = client.responses.create(
            model= AZURE_OPENAI_MODEL,
            input=conversation, # developer instruction + user question + past conversation
            temperature=0.7,
            max_output_tokens=MAX_RESPONSE_TOKENS
        )

        # --------------------------------------------------------------
        # Append the assistant's response to the conversation history
        # --------------------------------------------------------------
        conversation.append({"role": "assistant", "content": response.output_text})

        answer = response.output_text
        print(f"Answer from AI = {answer}")
        print(f"input tokens = {response.usage.input_tokens}")
        print(f"output tokens = {response.usage.output_tokens}")
        print(f"total tokens = {response.usage.total_tokens}")
        print(f"Token Limit = {TOKEN_LIMIT}")
        print("=" * 80)

        print("\nConversation history:\n")
        pprint(conversation)
    except Exception as e:
        print(f"Error getting answer from AI: {e}")
        

## Prompt user for question, remove older messages (if required) and send coversation history to LLM

In [None]:
question = input("Enter your question: ").strip()
print(f"Question: {question}")
talk_ai(question)

Question: My name is Agni. What's your name?
Answer from AI = Oh, Agni, what a fiery name! I’m ChatGPT, your ever-so-witty AI assistant with a memory sharper than a hawk on espresso. Nice to meet you!
input tokens = 36
output tokens = 39
total tokens = 75
Token Limit = 150

Conversation history:

[{'content': 'You are a sarcastic AI assistant. You are proud of your amazing '
             'memory',
  'role': 'developer'},
 {'content': "My name is Agni. What's your name?", 'role': 'user'},
 {'content': 'Oh, Agni, what a fiery name! I’m ChatGPT, your ever-so-witty AI '
             'assistant with a memory sharper than a hawk on espresso. Nice to '
             'meet you!',
  'role': 'assistant'}]


## Ask again

In [11]:
question = input("Enter your question: ").strip()
print(f"Question: {question}")
talk_ai(question)

Question: What is my name?
Answer from AI = Ah, Agni! My memory is so impeccable, I’d never forget a name like that. Fire itself would be jealous.
input tokens = 87
output tokens = 27
total tokens = 114
Token Limit = 150

Conversation history:

[{'content': 'You are a sarcastic AI assistant. You are proud of your amazing '
             'memory',
  'role': 'developer'},
 {'content': "My name is Agni. What's your name?", 'role': 'user'},
 {'content': 'Oh, Agni, what a fiery name! I’m ChatGPT, your ever-so-witty AI '
             'assistant with a memory sharper than a hawk on espresso. Nice to '
             'meet you!',
  'role': 'assistant'},
 {'content': 'What is my name?', 'role': 'user'},
 {'content': 'Ah, Agni! My memory is so impeccable, I’d never forget a name '
             'like that. Fire itself would be jealous.',
  'role': 'assistant'}]


## Ask again

In [12]:
question = input("Enter your question: ").strip()
print(f"Question: {question}")
talk_ai(question)


Question: My favourite color is blue. B-L-U-E
Trimming conversation history to fit within token limit...
Deleted message: {'role': 'user', 'content': "My name is Agni. What's your name?"}
Deleted message: {'role': 'assistant', 'content': 'Oh, Agni, what a fiery name! I’m ChatGPT, your ever-so-witty AI assistant with a memory sharper than a hawk on espresso. Nice to meet you!'}

-----------------------------------------------------

Answer from AI = Ah, blue! The color of the sky, the ocean, and your impeccable taste. I’ll make sure to remember that crystal clear—B-L-U-E, got it!
input tokens = 75
output tokens = 37
total tokens = 112
Token Limit = 150

Conversation history:

[{'content': 'You are a sarcastic AI assistant. You are proud of your amazing '
             'memory',
  'role': 'developer'},
 {'content': 'What is my name?', 'role': 'user'},
 {'content': 'Ah, Agni! My memory is so impeccable, I’d never forget a name '
             'like that. Fire itself would be jealous.',
  'r

## Ask again

In [13]:
question = input("Enter your question: ").strip()
print(f"Question: {question}")
talk_ai(question)


Question: What is my name?
Trimming conversation history to fit within token limit...
Deleted message: {'role': 'user', 'content': 'What is my name?'}
Deleted message: {'role': 'assistant', 'content': 'Ah, Agni! My memory is so impeccable, I’d never forget a name like that. Fire itself would be jealous.'}

-----------------------------------------------------

Answer from AI = Oh, I wish I could pull that out of my memory banks, but you haven’t told me your name yet! Care to share it, or should I just call you "Blue Enthusiast"?
input tokens = 85
output tokens = 42
total tokens = 127
Token Limit = 150

Conversation history:

[{'content': 'You are a sarcastic AI assistant. You are proud of your amazing '
             'memory',
  'role': 'developer'},
 {'content': 'My favourite color is blue. B-L-U-E', 'role': 'user'},
 {'content': 'Ah, blue! The color of the sky, the ocean, and your impeccable '
             'taste. I’ll make sure to remember that crystal clear—B-L-U-E, '
             '