Rules of thumb for English text:

- 1 token ≈ 4 characters
- 1 token ≈ 3/4 of a word
- 100 tokens ≈ 75 words
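
The rules of thumb above can be turned into a quick estimator. This is a minimal sketch: the function names are hypothetical, and the results are only rough approximations for English text, not real tokenizer output.

```python
import math

def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate using the ~4 characters per token rule."""
    return math.ceil(len(text) / 4)

def estimate_tokens_from_words(text: str) -> int:
    """Rough estimate using the ~3/4 word per token rule (4/3 tokens per word)."""
    return math.ceil(len(text.split()) * 4 / 3)

sample = "How many tokens does this short sentence use?"
print(estimate_tokens_from_chars(sample), estimate_tokens_from_words(sample))
```

For exact counts, use a real tokenizer such as `tiktoken` instead of these estimates.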

The rate limit on token usage depends on which chat model we use, for example:

- gpt-3.5-turbo: TPM=90k
- gpt-3.5-turbo-16k-0613: TPM=180k

> TPM: Tokens per minute
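
To get a feel for what a TPM limit means in practice, it can be converted into an upper bound on requests per minute. This is a sketch with a hypothetical helper; it assumes every request uses the same number of tokens and ignores any separate requests-per-minute limit.

```python
def max_requests_per_minute(tpm_limit: int, tokens_per_request: int) -> int:
    """Upper bound on requests per minute that fit into a tokens-per-minute budget."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return tpm_limit // tokens_per_request

# With the 90k TPM figure listed above and 500 tokens per request:
print(max_requests_per_minute(90_000, 500))  # 180
```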

A model's token limit is the maximum number of tokens that the prompt and the completion may use together. For example, gpt-3.5-turbo has a token limit of 4096, so a prompt of 4000 tokens leaves at most 96 tokens for the completion.
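
The arithmetic behind the token limit is a simple subtraction. This is a minimal sketch with a hypothetical helper name; the default of 4096 is the gpt-3.5-turbo limit quoted above.

```python
def max_completion_tokens(prompt_tokens: int, token_limit: int = 4096) -> int:
    """Tokens left for the completion once the prompt has used its share."""
    # A prompt that already exceeds the limit leaves no room at all.
    return max(token_limit - prompt_tokens, 0)

print(max_completion_tokens(4000))  # 96
```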

- Low Usage: Users interact minimally with the language model, for example by asking short, simple questions or issuing basic instructions. Low usage ranges from around 10 to 100 tokens per interaction.

- Medium Usage: Users hold longer interactions, such as discussing a topic in more depth, providing additional context, or having a back-and-forth conversation. Medium usage ranges from approximately 100 to 500 tokens per interaction.

- High Usage: Users rely on the language model extensively, for example in lengthy conversations, when generating multiple paragraphs of text, or in complex queries that need substantial context and detailed responses. High usage ranges from around 500 to several thousand tokens per interaction.

> OpenAI's tokenizer can be found [here](https://platform.openai.com/tokenizer)

In [6]:
import os
import sys
import logging
import re
import openai
import zulip
from dotenv import load_dotenv
import tiktoken
import sqlite3
import datetime

In [7]:
load_dotenv()
openai.api_key=os.environ['OPENAI_API_KEY']

In [8]:
LOW_THRES = 100
MED_THRES = 500
CONVO_THRES = MED_THRES #Threshold to limit conversation history

delimiter = "###"

def get_completion_with_tokens(messages, prompt, model='gpt-3.5-turbo'):
    """Send `prompt` with the conversation so far; return (reply text, usage).

    `usage` holds "prompt_tokens", "completion_tokens" and "total_tokens".
    Note: `messages` is modified in place (the prompt and the reply are appended).
    """
    messages.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0
    )
    messages.append(response.choices[0].message)
    return response.choices[0].message["content"], response.usage

def get_completion(messages, prompt, model='gpt-3.5-turbo'):
    """Same as get_completion_with_tokens, but return only the reply text."""
    r, _ = get_completion_with_tokens(messages, prompt, model)
    return r

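def count_token_history(messages, model='gpt-3.5-turbo'):
    """Return the number of tokens in the content of all messages (int)."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to the encoding used by the chat models.
        encoding = tiktoken.get_encoding("cl100k_base")

    tokens = 0
    for message in messages:  # each message: {"role": ..., "content": ...}
        tokens += len(encoding.encode(message["content"]))

    return tokens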

def usage_level(tokens):
    if tokens < LOW_THRES:
        return "LOW"
    elif tokens < MED_THRES:
        return "MEDIUM"
    else:
        return "HIGH"

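def sum_history(history, model='gpt-3.5-turbo'):
    """Summarize all but the last two messages and return the shortened history."""
    # Join the content of the older messages into one text to summarize.
    text = " ".join(m["content"] for m in history[:-2])

    prompt = f"""
    Summarize the following text delimited by {delimiter}. \
    Keep your summarization short, but do not lose important information. \
    {delimiter}{text}{delimiter}
    """

    summary = get_completion([], prompt, model)
    # Store the summary as a message dict so it can be sent back to the API,
    # then keep the two most recent messages (last prompt and reply) verbatim.
    nhistory = [{"role": "system", "content": summary}]
    nhistory.extend(history[-2:])
    return nhistory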

def handle_new(messageHistory, nPrompt, model='gpt-3.5-turbo'):
    response, usage = get_completion_with_tokens(messages=messageHistory, prompt=nPrompt, model=model)
    
    if usage["total_tokens"] > CONVO_THRES:
        messageHistory = sum_history(messageHistory, model)
    
    return response, messageHistory


In [None]:
history = [{"role": "system", "content": "You are a friendly chatbot that can hold conversations for a long period and summarizes message history efficiently."}]

def print_history(history):
    for h in history:
        print("{}: {}".format(h["role"], h["content"]))

user_in = input("--> ")
while user_in != "quit":
    if user_in == "history":
        print_history(history)
    else:
        response, history = handle_new(history, user_in)
        print("AI: {}".format(response))
    user_in = input("--> ")