## How Cost is Calculated

Welcome! This notebook provides the practical demonstration for our lecture on cost calculation. We've discussed the theory: API usage is priced on a **pay-per-token** basis, and the cost is a function of both the **input tokens** you send and the **output tokens** you receive.

The formula is:
`Total Cost = (Input Tokens × Price per Input Token) + (Output Tokens × Price per Output Token)`
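
As a quick worked example of the formula, the sketch below plugs in hypothetical token counts (1,000 input, 500 output) and the `gpt-4o-mini` rates used later in this notebook ($0.15 and $0.60 per 1 million tokens). The function name `total_cost` is ours, chosen for illustration only.

```python
def total_cost(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Apply the pay-per-token formula; prices are quoted per 1 million tokens."""
    input_cost = input_tokens * (input_price_per_1m / 1_000_000)
    output_cost = output_tokens * (output_price_per_1m / 1_000_000)
    return input_cost + output_cost


# Hypothetical call: 1,000 input tokens and 500 output tokens.
example_cost = total_cost(1_000, 500, input_price_per_1m=0.15, output_price_per_1m=0.60)
print(f"Total cost: ${example_cost:.6f}")  # prints: Total cost: $0.000450
```

Note that the output side contributes twice as much as the input side here, even with half as many tokens, because the output rate is four times higher.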

Now, let's see how to apply this in practice. The most accurate way to calculate the cost of a specific API call is to use the `usage` object that is returned in the API's response. This object gives you the precise token counts, taking all guesswork out of the equation.
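
To make the shape of that object concrete, here is a minimal sketch that uses a stand-in built with `SimpleNamespace` rather than a real API response. The `prompt_tokens` and `completion_tokens` attribute names match what the code later in this notebook reads. The token counts and the helper name `cost_from_usage` are made up for illustration.

```python
from types import SimpleNamespace

# Stand-in for response.usage; a real response carries the actual counts.
fake_usage = SimpleNamespace(prompt_tokens=25, completion_tokens=20)


def cost_from_usage(usage, input_price_per_1m, output_price_per_1m):
    """Return (input_cost, output_cost) in dollars for a single call."""
    input_cost = usage.prompt_tokens * input_price_per_1m / 1_000_000
    output_cost = usage.completion_tokens * output_price_per_1m / 1_000_000
    return input_cost, output_cost


# Rates are the gpt-4o-mini prices defined in the next section.
input_cost, output_cost = cost_from_usage(fake_usage, 0.15, 0.60)
print(f"Input cost:  ${input_cost:.8f}")
print(f"Output cost: ${output_cost:.8f}")
```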

## Defining the Pricing Structure

First, let's define the pricing for the model we're using, `gpt-4o-mini`. It's a best practice to define these values as constants in your application so you can easily update them if prices change.

> Note: The following prices are for `gpt-4o-mini` as of August 2025. Always check the official OpenAI pricing page for the most current rates. Prices are typically listed per 1 million tokens.

In [None]:
import litellm
from textwrap import dedent
from dotenv import load_dotenv

load_dotenv()

print("Libraries and environment variables loaded successfully.")

In [None]:
MODEL_PRICING = {
    "gpt-4o-mini": {
        "input_per_1m": 0.15,
        "output_per_1m": 0.60
    }
}

MODEL_NAME = "openai/gpt-4o-mini"

price_info = MODEL_PRICING[MODEL_NAME.split("/")[-1]]
PRICE_PER_INPUT_TOKEN = price_info["input_per_1m"] / 1_000_000
PRICE_PER_OUTPUT_TOKEN = price_info["output_per_1m"] / 1_000_000

print(f"Pricing for model: {MODEL_NAME}")
print(f"Price per input token:\t${PRICE_PER_INPUT_TOKEN:.10f}")
print(f"Price per output token:\t${PRICE_PER_OUTPUT_TOKEN:.10f}")

## Calculating Cost from API Response

Now, we'll make a simple, low-cost API call. Our calculation logic will pull the correct prices from the dictionary we just defined. We'll then inspect the `usage` attribute from the API response to get the exact token counts for our transaction.

This is the ideal method for logging, auditing, and managing costs in a production application, as it's both accurate and scalable.
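
One detail is worth sketching on its own: the model name reported in a response may not exactly match the key in your pricing table, for example when it carries a provider prefix or a version suffix. The helper below, `resolve_pricing_key`, is a hypothetical illustration (not part of LiteLLM) that strips the prefix and picks the longest matching key, so a table containing both `gpt-4o` and `gpt-4o-mini` resolves unambiguously. The list of known models is a placeholder for the example.

```python
def resolve_pricing_key(raw_model_name, known_models):
    """Return the longest known model name that prefixes raw_model_name, or None."""
    # Drop a provider prefix such as "openai/".
    name = raw_model_name.split("/")[-1]
    matches = [model for model in known_models if name.startswith(model)]
    # Longest match wins, so "gpt-4o-mini..." is not mistaken for "gpt-4o".
    return max(matches, key=len) if matches else None


known_models = ["gpt-4o", "gpt-4o-mini"]  # illustrative keys only

print(resolve_pricing_key("openai/gpt-4o-mini-2024-07-18", known_models))  # gpt-4o-mini
print(resolve_pricing_key("some-other-model", known_models))               # None
```

Returning `None` for an unknown model lets the caller decide whether to warn, skip, or fail, rather than silently billing at the wrong rate.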

In [None]:
def get_completion(prompt, model, max_tokens=20):
    print("--- Getting completion from LiteLLM ---")

    response = litellm.completion(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful travel assistant."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        max_tokens=max_tokens
    )

    return response

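def analyze_cost(response):
    # Returns (total_cost, input_cost, output_cost), or None if the cost
    # cannot be determined.
    if not response:
        print("Cannot analyze cost of a failed API call!")
        return None

    # The reported model name may include a provider prefix ("openai/...").
    raw_model_name = response.model

    if "/" in raw_model_name:
        raw_model_name = raw_model_name.split("/")[-1]

    # Match by prefix, since the reported name may carry a version suffix.
    model_used = None

    for model in MODEL_PRICING:
        if raw_model_name.startswith(model):
            model_used = model
            break

    # Use .get() so an unknown model triggers the warning instead of a KeyError.
    price_info = MODEL_PRICING.get(model_used)

    if not price_info:
        print(f"WARNING: Pricing for model '{raw_model_name}' not found!")
        return None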

    usage_data = response.usage
    input_tokens = usage_data.prompt_tokens
    output_tokens = usage_data.completion_tokens

    price_per_input = price_info["input_per_1m"] / 1_000_000
    price_per_output = price_info["output_per_1m"] / 1_000_000

    input_cost = input_tokens * price_per_input
    output_cost = output_tokens * price_per_output
    total_cost = input_cost + output_cost

    print(f"\n--- Cost breakdown for model: {model_used} ---")
    print(f"Model response: {response.choices[0].message.content}")
    print(f"Input tokens:\t{input_tokens}\t| Cost: ${input_cost:.8f}")
    print(f"Output tokens:\t{output_tokens}\t| Cost: ${output_cost:.8f}")
    print(f"Total cost of API call: ${total_cost:.8f}")

    return total_cost, input_cost, output_cost

user_prompt = "What is the capital of France, and what is it famous for?"

response = get_completion(
    user_prompt,
    model=MODEL_NAME
)

In [None]:
analyze_cost(response)

## Conversation Cost Analysis

Now let's explore how costs accumulate during a multi-turn conversation. Each API call includes the entire conversation history, which means the input token count (and therefore cost) increases with each exchange.

We'll simulate a 4-round conversation to demonstrate this cost escalation pattern.
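
Before running the live simulation, the growth pattern can be sketched offline. The snippet below assumes made-up, fixed token counts per message (a 50-token system prompt, 20 tokens per user message, 300 tokens per reply) and ignores any per-message formatting overhead. It only shows how the input side grows when the full history is resent on every call.

```python
SYSTEM_TOKENS = 50        # hypothetical
USER_TOKENS = 20          # hypothetical, per user message
ASSISTANT_TOKENS = 300    # hypothetical, per model reply


def input_tokens_per_round(rounds):
    """Input tokens sent on each call when the whole history is resent."""
    history_tokens = SYSTEM_TOKENS
    per_round = []
    for _ in range(rounds):
        history_tokens += USER_TOKENS        # new user message joins the history
        per_round.append(history_tokens)     # everything so far is sent as input
        history_tokens += ASSISTANT_TOKENS   # the reply is added for the next call
    return per_round


growth = input_tokens_per_round(4)
print(growth)                                         # [70, 390, 710, 1030]
print(f"Total input tokens billed: {sum(growth)}")    # 2200
```

With these assumed numbers, the fourth call sends 1,030 input tokens versus 70 for the first, even though every user message is the same size.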

In [None]:
def simulate_conversation():
    conversation_history = [
        {
            "role": "system",
            "content": dedent("""
            You are a senior software engineer well versed in Python.
            Your answers are comprehensive and intuitive to follow.
            """).strip()
        }
    ]
    
    conversation_rounds = [
        "What is Python and what makes it popular for development?",
        "How do I create a virtual environment in that language?", 
        "What are the top 3 most important libraries for beginners?",
        "Can you show me a simple example of using one of those libraries?"
    ]

    total_cumulative_cost = 0

    for round_num, user_message in enumerate(conversation_rounds, start=1):
        print(f"--- Round {round_num} ---")
        print(f"User message: {user_message}")

        conversation_history.append({
            "role": "user",
            "content": user_message
        })

        response = litellm.completion(
            model=MODEL_NAME,
            messages=conversation_history
        )

        assistant_message = response.choices[0].message.content

        conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })

        print()
        print("+" * 50)
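        cost_breakdown = analyze_cost(response)
        # analyze_cost returns None when the cost is unavailable; count that round as zero.
        total_cost = cost_breakdown[0] if cost_breakdown else 0.0
        total_cumulative_cost += total_cost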
        print("+" * 50)
        print()
        
    print("\n=== SUMMARY ===")
    print(f"Total conversation cost: ${total_cumulative_cost:.8f}")
    print(f"Average cost per round: ${total_cumulative_cost/len(conversation_rounds):.8f}")
    print(f"Final conversation length: {len(conversation_history)} messages")

simulate_conversation()