# Token Usage Cost and Sensitivity Analysis for GPT-4
In this notebook, we calculate the 30-day token cost of using a GPT-4 model for an example AI application and perform a **sensitivity analysis** to see how changes in the inputs (daily requests, prompt tokens, completion tokens) affect the total cost. We also estimate how much **prompt engineering** could save, assuming it cuts prompt and completion token usage by 30%.
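Every cost in this notebook comes from one linear formula, the one implemented by `calculate_token_cost`. Here $D$ is the number of days, $N$ the daily requests, $p$ and $k$ the average prompt and completion tokens per request, and $c_p$, $c_c$ the prices per 1,000 prompt and completion tokens. Plugging in the baseline values ($D = 30$, $N = 1000$, $p = 300$, $k = 500$, $c_p = 0.03$, $c_c = 0.06$) is a quick hand check of the baseline printout:

$$
C_{\text{total}} = D \cdot N \cdot \frac{p \cdot c_p + k \cdot c_c}{1000}
$$

$$
C_{\text{total}} = 30 \cdot 1000 \cdot \frac{300 \cdot 0.03 + 500 \cdot 0.06}{1000} = 30 \cdot (9 + 30) = 270 + 900 = 1170
$$

So the baseline costs \$1,170 over 30 days, of which \$270 is prompt cost and \$900 is completion cost. Because the formula is linear, scaling both token counts by 0.7 scales the total by exactly 0.7, which is why a 30% token cut yields exactly 30% savings.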

```python
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# GPT-4 (8K context) pricing, USD per 1,000 tokens
prompt_cost_per_1k_tokens = 0.03  # Input (prompt) tokens
completion_cost_per_1k_tokens = 0.06  # Output (completion) tokens

# Base parameters for token usage
base_daily_requests = 1000  # Number of API requests per day
base_prompt_tokens = 300  # Average tokens per prompt
base_completion_tokens = 500  # Average tokens per GPT response
days = 30  # Duration of usage in days

# Return (total, prompt, completion) cost in USD over `days` days
def calculate_token_cost(daily_requests, prompt_tokens, completion_tokens):
    total_prompt_tokens = prompt_tokens * daily_requests * days
    total_completion_tokens = completion_tokens * daily_requests * days
    prompt_cost = (total_prompt_tokens / 1000) * prompt_cost_per_1k_tokens
    completion_cost = (total_completion_tokens / 1000) * completion_cost_per_1k_tokens
    total_cost = prompt_cost + completion_cost
    return total_cost, prompt_cost, completion_cost

# Step 1: Baseline cost calculation
baseline_cost, baseline_prompt_cost, baseline_completion_cost = calculate_token_cost(
    base_daily_requests, base_prompt_tokens, base_completion_tokens
)

print("Baseline Cost Analysis:")
print(f"Total Token Cost: ${baseline_cost:,.2f}")
print(f"Prompt Cost: ${baseline_prompt_cost:,.2f}")
print(f"Completion Cost: ${baseline_completion_cost:,.2f}\n")

# Step 2: Sensitivity analysis
# Analyze the impact of daily requests and token usage
daily_requests_range = np.arange(500, 2001, 500)  # Vary daily requests from 500 to 2000
prompt_tokens_range = np.arange(100, 501, 100)  # Vary prompt token size from 100 to 500
completion_tokens_range = np.arange(200, 801, 100)  # Vary completion token size from 200 to 800

# Store results as (daily_requests, prompt_tokens, completion_tokens, total_cost) tuples
cost_matrix = []

print("Sensitivity Analysis Results:")
# Header uses the same column widths as the rows below so the table lines up
print(f"{'Daily Requests':<15}{'Prompt Tokens':<15}{'Completion Tokens':<19}Total Cost")
for daily_requests in daily_requests_range:
    for prompt_tokens in prompt_tokens_range:
        for completion_tokens in completion_tokens_range:
            total_cost, _, _ = calculate_token_cost(daily_requests, prompt_tokens, completion_tokens)
            cost_matrix.append((daily_requests, prompt_tokens, completion_tokens, total_cost))
            print(f"{daily_requests:<15}{prompt_tokens:<15}{completion_tokens:<19}${total_cost:,.2f}")

# Step 3: Impact of Prompt Engineering
# Assume prompt engineering cuts both prompt and completion tokens by 30%
optimized_prompt_tokens = base_prompt_tokens * 0.7
optimized_completion_tokens = base_completion_tokens * 0.7

optimized_cost, optimized_prompt_cost, optimized_completion_cost = calculate_token_cost(
    base_daily_requests, optimized_prompt_tokens, optimized_completion_tokens
)

print("\nPrompt Engineering Impact:")
print(f"Original Total Cost: ${baseline_cost:,.2f}")
print(f"Optimized Total Cost (30% reduction in tokens): ${optimized_cost:,.2f}")
print(f"Cost Savings: ${baseline_cost - optimized_cost:,.2f} ({((baseline_cost - optimized_cost) / baseline_cost) * 100:.2f}%)")

# Step 4: Visualization of Sensitivity Analysis
# Create a DataFrame from the sensitivity results for heatmap visualization
sensitivity_df = pd.DataFrame(cost_matrix, columns=["Daily Requests", "Prompt Tokens", "Completion Tokens", "Total Cost"])

# Pivot data for heatmap: vary prompt and completion tokens at a fixed daily request count.
# pandas >= 2.0 requires keyword arguments for DataFrame.pivot.
pivot_data = sensitivity_df[sensitivity_df["Daily Requests"] == base_daily_requests].pivot(
    index="Completion Tokens", columns="Prompt Tokens", values="Total Cost"
)

# Plot heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(pivot_data, annot=True, fmt=".0f", cmap="coolwarm", cbar_kws={"label": "Total Cost ($)"})
plt.title(f"Sensitivity Analysis: Total Cost (Daily Requests = {base_daily_requests})")
plt.xlabel("Prompt Tokens")
plt.ylabel("Completion Tokens")
plt.show()
```

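The triple loop in Step 2 can also be written with NumPy broadcasting, which evaluates every combination in one expression. The sketch below is a standalone alternative under the notebook's own pricing and ranges; `grid_df` is a hypothetical name chosen here and mirrors `sensitivity_df`. Using `indexing="ij"` keeps the row order identical to the loop (completion tokens vary fastest).

```python
import numpy as np
import pandas as pd

# Same pricing, duration, and ranges as the notebook cell above
prompt_cost_per_1k_tokens = 0.03
completion_cost_per_1k_tokens = 0.06
days = 30

daily_requests_range = np.arange(500, 2001, 500)
prompt_tokens_range = np.arange(100, 501, 100)
completion_tokens_range = np.arange(200, 801, 100)

# 3-D grids with axes (requests, prompt, completion), matching the loop nesting order
R, P, C = np.meshgrid(
    daily_requests_range, prompt_tokens_range, completion_tokens_range, indexing="ij"
)

# The linear cost formula applied element-wise to all 4 x 5 x 7 = 140 combinations
total_cost = R * days * (P * prompt_cost_per_1k_tokens + C * completion_cost_per_1k_tokens) / 1000

# Flatten in C order so the last axis (completion tokens) varies fastest, like the loop
grid_df = pd.DataFrame({
    "Daily Requests": R.ravel(),
    "Prompt Tokens": P.ravel(),
    "Completion Tokens": C.ravel(),
    "Total Cost": total_cost.ravel(),
})

print(len(grid_df))
print(grid_df.head())
```

The resulting DataFrame has the same columns as `sensitivity_df`, so it can be passed to the same `pivot` and `sns.heatmap` calls. The loop version is easier to read for beginners; the broadcast version scales better if the ranges grow.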

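The grid above shows absolute costs, but it does not say directly which input the cost is most sensitive to. A common complement is a one-at-a-time (OAT) analysis: raise each input by a fixed percentage while holding the others at baseline, and report the relative cost change per relative input change (an elasticity). The sketch below assumes a 20% step and reuses the notebook's baseline values; `monthly_cost` and `one_at_a_time` are hypothetical helper names introduced here.

```python
# Baseline values taken from the notebook cell above
prompt_cost_per_1k_tokens = 0.03
completion_cost_per_1k_tokens = 0.06
days = 30
baseline = {"daily_requests": 1000, "prompt_tokens": 300, "completion_tokens": 500}


def monthly_cost(daily_requests, prompt_tokens, completion_tokens):
    """Same linear model as calculate_token_cost, returning only the total in USD."""
    per_request = (
        prompt_tokens * prompt_cost_per_1k_tokens
        + completion_tokens * completion_cost_per_1k_tokens
    ) / 1000
    return per_request * daily_requests * days


def one_at_a_time(baseline, change=0.20):
    """Raise each input by `change` (a fraction) while holding the others at baseline."""
    base_cost = monthly_cost(**baseline)
    results = {}
    for name, value in baseline.items():
        bumped = dict(baseline, **{name: value * (1 + change)})
        # Relative cost change divided by relative input change
        results[name] = ((monthly_cost(**bumped) - base_cost) / base_cost) / change
    return base_cost, results


base_cost, elasticities = one_at_a_time(baseline)
print(f"Baseline: ${base_cost:,.2f}")
for name, e in sorted(elasticities.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} elasticity = {e:.2f}")
```

Daily requests have an elasticity of 1 because every cost term scales with them. The two token elasticities are the prompt and completion shares of the baseline cost (\$270 and \$900 of \$1,170), so completion tokens drive roughly three quarters of the bill. Because the model is linear, these numbers do not depend on the 20% step size.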
### Instructions for Running the Notebook
1. Copy the code above into a new Google Colab notebook or any Jupyter notebook environment.
2. Run the cell to see the baseline cost analysis, the sensitivity analysis, and the estimated impact of prompt engineering.
3. The sensitivity analysis prints the total cost for every combination of daily requests (500 to 2,000), prompt tokens (100 to 500), and completion tokens (200 to 800): 4 × 5 × 7 = 140 rows.
4. A heatmap shows how the total cost varies with prompt and completion tokens, with daily requests held at 1,000.
5. Finally, the notebook shows the effect of a 30% cut in both prompt and completion tokens. Because cost is linear in token counts, this lowers the 30-day cost by exactly 30%, from $1,170.00 to $819.00 (a saving of $351.00).
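The 30% cut in Step 3 trims prompt and completion tokens together. Since completion tokens cost twice as much per token and the baseline uses more of them, it can be useful to see how much each lever saves on its own. The sketch below keeps the notebook's baseline and the same assumed 30% reduction; `cost_with_tokens` is a hypothetical helper introduced here.

```python
# Baseline values taken from the notebook cell above
prompt_cost_per_1k_tokens = 0.03
completion_cost_per_1k_tokens = 0.06
days = 30
daily_requests, prompt_tokens, completion_tokens = 1000, 300, 500


def cost_with_tokens(prompt_tokens, completion_tokens):
    """30-day cost in USD at the baseline request volume."""
    return daily_requests * days * (
        prompt_tokens * prompt_cost_per_1k_tokens
        + completion_tokens * completion_cost_per_1k_tokens
    ) / 1000


baseline_cost = cost_with_tokens(prompt_tokens, completion_tokens)
reduction = 0.30  # Assumed token cut, same as Step 3

scenarios = {
    "prompt only": cost_with_tokens(prompt_tokens * (1 - reduction), completion_tokens),
    "completion only": cost_with_tokens(prompt_tokens, completion_tokens * (1 - reduction)),
    "both": cost_with_tokens(prompt_tokens * (1 - reduction), completion_tokens * (1 - reduction)),
}

# Savings relative to the baseline for each scenario
savings = {name: baseline_cost - cost for name, cost in scenarios.items()}
for name, saved in savings.items():
    print(f"{name:<16} saves ${saved:,.2f} ({saved / baseline_cost:.1%})")
```

With these assumptions, shortening responses saves more than three times as much as shortening prompts by the same percentage, so output-length controls are likely the first place to look when the goal is cost.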