##### Copyright 2025 Google LLC.

# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.



# Cost Estimation and Health Monitoring with Gemini

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Cost_Estimation_And_Health_Monitoring.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

## Overview

Cost observability is key for scaling Gemini API applications. Without tracking token usage and costs, you can't optimize your spending or identify expensive operations.

This notebook demonstrates how to build an observability layer for Gemini API applications. You will learn how to:

1. Extract token usage metadata from API responses
2. Calculate real-time USD costs based on model pricing
3. Perform health checks to verify API availability and quota

By the end of this notebook, you'll have a reusable class that you can integrate into any Gemini API application to monitor costs and health.

## Setup

First, install the Gemini API Python library.

In [None]:
%pip install -U -q "google-genai"

### Grab an API Key

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.

<a class="button button-primary" href="https://aistudio.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `GOOGLE_API_KEY`.

In [None]:
from google import genai
from google.genai import types
from typing import Dict
from google.colab import userdata

# Get API key from Colab secrets
api_key = userdata.get('GOOGLE_API_KEY')
client = genai.Client(api_key=api_key)
print("✅ API key configured successfully!")
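
Outside Colab, `google.colab` cannot be imported, so the cell above only works in a Colab runtime. A minimal sketch of a fallback, assuming the key is also exported as a `GOOGLE_API_KEY` environment variable. The `load_api_key` helper is hypothetical and not part of the SDK:

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from Colab secrets, falling back to the environment."""
    try:
        # Only importable inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this notebook
        key = None
    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} was not found in Colab secrets or the environment")
    return key
```

You could then pass `load_api_key()` to `genai.Client(api_key=...)` in place of the direct `userdata.get` call.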

## Cost Estimator Class

The `GeminiObservability` class provides a simple way to track costs and monitor the health of your Gemini API usage. It looks up input and output prices per model from a dictionary, so one instance can estimate costs for several models.
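
The per-model lookup reduces to one formula: tokens divided by one million, multiplied by the per-million price, summed over input and output. A minimal standalone sketch of that arithmetic follows. The `EXAMPLE_PRICES` table and the model name `example-model` are made up for illustration and are not real rates:

```python
# Hypothetical prices in USD per 1 million tokens (illustrative only)
EXAMPLE_PRICES = {"example-model": {"input": 0.50, "output": 2.00}}


def cost_usd(input_tokens: int, output_tokens: int, model: str) -> float:
    """Return the estimated USD cost for one request."""
    price = EXAMPLE_PRICES[model]
    input_cost = input_tokens / 1_000_000 * price["input"]
    output_cost = output_tokens / 1_000_000 * price["output"]
    return input_cost + output_cost


# 2,000 input tokens and 500 output tokens
print(f"${cost_usd(2_000, 500, 'example-model'):.6f}")  # prints $0.002000
```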

In [None]:
# @title
class GeminiObservability:
    """A class to monitor costs and health of Gemini API usage."""

    def __init__(self, client):
        """
        Initialize the observability class.

        Args:
            client: The genai.Client instance
        """
        self.client = client
        # Illustrative prices in USD per 1 million tokens (input and output).
        # These are example values and may not match current rates; update them
        # from https://ai.google.dev/pricing before relying on the estimates.
        self.prices = {
            "gemini-2.5-flash": {"input": 0.075, "output": 0.30},
            "gemini-2.5-pro": {"input": 3.50, "output": 10.50},
            "gemini-2.5-flash-lite": {"input": 0.0375, "output": 0.15},
            "gemini-3-flash-preview": {"input": 0.10, "output": 0.40},
            "gemini-3-pro-preview": {"input": 3.50, "output": 10.50},
        }

    def estimate_cost(self, usage_metadata, model_name: str) -> float:
        """
        Calculate the cost in USD based on token usage.

        Args:
            usage_metadata: The usage_metadata object from a Gemini API response
            model_name: The name of the model used (e.g., "gemini-2.5-flash")

        Returns:
            The estimated cost in USD
        """
        if model_name not in self.prices:
            print(f"Warning: No pricing data for model {model_name}. Returning 0.0")
            return 0.0

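        # The SDK reports prompt_token_count (input) and candidates_token_count (output);
        # either may be None. Thinking tokens, when reported, are billed as output.
        input_tokens = usage_metadata.prompt_token_count or 0
        output_tokens = (usage_metadata.candidates_token_count or 0) + (
            getattr(usage_metadata, "thoughts_token_count", None) or 0
        )

        input_cost = (input_tokens / 1_000_000) * self.prices[model_name]["input"]
        output_cost = (output_tokens / 1_000_000) * self.prices[model_name]["output"]

        return input_cost + output_cost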

    def check_health(self, model_name: str = "gemini-2.5-flash") -> bool:
        """
        Perform a simple health check by making a minimal API call.

        Args:
            model_name: The model to test (default: "gemini-2.5-flash")

        Returns:
            True if the API is healthy, False otherwise
        """
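        # Imported here so the method is self-contained; google.genai.errors
        # holds the SDK's exception types.
        from google.genai import errors

        try:
            self.client.models.generate_content(
                model=model_name,
                contents="ping",
                config=types.GenerateContentConfig(max_output_tokens=1)
            )
            return True
        except errors.APIError as e:
            # HTTP 429 (RESOURCE_EXHAUSTED) signals an exhausted quota or rate limit
            if e.code == 429:
                print(f"Health Check Failed: Quota limit reached. {e}")
            else:
                print(f"Health Check Failed: {e}")
            return False
        except Exception as e:
            # Anything else, such as a network failure
            print(f"Health Check Failed: {e}")
            return False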

    def get_usage_summary(self, usage_metadata, model_name: str) -> Dict:
        """
        Get a summary of token usage and cost.

        Args:
            usage_metadata: The usage_metadata object from a Gemini API response
            model_name: The name of the model used

        Returns:
            A dictionary with usage details and cost
        """
        cost = self.estimate_cost(usage_metadata, model_name)
        return {
            "model": model_name,
            "input_tokens": usage_metadata.prompt_token_count or 0,
            "output_tokens": usage_metadata.candidates_token_count or 0,
            "thinking_tokens": getattr(usage_metadata, "thoughts_token_count", None) or 0,
            "total_tokens": usage_metadata.total_token_count or 0,
            "estimated_cost_usd": round(cost, 6)
        }

## Example Usage

Let's see the observability class in action. First, create an instance of the class.

In [None]:
# Create an instance of the observability class
observability = GeminiObservability(client)


In [None]:
MODEL_ID = "gemini-2.5-flash"  # @param ["gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro", "gemini-3-flash-preview", "gemini-3-pro-preview"] {"allow-input": true, "isTemplate": true}

# Check if the API is healthy
is_healthy = observability.check_health(MODEL_ID)
print(f"API Health Status: {'✅ Healthy' if is_healthy else '❌ Unhealthy'}")
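
A single failed ping can be a transient error rather than an outage. One way to make the check more tolerant is to retry with exponential backoff. The sketch below is an illustration under stated assumptions: `check_with_retry` is a hypothetical helper that accepts any zero-argument callable returning a bool, such as `lambda: observability.check_health(MODEL_ID)`.

```python
import time
from typing import Callable


def check_with_retry(probe: Callable[[], bool], attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            # Waits base_delay, then 2x, 4x, ... between attempts
            sleep(base_delay * 2 ** attempt)
    return False


# Stand-in probe that fails twice and then succeeds
results = iter([False, False, True])
print(check_with_retry(lambda: next(results), base_delay=0.01))  # prints True
```

Keep in mind that every retry against the real API is another billed request that counts toward your quota, so a small `attempts` value is usually enough.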


### Cost Estimation Example

Make a simple API call and track the cost.


In [None]:
# Make a simple API call
response = client.models.generate_content(
    model=MODEL_ID,
    contents="Explain quantum computing in one sentence."
)

# Get usage metadata from the response
usage_metadata = response.usage_metadata

# Calculate and display the cost
summary = observability.get_usage_summary(usage_metadata, MODEL_ID)
print("Usage Summary:")
for key, value in summary.items():
    print(f"  {key}: {value}")


### Tracking Multiple Requests

You can track costs across multiple API calls to monitor your total spending.


In [None]:
# Example: Track costs across multiple requests
total_cost = 0.0
queries = [
    "What is machine learning?",
    "Explain neural networks briefly.",
    "What is the difference between AI and ML?"
]

print("Tracking costs for multiple queries:\n")
for i, query in enumerate(queries, 1):
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=query
    )
    usage_metadata = response.usage_metadata
    cost = observability.estimate_cost(usage_metadata, MODEL_ID)
    total_cost += cost

    # total_token_count is reported by the API and may be None
    total_tokens = usage_metadata.total_token_count or 0
    print(f"Query {i}: {query}")
    print(f"  Tokens: {total_tokens} | Cost: ${cost:.6f}\n")

print(f"Total cost for all queries: ${total_cost:.6f}")
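
When an application calls several models, a running total per model is often more useful than a single number. A minimal sketch, assuming you already have each request's cost (for example from `observability.estimate_cost`). The `CostTracker` class is hypothetical, and the model names and costs fed to it below are made-up values:

```python
from collections import defaultdict


class CostTracker:
    """Accumulate request counts and estimated cost per model."""

    def __init__(self):
        self.cost_by_model = defaultdict(float)
        self.requests_by_model = defaultdict(int)

    def record(self, model_name: str, cost_usd: float) -> None:
        """Add one request's estimated cost to the model's running total."""
        self.cost_by_model[model_name] += cost_usd
        self.requests_by_model[model_name] += 1

    @property
    def total_cost(self) -> float:
        return sum(self.cost_by_model.values())


tracker = CostTracker()
tracker.record("model-a", 0.0004)  # made-up costs for illustration
tracker.record("model-a", 0.0006)
tracker.record("model-b", 0.0100)
for model, cost in tracker.cost_by_model.items():
    print(f"{model}: {tracker.requests_by_model[model]} requests, ${cost:.6f}")
print(f"Total: ${tracker.total_cost:.6f}")
```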


## Next Steps

Now that you have a working cost estimation and health monitoring system, you can:

1. **Integrate into your applications**: Add the `GeminiObservability` class to your production code to track costs in real time
2. **Set up alerts**: Monitor total costs and trigger an alert when spending exceeds a threshold
3. **Optimize usage**: Use the token counts to identify expensive operations and optimize your prompts
4. **Update pricing**: Keep the pricing dictionary up to date with current rates from [Google AI Pricing](https://ai.google.dev/pricing)
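
The alerting idea in step 2 can be sketched as a running total compared against a budget. This is a minimal illustration, not a production alerting system: `BudgetAlert`, its threshold, and the callback are hypothetical names, and a real deployment would likely route the alert to a logging or paging service instead of `print`.

```python
from typing import Callable


class BudgetAlert:
    """Fire a callback once when cumulative spend crosses a threshold."""

    def __init__(self, threshold_usd: float, on_exceeded: Callable[[float], None]):
        self.threshold_usd = threshold_usd
        self.on_exceeded = on_exceeded
        self.spent_usd = 0.0
        self.triggered = False

    def add(self, cost_usd: float) -> None:
        """Record one request's cost and alert the first time the budget is exceeded."""
        self.spent_usd += cost_usd
        if not self.triggered and self.spent_usd > self.threshold_usd:
            self.triggered = True
            self.on_exceeded(self.spent_usd)


alert = BudgetAlert(0.01, lambda spent: print(f"Budget exceeded: ${spent:.4f}"))
for cost in [0.004, 0.004, 0.004]:  # made-up request costs
    alert.add(cost)
```

Firing only once avoids repeating the alert on every later request; reset `triggered` at the start of each billing period if you want it to fire again.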

For more information about the Gemini API, check out the [quickstarts](https://github.com/google-gemini/cookbook/tree/main/quickstarts) and other [examples](https://github.com/google-gemini/cookbook/tree/main/examples).
