# Token usage tracking

LangChain lets you track the number of tokens consumed, measure the latency of LLM responses, and estimate the usage cost of your application through the customizable `token_usage` module. You can analyze token usage directly in your code or send the usage data to a metrics repository such as Amazon CloudWatch.

The `token_usage` module separates two tasks: the `handlers` submodule collects usage metrics from the LLMs via standard [Callbacks](..), and the `reporters` submodule processes the collected metrics.
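
The split between collecting and processing can be pictured with a small, self-contained sketch. The class and method names below (`UsageRecord`, `InMemoryReporter`, `UsageHandler`, `on_call_end`) are hypothetical and exist only for illustration; they are not the module's actual classes.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class UsageRecord:
    """Usage of a single LLM call (hypothetical shape, for illustration only)."""
    prompt_tokens: int
    completion_tokens: int
    cost: float = 0.0


class Reporter(Protocol):
    """Anything that can process a usage record."""
    def report(self, record: UsageRecord) -> None: ...


@dataclass
class InMemoryReporter:
    """Processing side: aggregates usage records in memory."""
    total_tokens: int = 0
    successful_requests: int = 0
    total_cost: float = 0.0

    def report(self, record: UsageRecord) -> None:
        self.total_tokens += record.prompt_tokens + record.completion_tokens
        self.successful_requests += 1
        self.total_cost += record.cost


class UsageHandler:
    """Collecting side: forwards the usage of each finished call to a reporter."""
    def __init__(self, reporter: Reporter) -> None:
        self.reporter = reporter

    def on_call_end(self, prompt_tokens: int, completion_tokens: int, cost: float = 0.0) -> None:
        self.reporter.report(UsageRecord(prompt_tokens, completion_tokens, cost))


reporter = InMemoryReporter()
handler = UsageHandler(reporter)
handler.on_call_end(prompt_tokens=15, completion_tokens=7)
handler.on_call_end(prompt_tokens=8, completion_tokens=10)
print(reporter.total_tokens, reporter.successful_requests)
```

Because the handler only depends on the `report` method, a different reporter (for example, one that sends metrics to a remote store) could be swapped in without changing how usage is collected.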

Some LLM APIs (for example, OpenAI) return the number of consumed tokens after each call. Other APIs do not provide this usage information. In that case, you can configure a `LocalTokenUsageCallbackHandler` instance to count tokens on the client side (in your application's code).

## Track the usage of an OpenAI LLM

This example shows how to track the token usage of your LLM directly in your application.

In [1]:
import os, getpass
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key:")

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.token_usage.handlers import OpenAITokenUsageCallbackHandler
from langchain.callbacks.token_usage.reporters import LocalStatsReporter

reporter = LocalStatsReporter()
handler = OpenAITokenUsageCallbackHandler(reporter)

llm = ChatOpenAI(model="gpt-3.5-turbo", callbacks=[handler])
prompt = PromptTemplate.from_template("Which city is the capital of {country}?")

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(country="Italy"))

reporter

The capital of Italy is Rome.


Local stats report:
  total_tokens=22
  prompt_tokens=15
  completion_tokens=7
  successful_requests=1
  total_cost=0.00003650

## Send the token usage to Amazon CloudWatch Metrics

[Amazon CloudWatch Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) is a metrics database that lets you store, aggregate, and analyze the performance and telemetry data of your system, including your LLM usage.

To run the following cell, you need to [create an AWS account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-creating.html), [install boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html), and [configure your AWS credentials for boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html).

You can configure any CloudWatch Metrics namespace and dimensions for your application. Some additional dimensions, such as the model name and the caller id, are added to the dimensions list automatically. For more information about namespaces and dimensions, refer to the [Amazon CloudWatch concepts docs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html).
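
To make the namespace and dimensions concrete, here is a minimal sketch that builds a payload in the shape accepted by boto3's `put_metric_data`. The helper `build_metric_data`, the metric names, and the dimension keys `model_name` and `caller_id` are assumptions chosen for illustration; the reporter may structure its metrics differently.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_metric_data(
    metrics: Dict[str, float],
    dimensions: Dict[str, str],
    timestamp: datetime,
) -> List[Dict[str, Any]]:
    """Build a MetricData list in the shape that put_metric_data accepts."""
    dims = [{"Name": name, "Value": value} for name, value in dimensions.items()]
    return [
        {
            "MetricName": metric_name,
            "Dimensions": dims,
            "Timestamp": timestamp,
            "Value": float(value),
            "Unit": "Count",
        }
        for metric_name, value in metrics.items()
    ]


# A user-supplied dimension plus two added ones (the key names are assumptions).
dimensions = {"project": "demo", "model_name": "gpt-3.5-turbo", "caller_id": "test_user"}
metric_data = build_metric_data(
    {"prompt_tokens": 15, "completion_tokens": 7},
    dimensions,
    datetime.now(timezone.utc),
)
print(metric_data[0]["MetricName"], len(metric_data[0]["Dimensions"]))

# With AWS credentials configured, the payload could be sent like this:
# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="openai_token_usage", MetricData=metric_data
# )
```

Each distinct combination of dimension values is stored as a separate metric in CloudWatch, so it helps to keep the set of dimensions small and stable.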

After successfully running the following cell, you should be able to see the recorded metrics in the [CloudWatch Metrics console](https://console.aws.amazon.com/cloudwatch/home).

In [None]:
# %pip install boto3

In [4]:
from langchain.callbacks.token_usage.reporters import CloudWatchTokenUsageReporter

# CloudWatch namespace and custom dimensions for the recorded metrics
reporter = CloudWatchTokenUsageReporter("openai_token_usage", {"project": "token_usage_test"})
handler = OpenAITokenUsageCallbackHandler(reporter)

llm = ChatOpenAI(model="gpt-3.5-turbo", callbacks=[handler])
prompt = PromptTemplate.from_template("Which city is the capital of {country}?")

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(country="Italy"))

The capital city of Italy is Rome.


## Track usage of LLMs that do not return token consumption

This example shows how to configure token tracking for LLMs whose APIs do not return token usage information. As an example, we track the token usage of the Anthropic Claude v2 model in the [Amazon Bedrock](https://aws.amazon.com/bedrock/) service.

You can adapt `LocalTokenUsageCallbackHandler` to any model by providing a token counter function and, optionally, a cost function.
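
As a rough sketch of what such functions might look like for another model, the following defines a crude token counter and a cost function. Both `approx_token_counter` and `flat_rate_cost_func` are hypothetical, the prices are placeholders, and the counter signature (text in, token count out) is an assumption. A real tokenizer for the target model would give accurate counts.

```python
import re


def approx_token_counter(text: str) -> int:
    """Rough token count: one token per word or punctuation mark.

    This is only a stand-in for a model-specific tokenizer.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def flat_rate_cost_func(num_input_tokens: int, num_output_tokens: int) -> float:
    """Cost in USD, using made-up prices per 1,000 tokens."""
    input_price_per_1k = 0.001   # hypothetical price
    output_price_per_1k = 0.002  # hypothetical price
    return (
        num_input_tokens * input_price_per_1k
        + num_output_tokens * output_price_per_1k
    ) / 1000.0


text = "Which city is the capital of Italy?"
num_tokens = approx_token_counter(text)
print(num_tokens)  # 7 words + 1 question mark = 8
```

The cost function takes the input and output token counts separately because many providers price prompt and completion tokens differently.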

In [None]:
# %pip install anthropic boto3

In [6]:
import boto3

from langchain.llms import Bedrock
from langchain.utilities.anthropic import get_num_tokens_anthropic
from langchain.callbacks.token_usage.handlers import LocalTokenUsageCallbackHandler

MODEL_NAME = "anthropic.claude-v2"

def bedrock_anthropic_claude_cost_func(num_input_tokens: int, num_output_tokens: int) -> float:
    # Prices are per 1,000 tokens, with separate rates for input and output tokens.
    # See https://aws.amazon.com/bedrock/pricing/
    return num_input_tokens * 0.01102 / 1000.0 + num_output_tokens * 0.03268 / 1000.0

reporter = LocalStatsReporter()

handler = LocalTokenUsageCallbackHandler(
    reporter=reporter,
    model_name=MODEL_NAME,
    caller_id="test_user",
    token_counter_func=get_num_tokens_anthropic,
    cost_func=bedrock_anthropic_claude_cost_func,
)

session = boto3.Session(region_name="us-east-1")
llm = Bedrock(
    client=session.client("bedrock-runtime"),
    model_id=MODEL_NAME,
    model_kwargs={
        "max_tokens_to_sample": 4096,
        "temperature": 0.2,
    },
    callbacks=[handler],
)

prompt = PromptTemplate.from_template("Which city is the capital of {country}?")

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(country="Italy"))

reporter

 Rome is the capital and largest city of Italy.


Local stats report:
  total_tokens=18
  prompt_tokens=8
  completion_tokens=10
  successful_requests=1
  total_cost=0.00041496
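
The reported `total_cost` can be checked by hand with the cost function from the cell above, using the 8 prompt tokens and 10 completion tokens from the report. The snippet repeats the function so that it runs on its own.

```python
def bedrock_anthropic_claude_cost_func(num_input_tokens: int, num_output_tokens: int) -> float:
    # Same per-1,000-token rates as in the cell above.
    return num_input_tokens * 0.01102 / 1000.0 + num_output_tokens * 0.03268 / 1000.0


# 8 * 0.01102 / 1000 = 0.00008816 and 10 * 0.03268 / 1000 = 0.00032680
cost = bedrock_anthropic_claude_cost_func(num_input_tokens=8, num_output_tokens=10)
print(f"total_cost={cost:.8f}")  # total_cost=0.00041496
```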