---
title: "Counting Tokens, Counting Impact"
description: "A tiny token-usage decorator for LLM calls"
author: "Eric Zou"
date: "11/27/2025"
categories:
  - LLMs
  - Tooling
  - Experiments
---


## Why I Care About Tokens At All

Up to this point, I've mostly treated API calls as opaque little boxes: I send some text in, I get some text out, and I vaguely know I'm spending "tokens" in the process. That's fine for toy experiments, but once these conversations get longer and more complicated, it becomes a lot less obvious what I'm actually burning through.

In this post, I want to treat token usage as a first-class signal instead of an afterthought. Rather than hard-coding a particular notion of "cost" (dollars, latency, carbon, whatever), I'll build a small wrapper that:

- keeps track of **how many tokens** a function is using (split by type), and
- lets me plug in my own **impact calculus** that turns those raw counts into something I care about.

The end result is a Python decorator I can drop onto any function that makes LLM calls, with a long-lived object quietly accumulating stats in the background while I run my experiments.


In [1]:
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping

from openai import OpenAI
from dotenv import load_dotenv

# Load API key from the project root .env
_ = load_dotenv(".env")
client = OpenAI()


@dataclass
class TokenUsageSnapshot:
    """A simple view of one LLM call's token usage.

    The OpenAI client exposes more detailed usage objects, but for this blog
    I'm mainly interested in the usual trio: prompt, completion, total.
    """

    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    @classmethod
    def from_raw(cls, raw: Mapping[str, Any]) -> "TokenUsageSnapshot":
        return cls(
            prompt_tokens=int(raw.get("prompt_tokens", 0) or 0),
            completion_tokens=int(raw.get("completion_tokens", 0) or 0),
            total_tokens=int(raw.get("total_tokens", 0) or 0),
        )

    def to_dict(self) -> Dict[str, int]:
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.total_tokens,
        }


ImpactFn = Callable[[Dict[str, int]], Dict[str, int]]


class TokenImpactTracker:
    """Wraps a decorator that tracks token usage across many LLM calls.

    - Every decorated function call tries to read a `.usage` field on the
      returned OpenAI response (or a `{"usage": ...}` mapping).
    - The raw token counts are accumulated over the lifetime of this object.
    - An `impact_fn` converts a single-call usage dict into key→int scores
      (my personal "impact calculus"), which are also accumulated.
    """

    def __init__(self, impact_fn: ImpactFn):
        self.impact_fn: ImpactFn = impact_fn
        # Raw counts straight from the API, aggregated over time
        self.raw_totals: Dict[str, int] = defaultdict(int)
        # Impact-space totals, defined by whatever function I care about
        self.impact_totals: Dict[str, int] = defaultdict(int)

    # --- internals -----------------------------------------------------

    def _extract_usage_dict(self, result: Any) -> Dict[str, int]:
        """Best-effort extraction of a usage dict from an OpenAI response.

        This handles the current OpenAI Python client objects as well as
        plain dicts. If no usage information is available, we just return
        an empty dict and quietly skip accounting for that call.
        """

        usage_obj: Any = getattr(result, "usage", None)

        if usage_obj is None and isinstance(result, Mapping):
            usage_obj = result.get("usage")

        if usage_obj is None:
            return {}

        # New-style OpenAI objects are pydantic-like; try a few options.
        if hasattr(usage_obj, "to_dict"):
            raw = usage_obj.to_dict()
        elif hasattr(usage_obj, "model_dump"):
            raw = usage_obj.model_dump()
        elif isinstance(usage_obj, Mapping):
            raw = dict(usage_obj)
        else:
            # Last resort: nothing we understand
            return {}

        snapshot = TokenUsageSnapshot.from_raw(raw)
        return snapshot.to_dict()

    def _record_usage(self, usage: Dict[str, int]) -> None:
        for key, value in usage.items():
            self.raw_totals[key] += int(value)

        impact = self.impact_fn(usage)
        for key, value in impact.items():
            self.impact_totals[key] += int(value)

    # --- the decorator interface --------------------------------------

    def decorator(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Turn this tracker into a decorator for an LLM-calling function."""

        def wrapped(*args: Any, **kwargs: Any) -> Any:
            result = fn(*args, **kwargs)
            usage = self._extract_usage_dict(result)
            if usage:
                self._record_usage(usage)
            return result

        # Be a polite decorator and preserve the original function name
        wrapped.__name__ = fn.__name__
        wrapped.__doc__ = fn.__doc__
        return wrapped

    # Convenience helpers for inspection in the notebook
    def as_dict(self) -> Dict[str, Dict[str, int]]:
        return {
            "raw_totals": dict(self.raw_totals),
            "impact_totals": dict(self.impact_totals),
        }


def example_pricing_impact(usage: Dict[str, int]) -> Dict[str, int]:
    """A tiny, opinionated impact function.

    For this assignment I'm not going to pull live pricing tables; instead,
    I'll hard-code a rough toy pricing model for `gpt-4o-mini` using integer
    per-token rates so everything stays integer-based:

    - prompt tokens: 0.15 USD / 1M tokens
    - completion tokens: 0.60 USD / 1M tokens

    That works out to:
      0.15 / 1_000_000 = 1.5e-7 USD/token = 0.15 micro-dollars/token
      0.60 / 1_000_000 = 6e-7 USD/token = 0.60 micro-dollars/token

    Neither rate is a whole number of micro-dollars, so the integer rates
    below are in **hundredths of a micro-dollar** (1e-8 USD). The result is
    still stored under the key `micro_usd`; divide it by 100 to get actual
    micro-dollars.
    """

    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    total = usage.get("total_tokens", prompt + completion)

    # Rates in units of 1e-8 USD (hundredths of a micro-dollar) per token
    prompt_per_token = 15  # 1.5e-7 USD == 15 x 1e-8 USD
    completion_per_token = 60  # 6e-7 USD == 60 x 1e-8 USD

    micro_usd = prompt * prompt_per_token + completion * completion_per_token

    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
        "micro_usd": micro_usd,  # in units of 1e-8 USD, despite the key name
    }


## A Tiny Decorated Helper Around `client.chat.completions.create`

In [2]:

tracker = TokenImpactTracker(impact_fn=example_pricing_impact)


@tracker.decorator
def call_gpt_4o_mini(prompt: str):
    """Single-turn helper so I have something concrete to decorate.

    In a "real" system this would probably be buried a few layers down in an
    agent loop or conversation runner. Here I just want something that:

    - calls the OpenAI Chat Completions API,
    - returns the raw response object so we can introspect `.usage`, and
    - plays nicely with the notebook.
    """

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a friendly, concise assistant."},
            {"role": "user", "content": prompt},
        ],
        store=False,
    )
    return response


# quick smoke test (you can re-run this cell a few times to accumulate stats)
example_response = call_gpt_4o_mini("In one or two sentences, explain what a token is in the context of LLMs.")
print(example_response.choices[0].message.content)

tracker.as_dict()


In the context of large language models (LLMs), a token is a unit of text, which can be a word, part of a word, or even a punctuation mark, used by the model to process and generate language. Tokens serve as the basic building blocks for understanding and producing text, with each token representing a specific piece of information.


{'raw_totals': {'prompt_tokens': 38,
  'completion_tokens': 69,
  'total_tokens': 107},
 'impact_totals': {'prompt_tokens': 38,
  'completion_tokens': 69,
  'total_tokens': 107,
  'micro_usd': 4710}}
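
As a sanity check on that last number, here is the arithmetic redone with exact fractions. It's only a sketch under the toy prices from the docstring (0.15 and 0.60 USD per million tokens), and `cost_in_micro_usd` is a name I made up for this check. At those prices the call costs 47.1 micro-dollars, so the tracker's `4710` only lines up if I read it in hundredths of a micro-dollar (1e-8 USD).

```python
from fractions import Fraction

# Toy prices from the docstring, in USD per one million tokens.
PROMPT_USD_PER_MILLION = Fraction(15, 100)
COMPLETION_USD_PER_MILLION = Fraction(60, 100)


def cost_in_micro_usd(prompt_tokens: int, completion_tokens: int) -> Fraction:
    """Exact cost in micro-dollars (1e-6 USD) at the toy prices above."""
    # "USD per 1M tokens" is numerically the same as "micro-USD per token",
    # so the prices can be used as per-token rates without rescaling.
    return (
        prompt_tokens * PROMPT_USD_PER_MILLION
        + completion_tokens * COMPLETION_USD_PER_MILLION
    )


cost = cost_in_micro_usd(prompt_tokens=38, completion_tokens=69)
print(float(cost))      # 47.1 micro-dollars
print(int(cost * 100))  # 4710 hundredths of a micro-dollar
```

In dollars that is about 0.0000471 USD for the single call.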

## Playing With Impact Calculi

The nice thing about this setup is that I never hard-code what "impact" means.
All the tracker ever sees is a mapping from strings to integers. Today, I'm
using micro-dollars and raw token counts, but a future experiment could plug in
something very different:

- **budgeting**: keep separate counters for prompt vs. completion tokens and
  alert when one crosses a threshold,
- **carbon**: approximate energy usage per 1K tokens and track an emissions
  budget alongside dollars, or
- **fairness**: tag calls by scenario/user and see which groups are soaking up
  the most capacity.
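
To make the budgeting idea a bit more concrete, here is a minimal sketch of what such an impact function might look like. Everything in it is an assumption for illustration: `make_budget_impact`, the budget numbers, and the `*_over_budget_calls` keys are hypothetical names, not part of the tracker above. Since an impact function only sees one call's usage at a time, the sketch keeps its own running totals in a closure.

```python
from typing import Callable, Dict


def make_budget_impact(
    prompt_budget: int, completion_budget: int
) -> Callable[[Dict[str, int]], Dict[str, int]]:
    """Build an impact function that flags calls made once a budget is crossed."""
    # Running totals live in the closure, separate from any tracker state.
    running = {"prompt_tokens": 0, "completion_tokens": 0}

    def impact(usage: Dict[str, int]) -> Dict[str, int]:
        prompt = usage.get("prompt_tokens", 0)
        completion = usage.get("completion_tokens", 0)
        running["prompt_tokens"] += prompt
        running["completion_tokens"] += completion
        return {
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            # 1 if the running total exceeds the budget after this call, else 0.
            # Summed over calls, this counts how many calls landed over budget.
            "prompt_over_budget_calls": int(running["prompt_tokens"] > prompt_budget),
            "completion_over_budget_calls": int(
                running["completion_tokens"] > completion_budget
            ),
        }

    return impact


budget_impact = make_budget_impact(prompt_budget=50, completion_budget=100)
first = budget_impact({"prompt_tokens": 38, "completion_tokens": 69})
second = budget_impact({"prompt_tokens": 38, "completion_tokens": 69})
print(first["prompt_over_budget_calls"], second["prompt_over_budget_calls"])  # 0 1
print(second["completion_over_budget_calls"])  # 1
```

Because it has the same dict-in, dict-out shape as `example_pricing_impact`, something like this could in principle be passed as `impact_fn` to a `TokenImpactTracker`.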

For now, I'm happy that I can decorate a plain old function and get a running
summary of how "expensive" my experiments are without changing the surrounding
code. It feels like a good building block for the messier multi-agent setups
I've been playing with in the earlier posts.

> **Future Work:**
> - Try plugging this into one of the multi-speaker simulations and track
>   impact per speaker/persona.
> - Add simple budget guards that short-circuit a function once a quota is hit.
> - Layer in richer token-type distinctions (cached vs. uncached, tools, etc.)
>   as the API surface evolves.
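
For the budget-guard idea, one possible shape is sketched below. This is an assumption-heavy sketch, not something I've wired into the tracker: `budget_guard`, `TokenBudgetExceeded`, and `fake_llm_call` are all hypothetical, and the fake call returns a plain `{"usage": ...}` dict instead of hitting the API.

```python
import functools
from typing import Any, Callable, Mapping


class TokenBudgetExceeded(RuntimeError):
    """Raised when a guarded function is called after its quota is spent."""


def budget_guard(max_total_tokens: int) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory that refuses further calls once the quota is used up."""

    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        spent = 0  # total tokens consumed by this guarded function so far

        @functools.wraps(fn)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            nonlocal spent
            # Check before calling, so no tokens are spent on a blocked call.
            if spent >= max_total_tokens:
                raise TokenBudgetExceeded(
                    f"{fn.__name__}: {spent} of {max_total_tokens} tokens already used"
                )
            result = fn(*args, **kwargs)
            usage = result.get("usage", {}) if isinstance(result, Mapping) else {}
            spent += int(usage.get("total_tokens", 0) or 0)
            return result

        return wrapped

    return decorate


@budget_guard(max_total_tokens=200)
def fake_llm_call(prompt: str) -> dict:
    """Stand-in for a real LLM call; always reports 107 total tokens."""
    return {"text": f"echo: {prompt}", "usage": {"total_tokens": 107}}


fake_llm_call("first")   # allowed, 107 tokens spent so far
fake_llm_call("second")  # allowed, 214 tokens spent so far
try:
    fake_llm_call("third")
except TokenBudgetExceeded as exc:
    print(f"blocked: {exc}")
```

Note that the guard only checks the quota before each call, so the last permitted call can overshoot it; a stricter version would need an estimate of the upcoming call's size.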
