# Week 3 — Part 01: Tokens and Context Lab

**Estimated time:** 60–90 minutes

## Learning Objectives

- Explain tokens and context windows
- Count tokens using a tokenizer
- Compare token usage across inputs
- Practice truncation strategies


## Overview

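Tokens are the units a language model actually reads and writes. A tokenizer splits text into pieces (whole words, subword fragments, or punctuation) and maps each piece to an integer ID. A model's context window caps how many tokens it can handle in a single request, typically counting both the prompt and the generated reply. As a result, token counts drive prompt design and determine when input must be truncated.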
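A context window works like a budget shared by the prompt and the reply. The sketch below checks whether a request fits. It assumes the window covers prompt plus output, as in most decoder-only models. The function names and the 8,192-token window are illustrative assumptions, not any specific model's limits.

```python
def remaining_output_budget(prompt_tokens: int, context_window: int) -> int:
    """Return how many tokens are left for the model's reply.

    Assumes prompt and output share one window; check your model's docs.
    """
    if prompt_tokens < 0 or context_window <= 0:
        raise ValueError("prompt_tokens must be >= 0 and context_window > 0")
    return max(context_window - prompt_tokens, 0)


def fits(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the requested reply length fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


# Hypothetical 8,192-token window.
print(remaining_output_budget(7_000, 8_192))
print(fits(7_000, 1_000, 8_192))
print(fits(7_500, 1_000, 8_192))
```

Reserving output tokens up front is what forces truncation. A prompt that fits the window on its own can still leave too little room for the answer.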

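```python
import re

try:
    import tiktoken
except ImportError as e:  # pragma: no cover
    # tiktoken is optional; later cells skip tiktoken-specific steps without it.
    tiktoken = None
    _tiktoken_error = e


def simple_tokenize(text: str) -> list[str]:
    """Split text into word runs and individual punctuation marks.

    A rough approximation only: model tokenizers use subword (BPE) pieces.
    """
    return re.findall(r"\w+|[^\w\s]", text)


sample = "Hello, world! Token test."
print("simple tokens:", simple_tokenize(sample))

if tiktoken is None:
    print("tiktoken not available:", _tiktoken_error)
else:
    # cl100k_base is the encoding used by GPT-3.5/GPT-4-era OpenAI models.
    # get_encoding downloads the encoding file on first use, then caches it.
    enc = tiktoken.get_encoding("cl100k_base")
    # Count the same sample as above so the two tokenizers can be compared.
    for text in (sample, "Token counting is useful for budgeting."):
        print(f"tiktoken count: {len(enc.encode(text))} for {text!r}")
```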
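When tiktoken is not installed, a rough estimate can still help compare inputs. The sketch below uses a commonly cited rule of thumb of about four characters per token for English text. That ratio is an assumption and drifts for code, URLs, and non-English text, so prefer a real tokenizer's count when budgeting. `estimate_tokens_by_chars` and `count_simple_tokens` are hypothetical helpers.

```python
import math
import re


def estimate_tokens_by_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: character count divided by an assumed ratio."""
    return math.ceil(len(text) / chars_per_token)


def count_simple_tokens(text: str) -> int:
    """Count word runs and punctuation marks (same regex as simple_tokenize)."""
    return len(re.findall(r"\w+|[^\w\s]", text))


inputs = {
    "prose": "Token counting is useful for budgeting.",
    "url": "https://github.com/openai/tiktoken/blob/main/README.md",
}
for name, text in inputs.items():
    print(
        f"{name}: chars={len(text)}, "
        f"simple={count_simple_tokens(text)}, "
        f"~4-chars-per-token estimate={estimate_tokens_by_chars(text)}"
    )
```

The URL splits into many more punctuation-separated pieces than the prose does. It is a useful case to recheck with `enc.encode` when tiktoken is available.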

## Truncation exercise

The simplest strategy is head truncation. Encode the text, keep the first `max_tokens` tokens, and decode them back into a string. Anything past the budget is dropped.

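```python
def truncate_text_tokens(text: str, max_tokens: int, *, enc) -> str:
    """Keep only the first `max_tokens` tokens of `text` (head truncation)."""
    if max_tokens < 0:
        # A negative slice bound would silently drop tokens from the end.
        raise ValueError("max_tokens must be non-negative")
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    # Decoding a token prefix can split a multi-byte character; tiktoken's
    # decode replaces such bytes with U+FFFD by default.
    return enc.decode(tokens[:max_tokens])


if tiktoken is not None:
    long_text = "data " * 400
    truncated = truncate_text_tokens(long_text, 200, enc=enc)
    print("orig tokens:", len(enc.encode(long_text)))
    # Re-encoding decoded text usually, but not always, gives the same count.
    print("trunc tokens:", len(enc.encode(truncated)))
```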
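Head truncation drops everything after the budget, which can discard conclusions or instructions at the end of a document. One alternative keeps the start and end of the token sequence and drops the middle. It is sketched here as an illustration, not a standard library function. The code operates on a plain list of token IDs, so it works with `enc.encode` output or any other tokenizer. `truncate_middle` and its `head_fraction` parameter are hypothetical.

```python
def truncate_middle(
    tokens: list[int], max_tokens: int, head_fraction: float = 0.5
) -> list[int]:
    """Keep the head and tail of `tokens`, dropping the middle to fit `max_tokens`."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    if not 0.0 <= head_fraction <= 1.0:
        raise ValueError("head_fraction must be between 0 and 1")
    if len(tokens) <= max_tokens:
        return list(tokens)
    head = int(max_tokens * head_fraction)
    tail = max_tokens - head
    # tokens[-0:] would return the whole list, so handle tail == 0 explicitly.
    return tokens[:head] + (tokens[-tail:] if tail else [])


ids = list(range(10))
print(truncate_middle(ids, 4))
print(truncate_middle(ids, 5, head_fraction=0.2))
```

With tiktoken loaded, `enc.decode(truncate_middle(enc.encode(long_text), 200))` gives the text-level equivalent. The same caveat about splitting multi-byte characters applies at both cut points.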

## Practice exercises

- Compare token counts for URLs vs plain text.
- Experiment with different max token budgets.
- Add a function that estimates cost given tokens.
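For the cost exercise, here is one possible starting point. Providers commonly price input and output tokens separately, quoted per million tokens. The prices below are placeholders, not real rates, and `Pricing` and `estimate_cost` are hypothetical names. Look up current pricing for the model you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in dollars (placeholder values, not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated dollar cost of one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


example = Pricing(input_per_million=1.00, output_per_million=4.00)  # hypothetical
print(f"${estimate_cost(1_200, 300, example):.6f}")
```

Keeping input and output prices separate matters because output tokens are often priced higher. A long reply can then cost more than a long prompt.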

## References

- tiktoken: https://github.com/openai/tiktoken