# Lesson 0 — Python Foundations for Language Model Builders

Welcome! This warm-up lesson gives you everything you need to understand the Python code that powers the rest of the course. We'll start with fundamentals and layer on new skills until you are ready to read and write the machine learning programs that appear in Lessons 1–6.

Throughout the notebook you will see:

* **Concept callouts** that introduce new vocabulary.
* **Code demos** that mirror the exact techniques used in later lessons.
* **Practice prompts** that encourage you to tweak the examples and observe what happens.

## Learning goals

By the end of this lesson you should be able to:

1. Read and write Python expressions with numbers, strings, lists, dictionaries, and loops.
2. Define reusable functions and understand how Python modules (files of reusable code) are imported and used.
3. Manipulate text data, file paths, and counters—core tools for tokenization and language modeling.
4. Work comfortably with scientific libraries such as `numpy`, `matplotlib`, `torch`, and `transformers` that appear in later lessons.
5. Explain the purpose of decorators, classes, and object-oriented patterns used to build neural networks.

## 1. Python syntax essentials

Let's begin with the smallest building blocks: values, variables, and expressions.

* **Value** — a piece of data such as the number `42` or the string `'token'`.
* **Variable** — a name that points to a value so we can reuse it.
* **Expression** — something Python can evaluate to produce a result.

In [None]:

# numbers and strings
count = 3 + 4
message = "Tokens learned"
# f-strings let us combine text and variable values
print(f"We discovered {count} new tokens. {message}!")
# built-in len() works on many objects, including strings
print("Characters in message:", len(message))


### Comments and whitespace

You will notice comments that start with `#`. These are notes for humans; Python ignores them. Blank lines help group ideas and do not change how the program runs.

### Built-in functions we will reuse often

| Function | Purpose in later lessons |
| --- | --- |
| `len(obj)` | Count characters or tokens (Lessons 1–4). |
| `sorted(iterable)` | Order vocabulary entries (Lessons 1–3). |
| `set(iterable)` | Collect unique characters or words (Lessons 1–3). |
| `sum(values)` | Combine counts or losses (Lessons 3–4). |
| `min`, `max` | Track best scores or probabilities (Lessons 1–3). |
| `round(number, ndigits)` | Format metrics like perplexity (Lessons 3 & 6). |
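
To make the table concrete, here is a minimal sketch that calls each built-in once. The `tokens` and `losses` values are made up for illustration and do not come from the later lessons.

```python
# Hypothetical sample data, chosen only for illustration
tokens = ["dog", "wolf", "dog", "moon"]
losses = [0.5, 0.25, 1.0]

num_tokens = len(tokens)                 # how many tokens, duplicates included
unique_sorted = sorted(set(tokens))      # unique tokens in alphabetical order
total_loss = sum(losses)                 # add up all values
best, worst = min(losses), max(losses)   # smallest and largest value
mean_loss = round(total_loss / len(losses), 2)  # keep two decimal places

print(num_tokens, unique_sorted, total_loss, best, worst, mean_loss)
```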

## 2. Strings, lists, tuples, dictionaries, and sets

The language modeling notebooks work with sequences of characters and words, so we must be comfortable with Python's core containers.

In [None]:

words = ["rocket", "ship", "portal"]  # list: ordered and editable
first, second, third = words  # unpacking: one variable per list item
print("First word:", first)
words.append("village")  # add to a list (Lesson 1)
print("Updated words:", words)
letters = set("astronaut")  # set: unordered, unique elements
print("Unique letters:", sorted(letters))
facts = {"creeper": "Minecraft enemy", "falcon": "bird"}  # dictionary
print("Lookup:", facts["creeper"])  # key-based access (Lessons 1–3)
facts["falcon"] = "space launch company"
print("Updated dictionary:", facts)
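
The cell above covers lists, sets, and dictionaries. The sketch below fills in tuples and two dictionary helpers, `.get()` and `.items()`. The variable names and values are illustrative assumptions, not code from a later lesson.

```python
# A tuple is ordered like a list but cannot be changed after creation
pair = ("o", "o")
left, right = pair  # unpacking works the same way as with lists

try:
    pair[0] = "x"  # tuples do not allow item assignment
except TypeError as err:
    tuple_error = str(err)
    print("Tuples are immutable:", tuple_error)

# dict.get returns a default instead of raising KeyError for a missing key
counts = {"dog": 2}
missing = counts.get("wolf", 0)
counts["wolf"] = counts.get("wolf", 0) + 1

# .items() yields (key, value) pairs for looping
summary = [f"{word}={n}" for word, n in counts.items()]
print(summary)
```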


### Slicing and comprehension patterns

Slicing lets us grab a portion of a sequence, and comprehensions provide a concise way to build new lists or dictionaries. These appear repeatedly when we create vocabularies and embeddings.

In [None]:

text = "dogonauts"
print("First three letters:", text[:3])
print("Every other letter:", text[::2])

# List comprehension to lowercase each token
raw_tokens = ["Dog", "Wolves", "Stars"]
clean_tokens = [token.lower() for token in raw_tokens]
print(clean_tokens)

# Dictionary comprehension builds {word: index}
vocab = sorted(set(clean_tokens))
word_to_index = {word: idx for idx, word in enumerate(vocab)}
print(word_to_index)


### Practice prompt

Change `raw_tokens` to include a phrase with a space, such as `'Space Station'`. Predict how the list and dictionary comprehensions respond and then run the cell to verify.

## 3. Conditionals and loops

The later lessons loop over tokens, characters, and batches. Control flow statements give us that power.

* `if`/`elif`/`else` choose a branch.
* `for` iterates over items in a sequence.
* `while` keeps going until a condition stops it.
* `break` and `continue` adjust the loop's behavior.
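
As a small sketch of the last bullet, the loop below uses `continue` to skip an item and `break` to stop early. The `"<eos>"` end marker and the token list are assumptions made for this example.

```python
# Hypothetical token stream with an empty string and an end marker
tokens = ["the", "", "wolf", "<eos>", "ignored"]

kept = []
for tok in tokens:
    if tok == "":
        continue  # skip this item and move to the next one
    if tok == "<eos>":
        break     # leave the loop entirely
    kept.append(tok)
print(kept)

# A while loop repeats until its condition becomes False
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1
print(steps)
```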

In [None]:

words = ["dog", "dogs", "wolf", "wolves"]
for word in words:
    if word.endswith("s"):
        print(f"Plural detected: {word}")
    else:
        print(f"Singular detected: {word}")

# while-loop for iterative merges (mirrors Lesson 1's tokenizer)
symbols = list("moon") + ["</w>"]
merges = [("o", "o")]
for pair in merges:
    i = 0
    while i < len(symbols) - 1:
        if symbols[i] == pair[0] and symbols[i + 1] == pair[1]:
            symbols[i : i + 2] = [pair[0] + pair[1]]
        else:
            i += 1
print("After merge:", symbols)


### Ranges, enumeration, and zipping

When you need index positions alongside values (for example, iterating over embedding dimensions), `range`, `enumerate`, and `zip` are indispensable.

In [None]:

# range(stop) counts from 0 up to, but not including, stop;
# range(start, stop) does the same starting from start
for i in range(3):
    print("Iteration", i)

# enumerate supplies (index, value)
planets = ["Mercury", "Venus", "Earth"]
for idx, planet in enumerate(planets, start=1):
    print(f"Planet {idx}: {planet}")

# zip combines sequences position-wise
coords = [0, 1, 2]
names = ["x", "y", "z"]
for coord, name in zip(coords, names):
    print(f"Axis {name} -> index {coord}")
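
Two further patterns are worth recognizing: `range` with a start and a step, and `zip` applied to a sequence and a shifted copy of itself to get adjacent pairs. The word `"banana"` is just a convenient example, and whether a later lesson counts pairs exactly this way is an assumption.

```python
from collections import Counter

# range(start, stop, step): stop is still exclusive
stepped = list(range(2, 10, 3))
print(stepped)

# zip a sequence with itself shifted by one to get neighbouring pairs
symbols = list("banana")
pairs = list(zip(symbols, symbols[1:]))
print(pairs)

# Counter tallies how often each pair occurs
pair_counts = Counter(pairs)
print(pair_counts)
```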


## 4. Defining and returning functions

In Lessons 1–3 we create helper functions like `get_vocab` and `sample`. Here's how to define our own.

* Use the `def` keyword.
* Provide parameters inside parentheses.
* Use `return` to hand back a value.

In [None]:

def describe_token(token):
    """Return token length and uppercase form."""
    length = len(token)
    shout = token.upper()
    return length, shout

info = describe_token("astronaut")
print(info)

# A function body can call other functions (here the built-in len)
def tokens_longer_than(tokens, min_length):
    return [tok for tok in tokens if len(tok) >= min_length]

print(tokens_longer_than(["dog", "satellite", "wolf"], 5))
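
When a function returns several values, Python packs them into a tuple, and the caller can unpack them straight into separate names. The sketch below repeats the two functions from the cell above so that it runs on its own, and also shows keyword arguments.

```python
def describe_token(token):
    """Return token length and uppercase form."""
    return len(token), token.upper()

def tokens_longer_than(tokens, min_length):
    """Keep tokens with at least min_length characters."""
    return [tok for tok in tokens if len(tok) >= min_length]

# Unpack the returned tuple into two variables
length, shout = describe_token("astronaut")
print(length, shout)

# Keyword arguments make the call self-documenting
long_tokens = tokens_longer_than(tokens=["dog", "satellite", "wolf"], min_length=5)
print(long_tokens)
```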


### Lambda expressions (anonymous functions)

Sometimes we pass a tiny function as an argument—`lambda` gives us a compact way to do that. Lesson 3 uses custom scoring functions, and Lesson 2 sorts by cosine similarity.

In [None]:

words = ["dog", "wolf", "falcon", "rocket"]

# Sort by word length using a lambda function as the key
by_length = sorted(words, key=lambda w: len(w))
print(by_length)
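
A `key` function is also handy for ranking items by a score, largest first. The similarity values below are invented for illustration; they are not results from Lesson 2.

```python
# Hypothetical similarity scores, not real results
scores = {"wolf": 0.91, "moon": 0.12, "dog": 0.87}

# Sort (word, score) pairs by the score, highest first
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)

# max accepts the same kind of key function
top_word = max(scores, key=lambda w: scores[w])
print(top_word)
```

When the lambda only forwards its argument to an existing function, as in `lambda w: len(w)`, passing the function itself (`key=len`) gives the same result.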


## 5. Modules and imports

A **module** is a file that contains reusable code. We bring modules into our notebooks with `import` statements.

| Module | What it provides in later lessons |
| --- | --- |
| `pathlib` | The `Path` class for navigating the data folder (Lessons 1–6). |
| `collections` | `Counter` for counting tokens (Lessons 1 & 3). |
| `re` | Regular expressions for splitting text (Lessons 1, 3, 6). |
| `random` | Sampling words when generating text (Lessons 3 & 4). |
| `math` | Functions like `log2` and `sqrt` for probabilities and scaling (Lessons 3–6). |
| `numpy` | Numerical arrays for embeddings (Lesson 2). |
| `matplotlib.pyplot` | Plotting embeddings (Lesson 2). |
| `torch` and `torch.nn` | Neural network building blocks (Lesson 4). |
| `datasets` & `transformers` | Hugging Face tools for fine-tuning (Lesson 5). |
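
Imports come in three common forms. This sketch uses only standard-library modules, and the alias `stats` is an arbitrary choice for the example (in the same way that `numpy` is conventionally imported as `np`).

```python
import math                      # import the whole module; use math.<name>
from collections import Counter  # import one name directly
import statistics as stats       # import under a shorter alias

root = math.sqrt(16)
letter_counts = Counter("moon")
average = stats.mean([1, 2, 3])

print(root, letter_counts["o"], average)
```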

In [None]:

from pathlib import Path
from collections import Counter
import re
import random
import math

# Path helps us build OS-independent file paths
corpus_path = Path("../data") / "space.txt"
print("Path exists?", corpus_path.exists())

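# Read text from a file (Lesson 1 style); use a tiny stand-in if the file is missing
if corpus_path.exists():
    text = corpus_path.read_text(encoding="utf-8")
else:
    text = "It's a quiet night. The rocket's engines hum."  # placeholder text, not course data
print("Sample:", text[:60])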

# Use regex to find words
tokens = re.findall(r"[a-zA-Z']+", text.lower())
print("First tokens:", tokens[:10])

# Count frequencies
counts = Counter(tokens)
print("Most common:", counts.most_common(5))

# Random sampling (e.g., Lesson 3's text generation)
random.seed(42)
print("Random token:", random.choice(tokens))

# Math utilities
prob = 0.125
print("Information content (bits):", -math.log2(prob))


### Practice prompt

Try changing the regular expression to `r"[a-zA-Z]+"` and observe how contractions such as "it's" are handled.
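
To check your prediction, the sketch below runs both patterns on one short sentence. The sentence is made up for this comparison.

```python
import re

sentence = "It's the rocket's moon."  # hypothetical example sentence

# Apostrophe is part of the character class, so contractions stay whole
with_apostrophe = re.findall(r"[a-zA-Z']+", sentence.lower())

# Without the apostrophe, each contraction splits into two tokens
without_apostrophe = re.findall(r"[a-zA-Z]+", sentence.lower())

print(with_apostrophe)
print(without_apostrophe)
```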

## 6. File input/output basics

We often store intermediate artifacts such as merged corpora or saved models. The two most common approaches in this course are:

1. `Path.read_text()` and `Path.write_text()` from the `pathlib` module.
2. The built-in `open()` function used inside a `with` block.

In [None]:

from pathlib import Path

lesson_dir = Path("../data")
lesson_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
snippet_path = lesson_dir / "mini_snippet.txt"

snippet_path.write_text("Galaxies whisper through the vacuum.", encoding="utf-8")
print(snippet_path.read_text(encoding="utf-8"))

# with-open pattern automatically closes the file
data = []
with open(snippet_path, "r", encoding="utf-8") as handle:
    for line in handle:
        data.append(line.strip())
print(data)


## 7. Numerical computing with NumPy

Lesson 2 builds word embeddings using the Singular Value Decomposition (SVD). To follow that lesson you need to understand:

* How to create arrays with `numpy` (`np.array`, `np.zeros`).
* Basic arithmetic and slicing on arrays.
* Linear algebra helpers like `np.dot`, `np.linalg.norm`, and `np.linalg.svd`.
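
The first two bullets can be sketched in a few lines. The arrays here are arbitrary small examples chosen so the results are easy to verify by hand.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

# Arithmetic is applied element by element
doubled = v * 2
shifted = v + np.array([10.0, 20.0, 30.0])

# A 2x3 matrix holding 0..5, then row and column slicing
m = np.arange(6).reshape(2, 3)
first_row = m[0]
last_col = m[:, -1]

# Dot product and vector length
dot = np.dot(v, v)
norm = np.linalg.norm(np.array([3.0, 4.0]))

print(doubled, shifted, first_row, last_col, dot, norm)
```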

In [None]:

import numpy as np

# Co-occurrence matrix example (rows and columns correspond to words)
words = ["dog", "wolf", "moon"]
cooc = np.zeros((3, 3), dtype=np.float32)
cooc[0, 1] = 2
cooc[0, 2] = 1
cooc[1, 0] = 2
cooc[2, 0] = 1
print(cooc)

# Matrix factorization via SVD
U, S, VT = np.linalg.svd(cooc + 1e-6, full_matrices=False)
embeddings = U[:, :2] * S[:2]
print("Embeddings shape:", embeddings.shape)

# Cosine similarity helper (Lesson 2)
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

print("Similarity dog/wolf:", cosine(embeddings[0], embeddings[1]))


## 8. Visualization with Matplotlib

Plotting embeddings helps us reason about spatial relationships. Lesson 2 saves charts with `plt.savefig`. Here's a miniature example.

In [None]:

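from pathlib import Path

import matplotlib.pyplot as plt

plt.figure(figsize=(4, 4))
plt.scatter(embeddings[:, 0], embeddings[:, 1])
# Label each point with its word (uses `words` and `embeddings` from Section 7)
for idx, word in enumerate(words):
    plt.text(embeddings[idx, 0], embeddings[idx, 1], word)
plt.title("Mini embedding map")
# savefig fails if the target folder is missing, so create it first
Path("../images").mkdir(parents=True, exist_ok=True)
plt.savefig("../images/lesson0_embeddings.png", bbox_inches="tight")
plt.show()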


## 9. Probabilities, randomness, and evaluation metrics

Language models assign probabilities to sequences and we evaluate them with cross-entropy or perplexity. The utilities below match Lessons 3 and 6.
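
Written out as formulas, the two metrics look like this. The symbols are notation chosen for this sketch: $N$ is the number of tokens, $p_i$ is the probability the model assigned to the $i$-th observed token, $H$ is the cross-entropy in bits, and $\mathrm{PPL}$ is the perplexity.

$$
H = -\frac{1}{N}\sum_{i=1}^{N}\log_2 p_i, \qquad \mathrm{PPL} = 2^{H}
$$

For the probabilities 0.5, 0.25, and 0.25 (the `sample_probs` values in the cell that follows), the arithmetic is:

$$
H = \frac{1 + 2 + 2}{3} = \frac{5}{3} \approx 1.667\ \text{bits}, \qquad \mathrm{PPL} = 2^{5/3} \approx 3.17
$$

A perplexity of about 3.17 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly three options.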

In [None]:

import math
import random

# Probability sampling (akin to Lesson 3's n-gram sampling)
options = ["wolf", "villager", "astronaut"]
weights = [0.5, 0.3, 0.2]

def weighted_choice(options, weights):
    total = sum(weights)
    r = random.random() * total
    cumulative = 0.0
    for option, weight in zip(options, weights):
        cumulative += weight
        if cumulative >= r:
            return option
    return options[-1]  # defensive fallback so the function never returns None implicitly

print("Sampled option:", weighted_choice(options, weights))

# Cross-entropy and perplexity (Lesson 3 & 6)
def cross_entropy(probabilities):
    return -sum(math.log2(max(p, 1e-12)) for p in probabilities) / len(probabilities)

def perplexity(probabilities):
    return 2 ** cross_entropy(probabilities)

sample_probs = [0.5, 0.25, 0.25]
print("Cross-entropy:", cross_entropy(sample_probs))
print("Perplexity:", perplexity(sample_probs))
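
The standard library also offers `random.choices`, which accepts weights directly. The sketch below is an alternative to the hand-written `weighted_choice`, not necessarily what Lesson 3 uses; it draws many samples to check that the observed shares land near the weights. The seed and sample size are arbitrary.

```python
import random
from collections import Counter

options = ["wolf", "villager", "astronaut"]
weights = [0.5, 0.3, 0.2]

# A local generator with a fixed seed keeps the run repeatable
rng = random.Random(0)
draws = rng.choices(options, weights=weights, k=10_000)

# Convert raw counts into observed shares
freq = Counter(draws)
share = {opt: freq[opt] / len(draws) for opt in options}
print(share)
```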


## 10. PyTorch fundamentals

PyTorch powers Lesson 4's tiny transformer. Key ideas:

* A **tensor** is a multi-dimensional array (similar to a NumPy array) that can track gradients when created with `requires_grad=True`, as model parameters are.
* `torch.nn.Module` is the base class for neural network components.
* Layers like `nn.Linear`, `nn.LayerNorm`, and activation functions transform tensors.
* The training loop repeats forward pass → loss computation → backward pass → parameter update.
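
The gradient-tracking idea and the last bullet can be sketched with a single number. The toy loss and the learning rate of 0.1 are assumptions made for this example; real models use an optimizer instead of the manual update shown here.

```python
import torch

# Gradients are tracked only when requested
w = torch.tensor(3.0, requires_grad=True)
plain = torch.tensor(3.0)
print(w.requires_grad, plain.requires_grad)

# Forward pass and loss: (2w - 4)^2
loss = (w * 2.0 - 4.0) ** 2

# Backward pass fills w.grad with d(loss)/dw = 4 * (2w - 4)
loss.backward()
grad_value = w.grad.item()
print("loss:", loss.item(), "grad:", grad_value)

# Manual parameter update; no_grad keeps the update itself untracked
with torch.no_grad():
    w -= 0.1 * w.grad
print("updated w:", w.item())
```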

In [None]:

import torch
import torch.nn as nn

print("PyTorch version:", torch.__version__)
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

# Create tensors
x = torch.tensor([[1.0, 2.0, 3.0]])
W = torch.randn(3, 2)
print("Linear transform:", x @ W)

# Simple feed-forward block similar to Lesson 4's transformer block
class FeedForward(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)

block = FeedForward(8)
inputs = torch.randn(2, 5, 8)  # batch, time, embedding
outputs = block(inputs)
print("Output shape:", outputs.shape)


### Attention-style tensor operations

Lesson 4 relies on reshaping tensors to manage multiple attention heads. You should recognize these methods:

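* `.view` changes a tensor's shape without copying data but requires a compatible memory layout; `.reshape` returns a view when it can and makes a copy otherwise.
* `.transpose` swaps two axes.
* `.contiguous()` returns a tensor stored contiguously in memory (copying if needed), which `.view` typically needs after a `.transpose`.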
* `@` performs matrix multiplication, while `torch.softmax` converts logits to probabilities.
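
A tiny tensor makes the difference between these methods visible. This is a sketch with arbitrary values, assuming only that `torch` is installed.

```python
import torch

t = torch.arange(6).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
tt = t.transpose(0, 1)           # shape (3, 2); shares memory with t
print(tt.is_contiguous())

# .view cannot flatten the transposed tensor because of its memory layout
try:
    tt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

flat_a = tt.reshape(6)             # reshape copies when a view is impossible
flat_b = tt.contiguous().view(6)   # or make the layout contiguous first

# softmax turns each row of logits into probabilities that sum to 1
probs = torch.softmax(torch.tensor([[1.0, 2.0, 3.0]]), dim=-1)
print(flat_a, flat_b, probs.sum(dim=-1))
```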

In [None]:

import math

B, T, C = 2, 4, 8   # batch, time, channels
heads = 2
head_dim = C // heads

x = torch.randn(B, T, C)
# Random stand-in: a real attention layer typically computes qkv from x with nn.Linear(C, 3 * C)
qkv = torch.randn(B, T, 3 * C)
q, k, v = qkv.chunk(3, dim=-1)
q = q.view(B, T, heads, head_dim).transpose(1, 2)
k = k.view(B, T, heads, head_dim).transpose(1, 2)
v = v.view(B, T, heads, head_dim).transpose(1, 2)

scores = (q @ k.transpose(-2, -1)) / math.sqrt(head_dim)
mask = torch.tril(torch.ones(T, T)) == 0
scores.masked_fill_(mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)
context = attn @ v
context = context.transpose(1, 2).contiguous().view(B, T, C)
print("Attention context shape:", context.shape)


### Training loop ingredients

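`torch.no_grad` tells PyTorch not to track gradients, which saves memory and time during evaluation. It works as a context manager (`with torch.no_grad():`) or as a decorator placed above a function definition. Optimizers such as `torch.optim.AdamW` update parameters using the gradients that `loss.backward()` computes.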

In [None]:

model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for step in range(3):
    inputs = torch.randn(5, 4)
    targets = torch.randn(5, 2)
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"Step {step}: loss={loss.item():.4f}")

@torch.no_grad()
def evaluate(inputs):
    logits = model(inputs)
    return logits

print("Eval logits:", evaluate(torch.randn(2, 4)))


## 11. Hugging Face datasets and transformers

Lesson 5 fine-tunes `distilgpt2`. The key Python concepts are:

* Using `Path` and standard file writing to merge corpora.
* Creating a `datasets.Dataset` from a dictionary.
* Loading pre-trained tokenizers and models with `transformers`.
* Configuring `TrainingArguments` and running a `Trainer`.
* Generating text with the `pipeline` helper.

Below is a lightweight dry run that mirrors the structure without requiring a full training session.

In [None]:

from datasets import Dataset
from transformers import AutoTokenizer

mini_corpus = "Wolves explore space. Villagers watch the stars."
dataset = Dataset.from_dict({"text": [mini_corpus]})
print(dataset)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2", use_fast=True)
encoded = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)
print(encoded[0]["input_ids"][:10])


> **Note:** Fine-tuning requires additional components (`AutoModelForCausalLM`, `DataCollatorForLanguageModeling`, `TrainingArguments`, and `Trainer`). You will assemble those pieces in Lesson 5 once you are comfortable with the syntax shown here.

## 12. Regular expressions and simple safety filters

Lesson 6 demonstrates rule-based safety checks. Python's `re` module searches for patterns inside text.

In [None]:

import re

FORBIDDEN = [r"credit card number", r"social security number"]

def safe_input(user_text):
    lowered = user_text.lower()
    for pattern in FORBIDDEN:
        if re.search(pattern, lowered):
            return False, f"Blocked by rule: {pattern}"
    return True, "ok"

examples = [
    "Tell me about wolves in Minecraft.",
    "What is a social security number?",
]
for text in examples:
    print(text, "->", safe_input(text))
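
Plain substring patterns are easy to dodge with extra spaces or capital letters. The sketch below is one possible hardening, not the Lesson 6 implementation: it precompiles the patterns with `re.IGNORECASE`, allows any run of whitespace between words with `\s+`, and anchors on word boundaries with `\b`. The names `RULES` and `check_text` are hypothetical.

```python
import re

# Hypothetical rule list: compiled once, case-insensitive, whitespace-tolerant
RULES = [
    re.compile(r"\bcredit\s+card\s+number\b", re.IGNORECASE),
    re.compile(r"\bsocial\s+security\s+number\b", re.IGNORECASE),
]

def check_text(user_text):
    """Return (is_safe, reason) for a piece of user text."""
    for rule in RULES:
        if rule.search(user_text):
            return False, f"Blocked by rule: {rule.pattern}"
    return True, "ok"

blocked = check_text("My Credit   Card Number is secret")
allowed = check_text("Tell me about wolves in Minecraft.")
print(blocked)
print(allowed)
```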


## 13. Pulling it all together

You now have a working vocabulary for every concept used in Lessons 1–6:

1. Read and manipulate text with `Path`, `re`, and `Counter`.
2. Control program flow using functions, loops, comprehensions, and conditionals.
3. Work with arrays, tensors, and neural network layers in NumPy and PyTorch.
4. Load datasets and pre-trained tokenizers with Hugging Face tools, the first steps toward fine-tuning in Lesson 5.

Keep this notebook handy as a reference. Whenever later lessons introduce a new idea, look back at the corresponding section here to reinforce the foundation.