### What does the decorator do?

When you write:

In [None]:
@add_to_class(A)
def do(self):
    print('Class attribute "b" is', self.b)

Python transforms it into:

In [None]:
def do(self):
    print('Class attribute "b" is', self.b)

do = add_to_class(A)(do)


So the key question becomes:
What does `add_to_class(A)(do)` do?

#### Expand `add_to_class`

In [None]:
def add_to_class(Class):
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)
    return wrapper

Now evaluate it step-by-step

First call: `add_to_class(A)`

This returns the inner function `wrapper`, with `Class` bound to `A`:

In [None]:
wrapper = add_to_class(A)

So now we effectively have:

In [None]:
do = wrapper(do)

#### What does `wrapper(do)` do?

Inside `wrapper`:   

In [None]:
setattr(Class, obj.__name__, obj)

In our case:

* `Class` → `A`

* `obj` → `do`

* `obj.__name__` → `"do"`

As far as the class is concerned, this has the same effect as writing:

In [None]:
A.do = do
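The steps above can be run end to end. This is a minimal sketch that assumes a class `A` with a class attribute `b`; the value `1` is an assumption, since A's definition is not shown in this section. It also surfaces one side effect that a plain `A.do = do` would not have: `wrapper` has no `return` statement, so after decoration the module-level name `do` is rebound to `None`, while the function stays reachable as `A.do`.

```python
def add_to_class(Class):
    """Return a decorator that registers a function as a method of Class."""
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)  # same effect as: Class.<name> = obj
    return wrapper  # note: wrapper itself returns None


class A:
    b = 1  # assumed value, for illustration only


@add_to_class(A)
def do(self):
    print('Class attribute "b" is', self.b)


print(A.do)  # the function now lives on the class
print(do)    # None, because wrapper returns nothing
```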

#### Why does this work even after instance creation?

We already created

In [None]:
a = A()

Then later we do

In [None]:
setattr(A, 'do', do)

Instances in Python don't copy methods when they're created. They look them up dynamically via the class.

So when we call

In [None]:
a.do()

Python:

1. Looks for `do` in `a`

2. Doesn’t find it

3. Looks in `A`

4. Finds `do`

5. Binds `self = a`

6. Calls it

So even existing instances gain the new method immediately.

This is pure Python dynamic behavior.
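The lookup order above can be checked directly. The sketch below assumes a minimal class `A` with `b = 1` (an assumed value) and attaches the method only after the instance already exists.

```python
class A:
    b = 1  # assumed value, for illustration only


a = A()  # instance created before the method exists


def do(self):
    return self.b


setattr(A, 'do', do)  # attach to the class afterwards

print('do' in vars(a))     # False: nothing was copied onto the instance
print('do' in vars(A))     # True: the function lives on the class
print(a.do.__self__ is a)  # True: attribute access binds self to a
print(a.do())              # 1
```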

### Progress Board

#### What are the args?

They’re plot styling / axis configuration defaults for Matplotlib.

* `xscale='linear'`, `yscale='linear'`

    * Controls the axis scale.

    * `'linear'` means normal spacing.

    * Common alternatives: `'log'`, `'symlog'`, `'logit'`.

    * In the code, they get applied here:

        ```
        axes.set_xscale(self.xscale)
        axes.set_yscale(self.yscale)
        ```

* `ls=['-', '--', '-.', ':']`

    * A list of linestyles (solid, dashed, dash-dot, dotted).

    * Used so different series (different labels) are visually distinguishable:

        ```
        for (k, v), ls, color in zip(self.data.items(), self.ls, self.colors):
            d2l.plt.plot(..., linestyle=ls, ...)

        ```

Important gotcha: because the loop uses `zip(...)`, it stops at the shortest of the three iterables, so you can only display as many labels as the shorter of `ls` and `colors` allows. With the defaults, you’ll see up to 4 lines (unless you extend `ls` and `colors`).
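The truncation is plain `zip` behavior and can be reproduced without Matplotlib. In this sketch the five series labels and the color list `['C0', 'C1', 'C2', 'C3']` are assumptions chosen for illustration.

```python
ls = ['-', '--', '-.', ':']
colors = ['C0', 'C1', 'C2', 'C3']  # assumed: four colors to match the four linestyles

# Five series, one more than the style lists can cover (labels are made up).
data = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': [], 'lr': []}

# zip stops as soon as the shortest iterable is exhausted.
plotted = [k for (k, v), style, color in zip(data.items(), ls, colors)]
print(plotted)  # ['train_loss', 'val_loss', 'train_acc', 'val_acc']
print(len(data) - len(plotted), 'series silently dropped')  # 1 series silently dropped
```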

#### Why does `ProgressBoard` inherit from `d2l.HyperParameters`?

Because `HyperParameters` provides the method:

In [None]:

save_hyperparameters()

and `ProgressBoard.__init__` calls:

In [None]:
self.save_hyperparameters()

So inheritance is used to reuse that utility: automatically turn all constructor arguments into attributes, without writing:

In [None]:
self.xlabel = xlabel
self.ylabel = ylabel
self.xlim = xlim
...
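To make the idea concrete, here is a minimal sketch of how such a utility could work, using frame inspection from the standard library. `HyperParametersSketch` and `BoardSketch` are hypothetical names; this is an illustration of the technique, not the d2l source.

```python
import inspect


class HyperParametersSketch:
    """Hypothetical stand-in for d2l.HyperParameters, for illustration only."""

    def save_hyperparameters(self, ignore=()):
        # Look at the caller's frame (the __init__ that called us) ...
        frame = inspect.currentframe().f_back
        _, _, _, local_vars = inspect.getargvalues(frame)
        # ... and copy its arguments onto self, skipping self and private names.
        self.hparams = {k: v for k, v in local_vars.items()
                        if k != 'self' and k not in ignore
                        and not k.startswith('_')}
        for k, v in self.hparams.items():
            setattr(self, k, v)


class BoardSketch(HyperParametersSketch):
    def __init__(self, xlabel=None, ylabel=None, xlim=None):
        self.save_hyperparameters()  # replaces three self.x = x assignments


board = BoardSketch(xlabel='epoch', xlim=(0, 10))
print(board.xlabel, board.ylabel, board.xlim)  # epoch None (0, 10)
```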


### `plot()`

#### The `\` Backslash

Example:

In [None]:
x = self.trainer.train_batch_idx / \
    self.trainer.num_train_batches

The `\` means:

* Continue this line on the next line.

So Python reads this as:

In [None]:
x = self.trainer.train_batch_idx / self.trainer.num_train_batches
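A quick check with made-up numbers shows that the spellings are the same expression. Wrapping the expression in parentheses is an alternative that needs no backslash.

```python
train_batch_idx, num_train_batches = 30, 100  # made-up values

# Explicit continuation with a trailing backslash.
x_backslash = train_batch_idx / \
    num_train_batches

# Implicit continuation inside parentheses.
x_parens = (train_batch_idx /
            num_train_batches)

x_one_line = train_batch_idx / num_train_batches

print(x_backslash == x_parens == x_one_line)  # True
```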

#### What is `x`?

Training case:

In [None]:
x = self.trainer.train_batch_idx / self.trainer.num_train_batches

Suppose:
* 100 batches per epoch
* current batch = 30

Then:

x = 30 / 100 = 0.3

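So during the first epoch, `x` moves smoothly from 0 -> 1. In the d2l `Trainer`, `train_batch_idx` is not reset between epochs, so `x` keeps growing in later epochs (1 -> 2 in the second, and so on).

This makes the plot update continuously within an epoch.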

Validation case:

In [None]:
x = self.trainer.epoch + 1

If current epoch = 2

In [None]:
x = 3

Validation is plotted once per epoch, so it jumps discretely.
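The two branches can be written as one small function. `x_position` is a hypothetical helper that takes the trainer's counters as plain arguments; it only mirrors the two formulas described above.

```python
def x_position(train, train_batch_idx, num_train_batches, epoch):
    """Hypothetical helper mirroring the two branches described above."""
    if train:
        return train_batch_idx / num_train_batches  # fractional position
    return epoch + 1                                # whole-epoch position


print(x_position(True, 30, 100, epoch=0))   # 0.3
print(x_position(False, 30, 100, epoch=2))  # 3
```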

#### What is `n`?

Training case:

In [None]:
n = self.trainer.num_train_batches / self.plot_train_per_epoch

If:

* 100 batches per epoch
* `plot_train_per_epoch = 2`

Then:

n = 100 / 2 = 50

Meaning:
* Only plot every 50 batches

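This controls both the plotting frequency and the smoothing: the points collected between two draws are averaged into one plotted point.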

Later:

In [None]:
every_n = int(n)

so `draw()` will only plot once every `n` calls.
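The effect of `int(n)` can be tabulated with a small sketch. `flush_calls` is a hypothetical helper; it assumes `draw()` plots each time `every_n` new points have arrived and then starts counting again, and that there are at least as many batches as plots per epoch.

```python
def flush_calls(num_batches, plots_per_epoch):
    """Hypothetical helper: on which calls (1-based) does a plot happen?"""
    n = num_batches / plots_per_epoch  # may be a float, e.g. 10 / 3
    every_n = int(n)                   # truncated toward zero
    calls = [i for i in range(1, num_batches + 1) if i % every_n == 0]
    return every_n, calls


print(flush_calls(100, 2))  # (50, [50, 100])
print(flush_calls(10, 3))   # (3, [3, 6, 9])
```

The second call shows what truncation does when the division is not exact: with 10 batches and 3 plots per epoch, the last batch of the epoch is left waiting in the buffer.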

#### The Most Interesting Line

In [None]:
self.board.draw(
    x,
    value.to(d2l.cpu()).detach().numpy(),
    ('train_' if train else 'val_') + key,
    every_n=int(n)
)

The third argument builds the series label:

In [None]:
('train_' if train else 'val_') + key

If:

In [None]:
key = "loss"
train = True

Then:

In [None]:
label = "train_loss"

If validation:

In [None]:
label = "val_loss"

So training and validation losses appear as separate lines.
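The parentheses in the label expression matter. A conditional expression binds more loosely than `+`, so without them the `+ key` attaches only to the `else` branch. The two function names below are hypothetical.

```python
def label(key, train):
    return ('train_' if train else 'val_') + key


def label_no_parens(key, train):
    # Parsed as: 'train_' if train else ('val_' + key)
    return 'train_' if train else 'val_' + key


print(label('loss', True), label('loss', False))
# train_loss val_loss
print(label_no_parens('loss', True), label_no_parens('loss', False))
# train_ val_loss
```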

### Training Step and Validation Step

#### Difference between `training_step` and `validation_step`

They do the same forward + loss computation, but differ in what happens next:

* `training_step`

    * Computes loss `l`

    * Calls `plot(..., train=True)` so it’s logged as `"train_loss"`

    * Returns `l` (so the Trainer can call `backward()` and `optimizer.step()`)

* `validation_step`

    * Computes loss `l`

    * Calls `plot(..., train=False)` so it’s logged as `"val_loss"`

    * Does not return anything (validation typically has no backprop / no optimizer step)
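The contrast can be sketched without PyTorch. `ToyModule` is a hypothetical stand-in that uses plain floats and records labels in a list instead of drawing; only the shape of the two methods follows the description above.

```python
class ToyModule:
    """Hypothetical stand-in for a d2l.Module; no PyTorch needed."""

    def __init__(self, w):
        self.w = w
        self.logged = []  # records (label, value) instead of drawing

    def __call__(self, X):  # mimics nn.Module routing calls to forward()
        return self.forward(X)

    def forward(self, X):
        return self.w * X

    def loss(self, y_hat, y):
        return (y_hat - y) ** 2

    def plot(self, key, value, train):
        self.logged.append((('train_' if train else 'val_') + key, value))

    def training_step(self, batch):
        l = self.loss(self(*batch[:-1]), batch[-1])
        self.plot('loss', l, train=True)
        return l  # the trainer needs this for the backward pass

    def validation_step(self, batch):
        l = self.loss(self(*batch[:-1]), batch[-1])
        self.plot('loss', l, train=False)
        # no return: nothing to backpropagate


m = ToyModule(w=2.0)
print(m.training_step((3.0, 5.0)))    # 1.0
print(m.validation_step((3.0, 5.0)))  # None
print(m.logged)  # [('train_loss', 1.0), ('val_loss', 1.0)]
```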

#### What does `self.loss(self(*batch[:-1]), batch[-1])` mean?

Assume each `batch` is like `(X, y)`.

* `batch[:-1]` = everything except the last element → `(X,)`

* `*batch[:-1]` splats into positional args → `self(X)`

* `self(X)` calls `Module.__call__` (from `nn.Module`), which internally calls `forward(X)` → produces `y_hat`

* `batch[-1]` = last element → `y`

* `self.loss(y_hat, y)` computes the loss tensor `l`

So it’s a compact way of supporting batches like `(X, y)` or even `(X1, X2, ..., y)`.
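The slicing and unpacking can be tried on plain tuples. In this sketch `forward` is a hypothetical function that simply returns the positional arguments it received, and the strings stand in for tensors.

```python
def forward(*inputs):
    """Hypothetical model call that just reports what it received."""
    return inputs


for batch in [("X", "y"), ("X1", "X2", "y")]:
    inputs, target = batch[:-1], batch[-1]
    print(inputs, target, forward(*batch[:-1]))
# ('X',) y ('X',)
# ('X1', 'X2') y ('X1', 'X2')
```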

#### Step-by-step example: 2 epochs, 10 batches each

Let’s assume:

* `num_train_batches = 10`

* `num_val_batches = 10` (just for illustration)

* `plot_train_per_epoch = 2` (default)

* `plot_valid_per_epoch = 1` (default)

* The trainer maintains:

    * `trainer.epoch` (0-based)

    * `trainer.train_batch_idx` (running batch counter: 0..9 during epoch 0; the d2l `Trainer` does not reset it between epochs, so it continues with 10..19 in epoch 1)

Epoch 0 (first epoch): training loop, batches 0..9

Each batch does:

1. `training_step(batch)`:

    * `y_hat = self(X)` (forward)

    * `l = loss(y_hat, y)`

    * `plot('loss', l, train=True)`

2. Inside `plot(..., train=True)`:

    * Check `self.trainer` exists

    * Compute:

        * `x = train_batch_idx / 10`

        * `every_n = int(10 / 2) = 5`

        * `label = "train_loss"`

    * Call:

In [None]:
board.draw(x, loss_value, "train_loss", every_n=5)

Validation loop, batches 0..9 (after training epoch 0)

For each val batch:

1. `validation_step(batch)`:

    * compute `l`

    * `plot('loss', l, train=False)`

2. Inside `plot(..., train=False)`:

    * `x = epoch + 1 = 1`

    * `every_n = int(10 / 1) = 10`

    * label = `"val_loss"`

    * call `board.draw(1, loss_value, "val_loss", every_n=10)`

3. Inside `board.draw(... every_n=10)`:

    * collects 10 raw points

    * only draws when it has all 10

So:

* Val batch 0..8: not drawn yet

* Val batch 9: the 10th point → draw happens

    * x-values are all 1

So validation loss is plotted once per epoch (at x=epoch+1).
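The whole walkthrough can be simulated for both epochs. `BoardSketch` is a hypothetical stand-in for `ProgressBoard` that keeps only the buffer-then-average logic and skips the drawing; the loss value `1.0` is a dummy. The sketch also assumes that `train_batch_idx` keeps counting across epochs rather than restarting at 0.

```python
from collections import defaultdict


class BoardSketch:
    """Hypothetical stand-in for ProgressBoard.draw: buffer, then average."""

    def __init__(self):
        self.raw = defaultdict(list)    # points waiting to be drawn
        self.drawn = defaultdict(list)  # points that reached the plot

    def draw(self, x, y, label, every_n=1):
        buf = self.raw[label]
        buf.append((x, y))
        if len(buf) != every_n:
            return  # not enough points yet
        mean_x = sum(p[0] for p in buf) / len(buf)
        mean_y = sum(p[1] for p in buf) / len(buf)
        self.drawn[label].append((mean_x, mean_y))
        buf.clear()


num_train_batches = num_val_batches = 10
plot_train_per_epoch, plot_valid_per_epoch = 2, 1
board, train_batch_idx = BoardSketch(), 0

for epoch in range(2):
    for _ in range(num_train_batches):
        x = train_batch_idx / num_train_batches
        board.draw(x, 1.0, 'train_loss',  # 1.0 is a dummy loss value
                   every_n=int(num_train_batches / plot_train_per_epoch))
        train_batch_idx += 1  # assumed: not reset between epochs
    for _ in range(num_val_batches):
        board.draw(epoch + 1, 1.0, 'val_loss',
                   every_n=int(num_val_batches / plot_valid_per_epoch))

print([round(x, 2) for x, _ in board.drawn['train_loss']])  # [0.2, 0.7, 1.2, 1.7]
print([round(x, 2) for x, _ in board.drawn['val_loss']])    # [1.0, 2.0]
```

Under these assumptions each epoch contributes two training points (each the average of five batches) and one validation point placed at `x = epoch + 1`.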