# Understanding Classes, References, and Parameter Updates

This notebook is designed to strengthen your understanding of **Python classes, objects, and references**, using a minimal optimizer example inspired by neural networks.

The core question we answer is:

**How can an optimizer update parameters of many Layer objects when it only sees a flat list of parameters?**


## Key Idea

In Python:

- Variables store **references** to objects, not copies
- Attributes like `layer.W` are names too: each one holds a reference to an object
- Lists can store references to those same objects

If two names reference the same mutable object, modifying it through one name affects the other.
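
A minimal sketch of this rule using plain lists; the names `a`, `b`, and `c` are illustrative and not part of the notebook's classes. It contrasts a second name for the same list with a shallow copy made by `list()`.

```python
# Two names bound to the same list object (aliasing) versus a copy.
a = [10, 20]
b = a            # no copy: b refers to the same list as a
c = list(a)      # shallow copy: a new, independent list

b[0] = 99        # mutate the shared list through b

print(a)         # [99, 20]  (the change is visible through a)
print(c)         # [10, 20]  (the copy is unaffected)
print(a is b)    # True
print(a is c)    # False
```

The `is` operator compares object identity, which makes it a convenient way to check whether two names really share one object.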


## Mathematical Context (Minimal)

In gradient descent, each parameter $p$ is updated as:

$p \leftarrow p - \eta g$

where $\eta$ is the learning rate and $g = \frac{\partial L}{\partial p}$.

The important part here is **how this update reaches the correct object in memory**, not the calculus.
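
To make that point concrete, here is a small sketch that applies the same update rule in two ways. The function names `update_in_place` and `update_by_rebinding` are hypothetical helpers introduced only for this illustration. Both compute $p - \eta g$, but only the in-place version changes the list the caller holds.

```python
def update_in_place(p, g, lr):
    # Mutates the list object the caller passed in.
    for i in range(len(p)):
        p[i] -= lr * g[i]


def update_by_rebinding(p, g, lr):
    # Builds a new list and rebinds the local name p.
    # The caller's list is left untouched.
    p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p


w = [10.0, 20.0]
g = [1.0, 1.0]

update_by_rebinding(w, g, lr=0.5)
print(w)   # [10.0, 20.0]  (unchanged: only the local name was rebound)

update_in_place(w, g, lr=0.5)
print(w)   # [9.5, 19.5]   (changed: the shared object was mutated)
```

An optimizer that only receives references can therefore reach the original parameters only through the in-place style.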


## Part A — A Minimal Layer Class

We start with a very small class that mimics a neural network layer.
It has a parameter `W`, a gradient `dW`, and exposes them via `params()` and `grads()`.


In [1]:
class ToyLayer:
    def __init__(self, W, dW):
        self.W = W      # parameter (mutable object)
        self.dW = dW    # gradient

    def params(self):
        # Returns a new list whose element is the same object as self.W (W is NOT copied)
        return [self.W]

    def grads(self):
        return [self.dW]


### Experiment A1 — Reference behavior

We will modify the parameter through the list returned by `params()` and observe the effect.


In [2]:
layer = ToyLayer(W=[10, 20], dW=[1, 1])
params = layer.params()

print('Before modification:')
print('layer.W =', layer.W)
print('params[0] =', params[0])

params[0][0] -= 1

print('\nAfter modification through params:')
print('layer.W =', layer.W)
print('params[0] =', params[0])


Before modification:
layer.W = [10, 20]
params[0] = [10, 20]

After modification through params:
layer.W = [9, 20]
params[0] = [9, 20]


**Explanation:** `params[0]` and `layer.W` reference the same list object. Mutating that object through either name is visible through the other.
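
For contrast, the sketch below shows what would happen if `params()` handed out copies instead. `CopyingLayer` is a hypothetical variant written for this comparison, not a class used elsewhere in the notebook.

```python
class CopyingLayer:
    """Hypothetical variant of ToyLayer whose params() returns copies."""

    def __init__(self, W):
        self.W = W

    def params(self):
        # list(self.W) creates a new list, so callers never see self.W itself
        return [list(self.W)]


layer = CopyingLayer(W=[10, 20])
params = layer.params()
params[0][0] -= 1

print(layer.W)                # [10, 20]  (the layer is unaffected)
print(params[0])              # [9, 20]   (only the copy changed)
print(params[0] is layer.W)   # False
```

With this variant, an optimizer working on the returned list would update copies and the layer would never learn.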


## Part B — A Minimal Optimizer

The optimizer does not know what a Layer is. It only assumes the object passed to it has `params()` and `grads()` methods.


In [3]:
class ToySGD:
    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, model):
        for p, g in zip(model.params(), model.grads()):
            # Toy simplification: only the first element of each parameter is updated
            p[0] = p[0] - self.lr * g[0]


### Experiment B1 — Updating a single layer via the optimizer


In [4]:
layer = ToyLayer(W=[10, 20], dW=[1, 1])
opt = ToySGD(lr=0.1)

print('Before step:', layer.W)
opt.step(layer)
print('After step :', layer.W)


Before step: [10, 20]
After step : [9.9, 20]


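**Explanation:** The optimizer mutates the same list object that `layer.W` references, so the change is visible through the layer. The second element stays at `20` because `step` only touches `p[0]`.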
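
Since `ToySGD.step` only updates the first element, a natural variation is to loop over every element. The sketch below is one way this might look; `ToySGDAll` is a hypothetical name, and `ToyLayer` is repeated so the cell runs on its own. The gradient values and learning rate are chosen so the arithmetic is exact in floating point.

```python
class ToyLayer:
    def __init__(self, W, dW):
        self.W = W
        self.dW = dW

    def params(self):
        return [self.W]

    def grads(self):
        return [self.dW]


class ToySGDAll:
    """Hypothetical variant of ToySGD that updates every element in place."""

    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, model):
        for p, g in zip(model.params(), model.grads()):
            for i in range(len(p)):
                p[i] -= self.lr * g[i]   # item assignment mutates the shared list


layer = ToyLayer(W=[10.0, 20.0], dW=[2.0, 4.0])
ToySGDAll(lr=0.5).step(layer)
print(layer.W)   # [9.0, 18.0]
```

The reference mechanics are identical to `ToySGD`; only the number of elements touched differs.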


## Part C — Multiple Layers with a Sequential Container

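We now collect the parameters of several layers into one flat list. The container exposes the same `params()` and `grads()` interface as a single layer, so `ToySGD` works on it unchanged.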


In [5]:
class ToySequential:
    def __init__(self, *layers):
        self.layers = list(layers)

    def params(self):
        ps = []
        for layer in self.layers:
            ps.extend(layer.params())
        return ps

    def grads(self):
        gs = []
        for layer in self.layers:
            gs.extend(layer.grads())
        return gs


### Experiment C1 — Updating multiple layers at once


In [6]:
layer1 = ToyLayer(W=[10, 20], dW=[1, 1])
layer2 = ToyLayer(W=[100, 200], dW=[10, 10])

model = ToySequential(layer1, layer2)
opt = ToySGD(lr=0.1)

print('Before step:')
print('layer1.W =', layer1.W)
print('layer2.W =', layer2.W)

opt.step(model)

print('\nAfter step:')
print('layer1.W =', layer1.W)
print('layer2.W =', layer2.W)


Before step:
layer1.W = [10, 20]
layer2.W = [100, 200]

After step:
layer1.W = [9.9, 20]
layer2.W = [99.0, 200]
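
The result above works because the flat list holds the very same objects as the layers. A short check with `is` can confirm this; the classes are repeated in compact form (without `grads()` on the container) so the cell is self-contained, and the name `flat` is illustrative.

```python
class ToyLayer:
    def __init__(self, W, dW):
        self.W, self.dW = W, dW

    def params(self):
        return [self.W]


class ToySequential:
    def __init__(self, *layers):
        self.layers = list(layers)

    def params(self):
        ps = []
        for layer in self.layers:
            ps.extend(layer.params())
        return ps


layer1 = ToyLayer(W=[10, 20], dW=[1, 1])
layer2 = ToyLayer(W=[100, 200], dW=[10, 10])
model = ToySequential(layer1, layer2)

flat = model.params()
print(len(flat))                # 2
print(flat[0] is layer1.W)      # True
print(flat[1] is layer2.W)      # True
print(flat is model.params())   # False: each call builds a new outer list
```

The outer list is rebuilt on every call, but its elements are always the layers' own parameter objects.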


## Final Mental Model

1. Parameters live inside layer objects
2. `params()` returns references to those parameters
3. Lists store references, not copies
4. Optimizers modify parameters in place

**Optimizers do not update layers — they update the objects layers reference.**
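
One consequence of this model is worth seeing once: if a layer's attribute is rebound to a new object, references collected earlier still point at the old one. The sketch below illustrates that pitfall under the same toy setup; `ToyLayer` is repeated so the cell runs on its own.

```python
class ToyLayer:
    def __init__(self, W, dW):
        self.W = W
        self.dW = dW

    def params(self):
        return [self.W]


layer = ToyLayer(W=[10, 20], dW=[1, 1])
params = layer.params()        # holds a reference to the original list

layer.W = [0, 0]               # rebinding: layer.W now names a different list

params[0][0] -= 1              # mutates the OLD list, not the new one

print(params[0])               # [9, 20]
print(layer.W)                 # [0, 0]
print(params[0] is layer.W)    # False
```

This is why the toy optimizer calls `params()` inside `step` and mutates in place rather than assigning new objects to attributes.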
