(*This is part of a series of blog posts relating to and responding to the live
FastAI course (part 2) being taught in late 2022. To read others, see the ones
listed for [the ‘parttwo’ tag](https://mlops.systems/#category=parttwo).*)

This week, in the second half of the class, we covered matrix multiplication and
broadcasting. (I'll write a separate blog post about the parts of lessons 9-11 where
we discuss specific papers.) These two concepts are important because machine
learning (and deep learning especially) is all about operations on matrices and
vectors, so we had better not only understand the concepts but also be able to
make those calculations quickly and efficiently.

At the bottom of this post, I'll also include my collation of the "Things Jeremy
says to do" from this lesson. The relevant topics and libraries covered in this
lecture are linked below.

| MetaTask          | Subtask                               | Concept / Skill       | Docs / Link                                                                                                                                                                    |
| ----------------- | ------------------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Multiply matrices | Implement basic matrix multiplication | Matrix multiplication | [link](https://youtu.be/kT4Mp9EdVqs)                                                                                                                                           |
|                   | Pre-compile using numba               | Numba conversion      | [link](https://numba.pydata.org)                                                                                                                                               |
|                   | Use elementwise ops                   | APL                   | [link](https://tryapl.org)                                                                                                                                                     |
|                   |                                       | Frobenius norm        |                                                                                                                                                                                |
|                   | Broadcast to match up matrices        | Broadcasting          | [link](https://numpy.org/doc/stable/user/basics.broadcasting.html) / [link](https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics)                    |
|                   |                                       | `expand_as`           | [link](https://pytorch.org/docs/stable/generated/torch.Tensor.expand_as.html) / [link](https://pytorch.org/docs/stable/generated/torch.Tensor.expand.html#torch.Tensor.expand) |
|                   |                                       | `storage`             | [link](https://pytorch.org/docs/stable/generated/torch.Tensor.storage.html)                                                                                                    |
|                   |                                       | stride                | [link](https://pytorch.org/docs/stable/generated/torch.Tensor.stride.html)                                                                                                     |
|                   | Adding dimensions                     | `unsqueeze`           | [link](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html)                                                                                                         |

Before we get into the things we're doing with matrix multiplication, I'm
including the code from [the previous post](https://mlops.systems/computervision/fastai/parttwo/2022/10/24/foundations-mnist-basics.html) so that the variables are available to
us in this notebook, but hiding the cell so that it doesn't distract from the
main content.

In [1]:
#collapse-hide
import matplotlib as mpl, matplotlib.pyplot as plt
import torch
from torch import tensor
import gzip
from urllib.request import urlretrieve
from pathlib import Path
import os
import functools
import operator
import struct
import array
import tempfile

torch.set_printoptions(precision=2, linewidth=140, sci_mode=False)

BASE_URL = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/"

TRAINING_IMAGES = f"{BASE_URL}train-images-idx3-ubyte.gz"
TRAINING_IMAGES_LABELS = f"{BASE_URL}train-labels-idx1-ubyte.gz"
TEST_IMAGES = f"{BASE_URL}t10k-images-idx3-ubyte.gz"
TEST_IMAGES_LABELS = f"{BASE_URL}t10k-labels-idx1-ubyte.gz"

# create the local data directory if it doesn't already exist
LOCAL_PATH = Path("data")
LOCAL_PATH.mkdir(exist_ok=True)

training_images_path = LOCAL_PATH / "train-images-idx3-ubyte.gz"
training_images_labels_path = LOCAL_PATH / "train-labels-idx1-ubyte.gz"
test_images_path = LOCAL_PATH / "t10k-images-idx3-ubyte.gz"
test_images_labels_path = LOCAL_PATH / "t10k-labels-idx1-ubyte.gz"

# download the raw data if it doesn't exist
if not training_images_path.exists():
    urlretrieve(TRAINING_IMAGES, training_images_path)
if not training_images_labels_path.exists():
    urlretrieve(TRAINING_IMAGES_LABELS, training_images_labels_path)
if not test_images_path.exists():
    urlretrieve(TEST_IMAGES, test_images_path)
if not test_images_labels_path.exists():
    urlretrieve(TEST_IMAGES_LABELS, test_images_labels_path)

# taken from https://github.com/datapythonista/mnist/blob/master/mnist/__init__.py

class IdxDecodeError(ValueError):
    """Raised when an IDX file can't be parsed."""
    pass

def parse_idx(fd):
    """Parse an IDX file and return its contents as a flat array.

    Parameters
    ----------
    fd : file
        File object of the IDX file to parse (opened in binary mode)

    Returns
    -------
    data : array.array
        Flat array of all the values in the IDX file; the dimension sizes
        from the header are only used to check the number of items
    """
    DATA_TYPES = {
        0x08: "B",  # unsigned byte
        0x09: "b",  # signed byte
        0x0B: "h",  # short (2 bytes)
        0x0C: "i",  # int (4 bytes)
        0x0D: "f",  # float (4 bytes)
        0x0E: "d",  # double (8 bytes)
    }

    header = fd.read(4)
    if len(header) != 4:
        raise IdxDecodeError(
            "Invalid IDX file, "
            "file empty or does not contain a full header."
        )

    zeros, data_type, num_dimensions = struct.unpack(">HBB", header)

    if zeros != 0:
        raise IdxDecodeError(
            "Invalid IDX file, "
            "file must start with two zero bytes. "
            "Found 0x%02x" % zeros
        )

    try:
        data_type = DATA_TYPES[data_type]
    except KeyError:
        raise IdxDecodeError(
            "Unknown data type " "0x%02x in IDX file" % data_type
        )

    dimension_sizes = struct.unpack(
        ">" + "I" * num_dimensions, fd.read(4 * num_dimensions)
    )

    data = array.array(data_type, fd.read())
    data.byteswap()  # IDX values are big-endian; array.array uses native (usually little-endian) order

    expected_items = functools.reduce(operator.mul, dimension_sizes)
    if len(data) != expected_items:
        raise IdxDecodeError(
            "IDX file has wrong number of items. "
            "Expected: %d. Found: %d" % (expected_items, len(data))
        )
    return data

# split a flat sequence into consecutive chunks of `size` items (784 = one 28x28 image)
def chunks(x, size):
    for i in range(0, len(x), size):
        yield x[i : i + size]

# unzip the files and extract the images
with gzip.open(training_images_path, "rb") as f:
    pixels = list(parse_idx(f))
    x_train = list(chunks(pixels, 784))

with gzip.open(training_images_labels_path, "rb") as f:
    y_train = list(parse_idx(f))

with gzip.open(test_images_path, "rb") as f:
    pixels = list(parse_idx(f))
    x_valid = list(chunks(pixels, 784))

with gzip.open(test_images_labels_path, "rb") as f:
    y_valid = list(parse_idx(f))

# this list is taken from the README of the Fashion-MNIST repository
# https://github.com/zalandoresearch/fashion-mnist
index_to_label = {
    0: "T-shirt/top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot",
}

x_train, y_train, x_valid, y_valid = map(
    tensor, (x_train, y_train, x_valid, y_valid)
)
# train_imgs = x_train.reshape((60000,28,28))
train_imgs = x_train.reshape((-1, 28, 28))

# What is matrix multiplication?

Before we get too far down the road with implementing this, it's worth revising
what's happening with matrix multiplication. I rewatched [part of the
Khan Academy introduction to this topic](https://youtu.be/kT4Mp9EdVqs) and I was
really glad that I did, because it points out that the way we do matrix
multiplication is a convention agreed upon by mathematicians and others
who've used it over time. Until then I never really understood why we do it
this way; I'm sure there's a good reason, but without the context that it's
simply an agreed-upon convention, I was always a bit confused.

A very simple matrix multiplication can be represented as follows; I hope each
stage makes clear what's going on. Note that in Python this would be
represented as nested lists (a list of rows), but I'm using mathematical notation
here, following how it's demonstrated in the video.

![Basic matrix multiplication](./images/part2-lesson11/basic-matmul-hand-drawn.png)

- (1) shows the two matrices that we're going to multiply together. We're starting
  with some really simple 2x2 matrices to make it clear what's happening.
- (2) shows the calculations that go on underneath when we 'do matrix
  multiplication'. Again, this is just a convention agreed upon by the people
  who do matrix multiplication, and it doesn't necessarily need to make sense at
  the moment in terms of why they chose to do it this way.
- (3) shows the final result of the calculation.
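Written out with the `M` and `N` matrices used in the APL and Python examples in this post, each entry of the result pairs a row of the first matrix with a column of the second, multiplies them element by element and sums the products:

$$
\begin{pmatrix} 2 & 3 \\ 6 & 9 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 3 & 6 \end{pmatrix}
=
\begin{pmatrix} 2 \cdot 2 + 3 \cdot 3 & 2 \cdot 1 + 3 \cdot 6 \\ 6 \cdot 2 + 9 \cdot 3 & 6 \cdot 1 + 9 \cdot 6 \end{pmatrix}
=
\begin{pmatrix} 13 & 20 \\ 39 & 60 \end{pmatrix}
$$

In general, $(MN)_{ij} = \sum_k M_{ik} N_{kj}$, which is exactly what the innermost `k` loop in the Python versions computes.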

Note that there is [this website](http://matrixmultiplication.xyz) that also
attempts to show what's happening, but I never found it particularly helpful,
except for the part where it visualises the second matrix being rotated on its
side. YMMV.

Also, if you want to do this using APL, it's pretty easy, and the notation is
very compact (`+.×` is APL's inner product: multiply paired elements with `×`,
then sum the results with `+`):

```
    M ← 2 2 ⍴ 2 3 6 9
    N ← 2 2 ⍴ 2 1 3 6

    M +.× N
13 20
39 60
```

We first assign the matrices to variables `M` and `N`, and then we do the
multiplication which gives us the same result as when we did it by hand.

To do the same in Python takes a bit more effort. I'll show it with plain Python lists first.

In [2]:
M = [[2, 3], [6, 9]]
N = [[2, 1], [3, 6]]

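def matmul_array(a1, a2):
    # one row per row of a1, one column per column of a2 (not hard-coded to 2x2)
    result = [[0] * len(a2[0]) for _ in range(len(a1))]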
    for i in range(len(a1)):
        for j in range(len(a2[0])):
            for k in range(len(a2)):
                result[i][j] += a1[i][k] * a2[k][j]
    return result

matmul_array(M, N)

[[13, 20], [39, 60]]

In [3]:
# to benchmark this against other implementations
%time _=matmul_array(M, N)

CPU times: user 9 µs, sys: 1e+03 ns, total: 10 µs
Wall time: 11 µs


We can do the same thing using PyTorch, since we're allowed to use it as per the
rules of the game here:

In [4]:
M = tensor([[2, 3], [6, 9]])
N = tensor([[2, 1], [3, 6]])

def matmul_tensor(a1, a2):
    result = torch.zeros(len(a1), len(a2[0]))
    for i in range(len(a1)):
        for j in range(len(a2[0])):
            for k in range(len(a2)):
                result[i][j] += a1[i][k] * a2[k][j]
    return result

In [5]:
matmul_tensor(M, N)

tensor([[13., 20.],
        [39., 60.]])

In [6]:
# to benchmark this against other implementations
%time _=matmul_tensor(M, N)

CPU times: user 1.13 ms, sys: 930 µs, total: 2.06 ms
Wall time: 1.31 ms


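It is considerably slower using PyTorch. I suspect a large part of the penalty is
that every scalar index such as `a1[i][k]` creates a new tensor object and goes
through PyTorch's dispatch machinery, which is far more work than indexing into a
built-in Python list. Tensors mostly pay off when PyTorch is handed large chunks
of work at once rather than one number at a time.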
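One way to see where that overhead might come from (a sketch of the idea, not a profile): indexing a tensor at a single position returns a new 0-dimensional tensor rather than a plain Python number, so the triple loop creates and discards several tensor objects for every multiply-add. Indexing a list just hands back the stored `int`:

```python
import torch

t = torch.tensor([[2, 3], [6, 9]])
row = t[0]     # indexing builds a new 1-d tensor (a view onto t)
x = row[1]     # ...and this builds a 0-d tensor, not a Python int
print(type(x).__name__, x.dim())   # Tensor 0
print(x.item())                    # 3: .item() is what extracts the plain number

lst = [[2, 3], [6, 9]]
y = lst[0][1]                      # list indexing returns the stored int itself
print(type(y).__name__)            # int
```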

To demonstrate this one final time, we can do some multiplication using the
Fashion-MNIST data we loaded in the previous post and class. To simulate
what happens early on when fitting a model to our data, we'll multiply
a subset of our data by a matrix of random numbers (which is what the weights
start out as).

In [7]:
# 784 is our 28x28 matrix flattened out, and 10 is the number of classes
weights = torch.randn(784, 10)

# we'll take 5 images
imgs_subset = x_train[:5]

# just to confirm the shapes of these tensors
imgs_subset.shape, weights.shape

(torch.Size([5, 784]), torch.Size([784, 10]))

In [8]:
%time _=matmul_tensor(weights, imgs_subset)

CPU times: user 42.5 s, sys: 87.2 ms, total: 42.5 s
Wall time: 44.6 s


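I'm just running this on my CPU, so it takes a while, but it's still a good
demonstration of just how slow this way of doing things is: over 40 seconds for
this small subset of 5 images. Part of that, though, is because the arguments are
passed in the reverse of the order their shapes require.
`matmul_tensor(weights, imgs_subset)` sizes its output from `len(weights)` and
`len(imgs_subset[0])`, i.e. 784x784, and its `k` loop only runs over
`len(imgs_subset)`, i.e. 5, so instead of failing it silently multiplies the
first 5 columns of `weights` by the images: 784 × 784 × 5 ≈ 3.1 million scalar
multiplications. The intended call, `matmul_tensor(imgs_subset, weights)`, gives a
5x10 result from 5 × 10 × 784 = 39,200 multiplications, roughly 80 times fewer,
so I'd expect it to run correspondingly faster.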
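A sketch of a triple-loop version that checks shapes up front, so that calling it with the arguments reversed fails loudly instead of silently computing something else. `matmul_checked` is a hypothetical name, and the small random tensors are stand-ins laid out like `imgs_subset` and `weights`:

```python
import torch


def matmul_checked(a, b):
    """Triple-loop matmul in the style of `matmul_tensor`, with an explicit shape check."""
    a_rows, a_cols = a.shape
    b_rows, b_cols = b.shape
    if a_cols != b_rows:
        raise ValueError(f"inner dimensions differ: {tuple(a.shape)} @ {tuple(b.shape)}")
    result = torch.zeros(a_rows, b_cols)
    for i in range(a_rows):
        for j in range(b_cols):
            for k in range(a_cols):
                result[i, j] += a[i, k] * b[k, j]
    return result


# Small stand-ins laid out like imgs_subset (n, 784) and weights (784, 10).
# Integer-valued floats keep the arithmetic exact, so the comparison is clean.
torch.manual_seed(0)
imgs_small = torch.randint(0, 256, (3, 8)).float()
weights_small = torch.randint(-3, 4, (8, 4)).float()

out = matmul_checked(imgs_small, weights_small)   # (3, 8) @ (8, 4) -> (3, 4)
print(out.shape, torch.allclose(out, imgs_small @ weights_small))

err_msg = None
try:
    matmul_checked(weights_small, imgs_small)     # reversed: (8, 4) @ (3, 8)
except ValueError as err:
    err_msg = str(err)
print(err_msg)   # inner dimensions differ: (8, 4) @ (3, 8)
```

With the post's data, the call would be `matmul_checked(imgs_subset, weights)`.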

# Speed things up with Numba

Numba speeds up Python code by compiling it to machine code. It compiles just
in time: the first call to a numba-decorated function triggers the compilation
(so that call is slow), and subsequent calls run the compiled version, which is
where you see the speed improvement.

Numba works with NumPy arrays rather than PyTorch tensors, so we can use much the
same code as before, passing NumPy arrays instead of tensors.

In [9]:
from numba import njit

@njit
def dot_product(a, b):
    result = 0.
    for i in range(len(a)):
        result += a[i] * b[i]
    return result

def matmul_numba(a1, a2):
    result = torch.zeros(len(a1), len(a2[0]))
    for i in range(len(a1)):
        for j in range(len(a2[0])):
            result[i][j] = dot_product(a1[i,:], a2[:, j])
    return result

In [10]:
# 784 is our 28x28 matrix flattened out, and 10 is the number of classes
weights_np = torch.randn(784, 10).numpy()

# we'll take 5 images
imgs_subset_np = x_train[:5].numpy()

In [11]:
# first time
%time _=matmul_numba(weights_np, imgs_subset_np)

CPU times: user 3.91 s, sys: 39.5 ms, total: 3.95 s
Wall time: 4.02 s


In [12]:
# second time
%time _=matmul_numba(weights_np, imgs_subset_np)

CPU times: user 3.57 s, sys: 7.74 ms, total: 3.58 s
Wall time: 3.58 s


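Interestingly, this doesn't actually bring much of a speed increase on my
machine, but I do see one when computing the dot product of two trivial
3-element vectors:

```python
import numpy as np
%time dot_product(np.array([1., 2, 3]), np.array([2., 3, 4]))
```

This goes from a few hundred milliseconds on the first call (which includes
numba's compilation) to a few tens of microseconds on later calls. My first
hypothesis was that the memory constraints of the bigger calculation become the
bottleneck, but a more likely explanation is that only `dot_product` is compiled:
the two outer loops in `matmul_numba` still run in plain Python, and every
iteration slices two NumPy arrays, calls into the compiled function and writes a
single value into a PyTorch tensor. The arguments are also in the reversed order
here, so those loops run 784 × 784 times, and each call pairs a 10-element row of
`weights_np` with a 5-element column of `imgs_subset_np`. Numba doesn't
bounds-check array indexing by default, so `dot_product` reads past the end of
the shorter array without complaint, and the resulting values can't be trusted.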
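A sketch of the alternative, assuming the bottleneck is the Python-level looping: put all three loops inside a single `@njit` function so that none of them run in the interpreter. `matmul_numba_full` is a hypothetical name, and the fallback decorator only exists so the snippet still runs (slowly) when numba isn't installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:              # fallback: run as plain (slow) Python
    def njit(func):
        return func


@njit
def matmul_numba_full(a, b):
    """All three loops compiled together, so no Python-level iteration remains."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions of a and b must match")
    result = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]
            result[i, j] = acc
    return result


a = np.array([[1., 2, 3], [4, 5, 6]])
b = np.array([[7., 8], [9, 10], [11, 12]])
print(matmul_numba_full(a, b))   # the first call also triggers compilation
```

With the post's data, the call would be `matmul_numba_full(imgs_subset_np, weights_np)` (images first). I haven't timed it here, so I can't say how big the speed-up is on this machine.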

# Elementwise operations

We can also do elementwise operations on tensors. In the lecture we see this
easily demonstrated in APL, but we can do the same thing in Python. The
advantage of doing it in a language like APL is that it's very compact and you
can spend more time thinking about what you're actually trying to do instead of
writing boilerplate code.
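The Frobenius norm from the table at the top of this post (defined in the Definitions section at the end) is a neat small example of this kind of elementwise thinking: square every element, add them all up, take the square root. A sketch in PyTorch, comparing the elementwise version with `torch.linalg.matrix_norm`, whose default `ord` is the Frobenius norm:

```python
import torch

m = torch.tensor([[1., 2, 3], [4, 5, 6]])

# Elementwise: square every entry, add them all up, take the square root
frob_elementwise = (m * m).sum().sqrt()

# PyTorch's built-in matrix norm defaults to the Frobenius norm
frob_builtin = torch.linalg.matrix_norm(m)

print(frob_elementwise, frob_builtin)   # both sqrt(91), about 9.54
```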

For our matrix multiplication, we can replace our innermost loop (in which we
multiply pairs of values and add them up one at a time) with an elementwise
multiplication followed by a sum:

In [13]:
def matmul_elementwise(a, b):
    result = torch.zeros(len(a), len(b[0]))
    for i in range(len(a)):
        for j in range(len(b[0])):
            result[i, j] = (a[i, :] * b[:, j]).sum()  # elementwise
    return result

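When I tried to run `matmul_elementwise(weights, imgs_subset)` on the
Fashion-MNIST data, I got an error, because with the arguments in that order the
two tensors don't have compatible dimensions: each row of `weights` has 10 values,
but each column of `imgs_subset` only has 5. (Passing them as
`matmul_elementwise(imgs_subset, weights)`, i.e. `(5, 784)` by `(784, 10)`, lines
the 784s up.)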

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [21], line 1
----> 1 matmul_elementwise(weights, imgs_subset)

Cell In [20], line 5, in matmul_elementwise(a, b)
      3 for i in range(len(a)):
      4     for j in range(len(b[0])):
----> 5         result[i, j] = (a[i, :] * b[:, j]).sum(dim=0)  # elementwise
      6 return result

RuntimeError: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0
```

You can run it on matrices with shapes that work together for this operation. For example:

In [14]:
m1 = tensor([[1., 2, 3], [4, 5, 6]])
m2 = tensor([[7., 8], [9, 10], [11, 12]])

matmul_elementwise(m1, m2)

tensor([[ 58.,  64.],
        [139., 154.]])

But what if they don't match? What if you want to multiply a vector by a matrix,
repeating the vector's values over and over for each row of the matrix? For that
we'll need to learn about broadcasting.

# Broadcasting to fill in the blanks

Broadcasting is what you use when you have two tensors that don't have the same
shape. Not only do you want this mismatched pair to work together (i.e. be
multiplied or whatever you are trying to achieve), but you want to do it in a
way that is as efficient as possible. Broadcasting is how we do this, where a
sort of ghost copy of whatever we specify is 'broadcast' over the other tensor
to make it work.

The easiest way to see this in action (as Jeremy demonstrates in the lecture) is
to specify something once that you want to be applied across the whole vector.
Here, we want to get a tensor of the same shape as `e`, but that contains
boolean values depending on whether the individual values are less than 2 or not.

In [15]:
e = tensor([1., 2, 3])
e < 2

tensor([ True, False, False])

We didn't need to specify that the `e < 2` calculation needed to be applied to
every element, but rather it was broadcast across the values in the vector, as
if what we were passing in for the comparison was `tensor([2., 2., 2.,])` so
that it could make the comparison.
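To check that claim directly, the sketch below compares the scalar version with the explicitly spelled-out tensor, and with `torch.broadcast_to`, which produces the broadcast shape as a view:

```python
import torch

e = torch.tensor([1., 2, 3])

implicit = e < 2                                    # scalar gets broadcast
explicit = e < torch.tensor([2., 2., 2.])           # spelled out by hand
via_broadcast_to = e < torch.broadcast_to(torch.tensor(2.), e.shape)

print(implicit)   # tensor([ True, False, False])
```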

When broadcasting happens, the tensor being broadcast is not literally copied
multiple times in memory; the same values are simply read again for each
position where they're needed. This is what makes broadcasting so efficient.

We don't need to confine ourselves to simple examples like this, however.
Broadcasting works across multiple dimensions, such that we could, for example,
multiply a matrix by a vector, where the vector is broadcast across the rows of
the matrix.
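A sketch of both directions with a small `(2, 3)` matrix: a shape-`(3,)` vector lines up with the last dimension, so it's applied to every row; to apply one value per row instead, the vector needs shape `(2, 1)` (the `[:, None]` indexing used for that is covered in the unsqueeze section of this post):

```python
import torch

m = torch.tensor([[1., 2, 3], [4, 5, 6]])  # shape (2, 3)
row = torch.tensor([10., 20, 30])          # shape (3,): lines up with the columns
col = torch.tensor([100., 200])[:, None]   # shape (2, 1): one value per row
# (a plain shape-(2,) vector would not broadcast against (2, 3),
#  because the trailing sizes 2 and 3 differ)

print(m * row)   # every row of m is multiplied by [10, 20, 30]
print(m * col)   # row 0 is multiplied by 100, row 1 by 200
```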

If you want to see what's going on under the hood, you can use the helpful `expand_as` method to see what the tensor looks like after it's been broadcast.

In [16]:
v = tensor([10., 20, 30])
m = tensor([[1., 2, 3], [4, 5, 6]])

v.expand_as(m)

tensor([[10., 20., 30.],
        [10., 20., 30.]])

Here you can see that the vector has been broadcast across the rows of the
matrix `m`. This is what allows us to do elementwise operations on the two
tensors, such as the addition in the following example:

In [17]:
v + m

tensor([[11., 22., 33.],
        [14., 25., 36.]])

To learn more about `expand_as` we can take a look at the [PyTorch
documentation](https://pytorch.org/docs/stable/generated/torch.Tensor.expand.html#torch.Tensor.expand).
It includes a useful warning: more than one element of an expanded tensor may
refer to the same memory location, so in-place operations on these
ghost/broadcast 'copies' can lead to incorrect results. If you need to write to
an expanded tensor, clone it first.
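A small sketch of what that warning means in practice, with `v` and `m` redefined so the snippet stands alone. Cloning gives a real copy that's safe to modify. Writing in place through the broadcast view is rejected with a `RuntimeError` by the PyTorch versions I've used, but treat that as an assumption to check on your own version:

```python
import torch

v = torch.tensor([10., 20, 30])
m = torch.tensor([[1., 2, 3], [4, 5, 6]])

# Safe: clone() turns the broadcast view into a real (2, 3) tensor with its own memory
safe = v.expand_as(m).clone()
safe[0, 0] = 99.
print(safe)   # only the first row changes
print(v)      # the original vector is untouched

# Risky: writing through the view itself. Both "rows" share the same three
# values in memory, so PyTorch refuses the in-place update.
err_msg = None
try:
    v.expand_as(m).add_(1)
except RuntimeError as err:
    err_msg = str(err)
print(err_msg)
```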

To see what is being stored in memory when we broadcast, we can use the
`storage` method. Here you see that the vector is not actually being copied but
rather we're just getting a reference to the same memory location for those
three values.

In [18]:
v.expand_as(m).storage()

 10.0
 20.0
 30.0
[torch.storage._TypedStorage(dtype=torch.float32, device=cpu) of size 3]
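Another way to confirm that nothing is copied is to compare memory addresses with `data_ptr()`, which returns the address of a tensor's first element. `contiguous()` is the opposite case: it materialises a real `(2, 3)` copy with its own memory:

```python
import torch

v = torch.tensor([10., 20, 30])
m = torch.tensor([[1., 2, 3], [4, 5, 6]])
expanded = v.expand_as(m)

print(expanded.shape)                        # torch.Size([2, 3])
print(expanded.data_ptr() == v.data_ptr())   # True: same starting address
print(expanded.is_contiguous())              # False: both rows point at the same values
copied = expanded.contiguous()               # this one really allocates 6 values
print(copied.data_ptr() == v.data_ptr())     # False
```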

All of this is enabled by something called a 'stride'. A stride is the number
of elements you need to skip in memory to get to the next element along a
particular dimension. For example, a stride of 1 means that consecutive elements
along that dimension sit right next to each other in memory.

For a matrix, as in the example given by the PyTorch documentation, the stride
tuple tells us how far to jump to get to the next row (here 5 elements, because
each row has 5 values) and how far to jump to get to the next element within a
row (here 1, i.e. one by one):

In [19]:
x = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
x.stride()

(5, 1)

It seems that what is happening is that the underlying data structure (i.e. the
thing you see when you call `storage`) is a flat array (or an array-like object),
and that `stride` determines how that array is interpreted as a tensor. The docs
explain that `stride` is used in indexing operations: when you index into your
tensor, the values in `stride` tell PyTorch where in that flat array to find
what you asked for. (If you want to learn more about `stride` and
`storage`, check out [this useful blogpost](https://zhang-yang.medium.com/explain-pytorch-tensor-stride-and-tensor-storage-with-code-examples-50e637f1076d) with examples.)

Bringing it back to broadcasting, `stride` is the trick used to make it so that
it seems like the vector (or whatever) is being copied across the rows of the
matrix, but in reality it's just a reference to the same memory location. So
what happens is that the `stride` is set to 0 for the dimension that is being broadcast.

In [20]:
# a stride of zero meaning that it'll be broadcast
v.expand_as(m).stride()

(0, 1)
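The stride arithmetic can be spelled out by hand: element `[i, j]` lives at position `i * stride[0] + j * stride[1]` in the flat memory (for a tensor with no storage offset). A sketch using the same `x` as above, plus a transpose and a broadcast view:

```python
import torch

x = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
s0, s1 = x.stride()                  # (5, 1)
flat = x.flatten()                   # the same 10 values in memory order

# offset of element [i, j] in the flat memory = i * stride[0] + j * stride[1]
i, j = 1, 3
print(x[i, j], flat[i * s0 + j * s1])   # both 9

# Transposing doesn't move any data; it just swaps the strides
print(x.t().stride())                   # (1, 5)

# A broadcast view has stride 0 along the broadcast dimension, so every
# "row" starts at the same offset and reads the same three values
v = torch.tensor([10., 20, 30])
e = v.expand(2, 3)
t0, t1 = e.stride()                     # (0, 1)
print([v[r * t0 + 2 * t1].item() for r in range(2)])   # [30.0, 30.0]
```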

# Unsqueeze yourself

We can use `unsqueeze` to add a dimension to a tensor. For example, if we have a
scalar, we can turn it into a vector by adding a dimension to it.

In [21]:
tensor(1).unsqueeze(0), tensor([1, 2, 3]).unsqueeze(0)

(tensor([1]), tensor([[1, 2, 3]]))

There we were adding a dimension to a scalar, and to a vector. Alternatively, we
can use the following notation to achieve the same effect:

In [22]:
tensor([1, 2, 3])[None, :]
tensor([1, 2, 3])[None] # same as above

tensor([[1, 2, 3]])

Here, we index into the tensor with the special value `None`, which adds a new
dimension of size 1 at that position. You can do this more than once to add more
dimensions; indexing with `[None, None, :]`, for example, is equivalent to
`tensor([1, 2, 3]).unsqueeze(0).unsqueeze(0)`.

In [23]:
tensor([1, 2, 3])[None, None, :]

tensor([[[1, 2, 3]]])

If we want a column vector (shape `(3, 1)`) instead of a row vector, we add the
new dimension at position 1 instead, either by passing `1` to `unsqueeze` or by
putting `None` in the second index position.

In [24]:
tensor([1, 2, 3])[:, None]
tensor([1, 2, 3]).unsqueeze(1) # these are the same

tensor([[1],
        [2],
        [3]])

If you have multiple dimensions and you want to use the `None` notation, you can
use `...` to refer to the dimensions that you don't want to specify. For example:

In [25]:
tensor([[[1, 2, 3]]])[..., None]

tensor([[[[1],
          [2],
          [3]]]])

The key thing to remember is that broadcasting is handling a lot of the heavy
lifting here. You don't need to manually 'expand' vectors so that they match the
matrices that you're trying to use them with. This happens automagically and
PyTorch (or numpy) handles all that for you.
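The `None` indexing and broadcasting combine nicely to build an outer product (defined at the end of this post): turning one vector into a column and the other into a row lets broadcasting fill in the whole grid of pairwise products. A sketch, checked against `torch.outer`:

```python
import torch

a = torch.tensor([1., 2, 3])          # shape (3,)
b = torch.tensor([10., 20, 30, 40])   # shape (4,)

# (3, 1) * (1, 4) broadcasts to (3, 4): entry [i, j] is a[i] * b[j]
outer = a[:, None] * b[None, :]
print(outer.shape)   # torch.Size([3, 4])
print(outer)
```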

# Broadcasting for matrix multiplication

We now have all the pieces we need to use broadcasting in our matrix
multiplication function. We just redefine the function to use broadcasting, and
then we can use it on any two matrices that are compatible for multiplication.
(In the lesson, Jeremy goes through the rules for what makes two tensors
compatible. In short, we compare their shapes starting at the trailing (i.e.
last) dimension and working backwards, and each pair of sizes must either be
equal or one of them must be 1; a dimension that one tensor doesn't have at all
counts as 1.)
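The compatibility rule is mechanical enough to write down as code. `broadcast_shape` below is a hypothetical helper, not part of PyTorch, compared against PyTorch's own `torch.broadcast_shapes`. In `matmul_broadcast` below, `a[i, :, None]` has shape `(784, 1)` and the weights have shape `(784, 10)`, which is the first case printed:

```python
import torch
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shapes by hand."""
    result = []
    # walk from the trailing dimension backwards; missing dimensions count as 1
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or da == 1 or db == 1:
            result.append(db if da == 1 else da)
        else:
            raise ValueError(f"incompatible sizes {da} and {db}")
    return tuple(reversed(result))


print(broadcast_shape((784, 1), (784, 10)))          # (784, 10)
print(broadcast_shape((3,), (2, 3)))                 # (2, 3)
print(torch.broadcast_shapes((784, 1), (784, 10)))   # torch.Size([784, 10])

try:
    broadcast_shape((2, 3), (2,))
except ValueError as err:
    print(err)   # incompatible sizes 3 and 2
```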

In [26]:
def matmul_broadcast(a, b):
    result = torch.zeros(len(a), len(b[0]))
    for i in range(len(a)):
        result[i] = (a[i, :, None] * b).sum(dim=0)
    return result

Note that we have managed to reduce the number of loops down to a single one.
Indexing with `a[i, :, None]` turns row `i` of `a` (shape `(k,)`) into a column of
shape `(k, 1)`, which broadcasts against `b` (shape `(k, m)`) so that each column
of `b` is multiplied elementwise by that row. Summing over the first dimension
(`dim=0`) then gives the `m` values of row `i` of the result. This implementation
works for the 2D case we're trying to implement, but I'm not sure if it would work
for higher dimensions. (Question: does matrix multiplication even make sense at
higher dimensions? Are there conventions around that?)
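On the question about higher dimensions: PyTorch's `torch.matmul` treats any extra leading dimensions as a batch of independent matrix products, so `(4, 2, 3) @ (4, 3, 5)` gives `(4, 2, 5)`. The same broadcasting trick can also remove the last loop entirely, as sketched below with a hypothetical `matmul_no_loops`. The catch is memory: the intermediate `(n, k, m)` tensor for all 60,000 images would hold 60000 × 784 × 10 ≈ 470 million values (roughly 1.9 GB in float32), which is one reason to keep a loop over rows or batches:

```python
import torch


def matmul_no_loops(a, b):
    """Hypothetical sketch: matrix multiplication by broadcasting, with no Python loop.

    a: (..., n, k), b: (..., k, m) -> (..., n, m). Any leading dimensions act
    as a batch and must themselves be broadcast-compatible.
    """
    # (..., n, k, 1) * (..., 1, k, m) -> (..., n, k, m), then sum over k
    return (a[..., :, :, None] * b[..., None, :, :]).sum(dim=-2)


m1 = torch.tensor([[1., 2, 3], [4, 5, 6]])
m2 = torch.tensor([[7., 8], [9, 10], [11, 12]])
print(matmul_no_loops(m1, m2))   # [[58, 64], [139, 154]]

# A batch of 4 independent (2, 3) @ (3, 5) products
torch.manual_seed(0)
A = torch.randn(4, 2, 3)
B = torch.randn(4, 3, 5)
batched = matmul_no_loops(A, B)
print(batched.shape)             # torch.Size([4, 2, 5])
print(torch.allclose(batched, torch.matmul(A, B), atol=1e-5))
```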

In [27]:
m1 = tensor([[1, 2, 3], [4, 5, 6]])
m2 = tensor([[7, 8], [9, 10], [11, 12]])
matmul_broadcast(m1, m2)

tensor([[ 58.,  64.],
        [139., 154.]])

In the case of our images, we have a batch of 60,000 images, each of which is
28x28 (flattened down to a vector of 784 individual pixel values). Our weights
matrix is 784x10, because we have 10 classes that we're trying to predict and
for each class we have a set of weights for each of the 784 pixels. (Think of
the 'ideal' image for each class as what those weights would be.)

So in our matrix multiplication over the full set, we loop over each of the
60,000 images. For each one, its 784 pixel values become a `(784, 1)` column that
broadcasts against the `(784, 10)` weights; multiplying and then summing over the
784 pixels, as per the convention for this kind of calculation, gives 10 values,
one per class. In the end, we get a 60000x10 tensor, which is what we were hoping
to achieve.

In [28]:
%time _=matmul_broadcast(x_train, weights)

CPU times: user 1.4 s, sys: 6.45 ms, total: 1.41 s
Wall time: 1.41 s


In [29]:
x_train.shape, weights.shape

(torch.Size([60000, 784]), torch.Size([784, 10]))

In [30]:
matmul_broadcast(x_train, weights).shape

torch.Size([60000, 10])

Broadcasting is, as Jeremy emphasised in the class, a really important concept
and technique used all the time in what we're doing, so it pays to be familiar
with it. The [numpy docs explainer on
broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) is
really great, with some nice illustrations and examples, so I'd recommend
checking that out to learn more.

# Definitions

- Dot product: this is the sum of the products of the paired elements of two
  vectors or arrays. It is sometimes known as the scalar product.
- Frobenius norm: this is the square root of the sum of the squares of the
  elements of a matrix. In other words, it is the Euclidean norm of the matrix's
  elements treated as one long vector.
- Outer product: this is the product of two vectors, where the result is a
  matrix. It is sometimes known as the tensor product. So if you have two
  tensors with shapes `(3,)` and `(4,)` then the outer product will be a tensor
  with shape `(3, 4)`.
- Scalar: a single number, as opposed to a vector or matrix.
- Stride: the number of elements that you need to skip in order to get to the
  next element in a particular dimension.
- Trailing dimension: the last dimension of a tensor.

# "Things Jeremy says to do"

I thought I'd gather some of the core 'things Jeremy says to do' comments from
this part of the video lecture.

- develop your intuition around concepts and code in the notebook, but then once
  you've reached a point where you have something working, copy the code cells above
  and turn them (minus the comments and markdown) into a function. ([link](https://youtu.be/Tf-8F5q8Xww?t=4039))
    - keep the stuff above it, though, so you can see how you reached that final
      point
- if you're refactoring your code, make sure that the new / refactored code that
  you write is giving the same results as the slower / pre-refactoring code.
  Jeremy uses the `test_close` function from `fastcore.test` to do this.
  ([link](https://youtu.be/Tf-8F5q8Xww?t=4292))
- if you're new to something, experiment with it in a notebook to cement your understanding
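Jeremy uses `test_close` from `fastcore.test` for the refactoring check. As a dependency-free stand-in (my substitution, not his code), `torch.allclose` does the same kind of tolerance-based comparison. The sketch below checks `matmul_broadcast` (copied from above so it runs on its own) against PyTorch's own `@` operator on random stand-in data:

```python
import torch


def matmul_broadcast(a, b):
    # the refactored version from this post
    result = torch.zeros(len(a), len(b[0]))
    for i in range(len(a)):
        result[i] = (a[i, :, None] * b).sum(dim=0)
    return result


torch.manual_seed(0)
x = torch.rand(100, 784)     # stand-in for a slice of x_train (pixels scaled to [0, 1))
w = torch.randn(784, 10)     # stand-in for weights

reference = x @ w                    # PyTorch's own matmul as the baseline
refactored = matmul_broadcast(x, w)

# Compare with a tolerance: float sums done in a different order rarely match bit-for-bit
print(torch.allclose(refactored, reference, rtol=1e-4, atol=1e-3))
```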