## PyTorch

[PyTorch](https://pytorch.org/) is the open-source machine learning framework that we'll be using in this class. It has a wide range of functionality, but today we'll just get started with some of its very basic array-processing functionality.

### Dot Products

Recall that we can describe a line with an expression like `y = w*x + b`. (Some of you may remember *mx+b*, but we'll use *w* for the *weight(s)* instead.)

That's a multiplication followed by a sum. We can extend that to lots of *x*'s, each of which needs a corresponding *w*:

`y = w1*x1 + w2*x2 + ... + wN*xN + b`

For simplicity, let's start by ignoring the `b`ias.  So we're left with

`y = w1*x1 + w2*x2 + ... + wN*xN`

that is, multiply each number in `w` by its corresponding number in `x` and add up the results: `sum(w[i] * x[i] for i in range(N))`.

The result is called a *dot product*, and it is one of the fundamental operations in linear algebra. At this point you don't need to understand all of the linear algebra behind it; we're just implementing a common calculation.
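
As a rough sketch of the formula, here is the same sum written out in plain Python with ordinary lists. The values of `w` and `x` are made up for illustration and are not the ones in the template.

```python
# Hypothetical values chosen for illustration; not the template's w and x.
w = [0.5, -1.0, 2.0]
x = [4.0, 3.0, 1.0]
N = len(w)
assert N == len(x)  # every x needs a corresponding w

# Multiply matching entries, then add up the products.
products = [w[i] * x[i] for i in range(N)]
y = sum(products)

print(products)  # [2.0, -3.0, 2.0]
print(y)         # 1.0
```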

Let's do that in Python, and then in Torch. To start, let's make a `w`eights tensor and an `x` tensor. (See the notebook.) Note that their shapes must match.

#### `for` loop approach

{{% task %}}
**Write a function that uses a `for` loop** to compute the dot product of `w` and `x`. Name the function `dot_loop`. Check that you get `-1.0` for the `w` and `x` provided in the template.
{{% /task %}}

#### Torch Elementwise Operations

But that's a lot of typing for a concept that we're going to use very frequently. To shorten it (and make it run much faster too!), we'll start taking advantage of some of Torch's built-in functionality.

First, we'll learn about *elementwise operations* (called *pointwise operations* in the [PyTorch docs](https://pytorch.org/docs/stable/torch.html#pointwise-ops)).

If you try to `*` two Python lists together, you get a `TypeError` (how do you multiply lists??). But in PyTorch (and NumPy, whose array interface PyTorch closely follows), array operations happen *element-by-element* (sometimes called *elementwise*): to multiply two tensors that have the same shape, multiply each number in the first tensor by the corresponding number in the second tensor. The result is a new tensor of the same shape containing all the elementwise products.
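
The contrast between lists and tensors can be sketched as follows. The tensors `a` and `b` are hypothetical stand-ins, not the template's `w` and `x`.

```python
import torch

# Hypothetical values for illustration.
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])

# Plain Python lists do not support `*` with each other.
try:
    [1.0, 2.0, 3.0] * [10.0, 20.0, 30.0]
    list_error = None
except TypeError as err:
    list_error = err
print(type(list_error).__name__)  # TypeError

# Tensors of the same shape are multiplied entry by entry.
product = a * b
print(product.tolist())   # [10.0, 40.0, 90.0]
print(product.shape)      # torch.Size([3])

# Other arithmetic operators work elementwise in the same way.
print((a + b).tolist())   # [11.0, 22.0, 33.0]
```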

{{% task %}}Try running `w * x`{{% /task %}}

Torch also provides [*reduction* methods](https://pytorch.org/docs/stable/torch.html#reduction-ops), so named because they *reduce* the number of elements in a Tensor.

One really useful reduction op is `.sum`.

{{% task %}}Try running `w.sum()`.{{% /task %}}

> You can also write that as `torch.sum(w)`.
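
To see what "reducing" means, it may help to look at a tensor with more than one dimension. This is only a sketch with a made-up tensor `t`; the `dim` argument chooses which dimension gets summed away.

```python
import torch

# Hypothetical 2x3 tensor for illustration.
t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

total = t.sum()          # reduces all 6 elements to a single number
col_sums = t.sum(dim=0)  # sums away the rows: one value per column
row_sums = t.sum(dim=1)  # sums away the columns: one value per row

print(total.item())       # 21.0
print(col_sums.tolist())  # [5.0, 7.0, 9.0]
print(row_sums.tolist())  # [6.0, 15.0]

# Other reductions follow the same pattern.
print(t.mean().item(), t.max().item())  # 3.5 6.0
```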

{{% task %}}Now **make a new version of `dot_loop`, called `dot_ops`**, that uses an elementwise op to multiply corresponding numbers and a reduction op to sum the result. Check that the result is the same.{{% /task %}}

Finally, since `dot` is such an important operation, PyTorch provides it directly:

```python
torch.dot(w, x)
```

Python 3.5 introduced a matrix-multiplication operator, `@`, which computes the same dot product when both operands are one-dimensional tensors.

```python
w @ x
```
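
A quick check that the two spellings agree, using made-up tensors `a`, `b`, and `M` rather than the template's values. The last lines hint at why `@` is called a matrix-multiplication operator: with a 2-D tensor on the left, it computes one dot product per row.

```python
import torch

# Hypothetical values for illustration.
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

d1 = torch.dot(a, b)  # 1*4 + 2*5 + 3*6
d2 = a @ b            # same result for two 1-D tensors
print(d1.item(), d2.item())  # 32.0 32.0

# With a 2-D tensor on the left, `@` computes one dot product per row.
M = torch.tensor([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
print((M @ b).tolist())  # [4.0, 10.0]
```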

To apply this knowledge, let's try writing a slightly more complex function: a linear transformation layer.

## Linear Layer

The most basic component of a neural network (and many other machine learning methods) is a *linear transformation layer*. Going back to our `y = w*x + b` example, the `w*x + b` is the linear transformation: given an `x`, dot it with some `w`eights and add a `b`ias.

{{% task %}}
**Write a function that performs a linear transformation of a vector `x`.** Name it `linear` and have it take `weights`, `bias`, and `x`, as in the template. Use PyTorch's built-in functionality for dot products.
{{% /task %}}

### Linear layer, Module-style

Notice that `linear`'s job is to transform `x`, but it needed 3 parameters, not just 1. It would be convenient to view the `linear` function as simply a function of `x`, with `weights` and `bias` being internal details.

One way to do this is to make a `Linear` class that stores `weights` and `bias` as attributes, so that its `forward` method takes only `x`. Fill in the blanks in the template code to do this.
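
The general pattern, storing parameters on an object so that callers pass only the input, can be sketched with a simpler operation. The `Scale` class below is a hypothetical example invented for illustration; it is not part of the template or of PyTorch.

```python
class Scale:
    """Hypothetical example: multiplies its input by a stored factor."""

    def __init__(self, factor):
        # The parameter is stored once, when the object is created.
        self.factor = factor

    def forward(self, x):
        # Callers pass only x; the factor comes from the object itself.
        return self.factor * x


double = Scale(factor=2.0)
print(double.forward(3.0))   # 6.0
print(double.forward(10.0))  # 20.0
```

Once `double` exists, it behaves like a function of `x` alone, which is the property we want for the linear layer.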

## Mean Squared Error

Now let's apply what you just learned about elementwise operations on PyTorch tensors to another very common building block in machine learning: measuring *error*.

Once we make some predictions, we usually want to be able to measure how *good* the predictions were. For regression tasks, i.e., tasks where we're predicting *numbers*, one very common measure is the *mean squared error*. Here's an algorithm to compute it:

- compute `resid` as true (`y_true`) minus predicted (`y_pred`).
- compute `squared_error` by squaring each number in `resid`.
- compute `mean_squared_error` by taking the `mean` of `squared_error`.

> **Technical note**: This process implements the mean squared error *loss function*. That is a function that is given some *true* values (call them `$y_1$` through `$y_n$`) and some *predicted* values (call them `$\hat{y}_1$` through `$\hat{y}_n$`) and returns `$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2.$$`
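
As a small worked instance of that formula, take true values 1, 2, 3 and predicted values 1.5, 2, 2. These numbers are made up for illustration and are not the template's values:

`$$\text{MSE} = \frac{(1 - 1.5)^2 + (2 - 2)^2 + (3 - 2)^2}{3} = \frac{0.25 + 0 + 1}{3} = \frac{1.25}{3} \approx 0.417.$$`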

Generally you'd get the predicted values, `y_pred`, by calling a function that implements a model (like the `forward` method of the `Linear` class above). But to focus our attention on the error computation, we've provided sample values for `y_true` and `y_pred` in the template that you can just use as-is.

{{% task %}}

1. Implement each line of the above algorithm in PyTorch code.
    - Use separate cells so you can check the results along the way. For example, the first cell should have two lines, the first to assign (`resid = ...`) and the second to show the result (`resid`).
    - **You should not need to write any loops.**
    - Try using both `squared_error.mean()` and `torch.mean(squared_error)`.
2. Now, write the entire computation in a single succinct expression (i.e., without having to create intermediate variables for `resid` and `squared_error`). Check that you get the same result.

{{% /task %}}

Notes:

- Recall that Python's exponentiation operator is `**`.
- PyTorch tensors also have a `.pow()` method. So you might see `.pow(2)`.
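
Both spellings square each element, as this short sketch with a made-up tensor `t` suggests:

```python
import torch

# Hypothetical values for illustration.
t = torch.tensor([1.0, -2.0, 3.0])

squared_op = t ** 2        # Python's exponentiation operator, applied elementwise
squared_method = t.pow(2)  # the equivalent tensor method

print(squared_op.tolist())                      # [1.0, 4.0, 9.0]
print(torch.equal(squared_op, squared_method))  # True
```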



### Dot products

Let's do that in Python, and then Torch. To start, let's make a `w`eights and an `x`.

```python
import torch
from torch import tensor  # assumed import; the template calls tensor() directly
w = tensor([-2.0, 1.0])
w
```

```python
x = tensor([1.5, 2.0])
x
```

The shapes of `w` and `x` must match.

```python
N = len(w)
assert N == len(x)
```
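
The `len` check compares only the first dimension. A slightly stricter check, sketched here with made-up tensors `a` and `b`, compares the full `.shape`. The example also shows that `torch.dot` raises an error rather than silently guessing when the lengths differ.

```python
import torch

# Hypothetical tensors with deliberately mismatched lengths.
a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0, 3.0])

print(a.shape)             # torch.Size([2])
print(a.shape == b.shape)  # False

# torch.dot refuses 1-D tensors whose lengths differ.
try:
    torch.dot(a, b)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)  # RuntimeError
```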

#### `for` loop approach

```python
def dot_loop(w, x):
    return 0.0 # FIXME
dot_loop(w, x)
```

#### Torch Elementwise Operations

```python
def dot_ops(w, x):
    return 0.0 # FIXME
dot_ops(w, x)
```

## Linear layer

```python
def linear(weights, bias, x):
    return 0.0 # FIXME
linear(w, 1.0, x)
```

### Linear layer, Module-style

```python
class Linear:
    def __init__(self, weights, bias):
        self.weights = ...
        self.bias = ...

    def forward(self, x):
        return ...

layer = Linear(weights=w, bias=1.0)
layer.forward(x)
```

## Mean Squared Error

```python
y_true = tensor([3.14, 1.59, 2.65])
y_pred = tensor([2.71, 8.28, 1.83])
```