<div class="alert block alert-info alert">

# <center>PyTorch</center>

A popular machine learning library.

- Major focus: a class called Tensor (torch.Tensor)
    - stores and operates on homogeneous multidimensional rectangular arrays of numbers
    - similar to NumPy arrays, but can also be operated on using GPUs

**Sources**:
- https://pytorch.org
- https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html

#### Citation:

Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen et al. "Pytorch: An imperative style, high-performance deep learning library." Advances in neural information processing systems 32 (2019).

<br><br>

@inproceedings{pytorch,

 author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
 
 booktitle = {Advances in Neural Information Processing Systems},
 
 editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
 pages = {},
 
 publisher = {Curran Associates, Inc.},
 
 title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
 
 url = {https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf},
 
 volume = {32},
 
 year = {2019}
 
}

<hr style="border:2px solid gray"></hr>

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch

<hr style="border:2px solid gray"></hr>

## Initializing a PyTorch Tensor - different approaches
#### 1. From a <font color='DodgerBlue'>NumPy array</font>

In [None]:
data_array_np = np.array([[1, 2], [3, 4]])

display(data_array_np)

data_array_np.shape

PyTorch tensor:

In [None]:
generic_tensor_torch = torch.from_numpy(data_array_np)

generic_tensor_torch
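A detail worth knowing about this approach: according to the `torch.from_numpy` documentation, the returned tensor shares memory with the NumPy array (no copy is made), whereas `torch.tensor(...)` makes an independent copy. A minimal sketch (the names `shared_tensor` and `copied_tensor` are illustrative only):

```python
import numpy as np
import torch

data_array_np = np.array([[1, 2], [3, 4]])

# torch.from_numpy shares memory with the source array (no copy)
shared_tensor = torch.from_numpy(data_array_np)

# torch.tensor copies the data into new, independent storage
copied_tensor = torch.tensor(data_array_np)

# Modify the NumPy array in place
data_array_np[0, 0] = 100

print(shared_tensor)   # first element is now 100
print(copied_tensor)   # first element is still 1
```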

#### 2. Based on an <font color='DodgerBlue'>existing tensor</font>
- <font color='DodgerBlue'>retains</font> the <font color='DodgerBlue'>shape</font> (e.g., the `generic_tensor_torch` given above)
- replaces the existing values with others, for example
    - 1's via `ones_like`
    - random values via `rand_like`

In [None]:
torch.ones_like(generic_tensor_torch) 

In [None]:
torch.rand_like(generic_tensor_torch, dtype=torch.float)  # rand_like needs a floating-point dtype; the source tensor holds integers

#### 3. Based on a <font color='DodgerBlue'>specific shape</font>

<font color='DodgerBlue'>2-dimensional</font> tensor:

In [None]:
my_shape = (2, 3)

rand_tensor = torch.rand(my_shape)

rand_tensor

<font color='DodgerBlue'>3-dimensional</font> tensor:

Explanation of the tensor's shape

- 2: The number of matrices (the "stacks" or "batches").
- 3: The number of rows in each matrix.
- 5: The number of columns in each matrix.

In [None]:
my_shape = (2, 3, 5)

three_dim_tensor = torch.rand(my_shape)

three_dim_tensor
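To connect the shape `(2, 3, 5)` to the explanation above, indexing peels off one dimension at a time. A short sketch (a fresh random tensor stands in for `three_dim_tensor`, so the values will differ from the output above):

```python
import torch

three_dim_tensor = torch.rand((2, 3, 5))

# Indexing the first dimension picks one of the 2 "stacked" matrices
first_matrix = three_dim_tensor[0]
print(first_matrix.shape)         # torch.Size([3, 5]): 3 rows, 5 columns

# Indexing two dimensions picks one row of that matrix
first_row = three_dim_tensor[0, 0]
print(first_row.shape)            # torch.Size([5])

# Number of dimensions and total number of elements
print(three_dim_tensor.ndim)      # 3
print(three_dim_tensor.numel())   # 2 * 3 * 5 = 30
```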

<hr style="border:1px solid gray"></hr>

## Bookkeeping
- `tensor.shape`
- `tensor.dtype`
- `tensor.device`

In [None]:
print(f'Shape of tensor: {three_dim_tensor.shape}')
print(f'Datatype of tensor: {three_dim_tensor.dtype}')
print(f'Device tensor is stored on: {three_dim_tensor.device}')

<hr style="border:1px solid gray"></hr>

## Tensor operations
- https://pytorch.org/docs/stable/torch.html

#### 1. Joining tensors via **concatenation**: `torch.cat`
- https://pytorch.org/docs/stable/generated/torch.cat.html#torch.cat

Combine <font color='DodgerBlue'>1 tensor</font>, <font color='DodgerBlue'>3 times</font>
- `dim=0`: conceptually like adding more **rows** (same as NumPy)
- `dim=1`: conceptually like adding more **columns** (same as NumPy)

**Example 1**: As <font color='DodgerBlue'>rows</font>

In [None]:
torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=0)

View the **tensor** as a **pandas `DataFrame`** (same idea as we did with NumPy):

In [None]:
pd.DataFrame(torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=0))

**Example 2**: As <font color='DodgerBlue'>columns</font>

In [None]:
torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=1)

In [None]:
pd.DataFrame(torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=1))
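The shapes make the difference between the two examples easy to check. A small sketch, assuming `rand_tensor` has shape `(2, 3)` as above; `torch.stack` is included only for contrast, since it joins along a new dimension rather than an existing one:

```python
import torch

rand_tensor = torch.rand((2, 3))

rows = torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=0)
cols = torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=1)

print(rows.shape)  # torch.Size([6, 3]): 3 x 2 rows, same 3 columns
print(cols.shape)  # torch.Size([2, 9]): same 2 rows, 3 x 3 columns

# For comparison: torch.stack joins along a NEW (leading) dimension
stacked = torch.stack([rand_tensor, rand_tensor, rand_tensor], dim=0)
print(stacked.shape)  # torch.Size([3, 2, 3])
```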

#### 2. Element-wise (pairwise) multiplication
- **`torch.mul`**
    - https://docs.pytorch.org/docs/stable/generated/torch.mul.html

Multiply `rand_tensor` * `rand_tensor`:

In [None]:
torch.mul(rand_tensor, rand_tensor)

- **Chaining methods**

Alternatively, `mul` can be called as a **method** of the tensor itself
- this allows a statement that <font color='DodgerBlue'>chains</font> the math together:

In [None]:
rand_tensor.mul(rand_tensor)

This makes reading <font color='DodgerBlue'>multiple operations easier</font>, for example:

In [None]:
rand_tensor.abs().mul(rand_tensor).sqrt()

For comparison, the equivalent using `torch.*` functions (<font color='DodgerBlue'>harder to read</font>) would be:

In [None]:
torch.sqrt(torch.mul(torch.abs(rand_tensor), rand_tensor))

- **Operator (`*`)**

In [None]:
rand_tensor*rand_tensor
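Note that `*` (like `torch.mul`) multiplies element by element; it is not matrix multiplication. A minimal sketch contrasting the two with a small hand-written tensor (the values are arbitrary), using `@` / `torch.matmul` for the matrix product:

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Element-wise: each entry is multiplied by the entry at the same position
elementwise = a * a        # same as torch.mul(a, a)
print(elementwise)         # values: [[1, 4], [9, 16]]

# Matrix multiplication (rows times columns) uses @ or torch.matmul
matrix_product = a @ a     # same as torch.matmul(a, a)
print(matrix_product)      # values: [[7, 10], [15, 22]]
```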

One can also multiply individual tensor elements:

In [None]:
rand_tensor[0][0] # Accessing specific values

In [None]:
rand_tensor[0][0]*rand_tensor[0][0]
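Two related details, shown in a small sketch on a fresh `(2, 3)` random tensor: `rand_tensor[0, 0]` selects the same element as `rand_tensor[0][0]` in a single indexing step, and the result is still a 0-dimensional tensor; `.item()` converts it to a plain Python number.

```python
import torch

rand_tensor = torch.rand((2, 3))

# Chained indexing and tuple indexing select the same element
element_a = rand_tensor[0][0]
element_b = rand_tensor[0, 0]
print(torch.equal(element_a, element_b))  # True

# The element is a 0-dimensional tensor; .item() gives a Python float
value = rand_tensor[0, 0].item()
print(type(value))                        # <class 'float'>
print(value * value)
```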


## Using GPU

Moving to GPU: <font color='DodgerBlue'>move a **specified** tensor</font> from the <font color='DodgerBlue'>**CPU**</font> to the <font color='DodgerBlue'>**GPU**</font>.
 
(If you're using Colab, allocate a GPU by going to Edit > Notebook Settings.)

Let's move our `rand_tensor` to GPU if possible:

In [None]:
if torch.cuda.is_available():
    print('CUDA is available - congrats!')
    tensor = rand_tensor.to('cuda')
    display(tensor.device)

    print('\nExample GPU calculation - adding the tensor to itself:')
    result_tensor = tensor + tensor
    display(result_tensor)
else:
    print('CUDA is not available - bummer.')
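A common variation on the cell above is to choose the device once and reuse it, so the same code runs with or without a GPU. A sketch under that assumption (the names `device`, `x`, and `y` are illustrative):

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors can be created directly on the chosen device ...
x = torch.rand((2, 3), device=device)

# ... or moved there with .to(device)
y = torch.rand((2, 3)).to(device)

# Both operands of an operation must live on the same device
result = x + y
print(result.device)

# Move back to the CPU (e.g., before using NumPy or matplotlib)
result_cpu = result.cpu()
print(result_cpu.device)  # cpu
```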

#### Summary
1. `PyTorch` has a lot of <font color='DodgerBlue'>similarity</font> to `NumPy`
2. However, its tensors can also be used on a **GPU**
3. Some special features make it unique
    - <font color='dodgerblue'>history accumulation</font> (coming below)

<hr style="border:2px solid gray"></hr>

<div class="alert block alert-info alert">

# <center>Neural Networks (NNs)</center>

## Background:
- https://en.wikipedia.org/wiki/Neural_network_(machine_learning)

<font color='dodgerblue'>"A machine learning model is a **function**, with **inputs** and **outputs**." [1]</font>

- a **collection of <font color='dodgerblue'>nested functions</font>**
    - these functions are **executed on input data**
    - these functions are defined by parameters (stored in PyTorch tensors)
        - **weights**, and 
        - **biases**

<br>

What is the difference between **weights** and **bias** values?

If we have a **linear <font color='DodgerBlue'>"activation function"</font>**: $\mathbf{y = mx + b}$
- **weight** (i.e., the weight for x): <font color='DodgerBlue'>$\mathbf{m}$</font>
    - for polynomial functions (e.g., linear equation), the **weights** are the **coefficients** (see "Extra Information" for a nonlinear example)
<br>
- **bias** (i.e., the equation's bias): <font color='dodgerblue'>$\mathbf{b}$</font>
    - <font color='DodgerBlue'>offset</font> - a constant term (consider the phase shifts we covered in SciPy lecture)
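In PyTorch, the linear layer `torch.nn.Linear` stores these two kinds of parameters as its `weight` (the $m$) and `bias` (the $b$) tensors. A minimal sketch with one input and one output, where the values 2.0 and 0.5 are arbitrary and set by hand so the result is easy to check:

```python
import torch

# A single linear node: y = m*x + b (1 input feature, 1 output)
layer = torch.nn.Linear(in_features=1, out_features=1)

# Overwrite the randomly initialized parameters with known values
with torch.no_grad():
    layer.weight.fill_(2.0)   # m, the weight
    layer.bias.fill_(0.5)     # b, the bias

x = torch.tensor([[0.0], [1.0], [2.0]])   # 3 samples, 1 feature each
y = layer(x)
print(y)   # values: [[0.5], [2.5], [4.5]]
```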

**What is an <font color='DodgerBlue'>activation function</font>?**
- A function (surprise 🙂)
- Calculates a **node's output** (i.e., the <font color='DodgerBlue'>information</font>) that is **passed** to the **next node**, using *its* specific input parameters (i.e., weights and bias)
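A small sketch of a single node: the inputs are combined with the node's weights and bias into one number, and an activation function turns that number into the node's output. The input values, weights, and bias below are made up for illustration; `torch.sigmoid`, `torch.relu`, and `torch.tanh` are three common activation functions:

```python
import torch

# Hypothetical inputs arriving at one node, with its weights and bias
inputs = torch.tensor([1.0, 2.0, 3.0])
weights = torch.tensor([0.2, -0.5, 0.1])
bias = torch.tensor(0.3)

# Weighted sum of the inputs, plus the bias
z = torch.dot(weights, inputs) + bias   # 0.2 - 1.0 + 0.3 + 0.3 = -0.2

# Different activation functions turn z into the node's output
print(torch.sigmoid(z))   # squashes into (0, 1)
print(torch.relu(z))      # max(0, z), so 0 here because z < 0
print(torch.tanh(z))      # squashes into (-1, 1)
```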

<br>

#### Additional Resources

##### Great Tutorial Series: **3Blue1Brown**
- https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&feature=shared
    - DeepLearning Chapter 1 (DL1): "What is a NN"
    - DL2: "Gradient descent, how neural networks learn"
    - DL3: "Backpropagation, step-by-step" (meta-level)
    - DL4: "Backpropagation calculus" (the mathematics/formulas)

##### TensorFlow Playground
- (For later)
- https://playground.tensorflow.org

**Sources**:
1. https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html

<figure>
  <center>
    <img src="00_images/31_machine_learning/deep_neural_network.png" style="width: 800px; margin: 0 40px;"/>
    <figcaption style="margin-top: 10px; color: black; font-style: italic;">
          <b>Figure 1</b>: Example of a Neural Network.<br>
        <b>Image Source</b>: https://www.studytonight.com/post/understanding-deep-learning.
    </figcaption>
  </center>
</figure>

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### Sigmoid Activation Functions
- What might a weighting factor (i.e., $w$) look like?

Sigmoid Equation:
\begin{equation}
\huge y(x) = \frac{1}{1 + e^{-w x}}
\end{equation}

In [None]:
def weighted_sigmoid(x_data: np.ndarray, weight: float) -> np.ndarray:
    ''' Simple sigmoid function with a weight: 1 / (1 + exp(-weight * x)).

        Args:
            x_data: independent data
            weight: weighting factor that adjusts the equation's shape

        Returns:
            result: calculated results
    '''

    result = 1/(1 + np.exp(-weight*x_data))
    return result
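As a cross-check, PyTorch's built-in `torch.sigmoid` computes the standard sigmoid $1/(1 + e^{-x})$, so `weighted_sigmoid` with `weight=1` should match it, and a weight $w$ should be equivalent to feeding $w x$ into `torch.sigmoid`. The sketch below restates `weighted_sigmoid` so it runs on its own:

```python
import numpy as np
import torch

def weighted_sigmoid(x_data: np.ndarray, weight: float) -> np.ndarray:
    '''Sigmoid with a weight: 1 / (1 + exp(-weight * x)).'''
    return 1 / (1 + np.exp(-weight * x_data))

x_values = np.linspace(-5, 5)

# With weight = 1 this is the standard sigmoid
ours = weighted_sigmoid(x_values, weight=1)
builtin = torch.sigmoid(torch.from_numpy(x_values)).numpy()
print(np.allclose(ours, builtin))        # True

# A weight w is the same as feeding w * x into the standard sigmoid
ours_w5 = weighted_sigmoid(x_values, weight=5)
builtin_w5 = torch.sigmoid(torch.from_numpy(5 * x_values)).numpy()
print(np.allclose(ours_w5, builtin_w5))  # True
```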

<div class="alert alert-block alert-warning">

Plot a sigmoid equation using different weights:

In [None]:
weight_list = [-1, 1, 5]

x_values = np.linspace(-5, 5)

fig, ax = plt.subplots()
for weight in weight_list:
    y_values = weighted_sigmoid(x_data=x_values, weight=weight)

    ax.plot(x_values, y_values, label=f'{weight}')

plt.legend(loc='right', title='weights')

<div class="alert alert-block alert-warning">

So, we see that the **weighting factor** can **adjust** the **function's shape**, and consequently **adjust** the **information** being **passed** from one node to another.
    
<hr style="border:1.5px dashed gray"></hr>

## NN Training

Training is done in **2 steps**

1. **Forward Propagation**
    - <font color='DodgerBlue'>Pass data</font> (observables, weights and biases) <font color='DodgerBlue'>forward</font> to **predict target observable**
        - Compute the **activation function** that connects each node
    - (Basically, running the model as it will be eventually used.)

<br>

2. **Backward Propagation**
    - <font color='DodgerBlue'>Adjust parameters</font> in proportion to the <font color='DodgerBlue'>observable error</font> (i.e., the forward propagation's output/result vs. the target)
    - Done by
        - traversing backwards from the output (i.e., looping back to the first layer),
        - collecting the **derivatives** (i.e., <font color='DodgerBlue'>gradients</font>) of the error with respect to the functions' parameters,
        - **optimizing** the parameters using the <font color='DodgerBlue'>**gradient descent**</font> method.

**`torch.autograd`** is the workhorse for <font color='DodgerBlue'>**backward propagation**</font>.

**Source**: https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html
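The two steps can be sketched by hand for the linear function $y = mx + b$ from earlier, with `w` playing the role of $m$. This is a minimal illustration under stated assumptions: the training data come from a made-up line $y = 2x + 1$, the error is the mean squared error, and the learning rate (0.5) and number of steps (500) are arbitrary. Real projects typically let an optimizer from `torch.optim` (e.g., `torch.optim.SGD`) perform the update step.

```python
import torch

# Hypothetical training data from a known line: y = 2x + 1
x = torch.linspace(0.0, 1.0, steps=20)
y_target = 2.0 * x + 1.0

# Parameters to learn, starting from a guess of 0
w = torch.zeros(1, requires_grad=True)  # weight
b = torch.zeros(1, requires_grad=True)  # bias

learning_rate = 0.5

for step in range(500):
    # 1. Forward propagation: prediction and error (mean squared error)
    y_pred = w * x + b
    loss = ((y_pred - y_target) ** 2).mean()

    # 2. Backward propagation: autograd fills w.grad and b.grad
    loss.backward()

    # Gradient descent update, kept out of the recorded history
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad

    # Reset gradients, since backward() adds to existing values
    w.grad.zero_()
    b.grad.zero_()

print(w.item(), b.item())  # should approach 2.0 and 1.0
```

The gradients are zeroed every step because `backward()` accumulates into `.grad` rather than overwriting it.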

### Tensor History

A key concept in PyTorch is that its <font color='dodgerblue'>tensor objects</font> can have a <font color='dodgerblue'>**"history"**</font> (i.e., **information** about the **calculations** a tensor took part in, which is **attached** to the tensor objects involved).

For example, using `torch.rand`:
- `torch.rand(*size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None,` **`requires_grad=False,`** `pin_memory=False) → Tensor`

- **`requires_grad=True`**: for every computation that follows, **`autograd`** will <font color='dodgerblue'>record the computation's history</font> in the output tensors

<br>

#### Example: A <font color='DodgerBlue'>Single Layer Perceptron</font> (SLP)
- The simplest type of artificial NN
    - used to classify inputs into two (binary) categories

- https://en.wikipedia.org/wiki/Perceptron

<figure>
  <center>
    <img src="00_images/31_machine_learning/perceptron_english.png" style="width: 500px; margin: 0 40px;"/>
    <figcaption style="margin-top: 10px; color: black; font-style: italic;">
          <b>Figure 2</b>: An illustration of a single layer perceptron.<br>
        <b>Image Source</b>: https://commons.wikimedia.org/wiki/File:ArtificialNeuronModel_english.png.
    </figcaption>
  </center>
</figure>
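A minimal sketch of the idea in the figure: a weighted sum of the inputs plus a bias, followed by a step activation that outputs 0 or 1. The weights `[1.0, 1.0]` and bias `-1.5` are hand-picked (not learned) so that this perceptron reproduces a logical AND of two binary inputs; the function name `perceptron` is hypothetical:

```python
import torch

def perceptron(x: torch.Tensor, weights: torch.Tensor, bias: float) -> torch.Tensor:
    '''Single layer perceptron: weighted sum plus bias, then a step function.'''
    z = x @ weights + bias
    # Step activation: 1 if z > 0, else 0
    return (z > 0).int()

# All four combinations of two binary inputs (one row per sample)
x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# Hand-picked weights and bias that implement logical AND
weights = torch.tensor([1.0, 1.0])
outputs = perceptron(x, weights, bias=-1.5)
print(outputs)  # tensor([0, 0, 0, 1], dtype=torch.int32)
```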


<br>

Our **example workflow** will be:

1. **Create input data** (e.g., independent x data)
    - `requires_grad=True`
2. **Create** a simple **model** based on an equation: $\mathbf{y = a*sin(b*x)}$
    - the activation function 
3. **Forward Propagation**
4. **Backward Propagation** (i.e., compute & collect first derivatives)

<br>

#### 1. Create input data
   - Everything is weighted equally (i.e., <font color='DodgerBlue'>no weights</font>)

In [None]:
input_data = torch.linspace(0.0, 2.0*torch.pi, steps=25, requires_grad=True)

input_data

Notice the `requires_grad=True` in the output.

#### 2. Create a simple model based on an **activation equation**
- Equation: $\mathbf{y = a*sin(b*x)}$
    - <font color='DodgerBlue'>Hyperparameters</font>
        - $\mathbf{a}$: amplitude
        - $\mathbf{b}$: period/frequency (a.k.a. wavenumber)

In [None]:
def sine_activation_func(x: torch.Tensor, a: float, b: float) -> torch.Tensor:
    ''' Sine function, y = a * sin(b * x), for use in PyTorch neural networks.
    ''' 
    return a * torch.sin(b * x)

#### 3. Forward Propagation

<font color='dodgerblue'>Activation Function: $\mathbf{y = a*sin(b*x)}$</font>

This is a <font color='dodgerblue'>"**Single Layer Perceptron**"</font> NN:
- **1 input layer** (n features $\rightarrow$ n nodes),
- **1 activation function** (i.e., a transfer function that converts the node's input into its output)
- **1 node (in 1 layer)** $\rightarrow$ output
- **Initial Guess**
    - a = 1
    - b = 1

In [None]:
sin_model = sine_activation_func(x=input_data, a=1, b=1)

sin_model

What operations just happened (inside `sine_activation_func`):
1. `b * x` creates a tensor with `grad_fn=<MulBackward0>`.
2. `torch.sin(b * x)` creates a tensor with `grad_fn=<SinBackward0>`.
3. `a * (that sine result)` creates a new tensor with `grad_fn=<MulBackward0>`.

Two things to note:
- in the tensor output above, the **`grad_fn=<MulBackward0>`** tells us that this object <font color='DodgerBlue'>**is accumulating history**</font>.
- That is the **first** output of our <font color='DodgerBlue'>**forward propagation**</font>.

<br><br>

Okay, let's compare this to what would have happened if `requires_grad=False`:

In [None]:
input_data_false = torch.linspace(0.0, 2.0*torch.pi, steps=25, requires_grad=False)

print(input_data_false)

torch.sin(input_data_false)

Notice the difference: there is **no** `grad_fn=<SinBackward0>`, and thus **no history accumulation**.
<br><br>
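Besides creating tensors with `requires_grad=False`, history recording can also be switched off temporarily with the `torch.no_grad()` context manager, which is commonly used for parameter updates and for evaluation. A short sketch:

```python
import torch

input_data = torch.linspace(0.0, 2.0 * torch.pi, steps=25, requires_grad=True)

# Normal computation: history is recorded
with_history = torch.sin(input_data)
print(with_history.grad_fn)            # <SinBackward0 object at ...>

# Inside torch.no_grad(), no history is recorded, even though
# input_data itself still has requires_grad=True
with torch.no_grad():
    without_history = torch.sin(input_data)
print(without_history.grad_fn)         # None
print(without_history.requires_grad)   # False
```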

#### History Accumulation Demonstration

Okay, now back to our `sin_model`.


##### <font color='DodgerBlue'>Visualizing Data</font>

Let's first visualize what we are working with.

- Since <font color='dodgerblue'>history is accumulated</font> in `sin_model`, we can now use `detach()` to <font color='dodgerblue'>**grab**</font> only the tensor **values**:
    - `Tensor.detach`: Returns a new tensor, detached from the current graph (https://docs.pytorch.org/docs/stable/generated/torch.Tensor.detach.html).
    - To **plot**, we must **detach** the tensor - otherwise we obtain **errors**.

In [None]:
sin_model.detach()

**Visualize** by plotting the
- <font color='DodgerBlue'>independent</font> variable (i.e., `input_data.detach()`), and
- <font color='DodgerBlue'>dependent</font> variable (i.e., `sin_model.detach()`).

In [None]:
plt.plot(input_data.detach(), sin_model.detach())

##### <font color='DodgerBlue'>Adding to the History</font>

Let's include **an additional operation** on the `sin_model` by summing values:

- `torch.sum()`: https://pytorch.org/docs/stable/generated/torch.sum.html
- Record to a new object (i.e., `sin_sum`).

In [None]:
sin_sum = sin_model.sum()
print(sin_sum)

In [None]:
print(sin_sum.grad_fn)
print(sin_sum.grad_fn.next_functions)
print(sin_sum.grad_fn.next_functions[0][0].next_functions)
print(sin_sum.grad_fn.next_functions[0][0].next_functions[0][0].next_functions)
print(sin_sum.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions) 
print(sin_sum.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions) 

We see that if we access the <font color='DodgerBlue'>"history"</font> portion of the object via `grad_fn`, we get a short description of each step (the `(None, 0)` entries stand for the plain Python numbers `a` and `b`, which do not require gradients)
- <font color='DodgerBlue'>**6 printed entries**</font>, from the <font color='DodgerBlue'>**most recent**</font> operation $\rightarrow$ the <font color='DodgerBlue'>**oldest**</font> one

    #6: `SumBackward0` $\rightarrow$ `sin_model.sum()`

    #5: `MulBackward0` $\rightarrow$ `a * sin(b * x)`

    #4: `SinBackward0` $\rightarrow$ `torch.sin(b * x)`

    #3: `MulBackward0` $\rightarrow$ `b * x`

- Note: the above three operations (#5 to #3) come from the line: `sin_model = sine_activation_func(x=input_data, a=1, b=1)`

    #2: `AccumulateGrad` $\rightarrow$ `torch.linspace(0.0, 2.0*torch.pi, steps=25, requires_grad=True)` (the leaf tensor `input_data`, where gradients are accumulated)
  
    #1: `()` $\rightarrow$ starting point
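Rather than chaining `next_functions[0][0]` by hand, the same walk can be written as a loop. A sketch under these assumptions: `history_names` is a hypothetical helper, it follows only the first non-`None` parent at each step (enough for this chain-shaped history), and it rebuilds `sin_model` with `a = b = 1` so it runs on its own:

```python
import torch

def sine_activation_func(x: torch.Tensor, a: float, b: float) -> torch.Tensor:
    return a * torch.sin(b * x)

input_data = torch.linspace(0.0, 2.0 * torch.pi, steps=25, requires_grad=True)
sin_model = sine_activation_func(x=input_data, a=1, b=1)
sin_sum = sin_model.sum()

def history_names(tensor: torch.Tensor) -> list:
    '''Walk grad_fn -> next_functions from newest to oldest, following the
    first parent that is not None (None marks a Python scalar like a or b).'''
    names = []
    node = tensor.grad_fn
    while node is not None:
        names.append(type(node).__name__)
        parents = [fn for fn, _ in node.next_functions if fn is not None]
        node = parents[0] if parents else None
    return names

print(history_names(sin_sum))
# ['SumBackward0', 'MulBackward0', 'SinBackward0', 'MulBackward0', 'AccumulateGrad']
```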

With that basic understanding in place, we can now continue to the <font color='DodgerBlue'>backward propagation</font> idea.

### Taking the first derivative of sin(x)
- a.k.a. the **"gradient"**

\begin{equation}
    \Large
    \frac{d}{dx} \sin(x) = \cos(x)
\end{equation}
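Because `sin_sum` adds up $a\sin(b\,x_j)$ over all elements $x_j$ of the input, only the term with $j = i$ depends on $x_i$, and the chain rule gives each element's gradient:

$$
\begin{aligned}
S &= \sum_{j} a \sin(b\,x_j) \\
\frac{\partial S}{\partial x_i} &= a\,b\,\cos(b\,x_i) \\
\frac{\partial S}{\partial x_i}\bigg|_{a = b = 1} &= \cos(x_i)
\end{aligned}
$$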

##### `Autograd`: <font color='DodgerBlue'>Computing gradients</font>

- Note: `autograd` is used when calling the `backward()` method (which calls `torch.autograd.backward`)
- https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### `autograd.grad`

Both `autograd.grad` and `autograd.backward` take derivatives. However, they differ in whether they affect the accumulated history.

`autograd.grad`: computes and returns gradients for specific tensors without affecting others (e.g., it **does not** store them in the inputs' `grad` property, so **no** history is accumulated)

Source: https://www.geeksforgeeks.org/understanding-pytorchs-autogradgrad-and-autogradbackward/

In [None]:
## Create a new object
test_sin_model = torch.sin(input_data)
test_sin_sum = test_sin_model.sum()
test_grad = torch.autograd.grad(outputs=test_sin_sum, inputs=input_data)

display(test_grad)
display(test_grad[0].sum())
# print(test_grad[0].grad_fn) ## prints None: demonstrates the lack of history (test_grad is a tuple)

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

#### 1. `backward()`: compute gradients

- the gradients will be added to the history (stored in the `grad` property of `input_data`)

In [None]:
sin_sum.backward()
sin_sum

**Output explanation**:

- `1.4504e-07`: <font color='DodgerBlue'>**sum**</font> of all **elements** in the `sin_model` **tensor**
- `<SumBackward0>`: the autograd node recorded for the sum operation (used during backward propagation)

Let's verify that `1.4504e-07` is indeed the sum of the tensor elements.

- Recall that `sin_sum` was created using `sin_sum = sin_model.sum()`
- Thus, we have to go back to `sin_model`:

In [None]:
print(sin_model.sum()) 

So, everything looks okay.

<br>

- <font color='DodgerBlue'>**Uncovering what `backward()` actually did.**</font>

    - What did `backward()` do with the history?

`sin_sum.backward()` <font color='dodgerblue'>created a **`grad` property**</font> within the <font color='dodgerblue'>**original input data**</font> (i.e., `input_data`) as part of its **accumulated history**.

(A property is a value attached to an object, here an instance of the Python class `torch.Tensor`.)

In [None]:
print(f'Input values:'
      f'\n{input_data}')

print()
print(f'Gradient values:'
      f'\n{input_data.grad}')
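One consequence of the gradient being stored on the tensor: calling `backward()` again adds the new gradients to the existing ones instead of replacing them, while `torch.autograd.grad` returns gradients without touching `.grad` at all. A sketch on a fresh tensor `x` (an illustrative name), so it does not disturb `input_data`:

```python
import torch

x = torch.linspace(0.0, 2.0 * torch.pi, steps=25, requires_grad=True)

# autograd.grad returns the gradient and leaves x.grad untouched
(grad_only,) = torch.autograd.grad(outputs=torch.sin(x).sum(), inputs=x)
print(x.grad)  # None

# backward() stores the gradient in x.grad ...
torch.sin(x).sum().backward()
first = x.grad.clone()

# ... and a second backward() ADDS to it rather than replacing it
torch.sin(x).sum().backward()
print(torch.allclose(x.grad, 2 * first))  # True
print(torch.allclose(first, grad_only))   # True: both equal cos(x)

# Reset before the next computation (common in training loops)
x.grad.zero_()
```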

##### Side note: Verification of the derivative of sine

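Prove to ourselves that `backward()` computes the derivative of sin(x). Since `sin_sum` adds up sin(x_i) over all elements (with a = b = 1), the gradient for each element x_i is the derivative of its own term:

$$\Large \frac{d}{dx}\sin(x) = \cos(x)$$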

In [None]:
first_derivative_sin = np.cos(input_data.detach().numpy())
first_derivative_sin

Visualize <font color='DodgerBlue'>PyTorch's</font> **`sin_sum.backward()`** output versus <font color='DodgerBlue'>our computed derivative</font> (i.e., **`first_derivative_sin`**):

In [None]:
plt.plot(input_data.detach(), input_data.grad.detach(), linestyle='-', linewidth=10)

plt.plot(input_data.detach(), first_derivative_sin, linestyle='--', linewidth=5)
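The comparison above uses $a = b = 1$. For other values the chain rule predicts $\frac{\partial}{\partial x_i}\left[a\sin(b\,x_i)\right] = a\,b\,\cos(b\,x_i)$, which can be checked the same way. A sketch with the arbitrary choice $a = 2$, $b = 3$ (the function is restated so the block runs on its own):

```python
import torch

def sine_activation_func(x: torch.Tensor, a: float, b: float) -> torch.Tensor:
    return a * torch.sin(b * x)

a, b = 2.0, 3.0   # arbitrary values, different from the initial guess a = b = 1
x = torch.linspace(0.0, 2.0 * torch.pi, steps=25, requires_grad=True)

sine_activation_func(x, a=a, b=b).sum().backward()

# Chain rule: d/dx_i [a * sin(b * x_i)] = a * b * cos(b * x_i)
expected = a * b * torch.cos(b * x.detach())
print(torch.allclose(x.grad, expected, atol=1e-5))  # True
```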

**Conclusion**
- We confirm that `backward()` does indeed compute the derivative (gradient) of `sin_sum` with respect to `input_data`.

<font color='DodgerBlue'>**Important**</font>: What is still missing is the **weights** and **<font color='DodgerBlue'>bias optimization</font>** portion of the workflow.

# Take-home

1. PyTorch is similar to NumPy
    - similar functions
    - `torch.Tensor` instead of NumPy's `ndarray`
    - able to use GPUs (not just CPUs)
2. Basics of neural networks
    - forward propagation
    - backward propagation
3. History accumulation
4. Example: A Single Layer Perceptron (SLP)

The above forms the foundation for understanding how PyTorch is "typically" used in an ML project (covered in a future lecture).