<div class="alert block alert-info alert">

# <center>PyTorch</center>

A popular machine learning library.

- Major focus: a class called Tensor (torch.Tensor)
    - stores and operates on homogeneous multidimensional rectangular arrays of numbers
    - similar to NumPy Arrays, but can also be operated on via a CUDA-capable NVIDIA GPU

**Sources**:
- https://pytorch.org
- https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html

#### Citation:

Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen et al. "Pytorch: An imperative style, high-performance deep learning library." Advances in neural information processing systems 32 (2019).

<br><br>

@inproceedings{pytorch,

 author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
 
 booktitle = {Advances in Neural Information Processing Systems},
 
 editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
 pages = {},
 
 publisher = {Curran Associates, Inc.},
 
 title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
 
 url = {https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf},
 
 volume = {32},
 
 year = {2019}
 
}

<hr style="border:2px solid gray"></hr>

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch

<hr style="border:2px solid gray"></hr>

## Initializing a PyTorch Tensor - different approaches
#### 1. From a NumPy array

In [None]:
data_array_np = np.array([[1, 2], [3, 4]])

display(data_array_np)

data_array_np.shape

PyTorch tensor:

In [None]:
generic_tensor_torch = torch.from_numpy(data_array_np)

generic_tensor_torch

#### 2. Based on an existing tensor
- retains shape (e.g., the `generic_tensor_torch` given above)
- replace existing values with others, for example
    - 1's via `ones_like`
    - random values via `rand_like`

In [None]:
torch.ones_like(generic_tensor_torch) 

In [None]:
torch.rand_like(generic_tensor_torch, dtype=torch.float)
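To make the shape and dtype behaviour of the `*_like` functions concrete, here is a minimal sketch. The name `base_tensor` is a hypothetical stand-in for `generic_tensor_torch`. It holds integers, so `ones_like` returns integers too, while `rand_like` is given a floating-point dtype because it draws values from the interval [0, 1).

```python
import torch

# Hypothetical stand-in for generic_tensor_torch (integer values).
base_tensor = torch.tensor([[1, 2], [3, 4]])

# ones_like keeps both the shape and the dtype of the template tensor.
ones_tensor = torch.ones_like(base_tensor)

# rand_like keeps the shape; the dtype is overridden to a floating-point type.
random_tensor = torch.rand_like(base_tensor, dtype=torch.float)

print(base_tensor.dtype, ones_tensor.dtype, random_tensor.dtype)
print(ones_tensor.shape, random_tensor.shape)
```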

#### 3. Fill in a specific shape
- Note: you can have multiple dimensions (e.g., (2, 3, 5))

In [None]:
my_shape = (2, 3)

rand_tensor = torch.rand(my_shape)

rand_tensor

In [None]:
my_shape = (2, 3, 5)

three_dim_tensor = torch.rand(my_shape)

three_dim_tensor

<hr style="border:1px solid gray"></hr>

## Bookkeeping
- `tensor.shape`
- `tensor.dtype`
- `tensor.device`

In [None]:
print(f'Shape of tensor: {three_dim_tensor.shape}')
print(f'Datatype of tensor: {three_dim_tensor.dtype}')
print(f'Device tensor is stored on: {three_dim_tensor.device}')

<hr style="border:1px solid gray"></hr>

## Tensor operations
- https://pytorch.org/docs/stable/torch.html

#### Joining tensors `torch.cat`
- https://pytorch.org/docs/stable/generated/torch.cat.html#torch.cat

Concatenate `rand_tensor` with itself three times:
- `dim=0`: conceptually like adding more rows (same as NumPy)
- `dim=1`: conceptually like adding more columns (same as NumPy)

In [None]:
torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=0)

View as a pandas `DataFrame`:
- same idea as we did with NumPy before

In [None]:
pd.DataFrame(torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=0).numpy())

In [None]:
torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=1)

In [None]:
pd.DataFrame(torch.cat([rand_tensor, rand_tensor, rand_tensor], dim=1).numpy())
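The effect of `dim` is easiest to see in the resulting shapes. The sketch below uses a hypothetical tensor `block` with the same (2, 3) shape as `rand_tensor`: joining along `dim=0` triples the number of rows, and joining along `dim=1` triples the number of columns.

```python
import torch

# Hypothetical tensor with the same shape as rand_tensor: 2 rows, 3 columns.
block = torch.rand((2, 3))

stacked_rows = torch.cat([block, block, block], dim=0)     # more rows
stacked_columns = torch.cat([block, block, block], dim=1)  # more columns

print(stacked_rows.shape)     # torch.Size([6, 3])
print(stacked_columns.shape)  # torch.Size([2, 9])
```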

#### Element-wise multiplication
- `mul`

In [None]:
rand_tensor

Multiply `rand_tensor` by itself (`rand_tensor * rand_tensor`):

In [None]:
rand_tensor.mul(rand_tensor)

Accessing specific values

In [None]:
rand_tensor[0][0]

Multiply individual elements:

In [None]:
rand_tensor[0][0]*rand_tensor[0][0]
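Since `mul` works element by element, it should not be confused with a matrix product. The following sketch, using a small hypothetical tensor `square` with easy-to-check values, contrasts the two.

```python
import torch

# Hypothetical example values chosen so the results are easy to check by hand.
square = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

elementwise = square.mul(square)   # same result as square * square
matrix_product = square @ square   # rows-times-columns matrix product

print(elementwise)     # values: [[1, 4], [9, 16]]
print(matrix_product)  # values: [[7, 10], [15, 22]]
```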

## PyTorch and NumPy Interactions

Torch to NumPy:

In [None]:
rand_array = rand_tensor.numpy()
rand_array
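One detail worth knowing about these conversions: for CPU tensors, `Tensor.numpy()` and `torch.from_numpy` share the underlying memory rather than copying it, whereas `torch.tensor` makes a copy. The variable names in this sketch are hypothetical and only serve the illustration.

```python
import numpy as np
import torch

# Tensor -> array: the array is a view on the tensor's memory.
cpu_tensor = torch.ones(3)
shared_array = cpu_tensor.numpy()
cpu_tensor.add_(1.0)              # in-place update of the tensor
print(shared_array)               # the array reflects the update

# Array -> tensor: from_numpy shares memory, torch.tensor copies.
source_array = np.zeros(3)
shared_tensor = torch.from_numpy(source_array)
independent_tensor = torch.tensor(source_array)
source_array[0] = 5.0             # in-place update of the array
print(shared_tensor)              # sees the change
print(independent_tensor)         # unaffected copy
```

If an independent copy is wanted in the tensor-to-array direction, calling `.clone()` on the tensor before `.numpy()` is one option.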

## Using GPU

To use a GPU device instead of a CPU device, one needs to <font color='dodgerblue'>move a **specified** PyTorch tensor from its current device (i.e., CPU) to the GPU</font>.
 
(If you’re using Colab, allocate a GPU by going to Edit > Notebook Settings.)

Let's move our `rand_tensor` to GPU if possible:

In [None]:
if torch.cuda.is_available():
    print('CUDA is available - congrats!')
    tensor = rand_tensor.to('cuda')
    display(tensor.device)

    print('\nExample GPU calculation - adding the tensor to itself:')
    result_tensor = tensor + tensor
    display(result_tensor)
else:
    print('CUDA is not available - bummer.')
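A common alternative to the `if`/`else` branch above is to pick the device once and write the rest of the code against it. This is only a sketch of that pattern; the names `device` and `example_tensor` are hypothetical, and the code falls back to the CPU when no GPU is found.

```python
import torch

# Select the GPU when available, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

example_tensor = torch.ones((2, 3)).to(device)
doubled = example_tensor + example_tensor   # computed on the selected device

# A tensor must be on the CPU before it can be converted to a NumPy array.
doubled_array = doubled.cpu().numpy()
print(device)
print(doubled_array)
```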

#### Summary
1. `Torch` has a lot of similarity to `NumPy`
2. However, it can be used on a **GPU**
3. Some special "things" that make it unique
    - <font color='dodgerblue'>history accumulation</font> (coming below)

<hr style="border:2px solid gray"></hr>

<div class="alert block alert-info alert">

# <center>Neural Networks (NNs)</center>

## Background:
- https://en.wikipedia.org/wiki/Neural_network_(machine_learning)

<font color='dodgerblue'>"A machine learning model is a **function**, with **inputs** and **outputs**." [1]</font>

- a **collection of <font color='dodgerblue'>nested functions</font>**
    - these functions are **executed on input data**
    - these functions are defined by parameters (stored in PyTorch tensors)
        - **weights**, and 
        - **biases**

<br>

What is the difference between **weights** and **bias** values?

If we have a **linear <font color='dodgerblue'>"activation function"</font>**: $y = mx + b$
- **weight** (i.e., the weight for x): <font color='dodgerblue'>$m$</font>
    - for polynomial functions (e.g. linear equation), the weights are the coefficients (see "Extra Information" for nonlinear example)
<br>
- **bias** (i.e., the equation's bias): <font color='dodgerblue'>$b$</font>
    - offset - a constant term (consider the phase shifts we covered in SciPy lecture)

**What is an <font color='dodgerblue'>activation function</font>?**
- A function (surprise 🙂)
- Calculates a **node's output** (i.e., information) that is **passed** to the **next node**, using *its* specific input parameters (i.e., weights and bias)
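As a rough sketch of how weights, a bias, and an activation function combine in a single node: the inputs are multiplied by their weights, summed, offset by the bias, and the result is passed through the activation function. All numbers below are hypothetical, and the sigmoid is just one possible choice of activation function.

```python
import torch

# Hypothetical values for one node with three inputs.
inputs = torch.tensor([0.5, -1.0, 2.0])
weights = torch.tensor([0.1, 0.4, -0.2])
bias = torch.tensor(0.3)

# Weighted sum plus bias: 0.05 - 0.40 - 0.40 + 0.30 = -0.45
weighted_sum = torch.dot(weights, inputs) + bias

# The activation function turns the weighted sum into the node's output.
node_output = torch.sigmoid(weighted_sum)

print(weighted_sum, node_output)
```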

<br>

#### Additional Resources

##### Great Tutorial Series: **3Blue1Brown**
- https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&feature=shared
    - DeepLearning Chapter 1 (DL1): "What is a NN"
    - DL2: "Gradient descent, how neural networks learn"
    - DL3: "Backpropagation, step-by-step" (meta-level)
    - DL4: "Backpropagation calculus" (the mathematics/formulas)

##### TensorFlow Playground
- (For later)
- https://playground.tensorflow.org

**Sources**:
1. https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html

<p><img alt="neural network" width="800" src="00_images/31_machine_learning/deep_neural_network.png" align="center" hspace="10px" vspace="0px"></p>

Image Source: https://www.studytonight.com/post/understanding-deep-learning

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### Sigmoid Activation Functions
- What might a weighting factor (i.e., $w$) look like

Sigmoid Equation:
\begin{equation}
\huge y(x) = \frac{1}{1 + e^{-w x}}
\end{equation}

In [None]:
def weighted_sigmoid(x_data: np.ndarray, weight: float) -> np.ndarray:
    ''' Simple sigmoid function with a weight.

        Args:
            x_data: independent data
            weight: weighting factor that adjusts the equation's shape

        Returns:
            result: calculated results
    '''

    result = 1 / (1 + np.exp(-weight * x_data))
    return result

<div class="alert alert-block alert-warning">

Plot a sigmoid equation using different weights:

In [None]:
weight_list = [-1, 1, 5]

x_values = np.linspace(-5, 5)

fig, ax = plt.subplots()
for weight in weight_list:
    y_values = weighted_sigmoid(x_data=x_values, weight=weight)

    ax.plot(x_values, y_values, label=f'{weight}')

plt.legend(loc='right', title='weights')

<div class="alert alert-block alert-warning">

So, we see that the **weighting factor** can **adjust** the **function's shape**, and consequently **adjust** the **information** being **passed** from one node to another.
    
<hr style="border:1.5px dashed gray"></hr>

## NN Training

Training is done in 2 Steps

1. **Forward Propagation**
    - <font color='dodgerblue'>pass data</font> (observables, weights and biases) <font color='dodgerblue'>forward</font> to **predict target observable**
        - compute the **activation function** that connects each node
    - (Basically, running the model as it will be eventually used.)

<br>

2. **Backward Propagation**
    - adjusts parameters proportionally to the observable error (i.e., forward propagation's output/result vs. the target)
    - done by
        - traversing backwards from the output
        - collecting the **derivatives** (i.e., gradients) of the error with respect to the functions' parameters
        - **optimizing** the parameters using <font color='dodgerblue'>**gradient descent**</font>

**`torch.autograd`** is the workhorse for <font color='dodgerblue'>backward propagation</font>.

**Source**: https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html
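The two steps can be sketched for the linear example $y = mx + b$ used earlier. This is a minimal illustration under stated assumptions, not a recommended training recipe: the target data, the learning rate of 0.1, the 500 iterations, and the mean-squared-error measure are all hypothetical choices.

```python
import torch

# Synthetic target data generated from a known line (assumed for illustration).
x = torch.linspace(-1.0, 1.0, steps=20)
y_target = 2.0 * x + 0.5

# Parameters to be learned: weight m and bias b, both starting at zero.
m = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
learning_rate = 0.1

for step in range(500):
    # 1. Forward propagation: predict and measure the error.
    y_pred = m * x + b
    loss = ((y_pred - y_target) ** 2).mean()

    # 2. Backward propagation: gradients of the error w.r.t. m and b.
    loss.backward()

    # Gradient descent: step against the gradient, then reset the gradients.
    with torch.no_grad():
        m -= learning_rate * m.grad
        b -= learning_rate * b.grad
    m.grad.zero_()
    b.grad.zero_()

print(m.item(), b.item())  # should approach 2.0 and 0.5
```

The gradients are reset after every step because `backward()` adds to any values already stored in `grad`.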

### Tensor History

A key concept in PyTorch is that its <font color='dodgerblue'>tensor objects</font> can have a <font color='dodgerblue'>"history"</font> (i.e., **new data** that is **attached** to the object **after** it is used in a **calculation**).

For example:
- `torch.rand(*size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None,` **`requires_grad=False,`**` pin_memory=False) → Tensor`

- **`requires_grad=True`**: for every computation that follows, **`autograd`** will <font color='dodgerblue'>record the computation's history</font> in the output tensors

<br>

#### Example: A <font color='dodgerblue'>Single Layer Perceptron</font> (SLP)
- the simplest type of artificial NN
    -  classify binary categories

- https://en.wikipedia.org/wiki/Perceptron

<center>
<img alt="neural network" width="500" src="00_images/31_machine_learning/perceptron_english.png" align="center" hspace="10px" vspace="0px"></center>

<center>Image Source: https://commons.wikimedia.org/wiki/File:ArtificialNeuronModel_english.png</center>


<br>
Example Workflow:

1. Create input data (e.g., independent x data)
    - `requires_grad=True`
2. Create a simple model based on an equation: $y = sin(x)$
    - the activation function
3. Forward Propagation
4. Backward Propagation (i.e., first derivatives)

<br>

#### 1. Create input data
   - Everything is weighted equally (i.e., no weights)

In [None]:
input_data = torch.linspace(0.0, 2.0*torch.pi, steps=25, requires_grad=True)

input_data

Notice the `requires_grad=True` in the output.

#### 2. Create a simple model based on an equation: $y = sin(x)$ and
#### 3. Forward Propagation

<font color='dodgerblue'>Activation Function: $y = a*sin(b*x)$</font>

This is a <font color='dodgerblue'>"**Single Layer Perceptron**"</font> NN:
- **1 input layer** (n features $\rightarrow$ n nodes),
- **1 transfer function** (basically an activation function that gets the data ready)
- **1 node (in 1 layer)** $\rightarrow$ output
- **activation function** (i.e., $a*sin(b*x)$)
    - a = 1
    - b = 1

In [None]:
def sine_activation_func(x: torch.Tensor, a: float, b: float) -> torch.Tensor:
    ''' Sine function for use in PyTorch neural networks.
    '''
    return a * torch.sin(b * x)

In [None]:
sin_model = sine_activation_func(x=input_data, a=1, b=1)

sin_model

Two things to note:
- The `grad_fn=<MulBackward0>` tells us that this object <font color='dodgerblue'>**is accumulating history**</font> (the most recent operation is the multiplication by `a`).
- That is the **first** output of our <font color='dodgerblue'>**forward propagation**</font>.

<br><br>

Okay, let's compare this to what would have happened if `requires_grad=False`:

In [None]:
input_data_false = torch.linspace(0.0, 2.0*torch.pi, steps=25, requires_grad=False)

print(input_data_false)

torch.sin(input_data_false)

Notice this changes the object - **no** `grad_fn` - thus, **no history accumulation**.
<br><br>

#### History Accumulation Demonstration

Okay, now back to our `sin_model`.

Since <font color='dodgerblue'>history is accumulated</font> in `sin_model`, we can now use `detach()` to <font color='dodgerblue'>grab</font> only the tensor values:

In [None]:
sin_model.detach()

**Visualize** the independent (`input_data`) and dependent (`sin_model`) data:

In [None]:
plt.plot(input_data.detach(), sin_model.detach())
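The reason for the `detach()` calls can be shown directly: a tensor that carries history cannot be converted to a NumPy array, while its detached version can. The sketch below uses hypothetical names (`x`, `y`, `y_values`) and a shorter input than the lecture's example.

```python
import torch

x = torch.linspace(0.0, 1.0, steps=5, requires_grad=True)
y = torch.sin(x)           # carries a grad_fn (history)
y_values = y.detach()      # same values, but no history

print(y.grad_fn)           # a SinBackward0 object
print(y_values.grad_fn)    # None

# Converting a tensor that requires gradients raises a RuntimeError.
try:
    y.numpy()
    conversion_failed = False
except RuntimeError:
    conversion_failed = True

# The detached tensor converts without complaint.
y_array = y_values.numpy()
print(conversion_failed, y_array.shape)
```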

##### <font color='dodgerblue'>Adding to the History</font>

Let's include **an additional operation** on the `sin_model` by summing values:

- `torch.sum()`: https://pytorch.org/docs/stable/generated/torch.sum.html

In [None]:
sin_sum = sin_model.sum()
print(sin_sum)

In [None]:
print(sin_model) 

In [None]:
print(sin_sum.grad_fn)
print(sin_sum.grad_fn.next_functions)
print(sin_sum.grad_fn.next_functions[0][0].next_functions)
print(sin_sum.grad_fn.next_functions[0][0].next_functions[0][0].next_functions)
print(sin_sum.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions) 
print(sin_sum.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions) 

We see that if we access the <font color='dodgerblue'>"history"</font> portion of the object via `grad_fn`, we get a short explanation of what it is
- from **most recent** operation $\rightarrow$ **oldest** operation
    - `SumBackward0` $\rightarrow$ `sin_model.sum()`
    - `MulBackward0` $\rightarrow$ `a*sin`
    - `SinBackward0` $\rightarrow$ `torch.sin(b*x)`
    - `MulBackward0` $\rightarrow$ `b*x`
        - Note: the above three operations come from the line: `sin_model = sine_activation_func(x=input_data, a=1, b=1)`
    - `AccumulateGrad` $\rightarrow$ `torch.linspace(0.0, 2.0*torch.pi, steps=25, requires_grad=True)`
    - `()` $\rightarrow$ starting point
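The chained `next_functions` lookups can also be written as a loop. The helper `list_history` below is a hypothetical convenience function, not part of PyTorch. It follows only the first recorded parent at each step, which is enough for a simple chain like this one but would miss branches in a larger model.

```python
import torch

def list_history(tensor: torch.Tensor) -> list:
    ''' Collect operation names from the most recent to the oldest.

        Hypothetical helper: follows only the first non-empty parent.
    '''
    names = []
    node = tensor.grad_fn
    while node is not None:
        names.append(type(node).__name__)
        parents = [fn for fn, _ in node.next_functions if fn is not None]
        node = parents[0] if parents else None
    return names

x = torch.linspace(0.0, 2.0 * torch.pi, steps=25, requires_grad=True)
total = (1 * torch.sin(1 * x)).sum()   # same operations as sin_model.sum()

print(list_history(total))
```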

With that basic understanding in place, we can now continue to the <font color='dodgerblue'>backwards propagation</font> idea.

### Taking the first derivative of sin(x)
- a.k.a "gradient"

\begin{equation}
    \frac{d}{dx} sin(x) = cos(x)
\end{equation}

##### `Autograd`: Computing Gradients

- `autograd` is used when calling the `backward()` function (i.e., `torch.autograd.backward`)
- https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### `autograd.grad`

Both `autograd.grad` and `autograd.backward` compute derivatives. However, they differ in what they do with the result.

`autograd.grad`: computes gradients for specific tensors and returns them without affecting others (i.e., it **does not** accumulate the result in the input's `grad` property)

Source: https://www.geeksforgeeks.org/understanding-pytorchs-autogradgrad-and-autogradbackward/

In [None]:
## Create a new object
test_sin_model = torch.sin(input_data)
test_sin_sum = test_sin_model.sum()
test_grad = torch.autograd.grad(outputs=test_sin_sum, inputs=input_data)

display(test_grad)
display(test_grad[0].sum())
# print(test_grad[0].grad_fn) ## prints None: demonstrates the lack of history
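The difference between the two functions can be made visible on a small hypothetical tensor `x`: `autograd.grad` hands the gradient back and leaves `x.grad` untouched, while each call to `backward()` adds its result to `x.grad`.

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)

# autograd.grad returns the gradient as a tuple and does not fill x.grad.
returned_grad = torch.autograd.grad(outputs=torch.sin(x).sum(), inputs=x)[0]
grad_after_autograd_grad = x.grad      # still None

# backward() stores the gradient in x.grad ...
torch.sin(x).sum().backward()
grad_after_first_backward = x.grad.clone()

# ... and a second call adds to what is already there.
torch.sin(x).sum().backward()
grad_after_second_backward = x.grad.clone()

print(returned_grad)
print(grad_after_autograd_grad)
print(grad_after_first_backward)
print(grad_after_second_backward)
```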

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

#### 1. `backward()`: compute gradients

- will be added to the history

In [None]:
sin_sum.backward()
sin_sum

- `1.4504e-07`: **sum** of all **elements** in the `sin_model` **tensor**
    - Recall that `sin_sum` was created using `sin_sum = sin_model.sum()`
    - Thus, we have to go back to `sin_model`
- `<SumBackward0>`: name of the PyTorch function (i.e., sum during backward propagation)

Let's verify that `1.4504e-07` is indeed the sum of the tensor elements:

In [None]:
sin_model

In [None]:
print(sin_model.sum()) 

So, everything looks okay.

<br>

Now what did `backward()` do with the history?

`sin_sum.backward()` **<font color='dodgerblue'>populated the `grad` property</font>** of the <font color='dodgerblue'>**original input data**</font> as part of its **accumulated history**.

(A property, or attribute, of an object is defined by its Python class.)

In [None]:
print(f'Input values:'
      f'\n{input_data}')
print()
print(f'Gradient values:'
      f'\n{input_data.grad}')
print()
print(f'Detached from the input_data:'
      f'\n{input_data.grad.detach()}')

##### Side note: Verification of the derivative of sin

We can now verify for ourselves that `backward()` produced the expected result:

In [None]:
first_derivative_sin = np.cos(input_data.detach().numpy())
first_derivative_sin

In [None]:
plt.plot(input_data.detach(), input_data.grad.detach(), linestyle='-', linewidth=10)
plt.plot(input_data.detach(), first_derivative_sin, linestyle='--', linewidth=5)
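Besides the visual comparison, the agreement can be checked numerically. This sketch rebuilds the example with a hypothetical tensor `x` so that it runs on its own, and compares the stored gradient against $cos(x)$ with `torch.allclose`.

```python
import torch

x = torch.linspace(0.0, 2.0 * torch.pi, steps=25, requires_grad=True)

# Forward propagation followed by backward propagation.
torch.sin(x).sum().backward()

# The analytical first derivative of sin(x) is cos(x).
expected_gradient = torch.cos(x.detach())

gradients_match = torch.allclose(x.grad, expected_gradient)
print(gradients_match)
```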

What is still missing is the **weight** and **bias optimization** portion of the workflow.

# Take-home

1. PyTorch is similar to NumPy
    - similar functions
    - `torch.Tensor` instead of a NumPy `array`
    - able to use GPUs (not just CPUs)
2.  Basics of neural networks
    - forward propagation
    - backwards propagation
3. History accumulation
4. Example: A Single Layer Perceptron (SLP)

The above forms the foundation for understanding how PyTorch is "typically" used in a ML project -- a future lecture.