Credit to Daniel Godoy. Link to article: https://towardsdatascience.com/understanding-pytorch-with-an-example-a-step-by-step-tutorial-81fc5f8c4e8e 

Simple Linear Regression model:
![function](https://miro.medium.com/max/188/1*a7_GUQQT5BjvAhh3qq0JwA.png)
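
In text form, the model is shown below. This assumes the image uses the same notation as the linked article: intercept $a$, slope $b$, and noise $\epsilon$.

$$
y = a + b x + \epsilon
$$

The data generated below follows this model with $a = 1$, $b = 2$, and $\epsilon$ drawn from a normal distribution with standard deviation 0.1.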

### Numpy Data Generation
Generate data and split into train/validation sets.

```python
import numpy as np
import torch
```

```python
# Data Generation
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + .1 * np.random.randn(100, 1)

# Shuffle indices
idx = np.arange(100)
np.random.shuffle(idx)

# Split into train/validation sets (first 80 shuffled indices for training, last 20 for validation)
train_idx = idx[:80]
val_idx = idx[80:]

# Generate train/validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
```
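
The following is a quick sanity check of the shuffled-index split. It is a sketch that regenerates the same data with the same seed so it runs on its own. It checks that the two index sets do not overlap and that, together, they use every one of the 100 points exactly once.

```python
import numpy as np

# Regenerate the same data and shuffled split as above
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + .1 * np.random.randn(100, 1)

idx = np.arange(100)
np.random.shuffle(idx)
train_idx, val_idx = idx[:80], idx[80:]

x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

# 80/20 split, no point in both sets, every point used exactly once
print(x_train.shape, x_val.shape)                     # (80, 1) (20, 1)
print(len(np.intersect1d(train_idx, val_idx)))        # 0
print(np.array_equal(np.sort(idx), np.arange(100)))   # True
```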

## Gradient Descent
Four steps:
1. Compute the loss
2. Compute the gradients
3. Update the parameters
4. Restart the process

### 1. Compute the loss
Mean Squared Error (MSE) is a common loss function for regression problems.
![MSE Function](https://miro.medium.com/max/345/1*7fmJUcQT578OBfX7Q8hluQ.png)
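
Written out for the model above, the MSE is as follows. Here $N$ is the number of training points and $\hat{y}_i = a + b x_i$ is the prediction. This is a reconstruction of the image that matches `loss = (error ** 2).mean()` in the NumPy code below.

$$
\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - a - b x_i\right)^2
$$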


### 2. Compute the Gradients
Compute the gradients of the loss with respect to the coefficients a and b.
![compute gradients function](https://miro.medium.com/max/700/1*YvTj1B-h1gzSI5F24OgrrA.png)
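
The gradients follow from applying the chain rule to the MSE. The inner term $y_i - a - b x_i$ has derivative $-1$ with respect to $a$ and $-x_i$ with respect to $b$. This gives the expressions below, which are the ones computed as `a_grad` and `b_grad` in the NumPy code below.

$$
\frac{\partial \text{MSE}}{\partial a} = -2 \cdot \frac{1}{N}\sum_{i=1}^{N}\left(y_i - a - b x_i\right), \qquad
\frac{\partial \text{MSE}}{\partial b} = -2 \cdot \frac{1}{N}\sum_{i=1}^{N} x_i\left(y_i - a - b x_i\right)
$$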

### 3. Update the Parameters
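Use the gradients to update the parameters, with the goal of minimizing the loss. Each parameter takes a small step, scaled by the learning rate, in the direction opposite to its gradient.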
![Update Coefficients](https://miro.medium.com/max/209/1*eWnUloBYcSNPRBzVcaIr1g.png)
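
The sketch below illustrates steps 2 and 3 on synthetic data generated the same way. The helper names `mse` and `gradients`, the 80-point sample, and the starting values are illustrative and do not come from the article. The code checks the analytic gradients against a central finite-difference estimate. It then takes one update step with a learning rate of 0.1 and confirms that the loss went down.

```python
import numpy as np

def mse(a, b, x, y):
    # Mean squared error of the line y_hat = a + b * x
    return ((y - (a + b * x)) ** 2).mean()

def gradients(a, b, x, y):
    # Analytic gradients of the MSE with respect to a and b (step 2)
    error = y - (a + b * x)
    return -2 * error.mean(), -2 * (x * error).mean()

np.random.seed(42)
x = np.random.rand(80, 1)
y = 1 + 2 * x + .1 * np.random.randn(80, 1)
a, b, lr = 0.0, 0.0, 0.1

a_grad, b_grad = gradients(a, b, x, y)

# Central finite differences should agree with the analytic gradients
h = 1e-6
a_num = (mse(a + h, b, x, y) - mse(a - h, b, x, y)) / (2 * h)
b_num = (mse(a, b + h, x, y) - mse(a, b - h, x, y)) / (2 * h)
print(np.isclose(a_grad, a_num), np.isclose(b_grad, b_num))

# One gradient-descent update (step 3): move each parameter against its gradient
loss_before = mse(a, b, x, y)
a, b = a - lr * a_grad, b - lr * b_grad
loss_after = mse(a, b, x, y)
print(loss_before, loss_after)
```

Repeating the update (step 4) with fresh gradients each time is exactly what the NumPy loop below does for 1000 epochs.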

### 4. Use updated parameters and restart the process
Repeating these steps over and over is what training a model means. One full pass over the training data is called an epoch.

## Numpy Implementation
A NumPy implementation comes first. It shows how much has to be done by hand, in contrast to how easily the same model runs in PyTorch.

```python
np.random.seed(42)
a = np.random.randn(1)
b = np.random.randn(1)

print(a, b)

# Set learning rate
lr = 1e-1

# Define num epochs
n_epochs = 1000

for epoch in range(n_epochs):
    # Compute predicted output
    yhat = a + b * x_train

    # How wrong is the model?
    error = (y_train - yhat)
    # Compute MSE
    loss = (error ** 2).mean()

    # Compute gradient for a and b params
    a_grad = -2 * error.mean()
    b_grad = -2 * (x_train * error).mean()

    # Update parameters using gradients and the lr
    a = a - lr * a_grad
    b = b - lr * b_grad

    print(a, b)

# Check model with SKLearn
from sklearn.linear_model import LinearRegression
linr = LinearRegression()
linr.fit(x_train, y_train)
print('SKLearn Results:')
print(linr.intercept_, linr.coef_[0])
```

Output (truncated):

```
[0.49671415] [-0.1382643]
[0.80119529] [0.04511107]
[1.02745273] [0.1880898]
[1.19494837] [0.30062446]
[1.31831128] [0.39019766]
[1.40853768] [0.46243529]
[1.47389293] [0.52156753]
[1.52058962] [0.57077536]
[1.55329724] [0.61245104]
[1.57552532] [0.64839396]
[1.58991148] [0.67995779]
[1.59843788] [0.70816115]
[1.60259402] [0.7337708]
[1.60349903] [0.75736412]
[1.60199366] [0.77937615]
[1.59870941] [0.8001349]
[1.59412047] [0.81988791]
[1.58858283] [0.83882224]
[1.58236358] [0.85707942]
[1.57566302] [0.8747668]
[1.56863126] [0.89196598]
[1.56138068] [0.90873921]
[1.55399527] [0.92513416]
[1.54653776] [0.94118756]
[1.53905484] [0.95692788]
[1.53158117] [0.97237736]
[1.52414239] [0.98755358]
[1.51675733] [1.00247055]
[1.50943976] [1.01713964]
[1.50219959] [1.03157019]
[1.49504388] [1.04577]
[1.48797756] [1.05974573]
[1.4810039] [1.07350312]
[1.47412502] [1.08704728]
[1.4673421] [1.10038276]
[1.46065567] [1.11351372]
[1.45406576] [1.12644402]
[1.44757202] [1.13917725]
...
[1.02510604] [1.96590166]
[1.02508239] [1.96594793]
[1.0250591] [1.9659935]
[1.02503617] [1.96603839]
[1.02501357] [1.96608259]
[1.02499132] [1.96612613]
[1.02496941] [1.96616901]
[1.02494783] [1.96621124]
[1.02492657] [1.96625283]
[1.02490564] [1.9662938]
[1.02488502] [1.96633414]
[1.02486471] [1.96637388]
[1.02484471] [1.96641302]
[1.02482501] [1.96645156]
[1.02480561] [1.96648952]
[1.0247865] [1.96652691]
[1.02476768] [1.96656374]
[1.02474914] [1.9666]
[1.02473089] [1.96663572]
[1.02471291] [1.96667091]
[1.0246952] [1.96670555]
[1.02467776] [1.96673968]
[1.02466058] [1.96677329]
[1.02464367] [1.96680639]
[1.02462701] [1.96683899]
[1.0246106] [1.9668711]
[1.02459443] [1.96690273]
[1.02457852] [1.96693388]
[1.02456284] [1.96696455]
[1.0245474] [1.96699476]
[1.02453219] [1.96702452]
[1.02451721] [1.96705383]
[1.02450246] [1.96708269]
[1.02448793] [1.96711112]
[1.02447362] [1.96713912]
[1.02445953] [1.96716669]
[1.02444565] [1.96719385]
[1.02443198] [1.9672206]
[1.02441852] [1.96724695]
...
[1.02354304] [1.96896]
[1.023543] [1.96896007]
[1.02354297] [1.96896014]
[1.02354294] [1.9689602]
[1.0235429] [1.96896027]
[1.02354287] [1.96896033]
[1.02354284] [1.96896039]
[1.02354281] [1.96896046]
[1.02354278] [1.96896052]
[1.02354275] [1.96896058]
[1.02354272] [1.96896063]
[1.02354269] [1.96896069]
[1.02354266] [1.96896075]
[1.02354263] [1.96896081]
[1.0235426] [1.96896086]
[1.02354257] [1.96896092]
[1.02354254] [1.96896097]
[1.02354252] [1.96896102]
[1.02354249] [1.96896107]
[1.02354246] [1.96896113]
[1.02354244] [1.96896118]
[1.02354241] [1.96896123]
[1.02354239] [1.96896127]
[1.02354236] [1.96896132]
[1.02354234] [1.96896137]
[1.02354231] [1.96896142]
[1.02354229] [1.96896146]
[1.02354227] [1.96896151]
[1.02354225] [1.96896155]
[1.02354222] [1.9689616]
[1.0235422] [1.96896164]
[1.02354218] [1.96896168]
[1.02354216] [1.96896173]
[1.02354214] [1.96896177]
[1.02354212] [1.96896181]
[1.02354209] [1.96896185]
[1.02354207] [1.96896189]
[1.02354205] [1.96896193]
...
```
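
The scikit-learn result printed at the end of the cell does not appear in the truncated output above. As a cross-check that needs only NumPy, the ordinary least-squares solution can be computed directly with `np.linalg.lstsq`. This is a swapped-in technique, not the article's. Because both approaches minimize the same MSE on the same training data, the result should agree closely with the final `a` and `b` that gradient descent settles on. This sketch regenerates the training split with the same seed so it runs on its own.

```python
import numpy as np

# Regenerate the same training split used above
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + .1 * np.random.randn(100, 1)
idx = np.arange(100)
np.random.shuffle(idx)
x_train, y_train = x[idx[:80]], y[idx[:80]]

# Design matrix: a column of ones for the intercept a, then x for the slope b
X = np.hstack([np.ones_like(x_train), x_train])

# Least-squares solution of X @ [a, b] ≈ y_train
coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)
a_ls, b_ls = coef.ravel()
print(a_ls, b_ls)
```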

## PyTorch
Time to TORCH it.

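* A scalar has zero dimensions
* A vector has one dimension
* A matrix has two dimensions
* A tensor has three or more dimensions

In PyTorch, all of these are represented by the same `torch.Tensor` type, so "tensor" is used for data with any number of dimensions.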
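
The sketch below shows one PyTorch object of each kind and reports its number of dimensions (`.dim()`), its shape, and its type. The values are arbitrary.

```python
import torch

scalar = torch.tensor(3.1415)            # zero dimensions
vector = torch.tensor([1.0, 2.0, 3.0])   # one dimension
matrix = torch.ones((2, 3))              # two dimensions
tensor = torch.randn((2, 3, 4))          # three dimensions

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor", tensor)]:
    # .dim() is the number of dimensions; .shape is the size of each one
    print(name, t.dim(), tuple(t.shape), type(t).__name__)
```

Every line reports the same class, `Tensor`. Only the number of dimensions differs.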

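```python
import torch
import torch.optim as optim
import torch.nn as nn
#from torchviz import make_dot

# Use the GPU if one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Convert the NumPy arrays to float32 tensors and send them to the chosen device
x_train_tensor = torch.from_numpy(x_train).float().to(device)
y_train_tensor = torch.from_numpy(y_train).float().to(device)

print(type(x_train), type(x_train_tensor), x_train_tensor.type())
```

Output:

```
<class 'numpy.ndarray'> <class 'torch.Tensor'> torch.cuda.FloatTensor
```

On a machine without a GPU, `device` is `'cpu'` and the last value printed is `torch.FloatTensor` instead.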
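
Two details of the conversion above are easy to miss. First, `torch.from_numpy` shares memory with the original array and keeps NumPy's default float64 type. Second, `.float()` produces a separate float32 copy. The sketch below demonstrates both, using a small throwaway array rather than the article's data. It also shows the `.cpu().numpy()` round trip back to NumPy.

```python
import numpy as np
import torch

arr = np.random.rand(3, 1)          # NumPy defaults to float64
shared = torch.from_numpy(arr)      # shares memory with arr, stays float64
converted = shared.float()          # separate float32 copy, as with .float() above

arr[0, 0] = -1.0                    # modify the NumPy array in place
print(shared.dtype, shared[0, 0].item())        # float64 tensor sees the change
print(converted.dtype, converted[0, 0].item())  # float32 copy keeps the old value

# Send to whichever device is available, then back to NumPy;
# .cpu() comes first because NumPy cannot read GPU memory
device = 'cuda' if torch.cuda.is_available() else 'cpu'
back = converted.to(device).cpu().numpy()
print(back.dtype)
```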
