# PyTorch introduction
The fundamental building block in PyTorch is the *tensor*. A PyTorch tensor is conceptually identical to a NumPy array: an n-dimensional array of numbers. Most computations you would perform with NumPy can also be carried out with PyTorch tensors.

Unlike NumPy arrays, however, tensors can run on GPUs to accelerate numeric computations.
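
To show how closely tensors mirror NumPy arrays, the following sketch converts between the two with `torch.from_numpy` and `Tensor.numpy()`. It runs the same computation in both libraries. For CPU tensors the two objects share the underlying memory, so an in-place change to one is visible in the other. The array values are arbitrary examples.

```python
import numpy as np
import torch

# A small NumPy array with arbitrary example values
a = np.array([[1.0, 2.0], [3.0, 4.0]])

# torch.from_numpy shares memory with the NumPy array (CPU only)
t = torch.from_numpy(a)

# The same computation expressed in NumPy and in PyTorch
np_result = a @ a.T + 1
torch_result = t @ t.T + 1
print(np.allclose(np_result, torch_result.numpy()))  # True

# An in-place change through the tensor is visible in the array
t[0, 0] = 10.0
print(a[0, 0])  # 10.0
```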

In [None]:
import torch

In [None]:
torch.cuda.is_available()  # True if a CUDA-capable GPU can be used

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # Prefer the GPU, fall back to the CPU
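
Tensors created without a `device` argument live on the CPU. The following is a minimal sketch that reuses the `device` selection above. It moves an existing tensor with `Tensor.to` and then checks where the tensor lives. The tensor names are only illustrative.

```python
import torch

# Same device selection as above: GPU if available, otherwise CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

cpu_tensor = torch.ones(2, 3)   # created on the CPU by default
moved = cpu_tensor.to(device)   # copy on `device` (or the same tensor if already there)

print(cpu_tensor.device)  # cpu
print(moved.device.type)  # cuda or cpu, depending on the machine

# Operands of an operation must live on the same device
result = moved + torch.ones(2, 3, device=device)
```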

In [None]:
# Create a random 2x3 matrix with entries drawn from a standard normal distribution
x = torch.randn(2, 3, device=device)
x

In [None]:
x + 1  # The scalar 1 is broadcast and added to every element

In [None]:
torch.matmul(x, x.T)  # Matrix multiplication: (2x3) @ (3x2) -> 2x2

In [None]:
x @ x.T  # The @ operator is shorthand for torch.matmul

In [None]:
x * x # Elementwise multiplication
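
Because `x` has shape 2×3, the operations above produce results of different shapes. This sketch prints each shape and adds a fixed seed for reproducibility. For comparison, it also includes the transposed product `x.T @ x`.

```python
import torch

torch.manual_seed(0)  # fixed seed so the example is reproducible
x = torch.randn(2, 3)

print((x + 1).shape)    # torch.Size([2, 3]): the scalar is broadcast to every element
print((x @ x.T).shape)  # torch.Size([2, 2]): (2x3) @ (3x2)
print((x.T @ x).shape)  # torch.Size([3, 3]): (3x2) @ (2x3)
print((x * x).shape)    # torch.Size([2, 3]): elementwise, shapes must match or broadcast

# torch.matmul and the @ operator compute the same result
print(torch.allclose(torch.matmul(x, x.T), x @ x.T))  # True
```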

## PyTorch: Autograd
Manually implementing the backward pass for large networks quickly becomes complicated. PyTorch's `autograd` package does this for us using *automatic differentiation*. The forward pass defines a computational graph. In this graph, nodes are tensors and edges are functions that produce output tensors from input tensors. Backpropagating through the graph computes the gradients.

The only thing we need to do is specify `requires_grad=True` when constructing a tensor.

If `x` is a tensor with `requires_grad=True`, then after backpropagation `x.grad` is a tensor holding the gradient of some scalar value (typically the loss) with respect to `x`.
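
The following is a minimal sketch of this behavior. It uses the arbitrary scalar `(x ** 2).sum()`, whose gradient with respect to `x` is `2 * x`.

```python
import torch

# A leaf tensor whose gradient we want
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar built from x: s = x1^2 + x2^2 + x3^2
s = (x ** 2).sum()

# Backpropagate from the scalar; the result is stored in x.grad
s.backward()

print(x.grad)  # tensor([2., 4., 6.]), i.e. ds/dx = 2 * x
```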

In [None]:
w = torch.tensor([0.1], requires_grad=True)  # weight (learnable)
b = torch.tensor([2.0], requires_grad=True)  # bias (learnable)

In [None]:
x = torch.tensor([0.0])  # input
y = torch.tensor([1.0])  # target (true label)

Forward pass: compute the predicted `y` using operations on tensors.

Since `w` and `b` have `requires_grad=True`, operations involving these tensors cause PyTorch to build a computational graph. This graph allows gradients to be computed automatically.

Since we are no longer implementing the backward pass by hand, we don't need to keep references to intermediate values.
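
You can inspect the graph built during the forward pass through the `grad_fn` attribute of each result. The sketch below repeats the linear model from this section. Exact `grad_fn` class names (such as `AddBackward0`) are internal PyTorch details and may differ between versions.

```python
import torch

w = torch.tensor([0.1], requires_grad=True)
b = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([0.0])  # input: does not require gradients

y_pred = w * x + b

print(w.is_leaf, w.grad_fn)           # True None: user-created leaf tensors have no grad_fn
print(x.requires_grad, x.grad_fn)     # False None: x is a constant input, not a tracked result
print(y_pred.requires_grad)           # True: it depends on w and b
print(type(y_pred.grad_fn).__name__)  # e.g. AddBackward0, the last operation (+ b)
```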

In [None]:
y_pred = w * x + b  # Linear regression: weight * input + bias

In [None]:
print(f'True label: {y}')
print(f'Predicted: {y_pred}')

In [None]:
loss = (y_pred - y).pow(2)  # Squared error
loss

In [None]:
loss.backward()
print(f'Gradient b: {b.grad}')
print(f'Gradient w: {w.grad}')
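
For this tiny example you can check the gradients by hand. Take the squared-error loss $L = (wx + b - y)^2$ and the values $w = 0.1$, $b = 2$, $x = 0$, $y = 1$ used above:

```latex
\begin{aligned}
\frac{\partial L}{\partial b} &= 2\,(wx + b - y) = 2\,(0.1 \cdot 0 + 2 - 1) = 2, \\
\frac{\partial L}{\partial w} &= 2\,(wx + b - y)\,x = 2 \cdot 1 \cdot 0 = 0.
\end{aligned}
```

These values should match the printed `b.grad` and `w.grad`. The gradient of `w` vanishes because the input `x` is zero.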

In [None]:
# Manually zero the gradients after the backward pass: PyTorch accumulates gradients in .grad
w.grad.zero_()
b.grad.zero_()
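
Zeroing is needed because PyTorch *accumulates* gradients: each call to `backward()` adds to `.grad` instead of overwriting it. This is a minimal sketch with the same values as above. It runs two backward passes without zeroing in between.

```python
import torch

w = torch.tensor([0.1], requires_grad=True)
b = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([0.0])
y = torch.tensor([1.0])

for _ in range(2):
    loss = (w * x + b - y).pow(2)  # rebuild the graph on each forward pass
    loss.backward()                # adds dL/db = 2 to b.grad each time

accumulated = b.grad.clone()
print(accumulated)  # tensor([4.]): two accumulated contributions of 2

b.grad.zero_()  # reset before the next pass
print(b.grad)   # tensor([0.])
```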

## PyTorch: nn
For large neural networks, raw autograd can be a bit too low-level. 

When building neural networks, we usually arrange the computation into layers, some of which have learnable parameters that are optimized during training.

In PyTorch, the `nn` package defines a set of *Modules*, which are roughly equivalent to neural network layers.
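
As a small illustration, a single `torch.nn.Linear` module computes `x @ weight.T + bias` and owns its `weight` and `bias` as learnable parameters. The sizes below (4 inputs, 3 outputs, a batch of 2) are arbitrary.

```python
import torch

layer = torch.nn.Linear(4, 3)  # 4 input features, 3 output features

print(layer.weight.shape)  # torch.Size([3, 4]): (out_features, in_features)
print(layer.bias.shape)    # torch.Size([3])

x = torch.randn(2, 4)      # a batch of 2 samples
out = layer(x)
print(out.shape)           # torch.Size([2, 3])

# The module is equivalent to an explicit affine map
print(torch.allclose(out, x @ layer.weight.T + layer.bias, atol=1e-6))  # True
```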

In [None]:
import torch

N = 64 # N is batch size
D_in = 1000 # D_in is input dimension
H = 100 # H is hidden dimension
D_out = 10 # D_out is output dimension.

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

In [None]:
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),   # input layer -> hidden layer
    torch.nn.ReLU(),            # nonlinearity
    torch.nn.Linear(H, D_out),  # hidden layer -> output layer
)

In [None]:
loss_fn = torch.nn.MSELoss()  # reduction: 'mean' (default), 'sum' or 'none'
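
The `reduction` argument controls how the per-element squared errors are combined. A small sketch with hand-picked values:

```python
import torch

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 0.0, 0.0])

# Per-element squared errors: [0, 4, 9]
none_loss = torch.nn.MSELoss(reduction='none')(pred, target)  # tensor([0., 4., 9.])
sum_loss = torch.nn.MSELoss(reduction='sum')(pred, target)    # tensor(13.)
mean_loss = torch.nn.MSELoss()(pred, target)                  # tensor(4.3333), the default 'mean'

print(none_loss, sum_loss, mean_loss)
```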

In [None]:
y_pred = model(x)
y_pred[0], y[0]  # Compare prediction and target for the first sample

In [None]:
loss = loss_fn(y_pred, y)
loss.item()

In [None]:
loss.backward()

In [None]:
# Update the weights using gradient descent. Each parameter is a tensor, so
# we can access its gradient like we did before. The update is wrapped in
# torch.no_grad() so that autograd does not track it.
learning_rate = 1e-1
with torch.no_grad():
    for param in model.parameters():
        param -= learning_rate * param.grad
# Reset the gradients before the next backward pass
model.zero_grad()
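
The cell above performs a single update. Training repeats the forward pass, backward pass and update in a loop. The following is a minimal sketch with a fixed seed and an arbitrary 50 steps. The loss on the (random) training data should decrease.

```python
import torch

torch.manual_seed(0)  # reproducible random data and initialization

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss()
learning_rate = 1e-1

losses = []
for step in range(50):
    loss = loss_fn(model(x), y)  # forward pass
    losses.append(loss.item())

    model.zero_grad()            # clear gradients from the previous step
    loss.backward()              # backward pass

    with torch.no_grad():        # the update itself must not be tracked
        for param in model.parameters():
            param -= learning_rate * param.grad

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```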

## PyTorch: Custom nn Modules
You can define your own Modules by subclassing `nn.Module` and implementing its `forward` method.

### Constructor:
In the constructor we instantiate two `nn.Linear` modules and assign them as member variables, which registers them (and their parameters) with the module.
### Forward function:
In the forward function we accept a Tensor of input data and we must return a Tensor of output data. 
We can use Modules defined in the constructor as well as arbitrary (differentiable) operations on Tensors.

In [None]:
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # torch.relu is an ordinary differentiable tensor operation,
        # so no ReLU module needs to be created on every call
        h_relu = torch.relu(self.linear1(x))
        y_pred = self.linear2(h_relu)
        return y_pred

In [None]:
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

In [None]:
model = TwoLayerNet(D_in, H, D_out)
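
The constructor assigns the `nn.Linear` modules as attributes, so `nn.Module` registers them as submodules. As a result, `model.parameters()` finds their weights and biases automatically. The sketch below restates the class so the block runs on its own and uses the sizes from this section.

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)  # registered as a submodule
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(torch.relu(self.linear1(x)))

model = TwoLayerNet(1000, 100, 10)

for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# linear1.weight (100, 1000)
# linear1.bias (100,)
# linear2.weight (10, 100)
# linear2.bias (10,)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 101110
```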

In [None]:
loss_fn = torch.nn.MSELoss()

In [None]:
# Define the optimization algorithm to be used (Stochastic Gradient Descent):
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
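
With its default settings (no momentum, no weight decay), `optimizer.step()` for SGD applies the same update as the manual `param -= learning_rate * param.grad` loop from the `nn` section. A small sketch checks this on a single illustrative parameter tensor.

```python
import torch

param = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
optimizer = torch.optim.SGD([param], lr=1e-1)

loss = (param ** 2).sum()  # gradient is 2 * param = [2., -4.]
optimizer.zero_grad()
loss.backward()

expected = param.detach() - 1e-1 * param.grad  # manual update, computed before step()
optimizer.step()

print(param.detach())                            # approximately [0.8, -1.6]
print(torch.allclose(param.detach(), expected))  # True
```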

## Training

In [None]:
y_pred = model(x)

In [None]:
loss = loss_fn(y_pred, y)
loss.item()

### Zero gradients, perform a backward pass, and update the weights.

In [None]:
optimizer.zero_grad()
loss.backward()
optimizer.step()
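
In practice the three steps above are repeated for many iterations. The following is a minimal sketch of the complete loop. It restates the class so the block runs on its own and uses a fixed seed and an arbitrary 100 iterations.

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(torch.relu(self.linear1(x)))

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = TwoLayerNet(D_in, H, D_out)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

losses = []
for t in range(100):
    y_pred = model(x)            # forward pass
    loss = loss_fn(y_pred, y)
    losses.append(loss.item())

    optimizer.zero_grad()        # clear old gradients
    loss.backward()              # compute new gradients
    optimizer.step()             # update the parameters

    if t % 20 == 0:
        print(t, loss.item())
```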