# Notes on the PyTorch tutorials
(https://pytorch.org/tutorials/)

In [2]:
import torch
x = torch.randn(5, 3, dtype=torch.float)
print(f"x.shape = {x.size()}")
print(f"x.size()[0] = {x.size()[0]}, x.size()[1] = {x.size()[1]}")

x.shape = torch.Size([5, 3])
x.size()[0] = 5, x.size()[1] = 3


<font color=red> `torch.Size` is in fact a tuple, so it supports all tuple operations.</font>
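A minimal sketch of what "supports all tuple operations" means in practice, assuming a fresh 5x3 tensor (the variable names `rows` and `cols` are only illustrative): the size can be unpacked, indexed, and compared with a plain tuple.

```python
import torch

x = torch.randn(5, 3)

rows, cols = x.size()               # tuple unpacking
print(rows, cols)

print(isinstance(x.size(), tuple))  # torch.Size subclasses tuple
print(x.size() == (5, 3))           # compares equal to a plain tuple
print(x.shape[-1])                  # negative indexing works too
print(len(x.shape))                 # number of dimensions
```

`x.shape` and `x.size()` return the same `torch.Size` object type, so either spelling can be used.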

In [4]:
result = torch.empty(5, 3)
y = torch.randn(5, 3, dtype=torch.float)
torch.add(x, y, out=result) # provide an output tensor as argument
print(result)

tensor([[ 0.0173,  0.7390, -1.9385],
        [-1.3984,  1.1377,  1.3435],
        [-0.1264,  0.3964,  3.0010],
        [ 3.2115, -1.3093,  0.3868],
        [ 0.3710, -0.7705,  0.4424]])


In [5]:
# add in-place
y.add_(x)
print(y)

tensor([[ 0.0173,  0.7390, -1.9385],
        [-1.3984,  1.1377,  1.3435],
        [-0.1264,  0.3964,  3.0010],
        [ 3.2115, -1.3093,  0.3868],
        [ 0.3710, -0.7705,  0.4424]])


<font color=red>**Any operation that mutates a tensor in-place is post-fixed with an `_`.** For example, `x.copy_(y)` and `x.t_()` both change `x`.</font> 
<br> We can use standard NumPy-like indexing with all tensors. 
<br> Resizing: to resize or reshape a tensor, use `x.view` or `x.reshape`:

In [7]:
z = x.view(-1, 5)
print(z)
print(x.reshape(-1, 5))

tensor([[-0.5424, -0.6362, -1.9036, -1.7576,  0.6408],
        [ 0.9333, -0.1417, -0.0428,  0.3181,  3.1830],
        [-0.9809,  0.6009,  1.1706,  1.0136, -0.3764]])
tensor([[-0.5424, -0.6362, -1.9036, -1.7576,  0.6408],
        [ 0.9333, -0.1417, -0.0428,  0.3181,  3.1830],
        [-0.9809,  0.6009,  1.1706,  1.0136, -0.3764]])
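The two calls print the same values here, but they are not interchangeable in every case. The sketch below, on a small hypothetical tensor, illustrates the difference as I understand it: `view` always shares storage with the original and refuses to work when the memory layout does not allow it, while `reshape` falls back to a copy.

```python
import torch

x = torch.arange(6.0).reshape(2, 3)

v = x.view(3, 2)            # a view: shares storage with x
v[0, 0] = 100.0
print(x[0, 0])              # the write through v is visible in x

t = x.t()                   # transposing makes the tensor non-contiguous
print(t.is_contiguous())

try:
    t.view(-1)              # view cannot flatten this layout
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)

r = t.reshape(-1)           # reshape copies when a view is impossible
r[0] = -1.0
print(x[0, 0])              # x is untouched by writes to the copy
```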


### Conversion between a Torch Tensor and a NumPy array:
The Torch Tensor and NumPy array will <font color=red> share their underlying memory locations </font> (if the Torch Tensor is on CPU), and changing one will change the other.
<br> All the Tensors on the CPU except a CharTensor support converting to NumPy and back.

In [9]:
a = torch.ones(5)
b = a.numpy()
a.add_(1)
print(f"b = {b}") # b will be affected if a is changed in-place
c = torch.from_numpy(b)
print(c)

b = [2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.])
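The sharing only holds while both objects keep pointing at the same buffer. As a rough sketch (variable names are illustrative), an out-of-place operation rebinds the name to a new tensor and breaks the link, and `torch.tensor(...)` copies instead of sharing:

```python
import numpy as np
import torch

a = torch.ones(3)
b = a.numpy()            # shares memory with a
b[0] = 7.0               # write through the NumPy side
print(a)                 # a sees the change

a = a + 1                # out-of-place: a now names a brand-new tensor
print(a, b)              # b still holds the old buffer

c = torch.tensor(b)      # torch.tensor copies; torch.from_numpy would share
b[1] = -1.0
print(c)                 # c is unaffected by the later write to b
```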


## AUTOGRAD: AUTOMATIC DIFFERENTIATION
Central to all neural networks in PyTorch is the `autograd` package.<br>
To prevent tracking history (and using memory), you can wrap the code block in `with torch.no_grad():`. <br>
Each tensor has a `.grad_fn` attribute that references a Function that has created the Tensor (<font color=red>except for Tensors created by the user - their `grad_fn is None`)</font>.
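A small sketch of both statements, using a hypothetical two-element tensor `w`: the same multiplication is tracked outside the `torch.no_grad()` block and untracked inside it, and the user-created tensor itself has no `grad_fn`.

```python
import torch

w = torch.ones(2, requires_grad=True)   # created by the user: a leaf

y = w * 2                               # tracked by autograd
print(y.requires_grad, y.grad_fn)

with torch.no_grad():
    z = w * 2                           # same op, but no history is recorded
print(z.requires_grad, z.grad_fn)

print(w.grad_fn)                        # None for user-created tensors
```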

**Important attributes of a Tensor (formerly `Variable`): data, requires_grad, grad_fn, grad**
1. `grad_fn` is None for a leaf Tensor; if the leaf has `requires_grad=True`, its `grad` holds a Tensor once `backward()` has run.
2. `grad_fn` is not None for non-leaf Tensors, but accessing their `grad` triggers a warning: *warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "*
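The two cases can be checked directly with `is_leaf`. The sketch below also uses `retain_grad()`, which is not mentioned in these notes, as one way to keep the gradient of a non-leaf tensor when it is needed.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)   # leaf tensor
h = a * a                                    # non-leaf: produced by an operation
h.retain_grad()                              # ask autograd to keep h.grad
out = h.sum()
out.backward()

print(a.is_leaf, a.grad_fn)                  # leaf, so grad_fn is None
print(h.is_leaf, h.grad_fn)                  # non-leaf, has a grad_fn
print(h.grad)                                # d(out)/dh is all ones
print(torch.allclose(a.grad, 2 * a.detach()))  # d(out)/da = 2a
```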

In [19]:
a = torch.randn(2, 2)
print(f"a.requires_grad = {a.requires_grad}")
a.requires_grad_(True)
print(f"a.requires_grad = {a.requires_grad}")
b = (a * a).sum()
print(f"b.grad_fn = {b.grad_fn}")

a.requires_grad = False
a.requires_grad = True
b.grad_fn = <SumBackward0 object at 0x125db30b8>


In [20]:
b.backward()
print(f"a.grad = {a.grad}")
print(f"b.grad = {b.grad}")

a.grad = tensor([[ 1.1097,  0.2849],
        [-1.8266, -1.2235]])
b.grad = None


In [21]:
b = (a * a).sum()
b.backward(torch.ones_like(b))
print(f"a.grad = {a.grad}")
print(f"b.grad = {b.grad}")

a.grad = tensor([[ 2.2193,  0.5699],
        [-3.6532, -2.4469]])
b.grad = None
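Note that `a.grad` after the second `backward()` is exactly twice the earlier value: gradients accumulate into `.grad` rather than being overwritten. A short sketch of that behaviour, and of resetting it with `zero_()`, on a fresh tensor:

```python
import torch

a = torch.randn(2, 2, requires_grad=True)

(a * a).sum().backward()
first = a.grad.clone()          # equals 2 * a

(a * a).sum().backward()        # second pass adds another 2 * a
doubled = torch.allclose(a.grad, 2 * first)
print(doubled)

a.grad.zero_()                  # reset the buffer in-place
(a * a).sum().backward()
reset_ok = torch.allclose(a.grad, first)
print(reset_ok)
```

This is why training loops clear the gradient buffers before each backward pass.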


## Neural Networks
`torch.nn` only supports mini-batches: the entire package expects inputs that are a mini-batch of samples, not a single sample. <br>
For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`.
If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension.
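For instance, a single 3-channel 32x32 image can be passed through a convolution like this. The layer sizes are arbitrary choices for illustration, not taken from the tutorial's network.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)

sample = torch.randn(3, 32, 32)    # one image: nChannels x Height x Width
batch = sample.unsqueeze(0)        # add a batch dimension of size 1 at the front
out = conv(batch)

print(batch.shape)                 # torch.Size([1, 3, 32, 32])
print(out.shape)                   # torch.Size([1, 6, 28, 28])
```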

If you follow `loss` in the backward direction, using its `.grad_fn` attribute, you will see a graph of computations.  
```
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
```
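The snippet above depends on a `loss` defined elsewhere. A self-contained sketch, assuming a tiny hypothetical `nn.Sequential` model in place of the tutorial's network, walks the whole graph through `next_functions` and collects the node names. Exact class names (for example a trailing `0`) can differ between PyTorch versions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model: Linear -> ReLU -> Linear
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
criterion = nn.MSELoss()
loss = criterion(net(torch.randn(1, 4)), torch.randn(1, 2))

names = []
stack = [loss.grad_fn]
while stack:
    node = stack.pop()
    if node is None:               # inputs that need no gradient show up as None
        continue
    names.append(type(node).__name__)
    # next_functions is a tuple of (node, input_index) pairs
    stack.extend(fn for fn, _ in node.next_functions)

print(names)
```

The walk ends at `AccumulateGrad` nodes, which correspond to the leaf parameters that receive gradients.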

In [None]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
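The cell above is not runnable on its own because `net`, `input`, `criterion` and `target` come from the tutorial. A minimal end-to-end sketch of the same four steps, assuming a hypothetical linear model and synthetic regression data:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

net = nn.Linear(3, 1)                      # hypothetical stand-in for the tutorial's net
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

inputs = torch.randn(64, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]])   # synthetic linear targets

losses = []
for _ in range(50):
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss = criterion(net(inputs), targets)
    loss.backward()                        # populate .grad on every parameter
    optimizer.step()                       # apply the SGD update
    losses.append(loss.item())

print(losses[0], losses[-1])               # the loss should shrink over the loop
```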

The outputs of `torchvision` datasets are **PILImage images of range [0, 1]. We transform them to Tensors of normalized range [-1, 1].**
```
import torch
import torchvision
import torchvision.transforms as transforms

# ToTensor converts the image to a Tensor; Normalize then applies
# (x - 0.5) / 0.5 per channel, mapping [0, 1] to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
```
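The arithmetic behind the normalization is `(x - mean) / std` per channel, so with mean 0.5 and std 0.5 a value of 0 maps to -1 and a value of 1 maps to 1. The sketch below reproduces this with plain tensor operations on a fake image, so it needs neither `torchvision` nor a dataset download.

```python
import torch

img = torch.rand(3, 4, 4)                         # fake C x H x W image in [0, 1)
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (img - mean) / std                   # same formula as Normalize
print(normalized.min().item(), normalized.max().item())

restored = normalized * std + mean                # undo it, e.g. before plotting
print(torch.allclose(restored, img))
```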

### Save & load model
```
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

net = Net()
net.load_state_dict(torch.load(PATH))
```
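A self-contained round trip, assuming a hypothetical `nn.Linear` model in place of `Net` and a temporary directory in place of `./cifar_net.pth`, can be used to check that the restored weights match:

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(4, 2)                       # hypothetical stand-in for Net()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_net.pth")
    torch.save(net.state_dict(), path)      # saves parameters only, not the class

    restored = nn.Linear(4, 2)              # architecture must match the saved one
    restored.load_state_dict(torch.load(path))

x = torch.randn(3, 4)
same = torch.equal(net(x), restored(x))
print(same)
```

Because only the `state_dict` is stored, the model class has to be constructed first and the weights loaded into it afterwards.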
