<div class="alert alert-block alert-info" style="margin-top: 20px">

      
| Name | Description | Date |
| :- | -------------: | :-: |
| Reza Hashemi | Building Model Basics 2nd | 23 August 2019 |
</div>

# Building Blocks of Models
- ```nn.Linear```
- Nonlinear Activations
- Loss functions
- Optimizers

In [0]:
!pip3 install torch torchvision



In [0]:
import numpy as np
import pandas as pd
import torch, torchvision
torch.__version__

'1.1.0'

In [0]:
import torch.nn as nn

## 1. nn.Linear
```nn.Linear()``` is one of the basic building blocks of any neural network (NN) model.
  - Performs a linear (more precisely, affine) transformation of the form ```y = xW^T + b```, where ```W``` has shape ```(out_features, in_features)```. In NN terminology, it is a fully connected, or dense, layer.
  - Two parameters, ```in_features``` and ```out_features```, must be specified; ```bias``` defaults to ```True```.
  - Documentation: [linear_layers](https://pytorch.org/docs/stable/nn.html#linear-layers)
  
```python
torch.nn.Linear(in_features,    # size of each input sample
                out_features,   # size of each output sample
                bias=True)      # whether a bias term (b) is added
```

In [0]:
linear = nn.Linear(5, 1)             # input dim = 5, output dim = 1
x = torch.FloatTensor([1, 2, 3, 4, 5])    # 1d tensor
print(linear(x))      
y = torch.ones(3, 5)                      # 2d tensor
print(linear(y))

tensor([-1.2518], grad_fn=<AddBackward0>)
tensor([[-0.3499],
        [-0.3499],
        [-0.3499]], grad_fn=<AddmmBackward>)
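
To make the affine transformation concrete, the following minimal sketch recomputes the layer output by hand from the layer's own ```weight``` and ```bias``` attributes. The seed and the variable name ```manual``` are illustrative assumptions, not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # assumption: fixed seed, only for repeatability

linear = nn.Linear(5, 1)             # weight has shape (1, 5), bias has shape (1,)
x = torch.ones(3, 5)                 # batch of 3 samples with 5 features each

# Recompute the output by hand: y = x W^T + b
manual = x @ linear.weight.t() + linear.bias

print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(linear(x), manual))   # layer output agrees with the manual formula
```

Because every row of ```x``` is identical, every row of the output is identical too, which is why the three values printed in the cell above coincide.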


## 2. Nonlinear activations
PyTorch provides a number of nonlinear activation functions. The most commonly used ones are:
```python
torch.nn.ReLU()           # rectified linear unit: max(0, x)
torch.nn.Sigmoid()        # sigmoid: 1 / (1 + exp(-x))
torch.nn.Tanh()           # hyperbolic tangent
torch.nn.Softmax(dim=0)   # softmax (specify the dimension explicitly)
```
  - Documentation: [nonlinear_activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)

In [0]:
relu = torch.nn.ReLU()
sigmoid = torch.nn.Sigmoid()
tanh = torch.nn.Tanh()
softmax = torch.nn.Softmax(dim = 0)   # when using softmax, explicitly designate dimension

x = torch.randn(5)     # five random numbers
print(x)
print(relu(x))       
print(sigmoid(x))
print(tanh(x))
print(softmax(x))

tensor([-0.5488, -0.1497,  0.0791, -1.1919, -0.8415])
tensor([0.0000, 0.0000, 0.0791, 0.0000, 0.0000])
tensor([0.3661, 0.4626, 0.5198, 0.2329, 0.3012])
tensor([-0.4996, -0.1486,  0.0789, -0.8312, -0.6866])
tensor([0.1774, 0.2645, 0.3324, 0.0933, 0.1324])
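
As a rough check on what these modules compute, the sketch below rebuilds three of them from their textbook formulas and compares the results against PyTorch's functional versions. The input reuses the rounded values printed above; the ```*_manual``` names are illustrative.

```python
import torch

# Values copied (rounded) from the output above
x = torch.tensor([-0.5488, -0.1497, 0.0791, -1.1919, -0.8415])

relu_manual = torch.clamp(x, min=0)                   # max(0, x)
sigmoid_manual = 1 / (1 + torch.exp(-x))              # 1 / (1 + e^(-x))
softmax_manual = torch.exp(x) / torch.exp(x).sum()    # e^(x_i) / sum_j e^(x_j)

print(torch.allclose(relu_manual, torch.relu(x)))
print(torch.allclose(sigmoid_manual, torch.sigmoid(x)))
print(torch.allclose(softmax_manual, torch.softmax(x, dim=0)))
print(softmax_manual.sum())                           # softmax outputs sum to (approximately) 1
```

Unlike the other activations, softmax normalizes across a whole dimension, which is why it needs the ```dim``` argument.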


## 3. Loss Functions
PyTorch implements a number of loss functions. Common ones include:
- ```nn.MSELoss```: Mean squared error. Commonly used in regression tasks.
- ```nn.CrossEntropyLoss```: Cross entropy loss. Commonly used in classification tasks. It expects raw scores (logits) as input, since it combines ```nn.LogSoftmax``` and ```nn.NLLLoss``` internally.

In [0]:
a = torch.FloatTensor([2, 4, 5])
b = torch.FloatTensor([1, 3, 2])

mse = nn.MSELoss()
print(mse(a, b))

tensor(3.6667)
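
The value above can be reproduced by hand: the squared differences are 1, 1, and 9, and their mean is 11 / 3. A minimal sketch, using the same numbers as the cell above (the name ```manual``` is illustrative):

```python
import torch
import torch.nn as nn

a = torch.tensor([2.0, 4.0, 5.0])
b = torch.tensor([1.0, 3.0, 2.0])

manual = ((a - b) ** 2).mean()                # (1 + 1 + 9) / 3
print(manual)                                 # tensor(3.6667)

# reduction='sum' adds the squared errors instead of averaging them
print(nn.MSELoss(reduction='sum')(a, b))      # tensor(11.)
```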


In [0]:
# note that with CrossEntropyLoss, the input must have shape (N, C), where
# N is the batch size and C is the number of classes; the input holds raw
# scores (logits) and the target holds class indices in [0, C - 1]
a = torch.FloatTensor([[0.5, 0], [4.5, 0], [0, 0.4], [0, 0.1]])   # input
b = torch.LongTensor([1, 1, 1, 0])                                # target

ce = nn.CrossEntropyLoss()
print(ce(a,b))

tensor(1.6856)
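
To show where this number comes from, the sketch below computes the same loss in two explicit steps: a log-softmax over the class dimension, then the mean negative log-probability of the target class. The names ```logits```, ```target```, ```picked```, and ```manual``` are illustrative; the data matches the cell above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[0.5, 0.0], [4.5, 0.0], [0.0, 0.4], [0.0, 0.1]])  # shape (N=4, C=2)
target = torch.tensor([1, 1, 1, 0])                                      # class index per sample

log_probs = F.log_softmax(logits, dim=1)                  # log-probabilities per class
picked = log_probs[torch.arange(len(target)), target]     # log-probability of each true class
manual = -picked.mean()                                   # average negative log-likelihood

print(manual)                                             # tensor(1.6856)
print(torch.allclose(manual, nn.CrossEntropyLoss()(logits, target)))
```

The second sample dominates the average: its logits strongly favor class 0 while the target is class 1, so its individual loss is about 4.51.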


## 4. Optimizers
- ```torch.optim``` provides commonly used optimization algorithms, including:
```python
optim.Adagrad   
optim.Adam
optim.RMSprop
optim.SGD
```
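- The (model) parameters and, optionally, a learning rate are passed as arguments
- Model training process (repeated for every training batch)
  - ```optimizer.zero_grad()```: resets all gradients to zero, since PyTorch accumulates gradients across ```backward()``` calls
  - ```loss.backward()```: backpropagates from the loss value (the tensor returned by the loss function) to compute the gradients
  - ```optimizer.step()```: updates the model parameters using the computed gradients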

In [0]:
# how PyTorch models are trained with a loss function and an optimizer

# input and output data
x = torch.randn(5)
y = torch.ones(1)

model = nn.Linear(5, 1)  # generate model
loss_fn = nn.MSELoss()   # define loss function
optimizer = torch.optim.RMSprop(model.parameters(), lr = 0.01)     # create optimizer 
optimizer.zero_grad()                      # setting gradients to zero
loss_fn(model(x), y).backward()            # back propagation
optimizer.step()                           # update parameters based on gradients computed
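
The cell above performs a single update. In practice the three calls are repeated in a loop. The following is a minimal sketch of such a loop, not a definitive recipe: it swaps in ```optim.SGD``` (one of the optimizers listed above) in place of RMSprop because its behavior on this tiny problem is easy to reason about, and the seed, step count, and ```losses``` list are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                     # assumption: fixed seed, only for repeatability

# Same toy data as above: one input sample, one target value
x = torch.randn(5)
y = torch.ones(1)

model = nn.Linear(5, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # SGD swapped in for RMSprop

losses = []
for step in range(100):                  # assumption: 100 steps, chosen arbitrarily
    optimizer.zero_grad()                # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)          # forward pass and loss value
    loss.backward()                      # compute gradients of the loss
    optimizer.step()                     # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])             # the loss after training should be smaller
```

Keeping the loss in its own variable (```loss```) makes it clear that ```backward()``` is called on the loss tensor, and it also lets the value be logged.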