# Artificial Neural Networks With PyTorch
>  In this second chapter, we delve deeper into Artificial Neural Networks, learning how to train them with real datasets.

- toc: true 
- badges: true
- comments: true
- author: Lucas Nunes
- categories: [Datacamp]
- image: images/datacamp/___

> Note: This is a summary of the chapter 2 exercises from the DataCamp course "Introduction to Deep Learning with PyTorch". <br>[Github repo](https://github.com/lnunesAI/Datacamp/) / [Course link](https://www.datacamp.com/tracks/deep-learning-in-python)

In [None]:
import torch
import torch.nn as nn
import numpy as np

## Activation functions

### Neural networks

<div class=""><p>Let us see the differences between neural networks which apply <code>ReLU</code> and those which do not apply <code>ReLU</code>. We have already initialized the input called <code>input_layer</code>, and three sets of weights, called <code>weight_1</code>, <code>weight_2</code> and <code>weight_3</code>.</p>
<p>We are going to convince ourselves that networks with multiple layers which do not contain non-linearity can be expressed as neural networks with one layer.</p>
<p>The network and the shapes of its layers and weights are shown below.</p>
<p><img src="https://assets.datacamp.com/production/repositories/4094/datasets/90a76cfc0248297aa65f7cb9bdc17602b6b1d84b/net-ex.jpg" alt=""></p></div>

In [None]:
input_layer = torch.tensor([[ 0.0401, -0.9005,  0.0397, -0.0876]])

weight_1 = torch.tensor([[-0.1094, -0.8285,  0.0416, -1.1222],
                        [ 0.3327, -0.0461,  1.4473, -0.8070],
                        [ 0.0681, -0.7058, -1.8017,  0.5857],
                        [ 0.8764,  0.9618, -0.4505,  0.2888]])

weight_2 = torch.tensor([[ 0.6856, -1.7650,  1.6375, -1.5759],
                        [-0.1092, -0.1620,  0.1951, -0.1169],
                        [-0.5120,  1.1997,  0.8483, -0.2476],
                        [-0.3369,  0.5617, -0.6658,  0.2221]])

weight_3 = torch.tensor([[ 0.8824,  0.1268,  1.1951,  1.3061],
                        [-0.8753, -0.3277, -0.1454, -0.0167],
                        [ 0.3582,  0.3254, -1.8509, -1.4205],
                        [ 0.3786,  0.5999, -0.5665, -0.3975]])

Instructions
<ul>
<li>Calculate the first and second hidden layer by multiplying the appropriate inputs with the corresponding weights.</li>
<li>Calculate and print the results of the output.</li>
<li>Set <code>weight_composed_1</code> to the product of <code>weight_1</code> with <code>weight_2</code>, then set <code>weight</code> to the product of <code>weight_composed_1</code> with <code>weight_3</code>.</li>
<li>Calculate and print the output.</li>
</ul>

In [None]:
# Calculate the first and second hidden layer
hidden_1 = torch.matmul(input_layer, weight_1)
hidden_2 = torch.matmul(hidden_1, weight_2)

# Calculate the output
print(torch.matmul(hidden_2, weight_3))

# Calculate weight_composed_1 and weight
weight_composed_1 = torch.matmul(weight_1, weight_2)
weight = torch.matmul(weight_composed_1, weight_3)

# Multiply input_layer with weight
print(torch.matmul(input_layer, weight))

tensor([[0.2653, 0.1311, 3.8219, 3.0032]])
tensor([[0.2653, 0.1311, 3.8219, 3.0032]])


**You see how the results are the same.**
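
The equality is not specific to these weights: matrix multiplication is associative, so `(x @ w1) @ w2 @ w3` equals `x @ (w1 @ w2 @ w3)` for any weights. A minimal sketch with random values (the names `x`, `w1`, `w2`, `w3`, the seed, and the tolerance are illustrative assumptions, not part of the exercise):

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Hypothetical input and weights with the same shapes as in the exercise
x = torch.randn(1, 4)
w1, w2, w3 = torch.randn(4, 4), torch.randn(4, 4), torch.randn(4, 4)

# Three linear layers applied one after another
layered = x @ w1 @ w2 @ w3

# One layer whose weight is the product of the three weights
collapsed = x @ (w1 @ w2 @ w3)

# The two results agree up to floating-point rounding
print(torch.allclose(layered, collapsed, atol=1e-4))
```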

### ReLU activation

<div class=""><p>In this exercise, we have the same settings as the previous exercise. In addition, we have instantiated the <code>ReLU</code> activation function called <code>relu()</code>.</p>
<p>Now we are going to build a neural network which has non-linearity and by doing so, we are going to convince ourselves that networks with multiple layers and non-linearity functions cannot be expressed as a neural network with one layer.</p>
<p><img src="https://assets.datacamp.com/production/repositories/4094/datasets/90a76cfc0248297aa65f7cb9bdc17602b6b1d84b/net-ex.jpg" alt=""></p></div>

In [None]:
relu = nn.ReLU()

Instructions
<ul>
<li>Apply non-linearity on <code>hidden_1</code> and <code>hidden_2</code>.</li>
<li>Apply non-linearity to the product of the first two weights.</li>
<li>Multiply the result of the previous step with <code>weight_3</code>.</li>
<li>Multiply <code>input_layer</code> with <code>weight</code> and print the results.</li>
</ul>

In [None]:
# Apply non-linearity on hidden_1 and hidden_2
hidden_1_activated = relu(torch.matmul(input_layer, weight_1))
hidden_2_activated = relu(torch.matmul(hidden_1_activated, weight_2))
print(torch.matmul(hidden_2_activated, weight_3))

# Apply non-linearity in the product of first two weights. 
weight_composed_1_activated = relu(torch.matmul(weight_1, weight_2))

# Multiply `weight_composed_1_activated` with `weight_3`
weight = torch.matmul(weight_composed_1_activated, weight_3)

# Multiply input_layer with weight
print(torch.matmul(input_layer, weight))

tensor([[-0.2770, -0.0345, -0.1410, -0.0664]])
tensor([[-0.2117, -0.4782,  4.0438,  3.0417]])


**As expected, the two outputs no longer match, and both differ from the results of the previous exercise.**
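
To see why the collapse fails, a tiny hand-checkable case may help. The values below are chosen purely for illustration (they are not the exercise's weights): ReLU zeroes a negative hidden activation before the second layer sums it, which no single precomputed weight matrix reproduces here.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Tiny illustrative example (values are assumptions, not from the exercise)
x = torch.tensor([[1.0, -1.0]])
w1 = torch.eye(2)                   # identity, so x @ w1 == x
w2 = torch.tensor([[1.0], [1.0]])   # sums the two hidden units

# ReLU on the hidden activations: relu([1, -1]) = [1, 0], which sums to 1
with_relu = relu(x @ w1) @ w2

# ReLU on the composed weights leaves them unchanged (all entries >= 0),
# so the result is the plain linear one: 1 + (-1) = 0
composed = x @ relu(w1 @ w2)

print(with_relu, composed)
```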

### ReLU activation again

<div class=""><p>Neural networks don't need to have the same number of units in each layer. Here, you are going to experiment with the <code>ReLU</code> activation function again, but this time we are going to have a different number of units in the layers of the neural network. The input layer will still have <code>4</code> features, but then the first hidden layer will have <code>6</code> units and the output layer will have <code>2</code> units.</p>
<p><img src="https://assets.datacamp.com/production/repositories/4094/datasets/d55fe04c7d4cc7e2b1614e39e77e832ce56c3fc8/net-ex2.jpg" alt=""></p></div>

Instructions
<ul>
<li>Instantiate the <code>ReLU()</code> activation function as <code>relu</code> (the function is part of <code>nn</code> module).</li>
<li>Initialize <code>weight_1</code> and <code>weight_2</code> with random numbers.</li>
<li>Multiply the <code>input_layer</code> with <code>weight_1</code>, storing results in <code>hidden_1</code>.</li>
<li>Apply the <code>relu</code> activation function over <code>hidden_1</code>, and then multiply the output of it with <code>weight_2</code>.</li>
</ul>

In [None]:
# Instantiate ReLU activation function as relu
relu = nn.ReLU()

# Initialize weight_1 and weight_2 with random numbers
weight_1 = torch.rand(4, 6)
weight_2 = torch.rand(6, 2)

# Multiply input_layer with weight_1
hidden_1 = torch.matmul(input_layer, weight_1)

# Apply ReLU activation function over hidden_1 and multiply with weight_2
hidden_1_activated = relu(hidden_1)
print(torch.matmul(hidden_1_activated, weight_2))

tensor([[0., 0.]])


## Loss functions

### Calculating loss function by hand

<div class=""><p>Let's start the exercises by calculating the loss function by hand. Don't do this exercise in PyTorch; it is important to first do it using only pen and paper (and a calculator).</p>
<p>We have the same example as before but now our object is actually a frog, and the predicted scores are <code>-1.2</code> for class <code>0</code> (cat), <code>0.12</code> for class <code>1</code> (car) and <code>4.8</code> for class <code>2</code> (frog).</p>
<hr>
<p>What is the result of the softmax cross-entropy loss function?</p>
<table>
<thead>
<tr>
<th>Class</th>
<th>Predicted Score</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cat</td>
<td>-1.2</td>
</tr>
<tr>
<td>Car</td>
<td>0.12</td>
</tr>
<tr>
<td>Frog</td>
<td>4.8</td>
</tr>
</tbody>
</table></div>

In [None]:
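# Exponentiate each score
c0 = np.exp(-1.2)
c1 = np.exp(0.12)
c2 = np.exp(4.8)
# Softmax denominator: sum of the exponentiated scores
normalizer = c0 + c1 + c2
p0 = c0 / normalizer
p1 = c1 / normalizer
p2 = c2 / normalizer
# Cross-entropy loss for the true class (frog, class 2)
np.round(-np.log(p2), 4)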

0.0117

<pre>
Possible Answers
6.0117
4.6917
<b>0.0117</b>
Score for frog is high, so loss is 0.
</pre>
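
The same hand calculation can be written with the log-sum-exp form of the loss, `loss = log(sum(exp(scores))) - scores[target]`. The sketch below uses only the standard library; subtracting the maximum score first is a common stability trick added here as an assumption, not something the exercise requires.

```python
import math

scores = [-1.2, 0.12, 4.8]   # cat, car, frog
target = 2                   # index of the true class (frog)

# Subtract the maximum score before exponentiating to avoid overflow
m = max(scores)
log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))

# Cross-entropy = -log(softmax(scores)[target]) = log_sum_exp - scores[target]
loss = log_sum_exp - scores[target]
print(round(loss, 4))  # 0.0117
```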

### Calculating loss function in PyTorch

<div class=""><p>You are going to code the previous exercise, and make sure that we computed the loss correctly. Predicted scores are <code>-1.2</code> for class <code>0</code> (cat), <code>0.12</code> for class <code>1</code> (car) and <code>4.8</code> for class <code>2</code> (frog). The ground truth is class <code>2</code> (frog). Compute the loss function in PyTorch.</p>
<table>
<thead>
<tr>
<th>Class</th>
<th>Predicted Score</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cat</td>
<td>-1.2</td>
</tr>
<tr>
<td>Car</td>
<td>0.12</td>
</tr>
<tr>
<td>Frog</td>
<td>4.8</td>
</tr>
</tbody>
</table></div>

Instructions
<ul>
<li>Initialize the tensor of scores with numbers <code>[[-1.2, 0.12, 4.8]]</code>, and the tensor of ground truth <code>[2]</code>.</li>
<li>Instantiate the cross-entropy loss and call it <code>criterion</code>.</li>
<li>Compute and print the loss.</li>
</ul>

In [None]:
# Initialize the scores and ground truth
logits = torch.tensor([[-1.2, 0.12, 4.8]])
ground_truth = torch.tensor([2])

# Instantiate cross entropy loss
criterion = nn.CrossEntropyLoss()

# Compute and print the loss
loss = criterion(logits, ground_truth)
print(loss)

tensor(0.0117)


**As you can see, the loss function PyTorch calculated gives the same number as the loss function you calculated. Being proficient in understanding and calculating loss functions is a very important skill in deep learning.**
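
As a cross-check on what `nn.CrossEntropyLoss` computes, the loss can be split into its two steps: a log-softmax over the scores followed by a negative log-likelihood lookup for the true class. This is a sketch for illustration; `loss_builtin` and `loss_manual` are names introduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[-1.2, 0.12, 4.8]])
ground_truth = torch.tensor([2])

# Built-in criterion used in the exercise
loss_builtin = nn.CrossEntropyLoss()(logits, ground_truth)

# Same quantity in two explicit steps
log_probs = F.log_softmax(logits, dim=1)            # log of the softmax probabilities
loss_manual = F.nll_loss(log_probs, ground_truth)   # picks -log_probs[0, 2]

print(loss_builtin, loss_manual)
```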

### Loss function of random scores

<p>If the neural network predicts random scores, what would its loss be? Let's find out in PyTorch. The neural network is going to have 1000 classes, each having a random score. The ground truth will be class 111. Calculate the loss.</p>

Instructions
<ul>
<li>Import <code>torch</code> and <code>torch.nn as nn</code></li>
<li>Initialize <code>logits</code> with a random tensor of shape <code>(1, 1000)</code> and <code>ground_truth</code> with a tensor containing the number <code>111</code>.</li>
<li>Instantiate the cross-entropy loss in a variable called <code>criterion</code>.</li>
<li>Calculate and print the loss function.</li>
</ul>

In [None]:
# Initialize logits and ground truth
logits = torch.rand(1, 1000)
ground_truth = torch.tensor([111])

# Instantiate cross-entropy loss
criterion = nn.CrossEntropyLoss()

# Calculate and print the loss
loss = criterion(logits, ground_truth)
print(loss)

tensor(6.6900)


**The loss is close to -ln(1/1000) ≈ 6.9. This is not surprising, considering that the scores were random and close to each other, so the probability for each class was approximately the same (1/1000 = 0.001).**
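
The reference value ln(1000) can be reproduced exactly by giving every class the same score, which makes the softmax uniform. A minimal sketch, assuming all-zero logits as the "no information" baseline (that choice is an illustration, not part of the exercise):

```python
import math
import torch
import torch.nn as nn

num_classes = 1000
criterion = nn.CrossEntropyLoss()

# Identical scores give a uniform softmax: each class gets probability 1/1000
logits = torch.zeros(1, num_classes)
ground_truth = torch.tensor([111])

loss = criterion(logits, ground_truth)
print(loss.item(), math.log(num_classes))
```

Random scores drawn from a narrow range land near this baseline, slightly above or below it depending on the score drawn for the true class.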

## Preparing a dataset in PyTorch

### Preparing MNIST dataset

<p>You are going to prepare dataloaders for <code>MNIST</code> training and testing set. As we explained in the lecture, <code>MNIST</code> has some differences to <code>CIFAR-10</code>, with the main difference being that <code>MNIST</code> images are <code>grayscale</code> (1 channel based) instead of <code>RGB</code> (3 channels).</p>

Instructions
<ul>
<li>Transform the data to torch tensors and normalize it, <code>mean</code> is <code>0.1307</code> while <code>std</code> is <code>0.3081</code>.</li>
<li>Prepare the <code>trainset</code> and the <code>testset</code>.</li>
<li>Prepare the dataloaders for training and testing so that only 32 pictures are processed at a time.</li>
</ul>

In [None]:
import torchvision
import torchvision.transforms as transforms

from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

# Transform the data to torch tensors and normalize it
# (mean and std are passed as one-element tuples: one value per channel)
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

# Prepare the datasets
trainset = torchvision.datasets.MNIST('mnist', train=True,
                                      download=True, transform=transform)
testset = torchvision.datasets.MNIST('mnist', train=False,
                                     download=True, transform=transform)

# Prepare the dataloaders
train_loader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                           shuffle=True, num_workers=0)
test_loader = torch.utils.data.DataLoader(testset, batch_size=32,
                                          shuffle=False, num_workers=0)


### Inspecting the dataloaders

<p>Now you are going to explore the <code>dataloaders</code> you created in the previous exercise. In particular, you will compute the shape of each dataset as well as the minibatch size.</p>

Instructions
<ul>
<li>Compute the shapes of the <code>trainset</code> and <code>testset</code>.</li>
<li>Print the computed values.</li>
<li>Compute the size of the minibatch for both <code>trainset</code> and <code>testset</code>.</li>
<li>Print the minibatch size.</li>
</ul>

In [None]:
# Compute the shape of the training set and testing set
# (older torchvision versions exposed these tensors as train_data / test_data)
trainset_shape = train_loader.dataset.data.shape
testset_shape = test_loader.dataset.data.shape

# Print the computed shapes
print(trainset_shape, testset_shape)

# Compute the size of the minibatch for training set and testing set
trainset_batchsize = train_loader.batch_size
testset_batchsize = test_loader.batch_size

# Print sizes of the minibatch
print(trainset_batchsize, testset_batchsize)

torch.Size([60000, 28, 28]) torch.Size([10000, 28, 28])
32 32
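
The shapes above describe the raw stored images; what the model actually receives is one minibatch per iteration of the loader. The sketch below illustrates this with a small synthetic dataset standing in for MNIST (100 fake images, an assumption made so the example runs without a download). Once `ToTensor` has been applied, a batch also carries a channel dimension, which the synthetic tensors mimic.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 100 fake grayscale images and labels
images = torch.zeros(100, 1, 28, 28)
labels = torch.zeros(100, dtype=torch.long)
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=False)

# Each iteration yields one minibatch of (images, labels)
batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
first_images, first_labels = next(iter(loader))

print(first_images.shape)   # torch.Size([32, 1, 28, 28])
print(batch_sizes)          # [32, 32, 32, 4]: the last batch holds the remainder
```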


## Training neural networks

### Building a neural network - again

<div class=""><p>You haven't created a neural network since the end of the first chapter, so this is a good time to build one (practice makes perfect). Build a class for a neural network which will be trained on the <code>MNIST</code> dataset. The dataset contains images of shape <code>(28, 28, 1)</code>, from which you should deduce the size of the input layer. For the hidden layer use 200 units, and for the output layer use 10 units (1 for each class). For the activation function, use <code>relu</code> in a functional way (<code>torch.nn.functional</code> is already imported as <code>F</code>).</p>
<p>For context, the same net will be trained and used to make predictions in the next two exercises.</p></div>

In [None]:
import torch.nn.functional as F

Instructions
<ul>
<li>Define the class called <code>Net</code> which inherits from <code>nn.Module</code>.</li>
<li>In the <code>__init__()</code> method, define the parameters for the two fully connected layers.</li>
<li>In the <code>.forward()</code> method, do the forward step.</li>
</ul>

In [None]:
# Define the class Net
class Net(nn.Module):
    def __init__(self):
        # Define all the parameters of the net
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28 * 1, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        # Do the forward pass
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
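
A quick way to sanity-check such a class before training is to push a fake batch through it and count its parameters. The sketch below redefines the same architecture so that it runs on its own; `dummy_batch` and `num_params` are names introduced here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Same two-layer architecture as in the exercise."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28 * 1, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

model = Net()

# A fake batch of 32 flattened images (random values, for shape checking only)
dummy_batch = torch.randn(32, 28 * 28)
logits = model(dummy_batch)
print(logits.shape)  # one row of 10 class scores per image

# Weights and biases of both layers: 784*200 + 200 + 200*10 + 10 = 159010
num_params = sum(p.numel() for p in model.parameters())
print(num_params)
```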

### Training a neural network

<p>Given the fully connected neural network (called <code>model</code>) which you built in the previous exercise and a train loader called <code>train_loader</code> containing the <code>MNIST</code> dataset (which we created for you), you will train the net to predict the classes of digits. You will use the Adam optimizer to optimize the network and, since this is a classification problem, cross-entropy as the loss function.</p>

In [None]:
import torch.optim as optim

Instructions
<ul>
<li>Instantiate the Adam optimizer with learning rate <code>3e-4</code> and instantiate Cross-Entropy as loss function.</li>
<li>Complete a forward pass on the neural network using the input <code>data</code>.</li>
<li>Using backpropagation, compute the gradients of the weights, and then change the weights using the <code>Adam</code> optimizer.</li>
</ul>

In [None]:
# Instantiate the model, the Adam optimizer and the cross-entropy loss function
model = Net()
optimizer = optim.Adam(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

# One pass (epoch) over the training set
for batch_idx, (data, target) in enumerate(train_loader):
    # Flatten each 28x28 image into a vector of 784 features
    data = data.view(-1, 28 * 28)

    # Reset the gradients accumulated in the previous step
    optimizer.zero_grad()

    # Complete a forward pass
    output = model(data)

    # Compute the loss, gradients and change the weights
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
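
The call to `optimizer.zero_grad()` matters because PyTorch adds each new gradient to whatever is already stored in `.grad`. A minimal sketch of that behavior on a single hypothetical parameter `w` (not part of the exercise), resetting the gradient by hand instead of through an optimizer:

```python
import torch

# A single hypothetical parameter vector
w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added on top
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Resetting first gives the plain gradient again
w.grad.zero_()
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```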

### Using the network to make predictions

<p>Now that you have trained the network, use it to make predictions for the data in the testing set. The network is called <code>model</code> (same as in the previous exercise), and the loader is called <code>test_loader</code>. We have already initialized variables <code>total</code> and <code>correct</code> to <code>0</code>.</p>

In [None]:
correct, total = 0, 0

Instructions
<ul>
<li>Set the network in testing (eval) mode.</li>
<li>Put each image into a vector using <code>inputs.view(-1, number_of_features)</code>, where the number of features should be deduced by multiplying the spatial dimensions (shape) of the image.</li>
<li>Do the forward pass and put the predictions in the <code>outputs</code> variable.</li>
</ul>

In [None]:
# Set the model in eval mode
model.eval()

# Gradients are not needed for evaluation
with torch.no_grad():
    for inputs, labels in test_loader:
        # Put each image into a vector
        inputs = inputs.view(-1, 28 * 28 * 1)

        # Do the forward pass and get the predictions
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('The testing set accuracy of the network is: %d %%' % (100 * correct / total))

The testing set accuracy of the network is: 95 %
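
The evaluation loop can be wrapped in a reusable function. The sketch below is one possible shape for it: `evaluate`, `num_features`, and the toy data are hypothetical names and values introduced here, and the identity "model" is only a stand-in so the example runs without a trained network.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, num_features):
    """Return the fraction of correctly classified samples (hypothetical helper)."""
    model.eval()                      # switch layers such as dropout to inference behavior
    correct, total = 0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for inputs, labels in loader:
            outputs = model(inputs.view(-1, num_features))
            predicted = outputs.argmax(dim=1)   # index of the highest score
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return correct / total

# Toy check: the identity "model" returns its input, so one-hot inputs
# are always classified as their own index
inputs = torch.eye(4).repeat(3, 1)    # 12 samples, 4 features
labels = torch.arange(4).repeat(3)
toy_loader = DataLoader(TensorDataset(inputs, labels), batch_size=5)

accuracy = evaluate(nn.Identity(), toy_loader, num_features=4)
print(accuracy)
```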
