# Neural networks
- Let us compare neural networks that apply ReLU between layers with those that do not.


## Steps
1. Calculate the first and second hidden layers by multiplying the inputs with the corresponding weights.
2. Calculate and print the output.
3. Set weight_composed_1 to the product of weight_1 and weight_2, then set weight to the product of weight_composed_1 and weight_3.
4. Multiply input_layer by weight and print the result.




In [None]:
import torch

# input_layer, weight_1, weight_2 and weight_3 are assumed to be defined already
# Calculate the first and second hidden layer
hidden_1 = torch.matmul(input_layer, weight_1)
hidden_2 = torch.matmul(hidden_1, weight_2)

# Calculate the output
print(torch.matmul(hidden_2, weight_3))

# Calculate weight_composed_1 and weight
weight_composed_1 = torch.matmul(weight_1, weight_2)
weight = torch.matmul(weight_composed_1, weight_3)

# Multiply input_layer with weight
print(torch.matmul(input_layer, weight))

# result
# tensor([[0.2655, 0.1311, 3.8221, 3.0032]])
# tensor([[0.2655, 0.1311, 3.8221, 3.0032]])
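
The two printed tensors are identical because a stack of layers with no activation in between is still a single linear map: matrix multiplication is associative, so (x W1) W2 W3 = x (W1 W2 W3). The sketch below checks this with plain Python lists rather than tensors so the arithmetic can be verified by hand. The matrices and the `matmul` helper are illustrative assumptions, not the values used in the cell above.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical small integer matrices (not the values from the cell above)
input_layer = [[1, 2]]
weight_1 = [[1, -1], [0, 2]]
weight_2 = [[2, 0], [1, -1]]
weight_3 = [[1, 1], [-1, 2]]

# Layer by layer, with no activation in between
hidden_1 = matmul(input_layer, weight_1)
hidden_2 = matmul(hidden_1, weight_2)
layered_output = matmul(hidden_2, weight_3)

# Collapse the three weight matrices into one, then apply it once
weight = matmul(matmul(weight_1, weight_2), weight_3)
composed_output = matmul(input_layer, weight)

print(layered_output)   # [[8, -1]]
print(composed_output)  # [[8, -1]]
```

However many linear layers are stacked, the network can represent nothing more than a single weight matrix could.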

# ReLU activation

In [None]:
import torch.nn as nn

# Instantiate non-linearity
relu = nn.ReLU()

# Apply non-linearity on the hidden layers
hidden_1_activated = relu(torch.matmul(input_layer, weight_1))
hidden_2_activated = relu(torch.matmul(hidden_1_activated, weight_2))
print(torch.matmul(hidden_2_activated, weight_3))

# Apply non-linearity to the product of the first two weights
weight_composed_1_activated = relu(torch.matmul(weight_1, weight_2))

# Multiply `weight_composed_1_activated` with `weight_3`
weight = torch.matmul(weight_composed_1_activated, weight_3)

# Multiply input_layer with weight
print(torch.matmul(input_layer, weight))

# result
# tensor([[-0.2770, -0.0345, -0.1410, -0.0664]])
# tensor([[-0.2115, -0.4782,  4.0437,  3.0415]])
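
This time the two results differ: once ReLU sits between the layers, the network is no longer a single linear map, so it cannot be collapsed into one weight matrix. The sketch below reproduces the effect with plain Python lists. The matrices and the `matmul` and `relu` helpers are illustrative assumptions, not the values behind the output above.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    """Element-wise max(0, v) over a matrix (hypothetical helper)."""
    return [[max(0, v) for v in row] for row in m]

# Hypothetical small integer matrices
input_layer = [[1, 2]]
weight_1 = [[1, -1], [0, 2]]
weight_2 = [[2, 0], [1, -1]]
weight_3 = [[1, 1], [-1, 2]]

# ReLU applied to each hidden layer
hidden_1_activated = relu(matmul(input_layer, weight_1))
hidden_2_activated = relu(matmul(hidden_1_activated, weight_2))
layered_output = matmul(hidden_2_activated, weight_3)

# ReLU applied to the product of the first two weights instead
weight_composed_1_activated = relu(matmul(weight_1, weight_2))
weight = matmul(weight_composed_1_activated, weight_3)
composed_output = matmul(input_layer, weight)

print(layered_output)   # [[5, 5]]
print(composed_output)  # [[4, 7]]
```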

# ReLU activation again

In [None]:
# Instantiate ReLU activation function as relu
relu = nn.ReLU()

# Initialize weight_1 and weight_2 with random numbers
weight_1 = torch.rand(4, 6)
weight_2 = torch.rand(6, 2)

# Multiply input_layer with weight_1
hidden_1 = torch.matmul(input_layer, weight_1)

# Apply ReLU activation function over hidden_1 and multiply with weight_2
hidden_1_activated = relu(hidden_1)
print(torch.matmul(hidden_1_activated, weight_2))

# Loss functions
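- Exponentiate each term (raise e to the power of each score). This gives unnormalized probabilities, which do not sum to 1.
- Then divide each term by the sum of all terms. For example, the unnormalized probabilities sum to 188.68, so 24.5 becomes 24.5 / 188.68 ≈ 0.13.
- Finally, compute the cross-entropy loss: the negative natural logarithm of the probability of the correct class. You can check for yourself that -ln(0.13) is approximately 2.04. Note that if the probability of the correct class is 1, the loss is 0 and the network is predicting perfectly. As that probability approaches 0, the loss grows without bound, which means the prediction is badly wrong.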
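
The three steps above can be sketched in plain Python. The raw scores are an assumption: they are not given in the text and were picked so that exp(3.2) is about 24.5, matching the worked figure. The text uses rounded values, so its sum of 188.68 differs slightly from the unrounded sum computed here.

```python
import math

# Assumed raw scores for three classes (not stated in the text)
scores = [3.2, 5.1, -1.7]
correct_class = 0

# Step 1: exponentiate each score to get unnormalized probabilities
unnormalized = [math.exp(s) for s in scores]

# Step 2: divide by the sum so the probabilities add up to 1
total = sum(unnormalized)
probabilities = [u / total for u in unnormalized]

# Step 3: cross-entropy loss is minus the log of the correct class probability
loss = -math.log(probabilities[correct_class])

print(round(probabilities[correct_class], 2))  # 0.13
print(round(loss, 2))                          # 2.04
```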

In [None]:
# Initialize the scores and ground truth
logits = torch.tensor([[-1.2, 0.12, 4.8]])
ground_truth = torch.tensor([2])

# Instantiate cross entropy loss
criterion = nn.CrossEntropyLoss()

# Compute and print the loss
loss = criterion(logits, ground_truth)
print(loss)
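
For a single sample, `nn.CrossEntropyLoss` takes raw logits and returns the log of the sum of exponentiated logits minus the logit of the correct class, which is the softmax cross-entropy in one step. As a rough cross-check, the sketch below repeats that calculation with the standard library only, using the same logits and label as the cell above. Subtracting the maximum logit before exponentiating is a common stabilisation trick, added here as an assumption about good practice.

```python
import math

logits = [-1.2, 0.12, 4.8]
ground_truth = 2

# Log-sum-exp, shifted by the largest logit to avoid overflow
max_logit = max(logits)
log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))

# Cross-entropy for one sample: log-sum-exp minus the correct class logit
loss = log_sum_exp - logits[ground_truth]
print(round(loss, 4))  # 0.0117
```

The loss is small because the correct class already has by far the largest logit.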

In [None]:
# Import torch and torch.nn
import torch
import torch.nn as nn

# Initialize logits and ground truth
logits = torch.rand(1, 1000)
ground_truth = torch.tensor([111])

# Instantiate cross-entropy loss
criterion = nn.CrossEntropyLoss()

# Calculate and print the loss
loss = criterion(logits, ground_truth)
print(loss)

# Preparing a dataset in PyTorch
- torchvision (a package which deals with datasets and pretrained neural nets)
- torch.utils.data (a PyTorch submodule which provides the Dataset and DataLoader classes)

In [None]:
# Goal:
# - Transform the data to torch tensors and normalize it using mean 0.1307 and std 0.3081.
# - Prepare the trainset and the testset.
# - Prepare the dataloaders so that 32 pictures are processed at a time and the training data is shuffled every epoch.
import torch
import torchvision
import torchvision.transforms as transforms

# Transform the data to torch tensors and normalize it
# (mean and std are one-element tuples because MNIST has a single channel)
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

# Prepare training set and testing set
trainset = torchvision.datasets.MNIST('mnist', train=True,
                                      download=True, transform=transform)
testset = torchvision.datasets.MNIST('mnist', train=False,
                                     download=True, transform=transform)

# Prepare training loader and testing loader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=0)

In [None]:
# Compute the shape of the training set and testing set
# (.data replaces the deprecated .train_data / .test_data attributes)
trainset_shape = trainloader.dataset.data.shape
testset_shape = testloader.dataset.data.shape

# Print the computed shapes
print(trainset_shape, testset_shape)

# Compute the size of the minibatch for training set and testing set
trainset_batchsize = trainloader.batch_size
testset_batchsize = testloader.batch_size

# Print sizes of the minibatch
print(trainset_batchsize, testset_batchsize)
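
Conceptually, a data loader shuffles the sample indices if asked to and then hands them out `batch_size` at a time, with a smaller final batch when the dataset size is not a multiple of the batch size. The sketch below illustrates that idea with the standard library only. `iterate_minibatches` is a hypothetical helper and not how `DataLoader` is actually implemented.

```python
import random

def iterate_minibatches(num_samples, batch_size, shuffle, seed=0):
    """Yield lists of sample indices, batch_size at a time (hypothetical helper)."""
    indices = list(range(num_samples))
    if shuffle:
        # Shuffle a copy of the index list with a fixed seed for repeatability
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_minibatches(num_samples=100, batch_size=32, shuffle=True))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [32, 32, 32, 4]
```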

# Training neural networks
## Recipe for training a neural network
1. Prepare the data loaders
2. Build a neural network
3. Loop over the batches:
   - do a forward pass
   - calculate the loss function
   - calculate the loss gradients
   - change the weights based on the gradients
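
The recipe can be sketched end to end on a toy problem using only the standard library. Everything here is an illustrative assumption: the "network" is a single weight, the data follow y = 2x, the loss is mean squared error, and the update is plain gradient descent with a hand-derived gradient rather than an optimizer with automatic differentiation.

```python
# Step 1: a toy "data loader", two mini-batches of (input, target) pairs from y = 2x
batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]

# Step 2: the "network" is a single weight, prediction = weight * x
weight = 0.0
learning_rate = 0.01

# Step 3: loop over the batches for several epochs
for epoch in range(200):
    for batch in batches:
        # Forward pass
        predictions = [weight * x for x, _ in batch]
        # Loss: mean squared error over the batch
        loss = sum((p - y) ** 2 for p, (_, y) in zip(predictions, batch)) / len(batch)
        # Gradient of the loss with respect to the weight
        gradient = sum(2 * (p - y) * x for p, (x, y) in zip(predictions, batch)) / len(batch)
        # Change the weight based on the gradient
        weight -= learning_rate * gradient

print(round(weight, 3))  # 2.0
```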

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the class Net
class Net(nn.Module):
    def __init__(self):
        # Define all the parameters of the net
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28 * 1, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        # Do the forward pass: ReLU after the hidden layer, raw logits at the output
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Given the fully connected network defined above (instantiated below as model) and a data loader called train_loader containing the MNIST training set (prepared in the same way as trainloader earlier), train the network to predict the digit classes. Use the Adam optimizer, and since this is a classification problem, use cross-entropy as the loss function.

In [None]:
import torch.optim as optim

# Instantiate the model, the Adam optimizer and the cross-entropy loss function
model = Net()
optimizer = optim.Adam(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

for batch_idx, (data, target) in enumerate(train_loader):
    # Flatten each 28 x 28 image into a vector of 784 values
    data = data.view(-1, 28 * 28)
    optimizer.zero_grad()

    # forward, backward, optimize
    # Complete a forward pass
    output = model(data)

    # Compute the loss, gradients and change the weights
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()

In [None]:
# Set the model in eval mode
model.eval()

# Running counts of correct predictions and of all test samples
correct, total = 0, 0

# Gradients are not needed for evaluation
with torch.no_grad():
    for inputs, labels in test_loader:
        # Put each image into a vector
        inputs = inputs.view(-1, 28 * 28)

        # Do the forward pass and get the predictions
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('The testing set accuracy of the network is: %d %%' % (100 * correct / total))
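
The accuracy computed in the evaluation loop is the share of samples whose highest-scoring class equals the label. The sketch below shows that bookkeeping on a hand-made batch using plain Python. The scores and labels are made up for illustration and do not come from the model.

```python
# Made-up network outputs for four samples and three classes
outputs = [[0.1, 2.0, -1.0],
           [1.5, 0.2, 0.3],
           [0.0, 0.1, 3.0],
           [2.0, 1.0, 0.0]]
labels = [1, 0, 2, 1]

# The predicted class is the index of the largest score in each row
predicted = [row.index(max(row)) for row in outputs]

# Count the matches and convert to a percentage
correct = sum(int(p == y) for p, y in zip(predicted, labels))
total = len(labels)
accuracy = 100 * correct / total

print(predicted)           # [1, 0, 2, 0]
print('%d %%' % accuracy)  # 75 %
```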