<a href="https://colab.research.google.com/github/erhansozen/Assignments-LearningPortfolio/blob/main/Erhan_S%C3%B6zen_homework_4_NEW.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Theory

In this assignment, your task is to complete the MNIST Basics chapter. It is best to review everything from last week and then try to answer the following questions. Afterwards, you will consolidate what you have learned in two programming tasks.

What are `torch.cat()` and `.view(-1, 28*28)` doing at the beginning of the "The MNIST Loss Function" chapter?

✈ In PyTorch, `torch.cat()` is a function that concatenates a sequence of tensors along a specified dimension (dimension 0 by default). In this chapter it joins the stacked threes and the stacked sevens into a single tensor that holds all training images (page 131 of the PDF).

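✈ `view` is a PyTorch method that changes the shape of a tensor without changing its contents. Here, `.view(-1, 28*28)` turns each 28×28 image into a single row of 784 values, so the rank-3 tensor of images becomes a rank-2 tensor with one row per image. `-1` is a special parameter to `view` that means "make this axis as big as necessary to fit all the data".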
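The two calls can be tried on small stand-in tensors. The sketch below assumes the names `stacked_threes` and `stacked_sevens` from the chapter and fills them with random values instead of real MNIST pixels; the image counts (5 and 4) are arbitrary.

```python
import torch

# Stand-ins for the stacked image tensors (assumed shape: n images of 28x28);
# random values replace the real MNIST pixels.
stacked_threes = torch.rand(5, 28, 28)
stacked_sevens = torch.rand(4, 28, 28)

# torch.cat joins the tensors along dim 0 (the default): 5 + 4 = 9 images.
combined = torch.cat([stacked_threes, stacked_sevens])
print(combined.shape)   # torch.Size([9, 28, 28])

# view(-1, 28*28) flattens each image into a row of 784 values;
# -1 lets PyTorch infer the number of rows (9 here).
train_x = combined.view(-1, 28*28)
print(train_x.shape)    # torch.Size([9, 784])
```

The row order is preserved: the first five rows of `train_x` are the flattened threes, followed by the four flattened sevens.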

Can you draw the neuronal network, which is manually trained in chapter "The MNIST Loss Function"?

Why is it not possible to use the accuracy as loss function?

✈ Accuracy changes only when a prediction flips from one class to the other, so a very small change in the value of a weight will often not change the accuracy at all. This makes accuracy unsuitable as a loss function: if we used it, most of the time our gradients would be 0, and the model would not be able to learn from that number.
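This effect can be seen on a toy problem. The sketch below uses hypothetical data with one feature and one weight (not the MNIST setup): nudging the weight leaves the accuracy untouched, while a smooth loss of the same form as `mnist_loss` does respond.

```python
import torch

# Hypothetical toy data: one feature per item, labels 0 or 1.
x = torch.tensor([-2.0, -1.0, 1.0, 2.0])
y = torch.tensor([0, 1, 1, 1])

def accuracy(w):
    # Threshold the sigmoid output at 0.5, then compare with the labels.
    preds = torch.sigmoid(w * x) > 0.5
    return (preds == y.bool()).float().mean().item()

def smooth_loss(w):
    # Same formula as mnist_loss, applied to sigmoid predictions.
    p = torch.sigmoid(w * x)
    return torch.where(y == 1, 1 - p, p).mean().item()

w, nudged = 0.5, 0.51
print(accuracy(w), accuracy(nudged))        # identical values
print(smooth_loss(w), smooth_loss(nudged))  # second value is slightly smaller
```

Because the accuracy is flat around `w`, its slope gives no hint about which direction improves the model, whereas the smooth loss does.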

What is the defined `mnist_loss` function doing? 

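✈ It measures how far each prediction is from its target and averages the result. `torch.where(targets==1, 1-predictions, predictions)` returns `1-predictions` where the target is 1 and `predictions` where the target is 0, so a confident correct prediction gives a value near 0 and a confident wrong one gives a value near 1. `.mean()` then reduces this tensor to a single number. The function assumes that the predictions lie between 0 and 1.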

```
def mnist_loss(predictions, targets):
    return torch.where(targets==1, 1-predictions, predictions).mean()
```
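A small worked example makes the behaviour concrete. The three predictions and targets below are illustrative values, not taken from a trained model.

```python
import torch

def mnist_loss(predictions, targets):
    return torch.where(targets == 1, 1 - predictions, predictions).mean()

targets = torch.tensor([1, 0, 1])
predictions = torch.tensor([0.9, 0.4, 0.2])

# Per-item distance from the target, before averaging.
per_item = torch.where(targets == 1, 1 - predictions, predictions)
print(per_item)                          # tensor([0.1000, 0.4000, 0.8000])
print(mnist_loss(predictions, targets))  # tensor(0.4333)
```

The third item contributes most to the loss because the prediction 0.2 is far from its target of 1.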

Why do we additionally need the sigmoid() function? What is it technically in our TLU?

Again, what are mini-batches, why are we using them, and why should they be shuffled?

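✈ Calculating the loss over the whole dataset before every weight update would take a long time, while calculating it for a single data item would give an imprecise and unstable gradient: you'd be going to the trouble of updating the weights, but taking into account only how that would improve the model's performance on that single item. So instead we take a compromise between the two: we calculate the average loss for a few data items at a time. This is called a mini-batch.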

✈ Another good reason for using mini-batches rather than calculating the gradient on individual data items is that, in practice, we nearly always do our training on an accelerator such as a GPU, which only performs well when it has a lot of work to do at a time. Mini-batches should be shuffled so that the model sees different combinations of items in each epoch: rather than simply enumerating our dataset in the same order for every epoch, we randomly shuffle it on every epoch before we create mini-batches.
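Shuffled mini-batches can be sketched with PyTorch's own `DataLoader`. The ten-item dataset below is a hypothetical stand-in for the MNIST tensors, and the batch size of 4 is arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 items with one feature each, plus a label.
xs = torch.arange(10, dtype=torch.float32).unsqueeze(1)
ys = (xs.squeeze(1) >= 5).long()
dataset = TensorDataset(xs, ys)

# shuffle=True draws a fresh random order at the start of every epoch.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for epoch in range(2):
    for xb, yb in loader:
        # Each xb holds up to 4 items; the last batch holds the remaining 2.
        print(epoch, xb.squeeze(1).tolist(), yb.tolist())
```

Every item appears exactly once per epoch, but the grouping into batches differs from epoch to epoch.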

# Practical Part

Try to understand all parts of the code needed to manually train a single TLU/Perceptron, so use and copy all parts of the code from "First Try: Pixel Similarity" to the "Putting it all together" chapter. In the second step, use an optimizer, a second layer, and a ReLU as a hidden activation function to train a simple neural network. When copying the code, think carefully about what you really need and how you can summarize it as compactly as possible. (Probably each attempt requires about 15 lines of code.)

```
#YOUR TASK: Manually train a single layer perceptron without using an optimizer.
import numpy as np

# Define the step function
def step_function(x):
    return np.where(x > 0, 1, -1)

# Define the training function
def train(X, y, num_epochs, learning_rate):
    # Initialize the weights and bias
    w = np.zeros(X.shape[1])
    b = 0

    # Perform the training loop
    for epoch in range(num_epochs):
        for i in range(X.shape[0]):
            # Forward pass
            z = np.dot(X[i], w) + b
            a = step_function(z)

            # Backward pass: update only on misclassified samples
            if a != y[i]:
                w += learning_rate * y[i] * X[i]
                b += learning_rate * y[i]

    # Return the weights and bias
    return w, b

# Define the main function
if __name__ == '__main__':
    # Define the training data (logical AND with -1/+1 encoding)
    X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
    y = np.array([1, -1, -1, -1])

    # Train the perceptron
    w, b = train(X, y, num_epochs=10, learning_rate=0.1)

    # Test the perceptron
    for i in range(X.shape[0]):
        z = np.dot(X[i], w) + b
        a = step_function(z)
        print('Input: {}, Output: {}'.format(X[i], a))
```


```
Input: [1 1], Output: 1
Input: [ 1 -1], Output: -1
Input: [-1  1], Output: -1
Input: [-1 -1], Output: -1
```
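The update applied inside `train` can be written compactly. Assuming labels $y_i \in \{-1, +1\}$, a learning rate $\eta$, and the step function defined in the code, each sample is handled as follows:

$$
\hat{y}_i = \operatorname{step}(w \cdot x_i + b), \qquad
\text{if } \hat{y}_i \neq y_i:\quad
w \leftarrow w + \eta\, y_i\, x_i, \quad
b \leftarrow b + \eta\, y_i
$$

Correctly classified samples leave $w$ and $b$ unchanged, so the weights stop changing once every sample is classified correctly.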


```
#YOUR TASK: Train a simple two-layer neural network (two perceptrons + hidden activation function) with built-in functions and an optimizer.

import torch
from torch import nn
from torch import optim
from torchvision import datasets, transforms

# Define the network architecture
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64) # input layer to hidden layer
        self.fc2 = nn.Linear(64, 10) # hidden layer to output layer
        self.relu = nn.ReLU() # activation function

    def forward(self, x):
        x = x.view(x.shape[0], -1) # flatten the input tensor
        x = self.relu(self.fc1(x)) # pass through hidden layer with ReLU activation function
        x = self.fc2(x) # pass through output layer
        return x

# Load the MNIST dataset
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Create the network first, then bind the optimizer to its parameters.
# Building the optimizer from a separate Net() instance would leave net untrained.
net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

# Train the network
for epoch in range(5): # number of epochs
    running_loss = 0
    for images, labels in trainloader: # loop through mini-batches
        optimizer.zero_grad() # zero the gradients
        output = net(images) # forward pass
        loss = criterion(output, labels) # calculate loss
        loss.backward() # backward pass
        optimizer.step() # update weights
        running_loss += loss.item() # accumulate the batch loss

    print(f"Epoch {epoch+1} - Training loss: {running_loss/len(trainloader)}")
```

The output recorded below stems from a run in which the optimizer was created with `optim.SGD(Net().parameters(), lr=0.1)`, that is, on a throwaway `Net()` instance. In that setup `net` itself is never updated, which is why the loss stays flat near ln(10) ≈ 2.30, the value for a 10-class classifier that guesses uniformly. With the optimizer bound to `net.parameters()`, the loss should fall from epoch to epoch.

```
Epoch 1 - Training loss: 2.3262079646592455
Epoch 2 - Training loss: 2.3262049180866557
Epoch 3 - Training loss: 2.3262290575865237
Epoch 4 - Training loss: 2.326196160651982
Epoch 5 - Training loss: 2.32622232899737
```
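A loss that does not move between epochs usually means that the weights are not being updated. The sketch below is one way to check this. It uses a hypothetical tiny model and random data (not the MNIST network), and the helper `weights_change` is introduced only for illustration.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical tiny model and random data, used only for this check.
model = nn.Linear(4, 3)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))
criterion = nn.CrossEntropyLoss()

def weights_change(optimizer):
    """Return True if one optimizer step modifies the weights of model."""
    before = model.weight.detach().clone()
    model.zero_grad()
    criterion(model(inputs), labels).backward()
    optimizer.step()
    return not torch.equal(before, model.weight.detach())

# Optimizer built from a different instance: model is never updated.
detached = optim.SGD(nn.Linear(4, 3).parameters(), lr=0.1)
print(weights_change(detached))   # False

# Optimizer built from the parameters of the model being trained.
attached = optim.SGD(model.parameters(), lr=0.1)
print(weights_change(attached))   # True
```

An optimizer only updates the parameter tensors it was given, so it has to be constructed from the same model instance that is used in the forward pass.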
