<a id="3"></a> <br>
### Logistic Regression
- Linear regression is not good at classification.
- We use logistic regression for classification.
- linear regression + logistic function(softmax) = logistic regression
- Check my deep learning tutorial. There is detailed explanation of logistic regression. 
    - https://www.kaggle.com/kanncaa1/deep-learning-tutorial-for-beginners
- **Steps of Logistic Regression**
    1. Import Libraries
    1. Prepare Dataset
        - We use MNIST dataset.
        - There are 28*28 images and 10 labels from 0 to 9
        - Data is not normalized so we divide each image to 255 that is basic normalization for images.
        - In order to split data, we use train_test_split method from sklearn library
        - Size of train data is 80% and size of test data is 20%.
        - Create feature and target tensors. At the next parts we create variable from these tensors. As you remember we need to define variable for accumulation of gradients.
        - batch_size = batch size means is that for example we have data and it includes 1000 sample. We can train 1000 sample in a same time or we can divide it 10 groups which include 100 sample and train 10 groups in order. Batch size is the group size. For example, I choose batch_size = 100, that means in order to train all data only once we have 336 groups. We train each groups(336) that have batch_size(quota) 100. Finally we train 33600 sample one time.
        - epoch: 1 epoch means training all samples one time.
        - In our example: we have 33600 sample to train and we decide our batch_size is 100. Also we decide epoch is 29(accuracy achieves almost highest value when epoch is 29). Data is trained 29 times. Question is that how many iteration do I need? Lets calculate: 
            - training data 1 times = training 33600 sample (because data includes 33600 sample) 
            - But we split our data 336 groups(group_size = batch_size = 100) our data 
            - Therefore, 1 epoch(training data only once) takes 336 iteration
            - We have 29 epoch, so total iterarion is 9744(that is almost 10000 which I used)
        - TensorDataset(): Data set wrapping tensors. Each sample is retrieved by indexing tensors along the first dimension.
        - DataLoader(): It combines dataset and sample. It also provides multi process iterators over the dataset.
        - Visualize one of the images in dataset
    1. Create Logistic Regression Model
        - Same with linear regression.
        - However as you expect, there should be logistic function in model right?
        - In pytorch, logistic function is in the loss function where we will use at next parts.
    1. Instantiate Model
        - input_dim = 28*28 # size of image px*px
        - output_dim = 10  # labels 0,1,2,3,4,5,6,7,8,9
        - create model
    1. Instantiate Loss 
        - Cross entropy loss
        - It calculates loss that is not surprise :)
        - It also has softmax(logistic function) in it.
    1. Instantiate Optimizer 
        - SGD Optimizer
    1. Traning the Model
    1. Prediction
- As a result, as you can see from plot, while loss decreasing, accuracy(almost 85%) is increasing and our model is learning(training).    

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


In [2]:
# Hyperparameters
batch_size = 64
learning_rate = 1e-3
num_epochs = 20

# Transform: Normalize and Flatten Images
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1))  # Flatten the image
])

# Load MNIST Dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [5]:
# Create Logistic Regression Model
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegressionModel, self).__init__()
        # Linear part
        self.linear = nn.Linear(input_dim, output_dim)
        # There should be logistic function right?
        # However logistic function in pytorch is in loss function
        # So actually we do not forget to put it, it is only at next parts
    
    def forward(self, x):
        out = self.linear(x)
        return out

# Instantiate Model Class
input_dim = 28*28 # size of image px*px
output_dim = 10  # labels 0,1,2,3,4,5,6,7,8,9

# create logistic regression model
model = LogisticRegressionModel(input_dim, output_dim).to(device)

# Cross Entropy Loss  
error = nn.CrossEntropyLoss()

# SGD Optimizer 
learning_rate = 0.001
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [None]:
# Traning the Model
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for (images, labels) in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        loss = error(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    running_loss += loss.item()
        
    # Calculate Accuracy         
    correct = 0
    total = 0
    # Predict test dataset
    for images, labels in test_loader: 
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        predicted = torch.max(outputs.data, 1)[1]
        total += len(labels)
        correct += (predicted == labels).sum()
    
    accuracy = 100 * correct / float(total)
    avg_loss = running_loss / len(train_loader)
    print(f"Epoch [{epoch}/{num_epochs}], Loss: {avg_loss:.4f}, Acc: {accuracy:.4f}%")

In [9]:
# visualization
# plt.plot(iteration_list,loss_list)
# plt.xlabel("Number of iteration")
# plt.ylabel("Loss")
# plt.title("Logistic Regression: Loss vs Number of iteration")
# plt.show()

<a id="4"></a> <br>
### Artificial Neural Network (ANN)
- Logistic regression is good at classification but when complexity(non linearity) increases, the accuracy of model decreases.
- Therefore, we need to increase complexity of model.
- In order to increase complexity of model, we need to add more non linear functions as hidden layer. 
- I am saying again that if you do not know what is artificial neural network check my deep learning tutorial because I will not explain neural network detailed here, only explain pytorch.
- Artificial Neural Network tutorial: https://www.kaggle.com/kanncaa1/deep-learning-tutorial-for-beginners
- What we expect from artificial neural network is that when complexity increases, we use more hidden layers and our model can adapt better. As a result accuracy increase.
- **Steps of ANN:**
    1. Import Libraries
        - In order to show you, I import again but we actually imported them at previous parts.
    1. Prepare Dataset
        - Totally same with previous part(logistic regression).
        - We use same dataset so we only need train_loader and test_loader. 
        - We use same batch size, epoch and iteration numbers.
    1. Create ANN Model
        - We add 3 hidden layers.
        - We use ReLU, Tanh and ELU activation functions for diversity.
    1. Instantiate Model Class
        - input_dim = 28*28 # size of image px*px
        - output_dim = 10  # labels 0,1,2,3,4,5,6,7,8,9
        - Hidden layer dimension is 150. I only choose it as 150 there is no reason. Actually hidden layer dimension is hyperparameter and it should be chosen and tuned. You can try different values for hidden layer dimension and observe the results.
        - create model
    1. Instantiate Loss
        - Cross entropy loss
        - It also has softmax(logistic function) in it.
    1. Instantiate Optimizer
        - SGD Optimizer
    1. Traning the Model
    1. Prediction
- As a result, as you can see from plot, while loss decreasing, accuracy is increasing and our model is learning(training). 
- Thanks to hidden layers model learnt better and accuracy(almost 95%) is better than accuracy of logistic regression model.

In [3]:
# Create ANN Model
class ANNModel(nn.Module):
    def __init__(self):
        super(ANNModel, self).__init__()

        self.fc1 = nn.Linear(28 * 28, 128) 
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(128, 64)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 32)
        self.relu3 = nn.ReLU()
        self.fc4 = nn.Linear(32, 10)  
        
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        out = self.relu3(out)
        out = self.fc4(out)
        return out


model = ANNModel().to(device)
error = nn.CrossEntropyLoss()
learning_rate = 0.02
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [4]:
num_epochs = 50

# Traning the Model
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for (images, labels) in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = model(images)
        loss = error(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    running_loss += loss.item()
        
    # Calculate Accuracy         
    correct = 0
    total = 0
    # Predict test dataset
    for images, labels in test_loader: 
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        predicted = torch.max(outputs.data, 1)[1]
        total += len(labels)
        correct += (predicted == labels).sum()
    
    accuracy = 100 * correct / float(total)
    avg_loss = running_loss / len(train_loader)
    print(f"Epoch [{epoch}/{num_epochs}], Loss: {avg_loss:.4f}, Acc: {accuracy:.4f}%")

Epoch [0/50], Loss: 0.0007, Acc: 78.9800%
Epoch [1/50], Loss: 0.0005, Acc: 88.5000%
Epoch [2/50], Loss: 0.0003, Acc: 91.3500%
Epoch [3/50], Loss: 0.0002, Acc: 92.5000%
Epoch [4/50], Loss: 0.0001, Acc: 93.5700%
Epoch [5/50], Loss: 0.0002, Acc: 94.4400%
Epoch [6/50], Loss: 0.0001, Acc: 95.0800%
Epoch [7/50], Loss: 0.0001, Acc: 95.5800%
Epoch [8/50], Loss: 0.0002, Acc: 95.7200%
Epoch [9/50], Loss: 0.0002, Acc: 95.9900%
Epoch [10/50], Loss: 0.0000, Acc: 96.3400%
Epoch [11/50], Loss: 0.0001, Acc: 96.3700%
Epoch [12/50], Loss: 0.0006, Acc: 96.8500%
Epoch [13/50], Loss: 0.0000, Acc: 97.1100%
Epoch [14/50], Loss: 0.0000, Acc: 96.7500%
Epoch [15/50], Loss: 0.0001, Acc: 96.8700%
Epoch [16/50], Loss: 0.0000, Acc: 97.3300%
Epoch [17/50], Loss: 0.0000, Acc: 97.2600%
Epoch [18/50], Loss: 0.0000, Acc: 97.4000%
Epoch [19/50], Loss: 0.0000, Acc: 97.2400%
Epoch [20/50], Loss: 0.0001, Acc: 97.2200%
Epoch [21/50], Loss: 0.0000, Acc: 97.4500%
Epoch [22/50], Loss: 0.0000, Acc: 97.3300%
Epoch [23/50], Loss: 

In [18]:
# # visualization loss 
# plt.plot(iteration_list,loss_list)
# plt.xlabel("Number of iteration")
# plt.ylabel("Loss")
# plt.title("ANN: Loss vs Number of iteration")
# plt.show()

# # visualization accuracy 
# plt.plot(iteration_list,accuracy_list,color = "red")
# plt.xlabel("Number of iteration")
# plt.ylabel("Accuracy")
# plt.title("ANN: Accuracy vs Number of iteration")
# plt.show()

<a id="5"></a> <br>
### Convolutional Neural Network (CNN)
- CNN is well adapted to classify images.
- You can learn CNN basics: https://www.kaggle.com/kanncaa1/convolutional-neural-network-cnn-tutorial
- **Steps of CNN:**
    1. Import Libraries
    1. Prepare Dataset
        - Totally same with previous parts.
        - We use same dataset so we only need train_loader and test_loader. 
    1. Convolutional layer: 
        - Create feature maps with filters(kernels).
        - Padding: After applying filter, dimensions of original image decreases. However, we want to preserve as much as information about the original image. We can apply padding to increase dimension of feature map after convolutional layer.
        - We use 2 convolutional layer.
        - Number of feature map is out_channels = 16
        - Filter(kernel) size is 5*5
    1. Pooling layer: 
        - Prepares a condensed feature map from output of convolutional layer(feature map) 
        - 2 pooling layer that we will use max pooling.
        - Pooling size is 2*2
    1. Flattening: Flats the features map
    1. Fully Connected Layer: 
        - Artificial Neural Network that we learnt at previous part.
        - Or it can be only linear like logistic regression but at the end there is always softmax function.
        - We will not use activation function in fully connected layer.
        - You can think that our fully connected layer is logistic regression.
        - We combine convolutional part and logistic regression to create our CNN model.
    1. Instantiate Model Class
        - create model
    1. Instantiate Loss
        - Cross entropy loss
        - It also has softmax(logistic function) in it.
    1. Instantiate Optimizer
        - SGD Optimizer
    1. Traning the Model
    1. Prediction
- As a result, as you can see from plot, while loss decreasing, accuracy is increasing and our model is learning(training). 
- Thanks to convolutional layer, model learnt better and accuracy(almost 98%) is better than accuracy of ANN. Actually while tuning hyperparameters, increase in iteration and expanding convolutional neural network can increase accuracy but it takes too much running time that we do not want at kaggle.   
        

In [5]:
# Create CNN Model
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        self.fc1 = nn.Linear(32 * 4 * 4, 10) 
    
    def forward(self, x):
        out = self.cnn1(x)
        out = self.relu1(out)
        out = self.maxpool1(out)
        out = self.cnn2(out)
        out = self.relu2(out)
        out = self.maxpool2(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        return out


model = CNNModel().to(device)
error = nn.CrossEntropyLoss()
learning_rate = 0.02
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [6]:
# Traning the Model
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for (images, labels) in train_loader:
        images = images.to(device).view(-1, 1, 28, 28)
        labels = labels.to(device)
        
        outputs = model(images)
        loss = error(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    running_loss += loss.item()
        
    # Calculate Accuracy         
    correct = 0
    total = 0
    # Predict test dataset
    for images, labels in test_loader: 
        images = images.to(device).view(-1, 1, 28, 28)
        labels = labels.to(device)

        outputs = model(images)
        predicted = torch.max(outputs.data, 1)[1]
        total += len(labels)
        correct += (predicted == labels).sum()
    
    accuracy = 100 * correct / float(total)
    avg_loss = running_loss / len(train_loader)
    print(f"Epoch [{epoch}/{num_epochs}], Loss: {avg_loss:.4f}, Acc: {accuracy:.4f}%")

Epoch [0/50], Loss: 0.0000, Acc: 95.3700%
Epoch [1/50], Loss: 0.0001, Acc: 96.3400%
Epoch [2/50], Loss: 0.0001, Acc: 97.4300%
Epoch [3/50], Loss: 0.0000, Acc: 97.9900%
Epoch [4/50], Loss: 0.0001, Acc: 97.5200%
Epoch [5/50], Loss: 0.0000, Acc: 98.0100%
Epoch [6/50], Loss: 0.0000, Acc: 98.2500%
Epoch [7/50], Loss: 0.0000, Acc: 98.2900%
Epoch [8/50], Loss: 0.0000, Acc: 98.4300%
Epoch [9/50], Loss: 0.0001, Acc: 97.9200%
Epoch [10/50], Loss: 0.0000, Acc: 98.5000%
Epoch [11/50], Loss: 0.0001, Acc: 98.4400%
Epoch [12/50], Loss: 0.0000, Acc: 98.4200%
Epoch [13/50], Loss: 0.0001, Acc: 98.3900%
Epoch [14/50], Loss: 0.0001, Acc: 98.5400%
Epoch [15/50], Loss: 0.0000, Acc: 98.6100%
Epoch [16/50], Loss: 0.0001, Acc: 98.5600%
Epoch [17/50], Loss: 0.0000, Acc: 98.6400%
Epoch [18/50], Loss: 0.0000, Acc: 98.6800%
Epoch [19/50], Loss: 0.0000, Acc: 98.7300%
Epoch [20/50], Loss: 0.0000, Acc: 98.6900%
Epoch [21/50], Loss: 0.0000, Acc: 98.8000%
Epoch [22/50], Loss: 0.0000, Acc: 98.6800%
Epoch [23/50], Loss: 

In [29]:
# # visualization loss 
# plt.plot(iteration_list,loss_list)
# plt.xlabel("Number of iteration")
# plt.ylabel("Loss")
# plt.title("CNN: Loss vs Number of iteration")
# plt.show()

# # visualization accuracy 
# plt.plot(iteration_list,accuracy_list,color = "red")
# plt.xlabel("Number of iteration")
# plt.ylabel("Accuracy")
# plt.title("CNN: Accuracy vs Number of iteration")
# plt.show()