In [None]:
# Sources:
# https://pytorch.org/tutorials/beginner/basics/intro.html
# https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
# https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html
# https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose

# Datasets and data loaders
Code for processing data samples can get messy and hard to maintain; we ideally want our dataset code to be decoupled from our model training code for better readability and modularity. PyTorch provides two data primitives: `torch.utils.data.DataLoader` and `torch.utils.data.Dataset` that allow you to use pre-loaded datasets as well as your own data. `Dataset` stores the samples and their corresponding labels, and `DataLoader` wraps an iterable around the Dataset to enable easy access to the samples.

PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST) that subclass `torch.utils.data.Dataset` and implement functions specific to the particular data. They can be used to prototype and benchmark your model. You can find them here: [Image Datasets](https://pytorch.org/vision/stable/datasets.html), [Text Datasets](https://pytorch.org/text/stable/datasets.html), and [Audio Datasets](https://pytorch.org/audio/stable/datasets.html).

### Loading a Dataset
Here is an example of how to load the [Fashion-MNIST](https://research.zalando.com/project/fashion_mnist/fashion_mnist/) dataset from TorchVision. Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.

We load the [FashionMNIST Dataset](https://pytorch.org/vision/stable/datasets.html#fashion-mnist) with the following parameters:
* `root` is the path where the train/test data is stored,
* `train` specifies training or test dataset,
* `download=True` downloads the data from the internet if it’s not available at `root`,
* `transform` and `target_transform` specify the feature and label transformations.

In [None]:
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw



In [None]:
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

In [None]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64
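
To make the batching concrete, here is a small plain-Python sketch (no PyTorch needed) of how many batches a loader yields and how large each one is. The helper name `batch_sizes` is hypothetical, and the sketch assumes the final incomplete batch is kept, which is the `DataLoader` default (`drop_last=False`).

```python
import math

def batch_sizes(num_samples, batch_size):
    """Return the size of every batch when the last partial batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    sizes = [batch_size] * num_batches
    remainder = num_samples % batch_size
    if remainder:
        sizes[-1] = remainder  # the final batch holds the leftover samples
    return sizes

test_sizes = batch_sizes(10_000, 64)   # Fashion-MNIST test split
train_sizes = batch_sizes(60_000, 64)  # Fashion-MNIST training split
print(len(test_sizes), test_sizes[-1])    # number of batches, size of last batch
print(len(train_sizes), train_sizes[-1])
```

Under these assumptions the test split gives 157 batches (the last one with 16 samples) and the training split gives 938 batches (the last one with 32 samples), so code that relies on every batch having exactly 64 items should be written with care.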


# Creating a Custom Dataset for your files
A custom Dataset class must implement three functions: `__init__`, `__len__`, and `__getitem__`. Take a look at this implementation; the FashionMNIST images are stored in a directory `img_dir`, and their labels are stored separately in a CSV file `annotations_file`.

In the next sections, we’ll break down what’s happening in each of these functions.

In [None]:
import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
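        # The example labels.csv has no header row, so column names are given explicitly
        self.img_labels = pd.read_csv(annotations_file, names=['file_name', 'label'])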
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

### `__init__`
The `__init__` function is run once when instantiating the Dataset object. We initialize the directory containing the images, the annotations file, and both transforms (covered in more detail in the next section).

The labels.csv file looks like:


```
tshirt1.jpg, 0
tshirt2.jpg, 0
......
ankleboot999.jpg, 9
```



In [None]:
def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
    self.img_labels = pd.read_csv(annotations_file, names=['file_name', 'label'])
    self.img_dir = img_dir
    self.transform = transform
    self.target_transform = target_transform

### `__len__`
The `__len__` function returns the number of samples in our dataset.
Example:

In [None]:
def __len__(self):
    return len(self.img_labels)

### `__getitem__`
The `__getitem__` function loads and returns a sample from the dataset at the given index `idx`. Based on the index, it identifies the image’s location on disk, converts that to a tensor using `read_image`, retrieves the corresponding label from the csv data in `self.img_labels`, calls the transform functions on them (if applicable), and returns the tensor image and corresponding label in a tuple.

In [None]:
def __getitem__(self, idx):
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
    image = read_image(img_path)
    label = self.img_labels.iloc[idx, 1]
    if self.transform:
        image = self.transform(image)
    if self.target_transform:
        label = self.target_transform(label)
    return image, label
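
The same three-method protocol can be tried out without any image files. The sketch below is a toy stand-in with hypothetical in-memory data and a hypothetical class name `InMemoryDataset`; in real code the class would subclass `torch.utils.data.Dataset` as shown above.

```python
class InMemoryDataset:
    """Toy stand-in following the same __init__/__len__/__getitem__ protocol."""

    def __init__(self, samples, labels, transform=None, target_transform=None):
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = samples
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # Look up one sample, apply the optional transforms, return a tuple
        sample, label = self.samples[idx], self.labels[idx]
        if self.transform:
            sample = self.transform(sample)
        if self.target_transform:
            label = self.target_transform(label)
        return sample, label

dataset = InMemoryDataset(
    samples=[[0, 128, 255], [64, 64, 64]],       # made-up "pixel" rows
    labels=[0, 9],
    transform=lambda px: [v / 255 for v in px],  # scale pixel values to [0, 1]
)
print(len(dataset), dataset[1])
```

Indexing with `dataset[idx]` calls `__getitem__`, and `len(dataset)` calls `__len__`.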

# Creating Models

To define a neural network in PyTorch, we create a class that inherits from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). We define the layers of the network in the `__init__` function and specify how data will pass through the network in the `forward` function. To accelerate operations in the neural network, we move it to the GPU if available.

In [None]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cuda device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


# Model Layers

Let’s break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.

In [None]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten

We initialize the [nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension at dim=0 is maintained).

In [None]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])


### nn.Linear
The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) is a module that applies a linear transformation on the input using its stored weights and biases.

In [None]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
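
As a rough illustration of what "applies a linear transformation" means, the following plain-Python sketch computes `y = x @ W.T + b` on tiny hand-picked numbers. The function name `linear` and all values are hypothetical; the only assumption taken from PyTorch is that the weight matrix is stored with shape `(out_features, in_features)`, which matches the `torch.Size([512, 784])` weight printed later in this notebook.

```python
def linear(x, weight, bias):
    """Compute y = x @ weight.T + bias for a batch of row vectors.

    x:      list of N rows, each with in_features values
    weight: list of out_features rows, each with in_features values
    bias:   list of out_features values
    """
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]

x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]        # batch of 2 samples, 3 features each
weight = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # 2 output features, 3 input features
bias = [0.0, 1.0]
y = linear(x, weight, bias)
print(y)  # one row of 2 outputs per input sample
```

The batch dimension is untouched: 2 input rows produce 2 output rows, just as the 3 flattened images above produce a `[3, 20]` tensor.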


### nn.ReLU
Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

In this model, we use [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between our linear layers, but there are other activations that introduce non-linearity in your model.

In [None]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[ 0.5236,  0.5798,  0.1236, -0.3153, -0.3376,  0.1703, -0.6148,  0.3994,
          0.0334, -0.0918,  0.2348, -0.3999, -0.1435, -0.0367, -0.1337,  0.0689,
          0.1211,  0.3558, -0.1990,  0.1541],
        [ 0.1054,  0.5222,  0.1590, -0.5097, -0.1862, -0.0438, -0.3596,  0.1914,
         -0.0064, -0.5752,  0.1239, -0.7025,  0.2334, -0.1341, -0.0578, -0.0324,
          0.4976,  0.6137, -0.1215,  0.2029],
        [ 0.1859,  0.3033,  0.2665, -0.3396, -0.3460, -0.0082, -0.4383,  0.4028,
         -0.2976, -0.3912,  0.0619, -0.4468,  0.0310,  0.0781,  0.0872, -0.0916,
          0.1320,  0.6854, -0.3045,  0.2567]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.5236, 0.5798, 0.1236, 0.0000, 0.0000, 0.1703, 0.0000, 0.3994, 0.0334,
         0.0000, 0.2348, 0.0000, 0.0000, 0.0000, 0.0000, 0.0689, 0.1211, 0.3558,
         0.0000, 0.1541],
        [0.1054, 0.5222, 0.1590, 0.0000, 0.0000, 0.0000, 0.0000, 0.1914, 0.0000,
         0.0000, 0.1239, 0.0000, 0.2334, 0.0000, 0.00

### nn.Sequential
[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like `seq_modules`.

In [None]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

### nn.Softmax
The last linear layer of the neural network returns *logits*, raw values in [-∞, ∞], which are passed to the [nn.Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) module. The logits are scaled to values in [0, 1] representing the model’s predicted probabilities for each class. The `dim` parameter indicates the dimension along which the values must sum to 1.

In [None]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
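
For intuition, here is a plain-Python sketch of the softmax computation for a single row of logits. The function below is a hypothetical re-implementation for illustration, not PyTorch's code, and the input values are made up.

```python
import math

def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

Larger logits receive larger probabilities, and the ordering of the classes is preserved.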

# Model Parameters
Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing `nn.Module` automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model’s `parameters()` or `named_parameters()` methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.

In [None]:
print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
) 


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0003,  0.0264,  0.0280,  ..., -0.0015, -0.0327,  0.0347],
        [-0.0032,  0.0302, -0.0270,  ...,  0.0224,  0.0310, -0.0349]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0089, -0.0275], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0211,  0.0336, -0.0422,  ...,  0.0378,  0.0379,  0.0138],
        [ 0.0391, -0.0382, -0.0378,  ...,  0.0141,  0.0123, -0.0058]],
       device='cuda:0', grad_fn=<
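
The sizes reported for each layer can be cross-checked by hand. The sketch below counts the trainable parameters of the three linear layers in `NeuralNetwork`, assuming each `nn.Linear` holds one weight matrix of shape `(out_features, in_features)` plus one bias per output feature; the helper name `linear_param_count` is hypothetical.

```python
def linear_param_count(in_features, out_features):
    """Weights (out x in) plus one bias per output feature."""
    return in_features * out_features + out_features

# (in_features, out_features) of the three nn.Linear layers defined earlier
layer_shapes = [(28 * 28, 512), (512, 512), (512, 10)]
counts = [linear_param_count(i, o) for i, o in layer_shapes]
print(counts, sum(counts))
```

The `nn.Flatten` and `nn.ReLU` modules contribute nothing to this total, because they have no weights or biases.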

# Optimizing the Model Parameters

### Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates.

We define the following hyperparameters for training:



*   **Number of Epochs** - the number of times to iterate over the dataset
*   **Batch Size** - the number of data samples propagated through the network before the parameters are updated
*   **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.



In [None]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

### Optimization Loop
Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:
* **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
* **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.

### Loss Function
When presented with some training data, our untrained network is likely not to give the correct answer. A **loss function** measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training. To calculate the loss, we make a prediction using the inputs of our given data sample and compare it against the true data label value.

Common loss functions include [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) (Mean Square Error) for regression tasks, and [nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) (Negative Log Likelihood) for classification. [nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) combines nn.LogSoftmax and nn.NLLLoss.

We pass our model’s output logits to nn.CrossEntropyLoss, which will normalize the logits and compute the prediction error.

In [None]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
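
To see what "normalize the logits and compute the prediction error" amounts to for one sample, here is a plain-Python sketch: the loss is the negative log of the softmax probability assigned to the true class. The function is a hypothetical illustration rather than PyTorch's implementation, and it handles a single sample (by default, `nn.CrossEntropyLoss` averages this quantity over the batch).

```python
import math

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    shift = max(logits)  # shift for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum_exp - logits[target]

# A model with no preference among 10 classes
uniform_loss = cross_entropy([0.0] * 10, target=3)
# A model that strongly favours the correct class
confident_loss = cross_entropy([0.0] * 9 + [10.0], target=9)
print(uniform_loss, confident_loss)
```

With ten equally likely classes the loss equals ln(10) ≈ 2.30, which is close to the loss of about 2.31 logged at the very start of training further down in this notebook.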

### Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; additionally, there are many different [optimizers](https://pytorch.org/docs/stable/optim.html) available in PyTorch such as ADAM and RMSProp, that work better for different kinds of models and data.

We initialize the optimizer by registering the model’s parameters that need to be trained, and passing in the learning rate hyperparameter.

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:
* Call `optimizer.zero_grad()` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
* Backpropagate the prediction loss with a call to `loss.backward()`. PyTorch deposits the gradients of the loss w.r.t. each parameter.
* Once we have our gradients, we call `optimizer.step()` to adjust the parameters by the gradients collected in the backward pass.
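
The three steps above can be mimicked in plain Python on a one-parameter toy problem. This is only an analogue under stated assumptions: the loss `(w - 3) ** 2`, its hand-written gradient, and all variable names are made up for illustration, and the update rule shown is plain SGD without momentum.

```python
def grad_of_loss(w):
    """Gradient of the toy loss (w - 3) ** 2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0              # the single trainable parameter
learning_rate = 0.1
grad = 0.0           # accumulated gradient, like a parameter's .grad

for step in range(100):
    grad = 0.0                 # analogue of optimizer.zero_grad()
    grad += grad_of_loss(w)    # analogue of loss.backward(): gradients accumulate
    w -= learning_rate * grad  # analogue of optimizer.step() for plain SGD

print(w)  # approaches the minimizer w = 3
```

If the reset were skipped, `grad` would keep growing across iterations and each update would use a sum of stale gradients, which is the double-counting the first step guards against.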

### Training and test loops
We define `train_loop` that loops over our optimization code, and `test_loop` that evaluates the model’s performance against our test data.

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [None]:
def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

We initialize the loss function and optimizer, and pass them to `train_loop` and `test_loop`. Feel free to increase the number of epochs to track the model’s improving performance.

In [None]:
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.311020  [    0/60000]
loss: 2.300525  [ 6400/60000]
loss: 2.286558  [12800/60000]
loss: 2.273364  [19200/60000]
loss: 2.260027  [25600/60000]
loss: 2.238786  [32000/60000]
loss: 2.236733  [38400/60000]
loss: 2.210154  [44800/60000]
loss: 2.202801  [51200/60000]
loss: 2.165487  [57600/60000]
Test Error: 
 Accuracy: 49.0%, Avg loss: 2.170253 

Epoch 2
-------------------------------
loss: 2.180191  [    0/60000]
loss: 2.172447  [ 6400/60000]
loss: 2.120987  [12800/60000]
loss: 2.128174  [19200/60000]
loss: 2.095242  [25600/60000]
loss: 2.032433  [32000/60000]
loss: 2.054998  [38400/60000]
loss: 1.982041  [44800/60000]
loss: 1.980405  [51200/60000]
loss: 1.904839  [57600/60000]
Test Error: 
 Accuracy: 60.3%, Avg loss: 1.913268 

Epoch 3
-------------------------------
loss: 1.944755  [    0/60000]
loss: 1.916689  [ 6400/60000]
loss: 1.805809  [12800/60000]
loss: 1.835377  [19200/60000]
loss: 1.752264  [25600/60000]
loss: 1.688429  [32000/600

# Saving and loading models


### Saving and loading model weights

PyTorch models store the learned parameters in an internal state dictionary, called `state_dict`. These can be persisted via the `torch.save` method:

In [None]:
torch.save(model.state_dict(), 'model_weights.pth')

To load model weights, you need to create an instance of the same model first, and then load the parameters using the `load_state_dict()` method.

In [None]:
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

<font color='red'>!!! IMPORTANT !!!</font>

Be sure to call the `model.eval()` method before running inference to set the dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent inference results.

### Saving and loading models with shapes
When loading model weights, we needed to instantiate the model class first, because the class defines the structure of a network. We might want to save the structure of this class together with the model, in which case we can pass `model` (and not `model.state_dict()`) to the saving function:

In [None]:
torch.save(model, 'model.pth')

We can then load the model like this:

In [None]:
model = torch.load('model.pth')

# Task for today / homework

Implement a simple CNN for classification of CIFAR10 images using both PyTorch and Keras/TF. Use the same architecture (number of layers, number of filters, number of neurons) and hyperparameters (optimizer, learning rate, batch size, number of epochs). Report the differences in training time, inference time and programming time (how long it took you to solve this task **without copy-pasting your previous code**). Comment on the ease of use, code readability and your general opinion of both frameworks.

### TensorFlow CNN

In [None]:
from tensorflow.keras.datasets import cifar10
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
import time

In [None]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)
print(y_train.shape)

(x_train, y_train), (x_test, y_test) = (x_train/255, to_categorical(y_train)), (x_test/255, to_categorical(y_test))

(50000, 32, 32, 3)
(50000, 1)


In [None]:
cnn_tf = Sequential()
cnn_tf.add(Convolution2D(64, (3,3), input_shape=(32,32,3), activation='relu'))
cnn_tf.add(MaxPooling2D((2,2)))
cnn_tf.add(Convolution2D(32, (3,3), activation='relu'))
cnn_tf.add(MaxPooling2D((2,2)))
cnn_tf.add(Convolution2D(16, (3,3), activation='relu'))
cnn_tf.add(MaxPooling2D((2,2)))
cnn_tf.add(Flatten())
cnn_tf.add(Dense(64, 'relu'))
cnn_tf.add(Dense(32, 'relu'))
cnn_tf.add(Dense(10, 'softmax'))

cnn_tf.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_12 (Conv2D)          (None, 30, 30, 64)        1792      
                                                                 
 max_pooling2d_12 (MaxPoolin  (None, 15, 15, 64)       0         
 g2D)                                                            
                                                                 
 conv2d_13 (Conv2D)          (None, 13, 13, 32)        18464     
                                                                 
 max_pooling2d_13 (MaxPoolin  (None, 6, 6, 32)         0         
 g2D)                                                            
                                                                 
 conv2d_14 (Conv2D)          (None, 4, 4, 16)          4624      
                                                                 
 max_pooling2d_14 (MaxPoolin  (None, 2, 2, 16)        

In [None]:
cnn_tf.compile(optimizer=Adam(learning_rate=0.001),
               loss='categorical_crossentropy',
               metrics=['accuracy'])  # a list of metrics is accepted across Keras versions

start = time.time()
cnn_tf.fit(x_train, y_train, batch_size=32, epochs=10, validation_split=0.15, verbose=1)
print('CNN_TF model training time: ', time.time()-start)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[1.0696358680725098, 0.6151000261306763]
CNN_TF model training time:  115.95207190513611


### PyTorch

In [None]:
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
from torch import nn
from torch.nn import functional
from torch.optim import Adam
import torch

In [None]:
batch_size = 32

trainset = datasets.CIFAR10(root='data', train=True, download=True, transform=transforms.ToTensor())
trainloader = DataLoader(trainset, batch_size=batch_size)

testset = datasets.CIFAR10(root='data', train=False, download=True, transform=transforms.ToTensor())
testloader = DataLoader(testset, batch_size=batch_size)

for X, y in trainloader:
  print(X.shape)
  print(y.shape)
  break

Files already downloaded and verified
Files already downloaded and verified
torch.Size([32, 3, 32, 32])
torch.Size([32])


In [None]:
class cnnTorch(nn.Module):
  def __init__(self):
    super().__init__()
    self.convolution_stack = nn.Sequential(
      nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3),
      nn.ReLU(),
      nn.MaxPool2d(2),
      nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3),
      nn.ReLU(),
      nn.MaxPool2d(2),
      nn.Conv2d(in_channels=32, out_channels=16, kernel_size=3),
      nn.ReLU(),
      nn.MaxPool2d(2),
      nn.Flatten()
    )
    self.linear_stack = nn.Sequential(
        nn.Linear(2*2*16, 64),
        nn.ReLU(),
        nn.Linear(64, 32),
        nn.ReLU(),
        nn.Linear(32, 10)
    )
  
  def forward(self, x):
    x = self.convolution_stack(x)
    x = self.linear_stack(x)
    return x

cnn_torch = cnnTorch().to('cuda')
print(cnn_torch)

cnnTorch(
  (convolution_stack): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1))
    (7): ReLU()
    (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (9): Flatten(start_dim=1, end_dim=-1)
  )
  (linear_stack): Sequential(
    (0): Linear(in_features=64, out_features=64, bias=True)
    (1): ReLU()
    (2): Linear(in_features=64, out_features=32, bias=True)
    (3): ReLU()
    (4): Linear(in_features=32, out_features=10, bias=True)
  )
)


In [None]:
loss_ce = nn.CrossEntropyLoss()
optimizer = Adam(cnn_torch.parameters(), lr=0.001)

In [None]:
def train_loop():
  train_size = len(trainloader.dataset)
  train_loss, train_correct = 0, 0

  for i, (X, y) in enumerate(trainloader):
    X, y = X.to('cuda'), y.to('cuda')
    pred = cnn_torch(X)
    loss = loss_ce(pred, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    train_loss += loss.item()
    train_correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    print(f"\r Epoch {epoch+1}, batch [{i * len(X):>5d}/{train_size:>5d}]", end="")

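  # Note: train_loss sums per-batch mean losses, so dividing by train_size
  # (rather than len(trainloader)) reports a value about batch_size times
  # smaller than the per-batch average printed by test_loop.
  loss, acc = train_loss / train_size, train_correct / train_size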
  print(f"\t loss: {loss:>7f}, acc: {acc:>7f}")

def test_loop():
  test_size = len(testloader.dataset)
  test_loss, correct = 0, 0

  with torch.no_grad():
      for X, y in testloader:
          X, y = X.to('cuda'), y.to('cuda')
            
          pred = cnn_torch(X)
          test_loss += loss_ce(pred, y).item()
          correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    
  test_loss, test_acc = test_loss / len(testloader), correct / test_size
  print(f"\t test loss: {test_loss:>7f}, test acc: {test_acc:>4f}")

In [None]:
start = time.time()
for epoch in range(10):
  train_loop()
test_loop()
print('CNN_Torch model training time: ', time.time()-start)

 Epoch 1, batch [24992/50000]	 loss: 0.029707, acc: 0.662580
 Epoch 2, batch [24992/50000]	 loss: 0.029430, acc: 0.667280
 Epoch 3, batch [24992/50000]	 loss: 0.029123, acc: 0.667780
 Epoch 4, batch [24992/50000]	 loss: 0.028887, acc: 0.670620
 Epoch 5, batch [24992/50000]	 loss: 0.028678, acc: 0.673460
 Epoch 6, batch [24992/50000]	 loss: 0.028483, acc: 0.675700
 Epoch 7, batch [24992/50000]	 loss: 0.028292, acc: 0.679360
 Epoch 8, batch [24992/50000]	 loss: 0.028056, acc: 0.680560
 Epoch 9, batch [24992/50000]	 loss: 0.027955, acc: 0.683360
 Epoch 10, batch [24992/50000]	 loss: 0.027748, acc: 0.683480
	 test loss: 1.040048, test acc: 0.638800
CNN_Torch model training time:  246.77141189575195


## Conclusions

We can see implementations of simple CNN models trained on the CIFAR10 dataset. The first model was created using the TensorFlow 2 framework, the second one using PyTorch. Both models have the same architecture and hyperparameters.

There are three Conv2d+MaxPool2d layers with ReLU activation functions, followed by three Dense/Linear layers with ReLU activations (and softmax in the output layer). Both models were trained for 10 epochs with a batch size of 32, both on a Colab GPU.
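
The spatial sizes in this architecture can be traced with a little arithmetic. The sketch below assumes what both model definitions use: 3x3 convolutions with stride 1 and no padding, each followed by 2x2 max pooling. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling (stride == window)."""
    return size // window

size = 32            # CIFAR10 images are 32x32
trace = [size]
for _ in range(3):   # three conv + pool stages
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flattened = size * size * 16  # 16 channels after the last convolution
print(trace, flattened)
```

The trace 32 → 30 → 15 → 13 → 6 → 4 → 2 agrees with the output shapes in the Keras summary, and the flattened size of 64 is why the first PyTorch linear layer is declared as `nn.Linear(2*2*16, 64)`.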

In this case the model written in TensorFlow trained faster. Training the TF model took about 116 s, while training the PyTorch model took about 247 s. So TF was roughly twice as fast here, which is a huge difference.

It's interesting, because I have seen other people's comparisons of these two frameworks, and there PyTorch was generally faster. This suggests that a performance comparison of these two deep learning frameworks depends on many more aspects than architecture alone. Maybe running it on Colab, which is provided by Google, made some difference.

It took me about 30 minutes to create the model using TF, and about 1 hour to create it using PyTorch. I tried not to copy any other implementation, but in the case of PyTorch that was impossible, because it was my first model created with that framework.

Generally speaking, at this moment, when I am more familiar with TensorFlow and have only just started to learn PyTorch, TensorFlow looks easier to me. It seems to operate at a higher level of abstraction, so even if it's not as "pythonic", it looks faster to prototype with and, for example, easier to explain to someone else.

But I've heard a lot of opinions that PyTorch is generally better, especially after the update that makes it possible to connect a PyTorch model to TensorBoard. Because of that I want to learn PyTorch better and use it more often to get familiar with the framework.

After some time my opinion will be more reliable, but at this moment I definitely prefer TensorFlow.
