Build a simple classifier that can tell the difference between fish and cats, then iterate over the design and how the model is built to make it more and more accurate.

**Traditional challenges**

The traditional approach is to determine a set of rules that differentiate a cat from a fish, for example that a cat has a tail, or that a fish has different colors or scales, and then apply these rules to an image to determine what we are looking at. There are caveats that the rules have to account for. For example, what happens if we find a Manx cat? It is clearly a cat, but it doesn't have a tail.

These rules are just going to get more and more complicated as they try to describe all possible scenarios.

What we are after is a function that, given an image as input, returns cat or fish.

**Data**

First, we need data. How much? It depends. The idea that any deep learning technique needs a LOT of data to train the neural network is not necessarily true. However, right now we're going to be training from scratch, which often does require access to a large quantity of data. We need a lot of pictures of fish and cats.

A standard collection of images used to train neural networks, called **ImageNet**, contains 14 million images and 20,000 image categories. It's the standard that all image classifiers judge themselves against.

Loading and converting data into formats that are ready for training can often end up being one of the areas in data science that sucks up far too much time.

PyTorch has developed standard conventions for interacting with data that make it fairly consistent to work with, whether you're working with images, text, audio or video.

The two main conventions for interacting with data are **datasets** and **data loaders**. A dataset is a Python class that allows us to get at the data we're supplying to the neural network. A data loader is what feeds data from the dataset into the network.

Every dataset, no matter whether it includes images, audio, text, 3D, stock market info, or whatever, can interact with PyTorch if it satisfies the following abstract Python class:

In [1]:
# class Dataset(object):
#     def __getitem__(self, index):
#         raise NotImplementedError
#     def __len__(self):
#         raise NotImplementedError

This is fairly straightforward: we have to implement a method that returns the size of our dataset (```__len__```), and a method that retrieves an item from our dataset as a ```(tensor, label)``` pair (```__getitem__```). The data loader calls the latter as it pushes data into the neural network for training. So the body of ```__getitem__``` has to take an image, transform it into a tensor, and return that together with the label so PyTorch can operate on it. This is fine, but you can imagine that this scenario comes up a lot.
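As a minimal sketch of that contract, the hypothetical ```ToyDataset``` below (not part of the original notebook) wraps random in-memory tensors instead of image files and returns one image tensor with its label per index. A ```DataLoader``` then groups the items into batches. The class name, the sizes, and the random data are illustrative assumptions.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset of random 3x64x64 'images' with 0/1 labels."""

    def __init__(self, n_items=10):
        self.images = torch.rand(n_items, 3, 64, 64)
        # Alternate labels 0, 1, 0, 1, ... purely for illustration.
        self.labels = torch.arange(n_items) % 2

    def __len__(self):
        # Size of the dataset.
        return len(self.labels)

    def __getitem__(self, index):
        # One item: the image tensor and its label.
        return self.images[index], self.labels[index]


toy_data = ToyDataset(n_items=10)
toy_loader = DataLoader(toy_data, batch_size=4)

# The loader calls __getitem__ repeatedly and stacks the results into batches.
batch_shapes = [tuple(images.shape) for images, labels in toy_loader]
print(len(toy_data))   # 10
print(batch_shapes)    # [(4, 3, 64, 64), (4, 3, 64, 64), (2, 3, 64, 64)]
```

With 10 items and a batch size of 4, the last batch is smaller than the others, which is why the training loop later weights each batch loss by its actual size.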

**Building a training dataset**

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torch.nn.functional as F
import torchvision
from torchvision import transforms
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES=True

In [3]:
train_data_path = 'C02Dataset/train/'

In [4]:
transforms = transforms.Compose([
    transforms.Resize((64,64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485,0.456,0.406],
                        std=[0.229,0.224,0.225])
])

In [5]:
def check_image(path):
    # Return True only if PIL can open the file, so unreadable images are skipped.
    try:
        with Image.open(path):
            return True
    except Exception:
        return False

In [6]:
train_data = torchvision.datasets.ImageFolder(root=train_data_path, 
                                              transform=transforms, is_valid_file=check_image)

```torchvision``` lets us specify a list of transforms that will be applied to an image before it gets into the neural network. The basic transform is to take image data and turn it into a tensor (```transforms.ToTensor()```), but we are also doing a couple of other things that might not seem obvious.

GPUs are built to be fast at performing calculations on data of a standard size, but we probably have an assortment of images at many resolutions. To improve processing performance, we scale every image to 64x64 with the ```Resize((64,64))``` transform. After the image is converted into a tensor, we normalize it around a specific set of mean and standard deviation values.

Normalizing helps avoid the **exploding gradient** problem: keeping the incoming values in a small range prevents them from getting too large during the training phase. ```ToTensor()``` first scales the pixel values to between 0 and 1, and ```Normalize``` then subtracts the per-channel mean $\mu$ and divides by the standard deviation $\sigma$, so the values end up roughly centered around zero. The values of $\mu$ and $\sigma$ used here were computed on ImageNet; for other datasets you'd have to calculate their own $\mu$ and $\sigma$.
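To make the arithmetic concrete, this plain-Python sketch applies the same two steps to a single 8-bit RGB pixel: scaling to [0, 1], then standardizing each channel with the ImageNet mean and standard deviation from the transform above. The helper name ```normalize_pixel``` is hypothetical and only illustrates the formula; it is not how ```torchvision``` implements the transform.

```python
# ImageNet per-channel statistics, as used in the transform above.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255 for value in rgb]  # the scaling ToTensor() performs
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]  # (x - mean) / std


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
```

The normalized values therefore fall in a small range around zero, roughly -2.1 to 2.6, for any possible input pixel.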

**Building validation and test datasets**

The training data is set up, but we need to repeat the same steps for the validation and test datasets. What is the difference? One danger in deep learning, and in all machine learning in fact, is overfitting: the model gets really good at recognizing what it has been trained on but cannot generalize to examples outside the training samples. To detect this, we use a validation set, which is another series of cat and fish images that do not occur in the training set. At the end of each training cycle (an ```epoch```), we compare against this set to make sure our network isn't getting things wrong.

In [7]:
val_data_path = 'C02Dataset/val/' 

In [8]:
val_data = torchvision.datasets.ImageFolder(root=val_data_path,
                                           transform=transforms, is_valid_file=check_image)

In [9]:
test_data_path = 'C02Dataset/test/'

In [10]:
test_data = torchvision.datasets.ImageFolder(root=test_data_path,
                                            transform=transforms, is_valid_file=check_image)

**Training set:** Used in the training pass to update the model.

**Validation set:** Used to evaluate how the model is generalizing to the problem domain, rather than fitting to the training data (this doesn't update the model directly).

**Test set:** A final dataset that provides a final evaluation of the model's performance after training is complete.

In [11]:
batch_size = 64  # How many images go through the network in each batch (training step)
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
val_data_loader  = torch.utils.data.DataLoader(val_data, batch_size=batch_size) 
test_data_loader  = torch.utils.data.DataLoader(test_data, batch_size=batch_size)

**Building the neural network**

Starting with:

$\text{Input Layer} \Leftrightarrow \text{Hidden}_1 \Leftrightarrow \text{Hidden}_2 \Leftrightarrow \text{Output Layer}$

This is a fully connected network with **ReLU** as the activation function, $\max(0, x)$, so if the input is negative the result is 0. **ReLU** is the more appropriate choice for the hidden layers of this kind of binary classifier. **Softmax** squashes its inputs into values between $0$ and $1$ that add up to $1$ ("probabilities") and exaggerates the differences between them, so it is better to use **ReLU** in the hidden layers and **Softmax** at the output layer, then use ```argmax()``` when trying to predict.
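The two activation functions can be sketched in plain Python to show what they do to a handful of numbers. The helper names and input values below are illustrative assumptions; the notebook itself relies on the ```F.relu``` and ```F.softmax``` functions from PyTorch.

```python
import math


def relu(x):
    # max(0, x): negative inputs become 0, positive inputs pass through.
    return max(0.0, x)


def softmax(scores):
    # Subtract the largest score first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [relu(v) for v in [-1.5, 0.0, 2.0]]
probs = softmax([2.0, 1.0])
predicted_class = probs.index(max(probs))  # plain-Python argmax

print(hidden)                        # [0.0, 0.0, 2.0]
print([round(p, 3) for p in probs])  # [0.731, 0.269]
print(predicted_class)               # 0
```

A score gap of 1.0 turns into a 73% versus 27% split, which is the "exaggeration" mentioned above, and the outputs always sum to 1.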

In [12]:
class SimpleNet(nn.Module):

    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(12288, 84)
        self.fc2 = nn.Linear(84, 50)
        self.fc3 = nn.Linear(50,2)
    
    def forward(self, x):
        x = x.view(-1, 12288)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [13]:
simplenet = SimpleNet()

In ```__init__``` we do the setup: we call the superclass constructor and define three fully connected layers. Beyond that, this class only needs to implement the forward pass (**inference**), meaning how the data flows through the network to make predictions. First we have to convert the 3D image tensor (x and y plus the three **RGB** color channels, 64 × 64 × 3 = 12288 values) into a 1D tensor so it can be fed into the first ```Linear``` layer; we do that with the ```view()``` method. From there we apply each layer and its activation in turn and return the raw output scores of the last layer. Applying softmax to these scores gives the probability of the image being a cat or a fish.

The sizes of the hidden layers are somewhat arbitrary, with the exception of the final layer, whose output size has to match the number of classes (2). The data should be compressed as it goes down the stack; otherwise the network could cheat by simply passing its n inputs straight through to n outputs and consider the job done.
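The layer sizes can be checked with a few lines of arithmetic. This sketch assumes only the sizes that appear in ```SimpleNet``` (12288, 84, 50, 2) and counts one weight per connection plus one bias per output, which is how a fully connected layer with a bias term is parameterized.

```python
image_side, channels = 64, 3
input_features = image_side * image_side * channels  # flattened 64x64 RGB image

# Input size followed by the output size of fc1, fc2 and fc3.
layer_sizes = [input_features, 84, 50, 2]

# Each layer has n_in * n_out weights plus n_out biases.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(input_features)         # 12288
print(params_per_layer)       # [1032276, 4250, 102]
print(sum(params_per_layer))  # 1036628
```

Almost all of the roughly one million parameters sit in the first layer, because that is where the 12288 input values are compressed down to 84.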

**Loss Function** 

For multi-class categorization problems it is recommended to use ```CrossEntropyLoss```. Another loss function is ```MSELoss```, which is recommended when you are making a numerical prediction.

One thing to be aware of is that ```CrossEntropyLoss``` incorporates ```softmax()``` as part of its operation, which is why the last layer of ```SimpleNet``` has no softmax activation.
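For a single example, cross-entropy is the negative log of the softmax probability assigned to the correct class, so it can be computed directly from the raw scores. The plain-Python sketch below illustrates that formula under that assumption; the function name and the scores are hypothetical, and it does not reproduce the batching or averaging that ```CrossEntropyLoss``` performs.

```python
import math


def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    top = max(logits)
    # log(sum(exp(logits))), computed stably by factoring out the largest score.
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum_exp - logits[target]


confident_right = cross_entropy([4.0, 0.0], target=0)
confident_wrong = cross_entropy([4.0, 0.0], target=1)
undecided = cross_entropy([0.0, 0.0], target=0)

print(round(confident_right, 3))  # 0.018
print(round(confident_wrong, 3))  # 4.018
print(round(undecided, 3))        # 0.693
```

A confident correct prediction costs almost nothing, a confident wrong one is penalized heavily, and an undecided two-class prediction costs ln 2, about 0.693.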

**Optimizing**

We use the loss function to measure the difference between the prediction and the actual label, and then use that information to update the weights of the network so that the loss becomes as small as possible. This is the job of an optimizer. One of the most widely used is **SGD** (*Stochastic Gradient Descent*). We will go through several optimizers and pick the one that suits best; the cell below starts with **Adam**.

In [14]:
optimizer = optim.Adam(simplenet.parameters(), lr=0.001)

For training we loop over the batches: run a forward pass through the network, calculate the error, backpropagate, update the weights, and repeat. We call ```zero_grad()``` in the loop because PyTorch accumulates gradients by default; resetting them guarantees that each optimization step uses only the gradients of the current batch.
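The accumulation behaviour is easy to see on a single parameter. In this small sketch the tensor ```w``` and the function being differentiated are illustrative assumptions; the gradient is cleared by hand with ```w.grad.zero_()```, which stands in for the clearing that the optimizer's ```zero_grad()``` does for every parameter of the model.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# d(3w)/dw = 3, so the first backward pass stores a gradient of 3.
(3 * w).sum().backward()
after_first = w.grad.item()

# Without a reset, the second backward pass ADDS another 3.
(3 * w).sum().backward()
after_second = w.grad.item()

# Clearing the gradient first gives the gradient of this pass only.
w.grad.zero_()
(3 * w).sum().backward()
after_reset = w.grad.item()

print(after_first, after_second, after_reset)  # 3.0 6.0 3.0
```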

In [15]:
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
print(device)
simplenet.to(device)

cuda


SimpleNet(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

In [16]:
def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device="cpu"):
    for epoch in range(1, epochs+1):
        training_loss = 0.0
        valid_loss = 0.0
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            inputs, targets = batch
            inputs = inputs.to(device)
            targets = targets.to(device)
            output = model(inputs)
            loss = loss_fn(output, targets)
            loss.backward()
            optimizer.step()
            training_loss += loss.item() * inputs.size(0)
        training_loss /= len(train_loader.dataset)
        
        model.eval()
        num_correct = 0
        num_examples = 0
        # Validation needs no gradients, so skip building the autograd graph.
        with torch.no_grad():
            for batch in val_loader:
                inputs, targets = batch
                inputs = inputs.to(device)
                targets = targets.to(device)
                output = model(inputs)
                loss = loss_fn(output, targets)
                valid_loss += loss.item() * inputs.size(0)
                # Predicted class = index of the largest softmax probability.
                correct = torch.eq(torch.max(F.softmax(output, dim=1), dim=1)[1], targets)
                num_correct += torch.sum(correct).item()
                num_examples += correct.shape[0]
        valid_loss /= len(val_loader.dataset)

        print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss,
        valid_loss, num_correct / num_examples))

In [17]:
train(simplenet, optimizer, torch.nn.CrossEntropyLoss(), train_data_loader, val_data_loader, epochs=20, device=device)

Epoch: 1, Training Loss: 1.78, Validation Loss: 6.02, accuracy = 0.24
Epoch: 2, Training Loss: 2.55, Validation Loss: 1.45, accuracy = 0.46
Epoch: 3, Training Loss: 0.79, Validation Loss: 1.17, accuracy = 0.55
Epoch: 4, Training Loss: 0.55, Validation Loss: 1.12, accuracy = 0.63
Epoch: 5, Training Loss: 0.38, Validation Loss: 1.02, accuracy = 0.62
Epoch: 6, Training Loss: 0.30, Validation Loss: 0.99, accuracy = 0.65
Epoch: 7, Training Loss: 0.23, Validation Loss: 0.98, accuracy = 0.67
Epoch: 8, Training Loss: 0.19, Validation Loss: 1.00, accuracy = 0.64
Epoch: 9, Training Loss: 0.17, Validation Loss: 1.01, accuracy = 0.68
Epoch: 10, Training Loss: 0.14, Validation Loss: 1.03, accuracy = 0.66
Epoch: 11, Training Loss: 0.13, Validation Loss: 1.04, accuracy = 0.68
Epoch: 12, Training Loss: 0.10, Validation Loss: 1.08, accuracy = 0.68
Epoch: 13, Training Loss: 0.09, Validation Loss: 1.08, accuracy = 0.68
Epoch: 14, Training Loss: 0.08, Validation Loss: 1.14, accuracy = 0.66
Epoch: 15, Trai
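
The log above shows the overfitting pattern described earlier: the training loss keeps falling while the validation loss bottoms out and then creeps back up. As a rough sketch, the validation losses of epochs 1 to 14 can be copied by hand from that output to find the epoch with the lowest value. The variable names are illustrative, and picking the minimum validation loss is only one possible stopping criterion.

```python
# Validation losses copied from the training log (epochs 1 to 14).
val_losses = [6.02, 1.45, 1.17, 1.12, 1.02, 0.99, 0.98,
              1.00, 1.01, 1.03, 1.04, 1.08, 1.08, 1.14]

# Epoch numbers start at 1, list indices at 0.
best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
best_epoch = best_index + 1
epochs_without_improvement = len(val_losses) - best_epoch

print(best_epoch)                  # 7
print(val_losses[best_index])      # 0.98
print(epochs_without_improvement)  # 7
```

After epoch 7 the extra training only improves the fit to the training set, so a model saved at that point would likely generalize at least as well as the final one.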

In [18]:
labels = ['cat','fish']

from os import listdir
from os.path import isfile, join

def predict_all(folder, files):
    # Predict a label for every image file in `files`, located in `folder`.
    preds = []
    simplenet.eval()
    for name in files:
        # Convert to RGB so every image has the 3 channels Normalize expects.
        img = Image.open(join(folder, name)).convert('RGB')
        img = transforms(img).to(device)
        img = torch.unsqueeze(img, 0)  # add the batch dimension
        with torch.no_grad():
            prediction = F.softmax(simplenet(img), dim=1)
        preds.append(labels[prediction.argmax().item()])
    return preds

cats = [f for f in listdir('C02Dataset/val/cat') if isfile(join('C02Dataset/val/cat', f))]
fishes = [f for f in listdir('C02Dataset/val/fish') if isfile(join('C02Dataset/val/fish', f))]
cats_pred = predict_all('C02Dataset/val/cat', cats)
fishes_pred = predict_all('C02Dataset/val/fish', fishes)

In [19]:
total_samples = len(cats_pred) + len(fishes_pred)
# 'cat' is treated as the positive class.
true_positives = sum(1 for i in cats_pred if i == 'cat')
false_negative = sum(1 for i in cats_pred if i == 'fish')
false_positives = sum(1 for i in fishes_pred if i == 'cat')
true_negative = sum(1 for i in fishes_pred if i == 'fish')
classification_accuracy = (true_positives + true_negative) / total_samples * 100
prevalence = len(cats_pred) / total_samples
PPV = true_positives / (true_positives + false_positives)   # positive predictive value (precision)
FDR = false_positives / (true_positives + false_positives)  # false discovery rate
FOR = false_negative / (false_negative + true_negative)     # false omission rate
error_rate = (false_positives + false_negative) / total_samples * 100
# Rows: predicted cat, predicted fish. Columns: actual cat, actual fish.
x = torch.tensor([[true_positives, false_positives], [false_negative, true_negative]])

In [20]:
prevalence

0.7927927927927928

In [21]:
cats_expected = torch.ones(len(cats), dtype=torch.int8).tolist()
cats_predicted = [1 if x == 'cat' else 0 for x in cats_pred]
fishes_expected = torch.zeros(len(fishes), dtype=torch.int8).tolist()
fishes_predicted = [1 if x == 'cat' else 0 for x in fishes_pred]
expected = cats_expected + fishes_expected
predicted = cats_predicted + fishes_predicted


In [22]:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(expected, predicted))

[[20  3]
 [36 52]]


In [23]:
x

tensor([[52,  3],
        [36, 20]])

In [24]:
round(classification_accuracy, 2)

64.86

**TODO: Manually calculate val_loss and ROI analysis**

**Saving Models**

We can either save the entire model using ```save``` or just the parameters using ```state_dict```. Using the latter is normally preferable, as it allows you to reuse parameters even if the model's structure changes (or apply parameters from one model to another).

In [25]:
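# Save and reload the entire model object (the class is pickled along with the weights).
# Note: recent PyTorch versions may require torch.load("simplenet", weights_only=False) here.
torch.save(simplenet, "simplenet")
simplenet = torch.load("simplenet")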

In [26]:

torch.save(simplenet.state_dict(), "simplenet")    
simplenet = SimpleNet()
simplenet_state_dict = torch.load("simplenet")
simplenet.load_state_dict(simplenet_state_dict)

<All keys matched successfully>