# Using an Ensemble of Pretrained Models to Classify Human Protein Atlas Images

## Motivation and Background

#### [CNN](http://cs231n.github.io/convolutional-networks/)

We can identify two main blocks inside of a typical CNN: 
 - Feature extraction
 - Classification
 
The feature extraction block is a series of convolutional and pooling layers that extract features from the image, increasing in complexity with each layer (i.e. from simpler features such as points in the first layers to more complex ones such as edges and shapes in the last layers). These features are then fed to a fully connected network (the classifier), which learns to classify them.

A convolutional neural network is a sequence of layers, and every layer of a ConvNet transforms one volume of activations into another through a differentiable function. 
The main types of layers used to build a ConvNet (architecture [INPUT-CONV-RELU-POOL-FC]) are:
1. INPUT [$ 32  \times  32 \times 3 $]: An image of width 32, height 32, and with three color channels R, G, B.
2. Convolutional Layer: Computes the output of neurons that are connected to local regions in the input. This may result in a volume such as [$ 32  \times  32 \times 12 $] if we decide to use 12 filters.
3. RELU Layer: Applies the elementwise activation function $ \max(0,x) $, which leaves the volume size unchanged.
4. Pooling Layer: Performs a downsampling operation along the spatial dimensions, resulting in a volume such as [$ 16  \times  16 \times 12 $].
5. Fully-Connected Layer (regular neural network): Computes the class scores, resulting in a volume of size [$ 1  \times  1 \times 28 $], where each of the 28 numbers corresponds to a class score.
<img src="https://www.mdpi.com/entropy/entropy-19-00242/article_deploy/html/images/entropy-19-00242-g001.png" width=800>
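The volume sizes quoted in the list above follow from simple arithmetic. The sketch below is only an illustration: the 3x3 kernel, stride 1, padding 1 and the 2x2 pooling window are assumed values chosen so that the shapes match the list, and `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# INPUT: 32 x 32 x 3 image
input_shape = (32, 32, 3)

# CONV: 12 filters, 3x3 kernel, stride 1, zero padding 1 (assumed values)
conv_side = conv_output_size(input_shape[0], kernel=3, stride=1, padding=1)
conv_shape = (conv_side, conv_side, 12)

# RELU: elementwise, so the shape is unchanged
relu_shape = conv_shape

# POOL: 2x2 window with stride 2 halves the spatial dimensions (assumed values)
pool_side = conv_output_size(conv_side, kernel=2, stride=2)
pool_shape = (pool_side, pool_side, 12)

# FC: one score per class, 28 classes in this competition
fc_shape = (1, 1, 28)

print(input_shape, conv_shape, relu_shape, pool_shape, fc_shape)
```

The padding of 1 is what keeps the 32 x 32 spatial size after the 3x3 convolution; without padding the output would shrink to 30 x 30.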


#### [Transfer learning](https://en.wikipedia.org/wiki/Transfer_learning) 

Most deep learning networks learn to detect edges in the earlier layers, shapes in the middle layers and task-specific features in the later layers. Therefore, for certain data types the weights learned on one task can be transferred to another task. 
This is useful when dealing with relatively small datasets, for example medical images, which are harder to obtain in large numbers than other data. 
Instead of training a deep neural network from scratch, which would require a significant amount of data, compute power and time, it is often convenient to take a pretrained model and fine-tune it, which simplifies and speeds up the process. Transfer learning is heavily used in image recognition and natural language processing tasks.


Ensemble Pretrained models
1. VGG-16: we take a model that is already capable of extracting features from an image and train its fully connected network to classify protein localization patterns instead of everyday objects.

The model is [VGG-16](https://www.quora.com/What-is-the-VGG-neural-network), a convolutional network built by the Visual Geometry Group and trained on the [ImageNet](http://www.image-net.org/) dataset. 


<img src="https://cdn-images-1.medium.com/max/1600/1*0Tk4JclhGOCR_uLe6RKvUQ.png" width=500>



2. ResNet
The Residual Neural Network introduced a concept called 'skip connections'. A residual block adds its input directly to the output of its second transformation, and the sum is passed through the final RELU function.
<img src="https://cdn-images-1.medium.com/max/1600/1*beczGvPaBBnyauKItYGTYQ.png" width=800>
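The skip connection can be written down in a few lines. This is a simplified sketch that uses dense (matrix) transformations instead of the convolutions and batch normalization of a real ResNet block; `residual_block`, `w1` and `w2` are hypothetical names introduced only for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(0, x)


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two stacked linear transformations."""
    hidden = relu(x @ w1)          # first transformation followed by ReLU
    transformed = hidden @ w2      # second transformation, no activation yet
    return relu(transformed + x)   # add the unchanged input, then the final ReLU


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = np.zeros((8, 8))              # with zero weights F(x) = 0

out = residual_block(x, w1, w2)
print(out.shape)
```

With `w2` set to zero the block reduces to `relu(x)`, which illustrates why skip connections make it easy for a layer to fall back to (nearly) the identity mapping.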

3. [Inception](https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202)

<img src="https://cdn-images-1.medium.com/max/1600/1*uW81y16b-ptBDV8SIT1beQ.png" width=800>
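One common way to combine several such networks is to average their per-class probabilities before thresholding. The sketch below assumes that strategy purely for illustration; it is not necessarily the ensemble method used in this notebook, and `ensemble_predict`, the logits and the threshold are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ensemble_predict(logits_per_model, threshold=0.5):
    """Average the sigmoid probabilities of several models, then threshold."""
    probs = np.mean([sigmoid(logits) for logits in logits_per_model], axis=0)
    return probs, (probs > threshold).astype(int)


# Hypothetical raw outputs of three models for 2 images and 2 classes
logits_a = np.array([[4.0, -4.0], [-4.0, 4.0]])
logits_b = np.array([[4.0, -4.0], [4.0, 4.0]])
logits_c = np.array([[4.0, -4.0], [-4.0, -4.0]])

probs, preds = ensemble_predict([logits_a, logits_b, logits_c])
print(preds)
```

Averaging probabilities rather than hard votes keeps each model's confidence in play, so one very confident model can outweigh two unsure ones.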



## About Human Protein Atlas Image Classification Challenge

#### Our goal and challenge
1. Unlike most image labeling tasks, where binary or multiclass labeling is considered, in this competition each image can have multiple labels. A multiclass, multilabel task has its own specifics that affect the design of the model and the loss function. Our models should be able to classify mixed patterns of proteins in microscope images, so each image may carry more than one label. 
2. We also need a model that is fast at prediction time while maintaining high accuracy, and that can run on minimal hardware resources. The final model submission is executed in a Docker container on hardware meeting the following specifications and must generate a submission result within 1 hour.

Submission hardware limits:
* CPU Cores: 2
* RAM: 4 GB
* GPUs: Integrated Intel Graphics
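To make the multilabel setting from item 1 concrete: a target such as `"25 0"` lists the class indices present in one image, and a model typically consumes it as a multi-hot vector of length 28. The helper name `encode_target` below is hypothetical, a minimal sketch of that encoding.

```python
NUM_CLASSES = 28


def encode_target(target, num_classes=NUM_CLASSES):
    """Turn a space-separated label string into a multi-hot list of 0/1 values."""
    vector = [0] * num_classes
    for index in target.split():
        vector[int(index)] = 1
    return vector


encoded = encode_target("25 0")
print(encoded)
```

Unlike a one-hot vector in a multiclass problem, several positions can be 1 at once, which is why a per-class sigmoid with a binary loss is the usual choice rather than a softmax.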

#### [About Human Protein Atlas Project](https://www.youtube.com/watch?v=GUvHrs5lKtU) 
Proteins are the building blocks of the human body: they build, repair and send signals within our bodies, executing many functions that together enable life. Different cells take on completely different functions because different genes in the DNA are activated depending on the part of the body in which the cell is located. The genes give the instructions, not for the cells but for the proteins: an active gene is the code for a certain type of protein to be created, which in turn determines the functions of the cell. The project wants to understand where the 20,000 proteins are located and what their functions are. Only a few proteins are expressed uniquely in one tissue or organ of the body: only 10% of the proteins are tissue specific, and over half of all proteins are found everywhere. Almost all of today's medications are directed against proteins, but so far they target only 3% of the 20,000 different proteins. If there is a treatment for liver disease, the risk is that the protein the medicine is aimed at is also found in the brain. This can cause serious consequences and also explains why many drugs have side effects. The goal of the project is to fully understand the complexity of the human cell, so models must classify mixed patterns across a range of different human cells.

[The cell atlas](https://www.proteinatlas.org/humanproteome/cell)

<img src="http://science.sciencemag.org/content/sci/early/2017/05/10/science.aal3321/F1.large.jpg?width=800&height=600&carousel=1" width=500>

- Each image is of size $ 512 \times 512 $. Every sample is represented by four filters (channels), each stored as a separate image:
##### Green 
Indicates the protein of interest. It shows the stained target proteins and is consequently the most informative channel.
##### Blue
Indicates the nucleus
##### Red
Indicates the microtubules
##### Yellow
Indicates the endoplasmic reticulum

Note that the green filter should be used to predict the label; the other filters are used as references. 
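Pretrained ImageNet models expect three input channels, so the four filters have to be reduced to three. The loader later in this notebook does this by adding the yellow filter to the red one. The sketch below shows that idea on tiny synthetic arrays; `merge_channels` is a hypothetical helper, and clipping at 255 is an assumption made here to avoid `uint8` wrap-around.

```python
import numpy as np


def merge_channels(red, green, blue, yellow):
    """Stack four single-channel uint8 images into one RGB image.

    Yellow is added to red in a wider integer type and clipped to 255,
    because adding two uint8 arrays directly would wrap around on overflow.
    """
    red_yellow = np.clip(red.astype(np.int32) + yellow.astype(np.int32), 0, 255)
    return np.stack([red_yellow, green, blue], axis=-1).astype(np.uint8)


# Tiny synthetic 2x2 channels standing in for the real 512x512 filter images
red = np.array([[200, 10], [0, 255]], dtype=np.uint8)
yellow = np.array([[100, 20], [0, 1]], dtype=np.uint8)
green = np.full((2, 2), 50, dtype=np.uint8)
blue = np.zeros((2, 2), dtype=np.uint8)

rgb = merge_channels(red, green, blue, yellow)
print(rgb[:, :, 0])
```

The green channel is left untouched so that the most informative signal reaches the network unchanged.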

This work uses [PyTorch](https://pytorch.org/) as the deep learning framework and [CUDA](https://developer.nvidia.com/cuda-zone) for GPU acceleration. 

###### This protein_pretrained_models_Ensemble_in_pytorch notebook includes:
- Our own pytorch dataset/loader class with the protein dataset
- Several pretrained networks with Pytorch weights from this [dataset](https://www.kaggle.com/pvlima/pretrained-pytorch-models)
- Our own ensemble method 
- Our own loss-function (Micro-F1)
- Setup the training loop

Import Python Modules

In [None]:
from __future__ import print_function, division

import os
from os.path import isfile, join, abspath, exists, isdir, expanduser
from os import listdir, makedirs, getcwd, remove
import sys
import time
import copy
import random
import math
import logging
import numpy as np
import pandas as pd
import PIL
from PIL import Image
from PIL import ImageChops
from textwrap import wrap

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from collections import OrderedDict, defaultdict
# scipy.misc.imread has been removed from recent SciPy releases and is not used below; images are read with PIL.Image
from sklearn.model_selection import RepeatedKFold
from skimage.transform import resize
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

import torch
import torch.optim as optim
import torch.nn as nn
import torch.backends.cudnn as cudnn
import torch.nn.functional as F
from torch.autograd import Variable
from torchvision import datasets, transforms, models
from torch.utils.data import Dataset, DataLoader
from torch.optim import lr_scheduler

from IPython.display import HTML
import base64


import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)

In [None]:
!ls ../input/pretrained-pytorch-models

We have to copy the pretrained models to the cache directory (~/.torch/models), where PyTorch looks for them.

In [None]:
cache_dir = expanduser(join('~', '.torch'))
if not exists(cache_dir):
    makedirs(cache_dir)
models_dir = join(cache_dir, 'models')
if not exists(models_dir):
    makedirs(models_dir)

!cp ../input/pretrained-pytorch-models/* ~/.torch/models/
!ls ~/.torch/models

#### EDA

In [None]:
path=("../input/human-protein-atlas-image-classification")
print ("1. Loading data & Converting data")
print ("----------------------------------------")
print('Data files: ')
for file in os.listdir(path):
    print(file) 
print ("----------------------------------------")
train_class = pd.read_csv(path+"/train.csv")
print("There are %s samples in the training set."%train_class.shape[0])
test_label = pd.read_csv(path+"/sample_submission.csv")
print("There are %s samples in the test set."%test_label.shape[0])
print ("----------------------------------------")
print ("There are in total 28 different labels present in the dataset, each image has various possible labels.")
# train
train_class.head()

In [None]:
label_names = {
    0:  "Nucleoplasm",  
    1:  "Nuclear membrane",   
    2:  "Nucleoli",   
    3:  "Nucleoli fibrillar center",   
    4:  "Nuclear speckles",
    5:  "Nuclear bodies",   
    6:  "Endoplasmic reticulum",   
    7:  "Golgi apparatus",   
    8:  "Peroxisomes",   
    9:  "Endosomes",   
    10:  "Lysosomes",   
    11:  "Intermediate filaments",   
    12:  "Actin filaments",   
    13:  "Focal adhesion sites",   
    14:  "Microtubules",   
    15:  "Microtubule ends",   
    16:  "Cytokinetic bridge",   
    17:  "Mitotic spindle",   
    18:  "Microtubule organizing center",   
    19:  "Centrosome",   
    20:  "Lipid droplets",   
    21:  "Plasma membrane",   
    22:  "Cell junctions",   
    23:  "Mitochondria",   
    24:  "Aggresome",   
    25:  "Cytosol",   
    26:  "Cytoplasmic bodies",   
    27:  "Rods & rings"
}
LABELS = []

for label in label_names.values():
    LABELS.append(label)

###### Show the most frequent single labels in the dataset

In [None]:
def fill_targets(row):
    row.Target = np.array(row.Target.split(" ")).astype(int)  # np.int was removed from recent NumPy releases
    for num in row.Target:
        name = label_names[int(num)]
        row.loc[name] = 1
    return row
for key in label_names.keys():
    train_class[label_names[key]] = 0
train_label = train_class.apply(fill_targets, axis=1)
#train_label.head()
n=28
target_counts = train_label.drop(["Id", "Target"],axis=1).sum(axis=0).sort_values(ascending=False)
plt.figure(figsize=(10,6))
pal = sns.cubehelix_palette(n, start=2, rot=-0.1, dark=0, light=.65, reverse=True)
sns.barplot(y=target_counts.index.values, x=target_counts.values, order=target_counts.index, palette=pal).set_title('Distribution of labels')
plt.savefig('single_label.png')  # save before show(); otherwise savefig may write an empty image
plt.show()

####  Show the most common label combinations

In [None]:
print("There are "+  str(len(train_class.Target.unique())) + " different combinations of labels in our dataset")
sns.set(style="dark")
n=40 # top 40 common label combinations
values = train_class['Target'].value_counts()[:n].keys().tolist()
counts = train_class['Target'].value_counts()[:n].tolist()

plt.figure(figsize=(12,5))
pal = sns.cubehelix_palette(n, start=2, rot=-0.1, dark=0, light=.65, reverse=True)
sns.barplot(y=counts, x=values, palette=pal).set_title(str(n)+" MOST COMMON LABEL COMBINATIONS")
plt.xticks(rotation=45)
plt.savefig('combination_label.png')  # save before show(); otherwise savefig may write an empty image
plt.show()

## count the least populated label combinations
print("There are %s label combinations that have only one sample." % str(sum(train_class['Target'].value_counts(ascending=True)<=1)))

## ratio of single label to multiple label
n=len(train_class.Target.unique())
values = train_class['Target'].value_counts()[:n].keys().tolist()
counts = train_class['Target'].value_counts()[:n].tolist()
single_count=0
comb_count=0
for labels in values:
    if len(labels.split())==1:
        single_count+=train_class['Target'].value_counts()[labels]
    else:
        comb_count+=train_class['Target'].value_counts()[labels]
print("There are " + str(single_count)+" samples with a single label")
print("There are " + str(comb_count)+" samples with multiple labels")
print("The fraction of images with a single label is: " + str(single_count/train_class.shape[0]))

## Dataset loader in Pytorch

The dataset is divided into three parts: training, validation and test. 

#### Dataset Class
To create a dataset class for the Human Protein Atlas image dataset, we read the CSV in `__init__` but leave the reading of images to `__getitem__`. This is memory efficient because the images are not all stored in memory at once but are read as required.

#### Transforms
Most neural networks expect images of a fixed size. Therefore, we need to provide a preprocessing function, which can include:
* Rescale: to scale the image to a fixed size
* RandomCrop: to crop the image randomly, which is a form of data augmentation
* ToTensor: to convert NumPy images to torch tensors
* Normalize: to normalize a tensor image with a per-channel mean and standard deviation
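The last two steps can be illustrated without any deep learning library. The sketch below mimics what ToTensor and Normalize do to one image, using the ImageNet mean and standard deviation that the dataset class in this notebook passes to `transforms.Normalize`; `to_normalized_chw` is a hypothetical helper name.

```python
import numpy as np

# Per-channel ImageNet statistics, as used by the dataset class in this notebook
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def to_normalized_chw(image_hwc_uint8):
    """Scale a uint8 HWC image to [0, 1], normalize per channel, return CHW."""
    scaled = image_hwc_uint8.astype(np.float32) / 255.0    # ToTensor-style scaling
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD   # broadcast over channels
    return normalized.transpose(2, 0, 1)                   # HWC -> CHW layout


image = np.full((224, 224, 3), 255, dtype=np.uint8)        # an all-white test image
out = to_normalized_chw(image)
print(out.shape)
```

Using the same statistics as the pretrained weights keeps the inputs in the value range the frozen convolutional layers were trained on.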


In [None]:
def load_image(basepath, image_id):
    """Load the four filter images of one sample and merge them into a 224x224 RGB array."""
    size = (224, 224)
    r = np.array(Image.open(basepath+image_id+"_red.png").resize(size))
    g = np.array(Image.open(basepath+image_id+"_green.png").resize(size))
    b = np.array(Image.open(basepath+image_id+"_blue.png").resize(size))
    y = np.array(Image.open(basepath+image_id+"_yellow.png").resize(size))

    ## add yellow channel to red channel; widen the dtype and clip so that
    ## bright pixels saturate at 255 instead of wrapping around in uint8
    r = np.clip(r.astype(np.int32) + y.astype(np.int32), 0, 255)

    images = np.zeros(shape=(224,224,3), dtype=np.uint8)
    images[:,:,0] = r.astype(np.uint8)
    images[:,:,1] = g.astype(np.uint8)
    images[:,:,2] = b.astype(np.uint8)
    return images

In [None]:
targets = train_class['Target'].value_counts().keys()
counts =train_class['Target'].value_counts().values
BATCH_SIZE =16
W = H = 224

#### Define a dataset class for Human Protein Dataset
*  Read the CSV in `__init__` but leave the reading of images to `__getitem__`
*  This is memory efficient because the images are not all stored in memory at once but are read as required
*  Any required processing can be applied through the `transform` argument 
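The lazy-loading pattern described in the bullets above can be shown in isolation. In this sketch `LazyImageDataset` and `fake_loader` are hypothetical stand-ins (not the notebook's classes) that only record when an image is actually read.

```python
class LazyImageDataset:
    """Keeps only the image ids in memory; images are loaded on access."""

    def __init__(self, ids, loader):
        self.ids = list(ids)      # cheap metadata, read once
        self.loader = loader      # function that reads one image from disk

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        image_id = self.ids[idx]
        return {"Id": image_id, "image": self.loader(image_id)}


load_calls = []


def fake_loader(image_id):
    """Stand-in for reading an image file; records which ids were requested."""
    load_calls.append(image_id)
    return [[0]]


dataset = LazyImageDataset(["a", "b", "c"], fake_loader)
calls_after_init = len(load_calls)   # nothing has been loaded yet
sample = dataset[1]                  # triggers exactly one load
print(calls_after_init, load_calls)
```

A `DataLoader` only needs `__len__` and `__getitem__`, so a class of this shape is enough for batching and shuffling.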

In [None]:
num_classes = 28
class HumanProteinDataset(Dataset):

    def __init__(self, df,set_path,transform=False,test=False):
        self.set_path=set_path
        self.test = test
        self.df = df.copy()
        #self.mlb= MultiLabelBinarizer(num_classes)
        #self.mlb.fit(num_classes)
        self.transform = transform
           
    def __getitem__(self, idx):
        
        image = load_image(self.set_path, self.df['Id'].iloc[idx])
        
        sample = {'image': image}

        if not self.test:
            target=np.array(list(map(int, self.df.iloc[idx].Target.split(' '))))
            sample['target'] = np.eye(num_classes,dtype=float)[target].sum(axis=0)  # np.float was removed from recent NumPy releases
        else:
            sample['Id'] = self.df['Id'].iloc[idx]
        
        sample['image']=transforms.ToPILImage()(sample['image'])
        
        if self.transform:
            sample['image'] = self.image_transform(sample['image'])
        
        totensor = transforms.Compose([transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
        sample['image']=totensor(sample['image'])
        return sample
        
       # ret = {'image': totensor(image)}
       #
       # if "target" in sample.keys():
       #     target = sample['target'][0]
       #     ret['target'] = target
       # else:
       #     ret['Id'] = sample['Id']
       #          
       # return ret
    
    def __len__(self):
        return self.df.shape[0]
    
    def shape(self):
        return self.df.shape
## Apply a list of transformations in a random order
    def image_transform(self,image):
        transform=transforms.RandomOrder([
        transforms.RandomRotation(30),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip()
    ])
        t_image=transform(image)
        return t_image
    
## split train set into trainning set and validation set
train_path= path+'/train/'
train=pd.read_csv(path+'/train.csv')

#####--------------------------------------
##### delete those class combinations with 4 or fewer samples
#####--------------------------------------
count=train['Target'].value_counts(ascending=True)
condition=count<=4
rare_labels=count[condition].index
print("number of samples before dropping rare label combinations: "+ str(len(train)))
train=train[~train['Target'].isin(rare_labels)]
print("number of samples after dropping rare label combinations: "+ str(len(train)))


#### ----------------------------------------
#### Downsample majority classes: ['0', '25 0', '23', '25']
#### ----------------------------------------

from sklearn.utils import resample
## Downsample class '0'
df_majority=train[train.Target=='0']
df_minority=train[train.Target!='0']
df_majority_downsampled = resample(df_majority, 
                                 replace=False,    # sample without replacement
                                 n_samples=1500,     # to match minority class
                                 random_state=123) # reproducible results

train = pd.concat([df_majority_downsampled, df_minority])
print(len(train))
## Downsample class '25 0'
df_majority=train[train.Target=='25 0']
df_minority=train[train.Target!='25 0']
df_majority_downsampled = resample(df_majority, 
                                 replace=False,    # sample without replacement
                                 n_samples=1300,     # to match minority class
                                 random_state=12) # reproducible results
train = pd.concat([df_majority_downsampled, df_minority])
print(len(train))
## Downsample class '23'
df_majority=train[train.Target=='23']
df_minority=train[train.Target!='23']
df_majority_downsampled = resample(df_majority, 
                                 replace=False,    # sample without replacement
                                 n_samples=1200,     # to match minority class
                                 random_state=23) # reproducible results
train = pd.concat([df_majority_downsampled, df_minority])
print(len(train))
## Downsample class '25'
df_majority=train[train.Target=='25']
df_minority=train[train.Target!='25']
df_majority_downsampled = resample(df_majority, 
                                 replace=False,    # sample without replacement
                                 n_samples=1300,     # to match minority class
                                 random_state=13) # reproducible results
train = pd.concat([df_majority_downsampled, df_minority])
print(len(train))
id=train['Id']
y=train['Target']
## randomly pick 80% of the data for training
train_id,val_id,train_y,val_y = train_test_split(id,y,stratify=y,test_size = 0.2)

## randomly split the picked data into training set and validation set
train_id,val_id,train_y,val_y = train_test_split(train_id,train_y,stratify=train_y,test_size = 0.2)

train_df=pd.DataFrame({'Id':train_id, 'Target':train_y})
val_df=pd.DataFrame({'Id':val_id, 'Target':val_y})

## load dataset
train_set= HumanProteinDataset(train_df,train_path,transform=True,test=False)
val_set= HumanProteinDataset(val_df,train_path,transform=False,test=False)

dataset_sizes = {}

dataset_sizes['train'] = len(train_df)
dataset_sizes['val'] = len(val_df)
    
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE,num_workers=0,shuffle=True)
validation_loader = DataLoader(val_set, batch_size=BATCH_SIZE, num_workers=0,shuffle=False)

dataloaders = {}

dataloaders['train'] = train_loader
dataloaders['val'] = validation_loader

print(len(train_df))
print(len(val_df))

In [None]:
print(train_set.df.head())
print(val_set.df.head())
###### --------------------------------------------------------
###### plot the target distribution in train set and val set
##### -------------------------------------------------------
print("There are "+  str(len(train_df.Target.unique())) + " different combinations of labels in our dataset in train set and validation set")
sns.set(style="dark")
n=40 # top 40 common label combinations
train_values = train_df['Target'].value_counts()[:n].keys().tolist()
train_counts = train_df['Target'].value_counts()[:n].tolist()

val_values = val_df['Target'].value_counts()[:n].keys().tolist()
val_counts = val_df['Target'].value_counts()[:n].tolist()

plt.figure(figsize=(16,6))
sns.barplot(y=train_counts, x=train_values, palette=pal).set_title(str(n)+" MOST COMMON LABEL COMBINATIONS DISTRIBUTIONS BETWEEN TRAIN SET AND VALIDATION SET ")
pal_val = sns.cubehelix_palette(n, start=0.5, rot=-.4, dark=0, light=.75, reverse=False)
sns.barplot(y=val_counts, x=val_values, palette=pal_val).set_title(str(n)+" MOST COMMON LABEL COMBINATIONS DISTRIBUTIONS BETWEEN TRAIN SET AND VALIDATION SET")
plt.xticks(rotation=45)
plt.legend(['Train','Validation'], loc = "upper right", framealpha=1, shadow=True)
plt.savefig('split_label.png')  # save before show(); otherwise savefig may write an empty image
plt.show()

In [None]:
def Show(sample):
    f, (ax1,ax2,ax3,ax4) = plt.subplots(1, 4, figsize=(16,16), sharey=True)
    #f, (ax1,ax2,ax3) = plt.subplots(1, 3, figsize=(16,16), sharey=True)
    title = ''
    
    labels =sample['target']
                
    for i, label in enumerate(LABELS):
        if labels[i] == 1:
            if title == '':
                title += label
            else:
                title += " & " + label
            
    ax1.imshow(sample['image'][0,:,:],cmap="Reds")
    ax1.set_title("\n".join(wrap('(Red&Yellow)- Microtubules & Endoplasmic Reticulum', 30)),fontsize=12)
    ax2.imshow(sample['image'][1,:,:],cmap="Greens")
    ax2.set_title('(Green)The Protein of interest',fontsize=12)
    ax3.imshow(sample['image'][2,:,:],cmap="Blues")
    ax3.set_title('(Blue)nucleus',fontsize=12)
    img_conb=np.stack((
    sample['image'][0,:,:],
    sample['image'][1,:,:],
    sample['image'][2,:,:]
    ),-1)
    ax4.imshow(img_conb,interpolation='nearest')
    ax4.set_title('Combined RGBY images',fontsize=12)
    f.suptitle(title, fontsize=16, y=0.65)
    plt.savefig('image.png')

In [None]:
idxs = random.sample(range(1, train_set.df.shape[0]), 3)

for idx in idxs:
    Show( train_set[idx])

##### The challenge states that the final submission is evaluated by the macro F1 score
[macro F1 Score](https://sebastianraschka.com/faq/docs/multiclass-metric.html)

<img src='https://sebastianraschka.com/images/faq/multiclass-metric/conf_mat.png'>
<img src='https://sebastianraschka.com/images/faq/multiclass-metric/pre-rec.png'>
#### Note that  PRE=precision, REC=recall, F1=F1-score
<img src='https://sebastianraschka.com/images/faq/multiclass-metric/micro.png'>
<img src='https://sebastianraschka.com/images/faq/multiclass-metric/macro.png'>
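The difference between the two averages is easy to see on a small multilabel example. The sketch below computes macro F1 as the mean of per-class F1 scores, which is the convention of scikit-learn's `f1_score(average='macro')` used later in this notebook, and sets F1 to 0 for a class with no true or predicted positives (an assumption). The data and the helper names are made up for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro F1, macro F1) for multi-hot label lists."""
    num_classes = len(y_true[0])
    counts = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        counts.append((tp, fp, fn))
    # macro: average the per-class F1 scores, every class weighs the same
    macro = sum(f1_from_counts(*c) for c in counts) / num_classes
    # micro: pool the counts of all classes first, frequent classes dominate
    micro = f1_from_counts(sum(c[0] for c in counts),
                           sum(c[1] for c in counts),
                           sum(c[2] for c in counts))
    return micro, macro


# 4 samples, 3 classes; class 0 is frequent, classes 1 and 2 are rare
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 4), round(macro, 4))
```

The rare class 1 is always misclassified here, which pulls the macro score well below the micro score. This is why a macro-averaged metric makes class imbalance matter so much in this competition.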

In [None]:
### loss function based on micro f1-score
#def f1_loss(target, output, epsilon=1e-7):
#    y_pred = nn.Sigmoid()(output)
#    y_true = target.double()

#    TP = (y_pred * y_true).sum(1)
#    prec = TP / (y_pred.sum(1) + epsilon)
#    rec = TP / (y_true.sum(1) + epsilon)
#    res = 2 * prec * rec / (prec + rec + epsilon)

#    f1 = res
#    f1 = f1.clamp(min=0)
#    return 1 - f1.mean()

class AverageMeter:
    """Computes and stores the average and current value"""

    def __init__(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def init(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __repr__(self):
        return str(self.avg)

dloaders = {'train':dataloaders['train'], 'val':dataloaders['val']}

In [None]:
def train_model(dataloaders, model, criterion, optimizer, num_epochs=0):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    # deep-copy the weights so that later optimizer steps do not overwrite the saved best state
    best_model_wts = copy.deepcopy(model.state_dict())
    best_f1 = 0

    for epoch in range(num_epochs):
        epoch_stats = {}
        for phase in ['train', 'val']:
            model.train(phase == 'train')
            # fresh meters for every phase so that train and validation metrics are not mixed
            losses = AverageMeter()
            epoch_f1 = AverageMeter()
            for data in dataloaders[phase]:
                # get the inputs
                inputs = data['image'].to(device, dtype=torch.float)
                labels = data['target'].to(device, dtype=torch.float)

                optimizer.zero_grad()
                # track gradients only while training
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                losses.update(loss.item(), inputs.size(0))
                f1 = f1_score(labels.cpu().numpy(),
                              (outputs.sigmoid() > 0.2).cpu().numpy(),
                              average='macro')
                epoch_f1.update(f1, inputs.size(0))

            epoch_stats[phase] = (losses.avg, epoch_f1.avg)

            if phase == 'val' and epoch_f1.avg > best_f1:
                best_f1 = epoch_f1.avg
                best_model_wts = copy.deepcopy(model.state_dict())

        print('Epoch [{}/{}] train loss: {:.4f} train f1: {:.4f} ,'
              'valid loss: {:.4f} valid f1: {:.4f}'.format(
                epoch, num_epochs - 1,
                *epoch_stats['train'], *epoch_stats['val']))

    print('Best val F1: {:4f}'.format(best_f1))

    model.load_state_dict(best_model_wts)
    return model

### ResNet50

In [None]:
### !nvidia-smi
#### ResNet
use_gpu = torch.cuda.is_available()
resnet = models.resnet50(pretrained=True)

# freeze all the layers except the final one
for param in resnet.parameters():
    param.requires_grad = False
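# NOTE: torchvision's ResNet forward() never calls this attribute, so this dropout has no effect on training
resnet.dropout=nn.Dropout(0.5)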
# add new final layer with 28 classes
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs,28)
print(resnet)
if use_gpu:
    print('CUDA')
    resnet = resnet.cuda()

In [None]:
criterion = torch.nn.BCEWithLogitsLoss()  # no parameters to move, so this also works without a GPU
lr=0.09
optimizer = torch.optim.SGD(resnet.fc.parameters(), lr=lr,momentum=0.9,weight_decay=1e-4)

start_time = time.time()
resnet_model = train_model(dloaders, resnet, criterion, optimizer, num_epochs=20)
print('Training time: {:10f} minutes'.format((time.time()-start_time)/60))

#### Squeezenet1_1

In [None]:

# squeeze = models.densenet161(pretrained=True)
# # freeze all the layers except the final one
# for param in squeeze.parameters():
#     param.requires_grad = False
# squeeze.dropout=nn.Dropout(0.3)
# # change the last conv2d layer
# squeeze.classifier._modules["1"] = nn.Conv2d(512, 28, kernel_size=(1, 1))
# # change the internal num_classes variable rather than redefining the forward pass
# squeeze.num_classes = 28
# print(squeeze)
# if use_gpu:
#     print('CUDA')
#     squeeze = squeeze.cuda()

In [None]:
# criterion = torch.nn.BCEWithLogitsLoss().cuda()
# lr=0.05
# optimizer = torch.optim.SGD(resnet.fc.parameters(), lr=lr,momentum=0.9,weight_decay=1e-4)

# start_time = time.time()
# squeeze_model = train_model(dloaders, squeeze, criterion, optimizer, num_epochs=10)
# print('Training time: {:10f} minutes'.format((time.time()-start_time)/60))

#### Inception

In [None]:
# #### Inception
# inception = models.inception_v3(pretrained=True)
# print(inception)
# # freeze all the layers except the final one
# for param in inception.parameters():
#     param.requires_grad = False
# inception.dropout=nn.Dropout(0.5)
# # add new final layer with 28 classes 
# num_ftrs = inception.fc.in_features
# inception.fc = nn.Linear(num_ftrs,28)
# if use_gpu:
#     print('CUDA')
#     inception= inception.cuda()

# ## notice the input image format for inception is different than for VGG16 and ResNet models ( 299 x 299 instead of 224 x 224)
# start_time = time.time()
# inception_model = train_model(dloaders, inception, criterion, optimizer, num_epochs=10)
# print('Training time: {:10f} minutes'.format((time.time()-start_time)/60))

### Predict test set
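The network outputs raw logits, while the 0.2 threshold used during training is applied to sigmoid probabilities, so the logits have to be converted before they are compared with the threshold. The sketch below shows that conversion for a single sample; `logits_to_label_string` is a hypothetical helper, and the fallback to class `"0"` for an empty prediction mirrors the prediction loop in this notebook.

```python
import math


def logits_to_label_string(logits, threshold=0.2):
    """Convert raw logits of one image into a space-separated label string."""
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    picked = [str(i) for i, p in enumerate(probabilities) if p > threshold]
    # fall back to class 0 when no class passes the threshold
    return " ".join(picked) if picked else "0"


print(logits_to_label_string([3.0, -3.0, -1.0, 0.0]))
print(logits_to_label_string([-5.0, -5.0]))
```

A probability threshold of 0.2 corresponds to a logit of about -1.39, so thresholding raw logits at 0.2 would be considerably stricter than intended.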

In [None]:
test_df=pd.read_csv(path+'/sample_submission.csv')
test_path=path+'/test/'
dataset_test = HumanProteinDataset(test_df,test_path,transform=False,test=True)

dataloader_test = DataLoader(dataset_test, 1,shuffle=False, num_workers=0)

In [None]:
def run_model(model,batch):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()  # inference mode for batch norm and dropout layers
    inputs = batch.to(device,dtype=torch.float)
    with torch.no_grad():
        # convert logits to probabilities so that the 0.2 threshold matches the one used in training
        out = model(inputs).sigmoid()
    return out.cpu()
ids = []
predictions = []

for sample_batched in dataloader_test:
        out = run_model(resnet_model,sample_batched['image'])
        preds = []
        out = out.numpy()
        for sample in out:
            p = ""
            for i, label in enumerate(sample):
                if label > 0.2:
                    p += " " + str(i)
            if p == "":
                p = "0"
            else:
                p = p[1:]
            preds.append(p)

        ids += list(sample_batched['Id'])
        predictions += preds

In [None]:
## Since kaggle kernel is read only file system, we need to create a function that
## takes in a dataframe and creates a link to download it
df = pd.DataFrame({'Id':ids,'Predicted':predictions})
df.to_csv('resNet_protein_classification.csv', header=True, index=False)

# function that takes in a dataframe and creates a text link to  
# download it (will only work for files < 2MB or so)
def create_download_link(df, title = "Download Submission CSV file", filename = "resNet_protein_classification.csv"):  
    csv = df.to_csv( sep=',', encoding='utf-8', index=False)
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)

# create a link to download the dataframe
create_download_link(df)

In [None]:
# squeeze_model = squeeze_model.cuda()

# for sample_batched in dataloader_test:
#         out = run_model(squeeze_model,sample_batched['image'])
#         preds = []
#         out = out.detach().numpy()
#         for sample in out:
#             p = ""
#             for i, label in enumerate(sample):
#                 if label > 0.2:
#                     p += " " + str(i)
#             if p == "":
#                 p = "0"
#             else:
#                 p = p[1:]
#             preds.append(p)

#         ids += list(sample_batched['Id'])
#         predictions += preds


# df = pd.DataFrame({'Id':ids,'Predicted':predictions})
# df.to_csv('squeeze_protein_classification.csv', header=True, index=False)

# # function that takes in a dataframe and creates a text link to  
# # download it (will only work for files < 2MB or so)
# def create_download_link(df, title = "Download Submission CSV file", filename = "squeeze_protein_classification.csv"):  
#     csv = df.to_csv( sep=',', encoding='utf-8', index=False)
#     b64 = base64.b64encode(csv.encode())
#     payload = b64.decode()
#     html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
#     html = html.format(payload=payload,title=title,filename=filename)
#     return HTML(html)

# # create a link to download the dataframe
# create_download_link(df)

In [None]:
df

## TO DO LIST
- [Ensemble Multiple Pretrained models](https://github.com/QuantScientist/Deep-Learning-Boot-Camp/tree/master/Kaggle-PyTorch/PyTorch-Ensembler/nnmodels)
- Implement other evaluation metrics
- [Densenet as feature Extractor](https://www.kaggle.com/renatobmlr/pytorch-densenet-as-feature-extractor )
- Ref: [Multi-label-Inception-net](https://github.com/BartyzalRadek/Multi-label-Inception-net)
- Prepare the solution at the very beginning
- Make a list of the challenges I've faced
- Review the solutions from some related papers
- Possible CV strategies
- Possible data augmentations
- Add model diversity: high model complexity 
- Fix class imbalance issues: the F1 score on our validation set is higher than on the public LB.



