### Problem Statement: 
Ensemble Learning

Implement bagging ensemble technique in image classification


In this example, we are going to train an ensemble of Convolution Networks for Image Classification.

There are different CNN models for image classification (VGG, ResNet, DenseNet, MobileNet, etc), they provide different accuracies. But ensemble of these CNN can improve accuracy. Here we use PyTorch to train 3 image classification models (DenseNet161, ResNet152, and VGG19) on TinyImageNet dataset. 

In [11]:
# IMPORTS
import os
import numpy as np
import pandas as pd
import torch

from sklearn.preprocessing import LabelEncoder


In [4]:
"""
Setting up CUDA and PyTorch:
1. Check weather your system can support CUDA or not. Visit https://developer.nvidia.com/cuda-gpus
2. If Yes, visit https://developer.nvidia.com/cuda-downloads and install NVIDIA CUDA Toolkit.
3. Visit https://pytorch.org/
Goto Install Pytorch Section, select your specifications and run the command in your terminal. Make sure to activate the virtual environment.
"""

# Checking GPU availability
if torch.cuda.is_available():
    print("CUDA is available. Working on GPU.")
    DEVICE = torch.device('cuda')
else:
    
    print("CUDA is not available. Working on CPU.")
    DEVICE = torch.device('cpu')

CUDA is available. Working on GPU.


In [6]:
!python -m wget http://cs231n.stanford.edu/tiny-imagenet-200.zip -o datasets/tiny-imagenet-200.zip
# unzip the dataset


Saved under datasets/tiny-imagenet-200.zip


In [6]:
DIR_MAIN = 'datasets/tiny-imagenet-200/'
DIR_TRAIN = f"{DIR_MAIN}train/"
DIR_VAL = f"{DIR_MAIN}val/"
DIR_TEST = f"{DIR_MAIN}test/"

# No of labels - 200
labels = os.listdir(DIR_TRAIN)

# Initialize labels encoder
encoder_labels = LabelEncoder()
encoder_labels.fit(labels)


In [9]:
# Create lists of files and labels for training (100'000 items)
files_train = []
labels_train = []
for label in labels:
    for filename in os.listdir(DIR_TRAIN + label + '/images/'):
        files_train.append(DIR_TRAIN+label+'/images/'+filename)
        labels_train.append(labels)



In [15]:
# Create lists of files and labels for validation (10'000 items)
files_val = []
labels_val = []
for filename in os.listdir(f"{DIR_VAL}images/"):
    files_val.append(f"{DIR_VAL}images/{filename}")

val_df = pd.read_csv(
    DIR_VAL + 'val_annotations.txt',
    sep='\t',
    names=["File", "Label", "X1", "Y1", "X2", "Y2"],
    usecols=["File", "Label"]
)

for f in files_val:
    l = val_df.loc[val_df['File'] == f[len(f"{DIR_VAL}images/"):]]['Label'].values[0]
    labels_val.append(1)


In [16]:
# List of files for testing (10'000 items)
files_test = []
for filename in os.listdir(f"{DIR_TEST}images/"):
    files_test.append(f"{DIR_TEST}images/{filename}")
    files_test = sorted(files_test)

In [17]:
print("train images: ", files_train[:5])
print("\ntrain labels:", labels_train[:5])
print("\nvalidation images: ", files_val[:5])
print("\nvalidation labels:", labels_val[:5])
print("\ntest images: ", files_test[:5])

train images:  ['datasets/tiny-imagenet-200/train/n01443537/images/n01443537_0.JPEG', 'datasets/tiny-imagenet-200/train/n01443537/images/n01443537_1.JPEG', 'datasets/tiny-imagenet-200/train/n01443537/images/n01443537_10.JPEG', 'datasets/tiny-imagenet-200/train/n01443537/images/n01443537_100.JPEG', 'datasets/tiny-imagenet-200/train/n01443537/images/n01443537_101.JPEG']

train labels: [['n01443537', 'n01629819', 'n01641577', 'n01644900', 'n01698640', 'n01742172', 'n01768244', 'n01770393', 'n01774384', 'n01774750', 'n01784675', 'n01855672', 'n01882714', 'n01910747', 'n01917289', 'n01944390', 'n01945685', 'n01950731', 'n01983481', 'n01984695', 'n02002724', 'n02056570', 'n02058221', 'n02074367', 'n02085620', 'n02094433', 'n02099601', 'n02099712', 'n02106662', 'n02113799', 'n02123045', 'n02123394', 'n02124075', 'n02125311', 'n02129165', 'n02132136', 'n02165456', 'n02190166', 'n02206856', 'n02226429', 'n02231487', 'n02233338', 'n02236044', 'n02268443', 'n02279972', 'n02281406', 'n02321529', '