# Image Classification - 101 (using Scikit-Learn)

In [1]:
import os
import numpy as np
import _pickle as cPickle
import sklearn

In [2]:
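def _load_cifar10_batch(file):
    # Each CIFAR-10 batch file is a pickled dict with 'data' (N x 3072) and 'labels'.
    with open(file, 'rb') as fo:
        batch = cPickle.load(fo, encoding='latin1')
    # Rows are stored channel-first (1024 red, 1024 green, 1024 blue values),
    # so reshape to N x 3 x 32 x 32 and move the channel axis last: N x 32 x 32 x 3.
    return batch['data'].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1), batch['labels']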

In [3]:
print('Loading...') 
batch_fns = [os.path.join("../data/", 'cifar-10-batches-py', 'data_batch_' + str(i)) for i in range(1, 6)] 
data_batches = [_load_cifar10_batch(fn) for fn in batch_fns] 

Loading...


In [4]:
data_all = np.vstack([data_batches[i][0] for i in range(len(data_batches))]).astype('float') 
labels_all = np.vstack([data_batches[i][1] for i in range(len(data_batches))]).flatten() 

#### Subset Generation

We are going to use only a subset of the CIFAR-10 dataset.

The 50,000 training samples are split in the ratio 92:8, and only the smaller 8% portion (4,000 images) is kept.

These 4,000 samples are used to generate the train and test sets for classification.

Here, **StratifiedShuffleSplit** is used to split the dataset. It shuffles the samples and splits them so that each side of the split preserves the class proportions of the original data. Because CIFAR-10 has the same number of training images per class, each class contributes an equal number of samples.
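
The stratification behaviour described above can be checked on a small example. This is a minimal sketch on hypothetical toy labels (not CIFAR-10); the class sizes and the `test_size=0.2` value are assumptions chosen only to make the proportions easy to read.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy labels (hypothetical): 60 samples of class 0, 30 of class 1, 10 of class 2
labels = np.array([0] * 60 + [1] * 30 + [2] * 10)
features = np.arange(len(labels)).reshape(-1, 1)  # placeholder features

splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=7)
train_idx, test_idx = next(splitter.split(features, labels))

# Count how many samples of each class landed on each side of the split
train_counts = np.bincount(labels[train_idx])
test_counts = np.bincount(labels[test_idx])
print('train class counts:', train_counts)
print('test class counts: ', test_counts)
```

Both sides keep the 6:3:1 class ratio of the full label array, and no index appears in both the train and the test side.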

In [5]:
# Splitting the whole training set into 92:8
from sklearn.model_selection import StratifiedShuffleSplit

seed = 7  # fixed seed so the split is reproducible
# Creating a splitter with a single split and an 8% test size
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.08, random_state=seed)
print(sss)

StratifiedShuffleSplit(n_splits=1, random_state=7, test_size=0.08,
            train_size=None)


In [6]:
for train_index, test_index in  sss.split(data_all, labels_all):
    split_data_92, split_data_8 = data_all[train_index], data_all[test_index]        
    split_label_92, split_label_8 = labels_all[train_index], labels_all[test_index]

#### The 4000 samples are split in the ratio 7:3 (i.e., 2800 for training and 1200 for testing) using StratifiedShuffleSplit.

In [7]:
# Splitting the 4000-sample subset into 70% train and 30% test
# test_size=0.3 denotes that 30% of the subset is used for testing.
split8 = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=seed)
for train_index, test_index in split8.split(split_data_8, split_label_8):
    train_data_70, test_data_30 = split_data_8[train_index], split_data_8[test_index]     
    train_label_70, test_label_30 = split_label_8[train_index], split_label_8[test_index]
train_data = train_data_70 # assigning to variable train_data
train_labels = train_label_70 # assigning to variable train_labels
test_data = test_data_30
test_labels = test_label_30

In [8]:
print('train_data: ', train_data.shape)
print('train_labels: ', train_labels.shape)
print('test_data: ', test_data.shape)
print('test_labels: ', test_labels.shape)

train_data:  (2800, 32, 32, 3)
train_labels:  (2800,)
test_data:  (1200, 32, 32, 3)
test_labels:  (1200,)


#### Need for Preprocessing

The data preprocessing step converts the raw data into a form suitable for subsequent analysis. Every step performed before model training (model creation) can be considered preprocessing.

The quality of an image is greatly influenced by its clarity and the device used to capture it.

The captured image may contain noise and irregularities, which preprocessing steps can reduce.

Some of the common preprocessing techniques include:

- Normalization
- Dimensionality reduction (e.g. PCA, SVD)
- Feature Extraction (e.g. SIFT, HOG)
- Whitening
- Denoising
- Contrast Stretching
- Background subtraction
- Image Enhancement
- Smoothing

#### Normalization

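Normalization rescales the pixel intensity values of each image to a common scale, so that differences in overall brightness and contrast between images are removed.

It does not make the pixel values follow a normal distribution; it only shifts and scales them.

After normalization, each image has mean = 0 and variance = 1.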
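
As a quick check of the mean and variance claim, here is a minimal sketch on a hypothetical batch of random images (the array `images` and its value range are assumptions, not the CIFAR-10 data). It uses the same per-image statistics and `ddof=1` as the notebook's function, but without modifying the input in place.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch: 4 random "images" of shape 32 x 32 x 3 with values in [0, 255]
images = rng.uniform(0, 255, size=(4, 32, 32, 3))

# Statistics are computed separately for every image (axes 1, 2, 3)
mean = images.mean(axis=(1, 2, 3), keepdims=True)
std = images.std(axis=(1, 2, 3), ddof=1, keepdims=True)
normalized = (images - mean) / std

per_image_mean = normalized.mean(axis=(1, 2, 3))
per_image_var = normalized.var(axis=(1, 2, 3), ddof=1)
print(per_image_mean)
print(per_image_var)
```

Each entry of `per_image_mean` is zero and each entry of `per_image_var` is one, up to floating-point rounding.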

In [9]:
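# Definition of normalization function
# Note: data is modified in place, so it must be a float array.
def normalize(data, eps=1e-8):
    data -= data.mean(axis=(1, 2, 3), keepdims=True)  # subtract each image's mean
    std = np.sqrt(data.var(axis=(1, 2, 3), ddof=1, keepdims=True))  # per-image standard deviation
    std[std < eps] = 1.  # avoid dividing by ~0 for constant images
    data /= std
    return data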

In [10]:
# Calling the function
train_data = normalize(train_data) 
test_data = normalize(test_data) 
# prints the shape of train data and test data 
print('train_data: ', train_data.shape)
print('test_data: ', test_data.shape)

train_data:  (2800, 32, 32, 3)
test_data:  (1200, 32, 32, 3)


#### ZCA Whitening

Normalization is commonly followed by a ZCA whitening step.

The main aim of whitening is to reduce redundancy in the data: after whitening, the features are uncorrelated and all have the same variance.

ZCA stands for zero-phase component analysis. Unlike PCA-whitened images, ZCA-whitened images still resemble the original images.
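
One common way to compute a ZCA whitening matrix is from the eigendecomposition of the feature covariance. The following is a minimal sketch under stated assumptions, not the notebook's own method: the helper names `zca_fit` and `zca_transform`, the `eps` value, and the toy data are all hypothetical. Rows are samples and columns are features.

```python
import numpy as np

def zca_fit(X, eps=1e-5):
    """Estimate the feature mean and the ZCA whitening matrix from X (n_samples, n_features)."""
    mean = X.mean(axis=0, keepdims=True)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]              # feature covariance matrix
    U, S, _ = np.linalg.svd(cov)               # cov is symmetric, so cov = U diag(S) U^T
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # rotate, rescale, rotate back
    return mean, W

def zca_transform(X, mean, W):
    """Apply a previously fitted whitening transform."""
    return (X - mean) @ W

# Toy correlated data (assumption): 1000 samples, 5 features
rng = np.random.default_rng(0)
mixing = np.eye(5) + 0.5 * np.ones((5, 5))
X_train = rng.normal(size=(1000, 5)) @ mixing

mean, W = zca_fit(X_train)
X_white = zca_transform(X_train, mean, W)
cov_white = X_white.T @ X_white / X_white.shape[0]
print(np.round(cov_white, 3))
```

The covariance of the whitened data is close to the identity matrix. For a train/test setup, the mean and `W` would be fitted on the training data and then reused unchanged for the test data.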

In [11]:
# Flattening each image into a 3072-element vector (the whitening matrix itself is not computed here)
train_data_flat = train_data.reshape(train_data.shape[0], -1).T
test_data_flat = test_data.reshape(test_data.shape[0], -1).T
print('train_data_flat: ', train_data_flat.shape)
print('test_data_flat: ', test_data_flat.shape)
train_data_flat_t = train_data_flat.T
test_data_flat_t = test_data_flat.T

train_data_flat:  (3072, 2800)
test_data_flat:  (3072, 1200)


#### Principal Component Analysis (PCA)

PCA decomposes a multivariate dataset into a set of successive orthogonal components, each of which explains the maximum possible amount of the remaining variance.

PCA is a dimensionality reduction technique: keeping only the first few components gives a compact representation of the data.

The flattened, normalized data is given as the input to PCA.

To explore PCA further, refer to https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html.
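
A minimal sketch of PCA as dimensionality reduction, using random stand-in data rather than the CIFAR-10 arrays (the array names, sizes, and `n_components=10` are assumptions). The projection is fitted on the training samples and the same projection is then applied to the test samples.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-ins for flattened images: rows are samples, columns are features
X_train = rng.normal(size=(200, 50))
X_test = rng.normal(size=(40, 50))

pca = PCA(n_components=10)                    # keep the 10 leading components
X_train_reduced = pca.fit_transform(X_train)  # learn the components from the training set
X_test_reduced = pca.transform(X_test)        # project the test set onto the same components

print(X_train_reduced.shape, X_test_reduced.shape)
print('fraction of variance kept:', pca.explained_variance_ratio_.sum())
```

The `explained_variance_ratio_` attribute lists, in decreasing order, the share of the total variance carried by each kept component, which helps when choosing `n_components`.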

In [12]:
from sklearn.decomposition import PCA
# n_components specifies the number of components to keep; here all of them are kept.
# PCA expects rows to be samples, so the (n_samples, 3072) arrays are used.
pca = PCA(n_components=min(train_data_flat_t.shape))
train_data_pca = pca.fit_transform(train_data_flat_t)  # fit the projection on the training set only
test_data_pca = pca.transform(test_data_flat_t)  # reuse the same projection for the test set

#### Singular Value Decomposition (SVD)

SVD is a matrix factorization that is widely used for dimensionality reduction in fields such as image compression, face recognition, and noise filtering.

In this method, a digital image (treated as a matrix) is decomposed into three other matrices.

A small number of the largest singular values obtained from this factorization can preserve the useful features of the original image while requiring far less storage.

For further details, see http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm.
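
The decomposition into three matrices, and the effect of keeping only the largest singular values, can be sketched as follows. The 32 x 32 random matrix stands in for a grayscale image and the helper `reconstruct` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32))  # stand-in for a 32 x 32 grayscale image

# image = U @ diag(s) @ Vt, with singular values s sorted in descending order
U, s, Vt = np.linalg.svd(image, full_matrices=False)

def reconstruct(k):
    """Rank-k approximation built from the k largest singular values."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Frobenius-norm reconstruction error for a few values of k
errors = {k: np.linalg.norm(image - reconstruct(k)) for k in (5, 15, 32)}
print(errors)
```

The error shrinks as `k` grows and vanishes (up to rounding) when all 32 singular values are used. For each `k`, the error equals the square root of the sum of the squared discarded singular values.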

In [13]:
from skimage import color
# Definition for SVD
def svdFeatures(input_data, n_values=30):
    svdArray_input_data = []
    for img_rgb in input_data:
        img = color.rgb2gray(img_rgb)  # 32 x 32 grayscale image
        # compute_uv=False returns only the singular values, sorted in descending order
        s = np.linalg.svd(img, compute_uv=False)
        svdArray_input_data.append(s[:n_values])  # keep the largest n_values singular values
    # Build the (n_samples, n_values) feature array once, after the loop
    return np.array(svdArray_input_data)

In [14]:
# Apply SVD for train and test data
train_data_svd=svdFeatures(train_data)
test_data_svd=svdFeatures(test_data)

#### Scale-Invariant Feature Transform for Feature Generation (SIFT)

SIFT is mainly used for complex, unstructured images such as natural photographs.

Even photographs of the same object differ in scale depending on the distance from the object, the focal length, etc. This is one of the reasons why raw pixel values are not considered useful features for images.

The main aim of using SIFT for feature extraction is to obtain features that are not sensitive to changes in scale, rotation, image resolution, illumination, etc.

The major steps involved in the SIFT algorithm are:

- Scale-space Extrema Detection
- Keypoint Localization
- Orientation Assignment
- Keypoint Descriptor

For further details, refer to https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html.

#### Model Building

Here, train_data_flat_t can be replaced with train_data_pca or train_data_svd to train on the PCA or SVD features, respectively.

In [15]:
from sklearn import svm
clf = svm.SVC(gamma=.001, probability=True)  # creating an SVM classifier model
clf.fit(train_data_flat_t, train_labels)  # model training; once fitted, the model can be used to predict the output.

SVC(gamma=0.001, probability=True)

In [16]:
predicted = clf.predict(test_data_flat_t)
score = clf.score(test_data_flat_t, test_labels)  # mean accuracy on the test set
print("score",score)

score 0.38666666666666666


Similarly, test_data_flat_t can be replaced with test_data_pca or test_data_svd.
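
Swapping feature sets by hand makes it easy to pair training features from one transform with test features from another. One way to avoid this is to chain the transform and the classifier in a scikit-learn pipeline. The sketch below is an illustration under assumptions: it uses scikit-learn's bundled digits dataset as a small stand-in for CIFAR-10, and `n_components=30` is an arbitrary choice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data (assumption): 8 x 8 digit images, already flattened to 64 features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7)

# PCA is fitted on the training data only; the same projection is applied at test time
model = make_pipeline(PCA(n_components=30), SVC(gamma=0.001))
model.fit(X_train, y_train)

score = model.score(X_test, y_test)  # mean accuracy on the held-out samples
print('score', score)
```

Because the pipeline owns the fitted PCA, calling `model.score` or `model.predict` on raw test features always applies the projection that was learned from the training set.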

The conventional classification pipeline above does not reach a high accuracy (about 39% on this subset). Better performance can be achieved with deep learning techniques such as **Convolutional Neural Networks (CNN)**.

### Convolutional Neural Networks (CNN)

Deep learning has become increasingly important for learning complex patterns from data. It is a form of machine learning based on neural networks, which are loosely inspired by the brain.

A neural network consists of:

- input layer
- hidden layers
- output layer

Each layer is composed of nodes, where the computation happens.

A neural network consists of interconnected neurons that pass messages to each other.

A CNN is a special kind of neural network that consists of multiple convolutional layers and pooling layers, followed by fully connected layers.

Because convolutional layers share weights and connect each neuron only to a local region of the input, this structure reduces memory use and computation. CNNs are mainly used in pattern and image recognition problems.
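
To make the convolution and pooling layers concrete, here is a minimal NumPy sketch of one convolution, a ReLU activation, and a 2 x 2 max-pooling step on a single-channel image. The helper names, the toy image, and the kernel are hypothetical; a real CNN would learn its kernels and would normally be built with a deep learning framework.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 block (odd edges are cropped)."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6 x 6 image
kernel = np.array([[-1.0, 0.0],
                   [0.0, 1.0]])                   # toy diagonal-difference kernel

features = np.maximum(conv2d_valid(image, kernel), 0.0)  # convolution followed by ReLU
pooled = max_pool2x2(features)
print(features.shape, pooled.shape)
```

The same four kernel weights are reused at every image position, which is why a convolutional layer needs far fewer parameters than a fully connected layer over the same input.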

In [17]:
from sklearn import metrics
conf_matrix = metrics.confusion_matrix(test_labels, predicted)
print("Confusion matrix:", conf_matrix)

Confusion matrix: [[47 13 11  4  1  6  4  5 20  9]
 [ 2 60  4 11  9  7  4  5  7 11]
 [15  8 31 14 15 11  9  7  8  2]
 [ 3  4 10 37 11 27 12  9  3  4]
 [ 7  4 16  8 30 10 19 11  7  8]
 [ 1  4 13 24  9 43 17  6  1  2]
 [ 0  6 18 17 17 11 43  5  0  3]
 [ 4  2  5 11 17  9  8 48  1 15]
 [10 14  1  5  2  6  1  2 62 17]
 [ 3 22  3  6  0  4  5  3 11 63]]


### Class-wise accuracy
Class-wise accuracy (CA) = (Correctly predicted images of a class / Total images of the class) * 100
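
The formula can also be evaluated for all classes at once. This is a minimal sketch on a hypothetical 3-class confusion matrix (rows are true classes, columns are predicted classes, the layout that `sklearn.metrics.confusion_matrix` uses).

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 5, 5]])

correct = np.diag(cm)      # correctly predicted images of each class
totals = cm.sum(axis=1)    # total images of each class (row sums)
class_accuracy = correct / totals * 100
print(class_accuracy)
```

For this matrix the class-wise accuracies are 80%, 60%, and 50%.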

In [18]:
# To see the accuracy of each class.
accuracy = []
leng = len(conf_matrix) # number of classes (rows of the confusion matrix)
for i in range(leng):
    # Each diagonal element (conf_matrix[i,i]) is divided by the sum of the elements of that particular row
    # (conf_matrix[i].sum()); the small constant guards against division by zero for an empty class.
    ac = (conf_matrix[i,i] / ((conf_matrix[i].sum()) + .0000001)) * 100
    accuracy.append(ac)

print(accuracy)

[39.16666663402778, 49.99999995833333, 25.833333311805557, 30.83333330763889, 24.999999979166667, 35.83333330347222, 35.83333330347222, 39.999999966666664, 51.666666623611114, 52.49999995625]


In [19]:
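# Mean class-wise accuracy = sum of class-wise accuracies / number of classes.
# It equals the overall accuracy here because every class has the same number of test images (120).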
summation = 0
no_of_classes = 10
for i in range(0,len(accuracy)):
    summation += accuracy[i]

overall_accuracy = summation / no_of_classes
print(overall_accuracy)

38.66666663444444
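
Averaging the class-wise accuracies matches the overall accuracy only when every class has the same number of test samples, as is the case in this notebook. The sketch below uses a hypothetical imbalanced confusion matrix to show how the two numbers can differ.

```python
import numpy as np

# Hypothetical imbalanced confusion matrix: the classes have 10, 5, and 20 test samples
cm = np.array([[8, 2, 0],
               [1, 3, 1],
               [0, 5, 15]])

class_accuracy = np.diag(cm) / cm.sum(axis=1) * 100
mean_class_accuracy = class_accuracy.mean()    # every class counts equally
overall_accuracy = np.trace(cm) / cm.sum() * 100  # every sample counts equally

print('mean class-wise accuracy:', mean_class_accuracy)
print('overall accuracy:        ', overall_accuracy)
```

Here the mean class-wise accuracy is about 71.7% while the overall accuracy is about 74.3%, because the largest class is also one of the better-classified ones.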


## END