# Computer Vision Fundamentals

This notebook was a precursor to the project on object detection with YOLO. It was a way for me to ease into computer vision and try my hand at what's been called the "Hello World" of neural networks and computer vision. <br>

The goal is simple - correctly identify the digits in a dataset of tens of thousands of images of handwritten digits. In essence, this is a multi-class classification problem. Before diving into neural networks and seeing how they perform, let's benchmark a few other models such as logistic regression and random forests.

In [41]:
# Let's read in the MNIST data using sklearn's fetch_openml
import pandas as pd
from sklearn.datasets import fetch_openml

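# as_frame=False returns NumPy arrays; newer scikit-learn versions return a pandas DataFrame by default
mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)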

X = mnist["data"]
y = mnist["target"]

In [42]:
X.shape

(70000, 784)

The MNIST data has 70,000 digits. We will use the first 60,000 as training data and the rest as testing data.

In [43]:
X_train = X[:60000]
y_train = y[:60000]
X_test = X[60000:]
y_test = y[60000:]

In [44]:
print("Training data shape:", X_train.shape)
print("Testing data shape:", X_test.shape)

Training data shape: (60000, 784)
Testing data shape: (10000, 784)


The MNIST data consists of 28x28 grayscale images of the digits 0 to 9. Each pixel is one column in our data, which is why there are 28 x 28 = 784 columns.
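
To make the 784 columns concrete, here is a minimal sketch using a made-up row of values instead of a real digit. It assumes the pixels are stored row by row (row-major order), which is also what the `reshape(-1, 28, 28, 1)` call later in this notebook relies on.

```python
import numpy as np

# Synthetic stand-in for one row of X: 784 values, one per pixel
row = np.arange(784)

# Assuming row-major storage, reshaping recovers the 28x28 grid
image = row.reshape(28, 28)

# The pixel at (r, c) in the image sits in column r * 28 + c of the flat row
r, c = 3, 5
flat_index = r * 28 + c
print(image.shape, image[r, c] == row[flat_index])
```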

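It is a good idea to shuffle the training images before fitting the models: some learning algorithms are sensitive to the order of the training instances, and shuffling also helps make the cross validation folds similar to one another.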

In [45]:
import numpy as np
np.random.seed(42)
shuffled_indices = np.random.permutation(60000)

X_train = X_train[shuffled_indices]
y_train = y_train[shuffled_indices]
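
The important detail in the cell above is that the same permutation is applied to both `X_train` and `y_train`, so every image keeps its own label. Here is a small sketch of that idea on toy arrays (`X_toy` and `y_toy` are made up for illustration). It assumes NumPy arrays; a pandas DataFrame would need `.iloc` for this kind of positional row selection.

```python
import numpy as np

rng = np.random.RandomState(42)

# Toy features and labels where row i holds the value i and has label i
X_toy = np.arange(5).reshape(5, 1)
y_toy = np.arange(5)

# One permutation, applied to both arrays
idx = rng.permutation(5)
X_shuf, y_shuf = X_toy[idx], y_toy[idx]

# Every row still carries its own label after shuffling
pairs_intact = bool(np.all(X_shuf[:, 0] == y_shuf))
print(idx, pairs_intact)
```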

One more thing that we can do is to scale the inputs using sklearn's `StandardScaler`. This helps models trained with gradient-based solvers such as SGD converge faster.

In [46]:
from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()
X_train_scaled = std_scaler.fit_transform(X_train)
X_test_scaled = std_scaler.transform(X_test)
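
For intuition, this is roughly what the scaler does under the hood, sketched in plain NumPy on a toy array. The toy data and the `std_safe` name are mine. The key points are that the mean and standard deviation come from the training data only, and that constant columns (such as border pixels that never light up) are left unscaled instead of being divided by zero.

```python
import numpy as np

# Toy data: 3 features, the last one constant (like an always-blank pixel)
train = np.array([[0.0, 10.0, 0.0],
                  [2.0, 30.0, 0.0],
                  [4.0, 50.0, 0.0]])
test = np.array([[6.0, 20.0, 0.0]])

# Statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)

# Constant columns have std 0; dividing by 1 instead avoids 0/0
std_safe = np.where(std == 0, 1.0, std)

train_scaled = (train - mean) / std_safe
# The test set reuses the training mean and std; nothing is re-fitted
test_scaled = (test - mean) / std_safe

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```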

Our data is now ready for model fitting. Our approach with each model will be as follows - <br>
1. Fit the model on the training data.
2. Get the accuracy on training data.
3. Get cross validation accuracy on the training data. 
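
Step 3 deserves a quick illustration. The sketch below shows how 3-fold cross validation carves up the training indices so that every instance is used for validation exactly once. It uses plain contiguous folds on 9 toy indices; sklearn's `cross_val_score` stratifies the folds by class for classifiers, so treat this as a simplification.

```python
import numpy as np

n_samples, n_folds = 9, 3
indices = np.arange(n_samples)

# Split the indices into 3 equally sized validation folds
folds = np.array_split(indices, n_folds)

splits = []
for k, val_idx in enumerate(folds):
    # Train on everything that is not in the current validation fold
    train_idx = np.setdiff1d(indices, val_idx)
    splits.append((train_idx, val_idx))
    print("fold", k, "train:", train_idx, "validate:", val_idx)

# The reported score is then the mean of the per-fold accuracies
```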

**Model 1:** *Logistic Regression* <br>
Logistic regression is a binary classifier, which means that it is built for classification problems with two classes. To use it for multi-class classification, there are three strategies - <br>
1. **One vs Rest classifier** - We build 'k' logistic classifiers (k being the number of classes in the target), each separating one class from all the others. To predict, we run all k classifiers and pick the class with the highest probability score.
2. **One vs One classifier** - We build kC2 = k(k-1)/2 classifiers, one for every pair of classes, and train each one only on the data for its two classes. Each classifier therefore needs much less data, although there are many more of them to train.
3. **Softmax Regression** (Multinomial logistic regression) - We represent the target values as one-hot vectors and use the softmax function to model the probabilities of all classes at once.
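
To get a feel for how many binary classifiers the first two strategies need for the ten MNIST digits, here is a quick count using only the standard library.

```python
from itertools import combinations

classes = list(range(10))  # the ten digits

# One vs Rest: one binary classifier per class
n_ovr = len(classes)

# One vs One: one binary classifier per unordered pair of classes
ovo_pairs = list(combinations(classes, 2))
n_ovo = len(ovo_pairs)

print("One vs Rest classifiers:", n_ovr)   # 10
print("One vs One classifiers:", n_ovo)    # 45
print("First few pairs:", ovo_pairs[:3])   # [(0, 1), (0, 2), (0, 3)]
```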

Here we will train a **One vs Rest logistic regression** classifier.

In [7]:
from sklearn.linear_model import LogisticRegression

# here we specify the One vs Rest strategy explicitly
log_reg = LogisticRegression(multi_class = "ovr", solver = "liblinear")

# the liblinear solver only supports One vs Rest for multi-class problems
log_reg.fit(X_train_scaled, y_train)

# store the fitted model for later use
# (joblib is now a standalone package; sklearn.externals.joblib was removed in newer scikit-learn versions)
import joblib
joblib.dump(log_reg, "log_reg.pkl")

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr',
          n_jobs=None, penalty='l2', random_state=None, solver='liblinear',
          tol=0.0001, verbose=0, warm_start=False)

In [9]:
import joblib
log_reg = joblib.load("log_reg.pkl")

In [10]:
# Train accuracy score
from sklearn.metrics import accuracy_score

y_train_pred = log_reg.predict(X_train_scaled)
accuracy_score(y_train, y_train_pred)

0.93095

In [11]:
# cross validation accuracy
from sklearn.model_selection import cross_val_score

log_reg_scores = cross_val_score(log_reg,
                                X_train_scaled,
                                y_train,
                                cv = 3,
                                scoring = "accuracy")

In [12]:
print("Cross validation score:", np.mean(log_reg_scores))

Cross validation score: 0.9104830889013348


We see that the logistic regression model performs fairly well. A 93% accuracy on the training set and a 91% cross validation accuracy suggest that it does almost as well on unseen data as on the data it was trained on!

**Model 2:** *Multinomial logistic regression (softmax)*
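
Before fitting the model, here is a minimal NumPy sketch of the softmax function itself. It turns a vector of per-class scores into probabilities that sum to 1. The scores are made up, and subtracting the maximum is a common numerical-stability trick rather than something specific to sklearn.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of class scores into probabilities that sum to 1."""
    # Subtracting the max does not change the result but avoids overflow in exp
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical scores for three classes
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

# The predicted class is the one with the highest probability
print(probs, probs.sum(), probs.argmax())
```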

In [16]:
from sklearn.linear_model import LogisticRegression
smax_clf = LogisticRegression(multi_class = "multinomial", solver = 'lbfgs', max_iter = 1000)
smax_clf.fit(X_train_scaled, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=1000, multi_class='multinomial',
          n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [17]:
# train accuracy
y_train_pred = smax_clf.predict(X_train_scaled)
accuracy_score(y_train, y_train_pred)

0.9442

In [18]:
# cross validation accuracy
smax_clf_scores = cross_val_score(smax_clf,
                                 X_train_scaled,
                                 y_train,
                                 cv = 3,
                                 scoring = "accuracy")

In [23]:
print("Cross validation score: {:.2f}".format(np.mean(smax_clf_scores)))

Cross validation score: 0.91


Although the softmax model does better on the training data (94.4% against 93.1%), the cross validation scores indicate that it does slightly worse on unseen data. Both scores round to 91%, so the gap is small.

**Model 3:** *Random Forest Classifier*

In [26]:
from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(n_estimators = 10, random_state = 42)

rf_clf.fit(X_train_scaled, y_train)

# train accuracy
y_train_pred = rf_clf.predict(X_train_scaled)
print("Accuracy: {:.2f}".format(accuracy_score(y_train, y_train_pred)))

Accuracy: 1.00


In [27]:
# cross validation accuracy
from sklearn.model_selection import cross_val_score

rf_clf_scores = cross_val_score(rf_clf,
                                X_train_scaled,
                                y_train,
                                cv = 3,
                                scoring = "accuracy")

print("CV accuracy: {:.2f}".format(np.mean(rf_clf_scores)))

CV accuracy: 0.94


The random forest reaches 100% accuracy on the training data (to two decimal places), but it is clearly overfitting since the cross validation accuracy is only 94%. This is still better than the models that came before!
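
One rough way to compare how much each model overfits is to look at the gap between training and cross validation accuracy. The numbers below are copied from the outputs above; the softmax and random forest values are only printed to two decimals, so the gaps are approximate.

```python
# (train accuracy, cross validation accuracy) as reported in the cells above
results = {
    "logistic (OvR)": (0.93095, 0.9105),
    "softmax":        (0.9442, 0.91),
    "random forest":  (1.00, 0.94),
}

# A larger gap means the model fits the training data much better than unseen data
gaps = {name: round(train - cv, 3) for name, (train, cv) in results.items()}
for name, gap in gaps.items():
    print(f"{name}: train - CV = {gap}")
```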

## On to Neural Networks

So far, I have applied some of the more fundamental models to the problem of identifying digits in the MNIST data. Let's try fitting a neural network using `keras` and see how it performs. A basic neural network model has the following components - <br>
1. Input data
2. Layers - These take the input data and extract representations relevant to the problem at hand.
3. Loss function - Calculates how far the outputs produced by the network are from the actual outputs.
4. Optimizer - Takes the loss function and translates that into changes in the parameters.
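
Here is a minimal NumPy sketch that puts these four components together for the smallest possible network: a single softmax layer trained with one step of plain gradient descent. The toy data, the learning rate and the helper names are all made up for illustration, and this is not how `keras` implements things internally.

```python
import numpy as np

rng = np.random.RandomState(0)

# 1. Input data: 4 toy samples with 3 features, one-hot labels for 2 classes
X = rng.randn(4, 3)
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

# 2. Layer: a single dense layer with softmax activation
W = np.zeros((3, 2))
b = np.zeros(2)

def forward(X, W, b):
    scores = X @ W + b
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exps = np.exp(scores)
    return exps / exps.sum(axis=1, keepdims=True)

# 3. Loss: mean categorical cross entropy between predictions and labels
def loss(P, Y):
    return -np.mean(np.sum(Y * np.log(P), axis=1))

P = forward(X, W, b)
loss_before = loss(P, Y)

# 4. Optimizer: one plain gradient descent step (not RMSprop)
learning_rate = 0.1
grad_scores = (P - Y) / len(X)   # gradient of the loss w.r.t. the scores
W -= learning_rate * X.T @ grad_scores
b -= learning_rate * grad_scores.sum(axis=0)

loss_after = loss(forward(X, W, b), Y)
print(loss_before, loss_after)
```

With all weights starting at zero, the network predicts 50/50 for every sample, so the starting loss is ln(2); a single step is enough to bring it down a little.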

**Model 1:** Fully connected neural network <br>
For the problem at hand, let's build a neural network with just one hidden layer of 16 neurons and an output layer of 10 neurons - one for each digit.
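
A quick sanity check on the size of this network: each fully connected layer has one weight per input per neuron, plus one bias per neuron. The helper name `dense_params` is mine.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 16)   # 784 pixels feeding 16 hidden neurons
output = dense_params(16, 10)    # 16 hidden neurons feeding 10 output neurons

print(hidden, output, hidden + output)  # 12560 170 12730
```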

In [58]:
from keras import models
from keras import layers
from keras.utils import to_categorical # convert to one-hot-encoding

X_train_nn = X_train_scaled[:50000]
X_val_nn = X_train_scaled[50000:]

y_train_nn = y_train[:50000]
y_val_nn= y_train[50000:]

network1 = models.Sequential()
network1.add(layers.Dense(16, activation = 'relu', input_shape = (784,)))
network1.add(layers.Dense(10, activation = 'softmax'))

### Loss function and optimizer
For multi-class classification problems, we generally use categorical cross entropy as the loss function. We will use `rmsprop`, a variant of gradient descent, to update the weights of the network.
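
As a rough illustration of the loss, here is categorical cross entropy written out in NumPy for one-hot labels. The function name and the example probabilities are made up; the point is that confident wrong predictions are penalized far more than confident right ones.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of -sum(y_true * log(y_pred)) over the samples."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot label for the last of 3 classes (hypothetical example)
y_true = np.array([[0.0, 0.0, 1.0]])

confident_right = np.array([[0.05, 0.05, 0.90]])
confident_wrong = np.array([[0.90, 0.05, 0.05]])

loss_right = categorical_cross_entropy(y_true, confident_right)
loss_wrong = categorical_cross_entropy(y_true, confident_wrong)
print(loss_right)   # about 0.105
print(loss_wrong)   # about 2.996
```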

In [59]:
network1.compile(loss = 'categorical_crossentropy',
               optimizer = 'rmsprop',
               metrics = ['accuracy'])
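
For the curious, the sketch below shows the textbook RMSprop update rule on a toy problem. The default values for `lr`, `rho` and `eps` are chosen to resemble common defaults and are assumptions on my part, as is the toy objective; the real `keras` optimizer may differ in details.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update; returns the new weights and the running average."""
    # Exponential moving average of the squared gradients
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Scale each weight's step by the root of its running average
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

# Toy objective: minimise f(w) = w^2 (gradient 2w), starting from w = 1.0
w = np.array([1.0])
avg_sq = np.zeros_like(w)
for _ in range(200):
    w, avg_sq = rmsprop_step(w, 2 * w, avg_sq, lr=0.01)

print(w)  # should end up close to the minimum at 0
```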

### Train the model
Now that we have the basic elements set up, let's just go ahead and train the network on our data!
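
The labels need to be one-hot encoded before training, because the softmax output layer produces 10 values per image. The cell below uses `to_categorical` for this; a minimal NumPy equivalent might look like the following (the `one_hot` helper is hypothetical, not part of `keras`).

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (n_samples, num_classes) array with a single 1 per row."""
    # int() also handles string digits such as '3'
    labels = np.array([int(v) for v in labels])
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot(['3', '0', '9'])
print(encoded.shape)            # (3, 10)
print(encoded.argmax(axis=1))   # [3 0 9]
```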

In [60]:
y_train_cat = to_categorical(y_train_nn)
y_val_cat = to_categorical(y_val_nn)
y_test_cat = to_categorical(y_test)

network1.fit(X_train_nn,
             y_train_cat,
             batch_size = 128,
             epochs = 5,
             validation_data = (X_val_nn, y_val_cat))

Train on 50000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1d788cc2be0>

**Model 2:** - Convolutional Neural Network <br>
The convolutional neural network we build here has far more parameters to train than the fully connected network above. It also differs in its input: while we passed each image as a flattened row of 784 values above, the CNN takes the images as a 4D tensor of shape (instances, height, width, channels).

CNNs are designed for image data, so this model should significantly outperform the neural network above.
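
To show what a convolution layer actually computes, here is a naive single-channel version in NumPy with zero padding, so the output has the same height and width as the input (which is what the `padding` argument asks for in the model below). The vertical-edge filter is hand-picked for illustration; a real CNN learns its filter values during training.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive single-channel 2D convolution (no kernel flip) with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2   # padding for odd kernel sizes
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum of the patch currently under the kernel
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((28, 28))
image[10:18, 14] = 1.0   # a vertical stroke

# Hypothetical 3x3 filter that responds to vertical edges
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d_same(image, kernel)
print(feature_map.shape)   # (28, 28), same as the input
```

The feature map is positive on one side of the stroke, negative on the other and zero along the stroke itself, which is the kind of local pattern a convolution filter picks up.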

Let's see what kind of accuracies we can get using a CNN on the same data!

In [86]:
from keras.optimizers import RMSprop

# reshape - this is important for the CNN since it takes each image as a height x width x channels array and not a flat row
X_train_cnn = X_train_scaled[:50000].reshape(-1, 28, 28, 1)
X_val_cnn = X_train_scaled[50000:].reshape(-1, 28, 28, 1)
X_test_cnn = X_test_scaled.reshape(-1, 28, 28, 1)

# the output layer is a softmax with 10 neurons, so one-hot encode the labels into 10 columns
y_train_cnn = to_categorical(y_train[:50000], num_classes = 10)
y_val_cnn = to_categorical(y_train[50000:], num_classes = 10)

# initialize a sequential network
network = models.Sequential()

# add a convolution layer with 32 filters of size 5x5 (f = 5) and an input image of dimensions 28x28x1
network.add(layers.Conv2D(filters = 32, kernel_size = (5,5), padding = 'same',
                          activation ='relu', input_shape = (28,28,1)))

# add another convolution layer with 32 filters; its input is the output of the previous layer
network.add(layers.Conv2D(filters = 32, kernel_size = (5,5), padding = 'same', activation ='relu'))

# add a maxpool layer with f = 2; strides default to the pool size, so s = 2
network.add(layers.MaxPool2D(pool_size=(2,2)))

# add a dropout layer that randomly sets 25% of its inputs (activations, not weights) to 0 during training.
# This acts as a form of regularization for neural networks.
network.add(layers.Dropout(0.25))

# add another convolution layer with 64 filters with f = 3; its input is the output of the previous layer
network.add(layers.Conv2D(filters = 64, kernel_size = (3,3), padding = 'same',
                          activation ='relu'))

# add another convolution layer with 64 filters with f = 3; its input is the output of the previous layer
network.add(layers.Conv2D(filters = 64, kernel_size = (3,3), padding = 'same',
                          activation ='relu'))

# add maxpool layer with f = 2, s = 2
network.add(layers.MaxPool2D(pool_size=(2,2), strides=(2,2)))

# add dropout that randomly sets 25% of its inputs to 0 during training
network.add(layers.Dropout(0.25))

# after 2 conv layers, a maxpool, another 2 conv layers and a maxpool, we need to flatten the output
network.add(layers.Flatten())

# add a fully connected (dense) layer with 256 units
network.add(layers.Dense(256, activation = "relu"))

# another dropout
network.add(layers.Dropout(0.5))

# final layer with 10 way output since we have 10 classes
network.add(layers.Dense(10, activation = "softmax"))

network.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_9 (Conv2D)            (None, 28, 28, 32)        832       
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 28, 28, 32)        25632     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_7 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 14, 14, 64)        18496     
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 14, 14, 64)        36928     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 7, 7, 64)          0         
__________
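
The parameter counts and output shapes in the summary can be checked by hand. Each convolution filter has one weight per kernel position per input channel, plus one bias, and each 2x2 max pool with stride 2 halves the height and width. The helper names below are mine.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Each filter has kernel_size^2 * in_channels weights plus one bias."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def pool_output_size(size, pool=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - pool) // stride + 1

print(conv2d_params(5, 1, 32))    # 832
print(conv2d_params(5, 32, 32))   # 25632
print(conv2d_params(3, 32, 64))   # 18496
print(conv2d_params(3, 64, 64))   # 36928

after_first_pool = pool_output_size(28)                  # 14
after_second_pool = pool_output_size(after_first_pool)   # 7
print(after_first_pool, after_second_pool)
```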

The network structure is now in place. Next we set up the loss function, the optimizer and the metric to track, and then train the model.

In [87]:
# Define the optimizer (newer Keras versions name this argument learning_rate instead of lr)
optimizer = RMSprop(learning_rate=0.001)

# Compile the model
network.compile(optimizer = optimizer , 
                loss = "categorical_crossentropy", 
                metrics=["accuracy"])

# train the network on the training data, validating on the held-out 10,000 images
history = network.fit(X_train_cnn,
                      y_train_cnn,
                      batch_size = 80,
                      epochs = 3, 
                      verbose = 2,
                      validation_data = (X_val_cnn, y_val_cnn))

Train on 50000 samples, validate on 10000 samples
Epoch 1/3
 - 467s - loss: 0.2076 - acc: 0.9361 - val_loss: 0.0520 - val_acc: 0.9841
Epoch 2/3
 - 475s - loss: 0.0715 - acc: 0.9793 - val_loss: 0.0442 - val_acc: 0.9873
Epoch 3/3
 - 474s - loss: 0.0561 - acc: 0.9843 - val_loss: 0.0465 - val_acc: 0.9858


We see that the CNN model gives us a validation accuracy of about 98.6% after three epochs, which is roughly 4 percentage points higher than the neural network with fully connected layers.

### Evaluating on the test data

Let's test these models on the test data now!

In [90]:
print("Logistic Regression:", accuracy_score(y_test, log_reg.predict(X_test_scaled)))
print("Softmax Regression:", accuracy_score(y_test, smax_clf.predict(X_test_scaled)))
print("Random Forest:", accuracy_score(y_test, rf_clf.predict(X_test_scaled)))
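# evaluate() returns [loss, accuracy]; index 1 picks out the accuracy
print("Neural Network (Fully Connected):", network1.evaluate(X_test_scaled, y_test_cat)[1])
# no index here, so this prints both the loss and the accuracy
print("CNN:", network.evaluate(X_test_cnn, y_test_cat))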

Logistic Regression: 0.9171
Softmax Regression: 0.9212
Random Forest: 0.9474
Neural Network (Fully Connected): 0.9428
CNN: [0.03470213618227135, 0.9897]


We see that the CNN outperforms all the other models by a significant margin: 98.97% test accuracy against 94.74% for the next best model, the random forest. The model could be improved further with techniques such as data augmentation and learning rate schedules!
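
To put the margin in perspective, the test accuracies above can be converted into the number of misclassified images out of the 10,000 in the test set.

```python
# Test accuracies as printed in the output above
test_accuracy = {
    "Logistic Regression": 0.9171,
    "Softmax Regression": 0.9212,
    "Random Forest": 0.9474,
    "Neural Network (Fully Connected)": 0.9428,
    "CNN": 0.9897,
}

# Misclassified test images out of 10,000
errors = {name: round((1 - acc) * 10000) for name, acc in test_accuracy.items()}
for name, n_wrong in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: {n_wrong} errors")
```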