# DS CONNECT 11
## How to build your own image classifier with limited images

Note: The code in this notebook is highly abstracted. The bulk of the logic lives in utils.py.

Import the necessary libraries first

In [None]:
import gc
gc.collect()
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0" # Select GPU #0
import keras
import utils
%matplotlib inline

Split the images into test and train folders with the preprocess function

In [None]:
utils.preprocess(test_percentage=0.50, augment=False)
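utils.py is not shown in this notebook, so the following is only a sketch of what a split such as `preprocess(test_percentage=0.50)` might do. It assumes one subfolder per class under a source directory; the name `split_dataset`, the `seed` parameter and the folder layout are all hypothetical.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, train_dir, test_dir, test_percentage=0.5, seed=0):
    """Copy images from src_dir/<class>/ into train_dir/<class>/ and test_dir/<class>/.

    Hypothetical helper: the real utils.preprocess may work differently.
    Returns {class_name: (n_train, n_test)}.
    """
    rng = random.Random(seed)
    counts = {}
    for class_dir in sorted(p for p in Path(src_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)  # shuffle so the split is not biased by file order
        n_test = int(round(len(files) * test_percentage))
        splits = {test_dir: files[:n_test], train_dir: files[n_test:]}
        for dest_root, subset in splits.items():
            dest = Path(dest_root) / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in subset:
                shutil.copy2(f, dest / f.name)
        counts[class_dir.name] = (len(files) - n_test, n_test)
    return counts
```

Splitting per class, rather than over the pooled file list, keeps the class balance roughly the same in both folders.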

Let's take a look at some images in our dataset

In [None]:
utils.show_data()

Create a basic CNN and look at its summary

In [None]:
model = utils.create_basic_CNN()
model.summary()

Compile the model

In [None]:
utils.compile(model)

Train the model

In [None]:
# Training is commented out; the previously saved model is loaded from disk instead
#utils.train(model, 'train', 'test', 1000, 'model.h5')
model = keras.models.load_model("model.h5")

Evaluate the model

In [None]:
utils.evaluate(model, 'train', 'test')

## A CNN trained from scratch takes a long time to train, so let's try transfer learning

Create the convolutional base of a pretrained model

In [None]:
# Use transfer learning for feature extraction
feature_extractor = utils.create_conv_base()

Extract features and labels from the images

In [None]:
train_features, train_labels = utils.extract_features(feature_extractor, 'train', utils.num_samples('train'))
test_features, test_labels = utils.extract_features(feature_extractor, 'test', utils.num_samples('test'))
train_features, test_features = utils.reshape_features(train_features), utils.reshape_features(test_features)
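Features produced by a convolutional base are typically 4-D arrays (samples, height, width, channels), while a dense network expects one flat vector per sample. The sketch below shows the kind of flattening that `utils.reshape_features` presumably performs. The name `flatten_features` is hypothetical, and the shape (4, 4, 512) is an assumed example rather than something taken from utils.py.

```python
import numpy as np


def flatten_features(features):
    """Flatten (n_samples, h, w, c) feature maps to (n_samples, h*w*c)."""
    features = np.asarray(features)
    return features.reshape(features.shape[0], -1)


# Assumed example shape; the real one depends on the pretrained model and input size.
dummy = np.zeros((8, 4, 4, 512), dtype=np.float32)
flat = flatten_features(dummy)
print(flat.shape)  # (8, 8192)
```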

Create a multilayer perceptron (MLP) to train on the extracted features

In [None]:
mlp = utils.create_MLP()

Compile the mlp

In [None]:
utils.compile(mlp)

Train the mlp

In [None]:
#utils.train_MLP(mlp, train_features, train_labels, test_features, test_labels, 5000, 'mlp.h5') # 141 epochs
mlp = keras.models.load_model("mlp.h5")
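The training call above is capped at 5000 but the comment records 141 epochs, which suggests that training halts once the validation metric stops improving. Whether `utils.train_MLP` actually uses early stopping is an assumption, since utils.py is not shown. The idea can be sketched in plain Python with a hypothetical `patience` parameter:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs through every epoch.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
print(early_stopping_epoch(losses, patience=3))  # 6
```

In Keras, the same behaviour is available through the `keras.callbacks.EarlyStopping` callback, which takes `monitor` and `patience` arguments.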

Evaluate the mlp

In [None]:
utils.evaluate_MLP(mlp, test_features, test_labels)
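What `utils.evaluate_MLP` reports is not shown here. A common score for a classifier is top-1 accuracy: the fraction of samples whose highest-probability class matches the true class. The sketch below assumes one-hot labels and per-class probability outputs; `top1_accuracy` is a hypothetical name.

```python
import numpy as np


def top1_accuracy(probabilities, one_hot_labels):
    """Fraction of rows where the most probable class equals the labelled class."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))


probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9]])
labels = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
print(top1_accuracy(probs, labels))  # 0.75
```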

We get much better results, even with less training time.

# What if you have less training data?
## We will simulate this by using only 10% of the data for training.

In [None]:
utils.preprocess(test_percentage=0.90, augment=False) # Use only 10% of data as training data

In [None]:
train_features, train_labels = utils.extract_features(feature_extractor, 'train', utils.num_samples('train'))
test_features, test_labels = utils.extract_features(feature_extractor, 'test', utils.num_samples('test'))
train_features, test_features = utils.reshape_features(train_features), utils.reshape_features(test_features)

In [None]:
mlp2 = utils.create_MLP()
utils.compile(mlp2)
#utils.train_MLP(mlp2, train_features, train_labels, test_features, test_labels, 5000, 'mlp2.h5') # 453 epochs
mlp2 = keras.models.load_model("mlp2.h5")

In [None]:
utils.evaluate_MLP(mlp2, test_features, test_labels)

Notice that we get much poorer results when we use significantly less data.

Let's see how we can improve this with data augmentation, which is particularly useful when we have a small dataset.

It works by randomly transforming the training images before we extract features from them.

We will compare the results with and without image augmentation.

In [None]:
utils.preprocess(test_percentage=0.90, augment=True) # Again use 10% of the data for training, but augment it to 10x the size
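The transformations applied by `utils.preprocess(augment=True)` are not shown in this notebook. As a rough illustration of the idea, the sketch below creates several randomly flipped and shifted variants of each image using NumPy. The helper names, the choice of transformations and the `max_shift` value are all assumptions.

```python
import numpy as np


def augment_image(image, rng, max_shift=4):
    """Return a randomly flipped and shifted copy of an (h, w, c) image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; real pipelines often pad or fill instead.
    return np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))


def expand_dataset(images, copies=10, seed=0):
    """Produce `copies` augmented variants of every image."""
    rng = np.random.default_rng(seed)
    return np.stack([augment_image(img, rng) for img in images for _ in range(copies)])


images = np.random.default_rng(1).random((3, 32, 32, 3))
augmented = expand_dataset(images, copies=10)
print(augmented.shape)  # (30, 32, 32, 3)
```

Only the training images should be augmented. The test set stays untouched so that evaluation reflects real, unmodified images.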

In [None]:
train_features, train_labels = utils.extract_features(feature_extractor, 'aug_train', utils.num_samples('aug_train'))
test_features, test_labels = utils.extract_features(feature_extractor, 'test', utils.num_samples('test'))
train_features, test_features = utils.reshape_features(train_features), utils.reshape_features(test_features)

Let's look at some augmented images

In [None]:
utils.show_augment_image() # go to aug_train folder to see images

Let's see the results. We create, compile and train the MLP

In [None]:
mlp3 = utils.create_MLP()
utils.compile(mlp3)
#utils.train_MLP(mlp3, train_features, train_labels, test_features, test_labels, 5000, 'mlp3.h5') # 92 epochs
mlp3 = keras.models.load_model("mlp3.h5")

In [None]:
utils.evaluate_MLP(mlp3, test_features, test_labels)

Notice that we get much better accuracy because we 'expanded' our dataset through image augmentation.

## Let's have some fun predicting images with our classifier

We use mlp, the classifier trained on 50% of the data

In [None]:
# Place the images to classify in the predict\images folder
# The results are written to the output folder
# Note that each run overwrites everything in the output folder
utils.predict_and_show(mlp)
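Internally, a call like `predict_and_show` has to turn the network's output vector into a readable label. A minimal sketch of that step follows; the function name and the class names are hypothetical and not taken from utils.py.

```python
def decode_prediction(probabilities, class_names):
    """Return the most probable class name together with its probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


# Hypothetical class names, used only for illustration.
label, confidence = decode_prediction([0.1, 0.7, 0.2], ["cat", "dog", "bird"])
print(f"{label} ({confidence:.0%})")  # dog (70%)
```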