# Classification of Flower Images
## A Comparison between a Conventional ML Model (e.g. Random Forest) vs. a Deep Neural Network Model (e.g. a ConvNet)


The benchmark dataset used for this experiment is available at the following link:

Dataset and image source: http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html

After downloading the data, unzip the zip file.

This creates a folder named **dataset**.
Inside this folder there are two subfolders, **images** and **masks**.

The **images** folder contains images of four categories of flowers: crocus, daisy, pansy and sunflower.
There are a total of 234 images.

The **masks** folder contains the binary mask images corresponding to the flower images inside the **images** folder.

The binary masks can be used to suppress the background regions of the original images so that only the regions containing the actual flowers remain.
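As a rough illustration of how a binary mask suppresses the background, the sketch below zeroes every pixel that falls outside the mask. It assumes the mask is non-zero on flower pixels and zero elsewhere; the helper name `apply_mask` and the tiny synthetic arrays are hypothetical and not part of the dataset.

```python
import numpy as np

def apply_mask(image, mask):
    """Zero out background pixels; keep pixels where the mask is non-zero."""
    keep = (mask > 0)[..., np.newaxis]   # shape (H, W, 1), broadcasts over RGB
    return np.where(keep, image, 0).astype(image.dtype)

# Tiny synthetic example (not real flower data): a 2x2 RGB image
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[255, 0],
                 [0, 255]], dtype=np.uint8)   # assumed: non-zero = flower

masked = apply_mask(image, mask)
print(masked[:, :, 0])
```

Only the two pixels on the diagonal keep their original value; the other two are set to zero in all three channels.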



# Data Preprocessing for Ease of Use

All the images inside the **images** folder have been resized to 256x256 RGB images and saved in a file named **flower-images-256by256.pkl**.
This pickle file contains a numpy array of shape (234, 256, 256, 3): 234 images, each 256x256 pixels with 3 color channels (RGB).

A second numpy array holds the corresponding binary masks and is stored in a file named **flower-masks-256by256.pkl**.

The binary masks are used to suppress the background of the flower images before color histograms are extracted from them.


A third pickle file, named **flower-labels.pkl**, contains the numeric codes representing the labels (target classes) of the flowers.

### Make sure all three pickle files reside in the current folder before running the rest of the code.
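A quick way to confirm that the files are in place is sketched below. The three file names are the ones given above; the helper `missing_files` is a hypothetical convenience function, not part of the original notebook.

```python
import os

REQUIRED_FILES = [
    'flower-images-256by256.pkl',
    'flower-masks-256by256.pkl',
    'flower-labels.pkl',
]

def missing_files(folder='.', names=REQUIRED_FILES):
    """Return the required file names that are not present in `folder`."""
    return [name for name in names
            if not os.path.isfile(os.path.join(folder, name))]

missing = missing_files()
if missing:
    print('Missing files: {}'.format(', '.join(missing)))
else:
    print('All three pickle files found.')
```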








## Load Data From Disk Files

#### Images of flowers stacked as one big numpy array (integer intensity values of image pixels)
#### All images are resized to 256x256 with 3 channels for the RGB planes
#### There are a total of 234 images
#### The "flower-images-256by256.pkl" file contains a numpy array of shape 234x256x256x3

#### The "flower-labels.pkl" file contains the 234 integer labels for the flowers

#### There are 4 categories of flowers, labelled with the integers 0, 1, 2 and 3

#### The four categories of flowers are crocus, daisy, pansy and sunflower

#### >> 0 - crocus, 1 - daisy, 2 - pansy, 3 - sunflower
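The label coding above can be turned into a per-class count with a few lines of numpy. This is a minimal sketch: `CLASS_NAMES` and `count_per_class` are hypothetical names, and the example labels are made up rather than read from **flower-labels.pkl**.

```python
import numpy as np

CLASS_NAMES = ['crocus', 'daisy', 'pansy', 'sunflower']  # index = integer label

def count_per_class(labels, class_names=CLASS_NAMES):
    """Map each class name to the number of samples carrying its label."""
    counts = np.bincount(np.asarray(labels, dtype=int),
                         minlength=len(class_names))
    return {name: int(n) for name, n in zip(class_names, counts)}

# Made-up labels for illustration; with the real data, pass `target` instead
example_labels = [0, 3, 3, 1, 2, 3]
print(count_per_class(example_labels))
```

Running the same function on the real labels would show how balanced the four classes are, which matters when reading per-class precision and recall later on.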





## Read all the files

In [0]:
import pickle

# Original flower images: 234 images, each 256x256x3
with open('flower-images-256by256.pkl', 'rb') as f:
    flower_images = pickle.load(f)

# Image masks: 234 masks, each 256x256
with open('flower-masks-256by256.pkl', 'rb') as f:
    flower_masks = pickle.load(f)

# Integer-encoded labels, 234 in total >> 0 - crocus, 1 - daisy, 2 - pansy, 3 - sunflower
with open('flower-labels.pkl', 'rb') as f:
    target = pickle.load(f)

print('\n Loaded the files......')


 Loaded the files......


In [0]:
print(type(flower_images))

<type 'numpy.ndarray'>


In [0]:
size=len(flower_images)
print(size)


234


#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# PART 1: Flower Image Classification with Random Forest


## Import Libraries

In [0]:

import cv2
import numpy as np
import matplotlib.pyplot as plt


Random Forest and other machine learning classifiers need numeric vectors as input.
They generally don't work on the raw data (e.g. raw images).

Therefore, we need to convert the raw images into **numeric vectors (hand-engineered features)**.

In this experiment, we represent each image with an RGB histogram, which is a frequency distribution of pixel intensities across the Red, Green and Blue channels. With 8 bins per channel, the 3D histogram flattens to a vector of 8x8x8 = 512 values per image.

We write the following custom class for this purpose.


In [0]:

# Create RGB color histogram feature vectors
#------------------------------------------------------------------------------

class RGBHistogram:
	def __init__(self, bins):
		# Store the number of bins for the histogram
		self.bins = bins

	def describe(self, image, mask = None):
		# Compute a 3D RGB histogram and normalize so that images
		# with the same content will have roughly the same histogram
		hist = cv2.calcHist([image], [0, 1, 2], mask, self.bins, [0, 256, 0, 256, 0, 256])
		cv2.normalize(hist, hist)

		# Return 3D histogram as a flattened array
		return hist.flatten()


#------------------------------------------------------------------------------


In [0]:

# Initialize the image descriptor
desc = RGBHistogram([8, 8, 8])

data = []

for i in range(size):
    image = np.reshape(flower_images[i], (256, 256, 3))
    mask = np.reshape(flower_masks[i], (256, 256))

    # Histogram of the flower region only (background masked out)
    features = desc.describe(image, mask)
    data.append(features)

#print(len(data))
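To make the feature vector more concrete, here is a numpy-only sketch of a masked 3D color histogram. It is not the OpenCV code used above. It assumes 8 equal-width bins per channel over [0, 256) and an L2 normalization, which is intended to mirror the default behaviour of `cv2.normalize`. The function name `rgb_histogram` and the random test image are hypothetical.

```python
import numpy as np

def rgb_histogram(image, mask=None, bins=(8, 8, 8)):
    """Flattened, L2-normalized 3D color histogram of the (masked) pixels."""
    if mask is None:
        pixels = image.reshape(-1, 3)
    else:
        pixels = image[mask > 0]          # (N, 3): only pixels under the mask
    hist, _ = np.histogramdd(pixels.astype(np.float64), bins=bins,
                             range=[(0, 256)] * 3)
    hist = hist.flatten()
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Synthetic image (random pixels), only to check the output shape
rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(256, 256, 3)).astype(np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[64:192, 64:192] = 255                # assumed: non-zero = flower region

features = rgb_histogram(image, mask)
print(features.shape)   # (512,)
```

Each image is therefore reduced from 256x256x3 = 196,608 pixel values to 512 numbers, which is the input the Random Forest sees.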

## Import the Necessary Libraries for Machine Learning and Classification Metrics

In [0]:
## Classification of Flower images into different classes

# import the necessary packages for Machine Learning
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix


## Split Data into Training and Test Set

In [0]:
# Construct the training and testing splits
# Keep 70% for training, 30% for testing
(trainData, testData, trainTarget, testTarget) = train_test_split(data, target, test_size = 0.3, random_state = 42)
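The sizes of the two splits follow from simple arithmetic. The sketch below assumes that, for a fractional `test_size`, scikit-learn rounds the number of test samples up and assigns the remainder to training; `split_sizes` is a hypothetical helper used only to show the calculation.

```python
import math

def split_sizes(n_samples, test_size):
    """Train/test counts for a fractional test_size (test count rounded up)."""
    n_test = int(math.ceil(test_size * n_samples))
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(234, 0.3))   # (163, 71)
```

With 234 images, 30% is 70.2, so 71 images go to the test set and 163 remain for training.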


## Create a Random Forest ML Model

In [0]:
# Initialize the Random Forest classifier with default hyperparameters
model_rf = RandomForestClassifier()


## Train the Model on Training Data

In [0]:
model_rf.fit(trainData, trainTarget)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

## Evaluate the RF Model on Test Data

In [0]:
# Evaluate the classifier (predict once and reuse the predictions)
predictions = model_rf.predict(testData)

print('\n Classification Report : \n')
print(classification_report(testTarget, predictions, target_names = ['crocus', 'daisy', 'pansy', 'sunflower']))

print('\n\n Confusion Matrix : \n')
print(confusion_matrix(testTarget, predictions))

print('\n\n Classification Accuracy : \n')
print(accuracy_score(testTarget, predictions)*100)



 Classification Report : 

              precision    recall  f1-score   support

      crocus       0.31      0.42      0.36        12
       daisy       0.79      0.73      0.76        15
       pansy       0.79      0.75      0.77        20
   sunflower       1.00      0.92      0.96        24

   micro avg       0.75      0.75      0.75        71
   macro avg       0.72      0.70      0.71        71
weighted avg       0.78      0.75      0.76        71



 Confusion Matrix : 

[[ 5  3  4  0]
 [ 4 11  0  0]
 [ 5  0 15  0]
 [ 2  0  0 22]]


 Classification Accuracy : 

74.64788732394366
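The numbers in the classification report can be recomputed directly from the confusion matrix, which helps when reading it. The sketch below copies the matrix printed above (rows are true classes, columns are predicted classes) and derives precision, recall and accuracy with numpy.

```python
import numpy as np

# Confusion matrix from the output above (rows = true class, columns = predicted)
cm = np.array([[ 5,  3,  4,  0],
               [ 4, 11,  0,  0],
               [ 5,  0, 15,  0],
               [ 2,  0,  0, 22]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0).astype(float)  # per predicted column
recall = true_positives / cm.sum(axis=1).astype(float)     # per true row
accuracy = true_positives.sum() / float(cm.sum())

for name, p, r in zip(['crocus', 'daisy', 'pansy', 'sunflower'], precision, recall):
    print('{:>10s}  precision={:.2f}  recall={:.2f}'.format(name, p, r))
print('accuracy = {:.2f}%'.format(accuracy * 100))
```

For crocus, only 5 of the 16 images predicted as crocus are correct (precision 0.31) and only 5 of the 12 true crocus images are found (recall 0.42), so most of the errors involve this class.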




# PART 2 : Classify the Raw Flower Images with Convolutional Neural Network




#### Now we will see how a Convolutional Neural Network takes the raw flower images as input and classifies them.

#### Note that we are not explicitly converting the images into hand-engineered numeric vectors.
#### The ConvNet is given the **raw pixel values as input**.

#### The ConvNet learns features from these flower images automatically and uses them to distinguish one type of flower from another.


## Build a ConvNet Model

In [0]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten

def create_CNN_Model(input_shape, numClasses):
    model = Sequential()
    # Block 1: two 3x3 convolutions (16 filters), 2x2 max pooling, dropout
    model.add(Conv2D(16, (3, 3), padding='same', activation='relu', input_shape = input_shape))
    model.add(Conv2D(16, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # Block 2: 32 filters
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # Block 3: 64 filters
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # Block 4: 128 filters
    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # Classifier head: flatten, one hidden dense layer, softmax over the classes
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(numClasses, activation='softmax'))

    return model
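To see how the image shrinks as it passes through the network, the sketch below traces the feature-map shape through the four convolution blocks. It assumes 3x3 convolutions with stride 1 and 'same' padding (so they keep height and width) followed by 2x2 max pooling, which is consistent with the output shapes reported in the model summary. The helper `trace_feature_map` is hypothetical.

```python
def trace_feature_map(height, width, filters_per_block=(16, 32, 64, 128)):
    """Return (height, width, channels) after each block and the flattened size."""
    shapes = []
    for filters in filters_per_block:
        # 'same'-padded stride-1 convolutions keep height and width;
        # the 2x2 max pooling that follows halves both.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    flat = height * width * filters_per_block[-1]
    return shapes, flat

shapes, flat = trace_feature_map(256, 256)
print(shapes)   # [(128, 128, 16), (64, 64, 32), (32, 32, 64), (16, 16, 128)]
print(flat)     # 32768
```

Under these assumptions the `Flatten` layer hands 32,768 values to the 512-unit dense layer, so most of the model's weights sit in that one connection.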

In [0]:
import pickle


data = pickle.load(open('flower-images-256by256.pkl','rb'))

#target = pickle.load(open('flower-labels.pkl','rb'))

# Note: the binary masks are no longer used; the ConvNet sees the full images

### Split the Data and Convert Samples to Float

#### One-Hot-Encode the Output Labels

In [0]:
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical

# Split data into training and test set
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3)


# Convert to float and normalize inputs from 0-255 to 0.0-1.0
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0


# One-hot encode the output labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)


# Count the number of distinct classes available
num_classes = y_test.shape[1]

print('\n Number of distinct classes = {}'.format(num_classes))

# Print the class labels after one-hot-encoded transformation 
#print('\n y_train = {}'.format(y_train))


 Number of distinct classes = 4
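For readers unfamiliar with one-hot encoding, the numpy sketch below shows what the transformation does to a handful of integer labels. It is an illustration only, not the Keras implementation; `one_hot` is a hypothetical helper and the labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([0, 2, 3, 1], num_classes=4)
print(encoded)
```

Passing the number of classes explicitly, as done here, keeps the encoded width at 4 even if a particular split happens to contain no sample of the highest-numbered class.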


## Parameter Setting 

In [0]:
# Shape of each input image: height, width, channels
input_shape = (256, 256, 3)

# Training hyperparameters
batch_size = 16
epochs = 50
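These two settings determine how many weight updates the network performs. The sketch below works this out, assuming 163 training images (70% of 234, with the test count rounded up to 71) and a final partial batch; `batches_per_epoch` is a hypothetical helper.

```python
import math

def batches_per_epoch(n_train, batch_size):
    """Number of weight updates in one pass over the training set."""
    return int(math.ceil(n_train / float(batch_size)))

n_train = 163          # training samples after the 70/30 split of 234 images
steps = batches_per_epoch(n_train, batch_size=16)
print(steps)           # 11 (ten full batches of 16 plus one batch of 3)
print(steps * 50)      # 550 updates over 50 epochs
```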

### Create the CNN Model

In [0]:

cnn_model = create_CNN_Model(input_shape, num_classes)
cnn_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
cnn_model.summary()

  


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_39 (Conv2D)           (None, 256, 256, 16)      448       
_________________________________________________________________
conv2d_40 (Conv2D)           (None, 256, 256, 16)      2320      
_________________________________________________________________
max_pooling2d_20 (MaxPooling (None, 128, 128, 16)      0         
_________________________________________________________________
dropout_25 (Dropout)         (None, 128, 128, 16)      0         
_________________________________________________________________
conv2d_41 (Conv2D)           (None, 128, 128, 32)      4640      
_________________________________________________________________
conv2d_42 (Conv2D)           (None, 128, 128, 32)      9248      
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 64, 64, 32)        0         
__________
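The parameter counts in the summary can be checked by hand: a convolution layer has one weight per kernel position per input channel per filter, plus one bias per filter. The sketch below applies that formula to the four `Conv2D` layers listed above, assuming 3x3 kernels; `conv2d_params` is a hypothetical helper.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (input channels, filters) for the four Conv2D layers listed in the summary
layers = [(3, 16), (16, 16), (16, 32), (32, 32)]
counts = [conv2d_params(3, 3, c_in, f) for c_in, f in layers]
print(counts)   # [448, 2320, 4640, 9248]
```

The results match the `Param #` column, which confirms that the layers use 3x3 kernels.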

## Train the Model on Training Data

In [0]:
history = cnn_model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(X_test, y_test))

Train on 163 samples, validate on 71 samples
Epoch 1/50
...
Epoch 50/50


## Evaluate the CNN on Test Data

In [0]:
loss, accuracy = cnn_model.evaluate(X_test, y_test, verbose=0)

print("\n Loss after training for {} epochs = {}".format(epochs, loss))

print("\n Accuracy on validation set = {}".format(accuracy*100))


 Loss after training for 50 epochs = 2.21320889869

 Accuracy on validation set = 73.2394367037
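A loss above 2 alongside roughly 73% accuracy can look contradictory. One possible explanation, offered here as an assumption rather than a diagnosis of this particular model, is that categorical cross-entropy penalizes confident mistakes very heavily. The sketch below uses made-up predictions (not outputs of the trained network) to show the effect; `categorical_crossentropy` is a plain numpy re-implementation written for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -log(probability assigned to the true class) over the samples."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Made-up predictions for 4 samples whose true class is 0
y_true = np.array([[1, 0, 0, 0]] * 4, dtype=float)
y_pred = np.array([[0.97, 0.01, 0.01, 0.01],          # confident and correct
                   [0.97, 0.01, 0.01, 0.01],          # confident and correct
                   [0.97, 0.01, 0.01, 0.01],          # confident and correct
                   [0.0001, 0.9997, 0.0001, 0.0001]]) # confident and wrong

loss = categorical_crossentropy(y_true, y_pred)
accuracy = float(np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)))
print(loss, accuracy)
```

In this toy case three of four predictions are correct (75% accuracy), yet the single confident error pushes the mean loss above 2. Comparing training and validation loss per epoch from `history` would show whether something similar is happening here.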
