
## Classification of Flower Images using Image Processing and Machine Learning Techniques



The benchmark dataset used for this experiment is available at the following link:

Dataset (image source): http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html

After downloading the data, unzip the archive.

This creates a folder named **dataset** containing two subfolders: **images** and **masks**.

The **images** folder contains images of four categories of flowers: crocus, daisy, pansy and sunflower.
There are 234 images in total.

The **masks** folder contains the binary mask images corresponding to the flower images in the **images** folder.

The binary masks can be used to suppress the background regions of the original images, isolating the regions that contain the actual flowers.
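
As a minimal illustration of background suppression, the sketch below zeroes every pixel where the mask is 0. It assumes the mask has the same height and width as the image and that any non-zero value marks the flower. `apply_mask` is a hypothetical helper and the arrays are synthetic; for 8-bit images, `cv2.bitwise_and(image, image, mask=mask)` is the usual OpenCV way to get the same masked image. The notebook itself does not mask the pixels directly: it passes the mask to `cv2.calcHist`, which skips background pixels when counting.

```python
import numpy as np

def apply_mask(image, mask):
    """Hypothetical helper: zero out background pixels of an HxWx3 image.

    Any non-zero value in the HxW mask is treated as foreground (flower).
    """
    foreground = (mask != 0)
    # Broadcast the HxW mask over the 3 color channels
    return image * foreground[:, :, np.newaxis].astype(image.dtype)

# Tiny synthetic example (not data from the flower dataset)
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255          # a 2x2 "flower" region in the centre

masked = apply_mask(image, mask)
print(masked[:, :, 0])
```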



## Data Preprocessing for Ease of Use

All images in the **images** folder have been resized to 256x256 RGB images and stored in a pickle file named **flower-images-256by256.pkl**.
This pickle file contains a NumPy array of shape (234, 256, 256, 3): 234 images, each 256x256 pixels with 3 color channels (RGB).

A second NumPy array holds the corresponding binary masks and is stored in a file named **flower-masks-256by256.pkl**.

The binary masks are used to suppress the background of the flower images before color histograms are extracted from them.


A third pickle file, **flower-labels.pkl**, contains the label (target class) of each flower as a string: crocus, daisy, pansy or sunflower. These labels are converted to numeric codes later in the notebook.
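
The notebook ships with the three pickle files and does not show the script that created them. As a hedged sketch of how such files could be built, the code below resizes each image and mask to 256x256, stacks them into arrays of shape (N, 256, 256, 3) and (N, 256, 256), and pickles them together with the string labels. Assumptions: `resize_nearest` is a simple nearest-neighbour stand-in for a real resizing routine such as `cv2.resize`, `save_dataset` is a hypothetical helper, and the inputs are synthetic arrays rather than files read from the **dataset** folder.

```python
import os
import pickle
import tempfile

import numpy as np

def resize_nearest(img, size=256):
    """Nearest-neighbour resize to size x size (stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = np.linspace(0, h - 1, size).round().astype(int)
    cols = np.linspace(0, w - 1, size).round().astype(int)
    return img[rows][:, cols]

def save_dataset(images, masks, labels, out_dir):
    """Hypothetical helper: resize, stack and pickle images, masks and labels."""
    X = np.stack([resize_nearest(im) for im in images]).astype(np.uint8)
    M = np.stack([resize_nearest(m) for m in masks]).astype(np.uint8)
    files = {
        'flower-images-256by256.pkl': X,    # shape (N, 256, 256, 3)
        'flower-masks-256by256.pkl': M,     # shape (N, 256, 256)
        'flower-labels.pkl': list(labels),  # class names as strings
    }
    for name, obj in files.items():
        with open(os.path.join(out_dir, name), 'wb') as f:
            pickle.dump(obj, f)
    return X.shape, M.shape

# Synthetic inputs of different sizes (not real flower images)
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (300, 400, 3)), rng.integers(0, 256, (180, 240, 3))]
masks = [rng.integers(0, 2, (300, 400)) * 255, rng.integers(0, 2, (180, 240)) * 255]
labels = ['daisy', 'crocus']

with tempfile.TemporaryDirectory() as out_dir:
    x_shape, m_shape = save_dataset(images, masks, labels, out_dir)
    with open(os.path.join(out_dir, 'flower-labels.pkl'), 'rb') as f:
        reloaded_labels = pickle.load(f)

print(x_shape, m_shape, reloaded_labels)
```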

### Make sure all three pickle files reside in the current folder before running the rest of the code.








## Load Data From Disk Files

#### Images of flowers stacked into one large NumPy array (integer pixel intensity values)
#### All images are resized to 256x256 with 3 channels for the RGB planes
#### There are 234 images in total
#### "flower-images-256by256.pkl" contains one large NumPy array of shape 234x256x256x3

#### "flower-labels.pkl" contains the 234 class labels of the flowers (as strings)

#### There are 4 categories of flowers, encoded below as the integers 0, 1, 2 and 3

#### The four categories are crocus, daisy, pansy and sunflower

#### >> 0 - crocus, 1 - daisy, 2 - pansy, 3 - sunflower





## Read all the files

In [None]:
import pickle

# Original flower images: 234 images, each 256x256x3
with open('flower-images-256by256.pkl', 'rb') as f:
    flower_images = pickle.load(f)

# Binary masks: 234 masks, each 256x256
with open('flower-masks-256by256.pkl', 'rb') as f:
    flower_masks = pickle.load(f)

# Class names of the 234 images (crocus, daisy, pansy, sunflower);
# they are encoded later as 0 - crocus, 1 - daisy, 2 - pansy, 3 - sunflower
with open('flower-labels.pkl', 'rb') as f:
    target = pickle.load(f)

print('\n Loaded the files......')


 Loaded the files......


In [None]:
print(type(flower_images))

<type 'numpy.ndarray'>


In [None]:
size=len(flower_images)
print(size)


234


### Import Dependencies
Import necessary libraries and dependencies.

In [None]:
## Classification of Flower images into different classes

# import the necessary packages
from sklearn.preprocessing import LabelEncoder

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop
from keras.utils import plot_model

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

import numpy as np
import cv2

import matplotlib.pyplot as plt

### Feature Extraction from Images

The machine learning model used here requires numeric features as input, so we need to extract features from the images.
Here, we extract a 3D RGB color histogram from each color image, counting only the flower pixels selected by its binary mask. With 8 bins per channel, the flattened histogram is a 512-dimensional feature vector that represents the image.

Let's write a small helper class for this.

In [None]:
# Create RGB color histogram feature vectors
#------------------------------------------------------------------------------

class RGBHistogram:
    def __init__(self, bins):
        # store the number of bins per channel the histogram will use
        self.bins = bins

    def describe(self, image, mask=None):
        # compute a 3D histogram in the RGB colorspace (only pixels where
        # the mask is non-zero are counted), then normalize the histogram
        # (cv2.normalize defaults to unit L2 norm) so that images with the
        # same content, but either scaled larger or smaller, will have
        # (roughly) the same histogram
        hist = cv2.calcHist([image], [0, 1, 2], mask, self.bins,
                            [0, 256, 0, 256, 0, 256])
        cv2.normalize(hist, hist)

        # return our 3D histogram as a flattened array
        return hist.flatten()

#------------------------------------------------------------------------------
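
To make explicit what `RGBHistogram.describe` computes, here is a NumPy-only sketch of the same feature. It assumes the behaviour of the OpenCV calls above: `cv2.calcHist` with uniform bins over [0, 256) per channel, counting only pixels where the mask is non-zero, followed by `cv2.normalize` with its default L2 norm. `rgb_histogram_np` is a hypothetical name, and its values may differ from OpenCV's in the last float digits. The histogram axes simply follow the channel order stored in the array.

```python
import numpy as np

def rgb_histogram_np(image, mask=None, bins=(8, 8, 8)):
    """Hypothetical NumPy version of RGBHistogram.describe.

    Uniform bins over [0, 256) per channel, only pixels with mask != 0
    are counted, and the result is scaled to unit L2 norm and flattened.
    """
    pixels = image.reshape(-1, 3).astype(np.float64)
    if mask is not None:
        pixels = pixels[mask.reshape(-1) != 0]
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    norm = np.linalg.norm(hist)
    if norm > 0:
        hist = hist / norm
    return hist.astype(np.float32).flatten()

# Synthetic 2x2 image: two red pixels, one green pixel, one black pixel
image = np.array([[[255, 0, 0], [255, 0, 0]],
                  [[0, 255, 0], [0, 0, 0]]], dtype=np.uint8)
mask = np.array([[1, 1], [1, 0]], dtype=np.uint8)   # ignore the black pixel

features = rgb_histogram_np(image, mask)
print(features.shape)
```

With 8 bins per channel the flattened vector has 8 x 8 x 8 = 512 entries, which is why the feature matrix is reshaped to 512 columns later on.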

### Extract Features

Let us compute the masked RGB histogram feature vector of every image.

In [None]:
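# Initialize the image descriptor: 8 bins per channel -> 8 * 8 * 8 = 512 features
desc = RGBHistogram([8, 8, 8])

data = []

for i in range(size):
    image = np.reshape(flower_images[i], (256, 256, 3))
    mask = np.reshape(flower_masks[i], (256, 256))

    # cv2.calcHist accepts 8-bit, 16-bit or float32 images and requires an
    # 8-bit mask; casting guards against other dtypes (e.g. int64 or bool)
    features = desc.describe(image.astype('uint8'), mask.astype('uint8'))
    data.append(features)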

### Data Transformation

The class labels are stored as strings (categorical data), so they must be converted to numeric values before training.
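
`LabelEncoder` assigns integer codes in the sorted order of the class names, which is why the mapping is 0 - crocus, 1 - daisy, 2 - pansy, 3 - sunflower. The same codes can be reproduced with NumPy alone; the short label list below is a hypothetical sample, not the real 234 labels.

```python
import numpy as np

# Hypothetical sample of string labels (the real list has 234 entries)
labels = ['daisy', 'sunflower', 'crocus', 'pansy', 'daisy', 'crocus']

# np.unique sorts the class names; return_inverse gives each label's index
# into that sorted array, i.e. the same integer codes LabelEncoder assigns
class_names, codes = np.unique(labels, return_inverse=True)

print(class_names)   # ['crocus' 'daisy' 'pansy' 'sunflower']
print(codes)         # [1 3 0 2 1 0]
```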

In [None]:
print('\n Target : {}'.format(target)) # class labels for all the images corresponding to the four classes/categories 

# grab the unique target names and encode the labels
targetNames = np.unique(target)   # classes crocus, daisy, pansy, sunflower
print('\n Unique target labels are : {}'.format(targetNames))


le = LabelEncoder()
# convert class names to the integers 0, 1, 2, 3; LabelEncoder assigns codes in
# sorted order: 0 - crocus, 1 - daisy, 2 - pansy, 3 - sunflower
target = le.fit_transform(target)
#print('\n Target : {}'.format(target)) 



 Target : ['daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'daisy', 'crocus', 'daisy', 'daisy', 'crocus', 'daisy', 'crocus', 'crocus', 'daisy', 'daisy', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'crocus', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunflower', 'sunf

In [None]:
from keras.utils import to_categorical
#ohot_target = to_categorical(target, num_classes=4)

### Train and Test Split

In [None]:
data = np.array(data)

In [None]:
# Construct the training and testing splits:
# keep 70% for training and 30% for testing, preserving the class proportions
# (stratify expects the array of labels, not a boolean)
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.3, stratify=target, random_state=42)
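
`stratify` takes the array of class labels and asks `train_test_split` to keep each class's share roughly the same in both splits. The hypothetical helper below illustrates that idea with a per-class shuffle-and-cut; it is not scikit-learn's algorithm and will not reproduce the exact indices of the call above, and the label array is synthetic.

```python
import numpy as np

def stratified_split_indices(labels, test_size=0.3, seed=42):
    """Hypothetical helper: split indices so each class keeps ~the same share.

    Illustrates the guarantee behind stratify=...; it is not scikit-learn's
    implementation and will not reproduce its exact split.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Synthetic imbalanced labels: 60 samples of class 0, 20 of class 1
y = np.array([0] * 60 + [1] * 20)
train_idx, test_idx = stratified_split_indices(y)
print(np.bincount(y[train_idx]), np.bincount(y[test_idx]))  # [42 14] [18 6]
```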

In [None]:
train_length = len(X_train)
test_length = len(X_test)

In [None]:
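# Each feature vector has 8 * 8 * 8 = 512 histogram bins
X_train = X_train.reshape(train_length, 512)
X_test = X_test.reshape(test_length, 512)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Note: the histograms are already L2-normalized by cv2.normalize (values in
# [0, 1]), so this division is a further rescaling, not a pixel-range fix
X_train /= 255
X_test /= 255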

print(train_length, 'train samples')
print(test_length, 'test samples')

# Convert numeric class labels to one-hot-encoded vectors
y_train = to_categorical(y_train, num_classes=4)
y_test = to_categorical(y_test, num_classes=4)
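
For reference, one-hot encoding amounts to picking rows of an identity matrix. A minimal NumPy sketch, assuming integer labels in 0-3 as produced above; the labels are made up. `keras.utils.to_categorical(labels, num_classes=4)` returns the same float32 matrix.

```python
import numpy as np

# Hypothetical integer labels for five samples
labels = np.array([0, 3, 1, 2, 3])

# Row i of the 4x4 identity matrix is the one-hot vector for class i
one_hot = np.eye(4, dtype='float32')[labels]

# np.argmax recovers the integer labels, as done in the evaluation cell
recovered = np.argmax(one_hot, axis=1)
print(one_hot)
```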

In [None]:
X_train.shape

In [None]:
X_train[0,:].shape

In [None]:
X_test.shape

In [None]:
y_train.shape

In [None]:
y_test.shape

## Build and Compile the MLP Model

In [None]:
num_classes = 4

model = Sequential()
model.add(Dense(1024, activation='sigmoid', input_shape=(512,)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
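
The size of this network can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The short calculation below applies that rule to the layer widths used above; the per-layer counts should match the "Param #" column of `model.summary()`. At roughly 1.1 million parameters, the model is large compared with the roughly 160 training images (70% of 234).

```python
# Layer widths of the MLP above: 512 inputs -> 1024 -> 512 -> 128 -> 4 outputs
widths = [512, 1024, 512, 128, 4]

# A Dense layer has n_in * n_out weights plus n_out biases
params = [n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:])]

print(params)        # [525312, 524800, 65664, 516]
print(sum(params))   # 1116292
```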

## Train the MLP, Using the Test Split as the Validation Set

In [None]:
history = model.fit(X_train, y_train, batch_size=16, epochs=100, validation_data=(X_test, y_test))
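
A quick arithmetic check of how much training this call performs, assuming scikit-learn's rounding rule for a float `test_size` (the test-set size is rounded up and the remainder goes to training) and that Keras runs a final, smaller batch when the training set is not a multiple of `batch_size`:

```python
import math

n_samples = 234
test_size = 0.3
batch_size = 16
epochs = 100

# train_test_split rounds the test fraction up and gives the rest to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# model.fit processes ceil(n_train / batch_size) mini-batches per epoch
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, n_test, steps_per_epoch, total_updates)  # 163 71 11 1100
```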

### Plot the History of Accuracy and Loss
Plot the training history of the model: how the accuracy and the loss function change from one epoch to the next, first on the training data and then on the validation data.

In [None]:
# Plot Loss/Accuracy Profile of the model on Training Data
plt.figure(figsize=(12,10))
plt.subplot(211)
plt.title('Loss-Accuracy on Training Data')
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['accuracy'], label='accuracy')
plt.legend()

In [None]:
# Plot Loss/Accuracy Profile of the model on Test/Validation Data
plt.figure(figsize=(12,10))
plt.subplot(211)
plt.title('Loss-Accuracy on Validation Data')
plt.plot(history.history['val_loss'], label='loss')
plt.plot(history.history['val_accuracy'], label='accuracy')
plt.legend()
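
Besides plotting, the `history.history` dictionary can be queried directly: it holds one list per metric with one entry per epoch. The sketch below finds the epoch with the highest validation accuracy; `summarize_history` is a hypothetical helper and the numbers are made up, not results from this notebook. It assumes the key names used in the plots above (older Keras releases use `'acc'` and `'val_acc'`). Because `validation_data` here is the test split, choosing an epoch this way would make the reported test accuracy optimistic; it is only a way to read the curves.

```python
def summarize_history(history_dict):
    """Hypothetical helper: return (best_epoch, best_val_accuracy, final_val_loss).

    history_dict is shaped like history.history: one list per metric,
    one entry per epoch. Epochs are numbered from 1.
    """
    val_acc = history_dict['val_accuracy']
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return best_index + 1, val_acc[best_index], history_dict['val_loss'][-1]

# Made-up 4-epoch history (not results from this notebook)
fake_history = {
    'loss': [1.38, 1.20, 1.05, 0.98],
    'accuracy': [0.30, 0.45, 0.55, 0.60],
    'val_loss': [1.37, 1.25, 1.15, 1.18],
    'val_accuracy': [0.28, 0.42, 0.52, 0.50],
}
print(summarize_history(fake_history))   # (3, 0.52, 1.18)
```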

### Model Evaluation

The confusion matrix and the classification report show that the model reaches about 70% accuracy on the test set.

In [None]:
# Evaluate the classifier
pred = model.predict(X_test)
pred = np.argmax(pred, axis=1)
test_labels = np.argmax(y_test, axis=1)

print('\n Confusion Matrix : \n\n')
print(confusion_matrix(test_labels, pred))

print('\n Classification Report : \n\n')
print(classification_report(test_labels,pred, target_names = targetNames))
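
The overall accuracy can be read straight off the confusion matrix: in scikit-learn's convention rows are true classes and columns are predicted classes, so correct predictions lie on the diagonal. The sketch below computes accuracy and per-class recall (the "recall" column of the classification report) from a matrix; the counts are made up for illustration and are not this notebook's results.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Overall accuracy = correctly classified (diagonal) / all samples."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()

def per_class_recall(cm):
    """Recall of class i = cm[i, i] / number of true samples of class i."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Made-up 4x4 confusion matrix (rows = true class, columns = predicted class)
cm = [[5, 1, 0, 0],
      [0, 4, 1, 0],
      [1, 0, 3, 1],
      [0, 0, 1, 5]]

print(round(float(accuracy_from_confusion(cm)), 3))   # 0.773
print(per_class_recall(cm))
```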