Assignment: Flowers Recognition
Dataset Description:

This dataset contains **4242** images of flowers.
The images were collected from Flickr, Google Images, and Yandex Images.
You can use this dataset to recognize plants from a photo.

Attribute Information:<br>
The pictures are divided into **five classes**: chamomile, tulip, rose, sunflower, dandelion.<br>
For each class there are about **800 photos**. Photos are not high resolution, about **320x240 pixels**.<br>
<b>Also explore how to resize images in tensorflow and then resize all the images to a same size.</b><br>
This is a Multiclass Classification Problem.

WORKFLOW : <br>
Load Data <br>
Split into train and test sets in a 60:40 ratio.<br>
Encode labels.<br>
Create Model<br>
Compilation Step (Note: it's a multiclass classification problem, so select the loss and metrics accordingly)<br>
Train the Model.<br>
If the model overfits, tune it by changing the units, the number of layers, or the epochs, or by adding a dropout layer or a regularizer as needed.<br>
Prediction accuracy should be > 85%<br>
Evaluation Step<br>
Prediction<br>
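
The compilation step above asks for a loss suited to multiclass classification. As a rough illustration (a NumPy sketch, not the Keras implementation), categorical cross-entropy for a one-hot label is the negative log of the probability assigned to the true class. The helper name `categorical_crossentropy_np` and the example probabilities are made up for this sketch.

```python
import numpy as np

def categorical_crossentropy_np(one_hot_labels, predicted_probs, eps=1e-7):
    """Mean categorical cross-entropy over a batch (illustrative helper)."""
    probs = np.clip(predicted_probs, eps, 1.0)            # avoid log(0)
    per_sample = -np.sum(one_hot_labels * np.log(probs), axis=1)
    return per_sample.mean()

# One sample whose true class is index 2 (made-up example values).
true_label = np.array([[0, 0, 1, 0, 0]], dtype='float32')
uniform_guess = np.full((1, 5), 0.2, dtype='float32')
confident_guess = np.array([[0.02, 0.02, 0.9, 0.03, 0.03]], dtype='float32')

loss_uniform = categorical_crossentropy_np(true_label, uniform_guess)
loss_confident = categorical_crossentropy_np(true_label, confident_guess)
```

A uniform guess over five classes gives a loss of ln 5, roughly 1.61, which is a useful reference point: a trained model should end up well below it.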


## **Loading Image Data**

In [None]:
!unzip '/content/drive/MyDrive/flowers archive.zip'   # output collapsed: about 4000 lines of unzip log

In [4]:
import os
os.path.join('/content/flowers', 'flowers')   # given 2 strings, joins them into a single path using '/'

'/content/flowers/flowers'

### Making a Function for Loading, Resizing, Preprocessing, Normalizing Image Data 

In [5]:
import os
from os import listdir

import numpy as np
from PIL import Image, ImageOps


def create_dataset_PIL(imgfolder):
    """Load every .jpg under imgfolder/<class>/ as a flattened, normalized grayscale vector."""
    img_data_array = []
    class_name = []

    for folder in listdir(imgfolder):
        for filename in listdir(os.path.join(imgfolder, folder)):
            if '.jpg' in filename:

                # load the image, resize it to 320x240 (width x height) and convert to grayscale:
                img = Image.open(os.path.join(imgfolder, folder, filename)).resize((320, 240))
                img = ImageOps.grayscale(img)
                # alternate way via matplotlib: matplotlib.image.imread(os.path.join(imgfolder, folder, filename))

                # flatten to a 1-D float32 vector and scale pixel values to [0, 1]:
                img = np.array(img)
                img = img.reshape((320*240,))
                img = img.astype('float32')
                img /= 255

                # append data + label (the folder name) to their respective lists:
                img_data_array.append(img)
                class_name.append(folder)

    return img_data_array, class_name

### **Loading Data**

In [6]:
image_data , labels = create_dataset_PIL('/content/flowers')

## **Converting Labels to Numbers for Encoding**

In [7]:
dicti = {label: index for index, label in enumerate(np.unique(labels))}   # class name -> integer index

In [8]:
target_labels = [dicti[label] for label in labels]
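
As a side note, the same name-to-index mapping can be obtained in one call. This is a minimal sketch on a made-up list of labels (using the class names from the dataset description), not the notebook's actual data: `np.unique` with `return_inverse=True` returns both the sorted class names and the integer index of each label.

```python
import numpy as np

# Made-up sample of string labels, one per image.
sample_labels = ['rose', 'tulip', 'chamomile', 'rose', 'sunflower', 'dandelion']

# class_names is sorted alphabetically; sample_targets[i] indexes into class_names.
class_names, sample_targets = np.unique(sample_labels, return_inverse=True)
```

Keeping `class_names` around makes it easy to translate a predicted index back into a flower name later.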

In [9]:
len(target_labels)

4323

In [10]:
len(image_data)

4323

In [11]:
image_data[0].shape

(76800,)
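
Each image is now a vector of 320 × 240 = 76800 float32 values. A quick back-of-the-envelope calculation, using the 4323 images counted above, shows how much memory the whole dataset needs once it is converted to a single float32 array:

```python
num_images = 4323                  # from len(image_data) above
pixels_per_image = 320 * 240       # flattened grayscale image
bytes_per_float32 = 4

total_bytes = num_images * pixels_per_image * bytes_per_float32
total_gib = total_bytes / 2**30    # a bit over 1.2 GiB
```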

## **Shuffling Dataset**

In [12]:
import random

c = list(zip(image_data, target_labels))

random.shuffle(c)

image_data_shuffled , labels_shuffled = zip(*c)

image_data_shuffled = list(image_data_shuffled)
labels_shuffled = list(labels_shuffled)

In [13]:
labels = labels_shuffled
data = image_data_shuffled
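
An alternative worth knowing, sketched here on tiny made-up arrays rather than the real data: shuffling with a seeded index permutation keeps each image paired with its label and makes the shuffle repeatable between runs. The seed value 42 is arbitrary.

```python
import numpy as np

# 6 fake "images" of 2 pixels each, with one label per image.
fake_data = np.arange(12, dtype='float32').reshape(6, 2)
fake_labels = np.array([0, 1, 2, 3, 4, 0])

rng = np.random.default_rng(seed=42)        # fixed seed: same order on every run
order = rng.permutation(len(fake_labels))   # shuffled row indices

# Indexing both arrays with the same order keeps rows and labels aligned.
fake_data_shuffled = fake_data[order]
fake_labels_shuffled = fake_labels[order]
```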

## **Splitting Data**

In [14]:
train_len = len(labels)*6//10
test_len = len(labels) - train_len

In [15]:
train_data = data[0 : train_len].copy()
test_data = data[train_len : train_len + test_len].copy()

train_labels = labels[0 : train_len].copy()
test_labels = labels[train_len : train_len + test_len].copy()

## **One-Hot Encoding of Labels**

In [16]:
from keras.utils import to_categorical   # the keras.utils.np_utils path is not available in newer Keras releases

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
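
To make clear what one-hot encoding produces, here is a small NumPy sketch of the same idea on made-up integer labels (it illustrates the encoding; it is not how `to_categorical` is implemented): each label selects one row of a 5 × 5 identity matrix.

```python
import numpy as np

int_labels = np.array([0, 3, 1, 4])     # made-up integer class labels
num_classes = 5

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[int_labels]

# argmax along each row recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
```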

In [17]:
train_labels.dtype

dtype('float32')

In [18]:
train_labels.shape

(2593, 5)

In [19]:
type(train_data)

list

## **Additional Preprocessing**

In [20]:
# converting lists to numpy arrays
train_data = np.array(train_data, dtype = 'float32')

In [21]:
test_data = np.array(test_data, dtype = 'float32')

In [22]:
train_data.shape , test_data.shape

((2593, 76800), (1730, 76800))

## **Splitting Training Data to obtain Validation Data**

In [23]:
# split train_data 70:30 into a partial training set and a validation set

len_partial_train = len(train_data)*7//10
len_validation = len(train_data) - len_partial_train

In [24]:
partial_train_data = train_data[0 : len_partial_train]
val_data = train_data[len_partial_train : ]

In [26]:
partial_train_labels = train_labels[0 : len_partial_train]
val_labels = train_labels[len_partial_train : ]

In [27]:
type(partial_train_labels)

numpy.ndarray

In [28]:
type(val_data)

numpy.ndarray

In [29]:
len_validation

778

In [30]:
partial_train_data.shape

(1815, 76800)

In [31]:
val_data.shape

(778, 76800)

In [32]:
val_labels.shape

(778, 5)

In [33]:
partial_train_labels.shape

(1815, 5)

## **Building Model & Compiling**

In [34]:
from keras import models
from keras import layers
from keras import regularizers

modelF = models.Sequential()

modelF.add(layers.Dense(1024, activation = 'relu', kernel_regularizer = regularizers.l2(0.001), input_shape = (320*240,))) 
modelF.add(layers.Dropout(0.2))
modelF.add(layers.Dense(256, activation = 'relu', kernel_regularizer = regularizers.l2(0.001) ))
modelF.add(layers.Dropout(0.2))
modelF.add(layers.Dense(32, activation = 'relu', kernel_regularizer = regularizers.l2(0.001)))
modelF.add(layers.Dropout(0.2))
modelF.add(layers.Dense(5, activation = 'softmax'))

modelF.compile(optimizer = 'rmsprop',
               loss = 'categorical_crossentropy',
               metrics = ['accuracy'])
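
For a sense of scale, the number of trainable parameters in this stack of Dense layers can be worked out by hand: each layer has `inputs * units` weights plus `units` biases (Dropout layers add none). A short sketch of that arithmetic:

```python
# Input size followed by the units of each Dense layer in the model above.
layer_sizes = [320 * 240, 1024, 256, 32, 5]

# weights (n_in * n_out) + biases (n_out) for each Dense layer
params_per_layer = [n_in * n_out + n_out
                    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(params_per_layer)
```

Almost all of the roughly 79 million parameters sit in the first layer, while the partial training set has only 1815 images, which is worth keeping in mind when judging overfitting.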

## **Training Model**

In [35]:
historyF = modelF.fit(partial_train_data, 
                      partial_train_labels,
                      epochs = 10,
                      batch_size = 32,
                      validation_data = (val_data, val_labels)
                      )

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## **Evaluating Model on Test Data**

In [36]:
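# Retraining a fresh model with the same architecture on the full training set
# (partial training + validation data) before evaluating on the test data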

model = models.Sequential()

model.add(layers.Dense(1024, activation = 'relu', kernel_regularizer = regularizers.l2(0.001), input_shape = (320*240,))) 
model.add(layers.Dropout(0.2))
model.add(layers.Dense(256, activation = 'relu', kernel_regularizer = regularizers.l2(0.001) ))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(32, activation = 'relu', kernel_regularizer = regularizers.l2(0.001)))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(5, activation = 'softmax'))

model.compile(optimizer = 'rmsprop',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_data, train_labels, epochs = 3, batch_size = 32 , verbose = 0)

model.evaluate(test_data, test_labels)



[1.7071561813354492, 0.2358381450176239]

In [37]:
model.predict(test_data)

array([[0.18857582, 0.22212087, 0.18938379, 0.18693916, 0.21298034],
       [0.18857582, 0.22212087, 0.18938379, 0.18693916, 0.21298034],
       [0.18857582, 0.22212087, 0.18938379, 0.18693916, 0.21298034],
       ...,
       [0.18857582, 0.22212087, 0.18938379, 0.18693916, 0.21298034],
       [0.18857582, 0.22212087, 0.18938379, 0.18693916, 0.21298034],
       [0.18857582, 0.22212087, 0.18938379, 0.18693916, 0.21298034]],
      dtype=float32)
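
Every visible row of the prediction array above is identical, which suggests the model outputs the same probabilities regardless of the input. A minimal sketch of how to check this, using the printed row repeated a few times as a stand-in for `model.predict(test_data)` (the elided rows are assumed, not known, to match):

```python
import numpy as np

# Stand-in for the prediction array: the row printed above, repeated 6 times.
printed_row = np.array([0.18857582, 0.22212087, 0.18938379, 0.18693916, 0.21298034],
                       dtype='float32')
preds = np.tile(printed_row, (6, 1))

# True if every row equals the first one, i.e. the output ignores the input.
collapsed = bool(np.allclose(preds, preds[0]))

# The class the model would pick for each sample.
predicted_classes = preds.argmax(axis=1)
```

If the real predictions collapse like this, every sample is assigned class 1, and the test accuracy of about 0.236 would simply be the share of that class in the test set, far from the 85% target.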