# 1. Problem Statement

Images are one of the major sources of data in the field of data science and AI. This field is making appropriate use of information that can be gathered through images by examining its features and details. We are trying to give you an exposure of how an end to end project is developed in this field. 
The idea behind this project is to build a deep learning-based Image Classification model on images that will be scraped from e-commerce portal. This is done to make the model more and more robust. 
This task is divided into two phases: Data Collection and Mode Building. 


# 2. Import google Colab

we will start with google colab because there no issue with python libraries their dependencies and also its cloud base environment so we will not need a lot of configuration
here we have opened the google colab file and mounted my google drive for accessing the dataset stored in imageclassification.

In [28]:
from google.colab import drive
drive.mount('/content/drive',force_remount=True)

Mounted at /content/drive


# 3. import libraries for dataset reading and CNN (convolutional neural network) model creation.

In [30]:
import os
import cv2
from PIL import Image
import tensorflow as tf
from keras import backend as K
from keras.models import load_model
from keras.preprocessing.image import img_to_array
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 4. Creating folder in google drive

we have created folder in google drive with name "imageclassification",which contain two folder taining dataset and testing dataset.Each folder having images of saree,jeans and trouser which has been scrapped from Amazon web sites. 
Also have setup the path of training and testing directories.

In [31]:
base_dir = '/content/drive/MyDrive/imageclassification/Dataset1'
train_dir = '/content/drive/MyDrive/imageclassification/Dataset1/traindata'
train_saree_dir = '/content/drive/MyDrive/imageclassification/Dataset1/traindata/saree'
train_jeans_dir = '/content/drive/MyDrive/imageclassification/Dataset1/traindata/jeans'
train_Trouser_dir = '/content/drive/MyDrive/imageclassification/Dataset1/traindata/Trouser'
test_dir = '/content/drive/MyDrive/imageclassification/Dataset1/testdata'
test_saree_dir = '/content/drive/MyDrive/imageclassification/Dataset1/testdata/saree'
test_jeans_dir = '/content/drive/MyDrive/imageclassification/Dataset1/testdata/jeans'
test_trouser_dir = '/content/drive/MyDrive/imageclassification/Dataset1/testdata/trouser'

# 5. Checking the number of images

In this step we are Checking the number of images in each folder for training and testing and storing in the variables

In [34]:
num_saree_train = len(os.listdir(train_saree_dir))
num_jeans_train=len(os.listdir(train_jeans_dir))
num_trouser_train=len(os.listdir(train_Trouser_dir))
num_trouser_test=len(os.listdir(test_trouser_dir))
num_jeans_test=len(os.listdir(test_jeans_dir))
num_saree_test=len(os.listdir(test_saree_dir))

In [41]:
print("Total Training Saree Images",num_saree_train)
print("Total Training jeans Images",num_jeans_train)
print("Total Training trouser Images",num_trouser_train)
print("--")
print("Total Test Saree Images",num_saree_test)
print("Total Test jeans Images",num_jeans_test)
print("Total test trouser Images",num_trouser_test)
print("--")
total_train = num_saree_train+num_jeans_train+num_trouser_train
total_test = num_saree_test+num_jeans_test+num_trouser_test
print("Total Training Images",total_train)
print("--")
print("Total Testing Images",total_test)

Total Training Saree Images 199
Total Training jeans Images 199
Total Training trouser Images 199
--
Total Test Saree Images 90
Total Test jeans Images 90
Total test trouser Images 90
--
Total Training Images 597
--
Total Testing Images 270


# 6. Setting up Images size

here we have set the size(height, width) of images. This step mostly needs when dataset images have different sizes, it will speed up the training process. I used an image shape of (224,224) becuase VGG-16 algorithm accepts only this image size.

In [42]:
IMG_SHAPE  = 224
batch_size = 32

# 7. Preprocessing of Data

In this step we have preporcessed our data (train and test), which includes, rescaling and shuffling.
also using the Image Data Generator to import the images from the dataset

In [44]:
image_gen_train = ImageDataGenerator(rescale = 1./255)
train_data_gen = image_gen_train.flow_from_directory(batch_size = batch_size,
directory = train_dir,
shuffle= True,
target_size = (IMG_SHAPE,IMG_SHAPE),
class_mode = 'categorical')

image_gen_test = ImageDataGenerator(rescale=1./255)
test_data_gen = image_gen_test.flow_from_directory(batch_size=batch_size,
directory=test_dir,
target_size=(IMG_SHAPE, IMG_SHAPE),
class_mode='categorical')

Found 597 images belonging to 3 classes.
Found 270 images belonging to 3 classes.


# 8. Downloading VGG-16

In this step we have downloaded VGG-16 weights, by including the top layer parameter as false.

In [47]:
pre_trained_model = tf.keras.applications.VGG16(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5


# 9. Freezing the training layers

In this step we have freezed the training layers of VGG-16. (because VGG-16, already trained on huge data).

In [49]:
for layer in pre_trained_model.layers:
  print(layer.name)
layer.trainable = False

input_1
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_pool


# 10. Modifying the last layer

In this step we have modified the last layer for our classes as all layers of VGG-16 are frozen.We have added one max polling, one dense layer, one dropout, and one output with the last layer of VGG-16.
Since the problem having multiclass so the last dense layer has choosen with 3 and activation with softmax.

In [50]:
last_layer = pre_trained_model.get_layer('block5_pool')
last_output = last_layer.output
x = tf.keras.layers.GlobalMaxPooling2D()(last_output)
x = tf.keras.layers.Dense(512, activation='relu')(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(3, activation='softmax')(x)

# 11. Merge layers with custom layers

here we have merged the original VGG-16 layers, with our custom layers.

In [51]:
model = tf.keras.Model(pre_trained_model.input, x)

# 12. Compile the model

here we have compiling the model before starting training and since problem is multiclass have choosen categorical_crossentropy

In [52]:
model.compile(optimizer='adam', loss=tf.keras.losses.categorical_crossentropy, metrics=['acc'])

# 13. Checking model summary

In [57]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0     

# 14. Traning the model

I trained the model on five epochs.

In [59]:
vgg_classifier = model.fit(train_data_gen,
steps_per_epoch=(total_train//batch_size),
epochs = 5,
batch_size = batch_size,
verbose = 1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


# 15. Model testing on testdata

In [60]:
result = model.evaluate(test_data_gen,batch_size=batch_size)
print("test_loss, test accuracy",result)

test_loss, test accuracy [2.6072354316711426, 0.10740740597248077]


# 16. Saving the model

In [None]:
model_json = model.to_json()
with open("/content/drive/MyDrive/imageclassification/image_classification.json", "w") as json_file:
json_file.write(model_json)
model.save("/content/drive/MyDrive/imageclassification/image_classification.h5")
print("Saved model to disk")
model.save_weights("/content/drive/MyDrive/imageclassification/image_classification.h5")