

https://www.hackerearth.com/challenges/competitive/hackerearth-deep-learning-challenge-auto-tag-images-gala/

**Problem statement** 

Galas are the biggest party of the year. Hosting firms of these events are well aware that everyone from around the world has their eyes on these nights—be it for inspiration or for critique. It takes months of meticulous planning and delegation to host these events impeccably.

One such firm has decided to take a data-driven approach for planning their gala nights. Aesthetics and entertainment are the most crucial segments of these events. So, this firm has hired you to help them aggregate and classify all images. These images are published by attendees and the paparazzi on various social media channels and other sources. You are required to build an image auto-tagging model to classify these images into separate categories.

**Dataset**

The dataset consists of 5,983 images that belong to 4 categories. These categories are food, attire, decor and signage, and miscellaneous.

Practicing this problem with Machine Learning or Deep Learning techniques has the following benefits:

- It encourages you to apply your Machine Learning skills to build models that classify images into multiple categories.
- It strengthens your knowledge of classification, one of the basic building blocks of Machine Learning and Deep Learning.
- You are required to build a model that auto-tags images and classifies them into various categories of aesthetics and entertainment for a gala night.

**Uploading the ZIP file containing Images**

In [None]:
from google.colab import files
files.upload()

Saving 9d34462453e311ea.zip to 9d34462453e311ea.zip


In [None]:
# Check which TensorFlow version is installed
import tensorflow
print(tensorflow.__version__)


2.2.0-rc2


***TensorFlow and Keras Libraries***

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd 
import os 
import shutil
from pathlib import Path  
import numpy as np 
import matplotlib.pyplot as plt 

**Unzipping the Images**

In [None]:
ls

9d34462453e311ea.zip  sample_data/


In [None]:
!unzip 9d34462453e311ea.zip

Streaming output truncated to the last 5000 lines.
  inflating: dataset/Train Images/image2213.jpg  
  inflating: dataset/Train Images/image739.jpg  
  inflating: dataset/Train Images/image9406.jpg  
  inflating: dataset/Train Images/image9279.jpg  
  inflating: dataset/Train Images/image6600.jpg  
  inflating: dataset/Train Images/image6137.jpg  
  inflating: dataset/Train Images/image6158.jpg  
  inflating: dataset/Train Images/image5581.jpg  
  inflating: dataset/Train Images/image7045.jpg  
  inflating: dataset/Train Images/image3447.jpg  
  inflating: dataset/Train Images/image2005.jpg  
  inflating: dataset/Train Images/image1976.jpg  
  inflating: dataset/Train Images/image1090.jpg  
  inflating: dataset/Train Images/image3729.jpg  
  inflating: dataset/Train Images/image6098.jpg  
  inflating: dataset/Train Images/image4844.jpg  
  inflating: dataset/Train Images/image6681.jpg  
  inflating: dataset/Train Images/image4204.jpg  
  inflating: dataset/Train Images/im

In [None]:
ls

9d34462453e311ea.zip  dataset/  sample_data/


In [None]:
pwd

'/content'

In [None]:
cd dataset


/content/dataset


**Renaming the directories to train and test**

In [None]:
mv 'Train Images' train

In [None]:
mv  'Test Images' test

In [None]:
ls

test/  test.csv  train/  train.csv


**Importing the train and test csv to pandas dataframe**




In [None]:
import pandas as pd
train = pd.read_csv('train.csv')

In [None]:
test = pd.read_csv("test.csv")

In [None]:
train.head(5)

Unnamed: 0,Image,Class
0,image7042.jpg,Food
1,image3327.jpg,misc
2,image10335.jpg,Attire
3,image8019.jpg,Food
4,image2128.jpg,Attire


In [None]:
train['Class'].value_counts()

Food                    2278
Attire                  1691
misc                    1271
Decorationandsignage     743
Name: Class, dtype: int64

***The classes are clearly imbalanced: Food has 2,278 images, while Decorationandsignage has only 743.***
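
To put a number on the imbalance, the counts printed above can be turned into class shares and a largest-to-smallest ratio. This is only a minimal sketch that hard-codes the counts from the `value_counts()` output; the helper name `imbalance_summary` is made up for illustration and is not part of the original notebook.

```python
# Class counts copied from the train['Class'].value_counts() output above.
class_counts = {
    "Food": 2278,
    "Attire": 1691,
    "misc": 1271,
    "Decorationandsignage": 743,
}


def imbalance_summary(counts):
    """Return each class's share of the data and the largest/smallest ratio."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


shares, ratio = imbalance_summary(class_counts)
for name, share in shares.items():
    print(f"{name:<22}{share:6.1%}")
print(f"largest/smallest ratio: {ratio:.2f}")
```

The largest class has roughly three times as many images as the smallest one.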

In [None]:
pwd

'/content/dataset'

**Making separate directory for each class and copying the images into them**

In [None]:
cd train

/content/dataset/train


In [None]:
mkdir Food

In [None]:
mkdir Attire

In [None]:
mkdir misc

In [None]:
mkdir Decorationandsignage

In [None]:
food = train['Image'][train['Class']== 'Food'].values

In [None]:
food.shape

(2278,)

In [None]:
attire = train['Image'][train['Class']== 'Attire'].values

In [None]:
attire.shape

(1691,)

In [None]:
misc = train['Image'][train['Class']== 'misc'].values

In [None]:
misc.shape

(1271,)

In [None]:
ds = train['Image'][train['Class']== 'Decorationandsignage'].values

In [None]:
ds.shape

(743,)

In [None]:
from shutil import copyfile, copy2

In [None]:
pwd

'/content/dataset/train'

In [None]:
cd ../

/content/dataset


In [None]:
pwd

'/content/dataset'

In [None]:
for img in os.listdir('train'):
  if img in food:
    copy2('train/' + img, 'train/Food')

In [None]:
for img in os.listdir('train'):
  if img in attire:
    copy2('train/' + img, 'train/Attire')

In [None]:
for img in os.listdir('train'):
  if img in misc:
    copy2('train/' + img, 'train/misc')

In [None]:
for img in os.listdir('train'):
  if img in ds:
    copy2('train/' + img, 'train/Decorationandsignage')
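
The four loops above each scan the whole `train` directory once per class. As a possible alternative, the same copying could be done in a single pass over the `(Image, Class)` pairs from `train.csv`. The sketch below is an assumption-laden illustration, not the notebook's method: the function name `copy_into_class_dirs` is hypothetical, and it creates the class folders itself instead of relying on the earlier `mkdir` cells.

```python
import shutil
from pathlib import Path


def copy_into_class_dirs(pairs, src_dir):
    """Copy each (filename, class_name) image from src_dir into src_dir/class_name.

    Returns the number of files copied. Files listed in `pairs` but missing
    on disk are skipped.
    """
    src_dir = Path(src_dir)
    copied = 0
    for filename, class_name in pairs:
        source = src_dir / filename
        if not source.is_file():
            continue  # listed in the CSV but not present in the folder
        target_dir = src_dir / class_name
        target_dir.mkdir(exist_ok=True)  # create the class folder on first use
        shutil.copy2(source, target_dir / filename)
        copied += 1
    return copied


# Hypothetical usage with the dataframe loaded earlier:
# copy_into_class_dirs(zip(train['Image'], train['Class']), 'train')
```

Looking each file up by name avoids the repeated membership test against a NumPy array for every directory entry.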

In [None]:
cd train/Food

/content/dataset/train/Food


In [None]:
ls|wc

   2278    2278   31975


In [None]:
cd ../../../


/content


In [None]:
pwd

'/content'

**Creating a directory for each class under the images folder to hold the augmented images**

In [None]:
cd dataset

/content/dataset


In [None]:
mkdir images

In [None]:
cd images

/content/dataset/images


In [None]:
mkdir Food

In [None]:
mkdir Attire

In [None]:
mkdir misc

In [None]:
mkdir Decorationandsignage

In [None]:
pwd

'/content/dataset/images'

In [None]:
cd ../../

/content


In [None]:
pwd

'/content'

In [None]:
cd dataset

/content/dataset


In [None]:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # do not flip images vertically


In [None]:
datagen_set1 = datagen.flow_from_directory(
        './train/',
        target_size=(150, 150),
        batch_size=32,
        classes = ['Decorationandsignage'],
        save_to_dir = './images/Decorationandsignage',
        save_prefix = 'class_ds',
        save_format = 'jpg')

Found 743 images belonging to 1 classes.


In [None]:
for i in range(160):
    image, label = next(datagen_set1)

In [None]:
cd ./images/Decorationandsignage 

/content/dataset/images/Decorationandsignage


In [None]:
ls |wc

   4970    4970  122979
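
The count of 4970 can be reproduced with a little arithmetic, assuming the iterator yields a shorter final batch at the end of every pass over the 743 source images and that every saved file gets a unique name. The helper `images_after_batches` is a hypothetical name used only for this sketch.

```python
def images_after_batches(n_images, batch_size, n_batches):
    """Number of images produced by drawing n_batches from a looping iterator.

    Assumes each pass over the data ends with one shorter batch holding the
    remainder, so a full pass yields exactly n_images images.
    """
    batches_per_pass = -(-n_images // batch_size)  # ceiling division
    full_passes, extra_batches = divmod(n_batches, batches_per_pass)
    # The extra batches are the first ones of a new pass, so they are full-size.
    return full_passes * n_images + extra_batches * batch_size


# 743 source images, batch_size=32, 160 calls to next()
print(images_after_batches(743, 32, 160))
```

With 24 batches per pass, 160 batches cover 6 full passes (6 × 743 = 4458 images) plus 16 full batches (16 × 32 = 512), which gives 4970.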


In [None]:
cd ../../

/content/dataset


In [None]:
pwd

'/content/dataset'

In [None]:
datagen_set2 = datagen.flow_from_directory(
        './train/',
        target_size=(150, 150),
        batch_size=32,
        classes = ['misc'],
        save_to_dir = './images/misc',
        save_prefix = 'class_misc',
        save_format = 'jpg')

Found 1271 images belonging to 1 classes.


In [None]:
for i in range(160):
    image, label = next(datagen_set2)

In [None]:
datagen_set3 = datagen.flow_from_directory(
        './train/',
        target_size=(150, 150),
        batch_size=32,
        classes = ['Attire'],
        save_to_dir = './images/Attire',
        save_prefix = 'class_attire',
        save_format = 'jpg')

Found 1691 images belonging to 1 classes.


In [None]:
for i in range(160):
    image, label = next(datagen_set3)

In [None]:
datagen_set4 = datagen.flow_from_directory(
        './train/',
        target_size=(150, 150),
        batch_size=32,
        classes = ['Food'],
        save_to_dir = './images/Food',
        save_prefix = 'class_Food',
        save_format = 'jpg')

Found 2278 images belonging to 1 classes.


In [None]:
for i in range(160):
    image, label = next(datagen_set4)

In [None]:
cd images/Food

/content/dataset/images/Food


In [None]:
ls|wc

   5068    5068  138876


In [None]:
cd ../../

/content/dataset


In [None]:
pwd

'/content/dataset'

**Declaring the ImageDataGenerator with pixel rescaling and a 20% validation split (augmentation was already applied offline above)**

In [None]:
datagen1=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,validation_split = 0.2)

In [None]:
pwd

'/content/dataset'

**Creating train and validation generator**

In [None]:
#train_iter=datagen1.flow_from_directory(
#directory="./images/",
#class_mode="categorical",
#batch_size=50,
#shuffle = True,seed = 42)

train_iter = datagen1.flow_from_directory(
    "./images/", 
    subset='training',
    class_mode="categorical",
    batch_size=50,
    target_size=(200, 200),
    shuffle = True)



Found 16183 images belonging to 4 classes.


In [None]:
valid_iter=datagen1.flow_from_directory(
"./images/",
subset='validation',
batch_size =50,
class_mode="categorical",
target_size=(200, 200),
shuffle = False)


Found 4044 images belonging to 4 classes.
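
The 16183/4044 split is consistent with Keras taking `int(validation_split * n)` files from each class folder for validation, rather than 20% of the grand total. The sketch below checks that arithmetic. The Food and Decorationandsignage counts come from the `ls | wc` outputs above; the Attire and misc counts (5105 and 5084) are not shown in the notebook and are inferred here, assuming 160 batches of 32 with a shorter final batch per pass. `split_counts` is a hypothetical helper name.

```python
validation_split = 0.2

# Images per class folder under ./images/ (Attire and misc are inferred values).
per_class = {
    "Attire": 5105,
    "Decorationandsignage": 4970,
    "Food": 5068,
    "misc": 5084,
}


def split_counts(counts, split):
    """Split each class separately: int(split * n) files go to validation."""
    valid = {name: int(split * n) for name, n in counts.items()}
    train = {name: n - valid[name] for name, n in counts.items()}
    return train, valid


train_counts, valid_counts = split_counts(per_class, validation_split)
print("train:", sum(train_counts.values()))
print("valid:", sum(valid_counts.values()))
```

Because the split is taken over the augmented files, augmented variants of the same source image can end up on both sides, so the validation accuracy is likely to be optimistic.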


In [None]:
test_datagen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255.)

In [None]:
test_generator = test_datagen.flow_from_dataframe(dataframe=test,
                                                  directory='./test/',
                                                  x_col='Image',
                                                  y_col=None,
                                                  target_size=(200, 200),
                                                  class_mode=None,
                                                  batch_size=1,
                                                  shuffle=False)

Found 3219 validated image filenames.


**Defining Step Size for Train and Validation sets**

In [None]:
STEP_SIZE_TRAIN=train_iter.n//train_iter.batch_size
STEP_SIZE_VALID=valid_iter.n//valid_iter.batch_size
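
Floor division drops the final partial batch, so a few images are left out of each epoch. The sketch below works through the numbers reported above (16183 training and 4044 validation images, batch size 50) and shows the ceiling alternative for comparison; `step_sizes` is a hypothetical helper name.

```python
import math


def step_sizes(n_samples, batch_size):
    """Return (floor steps, ceil steps, samples skipped when using floor)."""
    floor_steps = n_samples // batch_size
    ceil_steps = math.ceil(n_samples / batch_size)
    skipped = n_samples - floor_steps * batch_size
    return floor_steps, ceil_steps, skipped


print(step_sizes(16183, 50))  # training set
print(step_sizes(4044, 50))   # validation set
```

With floor division the notebook runs 323 training steps and 80 validation steps, leaving 33 and 44 images out of each epoch respectively. Using the ceiling would cover every image at the cost of one smaller batch.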


In [None]:
cd ../

/content


**Importing Xception for Transfer Learning**

In [None]:
from tensorflow.keras.applications import Xception
# Xception convolutional base with ImageNet weights (the Keras default), without the classification head
conv_base = Xception(include_top=False, input_shape=(200,200,3))

In [None]:
model = tf.keras.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
#model.add(layers.BatchNormalization())
model.add(layers.Dense(256, activation='relu', 
                kernel_initializer=tf.keras.initializers.he_uniform(seed=None)))
     #           kernel_regularizer=tf.keras.regularizers.l1(0.01)))  
model.add(layers.Dropout(0.5))
model.add(layers.Dense(4, activation='softmax'))
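
To get a feel for the size of the `Flatten` + `Dense(256)` head, the spatial size of the Xception output can be estimated by hand. The sketch below assumes the standard Keras Xception layout (a 3x3 stride-2 convolution and a 3x3 convolution with valid padding, followed by four stride-2 max-pools with same padding, ending in 2048 channels); `model.summary()` remains the authoritative check. The function name is hypothetical.

```python
import math


def xception_feature_map_size(input_size):
    """Estimate the side length of the final Xception feature map."""
    size = (input_size - 3) // 2 + 1  # first conv: 3x3, stride 2, valid padding
    size = size - 2                   # second conv: 3x3, stride 1, valid padding
    for _ in range(4):                # four max-pools: stride 2, same padding
        size = math.ceil(size / 2)
    return size


side = xception_feature_map_size(200)
flat_features = side * side * 2048       # length of the Flatten output
dense_params = flat_features * 256 + 256  # weights plus biases of Dense(256)
print(side, flat_features, dense_params)
```

Under these assumptions, almost all of the parameters added on top of the convolutional base sit in the first dense layer.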

***Compiling the model***

In [None]:
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.2),
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001, rho=0.9, decay=0.001),
              #optimizer = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=False),
              metrics=['acc'])
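
Label smoothing replaces the hard one-hot targets with softer ones before the cross-entropy is computed: each target becomes `y * (1 - label_smoothing) + label_smoothing / num_classes`. The plain-Python sketch below illustrates that formula for the four classes and `label_smoothing=0.2` used here; `smooth_labels` is a hypothetical helper, not a Keras function.

```python
def smooth_labels(one_hot, label_smoothing):
    """Apply label smoothing to a single one-hot target vector."""
    num_classes = len(one_hot)
    return [
        y * (1.0 - label_smoothing) + label_smoothing / num_classes
        for y in one_hot
    ]


# A sample whose true class is index 1, with 4 classes and smoothing 0.2
smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0], 0.2)
print(smoothed)
```

The true class ends up with a target of about 0.85 and each other class with about 0.05, which discourages the network from producing overconfident predictions.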

In [None]:
pwd

'/content'

In [None]:
cd dataset

/content/dataset


**Fitting the model on Train & Validation set**

In [None]:
# Model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
model.fit(train_iter,
          steps_per_epoch=STEP_SIZE_TRAIN,
          validation_data=valid_iter,
          validation_steps=STEP_SIZE_VALID,
          epochs=7)

Epoch 1/7
Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7


<tensorflow.python.keras.callbacks.History at 0x7f537ec60278>

**Saving the model**

In [None]:
model.save("model_final_gala.h5")

In [None]:
model.save_weights("model_final.h5")

In [None]:
ls

images/  model_final.h5  model.h5  test/  test.csv  train/  train.csv


**Prediction on Test Images**

In [None]:
STEP_SIZE_TEST=test_generator.n//test_generator.batch_size

In [None]:
test_generator.reset()

In [None]:
# Model.predict accepts generators directly; predict_generator is deprecated
pred = model.predict(test_generator,
                     steps=STEP_SIZE_TEST,
                     verbose=1)


In [None]:
predicted_class_indices=np.argmax(pred,axis=1)

In [None]:
labels = (train_iter.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
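
The mapping from predicted indices back to class names can be tried on a tiny made-up example. The sketch below assumes that `flow_from_directory` numbers the classes in sorted folder-name order (uppercase names sort before lowercase ones, so `misc` comes last); the probability values are invented purely for illustration.

```python
import numpy as np

# Assumed index assignment: class folder names in sorted order.
class_names = ["Food", "Attire", "misc", "Decorationandsignage"]
class_indices = {name: i for i, name in enumerate(sorted(class_names))}

# Made-up softmax outputs for three images (one row per image).
pred = np.array([
    [0.10, 0.20, 0.60, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# Invert the name -> index mapping and look up the most probable class per row.
index_to_label = {index: name for name, index in class_indices.items()}
predictions = [index_to_label[i] for i in np.argmax(pred, axis=1)]
print(predictions)
```

Taking the mapping from `train_iter.class_indices`, as the notebook does, avoids hard-coding this order.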

In [None]:
filenames=test_generator.filenames
results2=pd.DataFrame({"Image":filenames,
                      "Class":predictions})

In [None]:
results2.Class.value_counts()

Food                    1212
Attire                   965
misc                     564
Decorationandsignage     478
Name: Class, dtype: int64

In [None]:
results2.to_csv("submissions8.csv",index=False)

In [None]:
from google.colab import files

In [None]:
files.download('submissions8.csv')