
# **Notebook and API setup**

In [None]:
!pip install kaggle # installing kaggle

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# importing files
from google.colab import files

In [None]:
# upload the kaggle.json API token downloaded from Kaggle
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"<redacted>","key":"<redacted>"}'}

In [None]:
# create the folder where the Kaggle CLI looks for the json file (-p: no error if it already exists)
!mkdir -p ~/.kaggle

In [None]:
# copy the uploaded file into the folder created above
!cp /content/kaggle.json ~/.kaggle/

In [None]:
# finally, chmod 600 restricts the file so that only the owner can read and write it
!chmod 600 ~/.kaggle/kaggle.json
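
The three shell steps above (create the folder, copy the token, restrict permissions) can also be sketched in plain Python. This is a minimal sketch, not part of the Kaggle tooling: the function name `install_kaggle_credentials` and its `kaggle_dir` parameter are hypothetical, and it assumes the token file holds the `username` and `key` fields seen in the upload output.

```python
import json
import os
import stat
from pathlib import Path


def install_kaggle_credentials(source_path, kaggle_dir=None):
    """Copy a kaggle.json token into the Kaggle config folder with owner-only access."""
    # Default to ~/.kaggle, the folder used by the shell commands above.
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`

    # Check that the file is valid JSON and carries both expected fields.
    with open(source_path) as fh:
        creds = json.load(fh)
    missing = {"username", "key"} - set(creds)
    if missing:
        raise ValueError(f"kaggle.json is missing keys: {sorted(missing)}")

    target = kaggle_dir / "kaggle.json"
    target.write_text(json.dumps(creds))
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return target


# Example call on Colab (path taken from the cells above):
# install_kaggle_credentials("/content/kaggle.json")
```

Validating the two fields before copying gives an early, readable error if the wrong file was uploaded.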

In [None]:
# download the competition data
# general form: !kaggle competitions download -c (competition name)

!kaggle competitions download -c imaterialist-challenge-fashion-2018

Downloading imaterialist-challenge-fashion-2018.zip to /content
 92% 27.0M/29.4M [00:00<00:00, 40.0MB/s]
100% 29.4M/29.4M [00:00<00:00, 38.7MB/s]


In [None]:
!unzip imaterialist-challenge-fashion-2018.zip # unzipping the dataset

Archive:  imaterialist-challenge-fashion-2018.zip
  inflating: sample_submission.csv.zip  
  inflating: test.json.zip           
  inflating: train.json.zip          
  inflating: validation.json.zip     


In [None]:
## unzipping the individual archives
!unzip test.json.zip
!unzip train.json.zip
!unzip validation.json.zip 
!unzip sample_submission.csv.zip

Archive:  test.json.zip
replace test.json? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: test.json               
Archive:  train.json.zip
  inflating: train.json              
Archive:  validation.json.zip
  inflating: validation.json         
Archive:  sample_submission.csv.zip
  inflating: sample_submission.csv   



![](https://theknclan.com/wp-content/uploads/2017/10/635980679147435890-488367249_FashionHeader.png)

# Extensive EDA of iMaterialist (Fashion) Dataset with Object Detection and Color Analysis

This notebook explores the iMaterialist Challenge (Fashion) at FGVC5 [dataset](https://www.kaggle.com/c/imaterialist-challenge-fashion-2018).

About the iMaterialist (Fashion) Competition - 

As shoppers move online, it would be a dream come true to have products in photos classified automatically. But, automatic product recognition is tough because for the same product, a picture can be taken in different lighting, angles, backgrounds, and levels of occlusion. Meanwhile different fine-grained categories may look very similar, for example, royal blue vs turquoise in color. Many of today’s general-purpose recognition machines simply cannot perceive such subtle differences between photos, yet these differences could be important for shopping decisions.

Tackling issues like this is why the Conference on Computer Vision and Pattern Recognition (CVPR) has put together a workshop specifically for data scientists focused on fine-grained visual categorization called the FGVC5 workshop. As part of this workshop, CVPR is partnering with Google, Wish, and Malong Technologies to challenge the data science community to help push the state of the art in automatic image classification.

In this competition, FGVC workshop organizers with Wish and Malong Technologies challenge you to develop algorithms that will help with an important step towards automatic product detection – to accurately assign attribute labels for fashion images. Individuals/Teams with top submissions will be invited to present their work live at the FGVC5 workshop.  




**Contents**

**1. Descriptive Statistics**   
&nbsp;&nbsp;&nbsp;&nbsp;     1.1 Counts of Images and Labels  
&nbsp;&nbsp;&nbsp;&nbsp;     1.2 Top Labels in the dataset  
&nbsp;&nbsp;&nbsp;&nbsp;     1.3 Most Common Co-occurring Labels  
&nbsp;&nbsp;&nbsp;&nbsp;     1.4 Images with maximum Labels  
&nbsp;&nbsp;&nbsp;&nbsp;     1.5 Images with single Label  
&nbsp;&nbsp;&nbsp;&nbsp;     1.6 Freq Dist of Images in different label count buckets  
**2. Colors Used in the Images**     
&nbsp;&nbsp;&nbsp;&nbsp;     2.1 Top Average Color of the images  
&nbsp;&nbsp;&nbsp;&nbsp;     2.2 Dominant Colors present in the images  
&nbsp;&nbsp;&nbsp;&nbsp;     2.3 Common Color Palettes    
**3. Object Detection**  
&nbsp;&nbsp;&nbsp;&nbsp;     3.1 Top Objects Detected in the images  
&nbsp;&nbsp;&nbsp;&nbsp;     3.2 Top Colors Detected in the images  

## **Dataset Preparation** 

In [None]:
from IPython.display import HTML, display  # render HTML output inside the notebook
from collections import Counter  # count the occurrences of each element
import pandas as pd  # data manipulation and analysis
import json  # encode and decode JSON data

# plotting libraries
from plotly.offline import init_notebook_mode, iplot
import matplotlib.pyplot as plt
import plotly.graph_objs as go
from wordcloud import WordCloud
from plotly import tools
import seaborn as sns
from PIL import Image  # image loading; this is the Image class used throughout the notebook

import tensorflow as tf  # used by the object detection model
import numpy as np

init_notebook_mode(connected=True)  # initialize plotly's notebook mode
# render matplotlib plots inline (a magic command cannot share its line with a comment)
%matplotlib inline

In [None]:
## read the dataset

train_path = '../content/train.json'  # training set
test_path = '../content/test.json'  # test set
valid_path = '../content/validation.json'  # validation set

def load_json(path):
    # open the file, parse the JSON into a Python dict, and close the file again
    with open(path) as f:
        return json.load(f)

train_inp = load_json(train_path)
test_inp = load_json(test_path)
valid_inp = load_json(valid_path)

## 1. Descriptive Statistics

## 1.1 How many Images and how many distinct labels are there in the dataset?

In [None]:
# how many images and distinct labels
def get_stats(data):
    total_images = len(data['images'])  # number of images

    all_annotations = []  # every label ID, with repeats
    if 'annotations' in data:
        # if the data has annotations, collect the label IDs of every image
        for each in data['annotations']:
            all_annotations.extend(each['labelId'])
    total_labels = len(set(all_annotations))  # number of distinct label IDs
    return total_images, total_labels, all_annotations

total_images, total_labels, train_annotations = get_stats(train_inp) #applying the above created function to the training data
print ("Total Images in the train:", total_images)
print ("Total Labels in the train:", total_labels)
print ("")

total_images, total_labels, test_annotations = get_stats(test_inp) #applying the above created function to the test data
print ("Total Images in the test:", total_images)
print ("Total Labels in the test:", total_labels)
print ("")

total_images, total_labels, valid_annotations = get_stats(valid_inp) #applying the above created function to the validation data
print ("Total Images in the valid:", total_images)
print ("Total Labels in the valid:", total_labels)

There are about 1 million images in the train dataset, and 228 distinct labels are used to label them. There are two other sources of data as well, the test data and the validation data, but in this notebook I have only used images from the train dataset.
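
To make the counting concrete, here is a minimal sketch on a toy dictionary shaped like the loaded JSON: an `images` list with `imageId` and `url`, and an `annotations` list with `imageId` and `labelId`. The toy values, the example URLs, and the helper name `summarize` are made up for illustration.

```python
# Toy data with the same keys the notebook reads from train.json (values are invented).
toy = {
    "images": [
        {"imageId": "1", "url": "https://example.com/1.jpg"},
        {"imageId": "2", "url": "https://example.com/2.jpg"},
        {"imageId": "3", "url": "https://example.com/3.jpg"},
    ],
    "annotations": [
        {"imageId": "1", "labelId": ["66", "105"]},
        {"imageId": "2", "labelId": ["66"]},
        {"imageId": "3", "labelId": ["17", "66", "171"]},
    ],
}


def summarize(data):
    """Return (number of images, number of distinct labels, flat list of label IDs)."""
    n_images = len(data["images"])
    # A split without annotations (such as a test set) simply yields an empty list.
    labels = [label for ann in data.get("annotations", []) for label in ann["labelId"]]
    return n_images, len(set(labels)), labels


n_images, n_labels, labels = summarize(toy)
print(n_images, n_labels)  # 3 4
```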

## 1.2 Which are the top used Labels in the dataset ?

In [None]:
train_labels = Counter(train_annotations)  # number of occurrences of each label

xvalues = list(train_labels.keys())  # label IDs for the x axis
yvalues = list(train_labels.values())  # counts for the y axis

# create a vertical bar chart trace that can be added to a Plotly figure
trace1 = go.Bar(x=xvalues, y=yvalues, opacity=0.8, name="label count", marker=dict(color='rgba(20, 20, 20, 1)'))

# set the layout for the bar chart created above
layout = dict(width=800, title='Distribution of different labels in the train dataset', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout);
iplot(fig);  # display the bar chart

In [None]:
# repeat the above for the validation data
valid_labels = Counter(valid_annotations)  # number of occurrences of each label

xvalues = list(valid_labels.keys())  # label IDs for the x axis
yvalues = list(valid_labels.values())  # counts for the y axis
# create a vertical bar chart trace that can be added to a Plotly figure
trace1 = go.Bar(x=xvalues, y=yvalues, opacity=0.8, name="label count", marker=dict(color='rgba(20, 20, 20, 1)'))
# set the layout for the bar chart created above
layout = dict(width=800, title='Distribution of different labels in the valid dataset', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout);
iplot(fig);  # display the bar chart

In [None]:
def get_images_for_labels(labellist, data):  # get URLs of images tagged with all the given labels
    image_ids = []  # start with an empty list
    for each in data['annotations']:  # iterate through the annotations
        if all(x in each['labelId'] for x in labellist):
            # every label ID in labellist is present in this image's label IDs
            image_ids.append(each['imageId'])  # so remember the image ID
            if len(image_ids) == 2:  # stop once two matching images are found
                break
    image_urls = []
    for each in data['images']:  # iterate through the image records
        if each['imageId'] in image_ids:  # keep the URL of every matching image
            image_urls.append(each['url'])
    return image_urls  # return the collected URLs

In [None]:
# most common labels

temps = train_labels.most_common(10)  # the 10 most common labels
# build two lists from them: display names and counts
labels_tr = ["Label-"+str(x[0]) for x in temps]
values = [x[1] for x in temps]
# create a vertical bar chart trace that can be added to a Plotly figure
trace1 = go.Bar(x=labels_tr, y=values, opacity=0.7, name="label count", marker=dict(color='rgba(120, 120, 120, 0.8)'))
# set the layout for the bar chart created above
layout = dict(height=400, title='Top 10 Labels in the train dataset', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout);
iplot(fig);  # visualize the plot

Label 66 is the most used label, with almost 750K images tagged with it in the training dataset.

In [None]:
temps = valid_labels.most_common(10)  # the 10 most common labels
# build two lists from them: display names and counts
labels_vl = ["Label-"+str(x[0]) for x in temps]
values = [x[1] for x in temps]

# create a vertical bar chart trace that can be added to a Plotly figure
trace1 = go.Bar(x=labels_vl, y=values, opacity=0.7, name="label count", marker=dict(color='rgba(120, 120, 120, 0.8)'))
# set the layout for the bar chart created above
layout = dict(height=400, title='Top 10 Labels in the valid dataset', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout);
iplot(fig);  # visualize the plot

Again, Label 66 is the most used label in the validation dataset, but the second most used label is label-17 rather than label-105 as in the training dataset.
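
A rank-by-rank comparison like the one described here can also be done in code rather than by reading two charts. The following is a small sketch on invented label lists; `top_labels` is a hypothetical helper, and the toy data only mimics the pattern of label 66 leading in both splits.

```python
from collections import Counter


def top_labels(annotations, k):
    """Return the k most frequent label IDs, most frequent first."""
    return [label for label, _ in Counter(annotations).most_common(k)]


# Invented stand-ins for train_annotations and valid_annotations.
train_toy = ["66", "66", "66", "105", "105", "17"]
valid_toy = ["66", "66", "17", "17", "105"]

train_top = top_labels(train_toy, 2)
valid_top = top_labels(valid_toy, 2)

# Keep only the ranks where the two splits disagree: (rank, train label, valid label).
rank_changes = [
    (rank, t, v)
    for rank, (t, v) in enumerate(zip(train_top, valid_top), start=1)
    if t != v
]
print(rank_changes)  # [(2, '105', '17')]
```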

## 1.3 What are the most Common Co-Occurring Labels in the dataset

Since every image can be classified into multiple labels, it is interesting to see which labels occur together.

In [None]:
# Most commonly co-occurring labels

def cartesian_reduct(alist):
    results = []
    for x in alist:
        for y in alist:  # iterate through the list twice
            if x == y:  # skip pairing a label with itself
                continue
            srtd = sorted([int(x), int(y)])  # sort the two label IDs
            srtd = " AND ".join([str(x) for x in srtd])  # join them as "a AND b"
            results.append(srtd)  # add the pair to the results list
    # note: each unordered pair is visited twice (x, y and y, x), so it appears twice in results
    return results

co_occurance = []
for i, each in enumerate(train_inp['annotations']):  # iterate through every annotation in train_inp
    # pass the image's label IDs to cartesian_reduct, which returns all label pairs
    # as strings joined by " AND "
    prods = cartesian_reduct(each['labelId'])
    # extend adds every pair to the end of the co_occurance list
    co_occurance.extend(prods)

In [None]:
coocur = Counter(co_occurance).most_common(10)  # the 10 most common label pairs
# pair names in reverse order, so the most common pair ends up at the top of the horizontal chart
# x[0] is the pair string and x[1] is its count
labels = list(reversed(["Label: "+str(x[0]) for x in coocur]))
# counts in the same reversed order
values = list(reversed([x[1] for x in coocur]))

# create a horizontal bar chart trace that can be added to a Plotly figure
trace1 = go.Bar(x=values, y=labels, opacity=0.7, orientation="h", name="pair count", marker=dict(color='rgba(130, 130, 230, 0.8)'))
layout = dict(height=400, title='Most Common Co-Occurring Labels in the dataset', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout);
iplot(fig);  # visualize the plot

From the above graph, the pairs (label 66, label 105) and (label 66, label 171) have been used the most while labelling the images, with total counts of 460K and 445K respectively. Apart from the most frequently occurring label "66", label 105 and label 153 also appear repeatedly in the dataset.
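
Pair counts of this kind can also be computed with `itertools.combinations`, which visits each unordered pair of labels exactly once per image. This is a sketch of that alternative on toy annotations, not the code that produced the chart; `count_label_pairs` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def count_label_pairs(annotations):
    """Count how many images carry each unordered pair of labels."""
    pair_counts = Counter()
    for ann in annotations:
        # Sort the distinct label IDs numerically so (66, 105) and (105, 66) map to one key.
        labels = sorted({int(x) for x in ann["labelId"]})
        pair_counts.update(combinations(labels, 2))
    return pair_counts


# Toy annotations in the same shape as train_inp['annotations'] (values are invented).
toy_annotations = [
    {"labelId": ["66", "105"]},
    {"labelId": ["105", "66", "171"]},
    {"labelId": ["66"]},
]

pairs = count_label_pairs(toy_annotations)
print(pairs.most_common(1))  # [((66, 105), 2)]
```

Using tuples as keys keeps the label IDs as integers, so no string parsing is needed when the pairs are inspected later.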

## 1.4 Which Images are tagged with Maximum Labels

Some images are labelled with a single label, while others have as many as 20 labels. Let's get the images with the largest number of labels in the dataset.

In [None]:
def get_image_url(imgid, data):
    for each in data['images']:  # iterate through the image records
        if each['imageId'] == imgid:
            # return the URL of the record whose imageId matches
            return each['url']

# sort the train_inp annotations by their number of label IDs, largest first
srtedlist = sorted(train_inp['annotations'], key=lambda d: len(d['labelId']), reverse=True)
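
`get_image_url` scans the whole image list on every call, which adds up when many URLs are needed from about a million records. One possible alternative, sketched here on toy data, is to build a dictionary from `imageId` to `url` once and look IDs up in it afterwards. The name `build_url_index` and the example URLs are assumptions for illustration.

```python
def build_url_index(data):
    """Map every imageId to its url in a single pass over data['images']."""
    return {img["imageId"]: img["url"] for img in data["images"]}


# Toy data in the same shape as the loaded JSON (values are invented).
toy = {
    "images": [
        {"imageId": "1", "url": "https://example.com/1.jpg"},
        {"imageId": "2", "url": "https://example.com/2.jpg"},
    ]
}

url_by_id = build_url_index(toy)
print(url_by_id.get("2"))        # https://example.com/2.jpg
print(url_by_id.get("missing"))  # None, like get_image_url when nothing matches
```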

In [None]:
for img in srtedlist[:5]:  # the first 5 elements of srtedlist, i.e. the images with the most labels
    # look up the URL of this image
    iurl = get_image_url(img['imageId'], train_inp)
    # join the label IDs with commas and store them in labelpair
    labelpair = ", ".join(img['labelId'])
    # build an HTML snippet with the labels, the number of labels, and the image itself
    imghtml = """Labels: """+ str(labelpair) +""" &nbsp;&nbsp; <b>Total Labels: """+ str(len(img['labelId'])) + """</b><br>""" + "<img src="+iurl+" width=200px; style='float:left'>"
    # render the HTML snippet
    display(HTML(imghtml))

## 1.5 Which Images have a perfect label, i.e. a Single Label

Let's look at some of the images that have only one label.

In [None]:
# Images labelled with only 1 label
for img in srtedlist[-5:]:  # the last 5 elements of srtedlist, i.e. the images with the fewest labels
    # look up the URL of this image
    iurl = get_image_url(img['imageId'], train_inp)
    # join the label IDs with commas and store them in labelpair
    labelpair = ", ".join(img['labelId'])
    # build an HTML snippet with the label and the image itself
    imghtml = """<b> Label: """+ str(labelpair) +"""</b><br>""" + "<img src="+iurl+" width=200px; height=200px; style='float:left'>"
    display(HTML(imghtml))
# The output shows each of the five images at the bottom of the sorted list together with its label.

## 1.6 Frequency Distribution of Images with respective Labels Counts in the dataset

Let's visualize how many images fall into each label count bucket.

In [None]:
# build a Counter whose keys are the distinct numbers of labels per image in srtedlist
# and whose values are the number of images with that many labels
lbldst = Counter([len(x['labelId']) for x in srtedlist])

labels = list(lbldst.keys())  # label counts per image
values = list(lbldst.values())  # number of images in each bucket

# plot the number of images for each label count
trace1 = go.Bar(x=labels, y=values, opacity=0.7, name="image count", marker=dict(color='rgba(10, 80, 190, 0.8)'))
layout = dict(height=400, title='Frequency distribution of images with respective labels counts ', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout);
iplot(fig);

Most of the images in the dataset have 5 or 6 labels.
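
The most common bucket and the average number of labels per image are different summaries, and both can be read from the same list of label counts. A minimal sketch on invented annotations follows; the numbers are toy values, not results from the dataset.

```python
from collections import Counter
from statistics import mean, median

# Toy annotations in the same shape as train_inp['annotations'] (values are invented).
toy_annotations = [
    {"labelId": ["66", "105", "171", "17", "153"]},
    {"labelId": ["66", "105", "171", "17", "153", "20"]},
    {"labelId": ["66", "17", "105", "44", "62"]},
    {"labelId": ["66"]},
]

counts = [len(ann["labelId"]) for ann in toy_annotations]  # labels per image
buckets = Counter(counts)                                   # images per label count

most_common_bucket = buckets.most_common(1)[0][0]
print(most_common_bucket)            # 5
print(mean(counts), median(counts))  # 4.25 5.0
```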

## 2. Colors Used in the Images 

In the e-commerce industry, colors play a very important role in customer behaviour. Some people are more inclined towards soft colors, while others prefer warm colors. In this section, let's visualize which colors are used in the images.

## 2.1 Common Average Color of the Images 

In [None]:
import urllib.request
from io import BytesIO

def compute_average_image_color(img):
    img = img.convert('RGB')  # make sure getpixel returns an (r, g, b) tuple
    width, height = img.size  # width and height of the image
    count, r_total, g_total, b_total = 0, 0, 0, 0  # pixel count and channel sums start at zero
    # visit every pixel in the image using two nested loops
    for x in range(0, width):
        for y in range(0, height):
            r, g, b = img.getpixel((x,y))  # color of the pixel at (x, y)
            # add the red, green and blue values to their running totals
            r_total += r
            g_total += g
            b_total += b
            count += 1
    return (r_total/count, g_total/count, b_total/count)  # average of each channel
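
The same per-channel average can be computed without a Python loop by treating the image as a NumPy array. This is a sketch of that variant, assuming the pixels arrive as an array of shape (height, width, 3); the function name `average_color` is hypothetical.

```python
import numpy as np


def average_color(pixels):
    """Average (r, g, b) of an array shaped (height, width, 3)."""
    pixels = np.asarray(pixels, dtype=np.float64)
    # Flatten to one row per pixel, then average each of the three channels.
    channel_means = pixels.reshape(-1, 3).mean(axis=0)
    return tuple(float(c) for c in channel_means)


# A 2x2 toy image: one red pixel, one blue pixel, two black pixels.
toy = np.zeros((2, 2, 3), dtype=np.uint8)
toy[0, 0] = (255, 0, 0)
toy[1, 1] = (0, 0, 255)

print(average_color(toy))  # (63.75, 0.0, 63.75)
```

For a PIL image, the array could be obtained with `np.asarray(img.convert("RGB"))` before calling the function.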

In [None]:
import os
imgpath = '../input/sampleimages/top_images/top_images/'
read_from_disk = True

# If read_from_disk is True, srtedlist becomes the list of file names in the imgpath directory.
# Otherwise, srtedlist is the list of train_inp annotations sorted by their number of labels,
# largest first.

if read_from_disk:
    srtedlist = os.listdir(imgpath)
else:
    srtedlist = sorted(train_inp['annotations'], key=lambda d: len(d['labelId']), reverse=True)

In [None]:
import io
import urllib.request

average_colors = {}
for img in srtedlist[:10]:
    if read_from_disk:
        img = Image.open(imgpath + img)
    else:
        iurli = get_image_url(img['imageId'], train_inp)

        ## download the images
        # filename = iurli.split("/")[-1].split("-large")[0]
        # urllib.request.urlretrieve(iurli, "top_images/"+filename)

        # fetch the image bytes and open them from memory
        file = io.BytesIO(urllib.request.urlopen(iurli).read())
        img = Image.open(file)

    average_color = compute_average_image_color(img)  # average (r, g, b) of this image

    # count how many images share each average color:
    # start a new color at 0, then add 1 for the current image
    if average_color not in average_colors:
        average_colors[average_color] = 0
    average_colors[average_color] += 1

In [None]:
# display a color swatch for each distinct average color in the average_colors dictionary

for average_color in average_colors:
    average_color1 = (int(average_color[0]),int(average_color[1]),int(average_color[2]))
    image_url = "<span style='display:inline-block; min-width:200px; background-color:rgb"+str(average_color1)+";padding:10px 10px;'>"+str(average_color1)+"</span>"
    display(HTML(image_url))

## 2.2 Most Dominant Colors Used in the Images 

In [None]:
## top used colors in images
from colorthief import ColorThief
import io
import urllib.request

pallets = []
for img in srtedlist[:10]:  # loop over the first 10 images in srtedlist

    if read_from_disk:
        img = imgpath + img
    else:
        iurli = get_image_url(img['imageId'], train_inp)

        ## download the images
        # filename = iurli.split("/")[-1].split("-large")[0]
        # urllib.request.urlretrieve(iurli, "top_images/"+filename)

        # fetch the image bytes into an in-memory file
        img = io.BytesIO(urllib.request.urlopen(iurli).read())

    # ColorThief takes a file path or a file-like object and opens the image itself
    color_thief = ColorThief(img)

    # extract the dominant color of the image using get_color() with a quality value of 1
    dominant_color = color_thief.get_color(quality=1)

    # build an HTML swatch whose background is the dominant color of the image
    image_url = "<span style='display:inline-block; min-width:200px; background-color:rgb"+str(dominant_color)+";padding:10px 10px;'>"+str(dominant_color)+"</span>"
    # render the swatch
    display(HTML(image_url))

    # extract a palette of 6 colors from the image and append it to the pallets list
    palette = color_thief.get_palette(color_count=6)
    pallets.append(palette)
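
To give a feel for what "dominant color" means, here is a deliberately simple sketch that snaps every channel to a coarse grid and picks the most frequent cell. This is a different, simpler technique than the one ColorThief implements and is only meant as an illustration; the function name `dominant_color`, the `bucket` parameter, and the toy pixels are assumptions.

```python
from collections import Counter


def dominant_color(pixels, bucket=32):
    """Most frequent color after snapping each channel to a grid of width `bucket`."""
    def snap(channel):
        # Map the value to the centre of its grid cell, e.g. 250 -> 240 for bucket=32.
        return (channel // bucket) * bucket + bucket // 2

    counts = Counter((snap(r), snap(g), snap(b)) for r, g, b in pixels)
    return counts.most_common(1)[0][0]


# Three slightly different reds and one blue (invented pixel values).
toy_pixels = [(250, 10, 10), (245, 5, 12), (10, 10, 240), (255, 0, 0)]
print(dominant_color(toy_pixels))  # (240, 16, 16)
```

Quantizing first matters because near-identical shades would otherwise each count as a separate color.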


## 2.3 Common Color Palettes of the Images

In [None]:
# for each palette collected in the previous cell, build an HTML string
# with one color swatch per color in the palette, then render it

for pallet in pallets:
    img_url = ""
    for pall in pallet:
        img_url += "<span style='background-color:rgb"+str(pall)+";padding:20px 10px;'>"+str(pall)+"</span>"
    img_url += "<br>"
    display(HTML(img_url))
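
The swatch markup used in this section can be factored into small helpers that return the HTML as a string, which makes it easy to inspect before rendering. This is a sketch with hypothetical helper names (`swatch_html`, `palette_html`) and invented colors; it labels each swatch with its hex code, which is a presentation choice rather than what the cells above do.

```python
def swatch_html(rgb, label=None):
    """Return an HTML <span> whose background is the given (r, g, b) color."""
    r, g, b = (int(c) for c in rgb)
    # Default label: the color as a hex code such as #ff0000.
    text = label if label is not None else f"#{r:02x}{g:02x}{b:02x}"
    return (
        "<span style='display:inline-block; min-width:80px; "
        f"background-color:rgb({r}, {g}, {b}); padding:10px;'>{text}</span>"
    )


def palette_html(palette):
    """Return one row of swatches for a list of (r, g, b) tuples."""
    return "".join(swatch_html(color) for color in palette) + "<br>"


row = palette_html([(255, 0, 0), (0, 128, 255)])
print(row)
```

In a notebook, the returned string could be passed to `display(HTML(row))`.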
    

## 3. Object Detection using TensorFlow API 


I used the TensorFlow Object Detection API for object detection; the code is given in the following cell.



In [None]:
### UNCOMMENT THE FOLLOWING LINE AFTER DOWNLOADING THE UTILS FROM THIS LINK - https://github.com/tensorflow/models/tree/master/research/object_detection/utils

# from utils import label_map_util

import os
import tarfile
import urllib.request

def DOWNLOAD_MODELS():
    MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'  # name of the pre-trained model
    MODEL_FILE = MODEL_NAME + '.tar.gz'  # archive file name
    DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'  # base download URL
    PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'  # frozen model file used for inference
    PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')  # path to the label map file

    # download the pre-trained SSD MobileNet v1 archive and extract the frozen inference graph
    urllib.request.urlretrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
    tar_file = tarfile.open(MODEL_FILE)
    for file in tar_file.getmembers():
        file_name = os.path.basename(file.name)
        if 'frozen_inference_graph.pb' in file_name:
            tar_file.extract(file, os.getcwd())

def detect_object(filename):

    # inner helper that converts a PIL image to a (height, width, 3) uint8 numpy array
    def img2array(img):
        (img_width, img_height) = img.size
        return np.array(img.convert('RGB').getdata()).reshape((img_height, img_width, 3)).astype(np.uint8)

    # names of the detected objects and their detection scores
    categories, probabilities = [], []

    # Load the frozen inference graph of the model, then the label map and the category index.
    # Adjust the two paths if the files were extracted or saved elsewhere.
    PATH_TO_CKPT = 'frozen_inference_graph.pb'
    PATH_TO_LABELS = 'mscoco_label_map.pbtxt'
    detection_graph = tf.Graph()

    # The frozen graph is a TensorFlow 1.x model, so the tf.compat.v1 API is used here.
    with detection_graph.as_default():
        od_graph_def = tf.compat.v1.GraphDef()
        with tf.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

    label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
    # kept in its own variable so that the result list `categories` is not overwritten
    label_categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=100, use_display_name=True)
    category_index = label_map_util.create_category_index(label_categories)

    with detection_graph.as_default():
        with tf.compat.v1.Session(graph=detection_graph) as sess:
            # input and output tensors of the model graph
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')

            # Load the input image, convert it to a numpy array, and add a batch dimension
            # to match the input tensor shape of the model, then run inference.
            image = Image.open(filename)
            image_np = img2array(image)
            image_np_expanded = np.expand_dims(image_np, axis=0)
            (boxes, scores, classes, num) = sess.run([detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: image_np_expanded})
            # keep each detected category once, if its score is greater than 0.1
            for index, value in enumerate(classes[0]):
                if float(scores[0, index]) > 0.1:
                    temp = category_index.get(int(value))['name']
                    if temp not in categories:
                        categories.append(temp)
                        probabilities.append(scores[0, index])
    return categories, probabilities
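
The post-processing step at the end of `detect_object` (keep detections above a score threshold, and keep each category name only once) can be tried out without TensorFlow. Below is a minimal sketch on hand-written stand-ins for `classes[0]`, `scores[0]`, and `category_index`; the function name `filter_detections` and all toy values are assumptions.

```python
def filter_detections(class_ids, scores, category_index, min_score=0.1):
    """Return (names, scores) for the first detection of each category above min_score."""
    names, probs = [], []
    for class_id, score in zip(class_ids, scores):
        if score <= min_score:
            continue  # too uncertain, skip
        # Class IDs come back from the model as floats, so convert before the lookup.
        name = category_index.get(int(class_id), {}).get("name")
        if name is not None and name not in names:
            names.append(name)
            probs.append(float(score))
    return names, probs


# Hand-written stand-ins for the model outputs and the label map.
toy_index = {1: {"name": "person"}, 31: {"name": "handbag"}}
toy_classes = [1.0, 31.0, 1.0, 31.0]
toy_scores = [0.92, 0.40, 0.35, 0.05]

print(filter_detections(toy_classes, toy_scores, toy_index))
# (['person', 'handbag'], [0.92, 0.4])
```

Because detections are typically returned in descending score order, keeping the first occurrence of a category also keeps its highest score; that ordering is an assumption about the model output, not something this sketch enforces.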


In [None]:
## UNCOMMENT THE FOLLOWING LINES TO RUN THE OBJECT DETECTION MODEL AND SAVE THE RESULTS
## (expects srtedlist to hold annotations, i.e. read_from_disk = False)

# for img in srtedlist[:10]:
#     iurli = get_image_url(img['imageId'], train_inp)

#     file = io.BytesIO(urllib.request.urlopen(iurli).read())
#     objects = detect_object(file)

- Reference: [TensorFlow Object Detection Notebook](https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb)  
- Pre-Trained Models Reference: [PreTrained Models](https://github.com/tensorflow/models/tree/676a4f70c20020ed41b533e0c331f115eeffe9a3/research/object_detection)  
- Link to download the Utils: https://github.com/tensorflow/models/tree/master/research/object_detection/utils

Since this would have taken a lot of time on Kaggle kernels, I pre-computed the objects on my local machine.

In [None]:
objpath = '../input/precomputedobjects/objects.txt'

with open(objpath) as f:
    objs = f.read().strip().split("\n")  # one entry per line
colors = [line for line in objs if "color" in line]  # entries that contain the string "color"
non_colors = [line for line in objs if "color" not in line]  # all remaining entries
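
The split into color and non-color entries, and the counting that follows, can be illustrated on a few made-up lines. The real format of `objects.txt` is not shown in this notebook, so the toy lines below (object names, and color entries containing the word "color") are purely an assumption.

```python
from collections import Counter

# Invented lines standing in for the contents of objects.txt.
toy_lines = ["person", "handbag", "color black", "person", "color white", "color black"]

toy_colors = [line for line in toy_lines if "color" in line]
toy_objects = [line for line in toy_lines if "color" not in line]

# Strip the marker word and keep the counts as a name -> frequency dictionary.
color_freq = {
    name.replace("color", "").strip(): count
    for name, count in Counter(toy_colors).most_common()
}
object_freq = dict(Counter(toy_objects).most_common())

print(color_freq)   # {'black': 2, 'white': 1}
print(object_freq)  # {'person': 2, 'handbag': 1}
```

A frequency dictionary like this can be passed to `WordCloud.generate_from_frequencies`, which sizes each word by its count instead of re-counting words from a text string.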

## 3.1 Top Objects detected using Object detection 

In [None]:
# create a word cloud visualization of the detected objects
# The WordCloud class from the wordcloud package builds the cloud from the text in txt,
# with a maximum font size of 50, a width of 600 pixels, and a height of 300 pixels.
txt = ""
for i, obj in enumerate(Counter(non_colors).most_common(100)):  # the 100 most common objects
    txt += obj[0]+" "
wordcloud = WordCloud(max_font_size=50, width=600, height=300).generate(txt)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.title("Top Objects Detected in the images", fontsize=15)
plt.axis("off")
plt.show() 

## 3.2 Top Colors Detected in the images

In [None]:
#creates a word cloud visualization for the most common color names in the 'colors' list.
txt = ""
for i, color in enumerate(Counter(colors).most_common(100)):
    txt += (color[0] + " ")
txt = txt.replace("color", " ")
wordcloud = WordCloud(max_font_size=50, width=600, height=300, background_color='white').generate(txt)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.title("Top Colors Used in the images", fontsize=15)
plt.axis("off")
plt.show() 