<a href="https://colab.research.google.com/github/sandeepsaha/SparkRepo/blob/master/Pascal_VOC_data_extraction_notebook_object_localization_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Single object localization: Explore PASCAL-VOC dataset

## Table of Contents

1. [Downloading the dataset](#download)
2. [Unzip the dataset](#unzip)
3. [Explore PASCAL VOC 2007 dataset](#import)
4. [Functions for drawing bounding boxes and displaying images](#nm)
5. [Convert the list of largest bounding box coordinates into a csv](#build)

## 1. Downloading the PASCAL-VOC dataset <a id='download'></a>

In [0]:
!wget -q https://www.dropbox.com/s/63j3myz3hobc8cu/PASCAL_VOC.zip

In [0]:
!wget -q http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar

In [0]:
!wget -q http://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar

## 2. Unzip the dataset <a id='unzip'></a>

In [0]:
!unzip PASCAL_VOC.zip > /dev/null; echo " done."

In [0]:
!tar -xvf VOCtrainval_06-Nov-2007.tar > /dev/null; echo " done."

In [0]:
!tar -xvf VOCtest_06-Nov-2007.tar > /dev/null; echo " done."

In [0]:
!ls -l

In [0]:
# Move the extracted VOCdevkit folder into PASCAL_VOC
!mv -t PASCAL_VOC VOCdevkit

In [0]:
# Explore the file structure

!ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
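
The shell pipeline above prints the directory tree. A rough Python alternative is sketched below. The helper name `dir_tree` and the temporary folder layout are assumptions made for illustration and are not part of the notebook.

```python
import pathlib
import tempfile

def dir_tree(root, indent=''):
    """Return the sub-directories under root as indented lines, sorted by name."""
    lines = []
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        lines.append(f'{indent}|-- {child.name}')
        lines.extend(dir_tree(child, indent + '    '))
    return lines

# Build a small throwaway layout that mimics part of the dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / 'VOCdevkit' / 'VOC2007' / 'JPEGImages').mkdir(parents=True)
    (base / 'VOCdevkit' / 'VOC2007' / 'Annotations').mkdir()
    tree_lines = dir_tree(base)

print('\n'.join(tree_lines))
```

In the notebook itself, `dir_tree(pathlib.Path('PASCAL_VOC'))` would be the corresponding call.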

In [0]:
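# Import packages

import matplotlib
import matplotlib.patches   # needed for matplotlib.patches.Rectangle
import matplotlib.pyplot    # needed for matplotlib.pyplot.subplots
import tensorflow as tf
from matplotlib import patheffects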

## 3. Explore PASCAL-VOC 2007 dataset <a id='import'></a>

In [0]:
# List the files and directories inside PASCAL_VOC

import pathlib
PATH = pathlib.Path('PASCAL_VOC')
list(PATH.iterdir())

In [0]:
# Load train json files 

import json

trn_j = json.load((PATH/'pascal_train2007.json').open())
trn_j.keys()

In [0]:
# Let's assign aliases for the 'images', 'annotations' and 'categories' keys

IMAGES,ANNOTATIONS,CATEGORIES = ['images', 'annotations', 'categories']

#### What does the json file contain?

The loaded json file has the keys **images, type, annotations and categories.**

The value of the 'images' key is a list of dictionaries, each of which contains these **key:value pairs**:

* filename of the image
* height of the image
* id of the image 
* width of the image
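
As a rough illustration of that structure, the sketch below builds a small stand-in for the 'images' list and looks up one entry. The file names, sizes and ids are made-up placeholder values and are not read from the real json file.

```python
# Hypothetical stand-in for trn_j['images']; all values are placeholders.
sample_images = [
    {'file_name': 'a.jpg', 'height': 333, 'width': 500, 'id': 1},
    {'file_name': 'b.jpg', 'height': 375, 'width': 500, 'id': 2},
]

# Map image id -> file name, the same lookup pattern the notebook builds later.
id_to_file = {img['id']: img['file_name'] for img in sample_images}

# Each record also carries the image size, which is handy for sanity checks.
first = sample_images[0]
print(first['file_name'], first['width'], first['height'])
print(id_to_file[2])
```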

In [0]:
# Details of 'images' key

trn_j[IMAGES][:5]

In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
# Details of 'annotations' key

trn_j[ANNOTATIONS][:5]

**Observation:** Each entry under the 'annotations' key gives a rectangular bounding box as 'bbox' and the category of the object as 'category_id'.
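
The notebook later converts each 'bbox' from (x, y, w, h) into corner coordinates. The sketch below shows that arithmetic on a placeholder box. The function names are hypothetical, and the corners are kept in x-first order here for readability, whereas the notebook stores them as [y1, x1, y2, x2].

```python
def xywh_to_corners(bbox):
    """Convert [x, y, width, height] to inclusive pixel corners (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    # The last pixel covered by the box is at x + w - 1, hence the -1.
    return (x, y, x + w - 1, y + h - 1)

def corners_to_xywh(corners):
    """Inverse conversion; the +1 restores the width and height."""
    x1, y1, x2, y2 = corners
    return [x1, y1, x2 - x1 + 1, y2 - y1 + 1]

box = [155, 96, 196, 174]          # placeholder values
corners = xywh_to_corners(box)
print(corners)                      # (155, 96, 350, 269)
print(corners_to_xywh(corners))     # [155, 96, 196, 174]
```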

In [0]:
# Details of 'categories' key

trn_j[CATEGORIES][:]

**Observation:** The PASCAL VOC 2007 dataset consists of 20 categories.

In [0]:
# Map each category id to its name in the cats dictionary
# Map each image id to its file name in the trn_fns dictionary
# Collect the image ids in the trn_ids list
 
FILE_NAME,ID,IMG_ID,CAT_ID,BBOX = 'file_name','id','image_id','category_id','bbox'

cats = dict((o[ID], o['name']) for o in trn_j[CATEGORIES])
trn_fns = dict((o[ID], o[FILE_NAME]) for o in trn_j[IMAGES])
trn_ids = [o[ID] for o in trn_j[IMAGES]]

In [0]:
print(cats)
print(trn_fns)
print(trn_ids[:5])

In [0]:
# Let's find out where the JPEG images are
list((PATH/'VOCdevkit'/'VOC2007').iterdir())

In [0]:
!ls -l PASCAL_VOC/VOCdevkit/VOC2007/Annotations

In [0]:
!cat PASCAL_VOC/VOCdevkit/VOC2007/Annotations/000001.xml
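
The same kind of XML file can also be read from Python. The sketch below parses a shortened annotation in the VOC XML layout with the standard library. The sample content and the helper name `parse_voc_xml` are assumptions for illustration; the values are placeholders, not the contents of `000001.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened annotation in the VOC XML layout (placeholder values).
sample_xml = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc_xml(text):
    """Return the file name and a list of (class name, (xmin, ymin, xmax, ymax))."""
    root = ET.fromstring(text.strip())
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        coords = tuple(int(box.find(tag).text)
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append((obj.find('name').text, coords))
    return root.find('filename').text, objects

filename, objects = parse_voc_xml(sample_xml)
print(filename)
for name, coords in objects:
    print(name, coords)
```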

In [0]:
#Assign JPEG folder path to JPEGS
JPEGS = 'VOCdevkit/VOC2007/JPEGImages'

In [0]:
#Let's have a look at the images in the folder
IMG_PATH = PATH/JPEGS
list(IMG_PATH.iterdir())[:5]

In [0]:
# Pick a sample image and show its file name and id
im0_d = trn_j[IMAGES][10]
im0_d[FILE_NAME],im0_d[ID]

In [0]:
# Collect the bounding boxes of each image in trn_anno, keyed by image id
# Each box is converted from (x, y, w, h) to [y1, x1, y2, x2]: the top-left and bottom-right corners in row, column order

import collections
import numpy as np 

trn_anno = collections.defaultdict(lambda:[])
for o in trn_j[ANNOTATIONS]:
    if not o['ignore']:
        bb = o[BBOX]
        bb = np.array([bb[1], bb[0], bb[3]+bb[1]-1, bb[2]+bb[0]-1])
        trn_anno[o[IMG_ID]].append((bb,o[CAT_ID]))
        
trn_anno

**Observation:** Some images contain multiple objects, so their entries hold several (bounding box, category id) pairs, one per object.
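
To make the grouping concrete, here is a minimal sketch on a handful of made-up annotation records. It uses `collections.defaultdict(list)`, which behaves like the `defaultdict(lambda:[])` used in the notebook. The ids and boxes are placeholders.

```python
import collections

# Hypothetical annotation records; ids and boxes are placeholders.
sample_annotations = [
    {'image_id': 1, 'category_id': 7, 'bbox': [10, 20, 30, 40], 'ignore': 0},
    {'image_id': 2, 'category_id': 15, 'bbox': [5, 5, 50, 60], 'ignore': 0},
    {'image_id': 2, 'category_id': 13, 'bbox': [0, 0, 10, 10], 'ignore': 1},
    {'image_id': 2, 'category_id': 13, 'bbox': [40, 30, 20, 20], 'ignore': 0},
]

# Group the non-ignored annotations by the image they belong to.
grouped = collections.defaultdict(list)
for ann in sample_annotations:
    if not ann['ignore']:
        grouped[ann['image_id']].append((ann['bbox'], ann['category_id']))

# Number of objects kept per image.
counts = {image_id: len(items) for image_id, items in grouped.items()}
print(counts)
```

Image 2 ends up with two entries because one of its three annotations is flagged with 'ignore'.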

In [0]:
# Annotations of the sample image selected above
im_a = trn_anno[im0_d[ID]]; im_a

In [0]:
im0_a = im_a[0]; im0_a

In [0]:
cats[8]

## 4. Functions for drawing bounding boxes and displaying images <a id='nm'></a>

In [0]:
# Function for Displaying image 
def show_img(im, figsize=None, ax=None):
    if not ax: fig,ax = matplotlib.pyplot.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    return ax

In [0]:
# Function for drawing a black outline around a patch or text so it stays visible on any background
def draw_outline(o, lw):
    o.set_path_effects([patheffects.Stroke(
        linewidth=lw, foreground='black'), patheffects.Normal()])
    


In [0]:
# Function for drawing bounding box
def draw_rect(ax, b):
    patch = ax.add_patch(matplotlib.patches.Rectangle(b[:2], *b[-2:], fill=False, edgecolor='white', lw=2))
    draw_outline(patch, 4)

In [0]:
# Function for writing text at the top-left corner of a bounding box
def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt,
        verticalalignment='top', color='white', fontsize=sz, weight='bold')
    draw_outline(text, 1)


In [0]:
# Convert a box from [y1, x1, y2, x2] corners to [x, y, width, height] for matplotlib
# The +1 undoes the -1 applied when the corners were computed
def bb_hw(a): return np.array([a[1],a[0],a[3]-a[1]+1,a[2]-a[0]+1])

In [0]:
# Function for displaying an image with all of its bounding boxes and category labels

def draw_im(im, ann):
    ax = show_img(im, figsize=(16,8))
    for b,c in ann: 
        b = bb_hw(b)
        draw_rect(ax, b)
        draw_text(ax, b[:2], cats[c], sz=16)

In [0]:
# Prints the annotations and size of the image with id i, then draws it with its bounding boxes
def draw_idx(i):
    im_a = trn_anno[i]
    print(im_a)
    im = tf.keras.preprocessing.image.load_img(IMG_PATH/trn_fns[i])
    print(im.size)
    draw_im(im, im_a)

In [0]:
# Draw the bounding boxes over the objects in image 133
draw_idx(133)

In [0]:
# Sorts the annotations of an image by bounding box area, largest first
def get_lrg(b):
    b = sorted(b, key=lambda x: np.prod(x[0][-2:]-x[0][:2]), reverse=True)
    return b

In [0]:
# Sort each image's annotations so that the largest bounding box comes first, and store them in trn_lrg_anno
trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}
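
The area-based selection can be checked on a few made-up boxes. The sketch below is a plain-Python variant that uses `max` instead of a full sort, so it is an alternative to `get_lrg`, not the notebook's own method. The helper names and sample values are hypothetical; boxes follow the [y1, x1, y2, x2] layout.

```python
def box_area(box):
    """Area proxy used for ranking: (y2 - y1) * (x2 - x1)."""
    y1, x1, y2, x2 = box
    return (y2 - y1) * (x2 - x1)

def largest_annotation(annotations):
    """Return the (box, category id) pair with the largest box area."""
    return max(annotations, key=lambda item: box_area(item[0]))

# Placeholder annotations for a single image.
sample = [
    ([10, 10, 50, 60], 3),
    ([0, 0, 200, 300], 12),
    ([5, 5, 20, 20], 3),
]

box, category = largest_annotation(sample)
print(box, category)
```

Sorting, as the notebook does, keeps the full ranking available; `max` is enough when only the single largest object is needed.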

In [0]:
# Draw only the largest bounding box of the image with the given image_id
image_id = 133
b,c = trn_lrg_anno[image_id][0]       # [0] because the sort puts the largest box first

b = bb_hw(b)

ax = show_img(tf.keras.preprocessing.image.load_img(IMG_PATH/trn_fns[image_id]), figsize=(5,10))
draw_rect(ax, b)
draw_text(ax, b[:2], cats[c], sz=16)

## 5. Convert the list of largest bounding box coordinates into a csv <a id='build'></a>

In [0]:
(PATH/'tmp').mkdir(exist_ok=True)
CSV = PATH/'tmp/lrg.csv'


In [0]:
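# Build a table with the file name, category name, category id and bounding box coordinates of the largest object in each image
# The coordinates are kept in one column here; four separate columns would also work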
import pandas
df = pandas.DataFrame({'fn': [trn_fns[o] for o in trn_ids],
    'cat': [cats[trn_lrg_anno[o][0][1]] for o in trn_ids],
    'cat_id':[trn_lrg_anno[o][0][1] for o in trn_ids],
    'bb_coords':[(trn_lrg_anno[o][0][0]) for o in trn_ids] },columns=['fn','cat','cat_id','bb_coords'])
df.to_csv(CSV, index=False)


In [0]:
df.head()
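
When the bounding box sits in a single column, it is written to the csv as text and has to be parsed again on reading. One possible alternative, sketched here with the standard `csv` module and placeholder rows, is to store the four coordinates in separate columns. The column names `y1`, `x1`, `y2`, `x2` are assumptions, not names used by the notebook.

```python
import csv
import io

# Placeholder rows: file name, category name, category id, [y1, x1, y2, x2].
rows = [
    ('a.jpg', 'car', 7, [96, 155, 269, 350]),
    ('b.jpg', 'horse', 13, [61, 184, 198, 278]),
]

# Write one coordinate per column (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['fn', 'cat', 'cat_id', 'y1', 'x1', 'y2', 'x2'])
for fn, cat, cat_id, bb in rows:
    writer.writerow([fn, cat, cat_id, *bb])

# Read the rows back and rebuild the integer coordinates.
buffer.seek(0)
loaded = [
    (r['fn'], r['cat'], int(r['cat_id']),
     [int(r[k]) for k in ('y1', 'x1', 'y2', 'x2')])
    for r in csv.DictReader(buffer)
]
print(loaded)
```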

In [0]:
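# Save a pickled copy under a separate name (lrg.pkl is an assumed name) so the csv written above is not overwritten
df.to_pickle(PATH/'tmp/lrg.pkl')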

In [0]:
CSV

In [0]:
!ls -l PASCAL_VOC/tmp