## Candidate Generator

The query of interest is: **The highlighted boxes depict a person biking on a street** 

Each candidate is a `(BBox, BBox)` tuple holding a person box and a bike box. Each _candidate hash_ is a string of the form `[set_name]:[image_idx]:[person_idx]:[bike_idx]` that uniquely identifies one candidate. Currently there is no way to go from a `(BBox, BBox)` tuple back to its candidate hash, although this should not be hard to implement.
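The hash format and the missing tuple-to-hash lookup can be sketched with a few small helpers. They are hypothetical: `make_candidate_hash`, `parse_candidate_hash`, and `invert_candidate_dict` are not defined anywhere in the notebook. The reverse lookup also assumes it is acceptable to key on `id()` of the exact tuple objects stored in the candidate dictionary.

```python
# Hypothetical helpers (not part of the original notebook) for the
# "[set_name]:[image_idx]:[person_idx]:[bike_idx]" candidate hash format.


def make_candidate_hash(set_name, image_idx, person_idx, bike_idx):
    """Build a candidate hash string such as 'val:3:0:2'."""
    return '%s:%d:%d:%d' % (set_name, image_idx, person_idx, bike_idx)


def parse_candidate_hash(candidate_hash):
    """Split a hash back into (set_name, image_idx, person_idx, bike_idx)."""
    set_name, image_idx, person_idx, bike_idx = candidate_hash.split(':')
    return set_name, int(image_idx), int(person_idx), int(bike_idx)


def invert_candidate_dict(candidate_dict):
    """Map id() of each stored candidate tuple back to its hash.

    Lookup by id() only works for the exact tuple objects stored in
    candidate_dict, not for equal tuples constructed elsewhere.
    """
    return {id(candidate): h for h, candidate in candidate_dict.items()}


h = make_candidate_hash('val', 3, 0, 2)
print(h)                        # val:3:0:2
print(parse_candidate_hash(h))  # ('val', 3, 0, 2)

# Placeholder strings stand in for the two BBox objects of a real candidate.
candidates = {h: ('person_box', 'bike_box')}
reverse = invert_candidate_dict(candidates)
print(reverse[id(candidates[h])])  # val:3:0:2
```

Parsing assumes `set_name` never contains a `:`. A sturdier alternative would be to attach the hash to the candidate when it is created, so no reverse index is needed at all.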

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
%matplotlib inline

import os
import sys
# Machine-specific paths to a local COCO checkout and its PythonAPI
sys.path.append('/dfs/scratch0/paroma/coco')
sys.path.append('/dfs/scratch0/paroma/coco/PythonAPI/')

from pycocotools.coco import COCO
import skimage.io as io
import pylab
import cv2

import pandas as pd
from snorkel.contrib.babble.image import BBox
```

## Load Annotations
Load the train and validation annotations for this task.

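```python
# Each file appears to hold one list of COCO annotation dicts per image
# (inferred from how the cells below index it).
# allow_pickle=True is needed to load object arrays with NumPy >= 1.16.3.
anns_folder = '/dfs/scratch0/paroma/coco/annotations/'
train_anns = np.load(anns_folder + 'train_anns.npy', allow_pickle=True).tolist()
val_anns = np.load(anns_folder + 'val_anns.npy', allow_pickle=True).tolist()
```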
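The later cells index the loaded objects as `anns[i][j]['category_id']` and `anns[i][j]['bbox']`, which suggests one list of COCO-style annotation dicts per image. The toy data below mirrors that inferred layout, and `check_layout` is a hypothetical sanity check, not part of the notebook. COCO boxes are `[x, y, width, height]`, and category ids 1, 2, and 3 are person, bicycle, and car.

```python
# Toy data in the layout the later cells appear to assume (an inference,
# not confirmed by the source): one list of annotation dicts per image.
toy_anns = [
    [  # image 0: one person and one bicycle
        {'category_id': 1, 'bbox': [10.0, 20.0, 30.0, 60.0]},
        {'category_id': 2, 'bbox': [15.0, 50.0, 40.0, 30.0]},
    ],
    [  # image 1: a person and a car (category 3)
        {'category_id': 1, 'bbox': [0.0, 0.0, 5.0, 5.0]},
        {'category_id': 3, 'bbox': [100.0, 100.0, 20.0, 20.0]},
    ],
]


def check_layout(anns):
    """Hypothetical check: every annotation has a category_id and a 4-number bbox."""
    for img_idx, image_anns in enumerate(anns):
        for ann_idx, ann in enumerate(image_anns):
            if 'category_id' not in ann or len(ann.get('bbox', [])) != 4:
                raise ValueError('bad annotation at image %d, index %d'
                                 % (img_idx, ann_idx))
    return True


print(check_layout(toy_anns))  # True
```

Running the same check on `train_anns` and `val_anns` right after loading would catch a file in an unexpected format before any candidates are built.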

## New Task Instructions
For each image, we now want to:
* Create separate lists of the indices of person boxes and of bike boxes
* Create a list of all (person, bike) index tuples
* Check each pair for overlap and keep only the overlapping pairs

Then recompute the number of candidate pairs and the class balance.
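The class balance is not computed in this section. Once gold labels exist, it could be read off with a short sketch like the one below. The `gold_labels` dictionary keyed by candidate hash is hypothetical, and the encoding of 1 for "person biking" and -1 otherwise is an assumption.

```python
from collections import Counter


def class_balance(gold_labels):
    """Count labels and return the fraction of positive candidates.

    gold_labels is a hypothetical dict {candidate_hash: label}; labels are
    assumed to be 1 (person biking) or -1 (not biking).
    """
    counts = Counter(gold_labels.values())
    total = len(gold_labels)
    positive_fraction = counts[1] / float(total) if total else 0.0
    return counts, positive_fraction


gold = {'val:0:0:1': 1, 'val:0:2:1': -1, 'val:4:1:3': -1, 'train:7:0:2': 1}
counts, pos = class_balance(gold)
print('%d positive, %d negative, positive fraction %.2f'
      % (counts[1], counts[-1], pos))
# 2 positive, 2 negative, positive fraction 0.50
```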

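```python
def get_person_bike_boxes(anns):
    """Return the indices of person and bicycle annotations in one image."""
    person_indices = []
    bike_indices = []

    for i, ann in enumerate(anns):
        if ann['category_id'] == 1:    # COCO category 1: person
            person_indices.append(i)
        elif ann['category_id'] == 2:  # COCO category 2: bicycle
            bike_indices.append(i)

    return person_indices, bike_indices
```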

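```python
def overlap(box1, box2):
    """Return True if two COCO boxes [x, y, width, height] intersect or touch."""
    # The boxes are disjoint if one lies entirely to the left of, right of,
    # above, or below the other.
    separated = (box1[0] + box1[2] < box2[0] or box2[0] + box2[2] < box1[0] or
                 box1[1] + box1[3] < box2[1] or box2[1] + box2[3] < box1[1])
    return not separated
```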
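Because the test uses strict `<`, boxes whose edges merely touch count as overlapping. The worked cases below restate the same test with the coordinates unpacked for readability and check partial overlap, a shared edge, a one-pixel gap, and containment.

```python
def overlap(box1, box2):
    """True if two COCO [x, y, width, height] boxes intersect or touch."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # Separated if one box lies entirely beside, above, or below the other.
    separated = (x1 + w1 < x2 or x2 + w2 < x1 or
                 y1 + h1 < y2 or y2 + h2 < y1)
    return not separated


print(overlap([0, 0, 10, 10], [5, 5, 10, 10]))  # True: partial overlap
print(overlap([0, 0, 10, 10], [10, 0, 5, 5]))   # True: shared edge at x = 10
print(overlap([0, 0, 10, 10], [11, 0, 5, 5]))   # False: one-pixel gap
print(overlap([0, 0, 10, 10], [2, 2, 3, 3]))    # True: containment
```

If touching boxes should not count as a person on a bike, replacing each `<` with `<=` makes the test strict.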

```python
def get_valid_pairs(anns, tuples):
    """Keep only the (person_idx, bike_idx) pairs whose bounding boxes overlap."""
    valid_tuples = []
    for person, bike in tuples:
        person_box = anns[person]['bbox']
        bike_box = anns[bike]['bbox']

        if overlap(person_box, bike_box):
            valid_tuples.append((person, bike))

    return valid_tuples
```
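The three steps (split by category, form all pairs, filter by overlap) can be combined into one function per image. This is a sketch rather than the notebook's code: `valid_person_bike_pairs` is a hypothetical name, and it uses `itertools.product` in place of the nested list comprehension.

```python
from itertools import product

PERSON, BICYCLE = 1, 2  # COCO category ids used in the notebook


def overlap(box1, box2):
    """True if two COCO [x, y, width, height] boxes intersect or touch."""
    return not (box1[0] + box1[2] < box2[0] or box2[0] + box2[2] < box1[0] or
                box1[1] + box1[3] < box2[1] or box2[1] + box2[3] < box1[1])


def valid_person_bike_pairs(image_anns):
    """(person_idx, bike_idx) pairs of overlapping boxes in one image."""
    persons = [i for i, a in enumerate(image_anns) if a['category_id'] == PERSON]
    bikes = [i for i, a in enumerate(image_anns) if a['category_id'] == BICYCLE]
    return [(p, b) for p, b in product(persons, bikes)
            if overlap(image_anns[p]['bbox'], image_anns[b]['bbox'])]


image_anns = [
    {'category_id': 1, 'bbox': [10, 20, 30, 60]},    # person 0
    {'category_id': 2, 'bbox': [15, 50, 40, 30]},    # bike 1, overlaps person 0
    {'category_id': 1, 'bbox': [200, 200, 10, 10]},  # person 2, far away
    {'category_id': 2, 'bbox': [0, 300, 20, 20]},    # bike 3, far away
]
print(valid_person_bike_pairs(image_anns))  # [(0, 1)]
```

Of the four possible pairs, only `(0, 1)` survives, which matches what `get_person_bike_boxes` followed by `get_valid_pairs` would return for the same image.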

### Generate Candidate Hashes and Dictionaries
Two dictionaries are keyed by candidate hash: one for gold labels and one for `BBox` objects. This section builds the `BBox` dictionary.

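```python
def create_bbox_candidates(set_name, anns, img_idx, pidx, bidx):
    """Wrap the person and bike annotations of one image as a (BBox, BBox) candidate.

    set_name is currently unused.
    """
    p_bbox = BBox(anns[pidx], img_idx)
    b_bbox = BBox(anns[bidx], img_idx)
    return (p_bbox, b_bbox)
```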

```python
def create_candidate_dict(candidate_dict, set_name):
    """Add every overlapping person-bike pair in one split to candidate_dict.

    Keys are candidate hashes "[set_name]:[image_idx]:[person_idx]:[bike_idx]";
    values are (BBox, BBox) tuples. The dict is updated in place and returned.
    """
    if set_name == 'val':
        anns = val_anns
    elif set_name == 'train':
        anns = train_anns
    else:
        raise ValueError("set_name must be 'val' or 'train', got %r" % set_name)

    for i, image_anns in enumerate(anns):
        # Find all overlapping person-bike pairs in image i
        person_indices, bike_indices = get_person_bike_boxes(image_anns)
        person_bike_tuples = [(x, y) for x in person_indices for y in bike_indices]
        valid_pairs = get_valid_pairs(image_anns, person_bike_tuples)

        # Generate one candidate per valid pair
        for pidx, bidx in valid_pairs:
            candidate_hash = '%s:%d:%d:%d' % (set_name, i, pidx, bidx)
            candidate_dict[candidate_hash] = create_bbox_candidates(
                set_name, image_anns, i, pidx, bidx)

    return candidate_dict
```
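The `set_name` prefix is what lets the validation and training passes share one dictionary without overwriting each other's entries: `val:0:0:1` and `train:0:0:1` are different keys. A minimal sketch of just that keying step, using a hypothetical `add_candidates` helper that takes precomputed index pairs per image and stores them in place of the `(BBox, BBox)` tuples:

```python
def add_candidates(candidate_dict, set_name, valid_pairs_per_image):
    """Add one entry per pair, keyed '[set_name]:[image_idx]:[person_idx]:[bike_idx]'.

    valid_pairs_per_image is a hypothetical input: its i-th entry lists the
    (person_idx, bike_idx) pairs already found for image i. The stored values
    are the index pairs themselves, standing in for (BBox, BBox) tuples.
    """
    for img_idx, pairs in enumerate(valid_pairs_per_image):
        for person_idx, bike_idx in pairs:
            key = '%s:%d:%d:%d' % (set_name, img_idx, person_idx, bike_idx)
            candidate_dict[key] = (person_idx, bike_idx)
    return candidate_dict


val_pairs = [[(0, 1)], [], [(2, 3), (4, 3)]]
train_pairs = [[(0, 1)], [(1, 0)]]

candidates = add_candidates({}, 'val', val_pairs)
print(len(candidates))  # 3
candidates = add_candidates(candidates, 'train', train_pairs)
print(len(candidates))  # 5: 'val:0:0:1' and 'train:0:0:1' do not collide
```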

```python
candidate_dict = create_candidate_dict({}, 'val')
print('Candidates in Validation Set: %d' % len(candidate_dict))
candidate_dict = create_candidate_dict(candidate_dict, 'train')
print('Total Candidates: %d' % len(candidate_dict))
```

```text
Candidates in Validation Set: 1037
Total Candidates: 3443
```
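Since both calls write into the same dictionary and the `val:`/`train:` prefixes keep the keys disjoint, the training set contributes 3443 − 1037 = 2406 candidates. Per-split counts can also be read straight from the hash prefixes. The sketch below uses a hypothetical `candidates_per_split` helper on a toy dictionary.

```python
from collections import Counter


def candidates_per_split(candidate_dict):
    """Count candidates by the set_name prefix of their hash."""
    return Counter(h.split(':', 1)[0] for h in candidate_dict)


toy = {'val:0:0:1': None, 'val:2:2:3': None, 'train:0:0:1': None}
counts = candidates_per_split(toy)
print('val=%d train=%d' % (counts['val'], counts['train']))  # val=2 train=1
```

Applied to the real `candidate_dict`, this should report 1037 validation and 2406 training candidates.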
