# Photo Triage
In this project we try to automate the choice between two similar photos, using human preferences as ground truth. A detailed explanation of the problem is given in the report.

### 1. Setup

* First, set up Python, `numpy`, and `matplotlib`.

In [1]:
# set up Python environment: numpy for numerical routines, and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
import glob2 
import string
# display plots in this notebook
%matplotlib inline

# set display defaults
plt.rcParams['figure.figsize'] = (10, 10)        # large images
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap

* Load `caffe`.

In [2]:
# The caffe module needs to be on the Python path;
#  we'll add it here explicitly.
import sys
caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)
sys.path.insert(0, caffe_root + 'python')

import caffe
# If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.

* If needed, download the pretrained model (here the `vgg_siamese` weights under `models/vggsms`; the messages printed below still refer to "CaffeNet").

In [3]:
import os
if os.path.isfile(caffe_root + 'models/vggsms/vgg_siamese.caffemodel'):
    print 'CaffeNet found.'
else:
    print 'Downloading pre-trained CaffeNet model...'
    !../scripts/download_model_binary.py ../models/vggsms

CaffeNet found.


### 2. Load net and set up input preprocessing

* Set Caffe to CPU mode and load the net from disk.
* This loads the pretrained `vgg_siamese` model, which is used as a feature extractor for each pair of images.

In [4]:
caffe.set_mode_cpu()

model_def = caffe_root + 'models/vggsms/deploy.prototxt'
model_weights = caffe_root + 'models/vggsms/vgg_siamese.caffemodel'

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

* Set up input preprocessing. (We'll use Caffe's `caffe.io.Transformer` to do this, but this step is independent of other parts of Caffe, so any custom preprocessing code may be used.)

    The network takes images in BGR format with values in the range [0, 255] and the channel dimension as the first (_outermost_) dimension. The Caffe reference models additionally expect the mean ImageNet pixel value to be subtracted; in this notebook that step is commented out.
    
    Images usually load in RGB format with the channel as the _innermost_ dimension (and, with matplotlib, with values in the range [0, 1]), so the needed transformations are arranged in `getFeatures` below.
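The layout change described above can also be written directly in NumPy. This is a minimal sketch rather than the notebook's own code: `to_caffe_layout` is a hypothetical helper name, the input is assumed to be an H x W x 3 RGB array with values in [0, 1], and mean subtraction is left out.

```python
import numpy as np

def to_caffe_layout(rgb01):
    """Convert an H x W x 3 RGB image in [0, 1] to a 3 x H x W BGR array in [0, 255].

    Hypothetical helper for illustration; mean subtraction is not applied.
    """
    rgb01 = np.asarray(rgb01, dtype=np.float32)
    chw = rgb01.transpose(2, 0, 1)   # channel becomes the outermost dimension
    bgr = chw[::-1, :, :]            # reverse the channel order: RGB -> BGR
    return bgr * 255.0               # rescale [0, 1] -> [0, 255]

# A 2 x 2 pure-red image: R=1, G=0, B=0 at every pixel.
red = np.zeros((2, 2, 3), dtype=np.float32)
red[..., 0] = 1.0
out = to_caffe_layout(red)
print(out.shape)      # (3, 2, 2)
print(out[:, 0, 0])   # top-left pixel in B, G, R order
```

After the conversion the red channel sits at index 2, which is where a BGR network expects it.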

### 3. Feature extraction from pair of images

* In this part we consider the different pairs of similar images within each group and extract a 4096-dimensional feature vector for each pair. The feature is tapped from the final layer of the pretrained model (read from the `diff` output blob in the code below); it is known as the 'fc7' feature because it comes from the 7th layer of the deep net.

In [5]:
def pairwiseLabelImages(image_folder, text_labels_path):

    text_labels = np.loadtxt(text_labels_path, delimiter=' ')
    labelmap = {}
    for i in range( len(text_labels) ):
        if(text_labels[i][4]<text_labels[i][5]):
            val = [text_labels[i][3], 1, 0]
        else:
            val = [ text_labels[i][3], 0, 1]

        group_A_B = str(int(text_labels[i][0])) +"_"+ str(int(text_labels[i][1])) +"_"+ str(int(text_labels[i][2]))    
        labelmap[group_A_B] = val

    #for k,v in labelmap.items():
        #print k,v

    labelkeys = labelmap.keys()


    fullfilename = glob2.glob(image_folder + '*.JPG')
    filename = []
    for f in fullfilename:
        filename.append( f.replace(image_folder, '') )

    #print "\n".join(sorted(filename))
    groupmap = {}
    for f in filename:
        # file names look like "<group>-<item>.JPG": collect the items of each group
        group, item = f.split("-")[0], f.split("-")[1]
        groupmap.setdefault(group, []).append(item)

    groupkeys = sorted(groupmap.keys())
    #print "Different groups: ", groupkeys 

    image_pairs_with_label = []
    for key in groupkeys:
        #print "Groups of photos : ", key,groupmap[key]
        items = sorted(groupmap[key])
        for i in range( len(items)-1 ):
            for j in range( i+1,len(items) ):
                # j always starts at i+1, so every (i, j) here is a distinct pair
                group_A_B = str(int(key)) +"_"+ str( int(items[i].split(".")[0]) ) +"_"+ str( int(items[j].split(".")[0]) )
                if group_A_B in labelmap:
                    # keep only pairs that have a label; store full paths plus the two-element label
                    image_pairs_with_label.append( [image_folder+key+'-'+items[i], image_folder+key+'-'+items[j], [labelmap[group_A_B][1], labelmap[group_A_B][2]] ] )

    #for i in image_pairs_with_label:
        #print i[0], i[1], [i[2][0], i[2][1]] 
        
    return image_pairs_with_label
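The pairing step in `pairwiseLabelImages` can be illustrated in isolation. The sketch below is not the notebook's code: `pair_keys` is a hypothetical name, the file names are made up, and it sorts items numerically, which matches the string sort used above only when the numbers in the file names are zero-padded.

```python
from itertools import combinations

def pair_keys(filenames):
    """Group names like '000001-02.JPG' by the part before '-' and list all pairs per group.

    Returns keys of the form 'group_A_B' with A < B, the format used to look up labels.
    """
    groups = {}
    for name in filenames:
        group, item = name.split("-")
        groups.setdefault(int(group), []).append(int(item.split(".")[0]))
    keys = []
    for group in sorted(groups):
        for a, b in combinations(sorted(groups[group]), 2):
            keys.append("%d_%d_%d" % (group, a, b))
    return keys

# Made-up file names, for illustration only.
names = ["000001-02.JPG", "000001-01.JPG", "000001-03.JPG",
         "000002-01.JPG", "000002-02.JPG"]
keys = pair_keys(names)
print(keys)
```

A group of n photos yields n(n-1)/2 candidate pairs; the notebook then keeps only the pairs whose key appears in the label file.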

In [6]:
import PIL
from PIL import Image

def getFeatures(single_image_pair_with_label):
    img1 = Image.open(single_image_pair_with_label[0])
    img1=img1.resize((224,224),PIL.Image.ANTIALIAS)
    img1=np.uint8(img1)
    img1= img1[:, :, (2, 1, 0)]
    img1 = img1.transpose((2, 0, 1))

    img2 = np.uint8(Image.open(single_image_pair_with_label[1]).resize((224,224),PIL.Image.ANTIALIAS))
    img2= img2[:, :, (2, 1, 0)]
    img2 = img2.transpose((2, 0, 1))

    # stack the two channel-first BGR images into one 6-channel array
    img3 = np.concatenate((img1,img2))

    #print img1.shape
    #print img2.shape
    #print img3.shape
    #print {'data': net.blobs['data'].data.shape}

    # create transformer for the input called 'data'
    # NOTE: img3 is already channel-first, BGR and in [0, 255] at this point, while the
    #  settings below are written for an H x W x C RGB image in [0, 1]; check that
    #  applying both is intended.
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

    transformer.set_transpose('data', (2,0,1))  # move axis 2 of the input to the front
    #transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel (disabled)
    transformer.set_raw_scale('data', 255)      # multiply values by 255
    transformer.set_channel_swap('data', (5,4,3,2,1,0))  # reverse the order of all six channels

    # set the size of the input (we can skip this if we're happy
    #  with the default; we can also change it later, e.g., for different batch sizes)
    net.blobs['data'].reshape(1,          # batch size
                              6,         # two stacked 3-channel (BGR) images
                              224, 224)  # image size is 224x224

    # Apply the preprocessing set up above to the stacked image pair.
    transformed_image = transformer.preprocess('data', img3)

    # copy the image data into the memory allocated for the net
    net.blobs['data'].data[...] = transformed_image

    ### run a forward pass
    output = net.forward()

    output_prob = output['diff'][0]  # the 'diff' output for the only item in the batch; used as the feature vector

    #print output_prob
    #print 'predicted class is:', output_prob.argmax()
    #print len(output_prob)
    
    return list(output_prob)
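The way `getFeatures` stacks two images into a single 6-channel input can be checked on stand-in arrays. This is a small sketch with made-up constant images, not real photos. It also shows what indexing with `(5,4,3,2,1,0)` does to the stacked array: it reverses all six channels, so the second image's channels come first.

```python
import numpy as np

# Two stand-in images, already 3 x 224 x 224 (channel-first), filled with distinct values.
img_a = np.full((3, 224, 224), 10, dtype=np.uint8)
img_b = np.full((3, 224, 224), 200, dtype=np.uint8)

pair = np.concatenate((img_a, img_b))   # concatenates along axis 0 -> 6 x 224 x 224
batch = pair[np.newaxis, ...]           # add a batch dimension -> 1 x 6 x 224 x 224
print(pair.shape, batch.shape)

# Channels 0-2 come from the first image, channels 3-5 from the second.
print(pair[:3].mean(), pair[3:].mean())

# Reversing all six channels puts the second image's channels first.
swapped = pair[[5, 4, 3, 2, 1, 0]]
print(swapped[:3].mean(), swapped[3:].mean())
```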
    

* In this part we extract the features and their labels and store them in a `.npy` file. These features are loaded later for training and validation.

In [14]:
from sklearn.neural_network import MLPClassifier
import time

def writeMatrixToFile(image_pairs_with_labels_list, filename, max_iterations):
    start = time.time()
    loop_c = 0
    matrix = []
    for image in image_pairs_with_labels_list:

        t0 = time.time()
        features = getFeatures(image)  
        features.append(image[2][0])
        matrix.append(features)

        t = time.time()-t0
        print "loop", loop_c+1, "of", min(len(image_pairs_with_labels_list),max_iterations), "with time " +  str(t)
        loop_c = loop_c + 1
        
        if(loop_c == max_iterations):
            break

    end = time.time()
    print "Total time = ", str(end-start) 
       
    mat = np.array(matrix)
    # type(mat)
    np.save(filename,mat)
    print "Total rows: ", len(matrix), "and columns", len(matrix[0]), "are written into",  filename
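The row layout written by `writeMatrixToFile` (features first, label in the last column) can be tried on a toy matrix. The sketch below uses 4 stand-in features instead of 4096, made-up values, and a hypothetical file name in a temporary directory.

```python
import os
import tempfile
import numpy as np

n_features = 4  # stand-in for the 4096 features per pair
rows = [
    [0.1, 0.2, 0.3, 0.4, 1],   # features followed by the label
    [0.5, 0.6, 0.7, 0.8, 0],
]

path = os.path.join(tempfile.mkdtemp(), "toy_matrix.npy")  # hypothetical file name
np.save(path, np.array(rows))

# Load the matrix back and split it into features and labels.
mat = np.load(path)
X, y = mat[:, :n_features], mat[:, n_features]
print(mat.shape, X.shape, y.shape)
print(y)
```

Because features and label share one array, the label is stored with the same floating-point dtype as the features.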

In [15]:
image_folder = '/home/darshan/ML/pics/'
text_labels = '/home/darshan/ML/train_pairlist.txt'

image_pairs_with_label = pairwiseLabelImages(image_folder, text_labels)

In [16]:
two_way_image_pairs_with_label = []
for i in image_pairs_with_label:
    #print i[0], i[1], [i[2][0], i[2][1]] 
    two_way_image_pairs_with_label.append( [i[0], i[1], [i[2][0], i[2][1]] ] )
    two_way_image_pairs_with_label.append( [i[1], i[0], [i[2][1], i[2][0]] ] )
    
print "Number of pairs : ", len(image_pairs_with_label)
print "Number of two way pairs : ", len(two_way_image_pairs_with_label)

#for i in two_way_image_pairs_with_label:
#    print i[0], i[1], [i[2][0], i[2][1]]

Number of pairs :  3023
Number of two way pairs :  6046


* We used an MLP as our training model since it was found to give the best results among the models we tried. Input layer: 4096 neurons; first hidden layer: 128 neurons; second hidden layer: 128 neurons; output layer: a single neuron for binary classification. We train the model on 2885 feature vectors.
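To make the layer sizes concrete, here is a minimal NumPy sketch of a forward pass through a 4096-128-128-1 network. It is not the scikit-learn implementation: the weights are random stand-ins, and it assumes `tanh` hidden units (the activation set when the classifier is created later in the notebook) and a logistic output unit for the binary decision.

```python
import numpy as np

layer_sizes = [4096, 128, 128, 1]

# Random weights stand in for trained ones; only the shapes matter here.
rng = np.random.RandomState(0)
weights = [rng.randn(n_in, n_out) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Forward pass: tanh on the hidden layers, logistic sigmoid on the single output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h.dot(W) + b)
    logit = h.dot(weights[-1]) + biases[-1]
    return 1.0 / (1.0 + np.exp(-logit))

x = rng.randn(5, 4096)   # five made-up feature vectors
p = forward(x)           # one value in (0, 1) per row
n_params = sum(W.size for W in weights) + sum(b.size for b in biases)
print(p.shape, n_params)
```

With these sizes the network has 541,057 trainable parameters, almost all of them in the first layer.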

In [18]:
writeMatrixToFile(two_way_image_pairs_with_label, "train_matrix_2885.npy", 2885)

loop 1 of 2885 with time 6.39626288414
loop 2 of 2885 with time 6.41474080086
loop 3 of 2885 with time 6.4109377861
loop 4 of 2885 with time 6.35202598572
loop 5 of 2885 with time 6.40195798874
loop 6 of 2885 with time 6.42132282257
loop 7 of 2885 with time 6.38849687576
loop 8 of 2885 with time 6.40144395828
loop 9 of 2885 with time 6.41457605362
loop 10 of 2885 with time 6.47701096535
loop 11 of 2885 with time 6.4445078373
loop 12 of 2885 with time 6.48666214943
loop 13 of 2885 with time 6.4600288868
loop 14 of 2885 with time 6.40508008003
loop 15 of 2885 with time 6.43091201782
loop 16 of 2885 with time 6.37551999092
loop 17 of 2885 with time 6.48843693733
loop 18 of 2885 with time 6.47082686424
loop 19 of 2885 with time 6.50606393814
loop 20 of 2885 with time 6.64659690857
loop 21 of 2885 with time 6.67915391922
loop 22 of 2885 with time 6.37243914604
loop 23 of 2885 with time 6.46111679077
loop 24 of 2885 with time 6.48526906967
loop 25 of 2885 with time 6.39492511749
loop 26 of 2

In [217]:
image_folder = '/home/darshan/ML/pics/'
text_labels = '/home/darshan/ML/val_pairlist.txt'

image_pairs_with_label_val_1 = pairwiseLabelImages(image_folder, text_labels)

print "Number of pairs : ", len(image_pairs_with_label_val_1)

Number of pairs :  100


In [178]:
writeMatrixToFile(image_pairs_with_label_val_1, "validation_matrix_100.npy",100)

loop 1 of 100 with time 32.2952570915
loop 2 of 100 with time 8.38135719299
loop 3 of 100 with time 6.92146492004
loop 4 of 100 with time 6.76979112625
loop 5 of 100 with time 6.58378696442
loop 6 of 100 with time 6.46458482742
loop 7 of 100 with time 6.56590294838
loop 8 of 100 with time 6.73145008087
loop 9 of 100 with time 6.57111501694
loop 10 of 100 with time 6.63628411293
loop 11 of 100 with time 6.48739910126
loop 12 of 100 with time 6.654692173
loop 13 of 100 with time 6.4815890789
loop 14 of 100 with time 6.73180699348
loop 15 of 100 with time 6.83107018471
loop 16 of 100 with time 6.55668711662
loop 17 of 100 with time 6.67246198654
loop 18 of 100 with time 6.56513094902
loop 19 of 100 with time 6.52673316002
loop 20 of 100 with time 6.54611778259
loop 21 of 100 with time 6.42651891708
loop 22 of 100 with time 6.46625590324
loop 23 of 100 with time 6.39951014519
loop 24 of 100 with time 6.39801192284
loop 25 of 100 with time 6.38611388206
loop 26 of 100 with time 6.4471859931

In [196]:
mat=np.load('train_matrix_2885.npy')
print "Training Matrix: ", len(mat), "*", len(mat[0])
X= mat[:,0:4096]
print "X: ",len(X), "*", len(X[0])
y = mat[:,4096]
print "Y: ",len(y),"* 1"

Training Matrix:  2885 * 4097
X:  2885 * 4096
Y:  2885 * 1


In [218]:
mat_validation=np.load('validation_matrix_100.npy')

print "Matrix: ", len(mat_validation), "*", len(mat_validation[0])
X_validation = mat_validation[:,0:4096]
print "X: ",len(X_validation), "*", len(X_validation[0])
y_validation = mat_validation[:,4096]
print "Y: ",len(y_validation),"* 1"

Matrix:  100 * 4097
X:  100 * 4096
Y:  100 * 1


* Here we train the model on the 2885 feature vectors, one per image pair. The MLP classifier with the configuration below gave us the best fit.

In [222]:
from sklearn.neural_network import MLPClassifier
import time


clf = MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(128, 128), random_state=1, activation='tanh')

start = time.time()

clf.fit(X,y) 

end = time.time()
print "Total time to fit = ", str(end-start) 

pred_train = list(clf.predict(X))

Total time to fit =  16.0360620022


* In this part we test our model on 100 different pairs of images. The validation accuracy was found to be 66%. The state-of-the-art model gives an accuracy of around 78%.

In [228]:
pred_validation = list(clf.predict(X_validation))
#print pred_validation
#print y_validation
print "Training Accuracy =", float(sum(y==pred_train))/len(pred_train)
print "Validation Accuracy =", float(sum(y_validation==pred_validation))/len(pred_validation)
#print (y_validation==pred_validation)

Training Accuracy = 0.622183708839
Validation Accuracy = 0.66
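The accuracy figures above are the fraction of predictions that match the labels. The sketch below restates that computation on made-up labels and, as one possible reference point, also computes the accuracy of always predicting the most frequent label. The helper names `accuracy` and `majority_baseline` are hypothetical and not part of the notebook.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    y_true = np.asarray(y_true)
    values, counts = np.unique(y_true, return_counts=True)
    return float(counts.max()) / len(y_true)

# Made-up labels and predictions, for illustration only.
y_toy = [1, 0, 1, 1, 0]
pred_toy = [1, 1, 1, 0, 0]
print(accuracy(y_toy, pred_toy))    # 3 of 5 predictions match
print(majority_baseline(y_toy))     # label 1 appears 3 times out of 5
```

Comparing a model's accuracy with such a baseline on the same labels helps show how much of the score comes from the features rather than from label imbalance.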


* The configuration used for the best MLP classifier model is listed below.

In [208]:
clf.get_params(deep=True)

{'activation': 'tanh',
 'alpha': 1e-05,
 'batch_size': 'auto',
 'beta_1': 0.9,
 'beta_2': 0.999,
 'early_stopping': False,
 'epsilon': 1e-08,
 'hidden_layer_sizes': (128, 128),
 'learning_rate': 'constant',
 'learning_rate_init': 0.001,
 'max_iter': 200,
 'momentum': 0.9,
 'nesterovs_momentum': True,
 'power_t': 0.5,
 'random_state': 1,
 'shuffle': True,
 'solver': 'adam',
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': False,
 'warm_start': False}