## YOLO_v3 inspired model

- This model takes input images of shape (720,720,1). Each prediction uses 7 values per cell, so the output has shape (15,15,7) with one prediction per cell and (15,15,14) with two. The k-means clustering analysis done in the clean data step found two distinct clusters of images, corresponding to open and closed palms. This is due to the inherent image ratio of an open and a closed palm. With these two clusters, I am building the model to recognize these two different ratios. The y (target) shape is (15,15,7).
- This model is inspired by the YOLO_v3 model and as such is composed of residual blocks, batch normalization, and other structural elements.
- I will be using mini-batch gradient descent with Adam optimization, but will not be using an iteratively decreasing learning rate.

Note - given time constraints, this iteration of the model uses one prediction per cell rather than the 2 specified earlier.
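To make the (15,15,7) target shape concrete, here is a minimal sketch of how one bounding box could be written into such a grid. The helper `encode_box` is hypothetical and not part of this notebook. It assumes that the box centre is stored relative to its cell, that width and height are stored relative to the whole image, and that the first grid axis is the row (y) direction. The channel order (confidence, mid_x, mid_y, width, length, prob_open_palm, prob_close_palm) follows the order used later in the cost function cells.

```python
import numpy as np

GRID = 15                   # cells per side, from the (15, 15, 7) target shape
IMAGE_SIZE = 720            # input images are 720 x 720 pixels
CELL = IMAGE_SIZE // GRID   # 48 pixels per cell

def encode_box(mid_x_px, mid_y_px, width_px, height_px, is_open_palm):
    """Hypothetical helper: encode one box as a (15, 15, 7) target grid."""
    target = np.zeros((GRID, GRID, 7), dtype=np.float32)
    col = min(int(mid_x_px // CELL), GRID - 1)
    row = min(int(mid_y_px // CELL), GRID - 1)
    # Assumption: centre relative to the cell, size relative to the image.
    target[row, col] = [
        1.0,                                 # confidence: an object is in this cell
        (mid_x_px - col * CELL) / CELL,      # mid_x within the cell
        (mid_y_px - row * CELL) / CELL,      # mid_y within the cell
        width_px / IMAGE_SIZE,               # width as a fraction of the image
        height_px / IMAGE_SIZE,              # length (height) as a fraction of the image
        1.0 if is_open_palm else 0.0,        # prob_open_palm
        0.0 if is_open_palm else 1.0,        # prob_close_palm
    ]
    return target, (row, col)

target, cell = encode_box(24, 24, 180, 180, is_open_palm=True)
print(cell, target[cell])
```

Under these assumptions, a 180 x 180 pixel open palm centred at pixel (24, 24) lands in the top-left cell.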

In [1]:
import numpy as np
import tensorflow as tf
import pandas as pd
from keras import backend as K
import matplotlib.pyplot as plt
import latex
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


#### Loading training data, shuffling, and creating a test subset for testing model performance

In [2]:
# Loading data
X = np.load("../../data/dinorunner/images.npy")
y = np.load("../../data/dinorunner/encodings.npy")
print(X.shape)
print(y.shape)

(396, 720, 720, 1)
(396, 15, 15, 7)


In [3]:
# Normalizing image data
X = X / 255

In [4]:
# shuffling X and y together so that image/encoding pairs stay aligned
X, y = shuffle(X, y, random_state=1)
print(X.shape)
print(y.shape)

(396, 720, 720, 1)
(396, 15, 15, 7)


In [17]:
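# Creating testing set
# X and y must still be the NumPy arrays here. The test cells further down rebind X and y
# to TensorFlow placeholders, and running this cell after them is the likely cause of the TypeError below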
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=1)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

TypeError: unsupported operand type(s) for *: 'Dimension' and 'float'

In [6]:
ay = np.zeros((1,15,15,7))
ay[0,0,0,:] = np.array([1,0.5,0.5,0.25,0.25,1,0]) # top left corner
az = np.zeros((1,15,15,14))
az[0,0,0,0:7] = np.array([0.8,0.25,0.25,0.2,0.2,0.8,0.2])
az[0,0,1,0] = 1
az[0,1,0,0] = 1

### Building YOLO model

#### YOLO cost function:

$$ \lambda_{coord} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{obj} \bigg[(x_i-\hat{x_{i}})^2 + (y_i - \hat{y_i})^2\bigg]$$
$$ + \lambda_{coord} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{obj} \bigg[(\sqrt{w_i}-\sqrt{\hat{w_{i}}})^2 + (\sqrt{h_i} - \sqrt{\hat{h_i}})^2\bigg] $$
$$ + \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{obj} (c_i-\hat{c_{i}})^2 $$
$$ + \lambda_{noobj} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{noobj} (c_i-\hat{c_{i}})^2 $$
$$ + \sum_{i=0}^{S^{2}} 1_{i}^{obj} \sum_{c \in classes} (p_i(c)-\hat{p_{i}}(c))^2 $$

Note - among the B boxes predicted in a cell, the one responsible for an object is the box that has the highest IoU with the ground-truth box
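As an illustration of the IoU rule in the note above, the sketch below computes intersection over union for two boxes given as (mid_x, mid_y, width, height) and picks the responsible predictor among several candidates. The function name `iou` and the example boxes are made up for this illustration. With one prediction per cell, as in this iteration of the model, the selection step is trivial.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (mid_x, mid_y, width, height) in the same units."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Overlap along each axis, clipped at zero when the boxes do not touch
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

# Made-up example: one true box and two candidate predictions for the same cell
truth = (0.5, 0.5, 0.25, 0.25)
candidates = [(0.5, 0.5, 0.5, 0.5), (0.52, 0.5, 0.25, 0.25)]

# The responsible predictor is the candidate with the highest IoU
responsible = max(range(len(candidates)), key=lambda j: iou(candidates[j], truth))
print(responsible, [iou(c, truth) for c in candidates])
```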

Terms:
- S<sup>2</sup>: the number of cells in an image (15x15)
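- B: the number of bounding boxes predicted per cell (1)
- 1<sup>obj</sup><sub>ij</sub>: denotes that the j-th bounding box predictor in cell i is responsible for the prediction
- 1<sup>noobj</sup><sub>ij</sub>: denotes that the j-th bounding box predictor in cell i is not responsible for any object
- 1<sup>obj</sup><sub>i</sub>: denotes if an object appears in cell i
- c<sub>i</sub>: confidence score for whether there is an object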
- lambda<sub>coord</sub>: (5) weight factor that increases loss from bounding box predictions 
- lambda<sub>noobj</sub>: (0.5) weight factor that decreases loss from predictions for boxes that don't contain objects
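The five sums can be checked by hand with a small NumPy reference. This is only a sketch for the one-prediction-per-cell case, not the TensorFlow implementation used for training. The function name `yolo_cost_reference` is hypothetical. It mirrors one choice made in this notebook's cost function: a predicted width or height is square-rooted only when it is positive and compared as-is otherwise. The example arrays are the same toy values used to test the TensorFlow cost function in this notebook.

```python
import numpy as np

def yolo_cost_reference(pred, truth, coord=5.0, noobj=0.5):
    """NumPy sketch of the cost above for B = 1 (hypothetical helper)."""
    pred = pred[..., :7]                 # one prediction per cell: first 7 channels
    obj = truth[..., 0] > 0              # cells that contain an object
    no_obj = truth[..., 0] < 1           # cells that do not

    p, t = pred[obj], truth[obj]         # shape (number of object cells, 7)

    xy_term = np.sum((t[:, 1:3] - p[:, 1:3]) ** 2)

    # sqrt only for positive predicted sizes; non-positive ones are compared as-is
    p_wh = np.where(p[:, 3:5] > 0, np.sqrt(np.abs(p[:, 3:5])), p[:, 3:5])
    wh_term = np.sum((np.sqrt(t[:, 3:5]) - p_wh) ** 2)

    conf_obj = np.sum((t[:, 0] - p[:, 0]) ** 2)
    conf_noobj = np.sum((truth[no_obj][:, 0] - pred[no_obj][:, 0]) ** 2)
    class_term = np.sum((t[:, 5:7] - p[:, 5:7]) ** 2)

    return coord * xy_term + coord * wh_term + conf_obj + noobj * conf_noobj + class_term

truth = np.zeros((1, 15, 15, 7))
truth[0, 0, 0, :] = [1, 0.5, 0.5, 0.25, 0.25, 1, 0]
pred = np.zeros((1, 15, 15, 14))
pred[0, 0, 0, 0:7] = [0.8, 0.25, 0.25, 0.2, 0.2, 0.8, 0.2]
pred[0, 0, 1, 0] = 1   # two confident predictions in empty cells
pred[0, 1, 0, 0] = 1

cost = yolo_cost_reference(pred, truth)
print(round(cost, 4))  # 1.7729
```

Term by term this is 0.625 (centre) + 0.0279 (size) + 0.04 (object confidence) + 1.0 (no-object confidence) + 0.08 (classes).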

### Tensorflow placeholders

In [7]:
# Placeholder values for input X,y data
def get_placeholders(x_h,x_w,x_c,y_h,y_w,y_c):
    """
    x_h: Height for x input 
    x_w: Width for x input
    x_c: Channels for x input
    y_h: Height for y input
    y_w: Width for y input
    y_c: Channels for y input
    """
    X = tf.placeholder(tf.float32, name="X", shape=(None,x_h,x_w,x_c))
    y = tf.placeholder(tf.float32, name="y", shape=(None,y_h,y_w,y_c))
    return X,y

In [8]:
# Testing placeholders
tf.reset_default_graph()
with tf.Session() as sess:
    X,y = get_placeholders(720,720,1,15,15,14)
    print("X shape:",X.shape)
    print("y shape:",y.shape)

X shape: (?, 720, 720, 1)
y shape: (?, 15, 15, 14)


### Tensorflow forward prop

In [9]:
"""
Standard residual block which has the same input shape as output shape
Correspond with 1. conv2d filter(1,1) "valid" 2. conv2d filter(3,3) "same"
"""
def same_identity(the_input,nf,sl):
    """
    the_input: output from a previous layer of conv net
    nf: number of filters for the same_identity block
    sl: the number of the first layer in this block
    """
    shortcut = the_input # saving previous activation
    
    Z1 = tf.layers.conv2d(the_input,filters=nf,kernel_size=[1,1],strides=(1,1),padding="valid",name="Z"+str(sl),kernel_initializer=tf.contrib.layers.xavier_initializer(seed=5))
    Bn1 = tf.layers.batch_normalization(Z1,name="Bn"+str(sl))
    A1 = tf.nn.leaky_relu(Bn1,alpha=0.1,name="A"+str(sl))
    
    Z2 = tf.layers.conv2d(A1,filters=nf,kernel_size=[3,3],strides=(1,1),padding="same",name="Z"+str(sl+1),kernel_initializer=tf.contrib.layers.xavier_initializer(seed=5))
    Bn2 = tf.layers.batch_normalization(Z2,name="Bn"+str(sl+1))
    
    # updating old residual to new size and channel
    shortcut_Z = tf.layers.conv2d(shortcut,filters=nf,kernel_size=[3,3],strides=(1,1),padding="same",name="shortcut_Z"+str(sl+1),kernel_initializer=tf.contrib.layers.xavier_initializer(seed=5))
    shortcut_Bn = tf.layers.batch_normalization(shortcut_Z,name="shortcut_Bn"+str(sl+1))
    newZ = tf.add(Bn2,shortcut_Bn,name="resid_add"+str(sl+1)) # adding old residual
    A2 = tf.nn.leaky_relu(newZ,alpha=0.1,name="A"+str(sl+1))
    
    return A2

In [10]:
"""
Standard residual block which does not have the same input shape as output shape
Correspond with 1. conv2d filter(1,1) "valid" 2. conv2d filter(3,3) "valid"
"""
def valid_identity(the_input,nf,sl):
    shortcut = the_input # saving previous activation
    
    Z1 = tf.layers.conv2d(the_input,filters=nf,kernel_size=[1,1],strides=(1,1),padding="valid",name="Z"+str(sl),kernel_initializer=tf.contrib.layers.xavier_initializer(seed=5))
    Bn1 = tf.layers.batch_normalization(Z1,name="Bn"+str(sl))
    A1 = tf.nn.leaky_relu(Bn1,alpha=0.1,name="A"+str(sl))
    
    Z2 = tf.layers.conv2d(A1,filters=nf,kernel_size=[3,3],strides=(1,1),padding="valid",name="Z"+str(sl+1),kernel_initializer=tf.contrib.layers.xavier_initializer(seed=5))
    Bn2 = tf.layers.batch_normalization(Z2,name="Bn"+str(sl+1))
    
    # updating old residual to new size and channel
    shortcut_Z = tf.layers.conv2d(shortcut,filters=nf,kernel_size=[3,3],strides=(1,1),padding="valid",name="shortcut_Z"+str(sl+1),kernel_initializer=tf.contrib.layers.xavier_initializer(seed=5))
    shortcut_Bn = tf.layers.batch_normalization(shortcut_Z,name="shortcut_Bn"+str(sl+1))
    newZ = tf.add(Bn2,shortcut_Bn,name="resid_add"+str(sl+1)) # adding old residual
    A2 = tf.nn.leaky_relu(newZ,alpha=0.1,name="A"+str(sl+1))
    
    return A2

In [11]:
"""
Forward pass using residual blocks, batch normalization, leaky relu
Note that these residual blocks jump over a single layer
"""
def forward_pass(X,out):
    """
    Input image or images: X -shape(?,720,720,1)
    out - specifies how many predictions per cell you want, multiple of 7
    """
    # First layer
    input_layer = tf.reshape(X,[-1,720,720,1])
    Z = tf.layers.conv2d(input_layer,filters=4,kernel_size=[5,5],strides=(1,1),padding="same",name="Z1",kernel_initializer=tf.contrib.layers.xavier_initializer(seed=0))
    Bn = tf.layers.batch_normalization(Z,name="Bn1")
    A = tf.nn.leaky_relu(Bn,alpha=0.1,name="A1")
    P1 = tf.layers.max_pooling2d(A,pool_size=[2,2],strides=2,padding="valid",name="P1") # shape (360,360,4)
    # Block 1
    B1 = same_identity(P1,8,2)
    B2 = valid_identity(B1,16,4)
    B2_pool = tf.layers.max_pooling2d(B2,pool_size=[2,2],strides=2,padding="valid",name="P1") # shape (179,179,16)
    # Block 2
    B3 = same_identity(B2_pool,32,6)
    B4 = valid_identity(B3,64,8)
    B4_pool = tf.layers.max_pooling2d(B4,pool_size=[2,2],strides=2,padding="valid",name="P2") # shape (88,88,64)
    # Block 3
    B5 = same_identity(B4_pool,128,10)
    B6 = valid_identity(B5,256,12)
    B7 = same_identity(B6,128,14)
    B8 = valid_identity(B7,256,16)
    B8_pool = tf.layers.max_pooling2d(B8,pool_size=[2,2],strides=2,padding="valid",name="P3") # shape (42,42,256)
    # Block 4
    B9 = same_identity(B8_pool,256,18)
    B10 = valid_identity(B9,512,20)
    B11 = same_identity(B10,256,22)
    B12 = valid_identity(B11,512,24)
    B12_pool = tf.layers.max_pooling2d(B12,pool_size=[2,2],strides=2,padding="valid",name="P4") # shape (19,19,512)
    # Block 5
    B13 = same_identity(B12_pool,512,26)
    B14 = valid_identity(B13,1024,28)
    B15 = same_identity(B14,512,30)
    B16 = valid_identity(B15,1024,32) # shape (15,15,1024)
    # Final layer - no batch norm, linear activation
    Z34 = tf.layers.conv2d(B16,filters=out,kernel_size=[1,1],strides=(1,1),padding="valid",name="Z34",activation=None)
    return Z34

In [12]:
# Testing forward prop
tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X,y = get_placeholders(720,720,1,15,15,7)
    Z34 = forward_pass(X,out=14)
    init = tf.global_variables_initializer()
    sess.run(init)
    aZ = sess.run(Z34,feed_dict={X:np.random.randn(3,720,720,1),y:np.random.randn(3,15,15,7)})
    print("Z shape:", str(aZ.shape))

Z shape: (3, 15, 15, 14)
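The 15x15 output grid follows from plain shape arithmetic: a "same" convolution with stride 1 keeps the spatial size, each `valid_identity` block trims 2 pixels (one 3x3 "valid" convolution), and each 2x2 max pool with stride 2 halves the size, rounding down. The sketch below traces only the height/width through the layers of `forward_pass`; the helper names are made up for this illustration.

```python
def conv_out(size, kernel, padding, stride=1):
    """Spatial output size of a convolution along one axis."""
    if padding == "same":
        return -(-size // stride)            # ceil(size / stride)
    return (size - kernel) // stride + 1     # "valid" padding

def pool_out(size, pool=2, stride=2):
    """Spatial output size of a "valid" max pool along one axis."""
    return (size - pool) // stride + 1

def trace_spatial_size(size=720):
    sizes = []
    size = pool_out(conv_out(size, 5, "same"))   # first 5x5 conv, then pooling
    sizes.append(("P1", size))
    # Number of (same_identity, valid_identity) pairs in blocks 1 to 5;
    # every block except the last is followed by a max pool.
    for block, pairs in enumerate([1, 1, 2, 2, 2], start=1):
        for _ in range(pairs):
            size = conv_out(size, 3, "same")     # same_identity keeps the size
            size = conv_out(size, 3, "valid")    # valid_identity trims 2 pixels
        if block < 5:
            size = pool_out(size)
        sizes.append(("block %d" % block, size))
    return sizes

for name, size in trace_spatial_size(720):
    print(name, size)
```

For a 720 pixel input this gives 360, 179, 88, 42, 19 and finally 15, which matches the `Z shape` printed by the test cell.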


### Tensorflow cost function

In [13]:
# Returns the values with a specific mask applied to it
def get_box_values(box,mask):
    """
    Index:
    0: confidence there is an object in cell, 1: mid_x, 2: mid_y, 
    3: width, 4: length (height), 5: prob_open_palm, 6: prob_close_palm
    """
    confidence = tf.boolean_mask(box[:,:,:,0:1],mask)
    mid_x = tf.boolean_mask(box[:,:,:,1:2],mask)
    mid_y = tf.boolean_mask(box[:,:,:,2:3],mask)
    width = tf.boolean_mask(box[:,:,:,3:4],mask)
    length = tf.boolean_mask(box[:,:,:,4:5],mask)
    prob_open = tf.boolean_mask(box[:,:,:,5:6],mask)
    prob_close = tf.boolean_mask(box[:,:,:,6:7],mask)
    # keys "d" (open palm) and "c" (closed palm) are kept because cost_function reads them
    box = {"co":confidence, "mx":mid_x,"my":mid_y,"w":width,"l":length,"d":prob_open,"c":prob_close}
    return box
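As a rough NumPy analogue of the masking in `get_box_values`, the sketch below applies a confidence mask of shape (?,15,15,1) to a single-channel slice. The toy values are made up. The point is that the result is a flat vector with one entry per selected cell, so its length depends on how many object cells the batch contains.

```python
import numpy as np

# Toy batch: 2 images, 15x15 grid, 7 channels (confidence first)
grid = np.zeros((2, 15, 15, 7), dtype=np.float32)
grid[0, 0, 0] = [1, 0.5, 0.5, 0.25, 0.25, 1, 0]
grid[1, 3, 4] = [1, 0.1, 0.9, 0.30, 0.40, 0, 1]

mask = grid[:, :, :, 0:1] > 0        # shape (2, 15, 15, 1), True where an object exists
widths = grid[:, :, :, 3:4][mask]    # analogue of tf.boolean_mask(box[:,:,:,3:4], mask)

print(mask.shape, widths.shape, widths)
```

Here `widths` has shape (2,), one width per object cell in the batch.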

In [14]:
# A component of this cost function is that it heavily penalizes negative predictions for height and width
# This choice was made because a negative height or width is impossible
# This cost function works when there is one prediction per cell
def cost_function(Z,y,coord=5,noobj=0.5):
    """
    Z - shape (?,15,15,7)
    y - shape (?,15,15,7)
    """
    c_mask_true = y[:,:,:,0:1] > 0
    c_mask_false = y[:,:,:,0:1] < 1
    
    y_v = get_box_values(y,c_mask_true)
    m_v = get_box_values(Z,c_mask_true)
    mv_f = get_box_values(Z,c_mask_false)
    y_f = get_box_values(y,c_mask_false)
    
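    # taking the square root only where the predicted width and height are positive
    # tf.where works elementwise, so any number of object cells in the batch is supported
    # (tf.maximum keeps the unused sqrt branch finite for non-positive predictions)
    m_v["w"] = tf.where(m_v["w"]>0, tf.sqrt(tf.maximum(m_v["w"],1e-9)), m_v["w"])
    m_v["l"] = tf.where(m_v["l"]>0, tf.sqrt(tf.maximum(m_v["l"],1e-9)), m_v["l"])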
    
    y_v["w"] = tf.sqrt(y_v["w"])
    y_v["l"] = tf.sqrt(y_v["l"])
    
    # correspond to individual summations of the cost function:
    part1 = coord * tf.reduce_sum(tf.square(y_v["mx"]-m_v["mx"])+tf.square(y_v["my"]-m_v["my"]))
    part2 = coord * tf.reduce_sum(tf.square(y_v["w"]-m_v["w"])+tf.square(y_v["l"]-m_v["l"]))
    part3 = tf.reduce_sum(tf.square(y_v["co"]-m_v["co"]))
    part4 = noobj * tf.reduce_sum(tf.square(y_f["co"]-mv_f["co"]))
    part5 = tf.reduce_sum(tf.add(tf.square(y_v["d"]-m_v["d"]),tf.square(y_v["c"]-m_v["c"])))# if obj in cell, if bounding box is highest IoU, compare class predictions
    total_cost = part1 + part2 + part3 + part4 + part5
    return total_cost

In [15]:
# Testing cost function
# expected cost, computed by hand and rounded, is 1.773
ay = np.zeros((1,15,15,7))
ay[0,0,0,:] = np.array([1,0.5,0.5,0.25,0.25,1,0]) # top left corner
az = np.zeros((1,15,15,14))
az[0,0,0,0:7] = np.array([0.8,0.25,0.25,0.2,0.2,0.8,0.2])
az[0,0,1,0] = 1
az[0,1,0,0] = 1

with tf.Session() as sess:
    y = tf.placeholder(tf.float32,shape=(None,15,15,7))
    Z = tf.placeholder(tf.float32,shape=(None,15,15,14))
    aCost = cost_function(Z,y)
    init = tf.global_variables_initializer()
    sess.run(init)
    tot = sess.run(aCost,feed_dict={Z:az,y:ay})
    print(tot)

1.7728641


### Tensorflow model

In [16]:
# Creates shuffled mini batches
def random_mini_batches(X, y, mini_batch_size, seed):
    """
    Creates a list of random minibatches from (X, y)
    Samples left over after the last full minibatch are dropped
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_y)
    """
    rounds = X.shape[0] // mini_batch_size # Max number of full minibatches
    X_shuffle, y_shuffle = shuffle(X, y, random_state=seed) # same permutation for both
    mini_batches = []
    
    for k in range(rounds):
        start = k * mini_batch_size # used to siphon off matching sections of X and y
        x_mini = X_shuffle[start:start+mini_batch_size]
        y_mini = y_shuffle[start:start+mini_batch_size]
        mini_batches.append((x_mini,y_mini))
    
    return mini_batches
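`random_mini_batches` keeps only full minibatches, so when the number of samples is not a multiple of the batch size, the leftover samples are skipped in that epoch. If that matters, a variant that also yields the final partial batch could look like the sketch below. The function name `mini_batches_with_remainder` is hypothetical, and it uses a NumPy permutation instead of `sklearn.utils.shuffle`, so the order differs from the notebook's function for the same seed.

```python
import numpy as np

def mini_batches_with_remainder(X, y, mini_batch_size, seed):
    """Sketch: shuffled minibatches that include the final partial batch."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(X.shape[0])   # one permutation shared by X and y
    batches = []
    for start in range(0, X.shape[0], mini_batch_size):
        idx = order[start:start + mini_batch_size]
        batches.append((X[idx], y[idx]))
    return batches

# Toy data: 25 samples, where each label is twice its input
X_toy = np.arange(25).reshape(25, 1)
y_toy = X_toy * 2
batches = mini_batches_with_remainder(X_toy, y_toy, mini_batch_size=10, seed=1)
print([len(bx) for bx, _ in batches])  # [10, 10, 5]
```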

In [1]:
# Building and training YOLO model
def model(X_train,y_train,lr=0.001,minibatch_size=10,num_epochs=200,print_cost=True):
    tf.reset_default_graph() # resetting graph
    tf.set_random_seed(1)
    seed=0
    out=7 # number of output channels per cell: 7 values per prediction, one prediction per cell
    costs=[]
    x_h = X_train[0].shape[0]
    x_w = X_train[0].shape[1]
    x_c = X_train[0].shape[2]
    y_h = y_train[0].shape[0]
    y_w = y_train[0].shape[1]
    y_c = y_train[0].shape[2]
    m = X_train.shape[0]
    
    X,y = get_placeholders(x_h,x_w,x_c,y_h,y_w,y_c)
    Z = forward_pass(X,out)
    cost = cost_function(Z,y)
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cost)
    
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Loading saved model
        #saver = tf.train.import_meta_graph("../../structured_dl_files/models/yolo_model.ckpt.meta")
        #saver.restore(sess, "../../structured_dl_files/models/yolo_model.ckpt")
        sess.run(init) # DONT RUN INIT IF LOADING MODEL
        
        for epoch in range(num_epochs):
            minibatch_cost = 0
            seed += 1
            minibatches = random_mini_batches(X_train, y_train, minibatch_size, seed)
            
            for minibatch in minibatches:
                (mini_x,mini_y) = minibatch
                _,temp_cost = sess.run([optimizer,cost], feed_dict={X:mini_x,y:mini_y})
                minibatch_cost += temp_cost
                
            costs.append(minibatch_cost) # storing the summed cost for this epoch, not the cost tensor
            if print_cost and epoch % 1 == 0:
                print("Cost at epoch {}: {}".format(epoch+1,minibatch_cost))
                
        loc = saver.save(sess, "../../data/dinorunner/models/yolo_model.ckpt")
        return costs