## Rearing Prediction using Support Vector Machines


![Rearing prediction flowchart](rearing_flowchart.png)
 
Here our goal is to predict whether the animal is rearing at each time step (video frame), given the marker predictions we derived from DeepLabCut. We use the marker predictions file (.csv) and the manual rearing annotations (.xlsx) to train a Support Vector Machine (SVM) that learns to predict when the animal is rearing.

 

In [None]:
# library imports
# (pd.read_excel needs the openpyxl package installed to read .xlsx files)

import random
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.metrics import zero_one_loss
from sklearn.metrics import confusion_matrix


To train the model, we need the prediction output from DeepLabCut as well as ground-truth annotations for rearing. Modify the file paths below to point to these files on your local machine.

In [4]:
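# Load the DeepLabCut marker predictions and the manual rearing annotations
prediction_dir='/Users/goutham/Documents/Senior_year/research_design/Test7PART1DLC_resnet50_Trial3Mar23shuffle1_10000.csv'  # DeepLabCut .csv output
rearing_dat='/Users/goutham/Documents/Senior_year/research_design/trial7_rearing_first10min.xlsx'  # manual rearing annotations

# header=1 uses the body-part row of the DeepLabCut header as column names,
# so x, y and likelihood appear as e.g. snout, snout.1, snout.2
prediction_data=pd.read_csv(prediction_dir,header=1)
rearing_gt=pd.read_excel(rearing_dat,header=0)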
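DeepLabCut writes three header rows (scorer, body parts, coordinate names), which is why `header=1` is used and why the frame slices below start at row 1. The sketch below shows this on a tiny made-up excerpt; the scorer name and the values are placeholders, not data from this study.

```python
from io import StringIO

import pandas as pd

# Made-up excerpt in DeepLabCut's layout:
# row 0 = scorer, row 1 = body-part names, row 2 = coordinate names.
dlc_csv = StringIO(
    "scorer,DLC_resnet50,DLC_resnet50,DLC_resnet50\n"
    "bodyparts,snout,snout,snout\n"
    "coords,x,y,likelihood\n"
    "0,10.0,20.0,0.99\n"
    "1,11.0,21.0,0.98\n"
)

# header=1 makes the body-part row the column names; pandas de-duplicates
# the repeated names as snout, snout.1, snout.2 (x, y, likelihood).
df = pd.read_csv(dlc_csv, header=1)
print(list(df.columns))      # ['bodyparts', 'snout', 'snout.1', 'snout.2']

# The "coords" row survives as the first data row, so numeric frames start at iloc[1].
print(df.iloc[0].tolist())   # ['coords', 'x', 'y', 'likelihood']
snout_xy = df.iloc[1:][["snout", "snout.1"]].to_numpy().astype(float)
print(snout_xy)
```

Because that leftover text row makes every column a string column, the `.astype(float)` conversions in the feature cell are required.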



These are some preprocessing steps. The camera used in our study recorded at 15 frames/second, so the frame indices below encode our chosen start and stop times (training on minutes 5 to 7.5, testing on minutes 7.5 to 10). Modify them for other experiments.

In [5]:
# split the recording into training and test windows (15 frames/s)

pred_data_train=prediction_data.iloc[1:2175]  # minute 5 to minute 7.5 (row 0 is the leftover "coords" header row)
pred_data_test=prediction_data.iloc[2176:4425]  # minute 7.5 to minute 10

rearing_gt_train=rearing_gt.iloc[4576:6750,1].to_numpy().astype(float)  # 5 s offset (5*15 = 75 frames) because dow started 5 s late
rearing_gt_test=rearing_gt.iloc[6751:,1].to_numpy().astype(float)
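The hard-coded slice bounds follow from the 15 frames/s rate and the 5 s annotation delay. This sketch reproduces that arithmetic so the bounds can be recomputed for another experiment; `seconds_to_frames` is a hypothetical helper, not part of the original notebook.

```python
FPS = 15  # camera frame rate used in this study (frames per second)


def seconds_to_frames(seconds, fps=FPS):
    """Convert a time in seconds to a frame index (hypothetical helper)."""
    return int(round(seconds * fps))


offset = seconds_to_frames(5)             # 5 s annotation delay -> 75 frames
train_start = seconds_to_frames(5 * 60)   # minute 5 -> frame 4500
train_end = seconds_to_frames(7.5 * 60)   # minute 7.5 -> frame 6750

# Annotation row = prediction row + shift, which pairs prediction rows 1:2175
# with annotation rows 4576:6750, as in the cell above.
shift = train_start + offset
print(offset, train_start, train_end, shift)  # 75 4500 6750 4575
```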

We chose these features empirically: they separated rearing from non-rearing frames well in our data. We used six features in total: the Euclidean distances from the snout to the tail base, to the right hip, and to the left hip, and the DeepLabCut model's confidence (likelihood) in its predictions of the centroid, the right centroid, and the left centroid.

In [6]:
def Euclidean_distance(x_1,y_1,x_2,y_2):
    # element-wise Euclidean distance between points (x_1, y_1) and (x_2, y_2)
    euc_dist=np.sqrt((((x_1-x_2)**2) + ((y_1-y_2)**2)))
    return euc_dist


# x, y pixel coordinates of the body parts used below
snout_xy_train=pred_data_train[["snout","snout.1"]].to_numpy().astype(float)
tailbase_xy_train=pred_data_train[["tailbase","tailbase.1"]].to_numpy().astype(float)
hip_right_xy_train=pred_data_train[["righthip","righthip.1"]].to_numpy().astype(float)
hip_left_xy_train=pred_data_train[["lefthip","lefthip.1"]].to_numpy().astype(float)


# features
distance_snout_tailbase=Euclidean_distance(snout_xy_train[:,0],snout_xy_train[:,1],tailbase_xy_train[:,0],tailbase_xy_train[:,1])
distance_snout_right_hip=Euclidean_distance(snout_xy_train[:,0],snout_xy_train[:,1],hip_right_xy_train[:,0],hip_right_xy_train[:,1])
distance_snout_left_hip=Euclidean_distance(snout_xy_train[:,0],snout_xy_train[:,1],hip_left_xy_train[:,0],hip_left_xy_train[:,1])
# DeepLabCut likelihoods (".2" columns) for the three centroid markers
prob_centroid=pred_data_train[["centroid.2","rightcentroid.2","leftcentroid.2"]].to_numpy().astype(float)

features=np.column_stack((distance_snout_tailbase,distance_snout_right_hip,distance_snout_left_hip,prob_centroid))
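The same six features are needed for `pred_data_test` and for any new trial, so one option is to wrap the feature cell in a function. `build_features` is a hypothetical helper that assumes the column naming produced by `header=1`; it uses `np.hypot`, which computes the same distance as `Euclidean_distance` above.

```python
import numpy as np
import pandas as pd


def build_features(pred):
    """Six rearing features from a DeepLabCut DataFrame read with header=1 (hypothetical helper)."""
    def xy(part):
        return pred[[part, part + ".1"]].to_numpy().astype(float)

    snout = xy("snout")
    dists = []
    for part in ("tailbase", "righthip", "lefthip"):
        delta = snout - xy(part)
        dists.append(np.hypot(delta[:, 0], delta[:, 1]))  # Euclidean distance per frame
    # DeepLabCut likelihoods are stored in each marker's ".2" column
    probs = pred[["centroid.2", "rightcentroid.2", "leftcentroid.2"]].to_numpy().astype(float)
    return np.column_stack(dists + [probs])


# Two synthetic frames with hand-checkable distances
demo = pd.DataFrame({
    "snout": [0.0, 1.0], "snout.1": [0.0, 1.0],
    "tailbase": [3.0, 1.0], "tailbase.1": [4.0, 1.0],
    "righthip": [0.0, 4.0], "righthip.1": [5.0, 5.0],
    "lefthip": [6.0, 1.0], "lefthip.1": [8.0, 2.0],
    "centroid.2": [0.9, 0.1], "rightcentroid.2": [0.8, 0.2], "leftcentroid.2": [0.7, 0.3],
})
X = build_features(demo)
print(X.shape)  # (2, 6)
```

With the notebook's data, `build_features(pred_data_test)` would give test-window features in the same column order as `features`.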



Here we create a balanced dataset (an equal number of instances per class) for training and for holdout testing.

In [7]:

# Split frame indices by class: label 0 = not rearing, any other label = rearing
negative_idx=np.flatnonzero(rearing_gt_train==0)
positive_idx=np.flatnonzero(rearing_gt_train!=0)

positive_instances_pred=features[positive_idx]
positive_rearing=rearing_gt_train[positive_idx]
negative_instances_pred=features[negative_idx]
negative_rearing=rearing_gt_train[negative_idx]

# Per-feature class means; their absolute difference is a quick check of how well each feature separates the classes
averages_pred_positive=np.mean(positive_instances_pred, axis=0)
averages_pred_non_rearing=np.mean(negative_instances_pred, axis=0)
Differences=np.abs(averages_pred_positive-averages_pred_non_rearing)

# Balanced training set: 70% of the rearing frames plus the same number of randomly chosen non-rearing frames
random.seed(2)
n_train=int(len(positive_rearing)*0.7)
training_samples_positive=random.sample(range(0,len(positive_rearing)),n_train)
training_samples_negative=random.sample(range(0,len(negative_rearing)),n_train)

# Holdout candidates: every frame not used for training (sets make the membership test fast, so there are no repeats)
train_pos_set=set(training_samples_positive)
train_neg_set=set(training_samples_negative)
holdout_samples_positive=[i for i in range(len(positive_rearing)) if i not in train_pos_set]
holdout_samples_negative_nooverlap=[i for i in range(len(negative_rearing)) if i not in train_neg_set]

# Subsample the remaining non-rearing frames so the holdout set is balanced too
indices=random.sample(range(0,len(holdout_samples_negative_nooverlap)),len(holdout_samples_positive))
holdout_samples_negative=[holdout_samples_negative_nooverlap[i] for i in indices]

# Assemble feature matrices and labels, interleaving the two classes row by row
n_hold=len(holdout_samples_positive)
holdout_data=np.empty((2*n_hold,features.shape[1]))
holdout_data[0::2]=negative_instances_pred[holdout_samples_negative]
holdout_data[1::2]=positive_instances_pred[holdout_samples_positive]
gt_labels_holdout=np.tile([0.0,1.0],n_hold)

training_samples=np.empty((2*n_train,features.shape[1]))
training_samples[0::2]=positive_instances_pred[training_samples_positive]
training_samples[1::2]=negative_instances_pred[training_samples_negative]
gt_labels_training=np.tile([1.0,0.0],n_train)
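The undersampling above can also be written as a single function over index arrays. This is a minimal sketch, not the notebook's procedure: `balanced_split` is hypothetical, it uses NumPy's `default_rng` (so the frames it draws differ from those chosen with `random.seed(2)`), and it assumes 0/1 labels with rearing as the minority class.

```python
import numpy as np


def balanced_split(labels, train_frac=0.7, seed=2):
    """Balanced train/holdout frame indices by undersampling label 0 (hypothetical helper).

    Assumes labels are 0/1 and there are at least as many 0s as 1s.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pos = rng.permutation(np.flatnonzero(labels == 1))
    neg = rng.permutation(np.flatnonzero(labels == 0))
    n_train = int(len(pos) * train_frac)
    n_hold = len(pos) - n_train
    # Disjoint slices of each permutation, so no frame is in both sets
    train_idx = np.concatenate([pos[:n_train], neg[:n_train]])
    hold_idx = np.concatenate([pos[n_train:], neg[n_train:n_train + n_hold]])
    return train_idx, hold_idx


labels = np.array([1] * 11 + [0] * 89)
train_idx, hold_idx = balanced_split(labels)
print(len(train_idx), len(hold_idx))  # 14 8
```

Indexing with these arrays, e.g. `features[train_idx]` and `rearing_gt_train[train_idx]`, would give the training matrix and labels directly.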



Below we train and evaluate our SVM. We report the accuracy on the holdout set, along with a confusion matrix that shows where the misclassifications occur.

In [19]:
# SVM training and Evaluation

clf = svm.SVC(kernel='linear')


clf.fit(training_samples,gt_labels_training)

predictions=clf.predict(holdout_data)

loss=zero_one_loss(gt_labels_holdout, predictions) 
Accuracy=1-loss


print('Overall Testing Accuracy: {}'.format(Accuracy))


confusion=confusion_matrix(gt_labels_holdout, predictions)
print('Below we have a confusion matrix')
print(confusion)



Overall Testing Accuracy: 0.94
Below we have a confusion matrix
[[23  2]
 [ 1 24]]
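Accuracy alone hides which class is being missed. The confusion matrix above (rows are true labels, columns are predictions, with 0 = not rearing and 1 = rearing) covers 50 holdout frames, and the per-class rates follow directly from its four cells, as this worked example shows.

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
confusion = np.array([[23, 2],
                      [1, 24]])
tn, fp, fn, tp = confusion.ravel()

accuracy = (tp + tn) / confusion.sum()   # 47 / 50
precision = tp / (tp + fp)               # 24 / 26: predicted rearing frames that really were rearing
recall = tp / (tp + fn)                  # 24 / 25: rearing frames that were detected
specificity = tn / (tn + fp)             # 23 / 25: non-rearing frames correctly rejected

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.2f} specificity={specificity:.2f}")
# accuracy=0.94 precision=0.923 recall=0.96 specificity=0.92
```

`sklearn.metrics.classification_report(gt_labels_holdout, predictions)` reports precision and recall for both classes in one call.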


In [21]:
# Save the model if you're happy with the results
from joblib import dump, load
dump(clf, 'SVM_prediciton_rearing.joblib') 

['SVM_prediciton_rearing.joblib']
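The three distance features are in pixels while the three likelihoods lie between 0 and 1, and SVMs are sensitive to feature scale. One variation, not used in this notebook and untested on these data, is to put a `StandardScaler` in front of the SVM in a scikit-learn pipeline so the saved file carries the scaling along with the model. The sketch uses synthetic features and toy labels, and the file name is a placeholder; it also shows reloading a saved model with joblib's `load`.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the six features: three pixel distances, three likelihoods
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(100, 20, (200, 3)), rng.uniform(0, 1, (200, 3))])
y = (X[:, 0] > 100).astype(float)  # toy labels, not rearing annotations

# The scaler is fitted on the training data and applied automatically at predict time
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X, y)

# Save and reload the whole pipeline (placeholder file name in a temporary folder)
path = os.path.join(tempfile.mkdtemp(), "svm_rearing_pipeline.joblib")
dump(model, path)
restored = load(path)
print(restored.score(X, y))
```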

While we get good classification on this trial, applying the model to different rats did not give good accuracy, so we recommend training the model separately for each trial.
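If several trials have been annotated, the drop in accuracy across animals can be measured before committing to per-trial models: leave-one-trial-out cross-validation trains on all but one trial and scores on the trial left out. This sketch assumes frames from all trials are pooled with a trial ID per frame; the data are synthetic, with an artificial per-trial shift standing in for animal-to-animal differences.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 3 trials x 100 frames x 6 features, shifted per trial
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 100)              # trial ID of each frame
X = rng.normal(0, 1, (300, 6)) + groups[:, None] * 0.5
y = (X[:, 0] - groups * 0.5 > 0).astype(int)    # toy labels

# Each fold trains on two trials and reports accuracy on the held-out trial
scores = cross_val_score(
    make_pipeline(StandardScaler(), SVC(kernel="linear")),
    X, y, groups=groups, cv=LeaveOneGroupOut(),
)
print(scores)  # one accuracy per held-out trial
```

A large gap between these scores and the within-trial holdout accuracy would support training separately for each trial.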