# Person Identification by Gait on the Gotcha Dataset

## Overview of Tasks:
1) Extract frames from the videos; we don't need all the frames, only samples of consecutive
frames<br>
2) Use Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields to detect the
skeleton of the subject in each frame, in particular the key body points (joints)<br>
3) Store the coordinates of the key points of consecutive frames in an array to create
the pattern array (the pattern array stores information about how the subject's movement
varies over time)<br>
4) Create a neural network that takes the pattern array as input and outputs the subject ID<br>
5) Use many pattern array samples to train the network<br>
6) Try to reach the highest accuracy by choosing the best loss function<br>
7) Additionally, classify whether each video was recorded indoors, outdoors or with a flashlight
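To make step 3 concrete, one possible layout for a pattern array is one row per frame that holds the (x, y) coordinates of every joint. This is a minimal sketch, not the notebook's own code: it assumes 18 joints per frame (the COCO keypoint set of the part affinity field model) and 32-frame windows as extracted in step 1, and `build_pattern_array` is a hypothetical helper.

```python
import numpy as np

# Assumed sizes: 32 consecutive frames and 18 joints, each joint stored as (x, y).
N_FRAMES, N_JOINTS = 32, 18

def build_pattern_array(frames_xy):
    """Stack per-frame joint coordinates into one pattern array.

    frames_xy: list of arrays, each of shape (N_JOINTS, 2), with NaN for joints
    the pose estimator did not find.
    Returns an array of shape (n_frames, N_JOINTS * 2): one row per frame, so the
    rows describe how the joint positions change over time."""
    pattern = np.stack(frames_xy)                # (n_frames, N_JOINTS, 2)
    return pattern.reshape(len(frames_xy), -1)   # (n_frames, N_JOINTS * 2)

# usage with random coordinates standing in for real pose estimation output
rng = np.random.default_rng(0)
frames = [rng.uniform(0, 192, size=(N_JOINTS, 2)) for _ in range(N_FRAMES)]
pattern = build_pattern_array(frames)
print(pattern.shape)   # (32, 36)
```

A recurrent layer such as the imported `LSTM` could consume this array frame by frame, while a `Dense` network would need it flattened further.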

## Dataset Content Description:
Content of the Dataset, Folders:<br>
Each file name inside the folders contains two numbers: the first is the number of the folder and the second is the subject's ID.<br>
1_Indoor lights - cooperative = videos of 62 subjects indoors, in a cooperative way.<br>
2_Indoor light - non cooperative = videos of 62 subjects indoors, in a non cooperative way.<br>
3_Indoor flash - cooperative = videos of 62 subjects indoors with the lights on and the flash on, in a cooperative way.<br>
4_Indoor flash - non cooperative = videos of 62 subjects indoors with the lights on and the flash on, in a non cooperative way.<br>
5_Outdoor - cooperative = videos of 62 subjects outdoors, in a cooperative way.<br>
6_Outdoor - non cooperative = videos of 62 subjects outdoors, in a non cooperative way.<br>
7_180 head video = videos of the faces of 62 subjects over 180° (from ear to ear).<br>
8_Stairs - cooperative = videos of 6 subjects climbing the stairs, filmed from 3 different angles with 3 different devices (A-B-C), in a cooperative way.<br>
9_Stairs - non cooperative = videos of 6 subjects climbing the stairs, filmed from 3 different angles with 3 different devices (A-B-C), in a non cooperative way.<br>
10_Outdoor path - cooperative = videos of 12 subjects walking outdoors along a path, in a cooperative way (iPhone).<br>
11_Outdoor path - non cooperative = videos of 12 subjects walking outdoors along a path, in a non cooperative way (iPhone).<br>
12_Derived file attached = two types of folders: (1) HPE_x, where x = 1, 2, 3, ... 62, and (2) Landmarks.<br>

(1) HPE_x folder contains the 3D model of the subject x and 2223 images with the head rotated in pitch, yaw and roll<br>
(2) Landmarks folder contains Body landmarks and face landmarks of 62 subjects in the videos 1, 2, 3, 4, 5, 6.<br>
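For task 7, the folder number already encodes both the lighting condition and whether the subject cooperated. The sketch below assumes file names of the form `<folder>_<subject ID>.mp4` (for example `3_017.mp4`), which is inferred from how the code further down slices the file names; `CONDITIONS` and `parse_video_name` are hypothetical names covering folders 1 to 6 only.

```python
# Hypothetical lookup from folder number to (condition, cooperative),
# following the folder list above.
CONDITIONS = {
    1: ("indoor", True),
    2: ("indoor", False),
    3: ("flash", True),
    4: ("flash", False),
    5: ("outdoor", True),
    6: ("outdoor", False),
}

def parse_video_name(filename):
    """Split a name like '3_017.mp4' into folder, subject ID, condition and cooperative flag.

    Assumes the pattern '<folder>_<subject ID>.mp4'."""
    stem = filename.rsplit(".", 1)[0]      # '3_017'
    folder_str, subject = stem.split("_")
    folder = int(folder_str)
    condition, cooperative = CONDITIONS[folder]
    return folder, subject, condition, cooperative

print(parse_video_name("3_017.mp4"))   # (3, '017', 'flash', True)
```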

# 1) Extracting the Frames from the Videos

Starting with this cell, I always put one explanatory markdown cell on top of each code block, in addition to the comments inside each cell. First I import all the necessary libraries and display the current working directory of the notebook.
The first line, `%matplotlib inline`, displays matplotlib plots inline, directly below the cells.

In [1]:
%matplotlib inline
import pandas as pd                                   # for working with large dataframes
import numpy as np                                    # for working with efficient numpy arrays
import os                                             # for file and directory paths
import shutil                                         # for copying and moving files
import csv                                            # for reading CSV files
import matplotlib.pyplot as plt                       # for plotting the images
import cv2                                            # OpenCV: reading videos, resizing and saving images
import math                                           # for mathematical operations
from keras.preprocessing import image                 # for preprocessing the images
from keras.utils import np_utils                      # some utilities
from skimage.transform import resize                  # for resizing images
from scipy import ndimage                             # for rotating images
from config_reader import config_reader               # for 2D pose estimation
from model.cmu_model import get_testing_model         # for loading the affinity model
import time                                           # capturing the time needed for some operations
from tqdm import tqdm_notebook as tqdm                # having progress bars for loops
from sklearn.model_selection import train_test_split  # splitting training and test data
from sklearn.preprocessing import OneHotEncoder       # Encoding labels to vectors
from keras.utils import to_categorical                # one-hot encoding of integer labels
from keras.models import Sequential                   # Building network architecture
from keras.layers import Dense, BatchNormalization, LSTM, Conv2D, Conv3D, Flatten    # Neural Network Layers
from sklearn.utils import resample
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()



wpath = os.getcwd()
print ("The current working directory is %s" % wpath)

Using TensorFlow backend.


The current working directory is C:\Users\marvi\Desktop\UNISA\CV Project


First I declare a function to extract the frames. Because the subjects mostly walk through the middle part of each video, I start extracting once a third of the video has passed and keep 32 consecutive frames.
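The frame window can be checked without opening a video. This sketch uses a hypothetical helper `frame_window` that mirrors the range used in `vid2image` and, as an addition, clips the window at the end of very short videos.

```python
def frame_window(length, n_frames=32):
    """Return the indices of the frames kept from a video with `length` frames:
    n_frames consecutive frames starting one third into the video."""
    start = int(round(length / 3))
    end = min(start + n_frames, length)   # cannot read past the last frame
    return list(range(start, end))

idx = frame_window(300)
print(idx[0], idx[-1], len(idx))   # 100 131 32
```

For a video with only 40 frames the window starts at frame 13 and holds just 27 frames, so very short clips yield fewer than 32 samples.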

In [3]:
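#function to extract n_frames consecutive frames, starting one third into the video
def vid2image(filename, saveTo, n_frames=32):

    count = 0

    cap = cv2.VideoCapture(filename)                    # capturing the video from the given path
    frameRate = cap.get(cv2.CAP_PROP_FPS)               # frame rate (0 if the file could not be opened)
    length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))     # total frame count

    if(frameRate == 0):
        print("File could not be loaded")               # in case the file is missing or corrupted
        cap.release()
        return 0

    start = int(round(length/3))                        # first frame to save
    end = start + n_frames                              # one past the last frame to save

    while(cap.isOpened()):
        frameId = int(cap.get(cv2.CAP_PROP_POS_FRAMES)) # index of the frame that is read next
        ret, frame = cap.read()
        if (ret != True) or (frameId >= end):           # end of the video, or all frames already saved
            break

        if(frameId >= start):
            # the literal '%' of the original file names is kept; zero-padding (%02d) keeps
            # the frames in temporal order when the folder is sorted alphabetically
            filename2 = str(filename[-9:-4]) + "_frame%%%02d.jpg" % count
            cv2.imwrite(os.path.join(saveTo, filename2), frame)
            count += 1

    cap.release()
    if(count==0):
        print("No Frames Found")
    return count                                        # number of saved frames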

I only used folders 1 to 6 and copied all of their videos into one shared folder, from which I load them. The other folders had missing subjects and sometimes people in the background. After extracting the frames once, I commented out the function call, so I don't accidentally start the extraction again.

In [4]:
path = './data/13_all_62_subjects/'
path_out = './extracted_frames/'

i = 0
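fail = 0        # stays 0 in this cell; vid2image prints its own load errors
X_data = []     # one [folder number, subject ID] pair per video

for filename in tqdm(os.listdir(path)):

    if('.mp4' in filename):

        # collecting the labels and (optionally) extracting the frame samples
        i = i + 1
        label = filename[0:5]                           # folder number and subject ID
        X_data.append([filename[0], filename[2:5]])     # [folder number, subject ID]
        #vid2image(path+filename, path_out)             # remove the first # to extract all frames again; this may take an hour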
        

print(str(i) + " Videos found")
print(str(fail) + " Videos failed to load")



372 Videos found
0 Videos failed to load
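`X_data` now holds one `[folder number, subject ID]` pair of strings per video. For the network in step 4, the subject IDs still have to become class indices and one-hot vectors, which is the job of the imported `LabelEncoder` and `to_categorical`. Here is a NumPy-only sketch of the same idea, with the hypothetical helper `encode_subjects`:

```python
import numpy as np

def encode_subjects(x_data):
    """Map the subject IDs in x_data ([folder, subject] string pairs) to class
    indices 0..K-1 and to one-hot vectors for a K-way softmax output."""
    subjects = sorted({subject for _, subject in x_data})   # e.g. '001' ... '062'
    index = {s: i for i, s in enumerate(subjects)}
    y = np.array([index[subject] for _, subject in x_data])
    y_onehot = np.eye(len(subjects))[y]                      # one row per video
    return y, y_onehot, subjects

# toy X_data with three videos of two subjects
y, y_onehot, subjects = encode_subjects([["1", "001"], ["2", "001"], ["1", "002"]])
print(y)             # [0 0 1]
print(y_onehot[2])   # [0. 1.]
```

Sorting the IDs first makes the mapping deterministic, so the same subject always receives the same class index across runs.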


Next, I resized all extracted frames from 1920x1080 to 192x108 pixels. This greatly reduces the input size for the 2D pose estimation, while all the details needed to identify the body parts remain visible. Because many of the videos were shot in portrait mode, I also rotate the frames into the upright position. While testing, I noticed a large drop in the performance of the 2D pose estimation algorithm when the frames were rotated the wrong way. I assume the model was not trained on rotated images, so it is not robust against rotation: when an image was not upright, the pose estimator usually missed around 40% of the coordinates of almost all joints.
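The size and orientation bookkeeping is easy to get wrong: `cv2.resize` takes `dsize` as (width, height), while NumPy arrays are indexed as (rows, columns). The small check below uses a synthetic array instead of a real image to show what `np.rot90(res, k=3)` does to a resized frame. OpenCV's `cv2.rotate(res, cv2.ROTATE_90_CLOCKWISE)` performs the same rotation.

```python
import numpy as np

# Stand-in for one resized frame: cv2.resize(img, dsize=(192, 108)) returns
# 108 rows and 192 columns (plus 3 color channels).
frame = np.arange(108 * 192 * 3).reshape(108, 192, 3)

rotated = np.rot90(frame, k=3)   # three 90° counterclockwise turns = one 90° clockwise turn

print(rotated.shape)             # (192, 108, 3): portrait, 108 wide and 192 tall
# clockwise rotation: the bottom-left pixel of the landscape frame becomes the top-left pixel
print(np.array_equal(rotated[0, 0], frame[-1, 0]))   # True
```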

In [5]:
#resizing the frames
n=0
path_frames = './extracted_frames/'
outpath = './lowres/'

for filename in tqdm(os.listdir(path_frames)):
    
    n=n+1

    if('frameX' in filename):                                                   #remove X to resize all frames in the folder
        img = cv2.imread(path_frames+filename)
        res = cv2.resize(img, dsize=(192, 108), interpolation=cv2.INTER_CUBIC)
        res = np.rot90(res,k=3)                                                 #rotating images into correct position
        cv2.imwrite(os.path.join(outpath, filename), res)
    
        
print(str(n) + " frames found")



32 frames found


# 2) Performing 2D Pose Estimation and storing the Joint Positions
I downloaded the pretrained weights published with the 2D pose estimation paper and load them into the model here.

In [None]:
#loading the weighted 2d pose estimation model
afmodel = get_testing_model()
afmodel.load_weights('model/keras/model.h5')

# load config
params, model_params = config_reader()

To run the 2D pose estimation many times, I wrapped the model in a function. If you only need the joint coordinates and don't want to save an image with the drawn skeleton for every frame, you can comment out the drawing step.

In [6]:
peaks = []   # global list; affinity() appends the heatmap peaks of every processed image to it

#function to perform 2D pose estimation on images
#NOTE: extract_parts and draw are not imported above; they must come from the pose estimation code base
def affinity(image_path, outpath):
    """Perform 2D pose estimation on the image at image_path.

    The heatmap peaks (the coordinates of the found joints) are appended to the
    global list `peaks`; the function returns nothing. It also draws all found
    joints as points into the image, connects neighbouring joints, and saves
    the result to outpath."""

    tic = time.time()

    input_image = cv2.imread(image_path)  # B,G,R order

    #extracting the joint heatmap peaks and assembling them into skeletons with the part affinity fields
    all_peaks, subset, candidate = extract_parts(input_image, params, afmodel, model_params)
    canvas = draw(input_image, all_peaks, subset, candidate)   # comment out together with cv2.imwrite below to skip drawing
    peaks.append(all_peaks)   # kept as a plain list: joints can have different numbers of peaks

    toc = time.time()
    print('processing time was %.5f' % (toc - tic))

    cv2.imwrite(outpath, canvas)
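Instead of exporting the peaks as strings, they could be converted directly into a fixed-size coordinate array. This sketch assumes that `all_peaks` is a list with one entry per joint, each a list of `(x, y, score, peak_id)` tuples, as in the common Keras port of the model; please check this against the `extract_parts` you use. Keeping only the highest-scoring peak per joint assumes one person per frame, which matches the choice to drop the folders with people in the background. `peaks_to_xy` is a hypothetical helper.

```python
import numpy as np

def peaks_to_xy(all_peaks):
    """Turn the per-joint peak lists of one image into an (n_joints, 2) array.

    Assumes each entry of all_peaks is a list of (x, y, score, peak_id) tuples;
    keeps the highest-scoring peak and uses NaN when the joint was not detected."""
    xy = np.full((len(all_peaks), 2), np.nan)
    for j, joint_peaks in enumerate(all_peaks):
        if joint_peaks:                                   # empty list -> joint missing
            x, y, _score, _peak_id = max(joint_peaks, key=lambda p: p[2])
            xy[j] = (x, y)
    return xy

# toy output for three joints: one peak, no peak, two candidate peaks
toy = [[(37, 52, 0.91, 0)], [], [(40, 80, 0.30, 1), (42, 81, 0.75, 2)]]
print(peaks_to_xy(toy))
```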

During the 2D pose estimation, the joint coordinates of every frame are written to a text file, while the images with the drawn skeleton are saved in a separate folder. In each text file, I removed the brackets from the strings and replaced empty coordinate lists with the string NaN, so that pandas treats those joints as missing data later on.

In [7]:
path_frames = './lowres/'
outpath = './affinity/'

joint_data = []
m=0

for file in tqdm(sorted(os.listdir(path_frames))):
    m+=1

    if('frameX' in file):   #remove X to cycle through all the lowres images to perform pose estimation

        peaks = []          # reset the global list that affinity() appends to

        #perform 2D Pose Estimation
        affinity(path_frames+file, outpath+file)

        #save heatmap peaks of each joint in a text file
        jdf = "./joint_data/"+ file[:-4] +".txt"
        with open(jdf, "w+") as f:

            #removing brackets and commas, writing one line per joint, NaN for joints that were not found
            for frame in peaks:
                for joint in frame:
                    f.write("\n")
                    position = str(joint)
                    position = position.replace("[]","NaN    ")
                    position = position.replace("[","").replace("]","").replace("(","").replace(")","").replace(",","")
                    position = position[:-3]
                    for pos in position.split():
                        f.write(pos[:7]+" ")        # truncate each value to at most 7 characters

        joint_data.append(peaks)                    # only frames that were actually processed

print(str(m) + " frames found")



11744 frames found
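To quantify missing joints (for example, to compare with the roughly 40% miss rate observed on wrongly rotated frames), a saved text file can be read back into an array. This is a minimal sketch, assuming the format written by the loop above: a leading empty line, one line per joint starting with x and y, and `NaN` for joints that were not found. The sample string is made up, and `read_joint_text` and `missing_rate` are hypothetical helpers.

```python
import numpy as np

def read_joint_text(text):
    """Parse the text written for one frame (one line per joint) into an
    (n_joints, 2) array of x, y. Lines starting with NaN become missing joints;
    only the first two values of every other line are used."""
    rows = []
    for line in text.strip("\n").split("\n"):
        values = line.split()
        if not values or values[0] == "NaN":
            rows.append([np.nan, np.nan])
        else:
            rows.append([float(values[0]), float(values[1])])
    return np.array(rows)

def missing_rate(xy):
    """Fraction of joints without coordinates in one frame."""
    return float(np.isnan(xy[:, 0]).mean())

sample = "\n37 52 0.91283 \nNaN \n42 81 0.75310 \nNaN "
xy = read_joint_text(sample)
print(xy.shape, missing_rate(xy))   # (4, 2) 0.5
```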


In the second notebook, I preprocess the joint data.