# Pose Detection with OpenPose

This notebook uses the open-source project [CMU-Perceptual-Computing-Lab/openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose.git) to detect and track multi-person poses in a given YouTube video.

For other deep-learning Colab notebooks, visit [tugstugi/dl-colab-notebooks](https://github.com/tugstugi/dl-colab-notebooks).


## Install OpenPose

In [0]:
!rm -r openpose
import os
from os.path import exists, join, basename, splitext

git_repo_url = 'https://github.com/CMU-Perceptual-Computing-Lab/openpose.git'
project_name = splitext(basename(git_repo_url))[0]
if not exists(project_name):
  # see: https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/949
  # install new CMake because of CUDA10
  !wget -q https://cmake.org/files/v3.13/cmake-3.13.0-Linux-x86_64.tar.gz
  !tar xfz cmake-3.13.0-Linux-x86_64.tar.gz --strip-components=1 -C /usr/local
  # clone openpose
  !git clone -q --depth 1 $git_repo_url
  !sed -i 's/execute_process(COMMAND git checkout master WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}\/3rdparty\/caffe)/execute_process(COMMAND git checkout f019d0dfe86f49d1140961f8c7dec22130c83154 WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}\/3rdparty\/caffe)/g' openpose/CMakeLists.txt
  # install system dependencies
  !apt-get -qq install -y libatlas-base-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev opencl-headers ocl-icd-opencl-dev libviennacl-dev
  # install python dependencies
  !pip install -q youtube-dl
  # build openpose
  !cd openpose && rm -rf build || true && mkdir build && cd build && cmake -DBUILD_PYTHON=ON .. && make -j`nproc` #enable python api for openpose with DBUILD_PYTHON flag
  
from IPython.display import YouTubeVideo

## Detect poses on a test video

We are going to detect poses on the following YouTube video:

In [0]:
from IPython.display import YouTubeVideo
YOUTUBE_ID = 'RXABo9hm8B8'


YouTubeVideo(YOUTUBE_ID)

Download the YouTube video above, cut out the first 5 seconds, and run pose detection on that clip:

In [0]:
# Install the latest youtube-dl
!wget http://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
!chmod a+rx /usr/local/bin/youtube-dl

In [0]:

# download the YouTube video with the given ID
!youtube-dl -f 'bestvideo[ext=mp4]' --output "youtube.%(ext)s" https://www.youtube.com/watch?v=$YOUTUBE_ID
# cut the first 5 seconds
!ffmpeg -y -loglevel info -i youtube.mp4 -t 5 video.mp4
# detect poses on these 5 seconds
!rm -f openpose.avi
!cd openpose && ./build/examples/openpose/openpose.bin --video ../video.mp4 --write_json ./output/ --display 0  --write_video ../openpose.avi --part_candidates #enable body parts
# convert the result into MP4
!ffmpeg -y -loglevel info -i openpose.avi output.mp4

Finally, visualize the result:

In [0]:
def show_local_mp4_video(file_name, width=640, height=480):
  import base64
  from IPython.display import HTML
  # read the file and embed it in the page as a base64 data URI
  with open(file_name, 'rb') as f:
    video_encoded = base64.b64encode(f.read())
  return HTML(data='''<video width="{0}" height="{1}" alt="test" controls>
                        <source src="data:video/mp4;base64,{2}" type="video/mp4" />
                      </video>'''.format(width, height, video_encoded.decode('ascii')))

show_local_mp4_video('Videos/pl_happy.mp4', width=960, height=720)

In [1]:
#Install Deepsort
import os
from os.path import exists, join, basename

project_name = "deep_sort_pytorch"
if not exists(project_name):
  # clone and install
  !git clone -q --recursive https://github.com/ZQPei/deep_sort_pytorch.git
  
import sys
sys.path.append(project_name)

import IPython
from IPython.display import clear_output

In [None]:
#Download pretrained weights
yolo_pretrained_weight_dir = join(project_name, 'detector/YOLOv3/weight/')
if not exists(join(yolo_pretrained_weight_dir, 'yolov3.weights')):
  !cd {yolo_pretrained_weight_dir} && wget -q https://pjreddie.com/media/files/yolov3.weights
    
deepsort_pretrained_weight_dir = join(project_name, 'deep_sort/deep/checkpoint')
if not exists(join(deepsort_pretrained_weight_dir, 'ckpt.t7')):
  file_id = '1_qwTWdzT9dWNudpusgKavj_4elGgbkUN'
  !cd {deepsort_pretrained_weight_dir} && curl -Lb ./cookie "https://drive.google.com/uc?export=download&id={file_id}" -o ckpt.t7

Download test video and show it.

In [0]:
VIDEO_URL = 'http://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/Datasets/TownCentreXVID.avi'
DURATION_S = 20  # process only the first 20 seconds



video_file_name = 'deep_video.mp4'
if not exists(video_file_name):
  !wget -q $VIDEO_URL
  downloaded_file_name = basename(VIDEO_URL)
  # convert to MP4, because we can show only MP4 videos in the colab notebook
  !ffmpeg -y -loglevel info -t $DURATION_S -i $downloaded_file_name $video_file_name
  
clear_output()
show_local_mp4_video('deep_video.mp4')

Use deep_sort_pytorch to track objects in the video.

In [0]:
!cd {project_name} && python yolov3_deepsort.py --ignore_display ../deep_video.mp4 --save_path ../deep_output.avi

Visualize result

In [0]:
# first convert to mp4 to show in a Colab notebook
!ffmpeg -y -loglevel panic -i ./deep_output.avi ./pedestrian_deeppose.mp4
show_local_mp4_video('pedestrian_deeppose.mp4', width=960, height=720)

In [0]:
from google.colab import drive
drive.mount('/content/drive')



Read the body-part keypoints produced by OpenPose. (The keypoints are saved in JSON format in the 'output' folder.)




In [0]:
import json
import os
import cv2
from google.colab.patches import cv2_imshow
vid_name = 'VID_RGB_070_1'
json_dir = os.getcwd() + f'/openpose_json/{vid_name}'
frame_dir = os.getcwd() + f"/frames/{vid_name}"
  # frame_path = frame_dir + '/output_frame1.jpg'
frame_numbers = [157, 158, 159, 160, 161]
KEYS_TO_COLLECT = ['0', '1','2','3','4','5','6','7','8','9','10','11','12','13','14']
for frame_no in frame_numbers:
  fn = f'{vid_name}_{frame_no:012d}_keypoints.json' 
  #load json data
  print("Reading ", fn)
  fp = json_dir + f'/{fn}'
  with open(fp, 'r') as json_file:
    loaded_json = json.load(json_file)

  # draw labels on the frame at the saved coordinates
  frame_path = frame_dir + f'/output_frame{frame_no:03d}.jpg'
  img = cv2.imread(frame_path)
  # cv2_imshow(img)
  for key in KEYS_TO_COLLECT:
    coords = loaded_json['part_candidates'][0][key]
    if not coords:
      continue  # part not detected in this frame, nothing to label
    x, y = coords[0], coords[1]
    # cv2.circle(img, (int(x),int(y)), 2, (0, 0, 255), -1)

    # label the point with its part ID
    font = cv2.FONT_HERSHEY_SIMPLEX
    pos = (int(x), int(y))
    fontScale = 1
    fontColor = (255, 255, 255)
    lineType = 2

    cv2.putText(img, key, pos, font, fontScale, fontColor, lineType)
    # break #break first just to see first result. (testing)


  cv2.imwrite(f"results{frame_no:03d}.jpg", img)
  # frame_no += 1


# #To obtain body part mapping.
# poseModel = op.PoseModel.BODY_25
# print(op.getPoseBodyPartMapping(poseModel))
# print(op.getPoseNumberBodyParts(poseModel))
# print(op.getPosePartPairs(poseModel))
# print(op.getPoseMapIndex(poseModel))


Reading  VID_RGB_070_1_000000000157_keypoints.json
Reading  VID_RGB_070_1_000000000158_keypoints.json
Reading  VID_RGB_070_1_000000000159_keypoints.json
Reading  VID_RGB_070_1_000000000160_keypoints.json
Reading  VID_RGB_070_1_000000000161_keypoints.json
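
The loop above reads `part_candidates` straight from each JSON file. As a minimal sketch, the same lookup can be wrapped in a helper that returns one (x, y) pair per body part, or `None` when a part was not detected. It assumes the layout the cell above relies on: `part_candidates` is a list whose first element maps part IDs (as strings) to a flat list of `x, y, score` values. The helper name `first_candidates` and the sample data are hypothetical.

```python
import json

def first_candidates(keypoints_json, part_ids):
    """Return {part_id: (x, y) or None} using the first candidate of each part."""
    candidates = keypoints_json["part_candidates"][0]
    result = {}
    for part_id in part_ids:
        values = candidates.get(part_id, [])
        # Each candidate is assumed to be stored as x, y, score.
        result[part_id] = (values[0], values[1]) if len(values) >= 2 else None
    return result

# Hypothetical sample with one detected part ("1") and one missing part ("2").
sample = json.loads('{"part_candidates": [{"1": [320.5, 110.0, 0.9], "2": []}]}')
print(first_candidates(sample, ["1", "2"]))
```

Returning `None` for missing parts makes it explicit which joints still need handling (skipping, interpolation) before the data is fed to a model.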


Remove frame images

In [0]:
!ls | grep -P "^output_frame\d*\.jpg" | xargs -d"\n" rm

run some commands

MAIN IDEA FOR THE WHOLE PROJECT.

We want to classify whether a person is depressed based on their gait. Since gait can be represented by the **joint locations in each frame**, and **each frame depends on the frames before it**, an **LSTM** is a natural fit.

The RNN layer takes a sequence of features representing the joint locations at each time step. From this it should learn whether the subject is **walking slowly**, with **reduced stride length**, with **reduced arm swing**, or with the head hung low (**neck angle**). The model is trained with the label **depressed / normal**.

For now, I think the useful joint locations that OpenPose can detect are #1 to #14:
//     {1,  "Neck"},
//     {2,  "RShoulder"},
//     {3,  "RElbow"},
//     {4,  "RWrist"},
//     {5,  "LShoulder"},
//     {6,  "LElbow"},
//     {7,  "LWrist"},
//     {8,  "MidHip"},
//     {9,  "RHip"},
//     {10, "RKnee"},
//     {11, "RAnkle"},
//     {12, "LHip"},
//     {13, "LKnee"},
//     {14, "LAnkle"},

To simplify things, let's assume that the video contains:
 - **1 subject only** 
 - **subject is not hurt** (start considering outliers like this only after tests on normal subjects work well)
 - **OpenPose detects all joint locations correctly**
 - **subject is of age 20-30**

Steps

1. **Collect clear datasets**:

  i. Videos of healthy people walking, ~30 videos *(just to test the code for the first time; the number will be increased later on)*

  ii. Videos of people with anxiety / depression walking, ~30 videos (if the dataset is not enough, we will create videos ourselves)

2. Make sure OpenPose detects well on these datasets.

3. **Prepare data to feed into network**:

  i. Using OpenPose, obtain the joint locations in each frame and store them in a numpy array.

  ii. Normalize the joint coordinates.
    Idea now:
      To make the data independent of
        - camera distance from the subject
        - height of each subject
      Using the largest bounding box encasing the subject as reference, obtain its width (box_w) and height (box_h). Using the mid-hip joint location as the centre point (centre_coord), represent each joint location as a displacement vector from centre_coord, with the x component divided by box_w and the y component divided by box_h.

  iii. Prepare normalized data into x_train, y_train, x_test, y_test

      x shape: (number of data, timesteps, number of features).
        number of data = **60** videos of 3 seconds
        timesteps = (assuming 20 fps, 20 x 3 = **60** frames per sequence)
        number of features = (14 joints x 2 coordinates) = 28
        
      y shape: (number of data, 1). (binary 0 or 1)
4. Create Model. (hyperparameters can be adjusted and tested)
  Rough idea for model is: Sequential model containing 2 LSTM layers, 1 Dense layer, and 1 Dense output layer. 

5. Fit Model and test for accuracy.

Only after these basic steps are completed do we start worrying about more complicated scenarios.
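
The normalization idea in step 3.ii can be sketched as follows. This is only an illustration under the assumptions stated there: the bounding box is given by its width and height, the mid-hip joint is the centre point, and joints are plain (x, y) tuples. The function name `normalize_joints` and the sample numbers are hypothetical.

```python
def normalize_joints(joints, centre, box_w, box_h):
    """Express each joint as an offset from `centre`, scaled by the box size.

    joints: list of (x, y) pixel coordinates for one frame
    centre: (x, y) of the mid-hip joint
    box_w, box_h: width and height of the largest bounding box around the subject
    """
    cx, cy = centre
    features = []
    for x, y in joints:
        features.append((x - cx) / box_w)  # x displacement divided by box width
        features.append((y - cy) / box_h)  # y displacement divided by box height
    return features

# Hypothetical frame with two joints; the second one is the mid-hip itself.
frame = [(150.0, 100.0), (100.0, 200.0)]
print(normalize_joints(frame, centre=(100.0, 200.0), box_w=100.0, box_h=400.0))
# [0.5, -0.25, 0.0, 0.0]
```

With 14 joints per frame this yields the 28 features per time step listed in step 3.iii.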


**Updates on Training model with JSON data of joint position**

Problems faced:
  - The dataset is unbalanced, so the model predicts only one class and can easily reach 80% accuracy that way.

Solution:
  - Introduced class weights so that predictions for the minority class are weighted more heavily.
  - Flipped the videos of the minority class to increase the amount of data.
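
One common way to pick class weights is to make them inversely proportional to class frequency. The sketch below uses the formula `n_samples / (n_classes * count)`, which is an assumption for illustration and not necessarily how the weights in this project were chosen. The labels are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical labels: 8 healthy (1) and 2 depressed (0).
labels = [1] * 8 + [0] * 2
print(balanced_class_weights(labels))
# {1: 0.625, 0: 2.5}
```

The resulting dictionary has the form Keras accepts for the `class_weight` argument of `model.fit`.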

Problems faced:
  - Because a flipped video is very similar to its original, some test videos were just flipped versions of training videos. When the number of epochs is too high the model overfits: it only learns to recognize that a video is flipped, not the true features that it should learn for classifying depression.

Solution:
  - Manually separated the training and testing sets to ensure that they are unrelated to each other.
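
Keeping a flipped clip and its original in the same split can also be automated. The sketch below assumes, hypothetically, that flipped copies are named with a `_flip` suffix; it groups clips by their original name and assigns whole groups to either train or test. The function name and clip names are made up for illustration.

```python
import random

def split_by_source(video_names, test_fraction=0.2, seed=0, flip_suffix="_flip"):
    """Split clips so that a clip and its flipped copy never land in different sets."""
    groups = {}
    for name in video_names:
        # Map a flipped copy back to the name of its original clip.
        source = name[:-len(flip_suffix)] if name.endswith(flip_suffix) else name
        groups.setdefault(source, []).append(name)

    sources = sorted(groups)
    random.Random(seed).shuffle(sources)
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])

    train = [n for s in sources if s not in test_sources for n in groups[s]]
    test = [n for s in sorted(test_sources) for n in groups[s]]
    return train, test

# Hypothetical clip names.
names = ["VID_001_0", "VID_001_0_flip", "VID_002_1",
         "VID_003_1", "VID_004_0", "VID_004_0_flip"]
train, test = split_by_source(names)
print(train, test)
```

Splitting on the source clip rather than on individual files is what prevents the leakage described above.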

Resulting model:
  - The learning rate has been tuned between 1e-3 and 1e-5, with decay tuned between 1e-4 and 1e-6.
  
  With a learning rate of 1e-3, performance jumps around and does not really converge. With a lower learning rate, the loss decreases gradually until the model starts to overfit. However, even at the minimum loss the accuracy is not high, so the model does not really learn anything useful (I suspect more data is needed). One model did reach 70% accuracy with a learning rate of 1e-3, but I suspect that was luck, since the performance did not really converge.
  - The class weight has also been tuned.

  When similar class weights are used, the model tends to classify every video as healthy, because healthy data is the majority. The class weight is therefore set to {1: 0.4, 0: 0.6} for now, which results in a model that still predicts 0 correctly in some cases.
  - Different layers have been added and removed to test their effect on performance.

  Adding more layers does not seem to help the model in this case (I suspect the dataset is still too small).

**What to work on now** 
As there is a roadblock in using the JSON data for training, I am going to try training a CNN model on image data instead for now.



# New Section

Unzip the ewalk dataset (30 FPS)

In [0]:
#unzip Datasets
!unzip Videos.zip


Archive:  Videos.zip
replace Videos/VID_RGB_001_0.mp4? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace Videos/VID_RGB_011_0.mp4? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

'''''''''''''''''' **Method 1** Perform openpose on ewalk videos (Collect data for joints location in json format) ''''''''''''''''''''''

In [0]:

BASE_DIR = os.getcwd()
VIDEOS_DIR = BASE_DIR  + '/Videos'

OPENPOSE_JSON_DIR = BASE_DIR + '/openpose_json'
OPENPOSE_VID_DIR = BASE_DIR + '/openpose_vid'
FINAL_VID_DIR = BASE_DIR + '/final_vid'
BBOX_DIR = BASE_DIR + f'/{project_name}' + '/bbox_output'

if not os.path.exists(OPENPOSE_JSON_DIR):
  !mkdir openpose_json
if not os.path.exists(OPENPOSE_VID_DIR):
  !mkdir openpose_vid
if not os.path.exists(FINAL_VID_DIR):
  !mkdir final_vid
if not os.path.exists(BBOX_DIR):
  !cd $project_name && mkdir bbox_output

for fn in sorted(os.listdir(VIDEOS_DIR)):
  #Perform openpose and store results in openpose output directories
  json_output_fn = OPENPOSE_JSON_DIR + f'/{fn[:-4]}'
  !cd openpose && ./build/examples/openpose/openpose.bin --video {VIDEOS_DIR}/{fn} --write_json $json_output_fn --display 0  --write_video {OPENPOSE_VID_DIR}/openpose.avi --part_candidates #enable body parts
  
  #perform deepsort
  !cd {project_name} && python yolov3_deepsort.py --ignore_display {OPENPOSE_VID_DIR}/openpose.avi --save_path {FINAL_VID_DIR}/openpose.avi && mv bounding_box.pkl {fn[:-4]}_bbox.pkl && mv {fn[:-4]}_bbox.pkl ./bbox_output

  # convert the result into MP4
  !cd $FINAL_VID_DIR && ffmpeg -y -loglevel info -i openpose.avi $fn


'''''''''''''''''''  **Method 2** Perform deepsort (Collect data for CNN) '''''''''''''''''''''


In [0]:
#Install Deepsort then run this
import os, sys
from os.path import exists, join, basename

project_name = "deep_sort_pytorch"
BASE_DIR = os.getcwd()
VIDEOS_DIR = BASE_DIR  + '/Videos'

FINAL_VID_DIR = BASE_DIR + '/final_vid'
BBOX_DIR = BASE_DIR + f'/{project_name}' + '/bbox_output'


if not os.path.exists(FINAL_VID_DIR):
  !mkdir final_vid
if not os.path.exists(BBOX_DIR):
  !cd $project_name && mkdir bbox_output

for fn in sorted(os.listdir(VIDEOS_DIR)):
  #perform deepsort
  !cd {project_name} && python yolov3_deepsort.py --ignore_display {VIDEOS_DIR}/{fn} --save_path {FINAL_VID_DIR}/openpose.avi && mv bounding_box.pkl {fn[:-4]}_bbox.pkl && mv {fn[:-4]}_bbox.pkl ./bbox_output

  # convert the result into MP4
  !cd $FINAL_VID_DIR && ffmpeg -y -loglevel info -i openpose.avi $fn

  


Extract every frame of the video and save it to the specified directory.

In [0]:
import cv2
#input file must be in default current directory
def extract_frames(input_fp, input_fn, output_folder):
  cap = cv2.VideoCapture(input_fp)
  count = 0 # number of frames written so far; saved files are numbered from 1
  # one sub-folder per video, named after the file without its extension
  output_dir = os.getcwd() + output_folder + f'/{input_fn[:-4]}'
  os.makedirs(output_dir, exist_ok=True)

  while cap.isOpened():
      # cap.read() already advances to the next frame
      ret, frame = cap.read()
      if not ret:
          break
      cv2.imwrite(output_dir + '/output_frame{:03d}.jpg'.format(count+1), frame)
      count += 1
  cap.release()

# extract_frames('VID_RGB_089_0.mp4','VID_RGB_089_0.mp4', '/frames')

Extract frames of videos


In [0]:
import os
VIDEOS_DIR = os.getcwd() + '/Videos'
FRAMES_FOLDER = '/frames'
for fn in sorted(os.listdir(VIDEOS_DIR)):
  if fn[-3:] == "mp4":
    print('Extracting: ',fn)
    extract_frames(f'{VIDEOS_DIR}/{fn}', fn, FRAMES_FOLDER)

Extracting:  VID_RGB_094_1.mp4
Extracting:  VID_RGB_095_0.mp4


Extract the bounding boxes for an input video whose frames are in **/frames** and save each cropped box to a jpg file. Run the sections in this order: run deepsort -> extract frames -> this section.


In [0]:
import pickle


#extract bounding box of input video file located in /frames 
#param input_fn: Input video filename to extract bounding box from
#param pickle_path: saved bbox.pkl file containing bounding box info of that video
#param input_frames_folder: the folder containing extracted frames of each video file. 
def extract_bbox(input_fn, pickle_path, input_frames_folder, mobile=False):
  # bbox_fn = os.getcwd()+'/deep_sort_pytorch/bounding_box.pkl'
  with open(pickle_path, 'rb') as infile:
    saved_bbox_info = pickle.load(infile)

  BASE_FRAME_DIR = os.getcwd() + input_frames_folder #base directory storing information of each frames
  base_dir = BASE_FRAME_DIR + f'/{input_fn}'

  for framenum_and_box in saved_bbox_info:
    frame_num, identities, bboxs = framenum_and_box

    # #create directory to store information for current frame
    # frame_dir = base_dir + '/frame_{:03d}'.format(frame_num)
    # if not os.path.exists(frame_dir):
    #   !mkdir $frame_dir

    #crop out image bounded by box
    frame_img_fp = base_dir + "/output_frame{:03d}.jpg".format(frame_num)
    if not os.path.exists(frame_img_fp):
      print("File: ", frame_img_fp, " Not Found. Skipping...")
      continue
    frame_img = cv2.imread(frame_img_fp)

    for i, bbox in enumerate(bboxs):
      detected_id = identities[i]

      #create directory for specific id to store cropped frames of this detected id
      id_dir = base_dir + f'/id_{detected_id:02d}'
      os.makedirs(id_dir, exist_ok=True)
      
      x1,y1,x2,y2 = bbox
      crop_id_img = frame_img[y1:y2, x1:x2]
      save_fn = id_dir + f"/frame{frame_num:03d}_id{detected_id:02d}.jpg"

      #rotate img if is mobile input
      if mobile: crop_id_img = cv2.rotate(crop_id_img, cv2.ROTATE_90_CLOCKWISE)

      cv2.imwrite(save_fn, crop_id_img)


  
  

# extract_bbox('pedestrian_deeppose.mp4',os.getcwd()+'/deep_sort_pytorch/bounding_box.pkl')
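
If a saved box ever extends past the image border, the slice `frame_img[y1:y2, x1:x2]` can give a wrong crop, because a negative index in a NumPy slice counts from the end of the array. As a hedged sketch (the helper `clamp_bbox` is hypothetical and not part of deep_sort_pytorch), the coordinates can be clamped to the frame size before slicing:

```python
def clamp_bbox(bbox, frame_w, frame_h):
    """Clamp (x1, y1, x2, y2) to the frame; return None if the box is empty."""
    x1, y1, x2, y2 = (int(v) for v in bbox)
    x1, x2 = max(0, x1), min(frame_w, x2)
    y1, y2 = max(0, y1), min(frame_h, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2

print(clamp_bbox((-5, 10, 120, 700), frame_w=100, frame_h=640))  # (0, 10, 100, 640)
print(clamp_bbox((150, 10, 200, 50), frame_w=100, frame_h=640))  # None
```

A `None` result signals that the box lies entirely outside the frame and the crop can be skipped.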



Extract bounding box of each person detected in each frame

In [0]:
BBOX_DIR = os.getcwd() + f'/{project_name}/bbox_output'
for fn in sorted(os.listdir(BBOX_DIR)):
  #fn is in format {filename}_bbox.pkl

  if fn[-4:] == ".pkl":
    print("Extracting ", fn)
    extract_bbox(fn[:-9], f'{BBOX_DIR}/{fn}', FRAMES_FOLDER, True)


Extracting  VID_RGB_094_1_bbox.pkl
Extracting  VID_RGB_095_0_bbox.pkl
Extracting  pl_happy_bbox.pkl
File:  /content/frames/pl_happy/output_frame010.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame011.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame012.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame013.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame014.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame015.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame016.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame017.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame018.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame019.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame020.jpg  Not Found. Skipping...
File:  /content/frames/pl_happy/output_frame021.jpg  Not Found. 

prepare_image_data.py

In [0]:
import os, re, pickle, math, copy
import cv2
import numpy as np
import matplotlib.pyplot as plt

SEQ_LEN = 30
MIN_HEIGHT, MIN_WIDTH = 700, 160

#load data as numpy array. Memory problem here
def load_data(frame_dir, debug=False):
    X = []
    y = []
    min_h, min_w = math.inf, math.inf
    for fn in sorted(os.listdir(frame_dir)):
        if fn[:3] != "VID":
            continue

        vid_dir = os.path.join(frame_dir, fn)
        id_dirs = [f for f in os.listdir(vid_dir) if re.match(r'id_\d+', f)]
        for id_dirname in id_dirs:
            id_dir = os.path.join(vid_dir, id_dirname)
            seq, seq_min_h, seq_min_w = load_seq(id_dir, SEQ_LEN, debug)
            if len(seq) < SEQ_LEN:
                print(f'{id_dir} has not enough number of frames detected')
            if seq and len(seq) == SEQ_LEN:
                X.append(seq)
                y.append(get_label(fn))
                min_h = min(min_h, seq_min_h)
                min_w = min(min_w, seq_min_w)

    return X, y, min_h, min_w

#clean data and save it to base_train_dir. Use a generator to load data from that folder instead of using a numpy array
def clean_data(frame_dir, base_train_dir, debug=False):
    min_h, min_w = math.inf, math.inf
    for fn in sorted(os.listdir(frame_dir)):
        if fn[:3] != "VID":
            continue

        vid_dir = os.path.join(frame_dir, fn)
        id_dirs = [f for f in os.listdir(vid_dir) if re.match(r'id_\d+', f)]
        for id_dirname in id_dirs:
            id_dir = os.path.join(vid_dir, id_dirname)
            seq, seq_min_h, seq_min_w = load_seq(id_dir, SEQ_LEN, debug)
            if len(seq) < SEQ_LEN:
                print(f'{id_dir} has not enough number of frames detected')
            if seq and len(seq) == SEQ_LEN:
                # X.append(seq)
                # y.append(get_label(fn))
                seq_fn = f'{id_dirname}_{fn}'
                save_seq(seq, seq_fn, base_train_dir)
                min_h = min(min_h, seq_min_h)
                min_w = min(min_w, seq_min_w)
    print("Cleaned successfully. min_h:{}, min_w:{}".format(min_h, min_w))

def save_seq(seq, seq_fn, base_save_dir):
    seq_folder_path = os.path.join(base_save_dir, seq_fn)
    print(f"Saving {seq_fn} to {seq_folder_path}")
    if not os.path.exists(seq_folder_path):
        os.makedirs(seq_folder_path)

    for i,img in enumerate(seq):
        img_name = f'{i:03d}.jpg'
        img_path = os.path.join(seq_folder_path, img_name)
        cv2.imwrite(img_path, img)

#assume img_seq_dir only contains images file ordered with filename
def load_seq(img_seq_dir, seq_len, debug=False):
    min_h, min_w = math.inf, math.inf
    seq = []
    prev_no = None
    try:
        i = 0
        for img_name in sorted(os.listdir(img_seq_dir)):
            #stop when frames are enough
            if i == seq_len:
                break
            #only checks for frame images
            if not re.match(r'frame.*\.jpg', img_name):
                continue
            img_path = os.path.join(img_seq_dir, img_name)
            img_array = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

            #make sure image detected is large enough
            if img_array.shape[0] < MIN_HEIGHT or img_array.shape[1] < MIN_WIDTH:
                if debug: print(f'{img_path} contains small image. Please check.')
                # return [], -1, -1
                continue
            #make sure the sequence is not too broken
            cur_no = get_img_frameno(img_name)
            if prev_no is not None and (cur_no - prev_no > 10):
                if debug: print(f'{img_path} contains broken sequence')
                i = 0
                seq = []
            #successfully added to sequence
            seq.append(img_array)
            i += 1
            prev_no = cur_no
            min_h = min(min_h, img_array.shape[0])
            min_w = min(min_w, img_array.shape[1])
    except NotADirectoryError:
        print(f"Frames for {img_seq_dir} was not extracted correctly")

    return seq, min_h, min_w


#return integer. 1: healthy , 0: depressed
def get_label(vid_name):
    s = r"VID_RGB_\d*_(\d)"
    res = re.match(s, vid_name)
    return int(res.group(1))

def get_img_frameno(img_name):
    s = r"frame(\d+)_id\d+\.jpg"
    res = re.match(s, img_name)
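    if not res:
        # fail with a clear message instead of an AttributeError on res.group
        raise ValueError(f"Unexpected frame filename: {img_name}")
    return int(res.group(1))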

def normalize_img_data(X, h, w):
    for seq in X:
        for i in range(len(seq)):
            seq[i] = cv2.resize(seq[i], (w, h))
            # show image
            # plt.imshow(seq[i], cmap="gray")
            # plt.show()
            seq[i] = seq[i] / 255
            seq[i] = np.atleast_3d(seq[i])
    return X


# if __name__ == "__main__":
#     frame_dir = '/home/jia/jiawen/uni_materials/Y3S2/FYP/frames'
#     # base_train_dir = '/home/jia/jiawen/uni_materials/Y3S2/FYP/train'
#     X,y,min_h,min_w = load_data(frame_dir)
#     print(len(X), len(y))
#     print(min_h, min_w)
#     # X = normalize_img_data(X, 50, 30)

#     # clean_data(frame_dir, base_train_dir)

#     # print(len(X), len(y), len(X[0]))
#     # min_h_img, min_h, min_w_img, min_w= get_min_image_size(frame_dir, SEQ_LEN)
#     # print(min_h_img, min_h, min_w_img, min_w)





fyp_train_img_model.py

In [0]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import ConvLSTM2D, AveragePooling3D, Reshape, BatchNormalization, MaxPooling3D, Activation, Flatten, Dropout
from sklearn.model_selection import train_test_split
from tensorflow.keras import backend as K
import numpy as np
import pickle, os
import random
K.set_image_data_format('channels_last')
print("Clearing session")
tf.keras.backend.clear_session()

frame_dir_train = '/home/jia/jiawen/uni_materials/Y3S2/FYP/frames/train'
frame_dir_test = '/home/jia/jiawen/uni_materials/Y3S2/FYP/frames/test'

img_data_pkl = 'img_data'
img_data_pkl_train = f'{img_data_pkl}_train.pkl'
img_data_pkl_test = f'{img_data_pkl}_test.pkl'
#load data
drive_dir = os.getcwd()+'/drive/My Drive/fyp'
pickle_path_train = os.path.join(drive_dir, img_data_pkl_train)
pickle_path_test = os.path.join(drive_dir, img_data_pkl_test)

if os.path.exists(pickle_path_train) and os.path.exists(pickle_path_test):
    print("Loading data from ", img_data_pkl)
    with open(pickle_path_train, 'rb') as f:
        loaded_data = pickle.load(f)
    [x_train,y_train,h,w] = loaded_data
    with open(pickle_path_test, 'rb') as f:
        loaded_data = pickle.load(f)
    [x_test,y_test,h2,w2] = loaded_data
else:
    print("Preparing data...")
    x_train, y_train, h, w = load_data(frame_dir_train)
    x_test, y_test, h2, w2 = load_data(frame_dir_test)

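    print("Saving data to ", img_data_pkl)
    # save to the same paths that are checked above, so the cache is found on the next run
    os.makedirs(drive_dir, exist_ok=True)
    with open(pickle_path_train, 'wb') as f:
        pickle.dump([x_train,y_train,h,w], f)
    with open(pickle_path_test, 'wb') as f:
        pickle.dump([x_test,y_test,h2,w2], f)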
#Normalize data
# IMG_WIDTH = 160
# IMG_HEIGHT = 700
IMG_WIDTH = 100
IMG_HEIGHT = 640
SEQ_SIZE = 30
x_train = normalize_img_data(x_train, IMG_HEIGHT, IMG_WIDTH)
x_train = np.array(x_train)
y_train = np.array(y_train)
print("Shape: ", x_train.shape)

x_test = normalize_img_data(x_test, IMG_HEIGHT, IMG_WIDTH)
x_test = np.array(x_test)
y_test = np.array(y_test)

# print("Splitting data...")
# x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
class_weight={1:0.35, 0:0.65}
print("Creating model")
FILTER_SIZE = 32
KERNEL_SIZE=3
model = Sequential()
model.add(ConvLSTM2D(filters = FILTER_SIZE, kernel_size = (KERNEL_SIZE, KERNEL_SIZE), input_shape = (SEQ_SIZE, IMG_HEIGHT, IMG_WIDTH, 1), return_sequences=True)) #shape: num_seq, image_size, channel_num (1)
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2,2,2)))


model.add(Flatten())
model.add(Dropout(0.6))
# model.add(Dense(32, activation='relu'))
# model.add(Dense(128, activation='relu'))

# model.add(Dense(units = 1, # num of your video categories
#                 kernel_initializer = 'Orthogonal', activation = 'sigmoid'))
model.add(Dense(1, activation='sigmoid'))

print("Compiling model")
opt = tf.keras.optimizers.Adam(learning_rate=1e-6)

model.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy'])

# then train it
print("Fitting model")
model.fit(x_train, # shape (num_sequences, SEQ_SIZE, IMG_HEIGHT, IMG_WIDTH, 1)
          y_train,
          batch_size = 1,
          epochs = 7,
          validation_data=(x_test, y_test),
          class_weight=class_weight)

# print("Done")
# model_name = f"modelCNN-{FILTER_SIZE}-{KERNEL_SIZE}-{str(round(random.random(),3))[2:]}"
# model.save(model_name)
# print("Saved as ", model_name)

print("Predicted value")
for i in range(len(x_test)):
  val = model.predict(np.array([x_test[i,]]))[0][0]
  print(f"Predicted prob: {val}.\tPredicted val: {0 if val <= 0.5 else 1}\tTrue val: {y_test[i]}")
  # print(f"Predicted prob: {val}.\tTrue val: {y_test[i]}")



Clearing session
Loading data from  img_data
Shape:  (58, 30, 640, 100, 1)
Creating model
Compiling model
Fitting model
Epoch 1/7
Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7
Predicted value
Predicted prob: 0.2867431938648224.	Predicted val: 0	True val: 0
Predicted prob: 0.6170057654380798.	Predicted val: 1	True val: 0
Predicted prob: 0.8388274312019348.	Predicted val: 1	True val: 0
Predicted prob: 0.6757365465164185.	Predicted val: 1	True val: 1
Predicted prob: 0.701121985912323.	Predicted val: 1	True val: 1
Predicted prob: 0.6770846843719482.	Predicted val: 1	True val: 1
Predicted prob: 0.6776068210601807.	Predicted val: 1	True val: 1
Predicted prob: 0.703916609287262.	Predicted val: 1	True val: 1
Predicted prob: 0.6919534206390381.	Predicted val: 1	True val: 1
Predicted prob: 0.6399635672569275.	Predicted val: 1	True val: 1
Predicted prob: 0.8034275770187378.	Predicted val: 1	True val: 1
Predicted prob: 0.7676349878311157.	Predicted val: 1	True val: 1
Predicted prob: 

Create confusion matrix

In [0]:
from sklearn.metrics import classification_report, confusion_matrix

print("Predicted value")
y_pred = []
for i in range(len(x_test)):
  val = model.predict(np.array([x_test[i,]]))[0][0]
  y_pred.append(0 if val <= 0.5 else 1)
  # print(f"Predicted prob: {val}.\tTrue val: {y_test[i]}")

#Confusion Matrix and Classification Report
print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Classification Report')
target_names = ['Sad', 'Happy']
print(classification_report(y_test, y_pred, target_names=target_names))

Predicted value
Confusion Matrix
[[ 2  6]
 [ 0 15]]
Classification Report
              precision    recall  f1-score   support

         Sad       1.00      0.25      0.40         8
       Happy       0.71      1.00      0.83        15

    accuracy                           0.74        23
   macro avg       0.86      0.62      0.62        23
weighted avg       0.81      0.74      0.68        23
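
The numbers in the report follow directly from the confusion matrix, where rows are true classes and columns are predicted classes (the scikit-learn convention). A small check in plain Python, using the matrix printed above:

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = [[2, 6],
      [0, 15]]

def precision_recall_f1(cm, cls):
    tp = cm[cls][cls]
    predicted = sum(row[cls] for row in cm)   # column sum: everything predicted as cls
    actual = sum(cm[cls])                     # row sum: everything that truly is cls
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for cls, name in enumerate(["Sad", "Happy"]):
    p, r, f = precision_recall_f1(cm, cls)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

accuracy = (cm[0][0] + cm[1][1]) / sum(sum(row) for row in cm)
print(f"accuracy={accuracy:.2f}")
```

Only 2 of the 8 'Sad' clips are recognized, so the 74% accuracy is carried mostly by the majority class.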



Predicting own input video

In [0]:
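MIN_HEIGHT, MIN_WIDTH = 300,50
my_x, my_y, h, w = load_data(os.getcwd()+'/frames', True)
# resize and scale the new sequences the same way as the training data
my_x = np.array(normalize_img_data(my_x, IMG_HEIGHT, IMG_WIDTH))
my_pred = []
for i in range(len(my_x)):
  # predict on the newly loaded sequences, not on x_test
  val = model.predict(np.array([my_x[i,]]))[0][0]
  my_pred.append(0 if val <= 0.45 else 1)
  print(f"Predicted prob: {val}")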

/content/frames/VID_RGB_094_1/id_02/frame010_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame011_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame012_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame013_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame014_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame015_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame016_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame017_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame018_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame019_id02.jpg contains small image. Please check.
/content/frames/VID_RGB_094_1/id_02/frame020_id02.jpg contains small image. Please check.
/content/f
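
The prediction loops above call the model once per clip and then threshold the returned probability. The following is a minimal sketch of the thresholding step applied to a whole batch at once. `to_labels` is a hypothetical helper, and the probabilities are made-up values rather than real model output.

```python
import numpy as np

def to_labels(probs, threshold=0.5):
    """Map sigmoid outputs to class labels: 0 if prob <= threshold, else 1."""
    probs = np.asarray(probs, dtype=float).reshape(-1)
    return (probs > threshold).astype(int)

# Illustrative probabilities only, not real model output.
example_probs = [0.70, 0.45, 0.50, 0.48]
labels_default = to_labels(example_probs)       # threshold 0.5, as in the test-set cell
labels_loose = to_labels(example_probs, 0.45)   # looser threshold, as in the own-video cell
print(labels_default)
print(labels_loose)
```

Assuming the model has a single sigmoid output, `model.predict` on the full array returns one row per clip, so passing that result to `to_labels` could replace the per-sample loop.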

In [0]:
# print(model.summary())
print(x_test.shape)
print(x_train.shape)

(58, 30, 640, 100, 1)
(58, 30, 640, 100, 1)


**End of Method 2**

In [0]:
# show_local_mp4_video(FINAL_VID_DIR +'/VID_RGB_075_1.mp4')
# !rm -r framespedestrian_deeppose/
# !zip -r results.zip final_vid openpose_json deep_sort_pytorch/bbox_output/
!zip -r frames2.zip frames


Create and train the CNN model. Upload the files 'fyp_train_img_model.py', 'fyp_prepare_img_data.py' and 'img_data.pkl', then run 'fyp_train_img_model.py'.

In [0]:
# !cd drive/My\ Drive/fyp && unzip fyp_CNN_train2_files.zip
!cd drive/My\ Drive/fyp && python3 fyp_train_img_model.py

2020-04-28 10:47:20.702709: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Clearing session
Loading data from  img_data
Shape:  (58, 30, 640, 100, 1)
Creating model
2020-04-28 10:47:29.280244: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-28 10:47:29.283835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-28 10:47:29.284359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-04-28 10:47:29.284413: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic librar

Horizontally flip the videos labelled 0 (Sad) to increase the number of videos in that class.

In [0]:
# BASE_DIR = os.getcwd()
# VIDEOS_DIR = BASE_DIR  + '/Sad'

# # ffmpeg -i input.avi -vf scale=320:240 output.avi
# # !cd Sad && ffmpeg -i VID_RGB_001_0.mp4 -vf scale=320:-1 VID_RGB__0.mp4

# # !find -name '*.mp4' | gawk 'BEGIN{ a=1 }{ printf "ffmpeg %s -vf scale=320:-1 $(echo $0| cut -d'_' -f 2) %04d.jpg\n", $0, a++ }'|bash # run that command
# !cd Sad && ffmpeg -i VID_RGB_001_0.mp4 -vf hflip VID_RGB_084_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_011_0.mp4 -vf hflip VID_RGB_085_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_015_0.mp4 -vf hflip VID_RGB_086_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_023_0.mp4 -vf hflip VID_RGB_087_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_031_0.mp4 -vf hflip VID_RGB_088_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_035_0.mp4 -vf hflip VID_RGB_089_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_037_0.mp4 -vf hflip VID_RGB_090_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_038_0.mp4 -vf hflip VID_RGB_091_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_039_0.mp4 -vf hflip VID_RGB_092_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_043_0.mp4 -vf hflip VID_RGB_093_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_047_0.mp4 -vf hflip VID_RGB_094_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_051_0.mp4 -vf hflip VID_RGB_095_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_067_0.mp4 -vf hflip VID_RGB_096_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_071_0.mp4 -vf hflip VID_RGB_097_0.mp4
# !cd Sad && ffmpeg -i VID_RGB_083_0.mp4 -vf hflip VID_RGB_098_0.mp4
# !mkdir Videos
# !cp -a ./Sad/. ./Videos
# !cp -a ./Others/. ./Videos

# !unzip bbox_output.zip
# !unzip openpose_json.zip


Archive:  ./drive/My Drive/fyp/img_data_pickle.zip
  inflating: img_data_test.pkl       
  inflating: img_data_train.pkl      
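
The flip step above repeats one ffmpeg command per video. The following is a minimal sketch that builds the same `ffmpeg -i <in> -vf hflip <out>` commands from a list of source files. `build_hflip_commands` and its `start_index` numbering are illustrative assumptions, and the commands are only printed, not executed.

```python
def build_hflip_commands(sources, start_index, directory="Sad"):
    """Return one ffmpeg hflip command (as an argument list) per source video.

    Output names follow the VID_RGB_<NNN>_0.mp4 pattern seen in the notebook,
    numbered consecutively from start_index.
    """
    commands = []
    for offset, src in enumerate(sources):
        dst = f"VID_RGB_{start_index + offset:03d}_0.mp4"
        commands.append(
            ["ffmpeg", "-i", f"{directory}/{src}", "-vf", "hflip", f"{directory}/{dst}"]
        )
    return commands

# First three source videos from the cell above.
sources = ["VID_RGB_001_0.mp4", "VID_RGB_011_0.mp4", "VID_RGB_015_0.mp4"]
for cmd in build_hflip_commands(sources, start_index=84):
    print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)` to perform the flip, provided ffmpeg is installed and the source files exist.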
