<a href="https://colab.research.google.com/github/kevincong95/cs231n-emotiw/blob/master/notebooks/2.0-la-tj-ak-ensemble_baseline_predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Video Sentiment Analysis in the Wild
### Ensembling Notebook | CS231n

This notebook preprocesses input videos to extract faces, frames, poses, and audio before running pre-trained models for each modality to predict group sentiment (positive, negative, or neutral). 

In [1]:

# Clone the code base
!git clone 'https://github.com/kevincong95/cs231n-emotiw.git'

# Switch to TF 1.x and navigate to the directory
%tensorflow_version 1.x
!pwd
import os
os.chdir('cs231n-emotiw')
!pwd

# Install required packages 
!pip install -r 'requirements-predictions.txt'


Cloning into 'cs231n-emotiw'...
remote: Enumerating objects: 342, done.
remote: Counting objects: 100% (342/342), done.
remote: Compressing objects: 100% (243/243), done.
remote: Total 596 (delta 216), reused 210 (delta 98), pack-reused 254
Receiving objects: 100% (596/596), 173.22 MiB | 33.92 MiB/s, done.
Resolving deltas: 100% (349/349), done.
TensorFlow 1.x selected.
/content
/content/cs231n-emotiw
Collecting pydub==0.24.0
  Downloading https://files.pythonhosted.org/packages/ba/f9/2cd255898c11179a57415937d601ab1e8a14a7c6a8331ff9c365e97e41f6/pydub-0.24.0-py2.py3-none-any.whl
Collecting moviepy>=1.0.0
  Downloading https://files.pythonhosted.org/packages/18/54/01a8c4e35c75ca9724d19a7e4de9dc23f0ceb8769102c7de056113af61c3/moviepy-1.0.3.tar.gz (388kB)
     |████████████████████████████████| 389kB 11.8MB/s
Collecting argparse
  Downloading https://files.pythonhosted.org/packages/f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py

#### Pose Pre-Requisites
Pose extraction uses the [CMU OpenPose library](https://github.com/CMU-Perceptual-Computing-Lab/openpose) to extract body keypoints. We have pre-compiled this library for use in Colab, but some system packages still need to be installed.

#### Retrieve the files

The code block below installs the system packages, downloads the pre-built OpenPose library, and retrieves the data files from Google Cloud Storage (GCS). Feel free to skip the data downloads if the files are already on the local disk or you have Google Drive mounted.

In [2]:
!apt-get -qq install -y libatlas-base-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev opencl-headers ocl-icd-opencl-dev libviennacl-dev
!wget https://storage.googleapis.com/cs231n-emotiw/openpose/openpose.tar.gz
!tar -xzf openpose.tar.gz

# The pre-built OpenPose library contains shared library files that need to be manually linked
import os
orig_lib_path = os.environ['LD_LIBRARY_PATH']
os.environ["LD_LIBRARY_PATH"] = f"{orig_lib_path}:{os.getcwd()}/openpose/build/src/openpose/:{os.getcwd()}/openpose/build/caffe/lib/"

!wget https://storage.googleapis.com/cs231n-emotiw/data/train-tiny.zip
!wget https://storage.googleapis.com/cs231n-emotiw/data/val-tiny.zip
!wget https://storage.googleapis.com/cs231n-emotiw/data/test-tiny.zip
!wget https://storage.googleapis.com/cs231n-emotiw/data/Train_labels.txt
!wget https://storage.googleapis.com/cs231n-emotiw/data/Val_labels.txt


Selecting previously unselected package libgflags2.2.
(Reading database ... 144439 files and directories currently installed.)
Preparing to unpack .../00-libgflags2.2_2.2.1-1_amd64.deb ...
Unpacking libgflags2.2 (2.2.1-1) ...
Selecting previously unselected package libgflags-dev.
Preparing to unpack .../01-libgflags-dev_2.2.1-1_amd64.deb ...
Unpacking libgflags-dev (2.2.1-1) ...
Selecting previously unselected package libgoogle-glog0v5.
Preparing to unpack .../02-libgoogle-glog0v5_0.3.5-1_amd64.deb ...
Unpacking libgoogle-glog0v5 (0.3.5-1) ...
Selecting previously unselected package libgoogle-glog-dev.
Preparing to unpack .../03-libgoogle-glog-dev_0.3.5-1_amd64.deb ...
Unpacking libgoogle-glog-dev (0.3.5-1) ...
Selecting previously unselected package libhdf5-serial-dev.
Preparing to unpack .../04-libhdf5-serial-dev_1.10.0-patch1+docs-4_all.deb ...
Unpacking libhdf5-serial-dev (1.10.0-patch1+docs-4) ...
Selecting previously unselected package libleveldb1v5:amd64.
Preparing to unpack ...

#### Preprocess Files

Here, we will instantiate each of the preprocessors and process all of the input video files.

NOTE: Change the input parameters as needed.

WARNING: This may take several hours to complete, depending on the number of files.

In general, pre-processing will extract the following:
- Video frames
- Pose keypoints
- Faces from each video frame
- Audio waveform and audio features

In [3]:
from src.preprocessors.preprocess_all_modes import preprocess
from src.preprocessors.pose_preprocessor import PosePreprocessor

print("Starting to preprocess train data")
preprocess(video_folder="train-tiny.zip", label_file="Train_labels.txt", local_base_path="train-tiny")

print("Starting to preprocess val data")
preprocess(video_folder="val-tiny.zip", label_file="Val_labels.txt", local_base_path="val-tiny")

print("Starting to preprocess test data")
preprocess(video_folder="test-tiny.zip", local_base_path="test-tiny")

Using TensorFlow backend.


Starting to preprocess train data
Video Preprocessor created with video_folder = train-tiny.zip , label_file = Train_labels.txt , output_folder = train-tiny-frames, output_file = train-tiny-frames.zip
Frames will be created with height = 320 , width = 480 , sample_every = 10
Video Preprocessor created with video_folder = train-tiny.zip , output_folder = train-tiny-faces, output_file = train-tiny-faces.zip
Frames will be created with height = 320 , width = 480 , sample_every = 10
Pose Preprocessor created with is_test = False, video_frame_folder = train-tiny-frames , output_folder = train-tiny-pose, output_file = train-tiny-pose.zip
Video Preprocessor created with video_folder = train-tiny.zip , output_folder = train-tiny-audio, output_file = train-tiny-audio.zip
Frames will be created with hop_size = 0.5
Unzipping files to temp dir train-tiny-frames_tmp...
Finished unzipping files
Found 50 videos
Processing video 12/50 with name 188_22.mp4 and class 3 

Processing video 2/50 with name 
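
The log above reports `sample_every = 10` for the frame and face preprocessors. As a rough illustration of what that setting implies, the sketch below lists which frame indices would be kept if sampling simply takes every 10th frame starting from the first. The helper name `sampled_frame_indices` and the start-at-zero convention are assumptions made for illustration; the actual logic lives in the repository's video preprocessor and may differ.

```python
def sampled_frame_indices(total_frames, sample_every=10):
    """Return the indices of frames kept when taking every `sample_every`-th frame.

    Hypothetical helper: assumes sampling starts at frame 0.
    """
    if sample_every < 1:
        raise ValueError("sample_every must be a positive integer")
    return list(range(0, total_frames, sample_every))


# A 125-frame clip (for example, 5 seconds at 25 fps) would keep 13 frames
indices = sampled_frame_indices(125, sample_every=10)
print(len(indices), indices[:4])
```

Under this assumption, a lower `sample_every` keeps more frames per video, at the cost of longer preprocessing and larger output archives.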

In [0]:
# Remove the openpose folder as it is no longer required
!rm -rf openpose/

In [4]:
!ls

data			      test-tiny-frames	     train-tiny-pose.zip
LICENSE			      test-tiny-frames_tmp   train-tiny.zip
models			      test-tiny-frames.zip   Val_labels.txt
notebooks		      test-tiny-pose	     val-tiny-audio
openpose		      test-tiny-pose.zip     val-tiny-audio_tmp
openpose.tar.gz		      test-tiny.zip	     val-tiny-audio.zip
README.md		      Train_labels.txt	     val-tiny-faces
reports			      train-tiny-audio	     val-tiny-faces_tmp
requirements-predictions.txt  train-tiny-audio_tmp   val-tiny-faces.zip
requirements.txt	      train-tiny-audio.zip   val-tiny-frames
src			      train-tiny-faces	     val-tiny-frames_tmp
test-tiny-audio		      train-tiny-faces_tmp   val-tiny-frames.zip
test-tiny-audio_tmp	      train-tiny-faces.zip   val-tiny-pose
test-tiny-audio.zip	      train-tiny-frames      val-tiny-pose.zip
test-tiny-faces		      train-tiny-frames_tmp  val-tiny.zip
test-tiny-faces_tmp	      train-tiny-frames.zip
test-tiny-faces.zip	      train-tiny-pose


### Run Classifiers

**IMPORTANT**: You must restart the runtime at this point to switch from TF 1.x (selected for preprocessing above) to TF 2.x (selected for the classifiers below).

In [0]:
%tensorflow_version 2.x

In [2]:
import tensorflow
print(tensorflow.__version__)

2.2.0


In [3]:
!pwd
import os
os.chdir('cs231n-emotiw')
!pwd


/content
/content/cs231n-emotiw


In [4]:
!git pull

remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 5 (delta 4), reused 5 (delta 4), pack-reused 0
Unpacking objects: 100% (5/5), done.
From https://github.com/kevincong95/cs231n-emotiw
   6b552a9..f6f4ee2  master     -> origin/master
Updating 6b552a9..f6f4ee2
Fast-forward
 src/classifiers/pose_classifier.py | 3 +-

In [4]:
from src.classifiers.audio_classifier import AudioClassifier
from src.classifiers.frames_classifier import FramesClassifier
from src.classifiers.pose_classifier import PoseClassifier
from src.classifiers.utils import get_num_samples
import numpy as np

audio_classifier = AudioClassifier('train-tiny-audio', model_location='https://storage.googleapis.com/cs231n-emotiw/models/OPENL3_audio_api_train_test-1-500-epochs-0.5_hop--BEST_MODEL-w-VAL--1e-6-lr-0.2-dropout-512-feat-map-batch-norm-3-cnn-layers.h5', is_test=False)
frames_classifier = FramesClassifier('train-tiny-frames', model_location='https://storage.googleapis.com/cs231n-emotiw/models/frame-classifier-resnet-lstm-x3.h5', is_test=False)
pose_classifier = PoseClassifier('train-tiny-pose', model_location='https://storage.googleapis.com/cs231n-emotiw/models/pose-classifier-v5.h5', is_test=False)

classifiers = [audio_classifier, frames_classifier, pose_classifier]

# Map each sample name to its ground-truth label (an integer from 1 to 3)
sample_to_true_label = {}
with open("Train_labels.txt") as f:
    next(f, None)  # Skip the header row
    for line in f:
        line_arr = line.split(" ")
        sample_to_true_label[line_arr[0].strip()] = int(line_arr[1].strip())



AudioClassifier created with audio_folder = train-tiny-audio , is_test = False , model_location = https://storage.googleapis.com/cs231n-emotiw/models/OPENL3_audio_api_train_test-1-500-epochs-0.5_hop--BEST_MODEL-w-VAL--1e-6-lr-0.2-dropout-512-feat-map-batch-norm-3-cnn-layers.h5
Downloading data from https://storage.googleapis.com/cs231n-emotiw/models/OPENL3_audio_api_train_test-1-500-epochs-0.5_hop--BEST_MODEL-w-VAL--1e-6-lr-0.2-dropout-512-feat-map-batch-norm-3-cnn-layers.h5
FramesClassifier created with frames_folder = train-tiny-frames , is_test = False , model_location = https://storage.googleapis.com/cs231n-emotiw/models/frame-classifier-resnet-lstm-x3.h5
Downloading data from https://storage.googleapis.com/cs231n-emotiw/models/frame-classifier-resnet-lstm-x3.h5
PoseClassifier created with pose_folder = train-tiny-pose , is_test = False , model_location = https://storage.googleapis.com/cs231n-emotiw/models/pose-classifier-v5.h5
Downloading data from https://storage.googleapis.com/c

Skipping unzipping files as input is a folder
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Skipping unzipping files as input is a folder
Found 50 frames belonging to 50 videos belonging to 3 classes.
Min frames determined to be 13
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argumen

  x_new.append((lx[i] - origin_x) / len_x)
  x_new.append((ly[i] - origin_y) / len_y)


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Number of samples: 50
Predicted y-labels:
[3 3 3 3 3 3 1 3 1 1 3 3 1 3 3 1 2 3 2 2 2 2 1 2 2 3 2 3 1 3 3 2 3 2 3 3 1
 2 3 1 3 3 2 2 3 1 2 1 1 3]
True y-labels:
[3 1 3 2 3 1 1 1 3 1 3 3 1 3 3 1 1 3 2 2 1 2 1 2 2 3 3 3 2 3 1 2 3 3 3 3 1
 2 3 3 1 1 2 1 1 1 2 1 1 3]
Accuracy: 0.68


In [7]:
audio_classifier.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            [(None, None, 6144)] 0                                            
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, None, 64)     786496      input_4[0][0]                    
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, 64)     256         conv1d_3[0][0]                   
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)  (None, None, 64)     0           batch_normalization_3[0][0]      
____________________________________________________________________________________________
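
The parameter counts in this summary can be cross-checked by hand. A Conv1D layer has `kernel_size * in_channels * filters` weights plus one bias per filter, and a BatchNormalization layer stores four values per channel (gamma, beta, moving mean, moving variance). The summary does not show the kernel size; the value 2 used below is an inference, chosen because it reproduces the reported 786,496 parameters.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a Conv1D layer."""
    return kernel_size * in_channels * filters + filters


def batchnorm_params(channels):
    """Gamma, beta, moving mean, and moving variance for each channel."""
    return 4 * channels


# Input has 6144 features per time step; conv1d_3 outputs 64 channels
print(conv1d_params(kernel_size=2, in_channels=6144, filters=64))  # 786496
print(batchnorm_params(64))  # 256
```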

In [8]:
frames_classifier.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            [(None, 12, 320, 480 0                                            
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, 12, 10, 15, 2 23587712    input_4[0][0]                    
__________________________________________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D)     (None, 12, 10, 15, 4 3006880     time_distributed_1[0][0]         
__________________________________________________________________________________________________
conv_lst_m2d_4 (ConvLSTM2D)     (None, 12, 10, 15, 4 3006880     time_distributed_1[0][0]         
____________________________________________________________________________________________

In [9]:
pose_classifier.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 12, 27)]          0         
_________________________________________________________________
bidirectional (Bidirectional (None, 64)                15360     
_________________________________________________________________
dense (Dense)                (None, 3)                 195       
Total params: 15,555
Trainable params: 15,555
Non-trainable params: 0
_________________________________________________________________
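
The pose model is small enough to verify its parameter counts directly. The summary does not say which recurrent layer the `Bidirectional` wrapper contains; the sketch below assumes an LSTM with 32 units per direction, which is an inference from the numbers (it yields the 64-wide output and the 15,360 parameters shown above) rather than something the summary states.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units


# Assumed: LSTM with 32 units per direction over 27 pose features per frame
bidirectional = 2 * lstm_params(input_dim=27, units=32)
dense = dense_params(input_dim=64, units=3)

print(bidirectional)          # 15360
print(dense)                  # 195
print(bidirectional + dense)  # 15555
```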


In [0]:
def predict(mode="soft", complex_fusion=False):
    # NOTE: complex_fusion is currently unused, and only "soft" mode is implemented
    assert mode in ["soft", "hard"]

    classifier_outputs = []
    classifier_samples = []
    sample_to_row = {}
    num_samples = 0
    y_true = []

    # Collect the per-class outputs and sample names from each modality
    for classifier in classifiers:
        results, samples = classifier.predict()
        classifier_outputs.append(results.tolist())
        classifier_samples.append(list(samples))
        num_samples = len(list(samples))

    print(f"Number of samples: {num_samples}")

    # Use the first classifier's sample order as the canonical row order
    for i, sample in enumerate(classifier_samples[0]):
        sample_to_row[sample] = i
        y_true.append(sample_to_true_label[sample])

    # X[c, i, :] holds classifier c's outputs for sample i over the 3 classes
    X = np.zeros(shape=(len(classifiers), num_samples, 3))
    for c, output in enumerate(classifier_outputs):
        samples = classifier_samples[c]
        for i, row in enumerate(output):
            sample = samples[i]
            X[c, sample_to_row[sample], :] += row

    if mode == "soft":
        # Soft voting: average the class outputs across classifiers, then take the argmax
        y_pred = np.mean(X, axis=0)
        y_pred = np.argmax(y_pred, axis=1) + 1  # Add 1 because true labels range from 1 to 3
        y_true = np.array(y_true)

        print("Predicted y-labels:")
        print(y_pred)

        print("True y-labels:")
        print(y_true)

        accuracy = (y_pred == y_true).mean()
        print(f"Accuracy: {accuracy}")
    else:
        print("Not implemented yet")
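
The `hard` branch of `predict` is left unimplemented. As a minimal sketch of what hard (majority) voting could look like next to soft voting, the standalone example below works on plain nested lists instead of the notebook's classifiers. The function names, the toy scores, and the tie-breaking rule (lowest class index wins) are all assumptions for illustration, not the authors' design.

```python
from collections import Counter


def argmax(row):
    """Index of the largest value (first one wins on ties, like np.argmax)."""
    return max(range(len(row)), key=row.__getitem__)


def soft_vote(scores):
    """scores[c][i] is classifier c's class scores for sample i; returns 1-based labels."""
    num_classifiers = len(scores)
    labels = []
    for per_sample in zip(*scores):  # one tuple of score rows per sample
        mean = [sum(col) / num_classifiers for col in zip(*per_sample)]
        labels.append(argmax(mean) + 1)
    return labels


def hard_vote(scores):
    """Each classifier casts one vote per sample; the most common class wins."""
    labels = []
    for per_sample in zip(*scores):
        counts = Counter(argmax(row) for row in per_sample)
        best = max(counts.values())
        # Assumed tie-break: the lowest class index among the tied classes
        winner = min(cls for cls, n in counts.items() if n == best)
        labels.append(winner + 1)
    return labels


# Toy scores for 3 classifiers x 2 samples x 3 classes (made-up numbers)
toy_scores = [
    [[0.9, 0.1, 0.0], [0.1, 0.2, 0.7]],
    [[0.4, 0.6, 0.0], [0.1, 0.2, 0.7]],
    [[0.4, 0.6, 0.0], [0.1, 0.2, 0.7]],
]
print(soft_vote(toy_scores))  # [1, 3]
print(hard_vote(toy_scores))  # [2, 3]
```

The first toy sample shows how the two schemes can disagree: one very confident classifier dominates the average, while the majority vote follows the two less confident ones.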


In [6]:
predict()

Skipping unzipping files as input is a folder
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Skipping unzipping files as input is a folder
Found 50 frames belonging to 50 videos belonging to 3 classes.
Min frames determined to be 13
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argumen

  x_new.append((lx[i] - origin_x) / len_x)
  x_new.append((ly[i] - origin_y) / len_y)


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
Number of samples: 50
Predicted y-labels:
[1 3 3 2 3 3 1 1 1 1 3 3 1 2 3 3 3 3 2 2 2 2 3 1 2 2 2 3 3 3 3 2 3 1 3 2 2
 2 3 3 3 1 2 3 2 3 3 1 1 3]
True y-labels:
[3 1 3 2 3 1 1 1 3 1 3 3 1 3 3 1 1 3 2 2 1 2 1 2 2 3 3 3 2 3 1 2 3 3 3 3 1
 2 3 3 1 1 2 1 1 1 2 1 1 3]
Accuracy: 0.56
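
The two label arrays printed above are enough to break the overall accuracy down by class. The sketch below copies them verbatim, recomputes the accuracy, and builds a confusion matrix with the standard library; the variable names and the per-class recall report are additions for illustration, not part of the notebook's code.

```python
from collections import Counter

# Labels copied from the output of predict() above
y_pred = [1, 3, 3, 2, 3, 3, 1, 1, 1, 1, 3, 3, 1, 2, 3, 3, 3, 3, 2, 2, 2, 2, 3, 1, 2,
          2, 2, 3, 3, 3, 3, 2, 3, 1, 3, 2, 2, 2, 3, 3, 3, 1, 2, 3, 2, 3, 3, 1, 1, 3]
y_true = [3, 1, 3, 2, 3, 1, 1, 1, 3, 1, 3, 3, 1, 3, 3, 1, 1, 3, 2, 2, 1, 2, 1, 2, 2,
          3, 3, 3, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 1, 1, 2, 1, 1, 1, 2, 1, 1, 3]

accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.56

# confusion[(true, pred)] counts how often each true class received each prediction
confusion = Counter(zip(y_true, y_pred))
for true_label in (1, 2, 3):
    row = [confusion[(true_label, pred_label)] for pred_label in (1, 2, 3)]
    total = sum(row)
    recall = confusion[(true_label, true_label)] / total if total else float("nan")
    print(f"true={true_label} predicted as 1/2/3: {row} recall={recall:.2f}")
```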
