# Future Action Prediction using Deep Multi-Scale Video Prediction Beyond Mean Square Error

The project predicts future action frames using a network trained on a dataset of human actions such as slide, bend, walk, run and skip. Each video sequence is converted into a series of images, and these images are used to train an adversarial network that learns to predict the next images in the sequence. Convolutional networks capture only short-range dependencies, so the adversarial network is trained with a multi-scale model. The network makes a series of predictions starting from the lowest resolution, and uses the prediction at scale s1 as the starting point for the prediction at the next scale s2.
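The coarse-to-fine idea can be illustrated with a small NumPy sketch. This is not the repository's network: `downsample`, `upsample`, `multi_scale_predict` and the `refine_fns` stand-ins are hypothetical names, frames are assumed to be single-channel arrays whose sides are divisible by the largest scale factor, and each real per-scale generator is replaced by a plain function.

```python
import numpy as np

def downsample(frame, factor):
    """Average-pool a (H, W) frame by an integer factor."""
    h, w = frame.shape
    return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(frame, factor=2):
    """Nearest-neighbour upsampling of a (H, W) frame by an integer factor."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

def multi_scale_predict(last_frame, refine_fns):
    """Predict coarse-to-fine; refine_fns[k] stands in for the generator at scale k.

    Each stand-in receives the input at its own resolution together with the
    upsampled prediction from the previous (coarser) scale.
    """
    num_scales = len(refine_fns)
    prediction = None
    for k, refine in enumerate(refine_fns):
        factor = 2 ** (num_scales - 1 - k)  # coarsest scale comes first
        scaled_input = downsample(last_frame, factor) if factor > 1 else last_frame
        if prediction is None:
            coarse = np.zeros_like(scaled_input)  # nothing to refine at the first scale
        else:
            coarse = upsample(prediction)
        prediction = refine(scaled_input, coarse)
    return prediction

# Toy stand-in generator: average the scaled input with the coarser prediction.
blend = lambda scaled_input, coarse: 0.5 * (scaled_input + coarse)
result = multi_scale_predict(np.ones((8, 8)), [blend, blend, blend])
print(result.shape)
```

The point of the sketch is only the data flow: every scale sees its own downsampled input plus the upsampled output of the scale before it.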

#### Loss Functions: 
1. bce_loss: Calculates the sum of binary cross-entropy (BCE) losses between the predictions and the ground truths.
2. lp_loss: Calculates the sum of Lp losses between the predicted and ground truth frames.
3. gdl_loss: Calculates the sum of gradient difference losses (GDL) between the predicted and ground truth frames.
4. adv_loss: Calculates the sum of BCE losses between the predicted classifications and the true labels.
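The four losses can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the repository's implementation: frames are assumed to be arrays shaped (batch, height, width, channels), the parameters `p`, `alpha` and `eps` are illustrative defaults, and the sketch computes each loss for a single scale rather than summing over scales.

```python
import numpy as np

def bce_loss(preds, targets, eps=1e-7):
    """Sum of binary cross-entropy terms; preds are probabilities in (0, 1)."""
    preds = np.clip(preds, eps, 1.0 - eps)  # avoid log(0)
    terms = targets * np.log(preds) + (1.0 - targets) * np.log(1.0 - preds)
    return float(-np.sum(terms))

def lp_loss(gen_frames, gt_frames, p=2):
    """Sum of |generated - ground truth|^p over all pixels."""
    return float(np.sum(np.abs(gen_frames - gt_frames) ** p))

def gdl_loss(gen_frames, gt_frames, alpha=1):
    """Gradient difference loss: compares neighbouring-pixel differences."""
    # Absolute differences between vertically and horizontally adjacent pixels.
    gen_dy = np.abs(np.diff(gen_frames, axis=1))
    gen_dx = np.abs(np.diff(gen_frames, axis=2))
    gt_dy = np.abs(np.diff(gt_frames, axis=1))
    gt_dx = np.abs(np.diff(gt_frames, axis=2))
    return float(np.sum(np.abs(gt_dy - gen_dy) ** alpha)
                 + np.sum(np.abs(gt_dx - gen_dx) ** alpha))

def adv_loss(disc_preds, labels):
    """BCE between the discriminator's classifications and the true labels."""
    return bce_loss(disc_preds, labels)

gt = np.zeros((1, 2, 2, 1))
gen = np.ones((1, 2, 2, 1))
print(lp_loss(gen, gt), gdl_loss(gen, gt))
```

A uniformly wrong frame has a large Lp loss but zero GDL, because GDL only penalises differences in image gradients (edges), which is why the two are used together.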

#### convert_video_to_jpg.py

This Python file generates images from mp4 video files. The example below generates images from multiple videos contained in a directory. The images of each video are stored in a separate directory created inside the folder that contains the video files. If you have cloned the repository, you will find the folders skip and walk, each of which contains multiple videos. To generate their images, run the command below.
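The resulting layout (one frame folder per video, next to the videos) can be sketched with the standard library. `plan_frame_dirs` is a hypothetical helper, and naming each folder after the video's file stem is an assumption; the script's actual naming scheme may differ.

```python
from pathlib import Path

def plan_frame_dirs(video_dir, extension=".mp4"):
    """Map each video in video_dir to a frame folder named after the video."""
    video_dir = Path(video_dir)
    videos = sorted(video_dir.glob("*" + extension))
    # Assumed convention: data/skip/clip1.mp4 -> data/skip/clip1/
    return {video: video_dir / video.stem for video in videos}

def create_frame_dirs(video_dir):
    """Create the planned folders and return them in sorted order."""
    frame_dirs = list(plan_frame_dirs(video_dir).values())
    for frame_dir in frame_dirs:
        frame_dir.mkdir(exist_ok=True)
    return frame_dirs
```

Calling `create_frame_dirs("data/Human_Actions/skip")` would then create one empty folder per mp4 file, ready to receive that video's frames.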

In [None]:
%run convert_video_to_jpg.py --v=data/Human_Actions/skip

#### generate_clips.py

Takes as input a directory of training images generated from the video clips. The script takes a batch of 5 images from a folder (4 input images and 1 image to be predicted) and saves them as a clip in npz (compressed) form. Clips are generated for both the training and the testing data sets. We can specify the directory from which to generate clips, the directory in which to save them, and the number of clips to generate. As a rule of thumb, good training requires about 100,000 clips for a data set containing 200,000+ images.
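A minimal sketch of how a 4 + 1 clip could be assembled and saved is shown here. `make_clip` and `save_clip` are hypothetical names, and stacking the frames along the channel axis is an assumed layout, not necessarily the one used by generate_clips.py.

```python
import numpy as np

def make_clip(frames, start, history_len=4):
    """Stack history_len input frames plus 1 target frame along the channel axis."""
    window = frames[start:start + history_len + 1]
    if len(window) < history_len + 1:
        raise ValueError("not enough frames after the start index")
    # For RGB frames the result has shape (H, W, 3 * (history_len + 1)).
    return np.concatenate(window, axis=-1)

def save_clip(clip, path):
    """Write the clip as a compressed .npz archive (stored under the key 'arr_0')."""
    np.savez_compressed(path, clip)

# Six dummy 2x2 RGB frames; frame i is filled with the value i.
frames = [np.full((2, 2, 3), i, dtype=np.uint8) for i in range(6)]
clip = make_clip(frames, start=1)
print(clip.shape)
```

Picking `start` at random within a folder of frames would give the random clip selection described above.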

Below is an example of running generate_clips.py on the training data set. The GitHub repository has some images in data/Human_Actions/Train and data/Human_Actions/Test, so the command below can be executed to see how clips are generated from randomly selected images within the folder. The command as written passes --n=1; generating 100 clips would presumably need --n=100.

In [None]:
%run generate_clips.py --t=data/Human_Actions/Train/ --c=data/Human_Actions/Clips/ --n=1

When generating clips in large numbers, the directory structure and file name lines can be commented out in the get_full_clips() method of utils.py. The number of recursions, or the number of images stored in one compressed clip, can be increased, but due to memory constraints we prefer to train with 4 + 1 images compressed into one clip.

#### main_prediction.py

This is the main Python file, which runs the network in training mode or testing mode. To test the network, run main_prediction.py with the --test_only flag as shown below. The --test_dir parameter specifies the directory on which to test predictions; it must have at least one sub-folder containing at least 6-10 images. The --recursions parameter specifies the number of future predictions to be made.
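One common way to make several future predictions is to feed each predicted frame back in as input. The sketch below assumes that --recursions works this way; `predict_recursively` and the toy `extrapolate` predictor are hypothetical stand-ins, with plain numbers in place of frames.

```python
def predict_recursively(history, predict_next, recursions):
    """Predict `recursions` future frames, feeding each prediction back as input."""
    window = list(history)
    predictions = []
    for _ in range(recursions):
        next_frame = predict_next(window)
        predictions.append(next_frame)
        # Slide the input window forward: drop the oldest frame, append the new one.
        window = window[1:] + [next_frame]
    return predictions

def extrapolate(window):
    """Toy predictor: continue the trend of the last two 'frames'."""
    return window[-1] + (window[-1] - window[-2])

print(predict_recursively([1, 2, 3, 4], extrapolate, recursions=3))
```

Because later predictions are built on earlier ones, errors accumulate, so prediction quality usually drops as the number of recursions grows.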

In [None]:
# to train the network - one can specify the Training folder.
# %run main_prediction.py --test_dir=data/Human_Actions/Train

%run main_prediction.py --test_dir=data/Human_Actions/NewTest_3 --recursions=1 --test_only

# the c.GIF_SRC_FOLDER contains the latest GIF created by running the above test mode.
import constants as c
print(c.GIF_SRC_FOLDER)

To test the network on a custom test folder that contains multiple sub-folders of images, select the folder from the UI produced by the cell below. Make sure that the test folder is inside the Code folder and nowhere else.

In [None]:
from IPython.display import HTML

input_form = """

<div style="background-color:gainsboro; border:solid black; width:800px; padding:20px;">
<br>
<input type="file" id="file" onchange="getfolder(event)" webkitdirectory mozdirectory msdirectory odirectory directory multiple />
<br>
Set Recursions:
<select name = "Recursions" id="Recursions" value = "1" style="width: 50px;height:25px">
    <option value = "1">1</option>
    <option value = "2">2</option>
    <option value = "3">3</option>
    <option value = "4">4</option>

</select>
<br>
<button onclick="getValues()">Set Parameters</button><br> <br>

<span id="output"></span>

</div>
"""

javascript = """
<script type="text/Javascript">

    
    
    function getfolder(e) {
    var files = e.target.files;
    var path = files[0].webkitRelativePath;
    var Folder = path.split("/");
    var filename = 'file';
         
    var kernel = IPython.notebook.kernel;
    var command = filename +" = '" + Folder[0] + "'";
    kernel.execute(command);
    
}
    function getValues(){
    var rec = document.getElementById('Recursions').value;
    var recur = 'recur';
    var kernel = IPython.notebook.kernel;
    var command = recur +" = '" + rec + "'";
    kernel.execute(command);
    }
   
</script>
"""
        

HTML(input_form + javascript)

After setting the parameters (the test folder and the number of recursions), execute the cell below to perform testing. It runs main_prediction.py with --test_only and generates as many future predictions as selected under Set Parameters for the chosen test directory.

In [None]:
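# `file` and `recur` are set in the kernel by the form above.
test_dir_arg = "--test_dir=" + file
print(test_dir_arg)
recursions_arg = "--recursions=" + recur
print(recursions_arg)
%run main_prediction.py $test_dir_arg $recursions_arg --test_only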

#### Run the script below to fetch the last generated GIFs from testing the above directory.
To check the output predictions, execute the cell below; it prints the paths of the Original input, Ground Truth and Generated Predictions GIFs.

In [None]:
import os
import constants as c

original_filename = ''
ogt_filename = ''
ogen_filename = ''

count = 0  # number of GIFs found so far

for root, dirs, filenames in os.walk(c.GIF_SRC_FOLDER):
    for filename in filenames:
        fullFileName = os.path.join(root, filename)
        # Classify each file by the marker in its name.
        if "original" in filename:
            original_filename = fullFileName
            count += 1
        elif "ogt" in filename:
            ogt_filename = fullFileName
            count += 1
        elif "ogen" in filename:
            ogen_filename = fullFileName
            count += 1

    # Stop at the first leaf folder once all three GIFs have been found.
    if not dirs and count == 3:
        # Use forward slashes so the paths also work as HTML image sources.
        original_filename = original_filename.replace('\\', '/')
        ogt_filename = ogt_filename.replace('\\', '/')
        ogen_filename = ogen_filename.replace('\\', '/')
        print('Original GIF:      ', original_filename)
        print('Ground Truth GIF:  ', ogt_filename)
        print('Generated GIF:     ', ogen_filename)
        break

##### Plot the Input, Ground Truth and Generated GIFs
The Original GIF, Ground Truth GIF and Generated GIF are the output files generated by the test scripts above. Running the script below shows all the images (Original Frames, Ground Truth and Generated). The ground truth and generated GIFs also include the original frames.

In [28]:
# A cell displays only its last expression, so the wrapper <div> is opened in the same string as the table.
HTML("<div id='textid'><table><tr>"
     +"<td>Input_0<img src='data/Human_Actions/recursions_4/1/input_0.png' /></td>"
     +"<td>Input_1<img src='data/Human_Actions/recursions_4/1/input_1.png' /></td>"
     +"<td>Input_2<img src='data/Human_Actions/recursions_4/1/input_2.png' /></td>"
     +"<td>Input_3<img src='data/Human_Actions/recursions_4/1/input_3.png' /></td></tr>"
     +"<tr>"
     +"<td>Ground Truth_0<img src='data/Human_Actions/recursions_4/1/gt_0.png' /></td>"
     +"<td>Ground Truth_1<img src='data/Human_Actions/recursions_4/1/gt_1.png' /></td>"
     +"<td>Ground Truth_2<img src='data/Human_Actions/recursions_4/1/gt_2.png' /></td>"
     +"<td>Ground Truth_3<img src='data/Human_Actions/recursions_4/1/gt_3.png' /></td></tr>"
     +"<tr>"
     +"<td>Generated_0<img src='data/Human_Actions/recursions_4/1/gen_0.png' /></td>"
     +"<td>Generated_1<img src='data/Human_Actions/recursions_4/1/gen_1.png' /></td>"
     +"<td>Generated_2<img src='data/Human_Actions/recursions_4/1/gen_2.png' /></td>"
     +"<td>Generated_3<img src='data/Human_Actions/recursions_4/1/gen_3.png' /></td></tr>"
     +"</table></div>")

[Output: image table with rows Input_0 to Input_3, Ground Truth_0 to Ground Truth_3, Generated_0 to Generated_3]


In [None]:
from IPython.display import HTML

# The GIF paths come from the search cell above.
HTML("<div id='textid'><table><tr>"
     + "<td>&nbsp;&nbsp;&nbsp;&nbsp;<B>Original:</B>"
     + "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>"
     + "<img id='myOrImg' src='" + original_filename + "' /></td>"
     + "<td>&nbsp;&nbsp;&nbsp;&nbsp;<B>Ground Truth:</B>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>"
     + "<img id='myGtImg' src='" + ogt_filename + "' /></td>"
     + "<td>&nbsp;&nbsp;&nbsp;&nbsp;<B>Generated:</B>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>"
     + "<img id='myGenImg' src='" + ogen_filename + "' />"
     + "</td></tr></table></div>")

#### Other test results:

Other results produced by our network are shown below.

In [None]:
HTML("<div id='textid'><table><tr>"
     + "<td>&nbsp;&nbsp;&nbsp;&nbsp;<B>Original:</B>"
     + "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>"
     + "<img id='myOrImg' src='data/Human_Actions/NewTest_2/Step_0/0/originalInput_GIF.gif' /></td>"
     + "<td>&nbsp;&nbsp;&nbsp;&nbsp;<B>Ground Truth:</B>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>"
     + "<img id='myGtImg' src='data/Human_Actions/NewTest_2/Step_0/0/ogt_GIF.gif' /></td>"
     + "<td>&nbsp;&nbsp;&nbsp;&nbsp;<B>Generated:</B>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>"
     + "<img id='myGenImg' src='data/Human_Actions/NewTest_2/Step_0/0/ogen_GIF.gif' />"
     + "</td></tr></table></div>")

In [None]:
HTML("<div id='textid'><table><tr>"
     +"<td>Input_0<img src='data/Human_Actions/NewTest_2/Step_0/1/input_0.png' /></td>"
     +"<td>Input_1<img src='data/Human_Actions/NewTest_2/Step_0/1/input_1.png' /></td>"
     +"<td>Input_2<img src='data/Human_Actions/NewTest_2/Step_0/1/input_2.png' /></td>"
     +"<td>Input_3<img src='data/Human_Actions/NewTest_2/Step_0/1/input_3.png' /></td></tr>"
     +"<tr><td>Ground Truth"
     +"<img id='myGtImg' src='data/Human_Actions/NewTest_2/Step_0/1/gt_0.png' /></td></tr>"
     +"<tr><td>Generated"
     +"<img id='myGenImg' src='data/Human_Actions/NewTest_2/Step_0/1/gen_0.png' /></td></tr>"
     +"</table></div>")

In [None]:
HTML("<div id='textid'><table><tr>"
     +"<td>Input_0<img src='data/Human_Actions/NewTest_2/Step_0_45/2/input_0.png' /></td>"
     +"<td>Input_1<img src='data/Human_Actions/NewTest_2/Step_0_45/2/input_1.png' /></td>"
     +"<td>Input_2<img src='data/Human_Actions/NewTest_2/Step_0_45/2/input_2.png' /></td>"
     +"<td>Input_3<img src='data/Human_Actions/NewTest_2/Step_0_45/2/input_3.png' /></td></tr>"
     +"<tr>"
     +"<td>Ground Truth_0<img src='data/Human_Actions/NewTest_2/Step_0_45/2/gt_0.png' /></td>"
     +"<td>Ground Truth_1<img src='data/Human_Actions/NewTest_2/Step_0_45/2/gt_1.png' /></td>"
     +"<td>Ground Truth_2<img src='data/Human_Actions/NewTest_2/Step_0_45/2/gt_2.png' /></td></tr>"
     +"<tr>"
     +"<td>Generated_0<img src='data/Human_Actions/NewTest_2/Step_0_45/2/gen_0.png' /></td>"
     +"<td>Generated_1<img src='data/Human_Actions/NewTest_2/Step_0_45/2/gen_1.png' /></td>"
     +"<td>Generated_2<img src='data/Human_Actions/NewTest_2/Step_0_45/2/gen_2.png' /></td></tr>"
     +"</table></div>")

In [None]:
HTML("<div id='textid'><table><tr>"
     +"<td>Input_0<img src='data/Human_Actions/NewTest_2/Step_0_45/4/input_0.png' /></td>"
     +"<td>Input_1<img src='data/Human_Actions/NewTest_2/Step_0_45/4/input_1.png' /></td>"
     +"<td>Input_2<img src='data/Human_Actions/NewTest_2/Step_0_45/4/input_2.png' /></td>"
     +"<td>Input_3<img src='data/Human_Actions/NewTest_2/Step_0_45/4/input_3.png' /></td></tr>"
     +"<tr>"
     +"<td>Ground Truth_0<img src='data/Human_Actions/NewTest_2/Step_0_45/4/gt_0.png' /></td>"
     +"<td>Ground Truth_1<img src='data/Human_Actions/NewTest_2/Step_0_45/4/gt_1.png' /></td></tr>"
     +"<tr>"
     +"<td>Generated_0<img src='data/Human_Actions/NewTest_2/Step_0_45/4/gen_0.png' /></td>"
     +"<td>Generated_1<img src='data/Human_Actions/NewTest_2/Step_0_45/4/gen_1.png' /></td></tr>"
     +"</table></div>")

In [None]:
HTML("<div id='textid'><table><tr>"
     +"<td>Input_0<img src='data/Human_Actions/recursions_4/0/input_0.png' /></td>"
     +"<td>Input_1<img src='data/Human_Actions/recursions_4/0/input_1.png' /></td>"
     +"<td>Input_2<img src='data/Human_Actions/recursions_4/0/input_2.png' /></td>"
     +"<td>Input_3<img src='data/Human_Actions/recursions_4/0/input_3.png' /></td></tr>"
     +"<tr>"
     +"<td>Ground Truth_0<img src='data/Human_Actions/recursions_4/0/gt_0.png' /></td>"
     +"<td>Ground Truth_1<img src='data/Human_Actions/recursions_4/0/gt_1.png' /></td>"
     +"<td>Ground Truth_2<img src='data/Human_Actions/recursions_4/0/gt_2.png' /></td>"
     +"<td>Ground Truth_3<img src='data/Human_Actions/recursions_4/0/gt_3.png' /></td></tr>"
     +"<tr>"
     +"<td>Generated_0<img src='data/Human_Actions/recursions_4/0/gen_0.png' /></td>"
     +"<td>Generated_1<img src='data/Human_Actions/recursions_4/0/gen_1.png' /></td>"
     +"<td>Generated_2<img src='data/Human_Actions/recursions_4/0/gen_2.png' /></td>"
     +"<td>Generated_3<img src='data/Human_Actions/recursions_4/0/gen_3.png' /></td></tr>"
     +"</table></div>")