# Merge features from various modalities into a single database

## 1. Load feature data from Video

Load the OpenFace output and put the features in a pandas DataFrame. For now, the code is copied from: https://github.com/emrecdem/exploface/blob/master/TUTORIALS/tutorial1.ipynb

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import exploface

In [2]:
exploface.__version__

'0.0.0.dev6'

In [3]:
# specify some paths
emrecdemStudyDataFolder = "/media/sf_sharedfolder/Emotion/emrecdemstudydata"
openface_outputfolder = emrecdemStudyDataFolder + "/OpenFaceOutput"

In [4]:
# Search for files with the csv extension, because those are the only ones we need.
# This assumes that the folder contains no csv files other than the ones produced by OpenFace.
from os import listdir

def find_csv_filenames( path_to_dir, suffix=".csv" ):
    filenames = listdir(path_to_dir)
    return [ filename for filename in filenames if filename.endswith( suffix ) ]

filenames = find_csv_filenames(openface_outputfolder)

In [5]:
filenames

['P18_S2_IAPS_HAPPY_Cfront.csv',
 'P18_S2_IAPS_SAD_Cfront.csv',
 'P21_S2_IAPS_HAPPY_C1.csv',
 'P21_S2_IAPS_SAD_C1.csv']

In [6]:
# select one file to process (in the future this can be a loop over all the files)
openface_file = openface_outputfolder + '/' + filenames[0]
openface_features = exploface.get_feature_time_series(openface_file)

In [7]:
openface_features.head(5)

| | frame | face_id | timestamp | confidence | success | gaze_0_x | gaze_0_y | gaze_0_z | gaze_1_x | gaze_1_y | ... | AU12_c | AU14_c | AU15_c | AU17_c | AU20_c | AU23_c | AU25_c | AU26_c | AU28_c | AU45_c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0.0 | 0.98 | 1 | 0.028242 | 0.015028 | -0.999488 | 0.045799 | -0.019971 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2 | 0 | 0.04 | 0.98 | 1 | 0.012176 | -0.017202 | -0.999778 | 0.035786 | -0.044246 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 2 | 3 | 0 | 0.08 | 0.98 | 1 | 0.003201 | 0.007913 | -0.999964 | 0.030401 | -0.01686 | ... | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 3 | 4 | 0 | 0.12 | 0.98 | 1 | 0.011002 | 0.017785 | -0.999781 | -0.06609 | -0.075202 | ... | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 5 | 0 | 0.16 | 0.98 | 1 | 0.060468 | 0.055646 | -0.996618 | -0.093641 | 0.164191 | ... | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |


### To do:
- Strip away irrelevant information, e.g. keep only the features of interest.
- Downsample to the lowest acceptable resolution, e.g. 10 Hz.
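A minimal sketch of both to-do items, assuming the DataFrame has a `timestamp` column in seconds (as in the output above) and that the features of interest are the gaze columns (`gaze_*`) and the binary action-unit columns (`AU.._c`). The column selection and the 10 Hz target are illustrative choices, and `select_and_downsample` is a hypothetical helper, not part of exploface. Frames are grouped into 100 ms bins and averaged, so each binary AU column becomes the fraction of frames in that bin in which the unit was active.

```python
import numpy as np
import pandas as pd


def select_and_downsample(df: pd.DataFrame, rate_hz: float = 10.0) -> pd.DataFrame:
    """Keep gaze and AU presence columns and average them per 1 / rate_hz seconds.

    Hypothetical helper; the column selection is an illustrative assumption.
    """
    # Features of interest: gaze columns (gaze_*) and AU presence columns (AU.._c)
    feature_cols = [c for c in df.columns
                    if c.startswith("gaze_") or (c.startswith("AU") and c.endswith("_c"))]

    # Bin number of each frame; the epsilon keeps a timestamp that lies exactly
    # on a bin edge from dropping into the previous bin through rounding error
    bins = np.floor(df["timestamp"] * rate_hz + 1e-9).astype(int).rename("bin")

    # Average every feature within its bin (binary AUs become activation fractions)
    downsampled = df[feature_cols].groupby(bins).mean()

    # Label each bin with its start time in seconds
    downsampled.insert(0, "timestamp", downsampled.index.to_numpy() / rate_hz)
    return downsampled.reset_index(drop=True)
```

Usage would be along the lines of `video_10hz = select_and_downsample(openface_features)`. Averaging keeps the information about how often a unit was active within a bin; taking the maximum per bin instead would be an equally valid choice if only presence matters.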

## 2. Load feature data from audio

Extract audio features with Librosa and put them in a pandas DataFrame. For now, the code is copied from: https://github.com/emrecdem/explibrosa/blob/master/TUTORIALS/tutorial1.ipynb

In [8]:
import matplotlib.pyplot as plt
import os
import explibrosa

In [9]:
explibrosa.__version__

'0.0.0.dev1'

Find the wav file that matches the csv file produced by OpenFace, assuming that the two filenames are identical except for the file extension and the camera suffix (`_Cfront` or `_Cside`) in the OpenFace filename.

In [36]:
transformfilename = filenames[0]

In [37]:
transformfilename

'P18_S2_IAPS_HAPPY_Cfront.csv'

In [38]:
audiofile_name = transformfilename.replace('.csv','.wav').replace('_Cfront','').replace('_Cside','')

In [39]:
audiofile_name 

'P18_S2_IAPS_HAPPY.wav'
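The chained `replace` calls only remove the `_Cfront` and `_Cside` suffixes, so a file from the listing above such as `P21_S2_IAPS_HAPPY_C1.csv` would be mapped to `P21_S2_IAPS_HAPPY_C1.wav`. A sketch that strips the camera suffix with a regular expression instead, assuming (not verified against the data) that the suffix is always `_Cfront`, `_Cside` or `_C` followed by digits, that it directly precedes the extension, and that the matching wav file has no camera suffix. `audio_name_for` is a hypothetical helper.

```python
import os
import re


def audio_name_for(openface_csv_name: str) -> str:
    """Derive the wav filename that belongs to an OpenFace csv filename.

    Assumption: the camera suffix is _Cfront, _Cside or _C followed by digits,
    and it directly precedes the extension.
    """
    stem, _ = os.path.splitext(openface_csv_name)
    # Remove the camera suffix only when it is the last part of the name, so
    # that other parts starting with _C (e.g. a condition name) are kept
    stem = re.sub(r"_C(?:front|side|\d+)$", "", stem)
    return stem + ".wav"
```

Listing the allowed suffixes explicitly is deliberate: a looser pattern such as `_C\w+` would also strip a condition name that happens to start with a capital C.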

In [40]:
import subprocess

# Search the study folder recursively for the wav file with the derived name.
# Passing the arguments as a list (no shell) avoids problems with spaces or
# special characters in the folder name.
out = subprocess.run(['find', emrecdemStudyDataFolder, '-name', audiofile_name],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Save found files to a list; find prints one path per line, so split on
# newlines rather than on any whitespace
filelist = out.stdout.decode().splitlines()
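The search can also be done in pure Python, without calling the external Unix `find` command, which makes the notebook usable on systems that lack it. A sketch using `pathlib.Path.rglob`, which walks a folder and all its subfolders; `find_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_files(root: str, filename: str) -> list:
    """Return the full paths of all files below root whose name equals filename.

    Note: rglob treats filename as a glob pattern, so names containing
    characters such as '*' or '[' would be interpreted as wildcards.
    """
    # Sorting makes the result order deterministic across file systems
    return sorted(str(p) for p in Path(root).rglob(filename) if p.is_file())
```

With this helper, `filelist = find_files(emrecdemStudyDataFolder, audiofile_name)` would replace the `find` call.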

In [41]:
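audiofiles_fullPaths = filelist  # normally contains exactly one path
if not audiofiles_fullPaths:
    raise FileNotFoundError('No audio file named ' + audiofile_name)
audio_file = audiofiles_fullPaths[0]  # if there are several matches, the first is used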

In [42]:
os.path.isfile(audio_file)

True

In [43]:
explibrosa.get_info(audio_file)

{'#frames': 3223486, 'duration (min)': 3.4, 'Sample freq (kHz)': 16.0}

In [44]:

time_series = explibrosa.get_feature_time_series(audio_file)

Running librosa (no results found on disk)
RMS energy
     0.4 seconds
Zero crossing
     0.5 seconds
Pitches
     4.67 seconds
  Pitches smoothing
     4.84 seconds
TOTAL execution time: 0.08 min


In [45]:
time_series.head()

| | timestamp | rmse | zrc | pitch |
|---|---|---|---|---|
| 0 | 0.0 | 0.013185 | 0.038095 | 186.62088 |
| 1 | 0.01 | 0.012107 | 0.066667 | 185.620404 |
| 2 | 0.020001 | 0.010913 | 0.052381 | 184.32665 |
| 3 | 0.030001 | 0.01129 | 0.052381 | 182.657746 |
| 4 | 0.040002 | 0.010963 | 0.042857 | 180.606069 |


### To do:
- Downsample to the lowest acceptable resolution, e.g. 10 Hz.
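The audio features come at roughly 100 Hz (a 0.01 s step in the output above) and the video features at 25 frames per second (a 0.04 s step), so both need to be brought to a common rate before they can be combined. A minimal sketch that downsamples each modality to 10 Hz by averaging and then joins them per 100 ms bin. It assumes that both DataFrames have a `timestamp` column in seconds measured from the same starting moment (not verified here), that all other columns are numeric, and that the two share no column names besides `timestamp`. `to_bins` and `merge_modalities` are hypothetical helpers.

```python
import numpy as np
import pandas as pd


def to_bins(df: pd.DataFrame, rate_hz: float = 10.0) -> pd.DataFrame:
    """Average all non-timestamp columns within consecutive bins of 1 / rate_hz seconds."""
    # The epsilon keeps a timestamp lying exactly on a bin edge from dropping
    # into the previous bin through floating-point rounding error
    bins = np.floor(df["timestamp"] * rate_hz + 1e-9).astype(int).rename("bin")
    return df.drop(columns="timestamp").groupby(bins).mean()


def merge_modalities(video: pd.DataFrame, audio: pd.DataFrame,
                     rate_hz: float = 10.0) -> pd.DataFrame:
    """Downsample both modalities and join them on their shared time bins."""
    # Inner join: only bins for which both audio and video data exist are kept
    merged = to_bins(video, rate_hz).join(to_bins(audio, rate_hz), how="inner")
    # Turn the bin number back into the bin's start time in seconds
    merged.insert(0, "timestamp", merged.index.to_numpy() / rate_hz)
    return merged.reset_index(drop=True)
```

Joining on an integer bin number rather than on float timestamps avoids exact-equality comparisons between values such as `0.020001` and `0.02`.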


## 3. Store in database

### To do:
- Add participant ID and file name
- Store in database
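A minimal sketch of both steps, assuming SQLite (through Python's built-in `sqlite3` module) as the database and assuming the participant ID is the first underscore-separated part of the filename (e.g. `P18` in `P18_S2_IAPS_HAPPY_Cfront.csv`). The table name `features` and the helper `store_features` are hypothetical choices. pandas' `DataFrame.to_sql` accepts a `sqlite3` connection directly, creates the table on first use and, with `if_exists="append"`, adds rows on later calls.

```python
import sqlite3

import pandas as pd


def store_features(df: pd.DataFrame, source_file: str,
                   con: sqlite3.Connection, table: str = "features") -> None:
    """Tag rows with participant ID and source filename, then append them to table."""
    tagged = df.copy()
    # Assumption: filenames look like P18_S2_IAPS_HAPPY_Cfront.csv
    tagged.insert(0, "participant_id", source_file.split("_")[0])
    tagged.insert(1, "source_file", source_file)
    # Creates the table if needed, otherwise appends; the pandas index is not stored
    tagged.to_sql(table, con, if_exists="append", index=False)
```

For example, `store_features(df, filenames[0], sqlite3.connect("features.db"))` would append one recording; calling it inside a loop over `filenames` would fill the database for all recordings. Appending assumes every recording has the same columns; the first call fixes the table schema.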