## Movie Emotion Recognition <br>

<font size="3"> Team members: </font><br>
<font size="3"> Jesse Cahill </font><br>
<font size="3"> Zhongling Jiang </font><br>
<font size="3"> Zile Wang </font>

Dataset (Google Drive): https://drive.google.com/open?id=1_Ds_2tV4tUFiisVgDaM1utOEzEW9aSLR <br>
The audio clips used in this notebook are in 1000songs/clips_45seconds/..<br>
Tutorial: https://musicinformationretrieval.com/#Introduction

Feature list: http://mac.citi.sinica.edu.tw/~yang/pub/yang11taslp_dist.pdf <br>
Target paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7745959 <br>
Python librosa library: https://librosa.github.io/librosa/

### Harmonic
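The 10 harmonic features are the mean and standard deviation of the per-frame salient pitch, chromagram center, key clarity, mode, and harmonic change. This notebook stores the variance rather than the standard deviation. <br>
librosa reference paper: https://bmcfee.github.io/papers/scipy2015_librosa.pdf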

# Feature 1. Melody & Harmony (Salient Pitch, Chromagrams, Key Clarity, Mode, Harmonic Change)
## We have 1801 frames for salient pitch, and 3609 for chromagrams

## Part 1. Extract Salient Pitch: Mean & Var <br>
$$ ACF(\tau) = \sum_{i = 0}^{N - 1 - \tau}x_ix_{i+\tau}, S_f\left(\omega\right) \sim \mathscr{F}(ACF(\tau)) $$
$$ f_{sp} = \frac{sr}{N_1}$$
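<font size ="3"><font family = "sans-serif">where $N_1$ is the lag (in samples) of the second peak of the ACF, i.e. the first peak after lag 0, and $sr$ is the sampling rate. One salient pitch $f_{sp}$ is computed per frame for each of the 1000 songs, and the per-song features are</font></font>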
$$ mean,var([f_{sp0},f_{sp1},f_{sp2},\cdots,f_{sp1800}])$$
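
As a sanity check on the formula above, here is a minimal sketch that applies the ACF peak rule to a synthetic sine frame instead of a song. The helper name `acf_pitch`, the `min_lag` parameter, and the 441 Hz test tone are illustrative assumptions, not part of this notebook's pipeline. It uses `np.correlate` so that it runs without audio files, and it keeps only the "first local maximum past a minimum lag" rule, which is simpler than the windowed check used in the extraction cell.

```python
import numpy as np

def acf_pitch(frame, sr, min_lag=20):
    """Estimate pitch as sr / lag, where lag is the first ACF local maximum at lag >= min_lag."""
    n = len(frame)
    # Full autocorrelation; keep the non-negative lags only (index 0 is lag 0)
    acf = np.correlate(frame, frame, mode="full")[n - 1:]
    for lag in range(min_lag, n - 1):
        if acf[lag] > acf[lag - 1] and acf[lag] > acf[lag + 1]:
            return sr / lag
    return np.nan  # no peak found in this frame

sr = 22050                                  # assumed sampling rate (librosa.load default)
t = np.arange(int(0.05 * sr)) / sr          # one 50 ms frame
frame = np.sin(2 * np.pi * 441.0 * t)       # 441 Hz tone: period of exactly 50 samples
pitch = acf_pitch(frame, sr)
print(pitch)
```

The test tone is chosen so that its period is a whole number of samples; for real audio the estimate is quantized to `sr / lag` for integer lags.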

In [1]:
from __future__ import print_function
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import librosa
import librosa.display
%matplotlib inline

In [2]:
salient_pitch_index = [i+1 for i in range(1000)]
salient_pitch_columns = ['salient_pitch_mean', 'salient_pitch_variance']
salient_pitch_1000_songs = np.zeros(shape = (1000,2))

for h in range(1,1001):
    y, sr = librosa.load(r'/Users/zilewang/Desktop/bda/1000songs/clips_45seconds/%d.mp3' % h)
    
    # Frame and hop lengths in samples: 50 ms frames with a 25 ms hop
    per_sample_t = 1.0 / sr
    pitch_frame_length, pitch_hop_length = int(50 / 1000 / per_sample_t), int(25 / 1000 / per_sample_t)
    # Salient pitch per frame, taken from a peak-filtered autocorrelation (ACF)
    salient_pitch = []
    for i in range((len(y) - pitch_frame_length) // pitch_hop_length + 1):
        acf = librosa.core.autocorrelate(y[i*pitch_hop_length:i*pitch_hop_length+pitch_frame_length])
        # Accept the first local maximum at lag j >= 20 that is also the largest value in acf[j-20:j+20]
        for j in range(1, len(acf) - 1):
            if acf[j] > acf[j-1] and acf[j] > acf[j+1]:
                if j >= 20 and acf[j] == np.max(acf[j-20:j+20]): # reducing the adjacent HF fluctuation
                    salient_pitch.append(sr / j)
                    break
        # A frame with no qualifying peak contributes no pitch value
    salient_pitch = np.array(salient_pitch, dtype = float)
    salient_pitch_1000_songs[h-1,0] = np.mean(salient_pitch)
    salient_pitch_1000_songs[h-1,1] = np.var(salient_pitch)

salient_pitch_1000_songs_mean_var = pd.DataFrame(index=salient_pitch_index, columns=salient_pitch_columns,\
                                                 data=salient_pitch_1000_songs)
salient_pitch_1000_songs_mean_var.to_csv('mean_var_of_salient_pitch_1000_songs.csv')


## Part 2. Extract Chromagram Centroid: Mean & Var

In [12]:
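# Chromagram center.
# Note: the call below is librosa.feature.spectral_centroid, i.e. the centroid of the STFT
# magnitude spectrum in Hz, not a centroid over the 12 bins of librosa.feature.chroma_stft(y=y, sr=sr).
# https://librosa.github.io/librosa/generated/librosa.feature.spectral_centroid.html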
chromagram_centroid_index = [i+1 for i in range(1000)]
chromagram_centroid_columns = ['chromagram_centroid_mean', 'chromagram_centroid_variance']
chromagram_centroid = np.zeros(shape = (1000,2))
for i in range(1,1001):
    y, sr = librosa.load(r'/Users/zilewang/Desktop/bda/1000songs/clips_45seconds/%d.mp3' % i)
    per_sample_t = 1.0 / sr
    chromagram_frame_length, chromagram_hop_length = int(100 / 1000 / per_sample_t), int(12.5 / 1000 / per_sample_t)
    chromagram_center = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=chromagram_frame_length, \
                                                      hop_length=chromagram_hop_length)[0]
    chromagram_centroid[i-1, 0] = np.mean(chromagram_center[1:]) # 0 encountered in the first frame
    chromagram_centroid[i-1, 1] = np.var(chromagram_center[1:])

chromagram_centroid_1000_songs_mean_var = pd.DataFrame(index = chromagram_centroid_index, columns=chromagram_centroid_columns, data=chromagram_centroid)
chromagram_centroid_1000_songs_mean_var.to_csv('mean_var_of_chromagram_centroid_1000_songs.csv')
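
The cell above measures the spectral centroid of the STFT. For comparison, a centroid taken directly over the 12 chroma bins could be sketched as follows. The helper `chroma_centroid` and the toy matrix are hypothetical, and treating pitch classes as a linear 0..11 axis (ignoring that they are circular) is a simplifying assumption rather than the method of the target paper.

```python
import numpy as np

def chroma_centroid(chroma):
    """Per-frame centroid over chroma bins: sum_k k * c[k, t] / sum_k c[k, t]."""
    chroma = np.asarray(chroma, dtype=float)
    bins = np.arange(chroma.shape[0])      # pitch-class indices 0..11
    energy = chroma.sum(axis=0)            # total chroma energy per frame
    valid = energy > 0                     # skip silent (all-zero) frames
    return bins @ chroma[:, valid] / energy[valid]

# Toy chromagram with 4 frames: C only, E only, D and F# equally, silence
toy = np.zeros((12, 4))
toy[0, 0] = 1.0
toy[4, 1] = 1.0
toy[2, 2] = toy[6, 2] = 1.0

centroid = chroma_centroid(toy)            # one value per non-silent frame
centroid_mean, centroid_var = np.mean(centroid), np.var(centroid)
print(centroid, centroid_mean, centroid_var)
```

In practice `toy` would be replaced by the output of `librosa.feature.chroma_stft` for one clip.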


## Part 3. Key Clarity & Best Major, Minor Key & Mode 

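The 12 chroma rows are split into two groups. This notebook labels the natural pitch classes as major-key candidates and the sharp pitch classes as minor-key candidates. <br>
Major key: $$ \left(0,2,4,5,7,9,11\right)\sim \left(C,D,E,F,G,A,B\right) $$ 
Minor key: $$ \left(1,3,6,8,10\right) \sim \left(C\#,D\#,F\#,G\#,A\#\right) $$

Each song has a single best key: the chroma row with the largest sum over all 3609 frames,
$$ Key = \arg\max_i \left(\sum_{j = 0}^{3608} chro(i,j)\right) $$
The best major key and the best minor key are chosen the same way within their own groups. Each song also has one mode mean and variance, computed from the per-frame difference between those two rows over the 3609 frames,
$$ dif = (chro[majorkey, 0] - chro[minorkey, 0]), \cdots, (chro[majorkey, 3608] - chro[minorkey, 3608]) $$
The stored features are $$mean(dif), var(dif)$$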
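
The definitions above can be sketched on a small synthetic chromagram. The function name `key_and_mode`, the constants `MAJOR_ROWS` and `MINOR_ROWS`, and the toy values are assumptions made for illustration; the row grouping follows the major/minor index lists given in this section.

```python
import numpy as np

MAJOR_ROWS = [0, 2, 4, 5, 7, 9, 11]   # C, D, E, F, G, A, B
MINOR_ROWS = [1, 3, 6, 8, 10]         # C#, D#, F#, G#, A#

def key_and_mode(chroma):
    """Return (best key, best major row, best minor row, mean(dif), var(dif))."""
    chroma = np.asarray(chroma, dtype=float)
    row_sums = chroma.sum(axis=1)                       # sum of each chroma row over frames
    best_key = int(np.argmax(row_sums))
    best_major = MAJOR_ROWS[int(np.argmax(row_sums[MAJOR_ROWS]))]
    best_minor = MINOR_ROWS[int(np.argmax(row_sums[MINOR_ROWS]))]
    dif = chroma[best_major] - chroma[best_minor]       # per-frame difference
    return best_key, best_major, best_minor, np.mean(dif), np.var(dif)

# Toy chromagram with 4 frames: G dominates, D# is the strongest sharp
toy = np.zeros((12, 4))
toy[7] = [1.0, 0.8, 1.0, 0.8]
toy[3] = [0.2, 0.4, 0.2, 0.4]
toy[0] = [0.1, 0.1, 0.1, 0.1]

result = key_and_mode(toy)
print(result)
```

Indexing `row_sums` with the two lists keeps the argmax inside each group, which is why the result is mapped back through `MAJOR_ROWS` and `MINOR_ROWS`.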

In [81]:
key_clarity = np.zeros(shape = (1000,1))
major_minor_key_clarity = np.zeros(shape = (1000,2),dtype=int)
mode = np.zeros(shape = (1000,2))
major_index, minor_index = [0,2,4,5,7,9,11], [1,3,6,8,10]
for i in range(1,1001):
    y, sr = librosa.load(r'/Users/zilewang/Desktop/bda/1000songs/clips_45seconds/%d.mp3' % i)
    per_sample_t = 1.0 / sr
    chromagram_frame_length, chromagram_hop_length = int(100 / 1000 / per_sample_t), int(12.5 / 1000 / per_sample_t)
    # Pass the signal as y=y: recent librosa versions accept the audio only as a keyword argument
    chromagram = librosa.feature.chroma_stft(y=y, sr=sr, n_fft= chromagram_frame_length, \
                                             hop_length = chromagram_hop_length, \
                                             win_length = chromagram_frame_length, window = 'hamming', \
                                             n_chroma = 12)
    # Drop all-zero (silent) frames before computing key and mode statistics
    chromagram = chromagram[:, np.sum(chromagram, axis=0) != 0]
    major_chro = np.array([chromagram[0,:], chromagram[2,:], chromagram[4,:], chromagram[5,:], chromagram[7,:], \
                 chromagram[9,:], chromagram[11,:]], dtype = float)
    minor_chro = np.array([chromagram[1,:], chromagram[3,:], chromagram[6,:], chromagram[8,:], chromagram[10,:]],\
                          dtype = float)
    key_clarity[i-1] = np.argmax([np.sum(chromagram[k,:]) for k in range(12)])
    major_minor_key_clarity[i-1, 0] = major_index[np.argmax([np.sum(major_chro[k,:]) for k in range(7)])]
    major_minor_key_clarity[i-1, 1] = minor_index[np.argmax([np.sum(minor_chro[k,:]) for k in range(5)])]
    dif_sum = []
    for j in range(chromagram.shape[1]):
        dif_sum.append(chromagram[major_minor_key_clarity[i-1,0], j] - \
                       chromagram[major_minor_key_clarity[i-1,1], j])
    mode[i-1,0] = np.mean(dif_sum)
    mode[i-1,1] = np.var(dif_sum)

key_clarity = np.array(key_clarity, dtype = int)
key_clarity_1000_songs = pd.DataFrame(index=[i+1 for i in range(1000)],columns=['Best Key'],data=key_clarity)
mean_var_mode_1000_songs = pd.DataFrame(index=[i+1 for i in range(1000)],columns=['mode mean', 'mode variance'], \
                                       data=mode)
major_minor_key_1000_songs = pd.DataFrame(index=[i+1 for i in range(1000)], \
                                          columns=['major key', 'minor key'], \
                                          data=major_minor_key_clarity)

full_mode = pd.concat([major_minor_key_1000_songs, mean_var_mode_1000_songs], axis = 1)
full_mode.to_csv('mean_var_mode_1000_songs.csv')
key_clarity_1000_songs.to_csv('key_1000_songs.csv')


## Part 4. Harmonic Change
The harmonic change at frame $n$ of a given song is the Euclidean distance between the tonal centroids of the two neighbouring frames:
$$ \Delta_n = ||\zeta_{n-1} - \zeta_{n+1}||_{2} = \sqrt{\sum_{d=0}^{5}\left(\zeta_{n-1}(d) - \zeta_{n+1}(d)\right)^2}$$
where the 6-dimensional tonal centroid $\zeta_n$ is the chroma vector $c_n$ projected by $\Phi$ and normalized by its $L_1$ norm,
$$ \zeta_n = \frac{1}{||c_n||_1}\Phi c_n $$
with
$$ \Phi = \begin{pmatrix} \sin(0\times7\pi/6) & \sin(1\times7\pi/6) & \sin(2\times7\pi/6) & \cdots & \sin(11\times7\pi/6) \\ \cos(0\times7\pi/6) & \cos(1\times7\pi/6) & \cos(2\times7\pi/6) &\cdots & \cos(11\times7\pi/6) \\ \sin(0\times3\pi/2) & \sin(1\times3\pi/2) & \sin(2\times3\pi/2) &\cdots & \sin(11\times3\pi/2) \\ \cos(0\times3\pi/2) & \cos(1\times3\pi/2) & \cos(2\times3\pi/2) &\cdots & \cos(11\times3\pi/2) \\ 0.5\sin(0\times2\pi/3) & 0.5\sin(1\times2\pi/3) & 0.5\sin(2\times2\pi/3) &\cdots & 0.5\sin(11\times2\pi/3) \\ 0.5\cos(0\times2\pi/3) & 0.5\cos(1\times2\pi/3) & 0.5\cos(2\times2\pi/3) &\cdots & 0.5\cos(11\times2\pi/3) \end{pmatrix}$$
and
$$ c_n = \begin{pmatrix}chromagram[0,n]\\chromagram[1,n]\\chromagram[2,n]\\ \vdots \\chromagram[11,n]\\\end{pmatrix} $$

We take the mean and the variance of $[\Delta_1,\Delta_2,\cdots,\Delta_{3607}]$ for each of the 1000 songs.
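
A vectorized sketch of this computation on synthetic chroma is given below. It assumes $\Delta_n$ is the Euclidean norm of the centroid difference; the names `PHI` and `harmonic_change` and the two toy chromagrams are illustrative and do not come from the notebook's own cells.

```python
import numpy as np

k = np.arange(12)  # chroma bin index 0..11
PHI = np.array([
    np.sin(k * 7 * np.pi / 6), np.cos(k * 7 * np.pi / 6),
    np.sin(k * 3 * np.pi / 2), np.cos(k * 3 * np.pi / 2),
    0.5 * np.sin(k * 2 * np.pi / 3), 0.5 * np.cos(k * 2 * np.pi / 3),
])  # shape (6, 12)

def harmonic_change(chroma):
    """Return Delta_n = ||zeta_{n-1} - zeta_{n+1}||_2 for every interior frame n."""
    chroma = np.asarray(chroma, dtype=float)
    chroma = chroma[:, chroma.sum(axis=0) != 0]          # drop silent frames
    zeta = PHI @ chroma / np.abs(chroma).sum(axis=0)     # tonal centroids, shape (6, T)
    return np.linalg.norm(zeta[:, 2:] - zeta[:, :-2], axis=0)

# Five frames of a steady C: no harmonic change anywhere
steady = np.tile(np.eye(12)[:, [0]], (1, 5))
delta_steady = harmonic_change(steady)

# Three frames of C followed by three frames of G: change only around the switch
switch = np.eye(12)[:, [0, 0, 0, 7, 7, 7]]
delta_switch = harmonic_change(switch)
print(delta_steady, delta_switch)
```

For a chromagram with T non-silent frames the result has T - 2 entries, which matches the index range $\Delta_1,\cdots,\Delta_{3607}$ for 3609 frames.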

In [100]:
phi = np.array([[np.sin(i * 7 * np.pi / 6) for i in range(12)],\
                [np.cos(i * 7 * np.pi / 6) for i in range(12)],\
                [np.sin(i * 3 * np.pi / 2) for i in range(12)],\
                [np.cos(i * 3 * np.pi / 2) for i in range(12)],\
                [0.5*np.sin(i * 2 * np.pi / 3) for i in range(12)],\
                [0.5*np.cos(i * 2 * np.pi / 3) for i in range(12)]])
phi = np.asarray(phi, dtype=float)  # plain ndarray of shape (6, 12); np.matrix is no longer recommended by NumPy
Harmonic = np.zeros(shape = (1000,2))

for i in range(1, 1001):
    y, sr = librosa.load(r'/Users/zilewang/Desktop/bda/1000songs/clips_45seconds/%d.mp3' % i)
    per_sample_t = 1.0 / sr
    chromagram_frame_length, chromagram_hop_length = int(100 / 1000 / per_sample_t), int(12.5 / 1000 / per_sample_t)
    # Pass the signal as y=y: recent librosa versions accept the audio only as a keyword argument
    chromagram = librosa.feature.chroma_stft(y=y, sr=sr, n_fft= chromagram_frame_length, \
                                             hop_length = chromagram_hop_length, \
                                             win_length = chromagram_frame_length, window = 'hamming', \
                                             n_chroma = 12)
    # Drop all-zero (silent) frames; they would cause a division by zero in the L1 normalization below
    chromagram = chromagram[:, np.sum(chromagram, axis=0) != 0]
    zeta = np.zeros(shape = (6, chromagram.shape[1]))
    for j in range(chromagram.shape[1]):
        zeta[:,j] = np.dot(phi, chromagram[:,j]) / np.abs(np.sum(chromagram[:,j]))
    # Harmonic change: Euclidean distance between the tonal centroids of frames j-1 and j+1
    delta = []
    for j in range(1, chromagram.shape[1] - 1):
        delta.append(np.sqrt(np.sum((zeta[:,j-1] - zeta[:,j+1]) ** 2)))
    
    Harmonic[i-1,0] = np.mean(delta)
    Harmonic[i-1,1] = np.var(delta) 

Harmonic_1000_songs = pd.DataFrame(index=[i+1 for i in range(1000)], columns=['Harmonic mean', 'Harmonic variance'],\
                                  data=Harmonic)
Harmonic_1000_songs.to_csv('Harmonic_change_1000_songs.csv') 


# Feature 2. Spectral (Not done by me, ToDo)
###  Spectral
The spectral features consist of 32 spectral flatness measures, 32 spectral crest factors, and 26 Mel-scale frequency cepstral coefficients.
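
Since this part is still a ToDo, the sketch below only illustrates what the two less common measures compute: flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum, and crest factor as the ratio of the maximum to the arithmetic mean. It works on the whole spectrum of each frame; how the 32 per-band values are formed is not specified in this notebook, so no band split is attempted. The function name `flatness_and_crest` and the `eps` guard are assumptions.

```python
import numpy as np

def flatness_and_crest(power, eps=1e-10):
    """power: non-negative power spectrogram of shape (n_bins, n_frames)."""
    power = np.asarray(power, dtype=float) + eps          # avoid log(0)
    arithmetic_mean = power.mean(axis=0)
    geometric_mean = np.exp(np.log(power).mean(axis=0))
    flatness = geometric_mean / arithmetic_mean           # 1 for a flat spectrum, near 0 for a pure tone
    crest = power.max(axis=0) / arithmetic_mean           # 1 for a flat spectrum, large for a peaky one
    return flatness, crest

flat_frame = np.ones(8)                                   # noise-like: equal power in every bin
peaky_frame = np.array([8.0, 0, 0, 0, 0, 0, 0, 0])        # tone-like: all power in one bin
power = np.stack([flat_frame, peaky_frame], axis=1)       # shape (8, 2)

flatness, crest = flatness_and_crest(power)
print(flatness, crest)
```

For real clips, `power` could be taken as `np.abs(librosa.stft(y)) ** 2`.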

In [15]:
# SFM 
# https://librosa.github.io/librosa/generated/librosa.feature.spectral_flatness.html
flatness = librosa.feature.spectral_flatness(y=y, n_fft=2048, hop_length=512)
print(flatness.shape)
# MFCC: librosa returns n_mfcc=20 coefficients by default; the first 13 are to be kept (ToDo)
mfcc = librosa.feature.mfcc(y=y, sr=sr)
print(mfcc.shape)

(1, 1941)
(20, 1941)
