# Objective Evaluation for Emotion Expressiveness
---
This notebook computes objective metrics for evaluating emotion expressiveness: mel-cepstral distortion (MCD), pitch and energy distortion, and frame disturbance. The main entry point is `get_evaluation_scores`, which is implemented in `Summary-Hierarchical-ED/implementation/sho_util/pyfiles/objective_evaluation.py`.
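
For orientation, MCD is conventionally computed per frame as $(10/\ln 10)\sqrt{2\sum_d (c_d - \hat{c}_d)^2}$ and then averaged over frames. The sketch below illustrates that textbook definition only. It assumes the two mel-cepstral sequences are already time-aligned and that the 0th (energy) coefficient has been removed; the function name `mel_cepstral_distortion` is hypothetical, and the implementation in `objective_evaluation.py` may differ in alignment, coefficient selection, or scaling.

```python
import numpy as np

def mel_cepstral_distortion(ref, syn):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    ref, syn: arrays of shape (frames, coefficients). They are assumed to be
    aligned already (e.g. by DTW) and to exclude the 0th (energy) coefficient.
    """
    ref = np.asarray(ref, dtype=float)
    syn = np.asarray(syn, dtype=float)
    if ref.shape != syn.shape:
        raise ValueError("sequences must be aligned to the same shape")
    # Per-frame distortion: (10 / ln 10) * sqrt(2 * sum_d (ref_d - syn_d)^2)
    sq_dist = np.sum((ref - syn) ** 2, axis=1)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * sq_dist)
    # Average over frames to get one score per utterance
    return float(np.mean(per_frame))
```

Identical sequences give a distortion of zero, and the score grows with the Euclidean distance between corresponding frames.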

---

In [None]:
import warnings
warnings.filterwarnings("ignore")

import scipy.stats as st
import pandas as pd
import numpy as np
import glob
from tqdm import tqdm
import os

import sys
sys.path.append("../sho_util/pyfiles/")
from basic import get_bool_base_on_conditions
from objective_evaluation import get_evaluation_scores

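def init_scores():
    # Create an empty score list for every "<feature>-<score type>" key.
    # Relies on the global `features` list, which is built in the evaluation
    # cell below, so call this only after that list has been defined.
    indv_scores = {}
    for f in features:
        indv_scores[f] = []
    return indv_scores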

---
# Emotion Similarity among Speakers
---

Here, we compute the emotion similarity among different speakers in ESD as a tutorial example. You can apply the same procedure to your own models by modifying `allfiles` and `gtfiles`.

- **`sr`**:  
  An integer that specifies the sampling rate of the speech.

- **`dataset_dir`** (optional):  
  A string that indicates the path to the dataset directory.

- **`allfiles`**:  
  A dictionary mapping each model name (key) to its corresponding list of WAV files.

  
- **`gtfiles`**:  
  A dictionary mapping each WAV file path (key) to its corresponding ground-truth (reference) file.
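
As an illustration of adapting these two dictionaries to your own models, the sketch below pairs every generated file with a reference file of the same name. The directory layout (`outputs/<model>/<utterance>.wav` and `references/<utterance>.wav`), the model names, and the helper `build_gtfiles` are all hypothetical; adjust them to however your files are organised.

```python
import os

def build_gtfiles(allfiles, gt_dir):
    """Map every evaluated WAV path to the file with the same basename in gt_dir."""
    gtfiles = {}
    for model_name, paths in allfiles.items():
        for path in paths:
            gtfiles[path] = os.path.join(gt_dir, os.path.basename(path))
    return gtfiles

# Hypothetical layout: outputs/<model>/<utterance>.wav, references/<utterance>.wav
allfiles = {
    "model_a": ["outputs/model_a/utt001.wav", "outputs/model_a/utt002.wav"],
    "model_b": ["outputs/model_b/utt001.wav"],
}
gtfiles = build_gtfiles(allfiles, "references")
```

In practice the lists in `allfiles` would usually come from `glob.glob`, as in the ESD example in the next cell.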

---

In [None]:
###########################################
########## Adjustable Parameters ##########
###########################################

sr = 16000
dataset_dir = "../Dataset/ESD/"
allfiles = {
    "0011": glob.glob(dataset_dir+"0011/Neutral/evaluation/*.wav"),
    "0013": glob.glob(dataset_dir+"0013/Neutral/evaluation/*.wav"),
    "0015": glob.glob(dataset_dir+"0015/Neutral/evaluation/*.wav"),
    "0016": glob.glob(dataset_dir+"0016/Neutral/evaluation/*.wav"),
}
gtfiles = {path: path.replace(f"/{key}", "/0017") for key in allfiles for path in allfiles[key]}

###########################################
###########################################
###########################################

target_columns = {
    ("mcd", "score"): "Melcepstral Distortion",
    ("pitch", "score"): "Pitch Distortion",
    ("energy", "score"): "Energy Distortion",
    ("mcd", "fd"): "Frame Disturbance",
}
feature_types = ["mcd", "pitch", "pitch_remove0", "energy"]
score_types = ["distance", "score", "fd"]

print("################################################")
print("Get Objective Evaluation Scores of Target Models")
print("################################################")

ges = get_evaluation_scores(sr=sr)
features = []
for ft in feature_types:
    for score_type in score_types:
        features += [f"{ft}-{score_type}"]
scores = {}
for key in allfiles:
    print(key)
    files = allfiles[key]
    files.sort()
    indv_scores = init_scores()
    for path in tqdm(files):
        gt_path = gtfiles[path]
        data = ges.get(gt_path, path, p_logscale=False, e_logscale=False)
        for fname in feature_types:
            for v in score_types:
                indv_scores[f"{fname}-{v}"] += [data[fname][v]]
    scores[key] = indv_scores

print()
print("##########################################################")
print("Post-processing to Obtain Mean and Interval for each Model")
print("##########################################################")
    
arrays = []
for fname in feature_types:
    for v in score_types:
        array = {name: scores[name][f"{fname}-{v}"] for name in scores}
        df = pd.DataFrame(array).describe().loc[["mean"]]
        df.index = [f"{fname}-{v}"]
        arrays += [np.array([fname, v, *df.values[0]])]
            
new_columns = ["feature", "score type", *list(df.columns)]
df_score = pd.DataFrame(np.array(arrays), columns=new_columns)
df_score.loc[:,df.columns] = df_score.loc[:,df.columns].astype("float")
df = df_score.T.copy()
columns = []
for i in range(df.shape[1]):
    columns += [(df.loc["feature"].values[i], df.loc["score type"].values[i])]
df = df.iloc[2:]
df.columns = pd.MultiIndex.from_tuples(columns)
mcdmeandf = df.loc[:, list(target_columns)]

# Compute the half-width of the 95% confidence interval of the mean for each model and score
df_score_ivl = df_score.copy()
for name in scores:
    for fname in feature_types:
        for v in score_types:
            values = scores[name][f"{fname}-{v}"]
            # Note: the `confidence` keyword requires SciPy >= 1.9 (older releases call it `alpha`).
            ivl = st.t.interval(confidence=0.95, df=len(values)-1, loc=np.mean(values), scale=st.sem(values))
            # print("    ", name, (ivl[1]-ivl[0])/2)
            params = {"feature":[fname], "score type":[v]}
            df_score_ivl.loc[get_bool_base_on_conditions(df_score_ivl, params), name] = (ivl[1]-ivl[0])/2
df = df_score_ivl.T.copy()
columns = []
for i in range(df.shape[1]):
    columns += [(df.loc["feature"].values[i], df.loc["score type"].values[i])]
df = df.iloc[2:]
df.columns = pd.MultiIndex.from_tuples(columns)
mcdstddf = df.loc[:, list(target_columns)]

df = pd.concat([mcdmeandf, mcdstddf], axis=1)
newcolumns = pd.MultiIndex.from_tuples([(cl1, cl2) for cl1 in ["Mean", "Interval"] for cl2 in list(target_columns.values())])
df.columns = newcolumns
print()
df

---

The image below shows the expected output of the cell above. The speakers are identified by their IDs, listed here with their genders:

- `0011`: Male
- `0013`: Male
- `0015`: Female
- `0016`: Female
- `0017` (reference speaker): Female

The results indicate that the female speakers (`0015` and `0016`) are more similar to the female reference speaker (`0017`) than the male speakers are, particularly with respect to pitch.

  <img src="images/03_expected_output.png" width="1100">
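
The `Interval` columns report half the width of a 95% Student-t confidence interval around each mean, which the cell above obtains from `st.t.interval`. The minimal sketch below shows the same quantity computed directly from the critical t value; the helper name `mean_and_half_width` and the sample scores are made up for illustration.

```python
import numpy as np
import scipy.stats as st

def mean_and_half_width(values, confidence=0.95):
    """Return the sample mean and the half-width of its Student-t confidence interval."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sem = st.sem(values)  # standard error of the mean (uses ddof=1)
    # Critical t value for a two-sided interval with n - 1 degrees of freedom
    t_crit = st.t.ppf((1.0 + confidence) / 2.0, df=len(values) - 1)
    return float(mean), float(t_crit * sem)

# Made-up per-utterance scores for a single model
mean, half_width = mean_and_half_width([6.1, 5.8, 6.4, 6.0, 5.9])
```

A model's score can then be read as `mean ± half_width`. Intervals that overlap heavily between two models suggest the difference in their means may not be meaningful for this sample size.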

---