# Baseline Assessment

After applying several "state of the art" tools and gathering their predictions on the gold-standard datasets (essays and myPersonality), we assess the accuracy of the results and their correlation with the datasets' true labels.


### Organize The Results

We combine the predictions from the various tools and assess them against the true labels.

First, we manually combined the results into spreadsheets, side by side with the true labels.
The assessment spreadsheets are available here:

- [Essays dataset](https://github.com/eliranshemtov/Musical-Preferences-And-Textual-Expression/blob/main/analysis/tools-baseline/essays-combined-predictions.xlsx)
- [MyPersonality dataset](https://github.com/eliranshemtov/Musical-Preferences-And-Textual-Expression/blob/main/analysis/tools-baseline/myPersonality-combined-predictions.xlsx)
- [MyPersonality concatenated dataset](https://github.com/eliranshemtov/Musical-Preferences-And-Textual-Expression/blob/main/analysis/tools-baseline/myPersonality-concatenated-combined-predictions.xlsx)


### The Pearson correlation coefficient

The Pearson correlation coefficient is a number between -1 and 1 that measures the strength and direction of the linear relationship between two quantitative variables, and it is the most common way of measuring a linear correlation. It is computed as the covariance of the two variables divided by the product of their standard deviations. It is a descriptive statistic, meaning that it summarizes a characteristic of a dataset, and also an inferential statistic, meaning that it can be used to test statistical hypotheses (for example, whether the correlation differs from zero).
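
To make the formula concrete, here is a minimal sketch that computes the coefficient by hand and compares it with `scipy.stats.pearsonr`. The helper name `pearson_r` is made up for this illustration and is not part of the analysis code; the sample data is the same toy data used in the SciPy example in this notebook.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    # The 1/n factors of the covariance and of the standard deviations cancel,
    # so plain sums of the centered values are enough.
    numerator = np.sum(x_centered * y_centered)
    denominator = np.sqrt(np.sum(x_centered**2) * np.sum(y_centered**2))
    return numerator / denominator


x = [1, 2, 3, 4, 5, 6, 7]
y = [10, 9, 2.5, 6, 4, 3, 2]

manual_r = pearson_r(x, y)
scipy_r = stats.pearsonr(x, y)[0]  # first element of the result is the statistic
print(manual_r, scipy_r)  # both are approximately -0.8285
```

The two values should agree up to floating-point rounding, which is a quick way to check that the library call measures what the formula describes.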

For each of the Big Five traits, we calculate the Pearson correlation coefficient between every tool's prediction and the true label.


In [1]:
import numpy as np
from scipy import stats
import pandas as pd


# Example:
x, y = [1, 2, 3, 4, 5, 6, 7], [10, 9, 2.5, 6, 4, 3, 2]
res = stats.pearsonr(x, y)
print(res)

PearsonRResult(statistic=-0.8285038835884277, pvalue=0.021280260007523352)


# Preparations & Definitions


In [53]:
true_labels_keys = ["cEXT", "cNEU", "cAGR", "cCON", "cOPN"]
tool1_keys = [
    "pred_sEXT_normalized",
    "pred_sNEU_normalized",
    "pred_sAGR_normalized",
    "pred_sCON_normalized",
    "pred_sOPN_normalized",
]
tool3_keys = [
    "BIG5_Extraversion",
    "BIG5_Neuroticism",
    "BIG5_Agreeableness",
    "BIG5_Conscientiousness",
    "BIG5_Openness",
]
tool4_keys = [
    "cEXT_prediction",
    "cNEU_prediction",
    "cAGR_prediction",
    "cCON_prediction",
    "cOPN_prediction",
]

# Map each true-label column to the matching prediction column of each tool.
true_and_tool1_labels = dict(zip(true_labels_keys, tool1_keys))
true_and_tool3_labels = dict(zip(true_labels_keys, tool3_keys))
true_and_tool4_labels = dict(zip(true_labels_keys, tool4_keys))

# Helper functions

Each tool's output had to be normalized in its own way before it could be used in the calculations.

- The true labels are loaded and converted from "y"/"n" to 1/0.
- Tool #4's data is converted to int (1.0 becomes 1, anything else becomes 0).
- Tool #3's data is thresholded at 0.5 (>= 0.5 is 1, otherwise 0).
- Tool #1's data had to be re-normalized with the method below before applying the same 0.5 threshold. Details are in the method's docstring.
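
Because both the true labels and the thresholded predictions end up as 0/1 values, the Pearson coefficient computed on them is equivalent to the phi coefficient of the 2x2 contingency table. The sketch below illustrates this on made-up data; the helper `phi_coefficient` and the toy lists are assumptions for illustration only and are not taken from either dataset.

```python
import numpy as np
from scipy import stats


def phi_coefficient(true_labels, predicted_labels):
    """Phi coefficient computed from the counts of a 2x2 contingency table."""
    t = np.asarray(true_labels)
    p = np.asarray(predicted_labels)
    n11 = int(np.sum((t == 1) & (p == 1)))  # true 1, predicted 1
    n10 = int(np.sum((t == 1) & (p == 0)))  # true 1, predicted 0
    n01 = int(np.sum((t == 0) & (p == 1)))  # true 0, predicted 1
    n00 = int(np.sum((t == 0) & (p == 0)))  # true 0, predicted 0
    denominator = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denominator


# Hypothetical toy data.
raw_true = ["y", "n", "y", "y", "n", "n", "y", "n"]
raw_scores = [0.9, 0.4, 0.3, 0.7, 0.6, 0.1, 0.5, 0.2]

# Same conversions as described in the list above.
true_binary = [1 if label == "y" else 0 for label in raw_true]
predicted_binary = [1 if score >= 0.5 else 0 for score in raw_scores]

phi = phi_coefficient(true_binary, predicted_binary)
r = stats.pearsonr(true_binary, predicted_binary)[0]
print(phi, r)  # both are approximately 0.5 for this toy data
```

One consequence is that the magnitude of the raw scores no longer matters after thresholding: only which side of 0.5 each score falls on affects the coefficient.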


In [41]:
def re_normalize_tool_1(df: pd.DataFrame) -> pd.DataFrame:
    """
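    The tool's original normalization, reproduced below, rescales each sample across its five trait scores.
    It does not yield meaningful results, because the predictions for OPN and NEU are constantly the max and min values,
    so they always map to the top and bottom of the scaled range.
    This function instead rescales each trait column across all samples.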
        def original_tool_normalization():
            min_value = min(pred_sOPN, pred_sCON, pred_sEXT, pred_sAGR, pred_sNEU)
            max_value = max(pred_sOPN, pred_sCON, pred_sEXT, pred_sAGR, pred_sNEU)

            scaled_min = 0.05
            scaled_max = 0.95

            pred_sOPN_normalized = (pred_sOPN - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min  # Always scores to 0.95
            pred_sCON_normalized = (pred_sCON - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min
            pred_sEXT_normalized = (pred_sEXT - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min
            pred_sAGR_normalized = (pred_sAGR - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min
            pred_sNEU_normalized = (pred_sNEU - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min  # Always scores to 0.05
    """
    scaled_min = 0.05
    scaled_max = 0.95

    for column in ["pred_sEXT", "pred_sNEU", "pred_sAGR", "pred_sCON", "pred_sOPN"]:
        min_value = df[column].min()
        max_value = df[column].max()

        df[f"{column}_normalized"] = scaled_min + (df[column] - min_value) / (
            max_value - min_value
        ) * (scaled_max - scaled_min)
    return df
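
The difference between the two normalization directions can be seen on a small example. The sketch below is an illustration under stated assumptions: the raw scores are invented, and only two traits are used so that OPN is the row maximum and NEU the row minimum in every row, which is the situation described in the docstring.

```python
import pandas as pd

SCALED_MIN, SCALED_MAX = 0.05, 0.95
raw_columns = ["pred_sOPN", "pred_sNEU"]

# Hypothetical raw scores: OPN is the largest and NEU the smallest in every row.
toy = pd.DataFrame({"pred_sOPN": [4.0, 4.5, 5.0], "pred_sNEU": [1.0, 2.0, 3.0]})

# Row-wise scaling: min and max are taken across the traits of a single row.
row_min = toy[raw_columns].min(axis=1)
row_max = toy[raw_columns].max(axis=1)
row_wise = (
    toy[raw_columns].sub(row_min, axis=0).div(row_max - row_min, axis=0)
    * (SCALED_MAX - SCALED_MIN)
    + SCALED_MIN
)

# Column-wise scaling: min and max are taken across all rows of a single trait.
col_min = toy[raw_columns].min()
col_max = toy[raw_columns].max()
column_wise = (toy[raw_columns] - col_min) / (col_max - col_min) * (
    SCALED_MAX - SCALED_MIN
) + SCALED_MIN

print(row_wise)     # OPN is ~0.95 and NEU is ~0.05 in every row
print(column_wise)  # each column spreads over ~0.05, ~0.5, ~0.95
```

With row-wise scaling the two extreme traits carry no information after normalization, while column-wise scaling keeps the relative ordering of samples within each trait, which is what the 0.5 threshold needs.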

In [55]:
def normalize_scores_by_half_threshold(scores):
    """
    Used by Tool #3 & #1
    """
    half_threshold = 0.5
    return [1 if score >= half_threshold else 0 for score in scores]


def normalize_boolean_labels_to_numerical(scores):
    """
    Used by True Labels
    """
    return [1 if score == "y" else 0 for score in scores]


def to_int(scores):
    """
    Used by Tool #4
    """
    return [1 if score == 1.0 else 0 for score in scores]


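def get_tool_pearson_correlation_coefficient(
    df, true_labels_dict, true_labels_and_tool_keys_dict, normalization_func
):
    """
    Print Pearson's r and its p-value between the true labels and one tool's
    normalized predictions, for each Big Five trait.
    """
    # Normalize every prediction column of this tool to 0/1 values.
    tool_values_normalized = {
        k: normalization_func(df[k].tolist())
        for k in true_labels_and_tool_keys_dict.values()
    }

    # Iterate over the mapping itself (true-label key -> tool column) rather
    # than relying on the global true_labels_keys list.
    for key, tool_key in true_labels_and_tool_keys_dict.items():
        res = stats.pearsonr(true_labels_dict[key], tool_values_normalized[tool_key])
        print((key, tool_key, res))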

# Load Data & Calculate Correlation Coefficient


## Essays


In [56]:
input_file_path = "./analysis/tools-baseline/essays-combined-predictions.xlsx"

df = pd.read_excel(input_file_path, header=1)
true_labels_dict = {
    k: normalize_boolean_labels_to_numerical(df[k].tolist()) for k in true_labels_keys
}

df = re_normalize_tool_1(df)
print("Tool #1")
get_tool_pearson_correlation_coefficient(
    df, true_labels_dict, true_and_tool1_labels, normalize_scores_by_half_threshold
)
print("\n\nTool #3")
get_tool_pearson_correlation_coefficient(
    df, true_labels_dict, true_and_tool3_labels, normalize_scores_by_half_threshold
)
print("\n\nTool #4")
get_tool_pearson_correlation_coefficient(
    df, true_labels_dict, true_and_tool4_labels, to_int
)

Tool #1
('cEXT', 'pred_sEXT_normalized', PearsonRResult(statistic=0.02673296008862707, pvalue=0.18430024911539217))
('cNEU', 'pred_sNEU_normalized', PearsonRResult(statistic=0.008498031063865487, pvalue=0.6730489803556491))
('cAGR', 'pred_sAGR_normalized', PearsonRResult(statistic=0.03134768327951032, pvalue=0.11949084176053548))
('cCON', 'pred_sCON_normalized', PearsonRResult(statistic=0.017462863261542342, pvalue=0.38585233713438616))
('cOPN', 'pred_sOPN_normalized', PearsonRResult(statistic=0.13157584577466494, pvalue=5.3166922763821695e-11))


Tool #3
('cEXT', 'BIG5_Extraversion', PearsonRResult(statistic=0.05349739260942245, pvalue=0.007854700722787284))
('cNEU', 'BIG5_Neuroticism', PearsonRResult(statistic=0.10650321988765314, pvalue=1.136783572841032e-07))
('cAGR', 'BIG5_Agreeableness', PearsonRResult(statistic=0.047015666189085384, pvalue=0.01950175834344762))
('cCON', 'BIG5_Conscientiousness', PearsonRResult(statistic=0.09792827656229744, pvalue=1.0923344277555917e-06))
('cOPN

## MyPersonality


In [57]:
input_file_path = (
    "./analysis/tools-baseline/myPersonality-concatenated-combined-predictions.xlsx"
)

df = pd.read_excel(input_file_path, header=1)
true_labels_dict = {
    k: normalize_boolean_labels_to_numerical(df[k].tolist()) for k in true_labels_keys
}

df = re_normalize_tool_1(df)
print("Tool #1")
get_tool_pearson_correlation_coefficient(
    df, true_labels_dict, true_and_tool1_labels, normalize_scores_by_half_threshold
)
print("\n\nTool #3")
get_tool_pearson_correlation_coefficient(
    df, true_labels_dict, true_and_tool3_labels, normalize_scores_by_half_threshold
)
print("\n\nTool #4")
get_tool_pearson_correlation_coefficient(
    df, true_labels_dict, true_and_tool4_labels, to_int
)

Tool #1
('cEXT', 'pred_sEXT_normalized', PearsonRResult(statistic=0.1406236278935863, pvalue=5.637647816977117e-45))
('cNEU', 'pred_sNEU_normalized', PearsonRResult(statistic=0.18470481664709096, pvalue=7.978354703375822e-77))
('cAGR', 'pred_sAGR_normalized', PearsonRResult(statistic=0.15572707661246574, pvalue=7.137560478676682e-55))
('cCON', 'pred_sCON_normalized', PearsonRResult(statistic=0.16539921684497919, pvalue=9.18820908118067e-62))
('cOPN', 'pred_sOPN_normalized', PearsonRResult(statistic=0.19872569308921464, pvalue=7.1339809047227e-89))


Tool #3
('cEXT', 'BIG5_Extraversion', PearsonRResult(statistic=0.06313735601501289, pvalue=3.1145807084311104e-10))
('cNEU', 'BIG5_Neuroticism', PearsonRResult(statistic=0.10815077160013971, pvalue=3.425796319723674e-27))
('cAGR', 'BIG5_Agreeableness', PearsonRResult(statistic=0.1226005561776652, pvalue=1.603410684420724e-34))
('cCON', 'BIG5_Conscientiousness', PearsonRResult(statistic=0.11915681333347758, pvalue=1.0872546654038495e-32))
('