# Baseline Assessment

After applying a few "state of the art" tools and gathering their predictions on the gold-standard datasets (essays & myPersonality), we assess the accuracy of the results and their correlation with the datasets' true labels.


### Organize The Results

First, we manually combined the predictions from the various tools into spreadsheets, side by side with the true labels, so they can be assessed together.
The assessment spreadsheets are available here:

- [Essays dataset](https://github.com/eliranshemtov/Musical-Preferences-And-Textual-Expression/blob/main/analysis/tools-baseline/essays-combined-predictions.xlsx)
- [MyPersonality dataset](https://github.com/eliranshemtov/Musical-Preferences-And-Textual-Expression/blob/main/analysis/tools-baseline/myPersonality-combined-predictions.xlsx)
- [MyPersonality concatenated dataset](https://github.com/eliranshemtov/Musical-Preferences-And-Textual-Expression/blob/main/analysis/tools-baseline/myPersonality-concatenated-combined-predictions.xlsx)


### The Pearson correlation coefficient

The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between -1 and 1 that describes the strength and direction of the linear relationship between two quantitative variables: values near 1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or no linear relationship. It is computed as the covariance of the two variables divided by the product of their standard deviations. As a descriptive statistic, it summarizes a characteristic of the dataset; as an inferential statistic, it can also be used to test statistical hypotheses, which is why `stats.pearsonr` reports a p-value next to the coefficient.


In [1]:
from sklearn.metrics import accuracy_score
from scipy import stats
import pandas as pd


# Example:
x, y = [1, 2, 3, 4, 5, 6, 7], [10, 9, 2.5, 6, 4, 3, 2]
res = stats.pearsonr(x, y)
print(res)

PearsonRResult(statistic=-0.8285038835884277, pvalue=0.021280260007523352)
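
To make the "covariance divided by the product of the standard deviations" definition concrete, here is a minimal sketch that recomputes the coefficient by hand on the same example and compares it with `stats.pearsonr`. The helper name `pearson_from_definition` is an illustrative assumption, not part of the original analysis.

```python
import numpy as np
from scipy import stats


def pearson_from_definition(x, y):
    """Pearson's r as covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance: mean of the products of the centered values.
    covariance = ((x - x.mean()) * (y - y.mean())).mean()
    # np.std defaults to the population standard deviation (ddof=0),
    # which matches the covariance normalization used above.
    return covariance / (x.std() * y.std())


x, y = [1, 2, 3, 4, 5, 6, 7], [10, 9, 2.5, 6, 4, 3, 2]
manual_r = pearson_from_definition(x, y)
scipy_r = stats.pearsonr(x, y)[0]
print(manual_r, scipy_r)
```

Both values should agree up to floating-point rounding; only the p-value requires the extra inferential machinery that SciPy provides.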


### Accuracy Score

Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mtext>Accuracy</mtext>
  <mo>=</mo>
  <mfrac>
    <mtext>Number of correct predictions</mtext>
    <mtext>Total number of predictions</mtext>
  </mfrac>
</math>

For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<mtext>Accuracy</mtext>
<mo>=</mo>
<mfrac>
<mrow>
<mi>T</mi>
<mi>P</mi>
<mo>+</mo>
<mi>T</mi>
<mi>N</mi>
</mrow>
<mrow>
<mi>T</mi>
<mi>P</mi>
<mo>+</mo>
<mi>T</mi>
<mi>N</mi>
<mo>+</mo>
<mi>F</mi>
<mi>P</mi>
<mo>+</mo>
<mi>F</mi>
<mi>N</mi>
</mrow>
</mfrac>
</math>

Where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.


In [2]:
# Example:
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)

0.5
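
For the binary case, the TP/TN/FP/FN form of the formula can be checked against `accuracy_score` directly. The following is a small sketch on made-up labels (the toy vectors are assumptions for illustration only), using scikit-learn's `confusion_matrix` to obtain the four counts.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical binary labels, only for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[0, 1], ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel())

manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
library_accuracy = accuracy_score(y_true, y_pred)
print(tp, tn, fp, fn, manual_accuracy, library_accuracy)
```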

#### I am going to calculate the **Pearson correlation coefficient** and **Accuracy Score** for every type of prediction of the Big Five traits, by every one of the tools I used.


# Preparations & Definitions


In [3]:
true_labels_keys = ["cEXT", "cNEU", "cAGR", "cCON", "cOPN"]
tool1_keys = [
    "pred_sEXT_normalized",
    "pred_sNEU_normalized",
    "pred_sAGR_normalized",
    "pred_sCON_normalized",
    "pred_sOPN_normalized",
]
tool3_keys = [
    "BIG5_Extraversion",
    "BIG5_Neuroticism",
    "BIG5_Agreeableness",
    "BIG5_Conscientiousness",
    "BIG5_Openness",
]
tool4_keys = [
    "cEXT_prediction",
    "cNEU_prediction",
    "cAGR_prediction",
    "cCON_prediction",
    "cOPN_prediction",
]

true_and_tool1_labels = {true_labels_keys[i]: tool1_keys[i] for i in range(5)}
true_and_tool3_labels = {true_labels_keys[i]: tool3_keys[i] for i in range(5)}
true_and_tool4_labels = {true_labels_keys[i]: tool4_keys[i] for i in range(5)}

# Helper functions

Each tool's output had to be normalized in its own way before it could be used in the calculations:

- The true labels are loaded and converted from "y"/"n" format to 1/0.
- Tool #4's data is simply converted to int.
- Tool #3's data is thresholded at 0.5 (>= 0.5 becomes 1, otherwise 0).
- Tool #1's data had to be re-normalized with the method below before applying the same 0.5 threshold. Details are in the method's docstring.


In [4]:
def re_normalize_tool_1(df: pd.DataFrame) -> pd.DataFrame:
    """
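    Re-normalize Tool #1's raw scores per trait column (min-max across all rows).

    The tool's original normalization, reproduced below, scales the five trait scores
    against each other within a single prediction. It does not yield meaningful results,
    because the predictions for OPN and NEU are constantly the max and min values.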
        def original_tool_normalization():
            min_value = min(pred_sOPN, pred_sCON, pred_sEXT, pred_sAGR, pred_sNEU)
            max_value = max(pred_sOPN, pred_sCON, pred_sEXT, pred_sAGR, pred_sNEU)

            scaled_min = 0.05
            scaled_max = 0.95

            pred_sOPN_normalized = (pred_sOPN - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min  # Always scores to 0.95
            pred_sCON_normalized = (pred_sCON - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min
            pred_sEXT_normalized = (pred_sEXT - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min
            pred_sAGR_normalized = (pred_sAGR - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min
            pred_sNEU_normalized = (pred_sNEU - min_value) / (max_value - min_value) * (scaled_max - scaled_min) + scaled_min  # Always scores to 0.05
    """
    scaled_min = 0.05
    scaled_max = 0.95

    for column in ["pred_sEXT", "pred_sNEU", "pred_sAGR", "pred_sCON", "pred_sOPN"]:
        min_value = df[column].min()
        max_value = df[column].max()

        df[f"{column}_normalized"] = scaled_min + (df[column] - min_value) / (
            max_value - min_value
        ) * (scaled_max - scaled_min)
    return df
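
The difference between the two normalization schemes is easiest to see on a toy frame. The sketch below uses invented raw scores (an assumption, chosen so that OPN is always the largest and NEU the smallest value in each row, as the docstring describes) and contrasts scaling within each row with scaling within each column.

```python
import pandas as pd

# Hypothetical raw scores; OPN is the row maximum and NEU the row minimum everywhere.
toy = pd.DataFrame(
    {
        "pred_sEXT": [3.2, 3.6, 3.4],
        "pred_sNEU": [2.1, 2.5, 2.3],
        "pred_sAGR": [3.5, 3.3, 3.7],
        "pred_sCON": [3.0, 3.4, 3.1],
        "pred_sOPN": [4.1, 4.4, 3.9],
    }
)
scaled_min, scaled_max = 0.05, 0.95
scale = scaled_max - scaled_min

# Per-row scaling (the tool's original approach): traits compete within one prediction.
row_min, row_max = toy.min(axis=1), toy.max(axis=1)
per_row = toy.sub(row_min, axis=0).div(row_max - row_min, axis=0) * scale + scaled_min

# Per-column scaling (the approach of re_normalize_tool_1): each trait is scaled across rows.
per_column = (toy - toy.min()) / (toy.max() - toy.min()) * scale + scaled_min

print(per_row[["pred_sOPN", "pred_sNEU"]])
print(per_column[["pred_sOPN", "pred_sNEU"]])
```

With per-row scaling, the OPN and NEU columns collapse to constants (about 0.95 and 0.05), so they carry no information after thresholding. Per-column scaling keeps the variation between rows.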

In [5]:
def normalize_scores_by_half_threshold(scores):
    """
    Used by Tool #3 & #1
    """
    half_threshold = 0.5
    return [1 if score >= half_threshold else 0 for score in scores]


def normalize_boolean_labels_to_numerical(scores):
    """
    Used by True Labels
    """
    return [1 if score == "y" else 0 for score in scores]


def to_int(scores):
    """
    Used by Tool #4
    """
    return [1 if score == 1.0 else 0 for score in scores]


def get_true_labels_and_matching_predictions(
    df, true_labels_dict, true_labels_and_tool_keys_dict, normalization_func
):
    tool_values_normalized = {
        k: normalization_func(df[k].tolist())
        for k in true_labels_and_tool_keys_dict.values()
    }

    result = []
    for key in true_labels_keys:
        result.append(
            (
                true_labels_dict.get(key),
                tool_values_normalized.get(true_labels_and_tool_keys_dict.get(key)),
            )
        )
    return result
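
Since the helpers above reduce both the true labels and the predictions to 0/1 lists, the Pearson coefficient computed on them coincides with the phi coefficient, which scikit-learn exposes as `matthews_corrcoef`. The sketch below illustrates this on made-up labels (an assumption for illustration); it is a side note for interpreting the numbers, not a step of the original analysis.

```python
from scipy import stats
from sklearn.metrics import matthews_corrcoef

# Hypothetical binarized labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pearson_r = stats.pearsonr(y_true, y_pred)[0]
phi = matthews_corrcoef(y_true, y_pred)
print(pearson_r, phi)
```

This means the reported correlations can be read as a chance-corrected agreement measure between two binary labelings, alongside plain accuracy.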

# Load Data & Calculate Correlation Coefficient


## Essays


In [6]:
input_file_path = "./analysis/tools-baseline/essays-combined-predictions.xlsx"

df = pd.read_excel(input_file_path, header=1)
true_labels_dict = {
    k: normalize_boolean_labels_to_numerical(df[k].tolist()) for k in true_labels_keys
}

df = re_normalize_tool_1(df)
true_labels_with_tool1_tuple = get_true_labels_and_matching_predictions(
    df, true_labels_dict, true_and_tool1_labels, normalize_scores_by_half_threshold
)
true_labels_with_tool3_tuple = get_true_labels_and_matching_predictions(
    df, true_labels_dict, true_and_tool3_labels, normalize_scores_by_half_threshold
)
true_labels_with_tool4_tuple = get_true_labels_and_matching_predictions(
    df, true_labels_dict, true_and_tool4_labels, to_int
)
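
When reading accuracy next to the correlation coefficient, it can help to know what accuracy a trivial predictor would reach. The following is a minimal sketch of a majority-class baseline; the helper `majority_class_accuracy` and the toy label list are assumptions for illustration and are not part of the original analysis.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical 0/1 labels where one class dominates.
toy_true_labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
baseline = majority_class_accuracy(toy_true_labels)
print(baseline)
```

Applied to each list in `true_labels_dict`, this would give a per-trait reference point: an accuracy close to the baseline combined with a low correlation suggests the tool adds little beyond the label distribution.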

In [7]:
# Each entry holds one (true labels, predictions) pair per trait, in true_labels_keys order.
tool_results = {
    "Tool #1": true_labels_with_tool1_tuple,
    "Tool #3": true_labels_with_tool3_tuple,
    "Tool #4": true_labels_with_tool4_tuple,
}

for idx, (tool_name, pairs) in enumerate(tool_results.items()):
    if idx > 0:
        print("\n\n")
    print(tool_name)
    for key, (y_true, y_pred) in zip(true_labels_keys, pairs):
        print(f"Correlation Coefficient for: {key}", stats.pearsonr(y_true, y_pred))
        print(f"Accuracy for: {key}", accuracy_score(y_true, y_pred))

Tool #1
Correlation Coefficient for: cEXT PearsonRResult(statistic=0.02673296008862707, pvalue=0.18430024911539217)
Accuracy for: cEXT 0.5214748784440842
Correlation Coefficient for: cNEU PearsonRResult(statistic=0.008498031063865487, pvalue=0.6730489803556491)
Accuracy for: cNEU 0.5040518638573744
Correlation Coefficient for: cAGR PearsonRResult(statistic=0.03134768327951032, pvalue=0.11949084176053548)
Accuracy for: cAGR 0.5072933549432739
Correlation Coefficient for: cCON PearsonRResult(statistic=0.017462863261542342, pvalue=0.38585233713438616)
Accuracy for: cCON 0.5113452188006483
Correlation Coefficient for: cOPN PearsonRResult(statistic=0.13157584577466494, pvalue=5.3166922763821695e-11)
Accuracy for: cOPN 0.5672609400324149



Tool #3
Correlation Coefficient for: cEXT PearsonRResult(statistic=0.05349739260942245, pvalue=0.007854700722787284)
Accuracy for: cEXT 0.4959481361426256
Correlation Coefficient for: cNEU PearsonRResult(statistic=0.10650321988765314, pvalue=1.13678357284

## MyPersonality


In [8]:
input_file_path = (
    "./analysis/tools-baseline/myPersonality-concatenated-combined-predictions.xlsx"
)

df = pd.read_excel(input_file_path, header=1)
true_labels_dict = {
    k: normalize_boolean_labels_to_numerical(df[k].tolist()) for k in true_labels_keys
}

df = re_normalize_tool_1(df)
true_labels_with_tool1_tuple = get_true_labels_and_matching_predictions(
    df, true_labels_dict, true_and_tool1_labels, normalize_scores_by_half_threshold
)
true_labels_with_tool3_tuple = get_true_labels_and_matching_predictions(
    df, true_labels_dict, true_and_tool3_labels, normalize_scores_by_half_threshold
)
true_labels_with_tool4_tuple = get_true_labels_and_matching_predictions(
    df, true_labels_dict, true_and_tool4_labels, to_int
)

In [9]:
# Each entry holds one (true labels, predictions) pair per trait, in true_labels_keys order.
tool_results = {
    "Tool #1": true_labels_with_tool1_tuple,
    "Tool #3": true_labels_with_tool3_tuple,
    "Tool #4": true_labels_with_tool4_tuple,
}

for idx, (tool_name, pairs) in enumerate(tool_results.items()):
    if idx > 0:
        print("\n\n")
    print(tool_name)
    for key, (y_true, y_pred) in zip(true_labels_keys, pairs):
        print(f"Correlation Coefficient for: {key}", stats.pearsonr(y_true, y_pred))
        print(f"Accuracy for: {key}", accuracy_score(y_true, y_pred))

Tool #1
Correlation Coefficient for: cEXT PearsonRResult(statistic=0.1406236278935863, pvalue=5.637647816977117e-45)
Accuracy for: cEXT 0.849450438640718
Correlation Coefficient for: cNEU PearsonRResult(statistic=0.18470481664709096, pvalue=7.978354703375822e-77)
Accuracy for: cNEU 0.8732479580518302
Correlation Coefficient for: cAGR PearsonRResult(statistic=0.15572707661246574, pvalue=7.137560478676682e-55)
Accuracy for: cAGR 0.8175859634970253
Correlation Coefficient for: cCON PearsonRResult(statistic=0.16539921684497919, pvalue=9.18820908118067e-62)
Accuracy for: cCON 0.820611071896743
Correlation Coefficient for: cOPN PearsonRResult(statistic=0.19872569308921464, pvalue=7.1339809047227e-89)
Accuracy for: cOPN 0.7686800443682565



Tool #3
Correlation Coefficient for: cEXT PearsonRResult(statistic=0.06313735601501289, pvalue=3.1145807084311104e-10)
Accuracy for: cEXT 0.8480387213875163
Correlation Coefficient for: cNEU PearsonRResult(statistic=0.10815077160013971, pvalue=3.425796319