# Forming Personality Traits Baseline

## By assessing existing tools' predictions

The data assessed in this section is [essays.zip](http://web.archive.org/web/20160519045708/http://mypersonality.org/wiki/lib/exe/fetch.php?media=wiki:essays.zip), a large dataset of about 2,400 stream-of-consciousness texts labelled with personality traits, produced by Pennebaker & King (1999) and used by Mairesse et al. (2007).
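
Before calling any tool, it can help to check what the CSV actually contains. The sketch below counts rows and tallies the values of every column other than the ID and the text. It assumes the file has the `#AUTHID` and `TEXT` columns used by the cells in this notebook; treating all remaining columns as labels is an assumption, and `summarize_essays` is a hypothetical helper name.

```python
import csv
from collections import Counter


def summarize_essays(csv_path, encoding="ISO-8859-1"):
    """Return (row_count, column_names, label_value_counts) for the essays CSV."""
    with open(csv_path, newline="", encoding=encoding) as csvfile:
        reader = csv.DictReader(csvfile)
        columns = list(reader.fieldnames or [])
        # Assumption: every column other than the ID and the text is a label.
        label_columns = [c for c in columns if c not in ("#AUTHID", "TEXT")]
        counts = {c: Counter() for c in label_columns}
        n_rows = 0
        for row in reader:
            n_rows += 1
            for c in label_columns:
                counts[c][row[c]] += 1
    return n_rows, columns, counts
```

Calling `summarize_essays("./data/essays.csv")` should make it easy to confirm the row count and see how balanced each label is.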


### Tool #1 - https://project.fuguixing.me/

A web application for personality analysis, Big Five personality prediction, and emotion analysis. Powered by Azure Static Web App, Azure Function, React, and Machine Learning.
Source: https://github.com/fuguixing/psychology-insights-frontend/tree/master


In [4]:
import csv
import requests
import time
import json
from tqdm import tqdm

api_url = "https://project.fuguixing.me/api/bigfive"
essays_data_csv_file_path = "./data/essays.csv"
result_file_path = "./analysis/tool-1-baseline.csv"


with open(essays_data_csv_file_path, newline="", encoding="ISO-8859-1") as csvfile:
    csv_reader = csv.DictReader(csvfile, delimiter=",")
    fieldnames = csv_reader.fieldnames + [
        "pred_sOPN",
        "pred_sOPN_normalized",
        "pred_sCON",
        "pred_sCON_normalized",
        "pred_sEXT",
        "pred_sEXT_normalized",
        "pred_sAGR",
        "pred_sAGR_normalized",
        "pred_sNEU",
        "pred_sNEU_normalized",
        "pred_sentiment",
    ]

    with open(result_file_path, "w", newline="", encoding="utf-8") as updated_csvfile:
        csv_writer = csv.DictWriter(
            updated_csvfile, fieldnames=fieldnames, delimiter="\t"
        )
        csv_writer.writeheader()

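        for row in tqdm(csv_reader):
            # The essay is wrapped in literal double quotes and then JSON-encoded,
            # so the request body is a JSON string sent with a text/plain content type.
            payload = f'"{row.get("TEXT")}"'
            headers = {"Content-Type": "text/plain"}
            response = requests.post(
                api_url, data=json.dumps(payload), headers=headers)
            # Predicted columns for this row; stays empty if the request fails.
            new_values = {}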

            if response.status_code == 200:
                api_data = response.json()
                prediction = api_data.get("prediction", {})
                new_values.update(prediction)
            else:
                print(
                    f"POST request failed for row with ID {row['#AUTHID']}. Status code: {response.status_code}"
                )

            row.update(new_values)
            csv_writer.writerow(row)
            time.sleep(1)

print(f"CSV file with added columns created: {result_file_path}")

2468it [00:02, 1171.13it/s]

CSV file with added columns created: ./analysis/tool-1-baseline.csv
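
To turn the result file into a baseline figure, the predictions still have to be compared with the dataset's labels. The following is a minimal sketch, not the notebook's actual evaluation. It assumes the labels live in columns named `cOPN`, `cCON`, `cEXT`, `cAGR`, `cNEU` with values `y`/`n`, that the `pred_s*_normalized` columns lie in [0, 1], and that 0.5 is a sensible cut-off. None of these are confirmed by the cell above, and `binary_accuracy` and `score_result_file` are hypothetical helpers.

```python
import csv

TRAITS = ["OPN", "CON", "EXT", "AGR", "NEU"]


def binary_accuracy(rows, pred_col, label_col, threshold=0.5):
    """Fraction of rows where (score >= threshold) agrees with a 'y'/'n' label."""
    correct = 0
    total = 0
    for row in rows:
        raw = row.get(pred_col)
        if raw in (None, ""):
            continue  # no prediction for this row, e.g. the request failed
        predicted_high = float(raw) >= threshold
        # Assumption: the label column marks the high class with "y".
        actual_high = (row.get(label_col) or "").strip().lower() == "y"
        correct += int(predicted_high == actual_high)
        total += 1
    return correct / total if total else float("nan")


def score_result_file(path, delimiter="\t"):
    """Per-trait accuracy for a result file written by the cell above."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter=delimiter))
    return {
        t: binary_accuracy(rows, f"pred_s{t}_normalized", f"c{t}") for t in TRAITS
    }
```

`score_result_file("./analysis/tool-1-baseline.csv")` would return one accuracy per trait. The default delimiter is a tab because the cell above writes the file tab-separated.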





### Tool #2 - [Personality Recognizer v1.03](http://farm2.user.srcf.net/research/personality/recognizer.html)

#### ⚠️ Aborted: poor and outdated results

This work is fairly old ([Mairesse et al., 2007](http://farm2.user.srcf.net/research/papers/personality-jair07.pdf)), and the tool behaves as a true black box.
The Java program is based on the models analyzed in the paper, which were shown to predict personality scores significantly better than a constant baseline. It uses a command-line interface and outputs scores on a scale from 1 to 7, where, for example, 7 means strongly extravert.
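
Since this tool reports scores on a 1 to 7 scale while other tools may report normalized values, a linear rescaling makes the outputs easier to compare. This is only a sketch: `rescale_score` is a hypothetical helper, and mapping onto [0, 1] is our own choice, not something the Personality Recognizer does.

```python
def rescale_score(score, low=1.0, high=7.0):
    """Linearly map a score from [low, high] onto [0, 1]."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    return (score - low) / (high - low)
```

With the defaults, the scale midpoint 4 maps to 0.5 and the endpoints 1 and 7 map to 0 and 1.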


Now we'll parse the essays.csv dataset and convert it to a folder of txt files, as this program expects.


In [None]:
import csv
import os

csv_file_path = "./data/essays.csv"
output_folder = "./data/essays_as_txt"

os.makedirs(output_folder, exist_ok=True)

with open(csv_file_path, newline="", encoding="ISO-8859-1") as csvfile:
    csv_reader = csv.DictReader(
        csvfile, delimiter=",", quoting=csv.QUOTE_MINIMAL)

    for row in csv_reader:
        auth_id = row.get("#AUTHID", "")
        text_data = row.get("TEXT", "")

        if auth_id and text_data:
            txt_file_path = os.path.join(output_folder, f"{auth_id}.txt")

            with open(txt_file_path, "w", encoding="utf-8") as txtfile:
                txtfile.write(text_data)

print("TXT files saved in the output folder:", output_folder)

As this package is old and unmaintained, running it requires significant effort. The assessment results for the essays.csv dataset are detailed in the original [article](http://farm2.user.srcf.net/research/papers/personality-jair07.pdf) (page 19 of 44):

"Classification results for the essays corpus with self-reports are in Table 12. Interestingly,
openness to experience is the easiest trait to model as five classifiers out of six significantly
outperform the baseline and four of them produce their best performance for that trait,
with accuracies up to 62.1% using support vector machines (SMO). Emotional stability
produces the second best performance for four classifiers out of six, with 57.4% accuracy
for the SMO model. Conscientiousness is the hardest trait to model as only two classifiers
significantly outperform the baseline, however the SMO model performs as well as the best
model for extraversion and agreeableness, with around 55% correct classifications.
We find that support vector machines generally perform the best, with Naive Bayes and
AdaboostM1 in second position. SMO significantly outperforms the majority class baseline
for each trait. A J48 decision tree for recognising extraversion is shown in Figure 1, and the
rule-based JRip model classifying openness to experience with 58.8% accuracy is illustrated
in Table 16."

We can try to use this Java app if the results look interesting enough.
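
The accuracies quoted above are judged against a majority-class baseline, i.e. always predicting the most frequent class for a trait. A minimal sketch of that baseline is shown here; `majority_baseline_accuracy` is a hypothetical helper, and the `y`/`n` labels in the usage note are an assumption about how the dataset encodes classes.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    # most_common(1) returns [(label, count)] for the most frequent label.
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)
```

For example, `majority_baseline_accuracy(["y", "y", "n"])` gives 2/3. A tool is only worth keeping if it beats this number on the same labels.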


### Tool #3 - Apply Magic Sauce

Apply Magic Sauce is a non-profit academic research project coordinated by the University of Cambridge Psychometrics Centre.
The project aimed to analyze and predict individuals' psychological traits, such as personality, based on their digital footprints, including social media activity, likes, and other online behaviors. The project utilized advanced algorithms and machine learning techniques to make predictions about users' personalities.


In [11]:
import csv
import os
import requests
import time
import json
from tqdm import tqdm

api_url = "https://applymagicsauce.com/api/predictions/text"
# Read the bearer token from the environment instead of committing it to the notebook.
# AMS_API_TOKEN is a hypothetical variable name; set it to your own token.
api_token = os.environ["AMS_API_TOKEN"]
essays_data_csv_file_path = "./data/essays.csv"
result_file_path = "./analysis/tool-3-baseline.csv"

predictions_fields = [
    "BIG5_Openness",
    "BIG5_Conscientiousness",
    "BIG5_Extraversion",
    "BIG5_Agreeableness",
    "BIG5_Neuroticism",
    "Female",
    "Age",
]
with open(essays_data_csv_file_path, newline="", encoding="ISO-8859-1") as csvfile:
    csv_reader = csv.DictReader(csvfile, delimiter=",")
    fieldnames = csv_reader.fieldnames + predictions_fields

    with open(result_file_path, "w", newline="", encoding="utf-8") as updated_csvfile:
        csv_writer = csv.DictWriter(
            updated_csvfile, fieldnames=fieldnames, delimiter=","
        )
        csv_writer.writeheader()
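        # Skip the first 1922 rows (apparently a resume point after an earlier,
        # interrupted run). Because the result file is opened with mode "w",
        # only the rows processed in this run end up in the file.
        count = 0
        for row in tqdm(csv_reader):

            if count < 1922:
                count += 1
                continue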
            payload = f'"{row.get("TEXT")}"'
            headers = {
                "Content-Type": "text/plain",
                "Authorization": f"Bearer {api_token}",
            }
            response = requests.post(
                api_url, data=json.dumps(payload), headers=headers)

            # Reset per row so a failed request cannot reuse stale predictions.
            new_values = {}
            if response.status_code == 200:
                api_data = response.json()
                prediction = {
                    i.get("trait"): i.get("value") for i in api_data.get("predictions")
                }
                new_values.update(prediction)
            else:
                print(
                    f"POST request failed for row with ID {row['#AUTHID']}. Status code: {response.status_code}"
                )

            row.update(new_values)
            count += 1
            # Drop a stray "trait" key if present, without a bare except.
            row.pop("trait", None)
            csv_writer.writerow(row)
            time.sleep(1)

print(f"CSV file with added columns created: {result_file_path}")

2468it [51:59,  1.26s/it] 

CSV file with added columns created: ./analysis/tool-3-baseline.csv
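
A run of this length can be interrupted, and a hard-coded row offset is easy to get wrong. One alternative is to resume from the result file itself: read the IDs already written, skip them, and append new rows. The helpers below are hypothetical and only sketch the idea; they assume `#AUTHID` uniquely identifies a row and that the result file uses the same delimiter on every run.

```python
import csv
import os


def load_processed_ids(result_path, id_column="#AUTHID", delimiter=","):
    """Return the set of IDs already present in the result file (empty if none)."""
    if not os.path.exists(result_path):
        return set()
    with open(result_path, newline="", encoding="utf-8") as f:
        return {row[id_column] for row in csv.DictReader(f, delimiter=delimiter)}


def open_result_writer(result_path, fieldnames, delimiter=","):
    """Open the result file for appending; write the header only for a new file."""
    is_new = not os.path.exists(result_path) or os.path.getsize(result_path) == 0
    f = open(result_path, "a", newline="", encoding="utf-8")
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=delimiter)
    if is_new:
        writer.writeheader()
    return f, writer
```

In the loop above, a row would then be skipped when `row["#AUTHID"]` is in the set returned by `load_processed_ids(result_file_path)`, and the file handle returned by `open_result_writer` should be closed once the loop finishes.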



