## Language Analysis of Alexithymic Discourse

<hr>

Alexithymic Language Project / raul@psicobotica.com / V2 release (sept 2020)

<hr>

### Personality traits detection

The personality traits of an author can be inferred from their text in several ways. Rather than training our own model, which would require extensive annotated data, we use the IBM Personality Insights API.

- Establish connection with IBM Personality Insights API. 

<hr>

- [IBM Personality Insights API](https://www.ibm.com/watson/services/personality-insights/)
- [Big Five Personality Facets](https://github.com/watson-developer-cloud/doc-tutorial-downloads/blob/master/personality-insights/Personality-Insights-Facet-Characteristics.pdf)
- [Big Five Personality Dimensions](https://github.com/watson-developer-cloud/doc-tutorial-downloads/blob/master/personality-insights/Personality-Insights-Dimension-Characteristics.pdf)
- [Consumer Consumption Preference Prediction](https://github.com/watson-developer-cloud/doc-tutorial-downloads/blob/master/personality-insights/Improving-Consumption-Preferences-Accuracy.pdf)


### Using the Watson Developer Cloud Python SDK
- [Python SDK](https://github.com/watson-developer-cloud/python-sdk)
- [API Docs for Python](https://cloud.ibm.com/apidocs/personality-insights?code=python)

This notebook uses my own IBM Cloud account.

Installing the IBM Watson Python SDK:


In [3]:
# ! pip install --upgrade ibm-watson

Collecting ibm-watson
  Downloading ibm-watson-4.7.1.tar.gz (385 kB)
Collecting ibm_cloud_sdk_core==1.7.3
  Downloading ibm-cloud-sdk-core-1.7.3.tar.gz (27 kB)
Collecting PyJWT>=1.7.1
  Downloading PyJWT-1.7.1-py2.py3-none-any.whl (18 kB)
Building wheels for collected packages: ibm-watson, ibm-cloud-sdk-core
  Building wheel for ibm-watson (setup.py): started
  Building wheel for ibm-watson (setup.py): finished with status 'done'
  Created wheel for ibm-watson: filename=ibm_watson-4.7.1-py3-none-any.whl size=379136 sha256=21610ae36a3b6eb8d19c5bb0416b2a2ae28a1101e2e5b820594abf1cbd452660
  Stored in directory: c:\users\array\appdata\local\pip\cache\wheels\63\0c\16\b2af79aace901934f32f1503a95b338c7bdca6ab2cfa0881e6
  Building wheel for ibm-cloud-sdk-core (setup.py): started
  Building wheel for ibm-cloud-sdk-core (setup.py): finished with status 'done'
  Created wheel for ibm-cloud-sdk-core: filename=ibm_cloud_sdk_core-1.7.3-py3-none-any.whl size=45948 sha256=9b7d8e13169a143ba8c9d87b931e6

### IBM Cloud Authentication and endpoint

In [170]:
# Following the IBM cloud official example at:
# https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/personality_insights_v3.py 
import pandas as pd
import numpy as np
import json
import os
from os.path import join
from ibm_watson import PersonalityInsightsV3
import csv
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

In [227]:
# Create the API Key
# API key stored in local file apikey.json
with open('apikey.json') as f:
    apikeydata = json.load(f)

In [228]:
# apikeydata.get('PI_key')
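
Keeping the key in a local file works for a single machine. As a minimal sketch, the loader below reads the same `apikey.json` / `PI_key` pair and falls back to an environment variable when the file is missing. The function name `load_api_key` and the variable name `PI_APIKEY` are assumptions made for this example, not part of the IBM SDK.

```python
import json
import os
from pathlib import Path


def load_api_key(path="apikey.json", field="PI_key", env_var="PI_APIKEY"):
    """Return the API key from a JSON file, or from an environment variable.

    `env_var` is a hypothetical name chosen for this sketch.
    """
    key_file = Path(path)
    if key_file.is_file():
        with key_file.open(encoding="utf-8") as f:
            key = json.load(f).get(field)
        if key:
            return key
    # Fall back to the environment when the file or the field is missing
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found in {path!r} or ${env_var}")
    return key
```

The returned string could then be passed to `IAMAuthenticator` in place of `apikeydata.get('PI_key')`.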

In [229]:
# Create a Personality Insights instance on a pricing plan (Lite: 1000 free calls per month).
# Select a location.
# Instance "Personality Insights-ro" activated.
# Service credentials: get them from the list of resources.

In [231]:
# Using London location (the only one in Europe): 
# https://api.eu-gb.personality-insights.watson.cloud.ibm.com 
# My instance: https://api.eu-gb.personality-insights.watson.cloud.ibm.com/instances/8d60aa18-abce-43f2-8ee0-bfa00b8a1140 

# authenticator = IAMAuthenticator(apikeydata.get('apikey'))
authenticator = IAMAuthenticator(apikeydata.get('PI_key'))
personality_insights = PersonalityInsightsV3(
    version='2017-10-13',  # as per last version documented in API Docs
    authenticator=authenticator
)

personality_insights.set_service_url('https://api.eu-gb.personality-insights.watson.cloud.ibm.com/instances/8d60aa18-abce-43f2-8ee0-bfa00b8a1140')

In [232]:
# Let's first try the API with a simple text 
test_Text = "Aunque el estudio de la mente y su relación con el cerebro es un fenómeno con una larga trayectoria histórica, el enfoque moderno interdisciplinar denominado neurociencia cognitiva aparece en 1976 de la mano de Michael Gazzaniga y George Miller. Durante las cuatro últimas décadas la ciencia del cerebro ha experimentado muchos avances significativos y el área multidisciplinar de la neurociencia cognitiva también ha ido experimentando cambios. Inicialmente, la neurociencia cognitiva se entendía como la colaboración entre el estudio de la mente (psicología) y el estudio del sistema nervioso (neurociencias). En las primeras etapas, el objetivo primordial era entender cómo los procesos neurobiológicos daban lugar a los procesos cognitivos. Es decir, comprender la cognición a través de sus correlatos neurobiológicos. Sin embargo, la mente no se puede explicar completamente desde la dimensión cognitiva (procesamiento de la información), sino que, como bien sabemos hoy en día, la emoción juega también un papel primordial. Por este motivo, algunos autores empezaron a habla de neurociencia afectiva, para resaltar el hecho de que el estudio del sistema nervioso también ha de ir dirigido a comprender la dimensión emocional de la mente."
len(test_Text.split())

186
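
The whitespace count above is only a local estimate. The service reports its own `word_count` in the JSON response, along with a message asking for at least 600 words and preferably 1,200 or more. As a rough sketch, a returned profile could be classified against those two thresholds like this (the function name and the status labels are made up for this example):

```python
def word_count_status(profile, minimum=600, preferred=1200):
    """Classify a profile by the thresholds quoted in word_count_message."""
    n = profile["word_count"]
    if n < minimum:
        return "below_minimum"
    if n < preferred:
        return "acceptable"
    return "preferred"


# The test text is reported by the service as 205 words
print(word_count_status({"word_count": 205}))
```

A status like this could be stored next to each profile so that low-word-count estimates can be filtered or flagged in later analysis.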

In [233]:
# Getting the results as CSV:
from ibm_watson import ApiException
try:
    # Invoke a Personality Insights method
    response = personality_insights.profile(
        test_Text,
        content_type='text/plain',
        content_language='es',
        accept_language='en',
        consumption_preferences=True,
        raw_scores=True,
        accept='text/csv',
        csv_headers=True).get_result()
except ApiException as ex:
    print("Method failed with status code " + str(ex.code) + ": " + ex.message)

In [234]:
profile = response.content
cr = csv.reader(profile.decode('utf-8').splitlines())
my_list = list(cr)
for row in my_list:
    print(row[0:10])

['big5_agreeableness', 'facet_altruism', 'facet_cooperation', 'facet_modesty', 'facet_morality', 'facet_sympathy', 'facet_trust', 'big5_conscientiousness', 'facet_achievement_striving', 'facet_cautiousness']
['0.8767843578575014', '0.681974709977929', '0.7708814209595795', '0.2744111843297292', '0.7548190312661966', '0.48552393392990284', '0.6935731626932924', '0.8639005460247344', '0.42390559049124066', '0.6661436417101647']
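
With `csv_headers=True` the payload is a header row followed by a single data row, so it can also be read by column name. The sketch below uses a hand-copied sample with the first three columns printed above instead of a live response. The helper name `csv_profile_to_dict` is an assumption for this example, and values that are not numeric are kept as strings.

```python
import csv
import io

# Sample payload with the same shape as the CSV response
# (header row plus one data row), limited to three columns.
sample_csv = (
    "big5_agreeableness,facet_altruism,facet_cooperation\n"
    "0.8767843578575014,0.681974709977929,0.7708814209595795\n"
)


def _to_number(value):
    """Convert to float when possible, otherwise return the string unchanged."""
    try:
        return float(value)
    except ValueError:
        return value


def csv_profile_to_dict(payload):
    """Parse a one-row CSV profile into {column_name: value}."""
    reader = csv.DictReader(io.StringIO(payload))
    row = next(reader)
    return {name: _to_number(value) for name, value in row.items()}


scores = csv_profile_to_dict(sample_csv)
print(scores["facet_altruism"])
```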


In [235]:
import io
df = pd.read_csv(io.StringIO(profile.decode('utf-8')), lineterminator='\n')

In [236]:
df

Unnamed: 0,big5_agreeableness,facet_altruism,facet_cooperation,facet_modesty,facet_morality,facet_sympathy,facet_trust,big5_conscientiousness,facet_achievement_striving,facet_cautiousness,...,consumption_preferences_movie_action,consumption_preferences_movie_documentary,consumption_preferences_read_frequency,consumption_preferences_books_entertainment_magazines,consumption_preferences_books_non_fiction,consumption_preferences_books_financial_investing,consumption_preferences_books_autobiographies,consumption_preferences_volunteer,consumption_preferences_concerned_environment,consumption_preferences_start_business
0,0.876784,0.681975,0.770881,0.274411,0.754819,0.485524,0.693573,0.863901,0.423906,0.666144,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.5,1.0


In [72]:
# Getting the results as JSON:
from ibm_watson import ApiException
try:
    # Invoke a Personality Insights method
    response = personality_insights.profile(
        test_Text,
        content_type='text/plain',
        content_language='es',
        accept_language='en',
        consumption_preferences=True,
        raw_scores=True,
        accept='application/json').get_result()
except ApiException as ex:
    print("Method failed with status code " + str(ex.code) + ": " + ex.message)

In [73]:
print(json.dumps(response, indent=2))

{
  "word_count": 205,
  "word_count_message": "There were 205 words in the input. We need a minimum of 600, preferably 1,200 or more, to compute statistically significant estimates",
  "processed_language": "es",
  "personality": [
    {
      "trait_id": "big5_openness",
      "name": "Openness",
      "category": "personality",
      "percentile": 0.948309173503183,
      "raw_score": 0.6696244499549608,
      "significant": true,
      "children": [
        {
          "trait_id": "facet_adventurousness",
          "name": "Adventurousness",
          "category": "personality",
          "percentile": 0.80270805543852,
          "raw_score": 0.504724921504605,
          "significant": true
        },
        {
          "trait_id": "facet_artistic_interests",
          "name": "Artistic interests",
          "category": "personality",
          "percentile": 0.963600473898098,
          "raw_score": 0.6892566878755113,
          "significant": true
        },
        {
          "t

### Accessing results from Personality Insights profiling

In [86]:
for i in response['personality']:
    print(i['name'])

Openness
Conscientiousness
Extraversion
Agreeableness
Emotional range


In [113]:
# Personality traits percentiles:
for i in range(len(response['personality'])):
    print(response['personality'][i]['trait_id'] + 
          " --> %f" % (response['personality'][i]['percentile']))
    for j in range(len(response['personality'][i]['children'])):
        print(" > " + response['personality'][i]['children'][j]['trait_id'] +
             " --> %f" % response['personality'][i]['children'][j]['percentile'])

big5_openness --> 0.948309
 > facet_adventurousness --> 0.802708
 > facet_artistic_interests --> 0.963600
 > facet_emotionality --> 0.759525
 > facet_imagination --> 0.788288
 > facet_intellect --> 0.868290
 > facet_liberalism --> 0.840628
big5_conscientiousness --> 0.863901
 > facet_achievement_striving --> 0.423906
 > facet_cautiousness --> 0.666144
 > facet_dutifulness --> 0.832666
 > facet_orderliness --> 0.894099
 > facet_self_discipline --> 0.866430
 > facet_self_efficacy --> 0.690610
big5_extraversion --> 0.644563
 > facet_activity_level --> 0.643835
 > facet_assertiveness --> 0.636832
 > facet_cheerfulness --> 0.228924
 > facet_excitement_seeking --> 0.592070
 > facet_friendliness --> 0.555696
 > facet_gregariousness --> 0.197778
big5_agreeableness --> 0.876784
 > facet_altruism --> 0.681975
 > facet_cooperation --> 0.770881
 > facet_modesty --> 0.274411
 > facet_morality --> 0.754819
 > facet_sympathy --> 0.485524
 > facet_trust --> 0.693573
big5_neuroticism --> 0.422815
 > fa

In [121]:
# Personality traits raw scores:
for i in range(len(response['personality'])):
    print(response['personality'][i]['trait_id'] + 
          " --> %f" % (response['personality'][i]['raw_score']))
    for j in range(len(response['personality'][i]['children'])):
        print(" > " + response['personality'][i]['children'][j]['trait_id'] +
             " --> %f" % response['personality'][i]['children'][j]['raw_score'])

big5_openness --> 0.669624
 > facet_adventurousness --> 0.504725
 > facet_artistic_interests --> 0.689257
 > facet_emotionality --> 0.633692
 > facet_imagination --> 0.832148
 > facet_intellect --> 0.573205
 > facet_liberalism --> 0.522924
big5_conscientiousness --> 0.567220
 > facet_achievement_striving --> 0.516337
 > facet_cautiousness --> 0.369563
 > facet_dutifulness --> 0.637402
 > facet_orderliness --> 0.407455
 > facet_self_discipline --> 0.539749
 > facet_self_efficacy --> 0.720451
big5_extraversion --> 0.567928
 > facet_activity_level --> 0.519684
 > facet_assertiveness --> 0.627894
 > facet_cheerfulness --> 0.633518
 > facet_excitement_seeking --> 0.635550
 > facet_friendliness --> 0.588775
 > facet_gregariousness --> 0.484269
big5_agreeableness --> 0.630361
 > facet_altruism --> 0.681486
 > facet_cooperation --> 0.515986
 > facet_modesty --> 0.366669
 > facet_morality --> 0.577618
 > facet_sympathy --> 0.632327
 > facet_trust --> 0.579026
big5_neuroticism --> 0.642880
 > fa

In [119]:
# Needs percentiles:
for i in range(len(response['needs'])):
    print(response['needs'][i]['trait_id'] + 
          " --> %f" % (response['needs'][i]['percentile']))

need_challenge --> 0.560802
need_closeness --> 0.769558
need_curiosity --> 0.656694
need_excitement --> 0.109382
need_harmony --> 0.952191
need_ideal --> 0.550224
need_liberty --> 0.451310
need_love --> 0.458311
need_practicality --> 0.862161
need_self_expression --> 0.387077
need_stability --> 0.902629
need_structure --> 0.667573


In [120]:
# Needs raw-scores:
for i in range(len(response['needs'])):
    print(response['needs'][i]['trait_id'] + 
          " --> %f" % (response['needs'][i]['raw_score']))

need_challenge --> 0.748814
need_closeness --> 0.788978
need_curiosity --> 0.790934
need_excitement --> 0.695507
need_harmony --> 0.865426
need_ideal --> 0.663528
need_liberty --> 0.742571
need_love --> 0.760919
need_practicality --> 0.785543
need_self_expression --> 0.642778
need_stability --> 0.716149
need_structure --> 0.668686


In [122]:
# Values percentiles:
for i in range(len(response['values'])):
    print(response['values'][i]['trait_id'] + 
          " --> %f" % (response['values'][i]['percentile']))

value_conservation --> 0.566254
value_openness_to_change --> 0.311321
value_hedonism --> 0.105587
value_self_enhancement --> 0.764340
value_self_transcendence --> 0.704713


In [123]:
# Values raw score:
for i in range(len(response['values'])):
    print(response['values'][i]['trait_id'] + 
          " --> %f" % (response['values'][i]['raw_score']))

value_conservation --> 0.669982
value_openness_to_change --> 0.783385
value_hedonism --> 0.795683
value_self_enhancement --> 0.700313
value_self_transcendence --> 0.849676


In [141]:
# Consumption Preferences
for i in range(len(response['consumption_preferences'])):
    print(response['consumption_preferences'][i]['consumption_preference_category_id'])
    for j in range(len(response['consumption_preferences'][i]['consumption_preferences'])):
        print(" > " + response['consumption_preferences'][i]['consumption_preferences'][j]['consumption_preference_id'] +
             " --> %.2f" % (response['consumption_preferences'][i]['consumption_preferences'][j]['score']))

consumption_preferences_shopping
 > consumption_preferences_automobile_ownership_cost --> 0.00
 > consumption_preferences_automobile_safety --> 1.00
 > consumption_preferences_clothes_quality --> 0.00
 > consumption_preferences_clothes_style --> 0.00
 > consumption_preferences_clothes_comfort --> 0.00
 > consumption_preferences_influence_brand_name --> 0.50
 > consumption_preferences_influence_utility --> 0.50
 > consumption_preferences_influence_online_ads --> 1.00
 > consumption_preferences_influence_social_media --> 1.00
 > consumption_preferences_influence_family_members --> 1.00
 > consumption_preferences_spur_of_moment --> 0.50
 > consumption_preferences_credit_card_payment --> 1.00
consumption_preferences_health_and_activity
 > consumption_preferences_eat_out --> 0.00
 > consumption_preferences_gym_membership --> 0.00
 > consumption_preferences_outdoor --> 0.50
consumption_preferences_environmental_concern
 > consumption_preferences_concerned_environment --> 0.50
consumption_pre
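
The loops above all walk the same nested structure. As a sketch, one helper can flatten the four sections into a single `{trait_id: score}` dictionary. The helper name `flatten_profile` and the tiny `mock_profile` are assumptions for illustration; the mock copies a few values from the output above and is not a complete API response.

```python
def flatten_profile(profile, score_key="raw_score"):
    """Flatten a Personality Insights JSON profile into {trait_id: score}."""
    flat = {}
    # Big Five dimensions and their facets (children)
    for trait in profile.get("personality", []):
        flat[trait["trait_id"]] = trait[score_key]
        for facet in trait.get("children", []):
            flat[facet["trait_id"]] = facet[score_key]
    # Needs and values share the same flat layout
    for section in ("needs", "values"):
        for trait in profile.get(section, []):
            flat[trait["trait_id"]] = trait[score_key]
    # Consumption preferences only carry a single 'score' field
    for category in profile.get("consumption_preferences", []):
        for pref in category["consumption_preferences"]:
            flat[pref["consumption_preference_id"]] = pref["score"]
    return flat


# Hand-made profile with the same nesting as the JSON response
mock_profile = {
    "personality": [{
        "trait_id": "big5_openness",
        "percentile": 0.948309173503183,
        "raw_score": 0.6696244499549608,
        "children": [{
            "trait_id": "facet_adventurousness",
            "percentile": 0.80270805543852,
            "raw_score": 0.504724921504605,
        }],
    }],
    "needs": [{"trait_id": "need_harmony", "percentile": 0.952191, "raw_score": 0.865426}],
    "values": [{"trait_id": "value_hedonism", "percentile": 0.105587, "raw_score": 0.795683}],
    "consumption_preferences": [{
        "consumption_preference_category_id": "consumption_preferences_shopping",
        "consumption_preferences": [
            {"consumption_preference_id": "consumption_preferences_automobile_safety", "score": 1.0},
        ],
    }],
}

flat = flatten_profile(mock_profile)
print(flat)
```

Passing `score_key="percentile"` yields percentiles instead of raw scores for the personality, needs, and values sections.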

## Gathering text from participants into individual documents for personality profiling

In [145]:
# Get features dataset
feats_dataset_path = "https://raw.githubusercontent.com/raul-arrabales/alexithymic-lang/master/data/Prolexitim_v2_features_4.csv"
alex_df = pd.read_csv(feats_dataset_path, header=0, delimiter=";")

In [148]:
alex_df.sample(2)

Unnamed: 0,Code,TAS20,F1,F2,F3,Gender,Age,Card,T_Metaphors,T_ToM,...,BoW_PCA_3,TFIDF_PCA_1,TFIDF_PCA_2,TFIDF_PCA_3,Verb_List,Noun_List,Adjective_List,Subord_List,Adverb_List,Aux_List
296,bd0f72617b5ea094ac4accb397070d8d,48,20,14,14,2,24,13HM,0,1,...,0.14548,0.062889,-0.01342,-0.017606,"['hacer', 'tienen', 'despedirse', 'inician', '...","['pareja', 'amor', 'etapa', 'relación', 'dista...","['enamorada', 'nueva', 'desconsolado', 'llena'...","['que', 'mientras']","['no', 'sólo']",[]
139,581800d76e962da7850f9ef1804b202c,55,20,18,17,1,33,11,0,0,...,-0.560191,0.003003,-0.031512,-0.022194,"['entra', 'veo', 'entrando', 'volviendo']","['puente', 'castillo', 'sombras', 'gente', 'ci...",['agrupado'],"['que', 'como']","['ahí', 'así']","['parece', 'ser', 'puede', 'ser', 'puede', 'se..."


In [156]:
print("We have " + str(alex_df.Code.nunique()) + " unique participants.")

We have 96 unique participants.


In [161]:
# Code is the anonymous unique participant identifier
docs_df = alex_df[['Code', 'Text']].groupby('Code')['Text'].apply(" ".join).reset_index()
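
To make the grouping step concrete, here is the same `groupby` and `" ".join` pattern on a toy frame, with an added per-participant word count. The column `n_words` and the toy data are assumptions for this sketch; the count uses a plain whitespace split, which may differ from the word count the service reports.

```python
import pandas as pd

# Toy data: participant "a" wrote two texts, participant "b" wrote one
toy = pd.DataFrame({
    "Code": ["a", "a", "b"],
    "Text": ["first story.", "second story.", "only story."],
})

# One document per participant: concatenate all of their texts
docs = toy.groupby("Code")["Text"].apply(" ".join).reset_index()

# Whitespace-based word count of each concatenated document
docs["n_words"] = docs["Text"].str.split().str.len()
print(docs)
```

A column like `n_words` makes it easy to see in advance how many participants have enough text to be profiled.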

In [162]:
docs_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96 entries, 0 to 95
Data columns (total 2 columns):
Code    96 non-null object
Text    96 non-null object
dtypes: object(2)
memory usage: 1.6+ KB


In [163]:
docs_df.Text[0]

'un niño ciego llamado Lilo. Lilo, a pesar de no poder ver, era extremadamente inteligente y tenía una actitud muy positiva hacia la vida. Además, Lilo era un apasionado de la música, pero no siempre le gustaba lo que escuchaba, por lo que decidió aprender a tocar el violín y poder escuchar aquellos acordes que realmente le gustaban y le hacían sentir vien. Lilo era un apasinoado de canciones alegres, emocionantes, canciones que mientras las tocaba y escuchaba le hacían tremendamente feliz. un grupo de amigos cuya principal diversión era atracar bancos. Esa diversión les llevó, en muy poco tiempo a ser fugitivos de la justicia y sobretodo, fujitivos de su vida, sin poder hacer lo que querían en el momento que querían. Ninguno de ellos se alegraba de su pasado y si pudieran dar marcha atrás, ninguno de ellos volvería a dar los pasos dados. Sin embargo, cada día soñaban con haber tomado nuevos pasos en su vida que les hcieran seguir en libertad. una hormiga aventurera llamada Lila. Lila 

In [164]:
docs_df.sample(4)

Unnamed: 0,Code,Text
2,06149a5d3a9ad9dd786dbc97ec55cb98,Es un momento para la lectura o estudio que me...
56,9441e8d938e3c168a4b3e82e52903416,un niño que debía acudir a clases de violín to...
43,676528374312b74c56fe6fb501e15fb6,un pequeño niño se cuestiona si de verdad es c...
78,d8353a3addd21a98d2b2fee2f6451db3,"Un niño aburrido de tocar el violín, un día no..."


## Annotate personality traits for each participant

In [237]:
# This function calls the Personality Insights API and returns the profile
# for a given plain text in Spanish.
def get_Profile(text):
    """
    Parameters
    ----------
    text : str
        Document to be analyzed.

    Returns
    -------
    json_profile : dict or None
        Personality profile parsed from the JSON response,
        or None if the API call failed.
    """
    json_profile = None  # returned as-is when the API call fails
    try:
        json_profile = personality_insights.profile(
            text,
            content_type='text/plain',
            content_language='es',
            accept_language='en',
            consumption_preferences=True,
            raw_scores=True,
            accept='application/json').get_result()
    except ApiException as ex:
        print("Method failed with status code " + str(ex.code) + ": " + ex.message)

    return json_profile

In [238]:
# Personality Insights variables
PI_columns = ['big5_openness','facet_adventurousness','facet_artistic_interests','facet_emotionality',
              'facet_imagination','facet_intellect','facet_liberalism', 'big5_conscientiousness',
              'facet_achievement_striving','facet_cautiousness','facet_dutifulness',
              'facet_orderliness','facet_self_discipline','facet_self_efficacy','big5_extraversion',
              'facet_activity_level','facet_assertiveness','facet_cheerfulness','facet_excitement_seeking',
              'facet_friendliness','facet_gregariousness','big5_agreeableness','facet_altruism',
              'facet_cooperation','facet_modesty','facet_morality','facet_sympathy','facet_trust',
              'big5_neuroticism','facet_anger','facet_anxiety','facet_depression',
              'facet_immoderation','facet_self_consciousness','facet_vulnerability',
              'need_challenge','need_closeness','need_curiosity','need_excitement',
              'need_harmony','need_ideal','need_liberty','need_love','need_practicality',
              'need_self_expression','need_stability','need_structure',
              'value_conservation','value_openness_to_change','value_hedonism',
              'value_self_enhancement','value_self_transcendence',
              'consumption_preferences_automobile_ownership_cost',
              'consumption_preferences_automobile_safety',
              'consumption_preferences_clothes_quality',
              'consumption_preferences_clothes_style',
              'consumption_preferences_clothes_comfort',
              'consumption_preferences_influence_brand_name',
              'consumption_preferences_influence_utility',
              'consumption_preferences_influence_online_ads',
              'consumption_preferences_influence_social_media',
              'consumption_preferences_influence_family_members',
              'consumption_preferences_spur_of_moment',
              'consumption_preferences_credit_card_payment',
              'consumption_preferences_health_and_activity',
              'consumption_preferences_eat_out',
              'consumption_preferences_gym_membership',
              'consumption_preferences_outdoor',
              'consumption_preferences_concerned_environment',
              'consumption_preferences_start_business',
              'consumption_preferences_movie_romance',
              'consumption_preferences_movie_adventure',
              'consumption_preferences_movie_horror',
              'consumption_preferences_movie_musical',
              'consumption_preferences_movie_historical',
              'consumption_preferences_movie_science_fiction',
              'consumption_preferences_movie_war',
              'consumption_preferences_movie_drama',
              'consumption_preferences_movie_action',
              'consumption_preferences_movie_documentary',
              'consumption_preferences_music_rap',
              'consumption_preferences_music_country',
              'consumption_preferences_music_r_b',
              'consumption_preferences_music_hip_hop',
              'consumption_preferences_music_live_event',
              'consumption_preferences_music_playing',
              'consumption_preferences_music_latin',
              'consumption_preferences_music_rock',
              'consumption_preferences_music_classical',
              'consumption_preferences_read_frequency',
              'consumption_preferences_books_entertainment_magazines',
              'consumption_preferences_books_non_fiction',
              'consumption_preferences_books_financial_investing',
              'consumption_preferences_books_autobiographies',
              'consumption_preferences_volunteer']

In [239]:
len(PI_columns)

95

In [240]:
# Create new PI columns in the features df
for col in PI_columns:
    alex_df[col] = np.nan

In [241]:
# This function fills the feature dataframe with personality profile for participant identified by code
def add_PersonalityData(df, code, profile):
    """
    Parameters
    ----------
    df : dataframe
        Dataframe with features. 
    code : str
        Participant unique identifier. 
    profile: json
        JSON object with personality results for participant code.
    
    Returns
    -------
    Nothing. The dataframe is filled with extended features corresponding to personality traits. 
    """    
    
    # Add personality traits and their children traits raw scores
    for i in range(len(profile['personality'])):
        col_name_pers = profile['personality'][i]['trait_id']
        col_value_pers = profile['personality'][i]['raw_score']
        df.loc[ df['Code']==code, col_name_pers ] = col_value_pers
        
        for j in range(len(profile['personality'][i]['children'])):
            col_name_child = profile['personality'][i]['children'][j]['trait_id']
            col_value_child = profile['personality'][i]['children'][j]['raw_score']
            df.loc[ df['Code']==code, col_name_child ] = col_value_child
            
            
    # Add needs
    for i in range(len(profile['needs'])):
        col_name_need = profile['needs'][i]['trait_id']
        col_value_need = profile['needs'][i]['raw_score']
        df.loc[ df['Code']==code, col_name_need ] = col_value_need

        
    # Add values
    for i in range(len(profile['values'])):
        col_name_values = profile['values'][i]['trait_id']
        col_value_values = profile['values'][i]['raw_score']
        df.loc[ df['Code']==code, col_name_values ] = col_value_values 
        
        
    # Add consumption preferences
    for i in range(len(profile['consumption_preferences'])):
        for j in range(len(profile['consumption_preferences'][i]['consumption_preferences'])):
            col_name_prefs = profile['consumption_preferences'][i]['consumption_preferences'][j]['consumption_preference_id']
            col_value_prefs = profile['consumption_preferences'][i]['consumption_preferences'][j]['score']
            df.loc[ df['Code']==code, col_name_prefs ] = col_value_prefs 
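
An alternative to assigning cell by cell with `df.loc` is to build one row per participant and join it to the features table in a single `merge`. The sketch below uses toy data and hypothetical, already flattened profiles (`{trait_id: score}` per participant code). It assumes the personality columns have not been pre-created in the features frame, since otherwise the merge would produce suffixed duplicate columns.

```python
import pandas as pd

# Toy features table: several rows (texts) per participant code
features = pd.DataFrame({
    "Code": ["a", "a", "b", "c"],
    "TAS20": [48, 48, 55, 61],
})

# Hypothetical flattened profiles; participant "c" has no profile
profiles = {
    "a": {"big5_openness": 0.67, "need_harmony": 0.87},
    "b": {"big5_openness": 0.65, "need_harmony": 0.80},
}

# One row per participant, with the code as a regular column
profile_df = pd.DataFrame.from_dict(profiles, orient="index")
profile_df.index.name = "Code"
profile_df = profile_df.reset_index()

# A left join keeps every feature row; unprofiled participants get NaN
annotated = features.merge(profile_df, on="Code", how="left")
print(annotated)
```

The left join reproduces the behavior of the function above: every row of a profiled participant receives the same scores, and rows without a profile stay empty.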
    

### Call the API for each participant and add the profile to features df

In [244]:
for i in range(len(docs_df)):

    # The API requires a minimum of 100 words
    if ( len(docs_df['Text'].iloc[i].split()) > 100 ):

        # Get the results for user i:
        personality = get_Profile(docs_df['Text'].iloc[i])

        # Update the feature vectors with the personality variables
        # (skipped when no profile is available for this participant)
        if personality is not None:
            add_PersonalityData(alex_df, docs_df['Code'].iloc[i], personality)
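
Because the Lite plan is limited to 1000 calls per month, re-running this loop repeats calls that have already been paid for. One possible approach, sketched here under assumptions, is to cache each participant's JSON profile on disk, keyed by the participant code. The helper `cached_profile`, the `fetch` callable, and the cache directory are all hypothetical; in the notebook, `fetch` would be a function such as `get_Profile`.

```python
import json
import tempfile
from pathlib import Path


def cached_profile(code, text, fetch, cache_dir):
    """Return the profile for `code`, calling `fetch(text)` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{code}.json"
    if cache_file.is_file():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    profile = fetch(text)
    # Failed calls (None) are not cached, so they can be retried later
    if profile is not None:
        cache_file.write_text(json.dumps(profile), encoding="utf-8")
    return profile


# Demonstration with a stand-in fetcher instead of a real API call
calls = []

def fake_fetch(text):
    calls.append(text)
    return {"word_count": len(text.split())}

demo_dir = tempfile.mkdtemp()
first = cached_profile("p1", "uno dos tres", fake_fetch, demo_dir)
second = cached_profile("p1", "uno dos tres", fake_fetch, demo_dir)
print(first == second, len(calls))
```

The second call is served from the cached file, so the stand-in fetcher runs only once.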

In [248]:
alex_df[PI_columns].describe()

Unnamed: 0,big5_openness,facet_adventurousness,facet_artistic_interests,facet_emotionality,facet_imagination,facet_intellect,facet_liberalism,big5_conscientiousness,facet_achievement_striving,facet_cautiousness,...,consumption_preferences_music_playing,consumption_preferences_music_latin,consumption_preferences_music_rock,consumption_preferences_music_classical,consumption_preferences_read_frequency,consumption_preferences_books_entertainment_magazines,consumption_preferences_books_non_fiction,consumption_preferences_books_financial_investing,consumption_preferences_books_autobiographies,consumption_preferences_volunteer
count,226.0,226.0,226.0,226.0,226.0,226.0,226.0,226.0,226.0,226.0,...,226.0,226.0,226.0,226.0,226.0,226.0,226.0,226.0,226.0,226.0
mean,0.654208,0.491406,0.668448,0.632305,0.854155,0.535464,0.508267,0.556102,0.545374,0.376613,...,0.0,0.40708,0.378319,0.369469,0.243363,0.433628,0.548673,0.561947,0.243363,0.225664
std,0.030602,0.027203,0.034094,0.007901,0.037546,0.019191,0.010183,0.014863,0.041401,0.007985,...,0.0,0.487846,0.265863,0.355274,0.362798,0.496675,0.49873,0.497249,0.430065,0.418947
min,0.574114,0.420204,0.597762,0.609251,0.778307,0.505214,0.477061,0.528722,0.447239,0.364719,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.632375,0.473775,0.646084,0.629738,0.825463,0.523504,0.503157,0.544374,0.522376,0.370887,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.651797,0.495771,0.668375,0.632107,0.856927,0.530364,0.509695,0.554971,0.547314,0.373719,...,0.0,0.0,0.5,0.5,0.0,0.0,1.0,1.0,0.0,0.0
75%,0.678817,0.504691,0.689139,0.636182,0.882667,0.544202,0.513046,0.566207,0.570756,0.381974,...,0.0,1.0,0.5,0.5,0.5,1.0,1.0,1.0,0.0,0.0
max,0.726198,0.554285,0.773425,0.65322,0.95198,0.598307,0.535867,0.596114,0.633983,0.407928,...,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [253]:
alex_df[PI_columns].isna().sum().iloc[0]

155

In [260]:
# Fraction of instances (rows) annotated: about 60%
1 - (alex_df[PI_columns].isna().sum().iloc[0] / alex_df.count().iloc[0])

0.5931758530183727
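
The ratio above counts rows, and each participant contributes several rows, so coverage per participant can differ from coverage per row. A minimal sketch of both measures on toy data (the frame and the variable names are assumptions for this example):

```python
import numpy as np
import pandas as pd

# Toy data: "a" has three annotated rows, "b" is not annotated, "c" has one row
toy = pd.DataFrame({
    "Code": ["a", "a", "a", "b", "c"],
    "big5_openness": [0.67, 0.67, 0.67, np.nan, 0.65],
})

# Share of rows with a personality score
row_coverage = toy["big5_openness"].notna().mean()

# Share of participants with a personality score
# (first() returns the first non-null value per group, or NaN if there is none)
participant_coverage = (
    toy.groupby("Code")["big5_openness"].first().notna().mean()
)

print(row_coverage, participant_coverage)
```

Reporting both numbers shows whether the unannotated rows belong to a few participants with little text or are spread across many.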

## Persist annotated personality data

In [261]:
# Save Updated features dataset
Feats_5_path = "D:\\Dropbox-Array2001\\Dropbox\\DataSets\\Prolexitim-Dataset\\Prolexitim_v2_features_5.csv"
alex_df.to_csv(Feats_5_path, sep=';', encoding='utf-8', index=False)

In [262]:
# np.set_printoptions(threshold=np.inf)
for i in alex_df.columns:
    print(i)

Code
TAS20
F1
F2
F3
Gender
Age
Card
T_Metaphors
T_ToM
T_FP
T_Interpret
T_Desc
T_Confussion
Text
Alex_A
Alex_B
Words
Sentences
Tokens
Tokens_Stop
Tokens_Stem_P
Tokens_Stem_S
POS
NER
DEP
Lemmas_CNLP
Lemmas_Spacy
Chars
avgWL
avgSL
Pun_Count
Stop_Count
RawTokens
Title_Count
Upper_Count
PRON_Count
DET_Count
ADV_Count
VERB_Count
PROPN_Count
NOUN_Count
NUM_Count
PUNCT_Count
SYM_Count
SCONJ_Count
CCONJ_Count
INTJ_Count
AUX_Count
ADP_Count
ADJ_Count
PRON_Ratio
DET_Ratio
ADV_Ratio
VERB_Ratio
PROPN_Ratio
NOUN_Ratio
NUM_Ratio
PUNCT_Ratio
SYM_Ratio
SCONJ_Ratio
CCONJ_Ratio
INTJ_Ratio
AUX_Ratio
ADP_Ratio
ADJ_Ratio
TTR
HTR
BoW_PCA_1
BoW_PCA_2
BoW_PCA_3
TFIDF_PCA_1
TFIDF_PCA_2
TFIDF_PCA_3
Verb_List
Noun_List
Adjective_List
Subord_List
Adverb_List
Aux_List
big5_openness
facet_adventurousness
facet_artistic_interests
facet_emotionality
facet_imagination
facet_intellect
facet_liberalism
big5_conscientiousness
facet_achievement_striving
facet_cautiousness
facet_dutifulness
facet_orderliness
facet_self_disc