# Prolexitim Exploratory Analysis (Prolex-Explore)
## Visualizing Doc2Vec Feature Distribution and Discrimination
### Dataset Preparation: Classes, Subclasses, and Word2Vec Features
<hr>
Dataset from Prolexitim TAS-20 Spain, Prolexitim NLP.
Sept 2019.<br> Prolexitim dataset version 1.2 (MPGS-TFM-Submission).<br> 
<a target="_blank" href="http://www.conscious-robots.com/papers/TFM_MPGS_Arrabales_vWeb.pdf">Arrabales, R. 2019. Artificial Intelligence Tools for the Evaluation and Treatment of Alexithymia.</a><br> <br>
Raúl Arrabales Moreno (Psicobótica / Serendeepia Research)<br>
<a target="_blank" href="http://www.conscious-robots.com/">http://www.conscious-robots.com/</a> <br>
<hr>


## Load Documents (texts + labels)
- Documents are obtained from the Prolexitim Pilot Study.
    - Texts are narratives from Prolexitim NLP.
    - Labels are categorical values from Prolexitim TAS-20.


### Loading the Prolexitim TAS-20 + NLP Dataset

In [1]:
import pandas as pd 

In [2]:
# Local copy of the joined Prolexitim tables: TAS-20 categorical labels plus narratives from Prolexitim NLP
tasnlp_dataset_path = "D:\\Dropbox-Array2001\\Dropbox\\UNI\\MPGS\\2_TFM\\Datos\\prolexitim-merged-1.3.csv"

In [3]:
docs_df = pd.read_csv(tasnlp_dataset_path, header=0, delimiter="\t")
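
A minimal sketch of the same loading pattern with an early check that the columns needed later are present. The toy data, `toy_tsv`, `toy_df`, and `required_columns` are illustrative assumptions, not part of the original notebook.

```python
import io
import pandas as pd

# Toy tab-separated data standing in for the real file (hypothetical values).
toy_tsv = "Text\talex-a\tcard\nuna historia\tNoAlex\t1\notra historia\tAlex\t11\n"

# Same call shape as the notebook: first row is the header, tab is the delimiter.
toy_df = pd.read_csv(io.StringIO(toy_tsv), header=0, delimiter="\t")

# Fail early if a column used in later cells is missing.
required_columns = {"Text", "alex-a", "card"}
missing_columns = required_columns - set(toy_df.columns)
assert not missing_columns, f"Missing columns: {missing_columns}"
```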

In [4]:
docs_df.columns

Index(['RowId', 'code', 'card', 'hum', 'mode', 'time', 'G-score',
       'G-magnitude', 'Azure-TA', 'Text', 'Text-EN', 'nlu-sentiment',
       'nlu-label', 'nlu-joy', 'nlu-anger', 'nlu-disgust', 'nlu-sadness',
       'nlu-fear', 'es-len', 'en-len', 'NLP', 'TAS20', 'F1', 'F2', 'F3',
       'Tas20Time', 'Sex', 'Gender', 'Age', 'Dhand', 'Studies', 'SClass',
       'Siblings', 'SibPos', 'Origin', 'Resid', 'Rtime', 'Ethnic', 'Job',
       'alex-a', 'alex-b'],
      dtype='object')

In [5]:
docs_df.sample(2)

Unnamed: 0,RowId,code,card,hum,mode,time,G-score,G-magnitude,Azure-TA,Text,...,SClass,Siblings,SibPos,Origin,Resid,Rtime,Ethnic,Job,alex-a,alex-b
271,272,af67553b2ca884d23497afdf7a2b10d8,9VH,4,W,7312149,0.0,1.7,0.71,"Jornaleros descansando. Hombres rudos, curtido...",...,2.0,2.0,1.0,ES,ES,-1.0,Iberic,Scientist,NoAlex,NoAlex
151,152,fef2bf388aa21eace80757369757689c,9VH,4,W,2456363,0.4,0.4,0.27,unos tíos que se fueron de juerga y ahora está...,...,2.0,2.0,2.0,ES,ES,-1.0,Iberic,Engineer,Alex,Alex


In [6]:
# We're only interested in the Spanish text, the corresponding alexithymia label, the visual stimulus presented (card), and the TAS-20 scores
docs_df = docs_df.dropna()
docs_df = docs_df[['Text', 'alex-a', 'card', 'TAS20', 'F1', 'F2', 'F3']]
docs_df.columns = ['Text', 'AlexLabel', 'card', 'TAS20', 'F1', 'F2', 'F3']
docs_df.sample(n=6)

Unnamed: 0,Text,AlexLabel,card,TAS20,F1,F2,F3
260,Érase una vez una hormiga aventurera llamada L...,PosAlex,11,55.0,19.0,17.0,19.0
164,2 jóvenes aventureros querían recuperar el tes...,NoAlex,11,35.0,8.0,10.0,17.0
228,Dos pájaros amigos deciden ir de excursión un ...,NoAlex,11,45.0,18.0,9.0,18.0
294,un chico que se encontraba pensativo debatiénd...,NoAlex,1,41.0,20.0,12.0,9.0
57,Una mujer está sujetando el cadaver de su madr...,PosAlex,18NM,55.0,20.0,18.0,17.0
279,La felicidad del sudor y del cansancio no pare...,NoAlex,9VH,47.0,8.0,18.0,21.0
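
Note that the cell above calls `dropna()` before selecting columns, so a row is discarded when any column is missing, including columns that are not kept. Whether that is intended is not stated in the notebook. The sketch below contrasts the two orderings on toy data; all names and values are hypothetical.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({
    "Text": ["a", "b", None],
    "alex-a": ["NoAlex", "Alex", "NoAlex"],
    "Job": ["Engineer", np.nan, "Scientist"],  # column not used downstream
})

# Notebook order: drop rows with a missing value in ANY column, then select.
drop_then_select = toy_df.dropna()[["Text", "alex-a"]]

# Alternative: only require the columns that are actually kept.
select_then_drop = toy_df.dropna(subset=["Text", "alex-a"])[["Text", "alex-a"]]

print(len(drop_then_select), len(select_then_drop))
```

In this toy example the second ordering keeps the row whose only missing value is in `Job`.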


In [7]:
# As expected, the dataset is very unbalanced (approx. 10% alexithymia)
docs_df.groupby(by='AlexLabel').count()

AlexLabel,Text,card,TAS20,F1,F2,F3
Alex,31,31,31,31,31,31
NoAlex,240,240,240,240,240,240
PosAlex,45,45,45,45,45,45
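
The class proportions can be read off directly with `value_counts(normalize=True)`. This sketch rebuilds a label series from the counts in the table above; the variable names are illustrative.

```python
import pandas as pd

# Label counts copied from the groupby output above.
labels = pd.Series(
    ["Alex"] * 31 + ["NoAlex"] * 240 + ["PosAlex"] * 45, name="AlexLabel"
)

# Fraction of documents per label, rounded for readability.
proportions = labels.value_counts(normalize=True).round(3)
print(proportions)
```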


In [8]:
# We treat both Possible Alexithymia (PosAlex) and Alexithymia (Alex) as the same positive class
docs_df['AlexLabel'] = docs_df['AlexLabel'].apply(lambda x: x.replace('PosAlex', 'Alex'))

In [9]:
docs_df.groupby(by='AlexLabel').count()

AlexLabel,Text,card,TAS20,F1,F2,F3
Alex,76,76,76,76,76,76
NoAlex,240,240,240,240,240,240
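
As an alternative to the `apply` + `lambda` call, `Series.replace` with a mapping performs the same merge on whole values. A minimal sketch on toy labels (names are illustrative):

```python
import pandas as pd

labels = pd.Series(["Alex", "NoAlex", "PosAlex", "NoAlex"], name="AlexLabel")

# Map whole values: PosAlex becomes Alex, everything else is left untouched.
merged = labels.replace({"PosAlex": "Alex"})

# Only two classes should remain after the merge.
assert set(merged.unique()) == {"Alex", "NoAlex"}
```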


In [10]:
# Check the distribution of cards
docs_df.groupby(by='card').count()

card,Text,AlexLabel,TAS20,F1,F2,F3
1,67,67,67,67,67,67
10,6,6,6,6,6,6
11,70,70,70,70,70,70
12VN,9,9,9,9,9,9
13HM,65,65,65,65,65,65
13N,6,6,6,6,6,6
13V,6,6,6,6,6,6
18NM,4,4,4,4,4,4
3VH,7,7,7,7,7,7
7VH,6,6,6,6,6,6


In [11]:
# Keep only the four main cards (the ones actually presented to all participants)
viz_df = docs_df[(docs_df.card == '9VH') | 
                 (docs_df.card == '13HM') |
                 (docs_df.card == '11') | 
                 (docs_df.card == '1')]

In [12]:
viz_df.groupby(by='card').count()

card,Text,AlexLabel,TAS20,F1,F2,F3
1,67,67,67,67,67,67
11,70,70,70,70,70,70
13HM,65,65,65,65,65,65
9VH,69,69,69,69,69,69
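
The same card filter can be written more compactly with `Series.isin`. This is a sketch on toy data; `main_cards` and the toy frame are hypothetical names, and the `.copy()` avoids chained-assignment warnings when columns are added later.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "card": ["1", "10", "11", "13HM", "9VH", "3VH"],
    "Text": list("abcdef"),
})

# The four cards kept for visualization.
main_cards = ["1", "11", "13HM", "9VH"]

# Boolean mask from isin replaces the chain of == comparisons joined by |.
toy_viz_df = toy_df[toy_df["card"].isin(main_cards)].copy()
print(toy_viz_df)
```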


In [13]:
viz_df.sample(4)

Unnamed: 0,Text,AlexLabel,card,TAS20,F1,F2,F3
232,Un bonito lugar escondido en el bosque pero de...,NoAlex,11,46.0,15.0,14.0,17.0
112,"la ladera de una montaña, con algunas piedras ...",Alex,11,54.0,17.0,17.0,20.0
98,Un niño que se aburría mucho porque sus padres...,NoAlex,1,42.0,14.0,11.0,17.0
183,"Unos hombres cansados de tanto campo,de madrug...",NoAlex,9VH,44.0,14.0,15.0,15.0


In [14]:
# Generate subclasses from (AlexLabel, card) pairs
viz_df = viz_df.copy()
viz_df['SubClass'] = viz_df['AlexLabel'] + '-' + viz_df['card']

In [15]:
# There should be 8 subclasses (4 positives and 4 negatives)
viz_df.groupby(by='SubClass').count()

SubClass,Text,AlexLabel,card,TAS20,F1,F2,F3
Alex-1,18,18,18,18,18,18,18
Alex-11,18,18,18,18,18,18,18
Alex-13HM,16,16,16,16,16,16,16
Alex-9VH,17,17,17,17,17,17,17
NoAlex-1,49,49,49,49,49,49,49
NoAlex-11,52,52,52,52,52,52,52
NoAlex-13HM,49,49,49,49,49,49,49
NoAlex-9VH,52,52,52,52,52,52,52
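
The expected number of subclasses can also be checked programmatically, and `pd.crosstab` gives the same counts as a label-by-card grid. A sketch on toy data with hypothetical values:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "AlexLabel": ["Alex", "NoAlex", "NoAlex", "Alex", "NoAlex"],
    "card": ["1", "1", "11", "11", "11"],
})
toy_df["SubClass"] = toy_df["AlexLabel"] + "-" + toy_df["card"]

# Rows are labels, columns are cards, cells are document counts.
counts = pd.crosstab(toy_df["AlexLabel"], toy_df["card"])
print(counts)

# Every (label, card) pair should appear: labels x cards subclasses.
expected = toy_df["AlexLabel"].nunique() * toy_df["card"].nunique()
assert toy_df["SubClass"].nunique() == expected
```

With the real data this check would expect 2 labels x 4 cards = 8 subclasses.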


In [16]:
viz_df.sample(4)

Unnamed: 0,Text,AlexLabel,card,TAS20,F1,F2,F3,SubClass
233,Un hombre que al volver de su trabajo encuentr...,NoAlex,13HM,46.0,15.0,14.0,17.0,NoAlex-13HM
105,una pareja que dormían en una habitación donde...,NoAlex,13HM,34.0,14.0,7.0,13.0,NoAlex-13HM
265,un hombre cansado saliendo a trabajar despues ...,Alex,13HM,63.0,18.0,21.0,24.0,Alex-13HM
29,"Un día me propuse cruzar la montaña, subí a lo...",NoAlex,11,36.0,16.0,7.0,13.0,NoAlex-11


In [17]:
viz_df.shape

(271, 8)

## Save Visualization Dataset

In [18]:
viz_dataset_path = "D:\\Dropbox-Array2001\\Dropbox\\UNI\\MPGS\\2_TFM\\Datos\\prolexitim-viz-1.2.csv"

In [19]:
viz_df.to_csv(viz_dataset_path, sep='\t', encoding='utf-8', index=False)
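
A quick round-trip check can confirm that the saved file reads back as expected. The sketch below uses an in-memory buffer instead of the real path, and the toy frame is hypothetical. Passing `dtype={"card": str}` is an assumption that matters when all card identifiers in a file happen to look numeric (such as `1` and `11`).

```python
import io
import pandas as pd

toy_viz_df = pd.DataFrame({
    "Text": ["Érase una vez", "un niño"],
    "AlexLabel": ["Alex", "NoAlex"],
    "card": ["11", "1"],
})

# Write as tab-separated text without the index, as in the cell above.
buffer = io.StringIO()
toy_viz_df.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)

# Read back, forcing card identifiers to stay strings.
reloaded = pd.read_csv(buffer, sep="\t", dtype={"card": str})
assert reloaded["card"].tolist() == toy_viz_df["card"].tolist()
assert reloaded["Text"].tolist() == toy_viz_df["Text"].tolist()
```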