# Automatic diagnosis of aphasia patients with Word2Vec



The aim of this work is to automatically rank the severity of anomic aphasia by applying Word2Vec to a corpus that we collected manually from the diagnostic protocols in the Aphasia Bank online database (https://aphasia.talkbank.org/). We analyze the transcriptions of aphasic patients' productions, which were elicited through specific, standardized tasks of the Aphasia Bank Protocol (e.g. describing the scene depicted in a picture that the interviewer shows the patient). Word2Vec uses a neural network model to learn word associations from a large corpus of text.

The input data come from the Aphasia Bank online database and are organised in a .csv file in which the first column is the patient ID and the second and third columns hold a target/response word pair. The target is the word that exactly describes the scene (e.g. "ball"), and the response is the word the patient actually produces (e.g. "sphere").
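
As an illustration of this layout, the sketch below parses a small in-memory sample instead of the real file. Only the "ball"/"sphere" pair comes from the description above; the other rows and the variable names are made up, and the absence of a header row is an assumption (the function further down converts every ID to an integer, which a header line would break).

```python
import csv
import io

# Hypothetical sample: patient ID, target word, response word (no header row).
sample = "1,ball,sphere\n1,dog,cat\n2,ball,ball\n"

# Each parsed row becomes a (patient_id, target, response) tuple of strings.
rows = [
    (patient_id, target, response)
    for patient_id, target, response in csv.reader(io.StringIO(sample))
]

for patient_id, target, response in rows:
    print(patient_id, target, response)
```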

Since the automatic diagnosing algorithm is language-independent, a suitable language model has to be loaded for the language under analysis (e.g. for English we use the pre-trained 'word2vec-google-news-300' vectors). The cosine similarity is computed by the built-in wv.similarity function of the loaded word vectors, which takes a word pair as input and returns its cosine similarity.
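
To make the measure concrete, here is a minimal sketch of cosine similarity computed by hand with NumPy: the dot product of the two vectors divided by the product of their lengths. The helper name `cosine_similarity` and the toy 3-dimensional vectors are invented for illustration; the real model works on 300-dimensional vectors looked up by word.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors (made up for this example).
target = np.array([1.0, 2.0, 0.0])
response = np.array([2.0, 4.0, 0.0])   # points in the same direction as target
unrelated = np.array([0.0, 0.0, 5.0])  # orthogonal to target

print(cosine_similarity(target, response))   # close to 1
print(cosine_similarity(target, unrelated))  # 0
```

A response that is semantically close to the target should therefore score near 1, while an unrelated response should score near 0.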

More details on the project: https://drive.google.com/file/d/1IQ8PDOVlTTNE6CscvI70yfJL8G-CYuM5/view?usp=sharing

In [None]:
# LOAD the LANGUAGE MODEL for ENGLISH from Gensim - Google News 300

In [51]:
# import the vectors we need for the comparison

import gensim.downloader as api

wv = api.load("word2vec-google-news-300")

# https://code.google.com/archive/p/word2vec/

In [52]:
# LANGUAGE MODEL FOR ITALIAN

In [3]:
import gensim
from gensim.models import KeyedVectors

In [5]:
model_ita = KeyedVectors.load("/_trained_models/MODEL_WIKI_plainstream/ord2vec_10mil_wiki.model")

In [49]:
# read the word pairs and compute each patient's mean target/response similarity

import csv
from collections import defaultdict

def aphasia_diagnosing(pairs, model):
    # group the [target, response] pairs by patient:
    # d[patient_id] is a list of lists, one inner list per word pair
    d = defaultdict(list)
    with open(pairs, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            d[row[0].strip()].append([row[1].strip(), row[2].strip()])

    # append the cosine similarity to each pair, giving [w1, w2, score];
    # each patient's data can be accessed with d["n"]
    for patient in d:
        for w_pair in d[patient]:
            w1, w2 = w_pair
            w_pair.append(model.similarity(w1, w2))

    # average the scores of each patient; the keys are the integer patient IDs
    lista_similarities = {}
    for patient, triples in d.items():
        somma = sum(score for _, _, score in triples)
        lista_similarities[int(patient)] = somma / len(triples)
    return lista_similarities

In [50]:
# example of usage

aphasia_diagnosing("/Users/silviafabbi/Desktop/aphasia/materials/input_EN.csv", wv)

{1: 0.18128868884273938,
 10: 0.24347134147371566,
 11: 0.24217813448221595,
 2: 0.38771220445632937,
 9: 0.20249340832233428,
 12: 0.33968087037404376,
 8: 0.37051263451576233,
 4: 0.440139077603817,
 3: 0.12276173879702885,
 7: 0.18245189115405083,
 5: 0.36799256503582,
 6: 0.19535352538029352}
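
Since the stated goal is a ranking, one possible follow-up step is to order the patients by their mean similarity. The sketch below does this with the values from the output above, rounded to four decimals; the variable names are illustrative, and reading a lower mean similarity as a more severe deficit is an assumption that the notebook itself does not state.

```python
# Mean similarities copied from the output above (rounded to four decimals).
mean_similarity = {
    1: 0.1813, 10: 0.2435, 11: 0.2422, 2: 0.3877,
    9: 0.2025, 12: 0.3397, 8: 0.3705, 4: 0.4401,
    3: 0.1228, 7: 0.1825, 5: 0.3680, 6: 0.1954,
}

# Patient IDs sorted from the lowest to the highest mean similarity.
ranking = sorted(mean_similarity, key=mean_similarity.get)

for position, patient in enumerate(ranking, start=1):
    print(position, patient, mean_similarity[patient])
```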