# Build Sentiment Dictionaries from VSMs

This script lets you build your own sentiment dictionary from a Vector Space Model (VSM) by scoring every word in the model's vocabulary against a few seed words.

### **1. Preparation**

Download the model.  
You can select any model from here: https://fasttext.cc/docs/en/crawl-vectors.html

In [1]:
import os
import gensim
import urllib.request
import os.path
import pandas
import numpy as np
import scipy.stats as stats

In [2]:
# Download the Spanish model, named "cc.es.300.vec.gz" (about 1.2 GB), and unzip it

!wget "https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.es.300.vec.gz"
!gunzip cc.es.300.vec.gz

--2022-06-03 08:50:30--  https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.es.300.vec.gz
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 172.67.9.4, 104.22.75.142, 104.22.74.142, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|172.67.9.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1285580896 (1.2G) [binary/octet-stream]
Saving to: ‘cc.es.300.vec.gz’


2022-06-03 08:51:46 (16.5 MB/s) - ‘cc.es.300.vec.gz’ saved [1285580896/1285580896]
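
On a system without `wget` and `gunzip`, the same download step can be done from Python. The following is a minimal sketch that uses only the standard library; `fetch_and_unpack` and `unpack` are hypothetical helper names, not part of the original notebook. Unlike `gunzip`, it keeps the `.gz` archive next to the unpacked file.

```python
import gzip
import os.path
import shutil
import urllib.request


def unpack(gz_path, out_path):
    """Decompress gz_path into out_path, streaming so the file never sits in memory."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def fetch_and_unpack(url, target_dir="."):
    """Download a .gz file (unless it is already present) and decompress it."""
    gz_path = os.path.join(target_dir, os.path.basename(url))
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a URL ending in .gz")
    out_path = gz_path[:-3]  # same name without the ".gz" suffix
    if not os.path.exists(out_path):
        if not os.path.exists(gz_path):
            urllib.request.urlretrieve(url, gz_path)
        unpack(gz_path, out_path)
    return out_path
```

Calling `fetch_and_unpack("https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.es.300.vec.gz")` should leave `cc.es.300.vec` in the working directory and return its path.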



In [3]:
# Note that ".gz" is no longer in the file name, because we unzipped the file

filename = 'cc.es.300.vec'

my_model = gensim.models.KeyedVectors.load_word2vec_format(filename, binary=False)

### **2. Prepare the SA lexicon**

Define the "seed words" for each dimension of the lexicon.  
This example uses two dimensions, "happy" and "sad".

In [4]:
happy_labels = ['feliz', 'emocionado', 'positivo', 'alegre']  # happy, excited, positive, cheerful
sad_labels = ['triste', 'arrepentido', 'deprimido', 'introvertido']  # sad, sorry, depressed, introverted

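# The vocabulary lives in key_to_index on gensim >= 4.0 and in vocab on gensim 3.x
vocabulary = getattr(my_model, 'key_to_index', None) or my_model.vocab
all_words = list(vocabulary)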
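
A seed word that is missing from the model's vocabulary makes the similarity query fail, so it can help to check the seeds first. The sketch below is one way to do that; `split_seeds` is a hypothetical helper, and the toy vocabulary stands in for the real one (in the notebook you could pass `set(all_words)` instead).

```python
def split_seeds(seed_words, vocabulary):
    """Return (found, missing) seed words, preserving the original order."""
    found = [w for w in seed_words if w in vocabulary]
    missing = [w for w in seed_words if w not in vocabulary]
    return found, missing


# Toy vocabulary used only for illustration (an assumption, not the real model).
toy_vocabulary = {"feliz", "alegre", "triste", "deprimido"}

happy_found, happy_missing = split_seeds(
    ["feliz", "emocionado", "positivo", "alegre"], toy_vocabulary
)
print("usable seeds:", happy_found)
print("not in vocabulary:", happy_missing)
```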

In [5]:
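# Rank the whole vocabulary by similarity to each set of seed words
# (most_similar leaves the seed words themselves out of the result)
happy_ordered_words = my_model.most_similar(positive = happy_labels, topn = len(all_words))
sad_ordered_words = my_model.most_similar(positive = sad_labels, topn = len(all_words))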
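
To make the ranking step less of a black box, here is a plain-Python sketch of the idea behind a multi-seed similarity query: average the length-normalised seed vectors, then rank every other word by cosine similarity to that average. It is an approximation of what the library does, not its actual code; `unit`, `rank_by_seed_similarity` and the two-dimensional toy vectors are assumptions made for illustration.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def rank_by_seed_similarity(vectors, seeds):
    """Rank non-seed words by cosine similarity to the mean seed vector."""
    seed_units = [unit(vectors[s]) for s in seeds]
    dims = len(seed_units[0])
    mean_vector = [sum(v[i] for v in seed_units) / len(seed_units) for i in range(dims)]
    centroid = unit(mean_vector)
    scores = []
    for word, vector in vectors.items():
        if word in seeds:
            continue  # seed words are excluded from the ranking
        similarity = sum(a * b for a, b in zip(centroid, unit(vector)))
        scores.append((word, similarity))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional "embeddings" (made up for this example).
toy_vectors = {
    "feliz": [1.0, 0.1],
    "alegre": [0.9, 0.2],
    "contento": [0.8, 0.3],
    "triste": [-1.0, 0.1],
    "mesa": [0.0, 1.0],
}
ranking = rank_by_seed_similarity(toy_vectors, ["feliz", "alegre"])
for word, similarity in ranking:
    print(word, round(similarity, 3))
```

The result has the same shape as the output of `most_similar`: a list of `(word, similarity)` tuples sorted from most to least similar.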

In [6]:
# Split the (word, similarity) tuples into parallel lists

# Happy labels
happy_words = [word for word, _ in happy_ordered_words]
happy_value = [value for _, value in happy_ordered_words]

# Sad labels
sad_words = [word for word, _ in sad_ordered_words]
sad_value = [value for _, value in sad_ordered_words]


In [7]:
# Happy values: standardise the similarities to z-scores
happy_value = np.array(happy_value)
happy_value = stats.zscore(happy_value)

happy_df = pandas.DataFrame(list(zip(happy_words, happy_value)),
                            columns=['word', 'happy'])

happy_df = happy_df.sort_values('word', ascending=True)


# Sad values: standardise the similarities to z-scores
sad_value = np.array(sad_value)
sad_value = stats.zscore(sad_value)

sad_df = pandas.DataFrame(list(zip(sad_words, sad_value)),
                          columns=['word', 'sad'])

sad_df = sad_df.sort_values('word', ascending=True)

# You can add more categories in the same way, if you like.
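
For readers unfamiliar with z-scores, the sketch below reproduces the standardisation by hand: subtract the mean and divide by the population standard deviation, which is what `stats.zscore` does with its default `ddof=0`. The helper name `zscores` and the three toy similarity values are assumptions for illustration.

```python
import statistics


def zscores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, i.e. ddof=0
    return [(v - mean) / std for v in values]


toy_similarities = [0.9, 0.5, 0.1]
standardised = zscores(toy_similarities)
print(standardised)
```

After this step a score of 0 means "average similarity to the seed words", and positive or negative scores count standard deviations above or below that average, which makes the categories comparable with each other.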

In [8]:
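# Finally, merge all categories into a single dataframe
sa_df = happy_df.merge(sad_df, how = 'inner', on = ['word'])
# With more categories, chain one merge per dataframe (fear_df is a hypothetical extra category):
# sa_df = happy_df.merge(sad_df, how = 'inner', on = ['word']).merge(fear_df, how = 'inner', on = ['word'])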

sa_df.to_csv('my_SA_dictionary.csv', index=False)
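
Once the CSV exists, it can be used to score a text. The following is a minimal sketch under several assumptions: the file has the columns `word`, `happy` and `sad` as written above, tokens are split on whitespace and lowercased, and a text's score is the plain average over the words found in the dictionary. `score_text` is a hypothetical helper, and the in-memory toy CSV stands in for `my_SA_dictionary.csv`.

```python
import csv
import io

# Toy stand-in for my_SA_dictionary.csv (values made up for this example).
toy_csv = io.StringIO(
    "word,happy,sad\n"
    "feliz,2.0,-1.0\n"
    "triste,-1.5,2.5\n"
    "día,0.5,0.0\n"
)

# Map each word to its per-category scores.
lexicon = {
    row["word"]: {"happy": float(row["happy"]), "sad": float(row["sad"])}
    for row in csv.DictReader(toy_csv)
}


def score_text(text, lexicon):
    """Average the category scores of all tokens found in the lexicon."""
    tokens = text.lower().split()
    known = [lexicon[t] for t in tokens if t in lexicon]
    if not known:
        return {}  # no token of the text is in the dictionary
    categories = known[0].keys()
    return {c: sum(entry[c] for entry in known) / len(known) for c in categories}


scores = score_text("Hoy es un día feliz", lexicon)
print(scores)
```

To read the real file, replace `toy_csv` with `open('my_SA_dictionary.csv', newline='', encoding='utf-8')`. Note that the model's vocabulary is case-sensitive, so lowercasing every token is a simplification.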