Sentiment Analysis: Vector Arithmetic

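Here we show that arithmetic on word vectors produces a result vector whose nearest neighbours in the vocabulary reflect the same arithmetic applied to the meanings of the original words (for example, king - man + woman lands close to queen).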


In [1]:
import spacy    # Import spaCy.

nlp = spacy.load('en_core_web_md')  # Load the medium English model, which includes word vectors

In [2]:
# load all the words with vectors into the vocab (otherwise vocab may not work properly for md and lg):
for orth in nlp.vocab.vectors:
    _ = nlp.vocab[orth]

In [3]:
# Just to see how the similarity of words can be measured based off of their vectors.

words = nlp(u'human monkey inteligent')
for word1 in words:
    for word2 in words:
        print(word1, word2, word1.similarity(word2))

human human 1.0
human monkey 0.39899942
human inteligent 0.45334765
monkey human 0.39899942
monkey monkey 1.0
monkey inteligent 0.22762915
inteligent human 0.45334765
inteligent monkey 0.22762915
inteligent inteligent 1.0
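
Because similarity is symmetric, the nested loop above prints every pair twice, along with the trivial self-comparisons. A minimal sketch of iterating over the unique pairs only is given here. Plain strings stand in for the spaCy tokens (an assumption for illustration); with real tokens, `a.similarity(b)` would be called inside the loop.

```python
from itertools import combinations

# Plain strings stand in for spaCy tokens here (an assumption for illustration).
tokens = ['human', 'monkey', 'inteligent']

# combinations(..., 2) yields each unordered pair exactly once,
# so there are no self-comparisons and no mirrored duplicates.
unique_pairs = list(combinations(tokens, 2))
for a, b in unique_pairs:
    print(a, b)
```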


In [27]:
# We import spatial to define a cosine_similarity function

from scipy import spatial


def cosine_similarity(x, y):
    # scipy returns the cosine distance, so 1 - distance gives the similarity
    return 1 - spatial.distance.cosine(x, y)
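
To make explicit what `1 - spatial.distance.cosine(x, y)` computes, here is a dependency-free sketch of the same quantity: the dot product divided by the product of the two vector norms. The name `cosine_similarity_plain` is hypothetical and not part of the notebook, and the sketch assumes both inputs are non-zero and of equal length.

```python
import math


def cosine_similarity_plain(x, y):
    """Cosine similarity of two equal-length, non-zero numeric sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Orthogonal vectors share no direction; parallel vectors point the same way.
print(cosine_similarity_plain([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity_plain([1.0, 2.0], [2.0, 4.0]))
```

The value depends only on the angle between the vectors, not on their lengths, which is why it suits comparing a blended vector against vocabulary vectors.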


In [29]:
# As an example, we compute words[0] - words[1] + words[2] on the corresponding word vectors

def vector_math(words):
    
    vector1 = nlp.vocab[words[0]].vector
    vector2 = nlp.vocab[words[1]].vector
    vector3 = nlp.vocab[words[2]].vector
    
    blend_vector = vector1 - vector2 + vector3
    
    # Rank every lowercase, alphabetic vocabulary entry that has a vector by its cosine
    # similarity to the result of the expression above, and report the total number of
    # vocabulary entries and the number actually compared
    likeliness = []
    total = 0
    compared = 0
    for word in nlp.vocab:
        total += 1
        if word.has_vector and word.is_lower and word.is_alpha:
            compared += 1
            similarity = cosine_similarity(blend_vector, word.vector)
            likeliness.append((word, similarity))

    print(f'Total # of words: {total},\ntotal # of compared words: {compared}')
    # Sort by similarity, highest first; the caller slices off the top entries
    likeliness.sort(key=lambda item: item[1], reverse=True)
    return likeliness

In [31]:

words = ['king','man','woman']
mylist = vector_math(words)
print([pair[0].text for pair in mylist[:15]])


Total # of words: 684995,
total # of compared words: 247539
['king', 'queen', 'commoner', 'highness', 'prince', 'sultan', 'maharajas', 'kings', 'princes', 'sultans', 'kumbia', 'princess', 'princesses', 'mermaid', 'pricess']
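
In the output above, the input word 'king' itself ranks first, since the blended vector stays close to the vectors it was built from. One possible way to exclude the input words from a ranking is sketched here. The helper `drop_query_words` is hypothetical, and it works on (text, score) tuples rather than the (lexeme, similarity) pairs returned by `vector_math`; with the notebook's data one would first build `[(lex.text, sim) for lex, sim in mylist]`. The scores in the example are made up for illustration.

```python
def drop_query_words(ranked, query_words):
    """Remove entries whose text matches one of the query words (case-insensitive)."""
    excluded = {w.lower() for w in query_words}
    return [(text, score) for text, score in ranked if text.lower() not in excluded]


# Made-up scores, for illustration only; they are not output of the model.
ranked = [('king', 0.9), ('queen', 0.8), ('woman', 0.7), ('prince', 0.6)]
filtered = drop_query_words(ranked, ['king', 'man', 'woman'])
print([text for text, _ in filtered])
```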



VADER Sentiment Analysis

Now we show how VADER sentiment analysis can be used to determine the sentiment of a review.


In [32]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/apple/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [33]:
# Import SentimentIntensityAnalyzer and create an instance of it (sid object)

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()


In [34]:
# This is a sample review

review = 'This was the best movie I have ever seen. The sound quality was not that great though, the story and all the other things were awesome. I liked it very much.'

In [35]:

# A function that takes a review and returns a label: "Positive", "Negative" or "Neutral"

def review_rating(text):
    
    score = sid.polarity_scores(text)['compound']
   
    if score > 0.05:
        vote = 'Positive'
    elif score < -0.05:
        vote = 'Negative'
    else:
        vote = 'Neutral'

    return vote

# Here we call the function and pass our review to get its sentiment


result = review_rating(review)
result

'Positive'
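
The sample review mixes praise with one complaint, and a single label for the whole text hides that. A rough sketch of labelling each sentence separately follows, using the same thresholds as `review_rating`. The helper `rate_sentences`, its naive sentence splitter and the stub scorer are all assumptions made for illustration; in the notebook one would pass `lambda s: sid.polarity_scores(s)['compound']` as `score_fn`.

```python
import re


def rate_sentences(text, score_fn, threshold=0.05):
    """Label each sentence using a compound-style score in [-1, 1]."""
    # Naive split after '.', '!' or '?' followed by whitespace; this is an
    # assumption and will mis-split abbreviations such as "e.g. this".
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    results = []
    for sentence in sentences:
        score = score_fn(sentence)
        if score > threshold:
            label = 'Positive'
        elif score < -threshold:
            label = 'Negative'
        else:
            label = 'Neutral'
        results.append((sentence, label))
    return results


# Stub scorer for demonstration only; it is NOT VADER.
def stub_score(sentence):
    if 'good' in sentence:
        return 0.5
    if 'bad' in sentence:
        return -0.5
    return 0.0


print(rate_sentences('The plot was good. The sound was bad. It ran two hours.', stub_score))
```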