# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [8]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy
nlp = spacy.load('en_core_web_md')


In [16]:
# Choose the words you wish to compare, and obtain their vectors
# Note: doc.vector is the average of the individual token vectors
doc = nlp('soldier medic sniper')
doc.vector

array([-1.2632133 ,  1.8610667 , -0.74901   ,  0.4299167 ,  1.765     ,
       -0.05789667, -0.38431668,  0.64561003,  0.9072833 ,  1.0809335 ,
        1.8714833 ,  1.8998667 , -1.5141989 ,  2.5077    , -0.07242334,
        0.9934034 ,  3.7926    ,  0.35410666, -1.1412334 ,  1.86194   ,
        0.13539998, -1.9247156 , -0.74255663,  2.5860999 ,  1.7870668 ,
       -2.34133   , -1.3800001 , -1.0185734 ,  1.7008834 , -2.6063    ,
        1.5887667 , -1.0099467 , -0.3665767 ,  0.41565335, -1.8469334 ,
        0.42760202, -0.9557    ,  2.0057335 , -1.0571333 ,  0.83043337,
       -0.9467301 , -1.5796033 , -0.88854337,  1.5995334 ,  1.3559533 ,
        1.7969762 , -1.3704833 , -4.8106003 , -1.2323533 , -0.6771243 ,
        1.3068333 ,  0.9309001 ,  2.9220297 , -2.0681665 ,  0.91264635,
        0.24231666,  1.8449134 , -1.6326332 , -1.34086   , -0.031293  ,
       -1.9727932 , -0.32098332, -3.8338335 , -2.3401668 ,  2.4270833 ,
        0.4315667 ,  1.65553   , -2.7120667 ,  0.5681304 ,  1.28

In [17]:
# Import spatial and define a cosine_similarity function
from scipy import spatial

def cosine_similarity(x, y):
    # Cosine similarity is 1 minus the cosine distance
    return 1 - spatial.distance.cosine(x, y)
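
To make the formula behind `cosine_similarity` concrete, here is a minimal dependency-free sketch. The name `cosine_sim` and the sample vectors are illustrative assumptions, not part of the assessment; the function computes the dot product divided by the product of the two vector lengths, which should agree with `1 - spatial.distance.cosine(x, y)` for non-zero vectors.

```python
import math

def cosine_sim(x, y):
    """Cosine similarity of two equal-length, non-zero sequences of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical sample vectors chosen to hit the three reference cases
same_direction = cosine_sim([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])                # 0
opposite = cosine_sim([1.0, 2.0], [-1.0, -2.0])                # close to -1
```

Because the score depends only on direction, scaling a vector does not change its similarity to another vector.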

In [18]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
soldier = nlp.vocab['soldier'].vector
medic = nlp.vocab['medic'].vector
sniper = nlp.vocab['sniper'].vector

new_vector = soldier - medic + sniper

In [19]:
# List the top ten closest vectors in the vocabulary to the result of the expression above
computed_similarities = []

for word in nlp.vocab:
    # Keep only lowercase, alphabetic words that have a vector
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vector, word.vector)
        computed_similarities.append((word, similarity))

computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

print([w[0].text for w in computed_similarities[:10]])

['sniper', 'soldier', 'he', 'got', 'ai', 'goin', 'dare', 'na', 'gon', 'doin']
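
The ranking step above can be tried on a toy vocabulary without loading a model. The sketch below is an illustration under stated assumptions: the three-dimensional vectors are made up (they are not spaCy's values), and `cosine_sim` and `toy_vocab` are hypothetical names. It uses `heapq.nlargest` to pick the top matches instead of sorting the whole list.

```python
import heapq
import math

def cosine_sim(x, y):
    """Cosine similarity of two equal-length, non-zero sequences of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Made-up vectors for illustration only: (royalty, male, female)
toy_vocab = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [-1.0, 0.5, 0.5],
}

# new_vector = word1 - word2 + word3, computed element by element
target = [k - m + w for k, m, w in
          zip(toy_vocab["king"], toy_vocab["man"], toy_vocab["woman"])]

# Take the three words whose vectors point most nearly along the target
top_three = heapq.nlargest(
    3, toy_vocab, key=lambda word: cosine_sim(target, toy_vocab[word])
)
print(top_three)  # ['queen', 'woman', 'king']
```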


### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [23]:
def vector_math(a, b, c):
    # Get word vectors for the input words
    a_vector = nlp.vocab[a].vector
    b_vector = nlp.vocab[b].vector
    c_vector = nlp.vocab[c].vector
    
    # Calculate the new vector
    new_vector = a_vector - b_vector + c_vector
    
    # Calculate similarity between the new vector and every lowercase,
    # alphabetic word in the vocabulary that has a vector
    computed_similarities = []
    for word in nlp.vocab:
        if word.has_vector and word.is_lower and word.is_alpha:
            similarity = cosine_similarity(new_vector, word.vector)
            computed_similarities.append((word, similarity))
    
    # Sort the computed similarities by similarity score
    computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])
    
    # Return top 10 similar words
    return [w[0].text for w in computed_similarities[:10]]

In [22]:
# Test the function on known words:
vector_math('king','man','woman')

['king',
 'and',
 'that',
 'havin',
 'where',
 'she',
 'they',
 'woman',
 'somethin',
 'there']
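
Note that the input words themselves ('king', 'woman') appear in the result, which is expected because the computed vector stays close to its own ingredients. One common refinement, sketched here as an assumption rather than as part of the assessment solution, is to drop the three inputs from the candidates before ranking. The function name `analogy` and the toy vectors are hypothetical.

```python
import math

def cosine_sim(x, y):
    """Cosine similarity of two equal-length, non-zero sequences of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def analogy(a, b, c, vectors, n=10):
    """Rank words by similarity to a - b + c, skipping the three inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    inputs = {a, b, c}
    candidates = [word for word in vectors if word not in inputs]
    candidates.sort(key=lambda word: cosine_sim(target, vectors[word]),
                    reverse=True)
    return candidates[:n]

# Made-up vectors for illustration only: (royalty, male, female)
toy_vectors = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [-1.0, 0.5, 0.5],
}

result = analogy("king", "man", "woman", toy_vectors, n=2)
print(result)  # ['queen', 'apple']
```

The same filter could be added to `vector_math` by skipping any `word.text` that equals `a`, `b`, or `c`.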

## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [26]:
import nltk
nltk.download('vader_lexicon')

# Import SentimentIntensityAnalyzer and create an sid object
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\ziggy\AppData\Roaming\nltk_data...


In [27]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'this movie was better than i expected'

In [28]:
# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.0, 'neu': 0.633, 'pos': 0.367, 'compound': 0.4404}
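
The `compound` value can be reproduced by hand. As far as I know, VADER sums the valences of the sentiment-bearing words and squashes the total into the range -1 to 1 with x / sqrt(x² + alpha), using alpha = 15. The sketch below assumes that 'better' is the only scored word in the review and that its lexicon valence is 1.9; that value is an assumption you can check with `sid.lexicon['better']`. The function name `normalize` is illustrative.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# Assumed valence of the single scored word in the review
assumed_valence = 1.9

compound = round(normalize(assumed_valence), 4)
print(compound)  # 0.4404
```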

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [29]:
def review_rating(string):
    # Obtain the sentiment scores for the review
    scores = sid.polarity_scores(string)
    
    # Determine the sentiment category based on the compound score
    if scores['compound'] >= 0.05:
        return "Positive"
    elif scores['compound'] <= -0.05:
        return "Negative"
    else:
        return "Neutral"

In [30]:
# Test the function on your review above:
review_rating(review)

'Positive'
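
The thresholds of 0.05 and -0.05 are the cutoffs conventionally used with VADER compound scores. To check the boundary behaviour without running the analyzer, the labelling rule can be separated from the scoring step. This is a sketch under that assumption; `label_from_compound` and its `threshold` parameter are hypothetical names, not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# The compound score obtained for the review above
print(label_from_compound(0.4404))  # Positive

# Scores strictly between the two cutoffs are treated as neutral
print(label_from_compound(0.0))     # Neutral
```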