# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [1]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy
nlp = spacy.load('en_core_web_md')

In [25]:
# Choose the words you wish to compare, and obtain their vectors
tokens = nlp(u'elephant mammal iguana')  # a spaCy Doc; iterating over it yields tokens
for token1 in tokens:
    for token2 in tokens:
        print(token1.text, token2.text, token1.similarity(token2))

    print("\n")

elephant elephant 1.0
elephant mammal 0.49667876958847046
elephant iguana 0.579684853553772


mammal elephant 0.49667876958847046
mammal mammal 1.0
mammal iguana 0.49987295269966125


iguana elephant 0.579684853553772
iguana mammal 0.49987295269966125
iguana iguana 1.0




In [3]:
# Import spatial and define a cosine_similarity function
from scipy import spatial

cosine_similarity = lambda x, y: 1 - spatial.distance.cosine(x, y)
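
The lambda above relies on `spatial.distance.cosine` returning the cosine *distance*, so subtracting it from 1 gives the cosine similarity: the dot product of the two vectors divided by the product of their lengths. A minimal sketch of that formula in plain Python follows; the helper name `cosine_similarity_plain` and the toy vectors are illustrative and not part of the solution.

```python
import math

def cosine_similarity_plain(x, y):
    """Cosine similarity: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Toy 2-D vectors (made up for illustration)
same_direction = cosine_similarity_plain([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity_plain([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity_plain([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Values near 1 mean the vectors point the same way, values near 0 mean they are unrelated, and values near -1 mean they point in opposite directions. Note that this sketch divides by zero if either vector has zero length.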

In [11]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3

words = ['elephant', 'mammal', 'iguana']
vectors = []

for word in words:
    vectors.append(nlp.vocab[word].vector)

# Compute the result of "elephant" - "mammal" + "iguana"
new_vector = vectors[0] - vectors[1] + vectors[2]


In [9]:
# List the top ten closest vectors in the vocabulary to the result of the expression above
computed_similarities = []

for word in nlp.vocab:
    # Keep only lowercase, alphabetic words that have a vector:
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vector, word.vector)
        computed_similarities.append((word, similarity))

computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

print([w[0].text for w in computed_similarities[:10]])

['iguana', 'elephant', 'goin', 'dare', 'nuff', 'ai', 'nt', 'cos', 'cuz', 'coz']
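
In the output above, the two highest-ranked words are 'iguana' and 'elephant', both of which were inputs to the expression. One common adjustment, which is not part of the original solution, is to skip the input words when ranking. The sketch below shows the idea on a hand-made toy vocabulary; `toy_vocab`, `rank_neighbors`, and the 3-dimensional vectors are hypothetical stand-ins for `nlp.vocab` and real word vectors.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Hypothetical toy vectors; the axes are roughly (royalty, male, female)
toy_vocab = {
    'king':   [1.0, 1.0, 0.0],
    'queen':  [1.0, 0.0, 1.0],
    'man':    [0.0, 1.0, 0.0],
    'woman':  [0.0, 0.0, 1.0],
    'couple': [0.0, 1.0, 1.0],
}

def rank_neighbors(a, b, c, vocab, top_n=10):
    """Rank vocab words by similarity to a - b + c, skipping a, b and c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    scored = [(word, cosine(target, vec))
              for word, vec in vocab.items()
              if word not in (a, b, c)]
    scored.sort(key=lambda item: -item[1])
    return [word for word, _ in scored[:top_n]]

ranked = rank_neighbors('king', 'man', 'woman', toy_vocab)
print(ranked)
```

The same filter could be added to the loop over `nlp.vocab` by comparing `word.text` against the input words before appending to `computed_similarities`.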


### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [12]:
def vector_math(a, b, c):
    # Obtain the vectors for the input words
    a_vector = nlp.vocab[a].vector
    b_vector = nlp.vocab[b].vector
    c_vector = nlp.vocab[c].vector
    
    # Perform the vector arithmetic
    result_vector = a_vector - b_vector + c_vector
    
    # Find the top ten closest vectors in the vocabulary to the result vector
    computed_similarities = []
    
    for word in nlp.vocab:
        # Keep only lowercase, alphabetic words that have a vector:
        if word.has_vector and word.is_lower and word.is_alpha:
            similarity = cosine_similarity(result_vector, word.vector)
            computed_similarities.append((word, similarity))
    
    computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])
    
    return [w[0].text for w in computed_similarities[:10]]

In [13]:
# Test the function on known words:
vector_math('king','man','woman')

['king',
 'and',
 'that',
 'havin',
 'where',
 'she',
 'they',
 'woman',
 'somethin',
 'there']

## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [16]:
# Import SentimentIntensityAnalyzer and create an sid object
# (run nltk.download('vader_lexicon') once if the lexicon is not installed)
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [17]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'I think this is a good movie. The story is engaging. Best movie seen this year.'

In [18]:
# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.0, 'neu': 0.537, 'pos': 0.463, 'compound': 0.8591}
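
The `neg`, `neu`, and `pos` values are the proportions of the text that fall into each category, so they sum to roughly 1 (here 0.0 + 0.537 + 0.463). The `compound` value is a single normalized score between -1 and 1. Assuming VADER's published normalization, which squashes the summed word valences with x / sqrt(x² + α) and α = 15, the sketch below illustrates how an unbounded sum maps into that range. It is an illustration of the formula under that assumption, not NLTK's exact code path, and the name `normalize_valence` is made up.

```python
import math

def normalize_valence(total, alpha=15):
    """Squash an unbounded valence sum into the open range (-1, 1)."""
    return total / math.sqrt(total * total + alpha)

# The proportions reported above sum to 1 (within rounding)
proportions = {'neg': 0.0, 'neu': 0.537, 'pos': 0.463}
total_proportion = sum(proportions.values())

# Larger valence sums approach, but never reach, 1 in magnitude
for total in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(total, round(normalize_valence(total), 4))
```

Because of this squashing, a `compound` score close to 1, such as 0.8591, indicates strongly positive wording rather than a score on a linear scale.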

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [19]:
def review_rating(string):
    # Obtain the sid scores for the review
    scores = sid.polarity_scores(string)
    
    # Determine the sentiment label based on the compound score
    if scores['compound'] >= 0.05:
        return 'Positive'
    elif scores['compound'] <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'

In [20]:
# Test the function on your review above:
review_rating(review)

'Positive'
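
The ±0.05 cutoffs in `review_rating` can be checked without running VADER by applying the same rule directly to compound scores. In this sketch, `label_from_compound` is a hypothetical helper that mirrors the branching above, and every sample score other than 0.8591 is invented to exercise each branch and boundary.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using the same cutoffs as review_rating."""
    if compound >= threshold:
        return 'Positive'
    elif compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# 0.8591 is the compound score reported for the review above;
# the remaining values are invented to hit each branch and boundary.
samples = [0.8591, 0.05, 0.0, -0.049, -0.05, -0.62]
labels = [label_from_compound(score) for score in samples]
print(list(zip(samples, labels)))
```

Both boundaries are inclusive, so a score of exactly 0.05 is labeled 'Positive' and exactly -0.05 is labeled 'Negative'.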