# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [1]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy
nlp = spacy.load('en_core_web_md')

In [2]:
# Choose the words you wish to compare, and obtain their vectors
tokens = nlp('ambulance motorcycle car')  # a spaCy Doc; iterating it yields Token objects
for token1 in tokens:
    for token2 in tokens:
        print(token1.text, token2.text, token1.similarity(token2))

for token in tokens:
    print(token.text, token.has_vector, token.vector_norm, token.is_oov)

ambulance ambulance 1.0
ambulance motorcycle 0.38579902052879333
ambulance car 0.40373408794403076
motorcycle ambulance 0.38579902052879333
motorcycle motorcycle 1.0
motorcycle car 0.710739016532898
car ambulance 0.40373408794403076
car motorcycle 0.710739016532898
car car 1.0
ambulance True 32.378574 False
motorcycle True 29.74049 False
car True 83.99086 False
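The nested loop above fills in a full pairwise similarity table, which is symmetric (e.g. "ambulance car" and "car ambulance" agree) with 1.0 on the diagonal. As a sketch, assuming `Token.similarity` uses its documented default of cosine similarity over word vectors, the same table can be computed in one NumPy step. The 3-dimensional vectors below are hypothetical stand-ins, not real spaCy vectors.

```python
import numpy as np

def pairwise_cosine(vectors):
    """Cosine similarity between every pair of rows in `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    # L2 norm of each row (the quantity spaCy reports as vector_norm)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    # Dot products of unit-length rows are cosine similarities
    return unit @ unit.T

# Hypothetical 3-d stand-ins for 'ambulance', 'motorcycle', 'car'
toy = [
    [1.0, 0.2, 0.0],
    [0.1, 1.0, 0.3],
    [0.3, 0.9, 0.4],
]
sims = pairwise_cosine(toy)
print(np.round(sims, 3))
```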


In [3]:
# Import spatial and define a cosine_similarity function
from scipy import spatial

cosine_similarity = lambda x, y: 1 - spatial.distance.cosine(x, y)
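`spatial.distance.cosine` returns the cosine *distance*, 1 - cos θ, so subtracting it from 1 recovers cos θ = u·v / (‖u‖ ‖v‖). A NumPy-only sketch of the same quantity follows; the name `cosine_similarity_np` is ours, not a library function. Neither this version nor the SciPy one guards against an all-zero vector (typical for out-of-vocabulary words), which makes the denominator zero.

```python
import numpy as np

def cosine_similarity_np(x, y):
    """Cosine of the angle between x and y: dot(x, y) / (|x| * |y|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

print(cosine_similarity_np([1, 0], [1, 1]))  # cos(45 degrees), about 0.7071
print(cosine_similarity_np([1, 2], [2, 4]))  # parallel vectors, about 1.0
print(cosine_similarity_np([1, 0], [0, 1]))  # orthogonal vectors, 0.0
```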

In [4]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
ambulance = nlp.vocab['ambulance'].vector
motorcycle = nlp.vocab['motorcycle'].vector
car = nlp.vocab['car'].vector

# Combine the vectors: "ambulance" - "motorcycle" + "car"
new_vector = ambulance - motorcycle + car
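Since the goal is to land as close as possible to an expected word, one way to judge an expression is to score its result against that word directly. The sketch below uses made-up 3-dimensional vectors (not spaCy data; `en_core_web_md` vectors have 300 dimensions) for a classic capital-city analogy. The arithmetic is element-wise, exactly as in the cell above.

```python
import numpy as np

# Hypothetical toy vectors, chosen so the analogy works out
paris = np.array([0.9, 0.1, 0.8])
france = np.array([0.1, 0.1, 0.9])
italy = np.array([0.1, 0.9, 0.9])
rome = np.array([0.9, 0.9, 0.8])

# Element-wise vector arithmetic: paris - france + italy
new_vector = paris - france + italy

def cosine(x, y):
    """Cosine similarity between two 1-d arrays."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Score the result against the word we hoped to reach, and against an input word
print(round(cosine(new_vector, rome), 3))   # about 1.0 for this toy data
print(round(cosine(new_vector, italy), 3))  # noticeably lower
```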

In [5]:
# List the top ten closest vectors in the vocabulary to the result of the expression above
computed_similarities = []

for word in nlp.vocab:
    # Skip words without vectors, mixed-case words, and non-alphabetic tokens
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vector, word.vector)
        computed_similarities.append((word, similarity))

computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

print([w[0].text for w in computed_similarities[:10]])

['car', 'ambulance', 'motorcycle', 'when', 'you', 'somethin', 'it', 'space', 'they', 'where']
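Two things stand out in this output: the three input words rank highest, and the loop scores one lexeme at a time. A sketch that addresses both, assuming the candidate vectors have been stacked into a NumPy matrix with a parallel list of words; `top_k`, `words`, and `matrix` are hypothetical names, not spaCy API, and the vectors are toy data.

```python
import numpy as np

def top_k(query, words, matrix, k=10, exclude=()):
    """Rank the rows of `matrix` by cosine similarity to `query`.

    words   -- list of strings, one per row of `matrix`
    exclude -- words to drop from the ranking (e.g. the input words)
    """
    # Cosine similarity of the query against every row at once
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    sims = (matrix @ query) / norms
    # Row indices from most to least similar
    order = np.argsort(-sims)
    ranked = [words[i] for i in order if words[i] not in exclude]
    return ranked[:k]

# Hypothetical toy vocabulary
words = ['king', 'queen', 'man', 'woman', 'apple']
matrix = np.array([
    [0.9, 0.8, 0.1],  # king
    [0.9, 0.2, 0.1],  # queen
    [0.1, 0.8, 0.1],  # man
    [0.1, 0.2, 0.1],  # woman
    [0.1, 0.1, 0.9],  # apple
])

query = matrix[0] - matrix[2] + matrix[3]  # king - man + woman
print(top_k(query, words, matrix, k=3))
print(top_k(query, words, matrix, k=3, exclude={'king', 'man', 'woman'}))
```

Excluding the input words is a common convention when evaluating analogies, since a - b + c often stays closest to a or c themselves.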


### CHALLENGE: Write a function that takes in three strings, performs a - b + c arithmetic, and returns the top-ten results

In [6]:
def vector_math(a, b, c):
    # Obtain the vectors for the input words
    a_vector = nlp.vocab[a].vector
    b_vector = nlp.vocab[b].vector
    c_vector = nlp.vocab[c].vector
    
    # Perform the vector arithmetic
    result_vector = a_vector - b_vector + c_vector
    
    # Find the top ten closest vectors in the vocabulary to the result vector
    computed_similarities = []
    
    for word in nlp.vocab:
        # Skip words without vectors, mixed-case words, and non-alphabetic tokens
        if word.has_vector and word.is_lower and word.is_alpha:
            similarity = cosine_similarity(result_vector, word.vector)
            computed_similarities.append((word, similarity))
    
    computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])
    
    return [w[0].text for w in computed_similarities[:10]]

In [7]:
# Test the function on known words:
vector_math('king','man','woman')

['king',
 'and',
 'that',
 'havin',
 'where',
 'she',
 'they',
 'woman',
 'somethin',
 'there']

## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [8]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\chris\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [9]:
# Import SentimentIntensityAnalyzer and create an sid object
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [10]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'It was nice. It was good. It was great. It was also bad.'

In [11]:
# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.157, 'neu': 0.404, 'pos': 0.439, 'compound': 0.743}
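The `compound` value is not an average of `neg`, `neu`, and `pos`: VADER sums the valences of the individual words and then squashes that sum into [-1, 1]. A sketch of the squashing step, assuming the x / sqrt(x² + α) normalization with α = 15 used by the reference VADER implementation and NLTK's port; the raw valence sums fed in below are made up.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1] (VADER-style normalization)."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clip for safety; the formula itself already stays inside (-1, 1)
    return max(-1.0, min(1.0, norm_score))

# Hypothetical raw valence sums
for raw in (-4.0, -1.0, 0.0, 1.0, 4.0, 20.0):
    print(raw, round(normalize(raw), 3))
```

Strongly worded texts push the sum up, so `compound` approaches ±1 without reaching it; for example a sum of 1.0 maps to 1 / sqrt(16) = 0.25.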

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [12]:
def review_rating(string):
    # Obtain the sid scores for the review
    scores = sid.polarity_scores(string)
    
    # Label using the compound score; +/-0.05 are the cut-offs recommended by VADER's authors
    if scores['compound'] >= 0.05:
        return 'Positive'
    elif scores['compound'] <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'

In [13]:
# Test the function on your review above:
review_rating(review)

'Positive'
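Because the labelling logic depends only on the compound score, it can be separated from the analyzer and tested without NLTK. A sketch; the function name `label_compound` and the `threshold` parameter are ours, with the default of 0.05 matching the cut-off used in `review_rating`.

```python
def label_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# 0.743 is the compound score reported for the sample review above
print(label_compound(0.743))  # Positive
print(label_compound(0.0))    # Neutral
print(label_compound(-0.3))   # Negative
```

With this helper, `review_rating` reduces to `label_compound(sid.polarity_scores(string)['compound'])`, and the threshold can be tuned without touching the analyzer.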