# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [1]:
# Import spaCy and load the language library. Remember to use a larger model!
import numpy as np
import pandas as pd
import spacy


In [2]:
nlp = spacy.load("en_core_web_lg")

In [3]:
# Choose the words you wish to compare, and obtain their vectors
king = nlp.vocab['king'].vector
man = nlp.vocab['man'].vector
woman = nlp.vocab['woman'].vector

In [4]:
# Import spatial and define a cosine_similarity function
from scipy import spatial

def cosine_similarity(vec1, vec2):
    # Cosine similarity is 1 minus the cosine distance
    return 1 - spatial.distance.cosine(vec1, vec2)
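
For intuition, the same quantity can be computed without SciPy. The sketch below is a plain-Python version; the helper name `cosine_similarity_plain` and the sample vectors are illustrative assumptions, not part of the original solution.

```python
import math

def cosine_similarity_plain(vec1, vec2):
    # Dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(vec1, vec2))
    norm1 = math.sqrt(sum(x * x for x in vec1))
    norm2 = math.sqrt(sum(y * y for y in vec2))
    return dot / (norm1 * norm2)

# Made-up sample vectors for illustration
print(cosine_similarity_plain([1.0, 0.0], [1.0, 0.0]))  # same direction: 1.0
print(cosine_similarity_plain([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```

Values near 1 mean the vectors point in almost the same direction, which is why the cells that follow sort candidates by this score in descending order.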


In [5]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
new_vector = king - man + woman

In [12]:
# List the top ten closest vectors in the vocabulary to the result of the expression above

computed_similarities = []

for word in nlp.vocab:
    # Keep only lowercase alphabetic entries that have a vector
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vector, word.vector)
        computed_similarities.append((word, similarity))

In [13]:
computed_similarities = sorted(computed_similarities, key=lambda item:-item[1])

In [14]:
print([t[0].text for t in computed_similarities[:10]])

['king', 'and', 'that', 'where', 'she', 'they', 'woman', 'there', 'should', 'these']


### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [17]:
def vector_math(a, b, c):
    # Look up the vector for each input word
    vec_a = nlp.vocab[a].vector
    vec_b = nlp.vocab[b].vector
    vec_c = nlp.vocab[c].vector

    new_vect = vec_a - vec_b + vec_c

    computed_similarities = []
    for word in nlp.vocab:
        # Keep only lowercase alphabetic entries that have a vector
        if word.has_vector and word.is_lower and word.is_alpha:
            similarity = cosine_similarity(new_vect, word.vector)
            computed_similarities.append((word, similarity))

    # Sort the similarities in descending order
    computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

    return [t[0].text for t in computed_similarities[:10]]

In [18]:
# Test the function on known words:
vector_math('king','man','woman')

['king',
 'and',
 'that',
 'where',
 'she',
 'they',
 'woman',
 'there',
 'should',
 'these']
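
To see the a-b+c mechanics in isolation, here is a minimal sketch on a tiny hand-made vocabulary. The 2-D vectors, the `toy_vectors` dictionary, and the `toy_vector_math` helper are all made up for illustration and are not real spaCy embeddings. Unlike the function above, this version skips the three input words when ranking, which is a common convention for analogy queries.

```python
import math

# Made-up 2-D "embeddings" (axes: royalty, gender); not real spaCy vectors
toy_vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.1],
}

def cosine(u, v):
    # Dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def toy_vector_math(a, b, c, vectors, topn=3):
    # a - b + c, computed element-wise
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Rank every word except the three inputs by similarity to the target
    candidates = [(w, cosine(target, v)) for w, v in vectors.items() if w not in (a, b, c)]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [w for w, _ in candidates[:topn]]

print(toy_vector_math("king", "man", "woman", toy_vectors))  # ['queen', 'apple']
```

With these hand-picked numbers the target lands on the `queen` vector, so it ranks first; real embeddings are noisier and the top results depend on which vocabulary entries are searched.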

## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [19]:
# Import SentimentIntensityAnalyzer and create an sid object
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [20]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'I had a black T-shirt and i liked to wear it often'

In [21]:
# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.0, 'neu': 0.741, 'pos': 0.259, 'compound': 0.4215}

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [22]:
scores = sid.polarity_scores(review)
scores

{'neg': 0.0, 'neu': 0.741, 'pos': 0.259, 'compound': 0.4215}

In [24]:
compound = scores["compound"]
compound

0.4215

In [44]:
def review_rating(string):
    scores = sid.polarity_scores(string)
    compound = scores["compound"]
    if compound == 0:
        return "The review is neutral"
    elif compound > 0:
        return "The review is positive"
    else:
        return "The review is negative"

In [45]:
# Test the function on your review above:
review_rating(review)

'The review is positive'

In [46]:
review2 = "This is an amazing book"
review_rating(review2)

'The review is positive'

In [47]:
sid.polarity_scores(review2)

{'neg': 0.0, 'neu': 0.513, 'pos': 0.487, 'compound': 0.5859}

In [48]:
review3 = "This is dirty T-shirt, it is not good to wear it"
review_rating(review3)

'The review is negative'

In [49]:
sid.polarity_scores(review3)

{'neg': 0.371, 'neu': 0.629, 'pos': 0.0, 'compound': -0.6492}

In [50]:
review4 = "This is an amazing awesome post."
review_rating(review4)

'The review is positive'

In [51]:
sid.polarity_scores(review4)

{'neg': 0.0, 'neu': 0.336, 'pos': 0.664, 'compound': 0.836}
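
Comparing the compound score with exactly zero labels even a barely nonzero score as positive or negative. The VADER authors suggest cutoffs of ±0.05 on the compound score instead. The sketch below applies that idea to a bare number so it runs without NLTK; the function name `label_from_compound` and the `threshold` parameter are illustrative assumptions, not part of the original solution.

```python
def label_from_compound(compound, threshold=0.05):
    # Scores within +/- threshold of zero are treated as neutral
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Two compound scores from the outputs above, plus 0.0 for the neutral case
for value in (0.4215, -0.6492, 0.0):
    print(value, label_from_compound(value))
```

In the notebook this could be called as `label_from_compound(sid.polarity_scores(review)["compound"])`.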