___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [3]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy
nlp = spacy.load('en_core_web_md')

In [6]:
# Choose the words you wish to compare, and obtain their vector norms
words = ["lion", "witch", "wardrobe"]
word_norms = []
for word in words:
    doc = nlp(word)
    # Check that the word has a vector in the model's vocabulary
    if doc.has_vector:
        # vector_norm is the L2 norm (length) of the word's vector
        word_norms.append((word, doc.vector_norm))
    else:
        word_norms.append((word, None))
for word, norm in word_norms:
    if norm is not None:
        print(f"Word: {word}, Vector norm: {norm}")
    else:
        print(f"Word: {word} is not in the spaCy vocabulary.")

Word: lion, Vector norm: 55.14573486169834
Word: witch, Vector norm: 41.747587495539605
Word: wardrobe, Vector norm: 30.373551945600926
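
For reference, the number printed for each word is the L2 (Euclidean) length of its vector. The following is a minimal sketch of that calculation in plain Python. The helper name `l2_norm` and the three-component `toy_vector` are made up for illustration and are not taken from any spaCy model.

```python
import math

def l2_norm(vector):
    # Square root of the sum of squared components
    return math.sqrt(sum(x * x for x in vector))

# Hypothetical toy vector, not a real word vector
toy_vector = [3.0, 4.0, 12.0]
print(l2_norm(toy_vector))  # 13.0
```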


In [16]:
# Import cosine_similarity from scikit-learn and define a wrapper function
from sklearn.metrics.pairwise import cosine_similarity

def calculate_cosine_similarity(vector1, vector2):
    # sklearn expects 2D arrays, so reshape each vector to a single row
    vector1 = vector1.reshape(1, -1)
    vector2 = vector2.reshape(1, -1)

    # cosine_similarity returns a 1x1 matrix; extract the scalar
    similarity = cosine_similarity(vector1, vector2)
    return similarity[0][0]

# Example usage:
vector1 = nlp("king").vector
vector2 = nlp("ruler").vector

similarity = calculate_cosine_similarity(vector1, vector2)
print(f"Cosine Similarity: {similarity:.4f}")

Cosine Similarity: 0.7641


In [19]:
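# Alternative: compute the same similarity with SciPy.
# distance.cosine returns the cosine distance, so similarity = 1 - distance.
from scipy.spatial import distance

def calc_cosine_similarity(vector1, vector2):
    return 1 - distance.cosine(vector1, vector2)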

vector1 = nlp("king").vector
vector2 = nlp("ruler").vector

similarity = calc_cosine_similarity(vector1, vector2)
print(f"Cosine Similarity: {similarity:.4f}")

Cosine Similarity: 0.7641
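
Both library calls above evaluate the same quantity: the dot product of the two vectors divided by the product of their lengths. As a rough sketch of that definition in plain Python (the function name `cosine_similarity_manual` and the toy vectors are hypothetical, chosen only for illustration):

```python
import math

def cosine_similarity_manual(v1, v2):
    # Dot product divided by the product of the two vector lengths
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Hypothetical toy vectors for illustration only
same_direction = cosine_similarity_manual([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity_manual([1.0, 0.0], [0.0, 1.0])
print(round(same_direction, 4), round(orthogonal, 4))  # 1.0 0.0
```

Vectors pointing the same way score 1 regardless of their lengths, and perpendicular vectors score 0.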


In [22]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
# Here only two words are used: new_vector = king - ruler
king = nlp.vocab['king'].vector
ruler = nlp.vocab['ruler'].vector
new_vector = king - ruler

In [23]:
# List the top ten closest vectors in the vocabulary to the result of the expression above
computed_similarities = []

for word in nlp.vocab:
    # Keep only lowercase, alphabetic words that have a vector
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = calculate_cosine_similarity(new_vector, word.vector)
        computed_similarities.append((word, similarity))

computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

print([w[0].text for w in computed_similarities[:10]])

['king', 'nothin', 'havin', 'and', 'somethin', 'those', 'there', 'that', 'they', 'ought']


### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [20]:
def vector_math(a, b, c):
    # Look up the vector for each input word
    aa = nlp(a).vector
    bb = nlp(b).vector
    cc = nlp(c).vector
    new_vector = aa - bb + cc
    computed_similarities = []

    for word in nlp.vocab:
        # Keep only lowercase, alphabetic words that have a vector
        if word.has_vector and word.is_lower and word.is_alpha:
            similarity = calc_cosine_similarity(new_vector, word.vector)
            computed_similarities.append((word, similarity))

    # Sort by similarity, highest first, and return the ten closest words
    computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])

    return [w[0].text for w in computed_similarities[:10]]

In [21]:
# Test the function on known words:
vector_math('king','man','woman')

['king', 'ruler', 'and', 'that', 'havin', 'where', 'she', 'they', 'woman', 'somethin']
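
The list above contains the input words 'king' and 'woman' themselves, which is common with this kind of arithmetic because the result vector stays close to its inputs. One option is to leave the inputs out of the ranking. The sketch below shows the idea on made-up three-dimensional vectors, so it runs without spaCy. The names `closest_words` and `toy_vectors` and all the numbers are assumptions for illustration, not real model data.

```python
import math

def cosine(v1, v2):
    # Dot product divided by the product of the two vector lengths
    dot = sum(a * b for a, b in zip(v1, v2))
    len1 = math.sqrt(sum(a * a for a in v1))
    len2 = math.sqrt(sum(b * b for b in v2))
    return dot / (len1 * len2)

def closest_words(target, vectors, exclude=(), top_n=10):
    # Rank every word not in `exclude` by cosine similarity to `target`
    scored = [(word, cosine(target, vec))
              for word, vec in vectors.items() if word not in exclude]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in scored[:top_n]]

# Made-up vectors; the dimensions loosely mean (royalty, male, female)
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}

# king - man + woman, computed component by component
target = [k - m + w for k, m, w in zip(toy_vectors["king"],
                                       toy_vectors["man"],
                                       toy_vectors["woman"])]

with_inputs = closest_words(target, toy_vectors)
without_inputs = closest_words(target, toy_vectors,
                               exclude={"king", "man", "woman"})
print(with_inputs)
print(without_inputs)
```

With real spaCy vectors the same filter could be applied by skipping any lexeme whose text matches one of the three input strings.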


## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [25]:
# Import SentimentIntensityAnalyzer and create an sid object
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Max\AppData\Roaming\nltk_data...


In [34]:
# Write a review as one continuous string (multiple sentences are ok)
review = 'this horrible movie is actually very good.  I rate it 4 out of 5.'

In [35]:
# Obtain the sid scores for your review
sid.polarity_scores(review)

{'neg': 0.21, 'neu': 0.599, 'pos': 0.191, 'compound': -0.079}
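
To read this output: `neg`, `neu` and `pos` are proportions that sum to 1, while `compound` is a single score squashed into the range -1 to 1. The sketch below checks the first point against the scores above. It also illustrates the squashing step, assuming it has the form x / sqrt(x² + alpha) with alpha = 15, as in NLTK's VADER implementation to the best of my knowledge. The standalone `normalize` helper here is a re-implementation for illustration, not a call into NLTK.

```python
import math

# Scores copied from the output above
scores = {'neg': 0.21, 'neu': 0.599, 'pos': 0.191, 'compound': -0.079}

# The three proportions sum to 1 (up to rounding)
total = scores['neg'] + scores['neu'] + scores['pos']
print(round(total, 3))  # 1.0

def normalize(raw_score, alpha=15):
    # Assumed form of the compound normalization:
    # squashes an unbounded summed valence into the open interval (-1, 1)
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

# Larger raw scores approach, but never reach, 1
for raw in (0.5, 2.0, 10.0):
    print(raw, round(normalize(raw), 4))
```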

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [32]:
def review_rating(string):
    # Get sentiment scores for the review
    sentiment_scores = sid.polarity_scores(string)

    # Find the largest of the 'neg', 'neu' and 'pos' proportions,
    # ignoring the 'compound' score.
    # Note: a common alternative is to threshold the compound score at +/-0.05.
    proportions = {k: v for k, v in sentiment_scores.items() if k != 'compound'}
    largest_key = max(proportions, key=proportions.get)

    if largest_key == 'pos':
        return "Positive"
    elif largest_key == 'neg':
        return "Negative"
    else:
        return "Neutral"

In [33]:
# Test the function on your review above:
review_rating(review)

'Neutral'
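
`review_rating` labels this review 'Neutral' because `neu` is the largest of the three proportions, which is often the case for ordinary text. A commonly cited alternative for VADER is to label by the compound score, using thresholds of +0.05 and -0.05. The sketch below applies that rule directly to a scores dictionary so it runs without NLTK. The function name `rating_from_compound` and the `threshold` parameter are hypothetical.

```python
def rating_from_compound(scores, threshold=0.05):
    # Label a review from its compound score only
    compound = scores['compound']
    if compound >= threshold:
        return "Positive"
    elif compound <= -threshold:
        return "Negative"
    else:
        return "Neutral"

# Scores copied from the sid.polarity_scores(review) output above
scores = {'neg': 0.21, 'neu': 0.599, 'pos': 0.191, 'compound': -0.079}
print(rating_from_compound(scores))  # Negative
```

Under this rule the same review is labelled 'Negative', since its compound score of -0.079 falls below -0.05.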

## Great job!