# Sentiment Decider

Welcome to your first day on the job at the fancy tech company, Mamazon. Here at Mamazon, we like to learn all that we can from our customers to make sure that they are happy with our products. In order to do this, we analyze all of the reviews that are posted for every product. Our two main goals are:
1. For all products with more positive reviews, we want to promote those by ranking them higher in the search results. For all products with more negative reviews, we want to punish those by ranking them lower in the search results.
2. Since every review has a text element and a rating element, we want to check that the reviews are accurate. So for every review, we want to check whether the sentiment of the review text matches the rating that the reviewer gave the product.

## Task #1: Create Your Sentiment Analyzer

### Step 1: What's the sentiment of all of the reviews combined?
To get us started on the right track, you will be creating a function that analyzes the overall sentiment of all of the Mamazon reviews. To build this function, you will have access to 2 text files: positivewords.txt and negativewords.txt. You will start by creating a dictionary for all of the positive words and a dictionary for all of the negative words. For each word you encounter in the Mamazon reviews, you will keep track of how many positive words and how many negative words you come across.

Your function should take in three parameters: positiveWordsFile, negativeWordsFile, and reviewsFile. These are all strings that hold the names of those files, respectively. Your function will need to:
1. Create the dictionaries to store the positive and negative words
2. Open and read the review file
3. Parse all of the words in the file
4. Count how many positive and negative words you've encountered
5. Print the quantity of positive and negative words
6. Return a string that lets us know whether the reviews were mostly positive or mostly negative. If more positive words were used than negative words, return "The Reviews are Mostly Positive". Otherwise, return "The Reviews are Mostly Negative".

In [1]:
#First things first... what is a dictionary?
#A dictionary is a NEW datatype! This is the last datatype we will be learning about
#in this class.

#Think about a dictionary like a real-world dictionary
#You have keys (== the name of a word in a real dictionary)
#You have values (== the definition of a word in a real dictionary)

#Dictionaries are also really similar to lists
#But instead of storing things as an index-->value pair
#list: 0,1 ---> ["item1","item2"]
#Dictionary is a key-->value pair
#dictionary: "key1" --> "value1"
#"key2" --> "value2"
#etc..


#The "key" is like an index that you can access directly

#Syntax:
#{
#"key1":value1,
#"key2":value2,
#etc....
#}

#The keys must be unique and hashable (in practice: strings, numbers, or tuples of those); the values can be ANY DATATYPE
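
A quick sketch of which datatypes can serve as keys. The dictionary `ages` and its contents are made up for illustration; the point is that reusing a key replaces its value, and that a list cannot be a key.

```python
# Strings, numbers, and tuples can all be keys
ages = {"alice": 31, 7: "lucky number", (3, 4): "a point"}
print(ages[7])        # lucky number
print(ages[(3, 4)])   # a point

# Reusing a key does not add a second entry; it replaces the old value
ages["alice"] = 32
print(len(ages))      # 3

# A list can change after it is created (it is unhashable), so it cannot be a key
try:
    ages[["not", "allowed"]] = 0
except TypeError as err:
    print("TypeError:", err)
```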


In [2]:
myDictionary = {
    "key1":"this is the value for key1",
    "key2":["this","is","the","value","for","key",2],
    "third":46789
}

print(myDictionary)

{'key1': 'this is the value for key1', 'key2': ['this', 'is', 'the', 'value', 'for', 'key', 2], 'third': 46789}


In [3]:
#If you want to access a value in your dictionary,
#it's just like if you were to look up a word in a real dictionary
#You look for it by accessing its key

myDictionary["key1"]

'this is the value for key1'

In [4]:
#if you want to update a value that already exists:
myDictionary["key1"] = "this is the new value for key1"
print(myDictionary)

{'key1': 'this is the new value for key1', 'key2': ['this', 'is', 'the', 'value', 'for', 'key', 2], 'third': 46789}


In [5]:
#If you want to add an entirely new key/value pair to the dictionary
myDictionary["new key"] = "this a new value for a new key"
print(myDictionary)

{'key1': 'this is the new value for key1', 'key2': ['this', 'is', 'the', 'value', 'for', 'key', 2], 'third': 46789, 'new key': 'this a new value for a new key'}


In [6]:
#unlike lists, dictionaries are NOT indexed by position...
#(since Python 3.7 they remember the order keys were added, but
#myDictionary[0] looks for a key named 0, not the first item)
#this means that if we want to search for a specific key or value,
#we can't jump to a numbered location in the dictionary. Also, sometimes
#we might not know if the key/value pair exists in the dictionary at all!
#if we want to check whether a key is in a dictionary,
#we can do the following:

if "key1" in myDictionary:
    print("key1 is in the dictionary")
else:
    print("key1 not in the dictionary")

key1 is in the dictionary


In [7]:
#this is useful because we can't always assume that a key is in the dictionary
#if we look up a key that isn't there, Python raises a KeyError:
myDictionary["key that doesn't exist"]

KeyError: "key that doesn't exist"

In [8]:
#this is why it's useful to check:
if "key that doesn't exist" in myDictionary:
    print(myDictionary["key that doesn't exist"])
else:
    print("the key wasn't in the dictionary and yay we avoided an error!")

the key wasn't in the dictionary and yay we avoided an error!
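
Another way to avoid the KeyError is the built-in dictionary method `.get()`, which returns a default instead of raising an error. A small sketch (the `wordCounts` dictionary is made up for illustration), including the common counting pattern it enables:

```python
wordCounts = {"great": 2, "awful": 1}

# .get() returns the value when the key exists...
print(wordCounts.get("great"))        # 2
# ...and None instead of a KeyError when it doesn't
print(wordCounts.get("amazing"))      # None
# a second argument supplies your own default
print(wordCounts.get("amazing", 0))   # 0

# Counting pattern: start at 0 if the word hasn't been seen yet, then add 1
wordCounts["amazing"] = wordCounts.get("amazing", 0) + 1
print(wordCounts["amazing"])          # 1
```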


In [9]:
#Last useful tip for dictionaries:
#if we want to loop through all of the key value pairs in a dictionary,
#we can use a for loop in the same way we can loop
#through the items in a list or the characters in a string:

#Syntax:
for item in myDictionary:
    print(item)

key1
key2
third
new key


In [10]:
#This just gave us the keys, we are missing the values....
#Does anyone have any idea how to print the values in the for loop as well?

for key in myDictionary:
    print("key: " + key)
    #something extra to get the values here
    #remember this is how we access a value
    #using the key for a dictionary
    print(myDictionary[key])

key: key1
this is the new value for key1
key: key2
['this', 'is', 'the', 'value', 'for', 'key', 2]
key: third
46789
key: new key
this a new value for a new key
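
Python can also hand you the key and the value together with the dictionary method `.items()`, so no separate lookup is needed inside the loop. A minimal sketch with a made-up `reviewCounts` dictionary:

```python
reviewCounts = {"good": 3, "bad": 1, "okay": 0}

# .items() yields (key, value) pairs; the loop unpacks each pair into two names
for word, count in reviewCounts.items():
    print("key: " + word + ", value: " + str(count))
```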


In [11]:
myStr = "this is a long sentence pretend this is a review from mamazon"
reviewWordList = myStr.split()
print(reviewWordList)

['this', 'is', 'a', 'long', 'sentence', 'pretend', 'this', 'is', 'a', 'review', 'from', 'mamazon']
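
Note that `.split()` only splits on whitespace, so punctuation stays attached to words: `"product!!!"` will not match an entry `"product"` in a word list. A hypothetical clean-up step (not used by the functions in this notebook, and it would change their counts) could lowercase each word and strip punctuation from its ends, assuming the word files store plain lowercase words:

```python
import string

sampleReview = "I really enjoyed this product!!!"
rawWords = sampleReview.split()
print(rawWords)    # ['I', 'really', 'enjoyed', 'this', 'product!!!']

# Lowercase each word and strip punctuation characters from both ends
cleanWords = [word.lower().strip(string.punctuation) for word in rawWords]
print(cleanWords)  # ['i', 'really', 'enjoyed', 'this', 'product']
```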


In [12]:
def totalReviewSentiment(positiveWordsFile, negativeWordsFile, reviewsFile):
    
    #Create the dictionaries to store the positive and negative words
    #open the positive words file
    #go through all the positive words--> come up with a list of all of the positive words
    #where each item in the list is one of the positive words
    posFileHandler = open(positiveWordsFile,"r")
    posContent = posFileHandler.read() #remember this returns a string of all the content in the file
    posWordsList = posContent.splitlines() #splitlines returns a list where each item is a line in the file
    
    
    #open the negative words file
    #go through all the negative words
    negFileHandler = open(negativeWordsFile,"r")
    negContent = negFileHandler.read() #remember this returns a string of all the content in the file
    negWordsList = negContent.splitlines() #splitlines returns a list where each item is a line in the file
    
    
    #create an empty dictionary like we would do with an empty list
    posWordsDict = {}
    negWordsDict = {}
    
    
    #put those words into a positive word dictionary
    for word in posWordsList:
        posWordsDict[word] = 0
    #put those words into a negative word dictionary
    for word in negWordsList:
        negWordsDict[word] = 0
    
    
    ####################################################################################
    
    
    # Open and read the review file
    #Get all of the words from this file into a list
    reviewsFileHandler = open(reviewsFile,"r")
    reviewString = reviewsFileHandler.read()
    reviewWordList = reviewString.split()
    
    
    # Parse all of the words in the file
        #for every single word we encounter
        #we want to check if that word exists in the positive or negative dictionary
        #if it exists in the positive dictionary, then update that specific
        #word's "count"
        #if it exists in the negative dictionary, then update that specific word's "count"
    for word in reviewWordList:
        if word in posWordsDict:
            posWordsDict[word] += 1
        if word in negWordsDict:
            negWordsDict[word] += 1
    
    
    ####################################################################################
    
    
    # Count how many positive and negative words you've encountered
    
    #variables to store the total positive and total negative number of words we've encountered
    totalPosWords = 0
    totalNegWords = 0
    
    # Loop through the positive dictionary
    for key in posWordsDict:
        numTimesThisWordUsed = posWordsDict[key]
        # Count how many total words were used
        totalPosWords += numTimesThisWordUsed
    print("total times a positive word was used in all of our reviews: " + str(totalPosWords))
        
    # Loop through the negative dictionary
    for key in negWordsDict:
        numTimesThisWordUsed = negWordsDict[key]
        # Count how many total words were used
        totalNegWords += numTimesThisWordUsed
    print("total times a negative word was used in all of our reviews: " + str(totalNegWords))
    
    
    # Return a string that lets us know if the majority of the reviews were positive or negative. 
    if totalPosWords > totalNegWords:
        # If there are more positive reviews, return "The Reviews are Mostly Positive". 
        return "The Reviews are Mostly Positive"
    #If there are more negative reviews, return "The Reviews are Mostly Negative".
    else:
        return "The Reviews are Mostly Negative"
    

In [13]:
totalReviewSentiment("positivewords.txt","negativewords.txt","reviews.txt")

total times a positive word was used in all of our reviews: 613312
total times a negative word was used in all of our reviews: 344311


'The Reviews are Mostly Positive'
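
The function above opens three files but never closes them. A `with` block closes a file automatically when the block ends, even if an error happens while reading. A minimal sketch; `readWordList` is a hypothetical helper, and a temporary file stands in for positivewords.txt:

```python
import os
import tempfile

def readWordList(fileName):
    # The file is closed automatically when the "with" block ends
    with open(fileName, "r") as fileHandler:
        return fileHandler.read().splitlines()

# Throwaway file standing in for positivewords.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("good\ngreat\nexcellent\n")
    tmpName = tmp.name

words = readWordList(tmpName)
print(words)  # ['good', 'great', 'excellent']
os.remove(tmpName)
```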


Phew, that is one long function! Is there any way we can make this function smaller?

In [15]:
#Answer: Functional Decomposition
#What is functional decomposition?
#It is a way to make our functions a bit easier to understand
#and also a bit less complex. Each function that you write should realistically
#accomplish one main task. If you find that your function is doing too many things
#consider splitting it into multiple functions
#And calling these functions in order to complete the sub-tasks

#One "main" function that calls all of your "sub-functions"

#how could we split up the above function?
#1. Open the files and get the words in a list
#2. Create the dictionaries with their default values
#3. Open the reviews + keep track of the pos/neg word counts in each dictionary
#4. Calculate the final pos/neg word counts and return

In [16]:
def getEmptyPosDictionary(posWords):
    #2. Create the dictionaries with their default values
    posWordsDict = {}
    for word in posWords:
        if word not in posWordsDict:
            posWordsDict[word] = 0
    return posWordsDict

In [17]:
def getEmptyNegDictionary(negWords):
    negWordsDict = {}
    for word in negWords:
        if word not in negWordsDict:
            negWordsDict[word] = 0
            
    return negWordsDict
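
`getEmptyPosDictionary` and `getEmptyNegDictionary` do exactly the same thing with different variable names, so one shared helper could serve both word lists. A sketch using the built-in `dict.fromkeys`; the name `getEmptyDictionary` is hypothetical:

```python
def getEmptyDictionary(words):
    # dict.fromkeys gives every word the starting value 0;
    # a repeated word collapses into a single key, just like the "not in" check above
    return dict.fromkeys(words, 0)

demoDict = getEmptyDictionary(["good", "great", "good"])
print(demoDict)  # {'good': 0, 'great': 0}
```

Sharing the default value is safe here because `0` is immutable; a mutable default such as a list would be shared by every key.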

In [18]:
def countWords(reviewsFile,posWordsDict,negWordsDict):
    #3. Open the reviews + keep track of the pos/neg word counts in each dictionary
    reviewsFileHandler = open(reviewsFile, "r")
    allReviews = reviewsFileHandler.read()
    allWords = allReviews.split()
    for word in allWords:
        if word in posWordsDict:
            posWordsDict[word] += 1
        elif word in negWordsDict:
            negWordsDict[word] += 1
    return [posWordsDict, negWordsDict]

In [19]:
def getPosNegCount(posWordsDict,negWordsDict):
    #4. Calculate the final pos/neg word counts and return
    totalPosWords = 0
    totalNegWords = 0
    for posWord in posWordsDict:
        numTimesPosWordUsed = posWordsDict[posWord]
        if numTimesPosWordUsed > 0:
            totalPosWords += numTimesPosWordUsed
    for negWord in negWordsDict:
        numTimesNegWordUsed = negWordsDict[negWord]
        if numTimesNegWordUsed > 0:
            totalNegWords += numTimesNegWordUsed
    return [totalPosWords, totalNegWords]
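
The two totaling loops in `getPosNegCount` can each be written as one call: `.values()` gives every count and `sum()` adds them, and words with a count of 0 add nothing, so the `> 0` check is not needed. A sketch with a hypothetical `getTotalCount` helper and made-up counts:

```python
def getTotalCount(wordsDict):
    # Add up every count stored in the dictionary
    return sum(wordsDict.values())

demoPos = {"good": 3, "great": 0, "love": 2}
demoNeg = {"bad": 1, "awful": 0}
print([getTotalCount(demoPos), getTotalCount(demoNeg)])  # [5, 1]
```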

In [20]:
# You can also take this a step further and put each function in its own cell!
#Just make sure that any function that is called is defined *ABOVE* (before) where it is called
def totalReviewSentiment(positiveWordsFile, negativeWordsFile, reviewsFile):
    #1. Open the files and get the words in a list
    posFileHandler = open(positiveWordsFile,"r")
    posContent = posFileHandler.read()
    posWords = posContent.splitlines()
    
    negFileHandler = open(negativeWordsFile,"r")
    negContent = negFileHandler.read()
    negWords = negContent.splitlines()
    
    #2. Create the dictionaries with their default values
    posWordsDict = getEmptyPosDictionary(posWords)
    negWordsDict = getEmptyNegDictionary(negWords)
    
    #3. Open the reviews + keep track of the pos/neg word counts in each dictionary
    [posWordsDict,negWordsDict] = countWords(reviewsFile,posWordsDict,negWordsDict)
    
    #4. Calculate the final pos/neg word counts and return
    [totalPosWords,totalNegWords] = getPosNegCount(posWordsDict,negWordsDict)
    
    print("Total positive words used: " + str(totalPosWords))
    print("Total negative words used: " + str(totalNegWords))
    
    if totalPosWords > totalNegWords:
        return "The Reviews are Mostly Positive"
    else:
        return "The Reviews are Mostly Negative"
    

### Step 2: What's the sentiment of a single review?
For this task, you will be creating a function that analyzes the sentiment of a single Mamazon review. This function will take in a string *review* and calculate the sentiment for that review. It will also take in *posWordsDict* and *negWordsDict*, the dictionaries of positive and negative words, respectively.

To calculate the sentiment, we are going to first calculate just the positive sentiment: the number of positive words divided by the total number of positive and negative words in the review (words that appear in neither list are ignored).

Example #1: If I had 3 positive words and 2 negative words in a sentence, then my **positive sentiment** would be positive words / total words --> 3/5 --> 0.6

Example #2: If I had 12 negative words and 6 positive words in a sentence, then my **positive sentiment** would be positive words / total words --> 6/18 --> 0.3333333333333333

Now that we have our positive sentiment calculated... how do we calculate the negative sentiment?
Well, it's just the complement of the positive sentiment: 1 - positive sentiment!

Example #1: negative words / total words --> 2/5 --> 0.4 == 1 - 0.6(positive sentiment)

Example #2: negative words / total words --> 12/18 --> 0.66666666 == 1 - 0.3333333(positive sentiment)
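
The two worked examples can be checked with a few lines of Python. The helper name `positiveSentimentFromCounts` is hypothetical, and it assumes the review contains at least one positive or negative word (otherwise the division by zero fails):

```python
def positiveSentimentFromCounts(posCount, negCount):
    # positive sentiment = positive words / (positive words + negative words)
    return posCount / (posCount + negCount)

example1 = positiveSentimentFromCounts(3, 2)   # 0.6
example2 = positiveSentimentFromCounts(6, 12)  # 0.3333333333333333

# negative sentiment is the complement: 1 - positive sentiment
print(example1, 1 - example1)
print(example2, 1 - example2)
```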

How can we tell if a review is more positive or more negative?

Well, we only need to calculate the positive sentiment and check whether that number is above or below 0.5:

- positiveSentiment > 0.5: the review is more positive than negative
- positiveSentiment < 0.5: the review is more negative than positive
- positiveSentiment == 0.5: the review is neutral (the positive and negative words balance out)

Your function should print a message stating whether this review was more positive or more negative. If it was more positive, print "This review has a positive sentiment". If it was more negative, print "This review has a negative sentiment". If it was exactly balanced, print "This review was entirely neutral".

In [None]:
def reviewSentiment(review,posWordsDict,negWordsDict):
    posWords = 0
    negWords = 0
    
    #get all the words from the review
    reviewWordsList = review.split()
    
    #loop through the words from the review
    for word in reviewWordsList:
        #1. is the word positive or negative?
        #2. count how many positive and negative words we are encountering
        if word in posWordsDict:
            posWords += 1
        if word in negWordsDict:
            negWords += 1
    
    #3. calculate the positive sentiment ratio (after the loop, once every word is counted)
    #hint: you only need the positive sentiment ratio and whether it is
    #above or below 0.5 to know the answer
    if posWords + negWords == 0:
        #no positive or negative words at all: avoid dividing by zero and call it neutral
        positiveSentiment = 0.5
    else:
        positiveSentiment = posWords / (posWords + negWords)
    
    #4. print whether this review is more positive or more negative
    print(positiveSentiment)
    if positiveSentiment > 0.5:
        print("This review has a positive sentiment")
    elif positiveSentiment < 0.5:
        print("This review has a negative sentiment")
    else:
        print("This review was entirely neutral")

In [22]:
#Test the function using code that we already wrote:
posFileHandler = open("positivewords.txt","r")
posContent = posFileHandler.read()
posWords = posContent.splitlines()

negFileHandler = open("negativewords.txt","r")
negContent = negFileHandler.read()
negWords = negContent.splitlines()

posWordsDict = getEmptyPosDictionary(posWords)
negWordsDict = getEmptyNegDictionary(negWords)

#our testing code here

In [23]:
review = "I really enjoyed this product!!!"
reviewSentiment(review,posWordsDict,negWordsDict)

1.0
This review has a positive sentiment


In [24]:
review = "I really hated this product!!!"
reviewSentiment(review,posWordsDict,negWordsDict)

0.0
This review has a negative sentiment


In [25]:
review = "I really hated and loved this product but ultimately it was okay!!!"
reviewSentiment(review,posWordsDict,negWordsDict)

0.5
This review was entirely neutral


## Task #2: Can You Beat The Hackers? Let's "Game" The Algorithm

Many hackers and producers of products for Mamazon have figured out how to "game" our algorithm. They have found different ways to promote their products higher up on our recommendation algorithm by getting their products ranked higher on the search page. There are many ways they have done this, and one of them is to write fake reviews that our algorithm thinks are really positive (even if they aren't real).

To prepare for this kind of algorithmic "gaming", let's step into the role of the hacker. Try to come up with several examples of reviews that you can pass into your function above that seem "positive" to a human but come out as "negative" to your algorithm, and vice versa. What interesting trends do you notice? How did you figure out ways to "game" the algorithm?
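
One trend worth looking for: the algorithm checks each word on its own, so negation is invisible to it. A sketch of the same counting idea with a hypothetical `countSentimentWords` helper and tiny made-up word sets standing in for the real word files (sets support `in` just like dictionaries):

```python
def countSentimentWords(review, posWords, negWords):
    # Same counting idea as reviewSentiment: each word is checked on its own
    posCount = 0
    negCount = 0
    for word in review.split():
        if word in posWords:
            posCount += 1
        if word in negWords:
            negCount += 1
    return (posCount, negCount)

demoPosWords = {"good", "great"}
demoNegWords = {"bad", "broken"}

# Reads negative to a human, but "not" is in neither list
print(countSentimentWords("not good and not great at all", demoPosWords, demoNegWords))  # (2, 0)
# Reads positive to a human, but only "bad" and "broken" are counted
print(countSentimentWords("not bad at all, and nothing broken", demoPosWords, demoNegWords))  # (0, 2)
```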

In [45]:
#these are some examples... there are many more examples the students can come up with!
review = "This product was an a+ accurately depicted amazing disaster"
reviewSentiment(review,posWordsDict,negWordsDict)

0.75
This review has a positive sentiment


In [44]:
review = "I'm baffled at how freaking goofy the film was"
reviewSentiment(review,posWordsDict,negWordsDict)

0.0
This review has a negative sentiment


## Task #3: Put it All Together

We have written code that loops through all of the Mamazon reviews and computes the average positive sentiment of the reviews... but there is something wrong with our code! We've found 3 errors and we can't figure out how to debug them. The section containing the errors is marked in the function below. Please help us fix our function so we can finish our sentiment decider!
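
Each of the three mistakes described in the comments of `sentimentForAllReviews` can be reproduced in isolation. A minimal sketch, assuming the buggy version opened the file with `"a"`, called `splitlines()` on the file handler, and used `+=` to add a number to the list; a temporary file stands in for reviews.txt:

```python
import io
import os
import tempfile

# Throwaway file standing in for reviews.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("great product\nbroken on arrival\n")
    tmpName = tmp.name

# Error 1: "a" (append) mode opens a file for writing only, so reading fails
with open(tmpName, "a") as handler:
    try:
        handler.read()
    except io.UnsupportedOperation as err:
        print("append mode:", err)

# Error 2: a file handler has no splitlines(); call read() to get the string first
with open(tmpName, "r") as handler:
    try:
        handler.splitlines()
    except AttributeError as err:
        print("missing .read():", err)

# Error 3: += on a list expects another list (an iterable), not a single number
sentimentList = []
try:
    sentimentList += 0.75
except TypeError as err:
    print("+= on a list:", err)
sentimentList.append(0.75)
print(sentimentList)  # [0.75]

os.remove(tmpName)
```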

In [34]:
def getReviewSentiment(review,posWordsDict,negWordsDict):
    #your code here
    
    posWords = 0
    negWords = 0
    
    #get all the words from the review
    reviewWordsList = review.split()
    
    #loop through the words from the review
    for word in reviewWordsList:
        #1. is the word positive or negative?
        if word in posWordsDict:
            #do something
            posWords += 1
        #2. count how many positive and negative words we are encountering
        if word in negWordsDict:
            #do something
            negWords += 1
        
    #calculate the positive sentiment of the review
    #hint: you only need the positive sentiment ratio and whether it is
    #above or below 0.5 to know the answer
    #3. how do we calculate the positive sentiment ratio?
    if posWords > 0 and negWords > 0:
        positiveSentiment = posWords / (posWords + negWords)
    else:
        return 0.5
    return positiveSentiment
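
As written, `getReviewSentiment` returns 0.5 whenever a review contains only positive or only negative words, because `posWords > 0 and negWords > 0` requires both; the average printed at the end of this notebook was computed with that rule. If you want one-sided reviews scored as in Step 2 (1.0 or 0.0), a variant could treat only the no-sentiment-words case as neutral. This is a sketch: the name `getReviewSentimentOneSided` and the demo word dictionaries are hypothetical, and using it would change the average.

```python
def getReviewSentimentOneSided(review, posWordsDict, negWordsDict):
    posWords = 0
    negWords = 0
    for word in review.split():
        if word in posWordsDict:
            posWords += 1
        if word in negWordsDict:
            negWords += 1
    # Only a review with no positive or negative words at all is neutral
    if posWords + negWords == 0:
        return 0.5
    return posWords / (posWords + negWords)

demoPos = {"enjoyed": 0}
demoNeg = {"hated": 0}
print(getReviewSentimentOneSided("I really enjoyed this product", demoPos, demoNeg))  # 1.0
print(getReviewSentimentOneSided("I really hated this product", demoPos, demoNeg))    # 0.0
print(getReviewSentimentOneSided("It arrived on Tuesday", demoPos, demoNeg))          # 0.5
```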

In [50]:
#FIX THE FOLLOWING FUNCTION:
def sentimentForAllReviews(positiveWordsFile, negativeWordsFile, reviewsFile):
    posFileHandler = open(positiveWordsFile,"r")
    posContent = posFileHandler.read()
    posWords = posContent.splitlines()
    
    negFileHandler = open(negativeWordsFile,"r")
    negContent = negFileHandler.read()
    negWords = negContent.splitlines()
    
    #Create the dictionaries with their default values
    posWordsDict = getEmptyPosDictionary(posWords)
    negWordsDict = getEmptyNegDictionary(negWords)
    
    
    #FIND THE ERRORS BELOW!
    #Hint: There are 3 errors :-)
    
    #1. Change the open mode to "r" (read) instead of "a" (append): "a" opens the
    #file only for writing at its end, so calling .read() on it fails
    reviewFileHandler = open(reviewsFile,"r")
    #2. Make sure you get your string of the entire file using the .read() function
    #before you call the splitlines() function on the contents of the file
    reviewString = reviewFileHandler.read()
    allReviewList = reviewString.splitlines()
    
    sentimentList = []
    
    for review in allReviewList:
        #call the function we wrote!
        sentiment = getReviewSentiment(review,posWordsDict,negWordsDict)
        #Use the .append() function to add an item to a list, not the += operator
        sentimentList.append(sentiment)
        
        
    #END OF ERRORS
        
    #get the average sentiment of all the reviews
    totalSentiment = 0
    for sentiment in sentimentList:
        totalSentiment += sentiment
    averageSentiment = totalSentiment/len(sentimentList)
    
    return averageSentiment
    
    

In [52]:
#This is the output that we want to see!
averageSentiment = sentimentForAllReviews("positivewords.txt","negativewords.txt","reviews.txt")
print("The average sentiment of all of the reviews is: " + str(averageSentiment))

The average sentiment of all of the reviews is: 0.5405544234784411
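
The averaging loop at the end of `sentimentForAllReviews` can be shortened with the built-in `sum()`. A sketch with a hypothetical `averageOf` helper that also guards against an empty list, which would otherwise cause a division by zero:

```python
def averageOf(values):
    # An empty list has length 0, so return None instead of dividing by zero
    if len(values) == 0:
        return None
    return sum(values) / len(values)

print(averageOf([1.0, 0.5, 0.0, 0.75]))  # 0.5625
print(averageOf([]))                     # None
```

The standard library's `statistics.mean` computes the same average but raises `StatisticsError` on an empty list.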


Amazing work! On behalf of everyone at Mamazon, we want to thank you for your dedication to our company and to our customers, but especially to those who create the products. We are sure they will be grateful for your sentiment analysis algorithm for their reviews so they can get the rankings they *deserve*!!!