## Sentiment Analysis of Amazon Reviews on Grocery Products

This notebook follows the Amazon Comprehend tutorial to run an example sentiment analysis on Amazon reviews of grocery products:
https://aws.amazon.com/blogs/machine-learning/detect-sentiment-from-customer-reviews-using-amazon-comprehend/

First, import the packages.

In [1]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)  # None shows full cell text; -1 is deprecated in newer pandas

Now load the Amazon grocery reviews.

In [5]:
df = pd.read_csv('data/amazon_reviews_us_Grocery_v1_00.tsv', delimiter='\t', encoding='utf-8')

In [6]:
df.head()

Unnamed: 0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21
0,US,42521656,R26MV8D0KG6QI6,B000SAQCWC,159714000.0,"The Cravings Place Chocolate Chunk Cookie Mix, 23-Ounce Bags (Pack of 6)",Grocery,5,0,0,N,Y,Using these for years - love them.,"As a family allergic to wheat, dairy, eggs, nuts, and several other things, we love the entire Cravings Place line of products as it allows us to bake treats with minimal effort and ingredients. Most allergy-free and gluten-free mixes usually just omit one or two allergens at most, so it's great to see a mix created without many of the most common allergens. (Note these still have soy and corn). We consume these on a regular basis and have been doing so for years.",31.08.15,,,,,,,
1,US,12049833,R1OF8GP57AQ1A0,B00509LVIQ,138680000.0,"Mauna Loa Macadamias, 11 Ounce Packages",Grocery,5,0,0,N,Y,Wonderful,"My favorite nut. Creamy, crunchy, salty, and slightly sweet - what more could you ask for?",31.08.15,,,,,,,
2,US,107642,R3VDC1QB6MC4ZZ,B00KHXESLC,252022000.0,Organic Matcha Green Tea Powder - 100% Pure Matcha (No Sugar Added - Unsweetened Pure Green Tea - No Coloring Added Like Others) 4oz,Grocery,5,0,0,N,N,Five Stars,This green tea tastes so good! My girlfriend loves it too.,31.08.15,,,,,,,
3,US,6042304,R12FA3DCF8F9ER,B000F8JIIC,752728000.0,15oz Raspberry Lyons Designer Dessert Syrup Sauce,Grocery,5,0,0,N,Y,Five Stars,I love Melissa's brand but this is a great second when I can't get Melissa's brand.,31.08.15,,,,,,,
4,US,18123821,RTWHVNV6X4CNJ,B004ZWR9RQ,552139000.0,"Stride Spark Kinetic Fruit Sugar Free Gum, 14-Count (Pack of 12)",Grocery,5,0,0,N,Y,Five Stars,good,31.08.15,,,,,,,


In [9]:
df.shape

(1048573, 22)

The dataset contains 1,048,573 rows in total. We now select only a subset of it: the reviews dated 31.08.15.

In [22]:
# .copy() makes df_sub an independent frame, so adding columns later does not touch df
df_sub = df[df['review_date'] == '31.08.15'].copy()
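
The filter above compares `review_date` as a plain string. As a minimal sketch, assuming the column always uses the day.month.two-digit-year layout seen in the output above, the strings could instead be parsed into real dates before comparing. The names `parse_review_date` and `sample_dates` are illustrative and not part of the notebook.

```python
from datetime import date, datetime


def parse_review_date(text):
    """Parse a day.month.two-digit-year string such as '31.08.15'."""
    return datetime.strptime(text, "%d.%m.%y").date()


# Hypothetical sample values in the same format as the review_date column
sample_dates = ["31.08.15", "30.08.15", "31.08.15"]

target = date(2015, 8, 31)
selected = [d for d in sample_dates if parse_review_date(d) == target]
print(selected)
```

Parsed dates also make range filters (for example, a whole month) straightforward, which string equality does not.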

Next we concatenate the review headline and the review body into a single review field.

In [25]:
df_sub[['review_headline', 'review_body']].head()

Unnamed: 0,review_headline,review_body
0,Using these for years - love them.,"As a family allergic to wheat, dairy, eggs, nuts, and several other things, we love the entire Cravings Place line of products as it allows us to bake treats with minimal effort and ingredients. Most allergy-free and gluten-free mixes usually just omit one or two allergens at most, so it's great to see a mix created without many of the most common allergens. (Note these still have soy and corn). We consume these on a regular basis and have been doing so for years."
1,Wonderful,"My favorite nut. Creamy, crunchy, salty, and slightly sweet - what more could you ask for?"
2,Five Stars,This green tea tastes so good! My girlfriend loves it too.
3,Five Stars,I love Melissa's brand but this is a great second when I can't get Melissa's brand.
4,Five Stars,good


In [24]:
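# Join headline and body with a '.'; fillna('') keeps missing values from breaking the join
df_sub['review'] = df_sub['review_headline'].fillna('').astype(str) + '.' + df_sub['review_body'].fillna('').astype(str)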

Now write each review to its own text file. Afterwards, we upload the text files to an AWS S3 bucket.

In [28]:
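import os

file = 'data/amazon_grocery/amazon_grocery_review_id_{}.txt'
os.makedirs(os.path.dirname(file), exist_ok=True)  # make sure the output folder exists

# Write one text file per review, named after its review_id
for index, row in df_sub.iterrows():
    review_id = row['review_id']
    with open(file.format(review_id), 'w', encoding='utf-8') as f:
        f.write(str(row['review']))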
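
The notebook does not show how the text files reach S3. One possible way, sketched here under the assumption that `boto3` is installed and AWS credentials are configured, is to loop over the files and call the S3 client's `upload_file`. The helper `upload_reviews` and its parameters are hypothetical, and the bucket name is a placeholder.

```python
from pathlib import Path


def upload_reviews(s3_client, local_dir, bucket, prefix=""):
    """Upload every review text file in local_dir and return the object keys."""
    uploaded_keys = []
    for path in sorted(Path(local_dir).glob("amazon_grocery_review_id_*.txt")):
        key = prefix + path.name
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client call
        s3_client.upload_file(str(path), bucket, key)
        uploaded_keys.append(key)
    return uploaded_keys


# Usage (requires boto3 and AWS credentials; the bucket name is a placeholder):
#   import boto3
#   upload_reviews(boto3.client("s3"), "data/amazon_grocery", "<bucket_name>")
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before touching a real bucket.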

Next, create a table in Amazon Athena to query the results of the sentiment analysis. We then run a query that returns the result rows for the Amazon grocery reviews, download the result as a CSV file, and merge it with df_sub.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS default.ReviewSentimentAnalysis (
  `ImageLocation` string,
  `Timestamp` string,   
  `Sentiment` string,
  `Positive` string,
  `Negative` string,
  `Neutral` string,
  `Mixed` string
  )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://<bucket_name>/sentiment/'
;

SELECT *
FROM default.ReviewSentimentAnalysis
WHERE imagelocation LIKE '%amazon%'
;
```
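
The columns of the table above mirror the fields of an Amazon Comprehend sentiment response. The notebook does not show the step that produces these rows, so the following is only a sketch, assuming each uploaded file is passed once to the Comprehend client's `detect_sentiment` call. The helper `sentiment_row` and its arguments are hypothetical.

```python
def sentiment_row(comprehend_client, location, text, timestamp):
    """Build one result row in the column order of the Athena table."""
    response = comprehend_client.detect_sentiment(Text=text, LanguageCode="en")
    scores = response["SentimentScore"]
    return [
        location,               # ImageLocation
        timestamp,              # Timestamp
        response["Sentiment"],  # POSITIVE, NEGATIVE, NEUTRAL or MIXED
        scores["Positive"],
        scores["Negative"],
        scores["Neutral"],
        scores["Mixed"],
    ]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   row = sentiment_row(boto3.client("comprehend"), "bucket/file.txt",
#                       "Five Stars.good", "2019-04-26 14:17:32")
```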

In [30]:
df_result = pd.read_csv('data/result_amazon_grocery.csv')
df_result.head()

Unnamed: 0,imagelocation,timestamp,sentiment,positive,negative,neutral,mixed
0,review-sentiment-s3-1ewrtbdvfi15z/amazon_grocery_review_id_R1CMEO6BCHWC3K.txt,2019-04-26 14:17:32,POSITIVE,0.910112,0.001168,0.000472,0.088247
1,review-sentiment-s3-1ewrtbdvfi15z/amazon_grocery_review_id_R1E8LRM1528KK8.txt,2019-04-26 13:46:50,POSITIVE,0.996879,0.000103,0.001988,0.00103
2,review-sentiment-s3-1ewrtbdvfi15z/amazon_grocery_review_id_R1D188PYGVD7GT.txt,2019-04-26 13:45:49,POSITIVE,0.536372,0.044552,0.018155,0.400921
3,review-sentiment-s3-1ewrtbdvfi15z/amazon_grocery_review_id_R1108D7VO1N7SZ.txt,2019-04-26 14:13:07,POSITIVE,0.998824,9e-06,0.00048,0.000688
4,review-sentiment-s3-1ewrtbdvfi15z/amazon_grocery_review_id_R1V5RJ1AO38HBQ.txt,2019-04-26 14:18:26,NEGATIVE,0.005037,0.833196,0.000521,0.161246
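
To read these columns: in the rows shown, the four scores add up to roughly 1 and the `sentiment` label matches the class with the highest score. The sketch below checks both properties on the five rows printed above. It assumes nothing beyond those values, and `top_label` is an illustrative helper.

```python
# Scores copied from the five rows of the result table above:
# (sentiment, positive, negative, neutral, mixed)
rows = [
    ("POSITIVE", 0.910112, 0.001168, 0.000472, 0.088247),
    ("POSITIVE", 0.996879, 0.000103, 0.001988, 0.00103),
    ("POSITIVE", 0.536372, 0.044552, 0.018155, 0.400921),
    ("POSITIVE", 0.998824, 9e-06, 0.00048, 0.000688),
    ("NEGATIVE", 0.005037, 0.833196, 0.000521, 0.161246),
]
LABELS = ("POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED")


def top_label(scores):
    """Return the label whose score is largest."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best_index]


for sentiment, *scores in rows:
    # Print the reported label, the highest-scoring label, and the score total
    print(sentiment, top_label(scores), round(sum(scores), 4))
```

The third row is a useful reminder that a POSITIVE label can still carry a sizeable `mixed` score.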


In [37]:
# The review_id is embedded in the S3 object key, so extract it with a regex
df_result['review_id'] = df_result['imagelocation'].str.extract(
    r'review-sentiment-s3-1ewrtbdvfi15z/amazon_grocery_review_id_(.*)\.txt$', expand=False)
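
The same extraction can be tried on a single value with the standard library, which makes the pattern easier to check. This is a minimal sketch: `extract_review_id` is a hypothetical helper, and the pattern deliberately ignores the bucket prefix so that it does not depend on one bucket name.

```python
import re

# Capture everything between the fixed file-name prefix and the .txt suffix
REVIEW_ID_PATTERN = re.compile(r"amazon_grocery_review_id_(.+)\.txt$")


def extract_review_id(location):
    """Return the review_id embedded in an object key, or None if absent."""
    match = REVIEW_ID_PATTERN.search(location)
    return match.group(1) if match else None


example = "review-sentiment-s3-1ewrtbdvfi15z/amazon_grocery_review_id_R1CMEO6BCHWC3K.txt"
print(extract_review_id(example))
```

Escaping the dot in `\.txt` matters: an unescaped `.` matches any character, so the pattern would also accept names that do not end in a real `.txt` extension.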

In [39]:
df_result_2 = pd.merge(df_result, df_sub, how='left', on=['review_id'])

In [53]:
df_result_2[df_result_2['sentiment']=='NEGATIVE'][['sentiment', 'positive', 'negative', 'neutral', 'mixed', 'product_id', 'product_title', 'review_headline', 'review_body']]\
    .sort_values(by=['negative'], ascending=False).head()

Unnamed: 0,sentiment,positive,negative,neutral,mixed,product_id,product_title,review_headline,review_body
857,NEGATIVE,1.7e-05,0.99978,4.4e-05,0.000159,B010EOQGCC,"Smart Sips, Chocolate Obsession Gourmet Coffee Variety Sampler Pack, 24 Count for Keurig K-cup Brewers - Chocolate Cherry Cordial, Chocolate Peanut Butter, Chocolate Amaretto, Chocolate Raspberry, White Chocolate Hazelnut Truffle, Chocolate Orange",Waste of money,No flavor. Waste of money.
844,NEGATIVE,2e-06,0.999477,0.00034,0.000181,B00A5QWXCC,Candy By The Pound - 5 Pound Bag of Chewy Spree,SCAM!!!!!!!!!!!!,i recieved a 1 pound bag of Lemonheads this is scam to get your money
440,NEGATIVE,5e-06,0.999342,4.6e-05,0.000607,B0076YG8PY,"SeaSnax Grab and Go Roasted Seaweed Snack, Spicy Chipotle, 0.18-Ounce (Pack of 6)",These were absolutely horrible and I ended up throwing them away,These were absolutely horrible and I ended up throwing them away. What a complete waste of money. Took one bite and spit it out. Lesson learned......
424,NEGATIVE,1.4e-05,0.999176,1.3e-05,0.000797,B00IYGXFX6,"Instant Hot Cereal, Certified Paleo, Gluten & Grain Free, Unsweetened, 6.7 oz",It tasted awful. Even after adding fresh blueberries and ...,It tasted awful. Even after adding fresh blueberries and cinnamon.
868,NEGATIVE,4e-05,0.998887,0.000111,0.000962,B001FXIMWO,Halloween Mini Candy Bars Chocolate Mini Favorites Candies 5 Pound Bag,Candy Arrived Crushed.,Candy arrived crushed. Inedible.


In [54]:
df_result_2[df_result_2['sentiment']=='POSITIVE'][['sentiment', 'positive', 'negative', 'neutral', 'mixed', 'product_id', 'product_title', 'review_headline', 'review_body']]\
    .sort_values(by=['positive'], ascending=False).head()

Unnamed: 0,sentiment,positive,negative,neutral,mixed,product_id,product_title,review_headline,review_body
812,POSITIVE,0.999958,1.606578e-07,1.2e-05,3e-05,B0012271TS,"Paradise Tropical Tea, 1 Ounce Filter Packs (Pack of 50)",Love this product,"The tea bags are restaurant sized - HUGE! They brew approx 10 cups of tea each, to which i then add an additional 10 cups of water. This far exceeds my expectations. And, each bag comes individually sealed. Love this product!!"
85,POSITIVE,0.999898,1.986674e-06,7.2e-05,2.8e-05,B00TXN80IY,Sumatra Sizes,This is my favorite of the varieties-fresh beans,"This is my favorite of the varieties-fresh beans, complex flavors and outstanding value for organic coffee - just love the nose and the palate of this single source blend."
957,POSITIVE,0.99988,1.642476e-06,3.8e-05,8.1e-05,B000SATIGO,Davidson's Tea Bulk,and this is one of the best I've had,"I'm a big fan of Assam tea, and this is one of the best I've had. Very malty and satisfying!"
888,POSITIVE,0.99987,7.466485e-07,9.4e-05,3.5e-05,B00M8M2SKS,"Z Natural Foods Coconut Milk Powder, 100% USDA Certified Organic, 1 lb.",Great for Bread Recipes,This coconut powder works perfect for my bread recipes as a powdered milk substitute. The re-seal-able pouch it comes in is just bonus! Love this product.
281,POSITIVE,0.999869,8.190264e-07,3.5e-05,9.5e-05,B0001VKKOO,Bragg - All Natural Liquid Aminos All Purpose Seasoning Spray,love it,This is amazing for spraying onto popcorn if you like savory treats!
