# Sentiment Analysis: Datafiniti Amazon Consumer Reviews of Amazon Products
**The task is to develop a Python program that performs sentiment analysis on a dataset of product reviews.**

**Dataset Information:**

This dataset is a list of over 28,000 consumer reviews of Amazon products such as the Kindle and Fire TV Stick, taken from Datafiniti's Product Database and updated between February 2019 and April 2019. Each product listing includes the name Amazon in the Brand and Manufacturer field. All fields within this dataset have been flattened, with some omitted, to streamline data analysis. This version is a sample of a larger dataset; the full dataset is available through Datafiniti.

To run the sentiment analysis, download the dataset as a CSV file named `Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv` and place it in the notebook's working directory.

Kaggle link for dataset: [Datafiniti Amazon Consumer Reviews of Amazon Products](https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products?resource=download&select=Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv)

**Required libraries:**
- pandas -> Handles the dataset like a spreadsheet
- spacy -> Natural Language Processing
- textblob -> Calculates sentiment score

In [1]:
# Import libraries
import pandas as pd
import spacy
from textblob import TextBlob

# Loading spaCy Model and Dataset:

In [2]:
# Load the spaCy model (install it first with: python -m spacy download en_core_web_md)
nlp = spacy.load("en_core_web_md")

# Load dataset from CSV
amazon_df = pd.read_csv('Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv')
amazon_df.head()

Unnamed: 0,id,dateAdded,dateUpdated,name,asins,brand,categories,primaryCategories,imageURLs,keys,...,reviews.didPurchase,reviews.doRecommend,reviews.id,reviews.numHelpful,reviews.rating,reviews.sourceURLs,reviews.text,reviews.title,reviews.username,sourceURLs
0,AVpgNzjwLJeJML43Kpxn,2015-10-30T08:59:32Z,2019-04-25T09:08:16Z,AmazonBasics AAA Performance Alkaline Batterie...,"B00QWO9P0O,B00LH3DMUO",Amazonbasics,"AA,AAA,Health,Electronics,Health & Household,C...",Health & Beauty,https://images-na.ssl-images-amazon.com/images...,"amazonbasics/hl002619,amazonbasicsaaaperforman...",...,,,,,3,https://www.amazon.com/product-reviews/B00QWO9...,I order 3 of them and one of the item is bad q...,... 3 of them and one of the item is bad quali...,Byger yang,"https://www.barcodable.com/upc/841710106442,ht..."
1,AVpgNzjwLJeJML43Kpxn,2015-10-30T08:59:32Z,2019-04-25T09:08:16Z,AmazonBasics AAA Performance Alkaline Batterie...,"B00QWO9P0O,B00LH3DMUO",Amazonbasics,"AA,AAA,Health,Electronics,Health & Household,C...",Health & Beauty,https://images-na.ssl-images-amazon.com/images...,"amazonbasics/hl002619,amazonbasicsaaaperforman...",...,,,,,4,https://www.amazon.com/product-reviews/B00QWO9...,Bulk is always the less expensive way to go fo...,... always the less expensive way to go for pr...,ByMG,"https://www.barcodable.com/upc/841710106442,ht..."
2,AVpgNzjwLJeJML43Kpxn,2015-10-30T08:59:32Z,2019-04-25T09:08:16Z,AmazonBasics AAA Performance Alkaline Batterie...,"B00QWO9P0O,B00LH3DMUO",Amazonbasics,"AA,AAA,Health,Electronics,Health & Household,C...",Health & Beauty,https://images-na.ssl-images-amazon.com/images...,"amazonbasics/hl002619,amazonbasicsaaaperforman...",...,,,,,5,https://www.amazon.com/product-reviews/B00QWO9...,Well they are not Duracell but for the price i...,... are not Duracell but for the price i am ha...,BySharon Lambert,"https://www.barcodable.com/upc/841710106442,ht..."
3,AVpgNzjwLJeJML43Kpxn,2015-10-30T08:59:32Z,2019-04-25T09:08:16Z,AmazonBasics AAA Performance Alkaline Batterie...,"B00QWO9P0O,B00LH3DMUO",Amazonbasics,"AA,AAA,Health,Electronics,Health & Household,C...",Health & Beauty,https://images-na.ssl-images-amazon.com/images...,"amazonbasics/hl002619,amazonbasicsaaaperforman...",...,,,,,5,https://www.amazon.com/product-reviews/B00QWO9...,Seem to work as well as name brand batteries a...,... as well as name brand batteries at a much ...,Bymark sexson,"https://www.barcodable.com/upc/841710106442,ht..."
4,AVpgNzjwLJeJML43Kpxn,2015-10-30T08:59:32Z,2019-04-25T09:08:16Z,AmazonBasics AAA Performance Alkaline Batterie...,"B00QWO9P0O,B00LH3DMUO",Amazonbasics,"AA,AAA,Health,Electronics,Health & Household,C...",Health & Beauty,https://images-na.ssl-images-amazon.com/images...,"amazonbasics/hl002619,amazonbasicsaaaperforman...",...,,,,,5,https://www.amazon.com/product-reviews/B00QWO9...,These batteries are very long lasting the pric...,... batteries are very long lasting the price ...,Bylinda,"https://www.barcodable.com/upc/841710106442,ht..."


# Preprocessing Text Data:

**Conducting an initial exploration of the dataset to get a better understanding of the data.**
- Gather initial information about the dataset (shape and missing values per column)
- Store the 'reviews.text' column in a variable for later analysis
- Check for missing values in 'reviews.text' and drop any rows where it is missing
- Define a function that lowercases the text, strips surrounding whitespace, tokenizes it, and removes stop words using spaCy

In [20]:
# Initial Data Exploration
print("Initial Data Exploration:\n")
print(f"Amazon DataFrame Shape: {amazon_df.shape}")
print("\nAmazon DataFrame Columns and Missing Values:")
print(amazon_df.isnull().sum())

# Remove rows with missing 'reviews.text'
Cleaned_df = amazon_df.dropna(subset=['reviews.text'])

# Get 'reviews.text' column (from the cleaned data) for analysis
reviews_data = Cleaned_df['reviews.text']

# Function that removes stop words after standardizing text
def preprocess(text):
    text = str(text).lower().strip()
    doc = nlp(text)
    tokens = [token.text for token in doc if not token.is_stop]
    return " ".join(tokens)

Initial Data Exploration:

Amazon DataFrame Shape: (28332, 24)

Amazon DataFrame Columns and Missing Values:
id                         0
dateAdded                  0
dateUpdated                0
name                       0
asins                      0
brand                      0
categories                 0
primaryCategories          0
imageURLs                  0
keys                       0
manufacturer               0
manufacturerNumber         0
reviews.date               0
reviews.dateSeen           0
reviews.didPurchase    28323
reviews.doRecommend    12246
reviews.id             28291
reviews.numHelpful     12217
reviews.rating             0
reviews.sourceURLs         0
reviews.text               0
reviews.title              0
reviews.username           5
sourceURLs                 0
dtype: int64
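
The `preprocess` function above is defined but not yet applied to the data. A minimal sketch of how the cleaning steps could be applied to a whole column is shown below on a toy DataFrame. The `simple_clean` helper, the `TOY_STOP_WORDS` set, and the `clean_text` column name are hypothetical stand-ins (a plain-Python substitute for the spaCy-based `preprocess`) so the example runs without downloading a model.

```python
import pandas as pd

# Hypothetical mini stop-word set; spaCy's built-in list is much larger.
TOY_STOP_WORDS = {"the", "is", "are", "a", "of", "and"}

def simple_clean(text):
    """Plain-Python stand-in for preprocess(): lowercase, strip, drop stop words."""
    tokens = str(text).lower().strip().split()
    return " ".join(t for t in tokens if t not in TOY_STOP_WORDS)

# Toy data with one missing review to show the dropna step
toy_df = pd.DataFrame({
    "reviews.text": ["  The price is GREAT ", None, "Batteries are long lasting"],
    "reviews.rating": [5, 3, 4],
})

# Drop rows without review text, then add a cleaned column
toy_clean = toy_df.dropna(subset=["reviews.text"]).copy()
toy_clean["clean_text"] = toy_clean["reviews.text"].apply(simple_clean)

print(toy_clean[["reviews.text", "clean_text"]])
```

With the real data, the same pattern would be `Cleaned_df['reviews.text'].apply(preprocess)`. Because spaCy's English stop-word list includes negations such as "not", cleaned text may receive a different polarity score than the raw text.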


# Sentiment Analysis Function: 

**Generating a function that takes a product review as input and predicts the reviewer's sentiment. The code uses TextBlob to compute a polarity score between -1 and 1 that measures how negative or positive the review is. Input can first be cleaned with the preprocessing function defined earlier; the test below passes the raw review text directly.**
 
- Score below 0 (toward -1) -> Negative
- Score of exactly 0 -> Neutral
- Score above 0 (toward 1) -> Positive

In [21]:
# Sentiment analysis function
def analyze_sentiment(text):
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity
    if polarity > 0:
        return 'Positive'
    elif polarity < 0:
        return 'Negative'
    else:
        return 'Neutral'
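
The function above labels a review Neutral only when the polarity is exactly 0. If a wider neutral zone is wanted, one option is a tolerance band around zero. The sketch below assumes a hypothetical `label_polarity` helper and an arbitrary threshold of 0.1, neither of which comes from the original notebook. It takes an already-computed polarity score, so it runs without TextBlob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label, with a neutral band.

    `threshold` is an illustrative value, not a tuned one.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

# Example polarity scores in the range TextBlob returns
for score in (-0.45, 0.05, 0.0, 0.8):
    print(f"{score:+.2f} -> {label_polarity(score)}")
```

Setting `threshold=0` reproduces the behavior of `analyze_sentiment`.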

# Testing Sentiment Analysis Function on Sample Reviews:

**Running the sentiment function on the first 5 reviews as a quick check of how well it predicts sentiment.**

In [25]:
# Testing sentiment function on sample reviews
print("\nTesting sentiment on sample reviews:")
sample_reviews = Cleaned_df['reviews.text'].head(5)
for review in sample_reviews:
    sentiment = analyze_sentiment(review)
    score = TextBlob(review).sentiment.polarity
    print(f"Review: {review}\nSentiment: {sentiment} - Polarity Score: {score:.4f}\n")


Testing sentiment on sample reviews:
Review: I order 3 of them and one of the item is bad quality. Is missing backup spring so I have to put a pcs of aluminum to make the battery work.
Sentiment: Negative - Polarity Score: -0.4500

Review: Bulk is always the less expensive way to go for products like these
Sentiment: Negative - Polarity Score: -0.3333

Review: Well they are not Duracell but for the price i am happy.
Sentiment: Positive - Polarity Score: 0.8000

Review: Seem to work as well as name brand batteries at a much better price
Sentiment: Positive - Polarity Score: 0.5000

Review: These batteries are very long lasting the price is great.
Sentiment: Positive - Polarity Score: 0.2450
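
One rough way to quantify this check is to compare the predicted labels with the star ratings in `reviews.rating`. The sketch below uses the five ratings shown in the `head()` output (3, 4, 5, 5, 5) and the five predictions printed above. The rating-to-label mapping (1-2 Negative, 3 Neutral, 4-5 Positive) and the helper name `rating_to_label` are assumptions made for illustration, not part of the dataset.

```python
def rating_to_label(rating):
    """Assumed mapping from a 1-5 star rating to a sentiment label."""
    if rating >= 4:
        return "Positive"
    if rating <= 2:
        return "Negative"
    return "Neutral"

# Ratings and predictions for the first five reviews, copied from the outputs above
ratings = [3, 4, 5, 5, 5]
predicted = ["Negative", "Negative", "Positive", "Positive", "Positive"]

expected = [rating_to_label(r) for r in ratings]
matches = sum(p == e for p, e in zip(predicted, expected))
agreement = matches / len(ratings)

print(f"Agreement with star ratings: {matches}/{len(ratings)} ({agreement:.0%})")
```

Under this mapping the second review stands out: it carries a 4-star rating but is labeled Negative. Five reviews are far too few to draw conclusions; the same comparison could be run over the full DataFrame for a more meaningful estimate.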

