In [14]:
# This report outlines the development and evaluation of a sentiment analysis model applied to a dataset of Amazon product reviews.
# The primary objective of this project is to classify the sentiment of product reviews as positive, negative, or neutral, using natural language processing (NLP) techniques.

import spacy
import pandas as pd
from spacytextblob.spacytextblob import SpacyTextBlob

dataframe = pd.read_csv('amazon_product_reviews.csv')
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')


<spacytextblob.spacytextblob.SpacyTextBlob at 0x1e0da342c50>

In [15]:
# The dataset, named "Consumer Reviews of Amazon Products," comprises customer reviews of various products sold on Amazon. 
# Each entry includes the review text, alongside other metadata such as the product ID, user ratings, and review timestamps. 
# For the purpose of this analysis, we focused exclusively on the reviews.text column, which contains the textual content of each review.

dataframe.describe()

        reviews.id  reviews.numHelpful  reviews.rating  reviews.userCity  reviews.userProvince
count          1.0             34131.0         34627.0               0.0                   0.0
mean   111372787.0            0.630248        4.584573               NaN                   NaN
std            NaN           13.215775        0.735653               NaN                   NaN
min    111372787.0                 0.0             1.0               NaN                   NaN
25%    111372787.0                 0.0             4.0               NaN                   NaN
50%    111372787.0                 0.0             5.0               NaN                   NaN
75%    111372787.0                 0.0             5.0               NaN                   NaN
max    111372787.0               814.0             5.0               NaN                   NaN


In [16]:
# The preprocessing steps include:
# Removing Missing Values: Reviews with missing text are excluded from the analysis to ensure data quality.
# Text Cleaning: Utilizing the spaCy library, we remove punctuation, spaces, and stop words from the review text to focus on the meaningful content. 
# This step is crucial for reducing noise and improving the accuracy of the sentiment analysis.

clean_data = dataframe.dropna(subset=['reviews.text']) # removing all missing values
reviews_data = clean_data['reviews.text']

def process_review(review): # removing punctuation, spaces and stop words
    doc = nlp(review)
    token_list = [token.text for token in doc if not token.is_punct and not token.is_space and not token.is_stop]
    new_string = " ".join(token_list) # join the remaining tokens together to create a string
    return new_string

In [17]:
# The polarity scores range from -1 (very negative) to 1 (very positive), with 0 indicating a neutral sentiment.

def review_sentiment(review):
    doc = nlp(process_review(review))
    polarity = doc._.blob.polarity
    if polarity > 0:
        return 'Positive'
    elif polarity < 0:
        return 'Negative'
    else:
        return 'Neutral'

# To evaluate the model's performance, we randomly selected a sample of reviews and analyzed their sentiment.

for review in reviews_data.sample(10):
    print(f"Review : {review}")
    print(f"Sentiment: {review_sentiment(review)}")

Review : She started using it the same day and reads books every day now.
Sentiment: Neutral
Review : I bought this for myself as an early Christmas present and I'm in love. The sound quality is awesome and Alexa tells the funniest yet shortest stories in history.
Sentiment: Positive
Review : I bought it for my son. He loved it. I just don't like he can read my books at Amazon. I should be able create a sub account and limit it.
Sentiment: Positive
Review : This is a good product for reading and browsing Internet.
Sentiment: Positive
Review : Love the features and easy to set up and use.... would recommend !!!
Sentiment: Positive
Review : As d scribed
Sentiment: Neutral
Review : Got this the another day for my niece and she loves everything it has to offer
Sentiment: Neutral
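Review : I've had every Kindle since they came out and have been impressed by the innovation, design, and overall quality. This one is the exception on all counts. Lightning issues on the bottom of the reader are distracting and I don't see a visible difference from the older models. Evidently, production is of higher value than quality.
Sentiment: Positive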

**Strengths**

*Efficiency*: TextBlob's default sentiment scorer is lexicon-based, so it needs no training data or model fitting and runs on a CPU, which makes it practical to apply to the full set of reviews.

*Simplicity*: The use of spaCy and SpacyTextBlob simplifies the implementation of sentiment analysis, allowing for straightforward integration into NLP pipelines.

**Limitations**

The model may misjudge reviews that contain sarcasm, mixed opinions, or subtle phrasing, because it relies on a simple polarity score rather than on context.
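
One likely contributor lies in the preprocessing step: spaCy's default English stop-word list includes negations such as "not" and "n't", so removing stop words can delete exactly the tokens that reverse a sentence's polarity. The sketch below shows a filter that keeps negations. It works on a pre-tokenized list and uses a small hand-written stop-word set in place of spaCy's list; the sets and the function name `filter_keep_negations` are illustrative assumptions, not part of the notebook.

```python
# Hypothetical stand-in for spaCy's stop-word list; only a few entries are shown.
STOP_WORDS = {"i", "just", "do", "n't", "not", "it", "the", "is", "this"}
# Negation tokens to keep even though they appear in the stop-word set (assumed list).
NEGATIONS = {"not", "n't", "no", "never"}

def filter_keep_negations(tokens, stop_words=STOP_WORDS, negations=NEGATIONS):
    """Drop stop words but retain negation tokens."""
    kept = []
    for tok in tokens:
        low = tok.lower()
        # Keep the token if it is a negation, or if it is not a stop word at all.
        if low in negations or low not in stop_words:
            kept.append(tok)
    return kept

# "don't" is shown already split into "do" + "n't", as spaCy's tokenizer does.
tokens = ["I", "just", "do", "n't", "like", "it"]
print(filter_keep_negations(tokens))  # ["n't", 'like']
```

With spaCy tokens, the equivalent condition inside `process_review` would be `not token.is_stop or token.lower_ in NEGATIONS`. Whether the TextBlob scorer then interprets a retained "n't" token as a negation has not been tested here.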

The following predictions from the sample above are incorrect:

Review: Got this the another day for my niece and she loves everything it has to offer<br>
Sentiment: Neutral<br>
Correct sentiment: Positive

Review: I've had every Kindle since they came out and have been impressed by the innovation, design, and overall quality. This one is the exception on all counts. Lightning issues on the bottom of the reader are distracting and I don't see a visible difference from the older models. Evidently, production is of higher value than quality.<br>
Sentiment: Positive<br>
Correct sentiment: Negative

Review: She started using it the same day and reads books every day now.<br>
Sentiment: Neutral<br>
Correct sentiment: Positive
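
Spot checks like these could be complemented with a rough quantitative check. The dataset's `reviews.rating` column (1 to 5 stars) might serve as a proxy label. The sketch below assumes a mapping of 4 to 5 stars to Positive, 3 stars to Neutral, and 1 to 2 stars to Negative, which is a convention chosen here rather than something the dataset defines. The function names are hypothetical, and the ratings and predictions in the usage lines are made-up placeholders, not results from this notebook.

```python
from collections import Counter

def rating_to_label(rating):
    """Map a 1-5 star rating to a sentiment label (assumed convention)."""
    if rating >= 4:
        return "Positive"
    if rating <= 2:
        return "Negative"
    return "Neutral"

def evaluate(ratings, predictions):
    """Return accuracy and a Counter of (expected, predicted) label pairs."""
    pairs = [(rating_to_label(r), p) for r, p in zip(ratings, predictions)]
    correct = sum(1 for expected, predicted in pairs if expected == predicted)
    accuracy = correct / len(pairs) if pairs else 0.0
    return accuracy, Counter(pairs)

# Placeholder inputs for illustration only.
ratings = [5, 4, 1, 3]
predictions = ["Positive", "Neutral", "Positive", "Neutral"]
accuracy, confusion = evaluate(ratings, predictions)
print(accuracy)  # 0.5
print(confusion[("Negative", "Positive")])  # 1
```

In the notebook, `ratings` would come from `clean_data['reviews.rating']` and `predictions` from `reviews_data.apply(review_sentiment)`, after dropping any rows with a missing rating. Because the ratings skew heavily positive (mean 4.58), accuracy alone may flatter a model that predicts Positive often, so the per-pair counts are worth inspecting as well.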

**Conclusion**

This project shows that spaCy and SpacyTextBlob can be combined to analyze consumer reviews with very little code. The model identifies plainly worded sentiment, but the three misclassifications listed above show that further work is needed to handle implicit, mixed, and context-dependent sentiment.