## Introduction
Welcome to this interactive Jupyter notebook on sentiment analysis of product reviews. In this exercise you will clean text data, classify sentiment with a simple word-count approach, and score sentiment with TextBlob.

## Setup
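Install and import the required libraries. The `ssl` block below disables HTTPS certificate verification so that `nltk.download` can succeed on machines where certificate checks fail; skip it if the downloads already work.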

In [3]:
%pip install nltk scikit-learn textblob
import nltk
from sklearn.feature_extraction.text import CountVectorizer

import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/jacquelineyu/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/jacquelineyu/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

## Product Reviews
Below is a list of 25 product reviews, 15 positive followed by 10 negative, that we will analyze.

In [4]:
reviews = ['I absolutely love this product! Highly recommend to everyone.', "Fantastic quality! I'm very happy with my purchase.", 'This is the best thing I have bought in a long time!', 'Completely satisfied with the product and service.', 'Five stars, will buy again!', 'This product does exactly what it says, fantastic!', 'Incredible performance and very easy to use.', 'I am so pleased with this purchase, worth every penny!', 'Great value for money and quick delivery.', 'The best on the market, hands down!', 'Such a great purchase, very pleased!', 'Product is of high quality and super durable.', 'Surpassed my expectations, absolutely wonderful!', 'This is amazing, I love it so much!', 'The product works wonderfully and is well made.', 'Not what I expected, quite disappointed.', 'The quality is not as advertised, very upset.', 'This was a waste of money, would not buy again.', 'Poor quality and did not meet my expectations.', "I regret buying this, it's awful.", 'Terrible product, do not waste your money!', 'Very unsatisfied with the purchase, it broke within a week.', 'Not worth the price, very misleading.', "The worst purchase I've ever made!", "Disappointed with the product, it's not good at all."]
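The list carries no explicit labels, which makes it hard to check a classifier against it. The sketch below assumes, from the ordering alone, that the first 15 reviews are positive and the last 10 are negative. The names `true_labels` and `accuracy` are hypothetical helpers, not part of the original notebook.

```python
# Assumption: reviews 1-15 are positive and reviews 16-25 are negative,
# inferred only from the order in which they appear in the list.
true_labels = ['Positive'] * 15 + ['Negative'] * 10

def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError('predicted and expected must have the same length')
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)

# Tiny illustration with made-up predictions
print(accuracy(['Positive', 'Negative'], ['Positive', 'Positive']))  # 0.5
```

Any classifier that returns one label per review can be passed to `accuracy` alongside `true_labels`.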

## Text Cleaning Exercise
Clean the text data by converting to lowercase, removing punctuation, and filtering out stopwords.

In [10]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string

# Build the stopword set once; set lookups are faster than list lookups
stopword_set = set(stopwords.words('english'))

def clean_text(reviews):
    """Lowercase, tokenize, and strip punctuation and stopwords from each review."""
    cleaned_reviews = []
    for review in reviews:
        # Lowercase and tokenize the review
        tokens = word_tokenize(review.lower())
        # Remove standalone punctuation tokens
        tokens = [word for word in tokens if word not in string.punctuation]
        # Remove stopwords
        cleaned_tokens = [word for word in tokens if word not in stopword_set]
        cleaned_reviews.append(' '.join(cleaned_tokens))
    return cleaned_reviews

# Clean the reviews
cleaned_reviews = clean_text(reviews)
print(cleaned_reviews)

['absolutely love product highly recommend everyone', "fantastic quality 'm happy purchase", 'best thing bought long time', 'completely satisfied product service', 'five stars buy', 'product exactly says fantastic', 'incredible performance easy use', 'pleased purchase worth every penny', 'great value money quick delivery', 'best market hands', 'great purchase pleased', 'product high quality super durable', 'surpassed expectations absolutely wonderful', 'amazing love much', 'product works wonderfully well made', 'expected quite disappointed', 'quality advertised upset', 'waste money would buy', 'poor quality meet expectations', "regret buying 's awful", 'terrible product waste money', 'unsatisfied purchase broke within week', 'worth price misleading', "worst purchase 've ever made", "disappointed product 's good"]
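The cleaned output still contains contraction fragments such as `'m`, `'s`, and `'ve`. They survive both filters because they are neither single punctuation characters nor exact stopword matches. A minimal sketch of one way to drop them, assuming it is acceptable to keep only purely alphabetic tokens (this would also discard numbers); `drop_fragments` is a hypothetical helper.

```python
def drop_fragments(cleaned_reviews):
    """Keep only purely alphabetic tokens in each already-cleaned review."""
    return [
        ' '.join(token for token in review.split() if token.isalpha())
        for review in cleaned_reviews
    ]

sample = ["fantastic quality 'm happy purchase", "regret buying 's awful"]
print(drop_fragments(sample))
# ['fantastic quality happy purchase', 'regret buying awful']
```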


## Sentiment Analysis Exercise
Perform sentiment analysis using simple word counting. Identify positive and negative words, and classify the reviews based on the counts.

In [13]:
from sklearn.feature_extraction.text import CountVectorizer

positive_words = ['love', 'fantastic', 'best', 'incredible', 'pleased', 'great', 'amazing', 'high', 'wonderful', 'satisfied']
negative_words = ['disappointed', 'waste', 'poor', 'regret', 'terrible', 'unsatisfied', 'broke', 'worst', 'not']

# Note: 'not' is removed as a stopword during cleaning, so it never matches in cleaned_reviews.
# With a fixed vocabulary, the first columns count positive words and the rest count negative words.
vectorizer = CountVectorizer(vocabulary=positive_words + negative_words)

def analyze_sentiment(reviews):
    results = []
    for review in reviews:
        # Count how often each lexicon word occurs in the review
        counts = vectorizer.fit_transform([review]).toarray()[0]
        positive_count = counts[:len(positive_words)].sum()
        negative_count = counts[len(positive_words):].sum()
        # Ties, including reviews with no lexicon words, are labeled 'Negative'
        if positive_count > negative_count:
            sentiment = 'Positive'
        else:
            sentiment = 'Negative'
        results.append((review, sentiment))
    return results

# Analyze the sentiment of cleaned reviews
sentiment_results = analyze_sentiment(cleaned_reviews)
for result in sentiment_results:
    print(result)
    
#TODO: Are the reviews mostly positive or negative?
# the reviews are mostly positive

('absolutely love product highly recommend everyone', 'Positive')
("fantastic quality 'm happy purchase", 'Positive')
('best thing bought long time', 'Positive')
('completely satisfied product service', 'Positive')
('five stars buy', 'Negative')
('product exactly says fantastic', 'Positive')
('incredible performance easy use', 'Positive')
('pleased purchase worth every penny', 'Positive')
('great value money quick delivery', 'Positive')
('best market hands', 'Positive')
('great purchase pleased', 'Positive')
('product high quality super durable', 'Positive')
('surpassed expectations absolutely wonderful', 'Positive')
('amazing love much', 'Positive')
('product works wonderfully well made', 'Negative')
('expected quite disappointed', 'Negative')
('quality advertised upset', 'Negative')
('waste money would buy', 'Negative')
('poor quality meet expectations', 'Negative')
("regret buying 's awful", 'Negative')
('terrible product waste money', 'Negative')
('unsatisfied purchase broke within
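In the output above, 'five stars buy' and 'product works wonderfully well made' are labeled Negative even though they contain no negative words: both score zero on each side, and ties fall into the `else` branch. The sketch below shows one possible alternative that reports ties as Neutral. It uses plain set lookups on whitespace-split tokens as a stand-in for `CountVectorizer`, and `classify` is a hypothetical name.

```python
POSITIVE = {'love', 'fantastic', 'best', 'incredible', 'pleased', 'great',
            'amazing', 'high', 'wonderful', 'satisfied'}
NEGATIVE = {'disappointed', 'waste', 'poor', 'regret', 'terrible',
            'unsatisfied', 'broke', 'worst', 'not'}

def classify(review):
    """Label a cleaned review by comparing positive and negative word counts."""
    tokens = review.split()
    positive_count = sum(token in POSITIVE for token in tokens)
    negative_count = sum(token in NEGATIVE for token in tokens)
    if positive_count > negative_count:
        return 'Positive'
    if negative_count > positive_count:
        return 'Negative'
    return 'Neutral'  # tie, including reviews with no lexicon words

for review in ['great purchase pleased', 'terrible product waste money', 'five stars buy']:
    print(review, '->', classify(review))
```

Exact matching also explains the second miss: 'wonderfully' is not the same token as 'wonderful', so stemming or a larger lexicon would be needed to catch it.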

In [22]:
from textblob import TextBlob

sentiments = []
sentiment_scores = []

for review in reviews:
    blob = TextBlob(review)
    # Get the sentiment score (polarity) of the review, from -1.0 to 1.0
    polarity = blob.sentiment.polarity
    sentiment_scores.append(polarity)
    # Classify the sentiment as positive, negative, or neutral
    if polarity > 0.0:
        label = 'positive'
    elif polarity < 0.0:
        label = 'negative'
    else:
        label = 'neutral'
    # Append the sentiment label to the sentiments list
    sentiments.append(label)

# Calculate the average sentiment score
average_sentiment_score = sum(sentiment_scores) / len(sentiment_scores)

# Get the overall sentiment from the average score
if average_sentiment_score > 0.0:
    overall_sentiment = 'Positive'
elif average_sentiment_score < 0.0:
    overall_sentiment = 'Negative'
else:
    overall_sentiment = 'Neutral'

for review, score in zip(reviews, sentiment_scores):
    print(review, '- Sentiment:', score)

print('Average Sentiment Score:', average_sentiment_score)
print('Overall Sentiment:', overall_sentiment)


I absolutely love this product! Highly recommend to everyone. - Sentiment: 0.3925
Fantastic quality! I'm very happy with my purchase. - Sentiment: 0.75
This is the best thing I have bought in a long time! - Sentiment: 0.46875
Completely satisfied with the product and service. - Sentiment: 0.5
Five stars, will buy again! - Sentiment: 0.0
This product does exactly what it says, fantastic! - Sentiment: 0.375
Incredible performance and very easy to use. - Sentiment: 0.7316666666666667
I am so pleased with this purchase, worth every penny! - Sentiment: 0.4375
Great value for money and quick delivery. - Sentiment: 0.5666666666666667
The best on the market, hands down! - Sentiment: 0.4027777777777778
Such a great purchase, very pleased! - Sentiment: 0.5375
Product is of high quality and super durable. - Sentiment: 0.24666666666666665
Surpassed my expectations, absolutely wonderful! - Sentiment: 1.0
This is amazing, I love it so much! - Sentiment: 0.45
The product works wonderfully and is well
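The cell above collects a label per review in `sentiments` but never summarizes them. A minimal sketch of a tally follows, using the first six polarity scores from the output as sample data. The neutral band of ±0.1 is an assumption chosen for illustration, not a TextBlob default, and `label_polarity` is a hypothetical helper.

```python
from collections import Counter

def label_polarity(score, threshold=0.1):
    """Map a polarity score to a label, treating small magnitudes as neutral."""
    if score > threshold:
        return 'positive'
    if score < -threshold:
        return 'negative'
    return 'neutral'

# First six polarity scores printed in the output above
scores = [0.3925, 0.75, 0.46875, 0.5, 0.0, 0.375]
labels = [label_polarity(score) for score in scores]
print(Counter(labels))
# Counter({'positive': 5, 'neutral': 1})
```

A wider band marks weakly scored reviews as neutral instead of forcing them into a class; setting `threshold=0.0` reproduces the strict sign-based rule used in the cell.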

## Conclusion
Congratulations on completing this exercise! You've learned how to clean text data and perform basic sentiment analysis with both a word-count lexicon and TextBlob polarity scores.