### Sentiment Analysis (w/ Machine Learning)

Sentiment analysis is the process of determining the sentiment or emotional tone expressed in a piece of text. It typically classifies text as positive, negative, or neutral, and is commonly applied to opinions, reviews, social media posts, and customer feedback.

#### Common sentiment categories:

- Positive Sentiment: Text that expresses happiness, satisfaction, or approval (e.g., "I love this product!").
- Negative Sentiment: Text that expresses dissatisfaction, anger, or disapproval (e.g., "I hate waiting in long lines!").
- Neutral Sentiment: Text that is neither positive nor negative, and is more factual or indifferent (e.g., "The meeting starts at 3 PM.").

In short, Sentiment Analysis helps to automatically assess and categorize the emotions or opinions expressed in text, which can be useful in various domains such as marketing, customer service, and social media monitoring.
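
To make the three categories concrete, here is a minimal sketch of a word-count classifier applied to the example sentences above. The word lists and the `classify_sentiment` helper are invented for illustration and are not part of the notebook; real lexicons such as VADER's are far larger and also handle negation and intensity.

```python
# Toy word lists, invented for illustration only; real lexicons are far larger.
POSITIVE_WORDS = {"love", "great", "good", "excellent", "happy"}
NEGATIVE_WORDS = {"hate", "bad", "terrible", "awful", "angry"}


def classify_sentiment(text: str) -> str:
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    # Lowercase and strip basic punctuation so "product!" matches "product"
    words = [word.strip(".,!?\"'") for word in text.lower().split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in [
    "I love this product!",
    "I hate waiting in long lines!",
    "The meeting starts at 3 PM.",
]:
    print(f"{sentence!r} -> {classify_sentiment(sentence)}")
```

With these toy lists the three sentences come out as positive, negative, and neutral respectively. A sentence with no listed word always falls into the neutral bucket, which is one reason practical systems rely on larger lexicons or trained models.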


---


#### Real-Life Application of Sentiment Analysis Using the Amazon Dataset &rarr; [Amazon_Dataset.csv](https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/amazon.csv)


In [14]:
# Import
import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.metrics import classification_report, confusion_matrix

nltk.download("vader_lexicon")
nltk.download("stopwords")
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("omw-1.4")

In [6]:
# Dataset
df = pd.read_csv("../data/Amazon_Dataset.csv")
print(df.head())

                                          reviewText  Positive
0  This is a one of the best apps acording to a b...         1
1  This is a pretty good version of the game for ...         1
2  this is a really cool game. there are a bunch ...         1
3  This is a silly game and can be frustrating, b...         1
4  This is a terrific game on any pad. Hrs of fun...         1


In [12]:
# Text cleaning and preprocessing
# Build the stop-word set and the lemmatizer once instead of once per token/review
stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()


def preprocess_text(text):
    # Lowercase and tokenize
    tokens = word_tokenize(text.lower())

    # Remove stop words
    filtered_tokens = [token for token in tokens if token not in stop_words]

    # Lemmatize each remaining token
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]

    # Join the tokens back into a single string
    return " ".join(lemmatized_tokens)


df["reviewText2"] = df["reviewText"].apply(preprocess_text)

In [13]:
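# VADER sentiment analyzer from NLTK
analyzer = SentimentIntensityAnalyzer()


def get_sentiment(text):
    # polarity_scores returns "neg", "neu", "pos", and "compound" scores
    scores = analyzer.polarity_scores(text)

    # Label the review positive (1) if any share of it scores positive, else 0
    return 1 if scores["pos"] > 0 else 0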


df["sentiment"] = df["reviewText2"].apply(get_sentiment)
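
The rule above labels a review as positive whenever `scores["pos"]` is nonzero. A common alternative is to threshold VADER's `compound` score, conventionally at ±0.05. The sketch below shows that rule on hand-written score dictionaries so it runs without the lexicon; `label_from_scores` and the example values are hypothetical, and the effect on this dataset has not been measured here.

```python
def label_from_scores(scores: dict, threshold: float = 0.05) -> str:
    """Map a VADER-style score dict to a label using its 'compound' value."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hand-written score dicts standing in for analyzer.polarity_scores(text) output
examples = {
    "clearly positive": {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.8},
    "mixed, leaning negative": {"neg": 0.4, "neu": 0.4, "pos": 0.2, "compound": -0.5},
    "factual": {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
}
for name, scores in examples.items():
    print(name, "->", label_from_scores(scores))
```

In the notebook, the dictionary would come from `analyzer.polarity_scores(text)`. Note that the second example has `pos > 0`, so the `get_sentiment` rule would label it 1 even though its overall score is negative.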

In [17]:
# Evaluation: compare predictions against the labeled "Positive" column
print("Confusion Matrix:\n", confusion_matrix(df["Positive"], df["sentiment"]))
print()
print(
    "Classification Report:\n", classification_report(df["Positive"], df["sentiment"])
)

Confusion Matrix:
 [[ 1131  3636]
 [  576 14657]]

Classification Report:
               precision    recall  f1-score   support

           0       0.66      0.24      0.35      4767
           1       0.80      0.96      0.87     15233

    accuracy                           0.79     20000
   macro avg       0.73      0.60      0.61     20000
weighted avg       0.77      0.79      0.75     20000
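
To see where the report's numbers come from, the following sketch recomputes them from the confusion matrix printed above, using only plain Python. It assumes scikit-learn's convention that rows are true labels and columns are predicted labels; `per_class_metrics` is a helper written for this illustration.

```python
# Confusion matrix copied from the output above: rows = true label, cols = predicted
matrix = [[1131, 3636],
          [576, 14657]]

total = sum(sum(row) for row in matrix)


def per_class_metrics(matrix, cls):
    """Return (precision, recall, f1) for class index `cls`."""
    true_positive = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum
    actual = sum(matrix[cls])                    # row sum
    precision = true_positive / predicted
    recall = true_positive / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for cls in (0, 1):
    precision, recall, f1 = per_class_metrics(matrix, cls)
    print(f"class {cls}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

accuracy = (matrix[0][0] + matrix[1][1]) / total
# Baseline: always predict the majority class (label 1)
majority_baseline = max(sum(row) for row in matrix) / total
print(f"accuracy={accuracy:.2f} majority_baseline={majority_baseline:.2f}")
```

The recomputed values match the report. The accuracy of 0.79 is only about three points above the 0.76 obtained by always predicting the majority class. The 0.24 recall for class 0 means most negative reviews are being labeled positive.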

