## Naive Bayes Implementation for Predictions

In [1]:
# Standard library imports
import re

# Third-party library imports
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import nltk
from datasets import load_dataset
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import MultinomialNB

# NLTK-specific download
nltk.download("punkt")
from nltk.corpus import stopwords 

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/samieahmad/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## 1. Loading the Datasets

We are dealing with these datasets:

1. **Golf Dataset**: This dataset aims to explore factors that influence the decision to play golf, which could be valuable for predictive modeling tasks.

2. **Tweet Evaluation Dataset**: Instead of downloading the dataset manually, we use the [`datasets`](https://huggingface.co/docs/datasets) library from Hugging Face to download and manage the Tweet Eval dataset automatically. The library is part of the Hugging Face ecosystem, which is widely used for Natural Language Processing (NLP) tasks. Besides downloading the data, it offers a standardized interface for accessing and handling it, making it compatible with other popular libraries like Pandas and PyTorch. We convert each split of the dataset into a Pandas DataFrame with the columns `text` and `label`, where `text` is the tweet and `label` is the emotion label. The goal is to classify tweets into emotional categories (e.g., joy, sadness, anger) by analyzing their content.


In [2]:
# Load the golf dataset from a local CSV file
golf_data = pd.read_csv('golf_data.csv')  # Replace with correct path

In [3]:
# Download the emotion subset of Tweet Eval, caching it in the local "datasets" folder
tweet_data = load_dataset('tweet_eval', 'emotion', cache_dir="datasets")


##### Before proceeding with further tasks, we determine which type of Naive Bayes is most suitable for each dataset.


We will use **Bernoulli Naive Bayes** for the golf dataset and **Multinomial Naive Bayes** for the tweet dataset.


In [4]:
train_data = pd.DataFrame(tweet_data['train'])
train_data.to_csv('emotion_train_data.csv', index=False)

validation_data = pd.DataFrame(tweet_data['validation'])
validation_data.to_csv('emotion_validation_data.csv', index=False)

test_data = pd.DataFrame(tweet_data['test'])
test_data.to_csv('emotion_test_data.csv', index=False)

print("Shape of tweets train data: ", train_data.shape)
print("Shape of tweets validation data: ", validation_data.shape)
print("Shape of tweets test data: ", test_data.shape)

Shape of tweets train data:  (3257, 2)
Shape of tweets validation data:  (374, 2)
Shape of tweets test data:  (1421, 2)


## 2. Data Preprocessing

### 2.1 Preprocessing the Golf Dataset

We encode each two-valued categorical feature as 0/1 and use `sklearn`'s `train_test_split` to split the data, keeping test size = 0.3.

In [5]:

# Encode two-valued categorical features as 0/1 using the 'map' function
golf_data['Month'] = golf_data['Month'].map({'Winter': 1, 'Non-Winter': 0})
golf_data['Season'] = golf_data['Season'].map({'Winter': 1, 'Non-Winter': 0})
golf_data['Temperature'] = golf_data['Temperature'].map({'high': 1, 'low': 0})
golf_data['Humidity'] = golf_data['Humidity'].map({'high': 1, 'low': 0})
golf_data['Outlook'] = golf_data['Outlook'].map({'sunny': 1, 'not sunny': 0})
golf_data['Crowdedness'] = golf_data['Crowdedness'].map({'high': 1, 'not high': 0})

# Split the features (X) and target (y)
X = golf_data.drop(columns='Play')
y = golf_data['Play']

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(f"Training dataset shape: {X_train.shape}, {y_train.shape}")
print(f"Test dataset shape: {X_test.shape}, {y_test.shape}")


Training dataset shape: (5365, 8), (5365,)
Test dataset shape: (2300, 8), (2300,)


### 2.2 Preprocessing the Tweet Eval Dataset

We are going to pre-process the data to ensure it's in a clean format for further analysis. The following steps should be performed:

- Remove any URL.
- Remove punctuation and non-alphanumeric characters.
- Convert all text to lowercase.
- Remove any extra whitespace.
- Eliminate common stopwords.


In [6]:
nltk.download('stopwords')

# Build the stopword set once instead of on every call
STOP_WORDS = set(stopwords.words('english'))

def preprocess_dataset(text):
    text = re.sub(r'http\S+|www\S+', '', text)    # Remove URLs (http\S+ also matches https)
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)    # Remove punctuation and non-alphanumeric characters
    text = text.lower()                           # Convert to lowercase
    text = re.sub(r'\s+', ' ', text).strip()      # Collapse extra whitespace
    # Remove stopwords
    text = ' '.join(word for word in text.split() if word not in STOP_WORDS)
    return text

train_data['processed_text'] = train_data['text'].apply(preprocess_dataset)
validation_data['processed_text'] = validation_data['text'].apply(preprocess_dataset)
test_data['processed_text'] = test_data['text'].apply(preprocess_dataset)




[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/samieahmad/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
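
As a quick illustration of the cleaning steps listed above, the sketch below runs the same sequence of operations on a made-up tweet. To keep it runnable without any downloads, it uses a tiny hard-coded stopword set (`DEMO_STOPWORDS`, a hypothetical stand-in for NLTK's English list), so the exact words removed may differ from the notebook's output.

```python
import re

# Hypothetical mini stopword list; the notebook uses NLTK's English list instead.
DEMO_STOPWORDS = {"i", "am", "so", "the", "a", "at", "this", "is", "to"}

def clean_tweet(text, stop_words=DEMO_STOPWORDS):
    text = re.sub(r"http\S+|www\S+", "", text)    # drop URLs
    text = re.sub(r"[^a-zA-Z0-9\s]", "", text)    # drop punctuation and symbols
    text = text.lower()                           # normalise case
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    return " ".join(w for w in text.split() if w not in stop_words)

sample = "I am SO happy at the beach!!!   https://example.com/pic #sunny"
cleaned = clean_tweet(sample)
print(cleaned)  # happy beach sunny
```

Note that the hashtag symbol is stripped but the word after it is kept, so hashtags still contribute to the vocabulary.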


## 3. Implementing Naive Bayes from Scratch

### 3.1 Bernoulli Naive Bayes

#### From Scratch

Recall that the Bernoulli Naive Bayes model is based on **Bayes' Theorem**:

$$
P(y \mid x) = \frac{P(x \mid y)P(y)}{P(x)}
$$

What we really want is to find the class $c$ that maximizes $P(c \mid x)$. Since $P(x)$ is the same for every class, we can drop it and use the following equation:

$$
\hat{c} = \underset{c}{\text{argmax}} \ P(c \mid x) = \underset{c}{\text{argmax}} \ P(x \mid c)P(c)
$$

In the case of **Bernoulli Naive Bayes**, we assume that each vocabulary word $x_i$ follows a **Bernoulli distribution**, meaning that the word either appears (1) or does not appear (0) in the document. Combined with the Naive Bayes assumption that words are conditionally independent given the class, this simplifies the formula to:

$$
\hat{c} = \underset{c}{\text{argmax}} \ P(c) \prod_{i=1}^{n} P(x_i = 1 \mid c)^{x_i} P(x_i = 0 \mid c)^{1 - x_i}
$$

Where:

- $x_i = 1$ if the $i^{\text{th}}$ word is present in the document.
- $x_i = 0$ if the $i^{\text{th}}$ word is not present in the document.
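
To make the product concrete, here is a minimal sketch that evaluates $P(x \mid c)$ for one document. The probabilities and the document vector are made up for illustration. The point to notice is that absent words also contribute, through the factor $1 - P(x_i = 1 \mid c)$.

```python
def bernoulli_likelihood(x, p_present):
    """P(x | c) for a binary vector x, given P(x_i = 1 | c) for each word."""
    likelihood = 1.0
    for x_i, p in zip(x, p_present):
        # A present word contributes p, an absent word contributes (1 - p).
        likelihood *= p if x_i == 1 else (1.0 - p)
    return likelihood

p_present = [0.8, 0.1, 0.5]   # made-up probabilities for three vocabulary words
doc = [1, 0, 1]               # words 1 and 3 present, word 2 absent
likelihood = bernoulli_likelihood(doc, p_present)
print(likelihood)             # equals 0.8 * 0.9 * 0.5 up to floating-point rounding
```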


We can estimate $P(c)$ by counting the number of times each class appears in our training data, and dividing by the total number of training examples. We can estimate $P(x_i = 1 \mid c)$ by counting the number of documents in class $c$ that contain the word $x_i$, and dividing by the total number of documents in class $c$.

#### Important: Laplace Smoothing

When calculating $P(x_i = 1 \mid c)$ and $P(x_i = 0 \mid c)$, we apply **Laplace smoothing** to avoid zero probabilities. Without it, any word that never appears in a training document of class $c$ gets $P(x_i = 1 \mid c) = 0$, which makes the whole product zero for every document containing that word, regardless of the other evidence.

**Reason**: Laplace smoothing adds 1 to the numerator and 2 to the denominator, one for each of the two possible values of $x_i$ (present or absent). This is equivalent to adding one imaginary document of class $c$ that contains the word and one that does not, so the estimate always lies strictly between 0 and 1.

The smoothed probability formula is:

$$
P(x_i = 1 \mid c) = \frac{(\text{count of documents in class } c \text{ where } x_i = 1) + 1}{(\text{total documents in class } c) + 2}
$$

This ensures no word has a zero probability, even if it was unseen in the training data.
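
The effect of smoothing is easiest to see on a tiny example. The sketch below uses three made-up documents and a hypothetical helper, `presence_probabilities`, to estimate $P(x_i = 1 \mid c)$ with and without the $+1$ / $+2$ correction.

```python
# Toy binary document-term data: each row is (word-presence vector, class label).
docs = [
    ([1, 0, 1], "joy"),
    ([1, 1, 0], "joy"),
    ([0, 0, 1], "sad"),
]

def presence_probabilities(docs, label, smooth=True):
    """Estimate P(x_i = 1 | label) for every word, optionally Laplace-smoothed."""
    rows = [x for x, y in docs if y == label]
    n_docs = len(rows)
    counts = [sum(col) for col in zip(*rows)]  # documents containing each word
    if smooth:
        return [(c + 1) / (n_docs + 2) for c in counts]
    return [c / n_docs for c in counts]

raw = presence_probabilities(docs, "sad", smooth=False)
smoothed = presence_probabilities(docs, "sad", smooth=True)
print(raw)       # [0.0, 0.0, 1.0]
print(smoothed)  # every entry lies strictly between 0 and 1
```

The unsmoothed estimate contains an exact 1.0 as well as zeros; that is just as harmful, because it forces $P(x_i = 0 \mid c) = 0$ for that word.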

#### Avoiding Underflow with Logarithms

To avoid underflow errors due to multiplying small probabilities, we apply logarithms, which convert the product into a sum:

$$
\hat{c} = \underset{c}{\text{argmax}} \ \log P(c) + \sum_{i=1}^{n} \left[ x_i \log P(x_i = 1 \mid c) + (1 - x_i) \log P(x_i = 0 \mid c) \right]
$$
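
A short experiment shows why the logarithm matters. The 400 probabilities below are made up; with a vocabulary of several thousand words, as in the tweet data, the product has far more factors than this.

```python
import math

probs = [0.01] * 400          # 400 made-up per-word probabilities

product = 1.0
for p in probs:
    product *= p              # the true value, 1e-800, is too small for a double

log_sum = sum(math.log(p) for p in probs)   # stays finite

print(product)   # 0.0
print(log_sum)
```

Once the product has collapsed to `0.0` for every class, `argmax` can no longer tell the classes apart, whereas the log scores remain comparable.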




In [7]:


# Calculate the priors for each class
def calculatePriors(y_train):
    unique_classes, counts = np.unique(y_train, return_counts=True)
    length = len(y_train)
    priors = counts / length
    return dict(zip(unique_classes, priors))

#Calculate the likelihoods for each class with Laplace smoothing
def calculateLikelihoods(X_train, y_train):
    unique_classes = np.unique(y_train)
    likelihoods = {}

    for c in unique_classes:
        class_indices = np.where(y_train == c)[0]
        class_data = X_train[class_indices]  # Filter the data for class c

        feature_sums = class_data.sum(axis=0)
        smoothed_probabilities = (feature_sums + 1) / (len(class_data) + 2)  # Add 1 for Laplace smoothing
        likelihoods[c] = smoothed_probabilities
    
    return likelihoods


def bnbClassifier(test_data, priors, likelihoods):
    unique_classes = list(priors.keys())  # Get unique class labels
    predictions = []
    
    for sample in test_data:
        log_posteriors = []  # Stores the log posterior of each class
        
        for c in unique_classes:
            log_prior = np.log(priors[c])

            log_likelihood_present = np.log(likelihoods[c]) * sample
            log_likelihood_absent = np.log(1 - likelihoods[c]) * (1 - sample)
            log_likelihood_total = np.sum(log_likelihood_present + log_likelihood_absent)

            log_posterior = log_prior + log_likelihood_total
            log_posteriors.append(log_posterior)

        predicted_class = unique_classes[np.argmax(log_posteriors)]
        predictions.append(predicted_class)
    
    return np.array(predictions)

In [8]:
# Fit the manual Bernoulli model on the golf training split and predict the test split

priors = calculatePriors(np.array(y_train))
likelihoods = calculateLikelihoods(np.array(X_train), np.array(y_train))
y_predicted = bnbClassifier(np.array(X_test), priors, likelihoods)
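
The per-sample loop in `bnbClassifier` can also be written as two matrix products. The following is a sketch of that idea on made-up toy data, not a drop-in replacement: the function name `bernoulli_nb_predict` and its array-based arguments are assumptions that differ from the dictionary-based interface used above.

```python
import numpy as np

def bernoulli_nb_predict(X, classes, priors, p_present):
    """Vectorised Bernoulli NB prediction.

    X         : (n_samples, n_features) binary array
    classes   : (n_classes,) array of labels
    priors    : (n_classes,) array of P(c)
    p_present : (n_classes, n_features) array of smoothed P(x_i = 1 | c)
    """
    log_present = np.log(p_present)
    log_absent = np.log(1.0 - p_present)
    # Row s, column c holds log P(c) + sum_i [x_i log p + (1 - x_i) log(1 - p)].
    log_posteriors = np.log(priors) + X @ log_present.T + (1 - X) @ log_absent.T
    return classes[np.argmax(log_posteriors, axis=1)]

# Toy data (made up): feature 0 signals class 1, feature 1 signals class 0.
X_toy = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]])
y_toy = np.array([1, 1, 0, 0, 1])

classes, counts = np.unique(y_toy, return_counts=True)
priors = counts / len(y_toy)
# Laplace-smoothed presence probabilities, one row per class.
p_present = np.array([(X_toy[y_toy == c].sum(axis=0) + 1) / ((y_toy == c).sum() + 2)
                      for c in classes])

preds = bernoulli_nb_predict(np.array([[1, 0], [0, 1]]), classes, priors, p_present)
print(preds)  # [1 0]
```

On a few thousand rows with eight features the loop is fast enough, so the vectorised form is mainly useful for larger feature matrices.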

Having trained the Bernoulli Naive Bayes model on the golf training data and generated predictions for the test set, we now report its metrics.

In [9]:

accuracy = accuracy_score(y_test, y_predicted)

precision_class0 = precision_score(y_test, y_predicted, pos_label=0)
recall_class0 = recall_score(y_test, y_predicted, pos_label=0)
f1_score_class0 = f1_score(y_test, y_predicted, pos_label=0)

precision_class1 = precision_score(y_test, y_predicted, pos_label=1)
recall_class1 = recall_score(y_test, y_predicted, pos_label=1)
f1_score_class1 = f1_score(y_test, y_predicted, pos_label=1)

# Store the matrix under a new name so the imported confusion_matrix function is not overwritten
golf_confusion_matrix = confusion_matrix(y_test, y_predicted)

# Report the results for both classes
print(f"Accuracy: {accuracy:.3f}")
print()
print(f"Precision (Class 0): {precision_class0:.3f}")
print(f"Precision (Class 1): {precision_class1:.3f}")
print()
print(f"Recall (Class 0): {recall_class0:.3f}")
print(f"Recall (Class 1): {recall_class1:.3f}")
print()
print(f"F1 Score (Class 0): {f1_score_class0:.3f}")
print(f"F1 Score (Class 1): {f1_score_class1:.3f}")
print()
print("Confusion Matrix:")
print(golf_confusion_matrix)


Accuracy: 0.820

Precision (Class 0): 0.824
Precision (Class 1): 0.519

Recall (Class 0): 0.993
Recall (Class 1): 0.034

F1 Score (Class 0): 0.900
F1 Score (Class 1): 0.063

Confusion Matrix:
[[1872   13]
 [ 401   14]]
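
Every metric above can be recomputed by hand from the confusion matrix, which is a useful check on how the numbers relate. The sketch below assumes `sklearn`'s convention that rows are true classes and columns are predicted classes; `per_class_metrics` is a hypothetical helper written for this illustration.

```python
# Confusion matrix reported above: rows are true classes, columns are predictions.
cm = [[1872, 13],
      [401, 14]]

def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k from a square confusion matrix."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)   # column sum: everything predicted as k
    actual_k = sum(cm[k])                     # row sum: everything truly k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
p0, r0, f1_0 = per_class_metrics(cm, 0)
p1, r1, f1_1 = per_class_metrics(cm, 1)
print(f"{accuracy:.3f} | {p0:.3f} {r0:.3f} {f1_0:.3f} | {p1:.3f} {r1:.3f} {f1_1:.3f}")
# 0.820 | 0.824 0.993 0.900 | 0.519 0.034 0.063
```

The matrix also explains the low class 1 recall: the model predicts class 0 for 2273 of the 2300 test samples, so only 14 of the 415 true class 1 samples are recovered.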


### 3.2 Multinomial Naive Bayes (Manual Implementation)

#### Vectorizing sentences with Bag of Words

Now that we have loaded in our data, we will need to vectorize our sentences - this is necessary to be able to numericalize our inputs before feeding them into our model. 

We will be using a Bag of Words approach to vectorize our sentences. This is a simple approach that counts the number of times each word appears in a sentence. 

The element at index $i$ of the vector will be the number of times the $i^{\text{th}}$ word in our vocabulary appears in the sentence. So, for example, if our vocabulary is `["the", "cat", "sat", "on", "mat"]`, and our sentence is `"the cat sat on the mat"`, then our vector will be `[2, 1, 1, 1, 1]`.

We will now create a `BagOfWords` class to vectorize our sentences. This will involve creating:

1. A vocabulary from our corpus

2. A mapping from words to indices in our vocabulary

3. A function to vectorize a sentence in the fashion described above



In [10]:
class BagOfWords:
    def __init__(self):
        self.vocabulary = {}
    
    def fit(self, text):
        unique_words = set()  # Use a set to store unique words

        for sentence in text:
            words = sentence.split()
            unique_words.update(words)
        
        # Map each word to its index in the alphabetically sorted vocabulary
        self.vocabulary = {word: index for index, word in enumerate(sorted(unique_words))}
        
    def vectorize(self, sentence):
        vector_size = len(self.vocabulary)
        vector = np.zeros(vector_size, dtype=int)
        words = sentence.split()
        
        for word in words:
            if word in self.vocabulary:
                index = self.vocabulary[word]
                vector[index] += 1
        
        return vector


As a sanity check, we manually set the vocabulary of a `BagOfWords` object to the vocabulary of the example above and verify that the sentences are vectorized correctly.

In [11]:
bow = BagOfWords()
bow.vocabulary = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
sentence1 = "the cat sat on the mat"
sentence2 = "the cat sat"
sentence3 = "the the cat sat cat on on the mat"

vector = bow.vectorize(sentence1)
vector2 = bow.vectorize(sentence2)
vector3 = bow.vectorize(sentence3)

print("Vectorized sentence:", vector)
print("Vectorized sentence2:", vector2)
print("Vectorized sentence3:", vector3)

Vectorized sentence: [2 1 1 1 1]
Vectorized sentence2: [1 1 1 0 0]
Vectorized sentence3: [3 2 1 2 1]
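
One behaviour worth knowing before vectorizing the validation and test splits: `vectorize` silently skips any word that is not in the fitted vocabulary. The standalone sketch below mirrors that logic with a plain function (an illustrative rewrite, not the class itself) and a sentence containing two unseen words.

```python
def vectorize(sentence, vocabulary):
    """Count vector for `sentence`; words outside `vocabulary` are ignored."""
    vector = [0] * len(vocabulary)
    for word in sentence.split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

vocabulary = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
vec = vectorize("the dog sat on the sofa", vocabulary)
print(vec)  # [2, 0, 1, 1, 0]; "dog" and "sofa" add nothing
```

Because the vocabulary is built from the training split only, words that first appear in validation or test tweets carry no weight in the prediction.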


In [12]:
bow = BagOfWords()
bow.fit(train_data['processed_text'])

train_data_vectors = np.array([bow.vectorize(text) for text in train_data['processed_text']])
validation_data_vectors = np.array([bow.vectorize(text) for text in validation_data['processed_text']])
test_data_vectors = np.array([bow.vectorize(text) for text in test_data['processed_text']])

print(train_data_vectors.shape, validation_data_vectors.shape, test_data_vectors.shape)

(3257, 8515) (374, 8515) (1421, 8515)




#### From Scratch

Now that we have vectorized our sentences, we can implement our Naive Bayes model. The Naive Bayes model is based on Bayes' Theorem:

$$
P(y \mid x) = \frac{P(x \mid y)P(y)}{P(x)}
$$

What we really want is to find the class $c$ that maximizes $P(c \mid x)$, so we can use the following equation:

$$
\hat{c} = \underset{c}{\text{argmax}} \ P(c \mid x) = \underset{c}{\text{argmax}} \ P(x \mid c)P(c)
$$

We can then use the Naive Bayes assumption to simplify this:

$$
\hat{c} = \underset{c}{\text{argmax}} \ P(c \mid x) = \underset{c}{\text{argmax}} \ P(c) \prod_{i=1}^{n} P(x_i \mid c)
$$

Where $x_i$ is the $i^{\text{th}}$ word in our sentence.

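All of these probabilities can be estimated from our training data. We can estimate $P(c)$ by counting the number of times each class appears in our training data, and dividing by the total number of training examples. We can estimate $P(x_i \mid c)$ by counting the number of times the $i^{\text{th}}$ word in our vocabulary appears in sentences of class $c$, and dividing by the total number of words in sentences of class $c$. As in the Bernoulli case, the implementation applies Laplace smoothing: it adds 1 to each word count and the vocabulary size to the denominator, so that a word never seen in class $c$ does not receive zero probability.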

It would help to apply logarithms to the above equation so that we translate the product into a sum, and avoid underflow errors. This will give us the following equation:

$$
\hat{c} = \underset{c}{\text{argmax}} \ \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c)
$$
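
A worked toy example may help connect the formula to count vectors. Summing $\log P(x_i \mid c)$ over the words of a sentence is the same as weighting each vocabulary word's log-probability by its count in the sentence, which is what a dot product with a bag-of-words vector computes. The vocabulary, counts, and priors below are made up for illustration.

```python
import math

vocab = ["happy", "sad", "day"]
# Made-up total word counts per class, aligned with `vocab`.
word_counts = {"joy": [3, 0, 1], "sadness": [0, 2, 2]}
priors = {"joy": 0.5, "sadness": 0.5}

def smoothed_log_likelihoods(counts):
    """log P(word | c) with add-one smoothing over the vocabulary."""
    total = sum(counts) + len(counts)
    return [math.log((n + 1) / total) for n in counts]

def log_score(doc_counts, label):
    """log P(c) + sum_i count_i * log P(word_i | c)."""
    log_probs = smoothed_log_likelihoods(word_counts[label])
    return math.log(priors[label]) + sum(n * lp for n, lp in zip(doc_counts, log_probs))

doc = [2, 0, 1]   # "happy happy day" as a count vector
scores = {label: log_score(doc, label) for label in word_counts}
prediction = max(scores, key=scores.get)
print(prediction)  # joy
```

Here the unnormalized posteriors are $0.5 \cdot 32/343$ for `joy` and $0.5 \cdot 3/343$ for `sadness`, so `joy` wins even though "happy" never occurred in the `sadness` counts; smoothing keeps that class's score finite.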


In [13]:
class MultinomialNaiveBayes:
    def __init__(self):
        self.class_priors = None
        self.likelihoods = None
        self.classes = None

    def fit(self, X, y):
        # Get unique classes and their counts
        unique_labels, label_counts = np.unique(y, return_counts=True)
        self.classes = unique_labels
        num_classes = unique_labels.shape[0]
        self.class_priors = label_counts / y.shape[0]
        num_features = X.shape[1]
        self.likelihoods = np.zeros((num_classes, num_features))

        # Calculate likelihoods for each class
        for i, label in enumerate(self.classes):
            xclass = X[y == label] 
            word_count = xclass.sum() 
            class_feature_sums = xclass.sum(axis=0)
            smoothed_feature_sums = class_feature_sums + 1
            denominator = word_count + num_features
            self.likelihoods[i, :] = smoothed_feature_sums / denominator

    def predict(self, X):
        # Initialize log probabilities for each sample and class
        num_samples = X.shape[0]
        num_classes = len(self.classes)
        log_probabilities = np.zeros((num_samples, num_classes))

        for i, class_label in enumerate(self.classes):
            log_prior = np.log(self.class_priors[i])
            log_likelihood = np.log(self.likelihoods[i, :])
            log_probabilities[:, i] = log_prior + np.dot(X, log_likelihood.T)

        predicted_classes = np.argmax(log_probabilities, axis=1)
        
        return self.classes[predicted_classes]

We train the Multinomial Naive Bayes model on the training data and generate predictions for the validation set.

Reported below are the accuracy, precision, recall, and F1 score of this model on the validation data, as well as the confusion matrix.

In [14]:

model = MultinomialNaiveBayes()
model.fit(train_data_vectors, train_data['label'].values)
validation_predictions = model.predict(validation_data_vectors)
validation_predictions = np.array(validation_predictions)

validation_label = validation_data['label'].values

In [15]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
classes = np.unique(validation_label)  

accuracy = accuracy_score(validation_label, validation_predictions)
precision = precision_score(validation_label, validation_predictions, average= None, labels=classes)
recall = recall_score(validation_label, validation_predictions, average= None, labels=classes)
f1score = f1_score(validation_label, validation_predictions, average= None, labels=classes)
conf_matrix = confusion_matrix(validation_label, validation_predictions)


print(f"Overall Accuracy: {accuracy:.3f} \n")

print("Precision:")
for i in range(len(classes)):
    print(f"Class {classes[i]}: {precision[i]:.3f}")
print()

print("Recall:")
for i in range(len(classes)):
    print(f"Class {classes[i]}: {recall[i]:.3f}")
print()

print("F1 Score:")
for i in range(len(classes)):
    print(f"Class {classes[i]}: {f1score[i]:.3f}")
print()

confusionMatrix = confusion_matrix(validation_label, validation_predictions, labels=classes)
print("Confusion Matrix:")
print(confusionMatrix)



Overall Accuracy: 0.650 

Precision:
Class 0: 0.632
Class 1: 0.746
Class 2: 1.000
Class 3: 0.614

Recall:
Class 0: 0.881
Class 1: 0.454
Class 2: 0.143
Class 3: 0.607

F1 Score:
Class 0: 0.736
Class 1: 0.564
Class 2: 0.250
Class 3: 0.610

Confusion Matrix:
[[141   7   0  12]
 [ 38  44   0  15]
 [ 15   2   4   7]
 [ 29   6   0  54]]


#### Testing the Implementation with test data

In [16]:

model = MultinomialNaiveBayes()
model.fit(train_data_vectors, train_data['label'].values)
test_predictions = model.predict(test_data_vectors)
test_predictions = np.array(test_predictions)

test_label = test_data['label'].values

In [17]:

new_classes = np.unique(test_label)  

new_accuracy = accuracy_score(test_label, test_predictions)
new_precision = precision_score(test_label, test_predictions, average= None, labels=new_classes)
new_recall = recall_score(test_label, test_predictions, average= None, labels=new_classes)
new_f1score = f1_score(test_label, test_predictions, average= None, labels=new_classes)



print(f"Overall Accuracy: {new_accuracy:.3f} \n")

print("Precision:")
for i in range(len(new_classes)):
    print(f"Class {new_classes[i]}: {new_precision[i]:.3f}")
print()

print("Recall:")
for i in range(len(new_classes)):
    print(f"Class {new_classes[i]}: {new_recall[i]:.3f}")
print()

print("F1 Score:")
for i in range(len(new_classes)):
    print(f"Class {new_classes[i]}: {new_f1score[i]:.3f}")
print()

newconfusionMatrix = confusion_matrix(test_label, test_predictions, labels=new_classes)
print("Confusion Matrix:")
print(newconfusionMatrix)



Overall Accuracy: 0.652 

Precision:
Class 0: 0.615
Class 1: 0.772
Class 2: 0.667
Class 3: 0.662

Recall:
Class 0: 0.898
Class 1: 0.483
Class 2: 0.114
Class 3: 0.626

F1 Score:
Class 0: 0.730
Class 1: 0.595
Class 2: 0.194
Class 3: 0.643

Confusion Matrix:
[[501  19   2  36]
 [120 173   2  63]
 [ 74  12  14  23]
 [120  20   3 239]]


## 4. Implementing Naive Bayes using sklearn

In this section, we compare the manual implementations with `sklearn`'s implementations of both of the Naive Bayes models we have covered above.

In [18]:
bernoulli = BernoulliNB() 
bernoulli.fit(X_train, y_train)
y_pred_sklearn = bernoulli.predict(X_test)
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print(f"Comparison of accuracy scores between manual and sklearn Bernoulli Naive Bayes:")
print()
print(f"Sklearn Bernoulli Naive Bayes Accuracy: {accuracy_sklearn:.3f}")
accuracy_own = accuracy_score(y_test, y_predicted)
print(f"Manual Bernoulli Naive Bayes Accuracy: {accuracy_own:.3f}")

print()
print()

print("Comparison of accuracy scores between manual and sklearn Multinomial Naive Bayes:")
print()

multi_train = train_data['label'].values  
multi_validation = validation_data['label'].values
multi_test = test_data['label'].values

multinomial = MultinomialNB()
multinomial.fit(train_data_vectors, multi_train)

accuracy_sklearn_validation = accuracy_score(multi_validation, multinomial.predict(validation_data_vectors)) 
accuracy_sklearn_test = accuracy_score(multi_test, multinomial.predict(test_data_vectors))  
accuracy_own_validation = accuracy_score(multi_validation, validation_predictions)  
accuracy_own_accuracy_test = accuracy_score(multi_test, test_predictions)  

print(f"Sklearn Multinomial Naive Bayes Accuracy for validation data set: {accuracy_sklearn_validation:.3f}")
print(f"Manual Multinomial Naive Bayes Accuracy for validation data set: {accuracy_own_validation:.3f}\n")
print(f"Sklearn Multinomial Naive Bayes Accuracy for test data set: {accuracy_sklearn_test:.3f}")
print(f"Manual Multinomial Naive Bayes Accuracy for test data set: {accuracy_own_accuracy_test:.3f}")


Comparison of accuracy scores between manual and sklearn Bernoulli Naive Bayes:

Sklearn Bernoulli Naive Bayes Accuracy: 0.820
Manual Bernoulli Naive Bayes Accuracy: 0.820


Comparison of accuracy scores between manual and sklearn Multinomial Naive Bayes:

Sklearn Multinomial Naive Bayes Accuracy for validation data set: 0.650
Manual Multinomial Naive Bayes Accuracy for validation data set: 0.650

Sklearn Multinomial Naive Bayes Accuracy for test data set: 0.652
Manual Multinomial Naive Bayes Accuracy for test data set: 0.652


## 5. Conclusion

#### The factors considered when determining which dataset is more suitable for **Multinomial Naive Bayes** and which is better suited for **Bernoulli Naive Bayes**.

In choosing between the Naive Bayes classifiers, I considered two factors. The first is the type of output: the golf target is binary, while the tweet labels span four classes. On its own this does not decide the matter, because both variants can handle binary and multiclass targets. The second, and decisive, factor is the nature of the features. If the features are binary (a condition is present or absent), Bernoulli Naive Bayes is the appropriate choice. Conversely, for text data characterized by word counts or frequencies, Multinomial Naive Bayes is the better option.

The Golf dataset involves binary outcomes (0 or 1) representing whether a person plays golf or not, based on categorical features like Outlook, Windy, Holiday, etc. After encoding, these features are treated as 0/1 indicators, which is what Bernoulli Naive Bayes models: the presence or absence of specific conditions. It is therefore a suitable choice for predicting whether golf will be played.

The Tweet evaluation dataset consists of four classes (0, 1, 2, 3), each representing the emotion label of a tweet. Its features are bag-of-words count vectors, and Multinomial Naive Bayes is designed for this kind of data: it models how often each word occurs rather than only whether it occurs. In emotion classification, certain words can be strong indicators of a class, and their frequency influences the prediction. This makes Multinomial Naive Bayes a good fit for the tweet data.