## Sarcasm Detection in Movie Reviews (Model Evaluation - Machine Learning Models)
In this notebook, we evaluate several machine learning models for sarcasm detection. We start by importing the necessary libraries and loading the dataset, then split it into training and testing sets so that each model is trained on one part and evaluated on unseen data.

Next, we train each model on the training data and evaluate it with common metrics: accuracy, precision, recall, and F1 score. Finally, we tune hyperparameters with cross-validated search to make the comparison between models more robust.

**Traditional Machine Learning Models** <br>
1. Logistic Regression
2. XGBoost
3. Random Forest Classifier
4. Support Vector Machine (SVM)



### Step 1: Loading the Vectorized Data

We will evaluate the machine learning models on the tokenized data. We start by importing the necessary libraries and loading the tokenized, vectorized dataset.

In [2]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [14]:
# File path
file_path = '/content/drive/MyDrive/IMBD/Vector_dataset.csv'

In [15]:
# Read CSV file
import pandas as pd
df = pd.read_csv(file_path)

In [16]:
# Display the first 5 rows of data
df.head()

| Unnamed: 0 | Review | Sentiment | Sarcasm | Lemmatized_Review | Tokenized_Review | Sentiment_Label | Sarcasm_Label | word2vec_vector |
|---|---|---|---|---|---|---|---|---|
| 0 | One reviewers mentioned watching 1 Oz episode ... | positive | non-sarcastic | one reviewer mention watch 1 oz episode hook .... | ['one', 'reviewer', 'mention', 'watch', '1', '... | 2 | 0 | [-0.33703893 0.63750656 0.20848949 0.110051... |
| 1 | wonderful little production. filming technique... | positive | non-sarcastic | wonderful little production . film technique u... | ['wonderful', 'little', 'production', '.', 'fi... | 2 | 0 | [-2.21933369e-01 6.40139948e-01 2.48385639e-... |
| 2 | movie groundbreaking experience! I've never se... | positive | sarcastic | movie groundbreaking experience ! I have never... | ['movie', 'groundbreaking', 'experience', '!',... | 2 | 1 | [-7.50784083e-01 8.69618461e-01 6.57767776e-... |
| 3 | thought wonderful way spend time hot summer we... | positive | non-sarcastic | think wonderful way spend time hot summer week... | ['think', 'wonderful', 'way', 'spend', 'time',... | 2 | 0 | [-0.29578843 0.66404176 0.19095987 0.130039... |
| 4 | Basically there's family little boy (Jake) thi... | negative | sarcastic | basically there be family little boy ( Jake ) ... | ['basically', 'there', 'be', 'family', 'little... | 0 | 1 | [-0.36713844 0.69574437 0.21454412 0.073285... |
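
The `word2vec_vector` column is read back from CSV as text rather than as numbers. The models in this notebook use TF-IDF features instead, but if the stored vectors were needed, a minimal sketch for parsing one cell could look like this. The helper name `parse_vector` and the shortened example string are hypothetical, and the sketch assumes each cell holds space-separated floats inside square brackets, as the preview suggests.

```python
import numpy as np


def parse_vector(text: str) -> np.ndarray:
    """Convert a string such as '[0.1 -0.2 0.3]' into a float array."""
    # Drop the surrounding brackets, split on any whitespace, cast to float
    return np.array(text.strip("[]").split(), dtype=float)


# Hypothetical shortened example; real rows hold the full embedding
example = "[-0.33703893  0.63750656  0.20848949]"
vector = parse_vector(example)
print(vector.shape)
```

Under the same assumption, `np.vstack(df['word2vec_vector'].apply(parse_vector))` would give a 2-D feature matrix.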


### Step 2 : Check for Class Imbalance in the Dataset
Our dataset has two columns, Sentiment_Label and Sarcasm_Label, that represent target variables. <br>
To check for imbalance, we count the occurrences of each class within a target variable and then assess whether any class is heavily under-represented compared with the others.

In [17]:
# Check class distribution for Sarcasm_Label
sarcasm_counts = df['Sarcasm_Label'].value_counts()
print("\nSarcasm Label Distribution:")
print(sarcasm_counts)
print()

# Determine if Sarcasm_Label is imbalanced
is_imbalanced = False
for count in sarcasm_counts:
    if count < 0.2 * sarcasm_counts.sum():
        is_imbalanced = True
        break

if is_imbalanced:
    print("Sarcasm Label is imbalanced")
else:
    print("Sarcasm Label is balanced")


Sarcasm Label Distribution:
Sarcasm_Label
1    3518
0    2979
Name: count, dtype: int64

Sarcasm Label is balanced


In [18]:
# Check class distribution for Sentiment_Label
sentiment_counts = df['Sentiment_Label'].value_counts()
print("Sentiment Label Distribution:")
print(sentiment_counts)
print()

# Determine if Sentiment_Label is imbalanced
is_imbalanced = False
for count in sentiment_counts:
    if count < 0.2 * sentiment_counts.sum():
        is_imbalanced = True
        break

if is_imbalanced:
    print("Sentiment Label is imbalanced")
else:
    print("Sentiment Label is balanced")

Sentiment Label Distribution:
Sentiment_Label
0    4184
2    2300
1      13
Name: count, dtype: int64

Sentiment Label is imbalanced


**Output Explanation** :<br>

**Sarcasm_Label Distribution** : `value_counts()` returns the number of rows for each unique value in the Sarcasm_Label column.<br>

**Check Imbalance for Sarcasm_Label** : The loop `for count in sarcasm_counts` iterates over the per-class counts. If any count is below 20% of the total ***(0.2 * sarcasm_counts.sum())***, the label is flagged as imbalanced. Both classes are well above that share, so Sarcasm_Label is reported as balanced.<br>

**Sentiment_Label Distribution** : The same count is computed for the Sentiment_Label column.<br>

**Check Imbalance for Sentiment_Label** : The same 20% rule ***(0.2 * sentiment_counts.sum())*** is applied. Class 1 has only 13 of 6497 rows, so Sentiment_Label is flagged as imbalanced.<br>

The check works for both the two-class Sarcasm_Label (0 and 1) and the three-class Sentiment_Label (0, 1, and 2). Adjust the threshold (0.2 here) to suit the dataset and the imbalance criteria.<br>
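
The two imbalance checks repeat the same loop, so they can be folded into one helper. The sketch below is one possible way to do that; the function name `is_imbalanced` is hypothetical, and the two series are rebuilt from the counts printed above rather than read from the CSV.

```python
import pandas as pd


def is_imbalanced(labels: pd.Series, threshold: float = 0.2) -> bool:
    """Return True if any class holds less than `threshold` of the rows."""
    shares = labels.value_counts(normalize=True)  # per-class fraction of rows
    return bool((shares < threshold).any())


# Rebuild the two label columns from the counts printed above
sarcasm = pd.Series([1] * 3518 + [0] * 2979, name="Sarcasm_Label")
sentiment = pd.Series([0] * 4184 + [2] * 2300 + [1] * 13, name="Sentiment_Label")

for column in (sarcasm, sentiment):
    status = "imbalanced" if is_imbalanced(column) else "balanced"
    print(f"{column.name} is {status}")
```

With the real data, the same helper could be called as `is_imbalanced(df['Sarcasm_Label'])`.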

### Step 3 : Train Machine Learning Models
1. Logistic Regression

In [29]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.feature_extraction.text import TfidfVectorizer

# Extract features and labels
X_text = df['Tokenized_Review'].astype(str)  # Ensure text is treated as string
y = df['Sarcasm_Label']

# Split the dataset into training and testing sets
X_train_text, X_test_text, y_train, y_test = train_test_split(X_text, y, test_size=0.2, random_state=42)

# Convert text data to TF-IDF features
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(X_train_text)
X_test_tfidf = tfidf.transform(X_test_text)

# Train the model
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train_tfidf, y_train)

# Make predictions
y_pred = log_reg.predict(X_test_tfidf)

# Evaluate the model
print("Logistic Regression Performance:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


Logistic Regression Performance:
Accuracy: 0.8253846153846154
Classification Report:
               precision    recall  f1-score   support

           0       0.79      0.86      0.82       604
           1       0.86      0.80      0.83       696

    accuracy                           0.83      1300
   macro avg       0.83      0.83      0.83      1300
weighted avg       0.83      0.83      0.83      1300
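
The score above comes from a single train/test split. A minimal sketch of 5-fold cross-validation for the same kind of model follows. It assumes a small hypothetical list of reviews in place of `df['Tokenized_Review']`, and it wraps the vectorizer and classifier in a pipeline so that TF-IDF is refit on the training part of each fold.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the review text and Sarcasm_Label
texts = [
    "oh great another sequel nobody asked for",
    "wow what a masterpiece of boredom",
    "sure because two hours of silence is thrilling",
    "brilliant plot if you enjoy watching paint dry",
    "yeah right best acting ever not",
    "the film has a strong cast and a clear story",
    "beautiful photography and a moving soundtrack",
    "the pacing is slow but the ending works",
    "a solid thriller with good performances",
    "the script is weak and the jokes fall flat",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# The pipeline keeps vectorizer fitting inside each cross-validation fold
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# For classifiers, an integer cv uses stratified folds
scores = cross_val_score(pipeline, texts, labels, cv=5, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

On the real dataset, `texts` and `labels` would be replaced by `X_text` and `y` from the cell above.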



In [36]:
# Now, let's test the model with new random reviews
new_reviews = ['This is funny horror movie!', 'Great comedy movie.']

# Transform these new reviews using the same TF-IDF vectorizer
new_reviews_tfidf = tfidf.transform(new_reviews)

# Predict the labels for the new reviews
new_predictions = log_reg.predict(new_reviews_tfidf)

# Output the predictions for the new reviews
for review, prediction in zip(new_reviews, new_predictions):
    print(f'Review: "{review}" - Predicted Label: {prediction}')

Review: "This is funny horror movie!" - Predicted Label: 1
Review: "Great comedy movie." - Predicted Label: 0


2. XGBoost

In [40]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, classification_report

# Extract features and labels
X_text = df['Tokenized_Review'].astype(str)  # Ensure text is treated as string
y = df['Sarcasm_Label']

# Split the dataset into training and testing sets
X_train_text, X_test_text, y_train, y_test = train_test_split(X_text, y, test_size=0.2, random_state=42)

# Convert text to numerical features using TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=1000)  # Adjust max_features as needed
X_train = tfidf_vectorizer.fit_transform(X_train_text)
X_test = tfidf_vectorizer.transform(X_test_text)

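# Train the model with early stopping
# Note: the test set doubles as the early-stopping eval_set here, so the reported
# test metrics are optimistic; a separate validation split would avoid this.
# Newer XGBoost releases expect early_stopping_rounds in the constructor, not in fit().
xgb_model = XGBClassifier(use_label_encoder=False, eval_metric='logloss', max_depth=6, n_estimators=1000)
xgb_model.fit(X_train, y_train, early_stopping_rounds=10, eval_set=[(X_test, y_test)], verbose=True)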

# Make predictions
y_pred = xgb_model.predict(X_test)

# Evaluate the model
print("XGBoost Performance:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))



[0]	validation_0-logloss:0.61096
[1]	validation_0-logloss:0.56622
[2]	validation_0-logloss:0.53604
[3]	validation_0-logloss:0.51524
[4]	validation_0-logloss:0.50120
[5]	validation_0-logloss:0.49107
[6]	validation_0-logloss:0.48162
[7]	validation_0-logloss:0.47344
[8]	validation_0-logloss:0.46717
[9]	validation_0-logloss:0.46420
[10]	validation_0-logloss:0.46269
[11]	validation_0-logloss:0.45940
[12]	validation_0-logloss:0.45398
[13]	validation_0-logloss:0.45126
[14]	validation_0-logloss:0.44997
[15]	validation_0-logloss:0.44788
[16]	validation_0-logloss:0.44537
[17]	validation_0-logloss:0.44607
[18]	validation_0-logloss:0.44572
[19]	validation_0-logloss:0.44296
[20]	validation_0-logloss:0.44127
[21]	validation_0-logloss:0.44117
[22]	validation_0-logloss:0.43859
[23]	validation_0-logloss:0.43554
[24]	validation_0-logloss:0.43508
[25]	validation_0-logloss:0.43581
[26]	validation_0-logloss:0.43530
[27]	validation_0-logloss:0.43427
[28]	validation_0-logloss:0.43493
[29]	validation_0-logloss: ... (output truncated)
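
The XGBoost cell above passes the test set as the early-stopping `eval_set`. One common alternative is to hold out a separate validation set for early stopping and keep the test set for the final score only. The sketch below shows just the three-way split, on hypothetical placeholder arrays and with an assumed 60/20/20 ratio; `X_val` and `y_val` are names introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the feature matrix and the sarcasm labels
X = np.arange(100).reshape(-1, 1)
y = np.array([0, 1] * 50)

# First hold out the test set (20% of all rows)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Then carve a validation set out of the remainder (25% of 80% = 20% overall)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval
)

print(len(X_train), len(X_val), len(X_test))
```

With this split, `eval_set=[(X_val, y_val)]` would drive early stopping, and `X_test` would be used once at the end.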

In [42]:
# Now, let's test the model with new random movie reviews
new_reviews = ['This is funny horror movie!', 'Great comedy movie.']

# Transform these new reviews using the same TF-IDF vectorizer
new_reviews_tfidf = tfidf_vectorizer.transform(new_reviews)

# Predict the labels for the new reviews
new_predictions = xgb_model.predict(new_reviews_tfidf)

# Output the predictions for the new reviews
for review, prediction in zip(new_reviews, new_predictions):
    print(f'Review: "{review}" - Predicted Label: {prediction}')

Review: "This is funny horror movie!" - Predicted Label: 1
Review: "Great comedy movie." - Predicted Label: 0


3. Random Forest Classifier

In [43]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Convert text to numerical features using TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=1000)  # Adjust max_features as needed
X_train = tfidf_vectorizer.fit_transform(X_train_text)
X_test = tfidf_vectorizer.transform(X_test_text)

# Train the model
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)

# Make predictions
y_pred = rf.predict(X_test)

# Evaluate the model
print("Random Forest Performance:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


Random Forest Performance:
Accuracy: 0.8246153846153846
Classification Report:
               precision    recall  f1-score   support

           0       0.76      0.90      0.83       604
           1       0.90      0.76      0.82       696

    accuracy                           0.82      1300
   macro avg       0.83      0.83      0.82      1300
weighted avg       0.84      0.82      0.82      1300



In [44]:
# Now, let's test the model with new sarcastic movie reviews
new_reviews = ['This is a hilarious movie, NOT!', 'What a masterpiece of bad acting!', 'Great, another boring film.']

# Transform these new reviews using the same TF-IDF vectorizer
new_reviews_tfidf = tfidf_vectorizer.transform(new_reviews)

# Predict the labels for the new reviews
new_predictions = rf.predict(new_reviews_tfidf)

# Output the predictions for the new reviews
for review, prediction in zip(new_reviews, new_predictions):
    print(f'Review: "{review}" - Predicted Label: {prediction}')

Review: "This is a hilarious movie, NOT!" - Predicted Label: 1
Review: "What a masterpiece of bad acting!" - Predicted Label: 1
Review: "Great, another boring film." - Predicted Label: 1


4. Support Vector Machine (SVM)

In [45]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Convert text to numerical features using TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=1000)  # Adjust max_features as needed
X_train = tfidf_vectorizer.fit_transform(X_train_text)
X_test = tfidf_vectorizer.transform(X_test_text)

# Train the model
svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train, y_train)

# Make predictions
y_pred = svm.predict(X_test)

# Evaluate the model
print("SVM Performance:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


SVM Performance:
Accuracy: 0.8138461538461539
Classification Report:
               precision    recall  f1-score   support

           0       0.78      0.83      0.81       604
           1       0.84      0.80      0.82       696

    accuracy                           0.81      1300
   macro avg       0.81      0.81      0.81      1300
weighted avg       0.82      0.81      0.81      1300



In [46]:
# Now, let's test the model with new reviews
new_reviews = ['This is a hilarious movie, NOT!', 'What a masterpiece of bad acting!', 'Great, another boring film.']

# Transform these new reviews using the same TF-IDF vectorizer
new_reviews_tfidf = tfidf_vectorizer.transform(new_reviews)

# Predict the labels for the new reviews
new_predictions = svm.predict(new_reviews_tfidf)

# Output the predictions for the new reviews
for review, prediction in zip(new_reviews, new_predictions):
    print(f'Review: "{review}" - Predicted Label: {prediction}')

Review: "This is a hilarious movie, NOT!" - Predicted Label: 1
Review: "What a masterpiece of bad acting!" - Predicted Label: 1
Review: "Great, another boring film." - Predicted Label: 1


**Test accuracy before tuning (rounded, %)**<br>
Logistic Regression : 83<br>
XGBoost : 82<br>
Random Forest : 82<br>
SVM : 81<br>
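
The per-model cells above repeat the same fit, predict, and score steps. A possible way to collect the scores in one table is sketched below. It runs on synthetic features from `make_classification` rather than the review data, and it leaves XGBoost out to keep the example limited to scikit-learn; both are simplifications made for this illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the TF-IDF features (hypothetical data)
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(kernel="linear", random_state=42),
}

# Fit each model once and record its test metrics
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    })

results = pd.DataFrame(rows).sort_values("accuracy", ascending=False)
print(results.to_string(index=False))
```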

### Step 4 : Hyperparameter Tuning
1. Logistic Regression

In [47]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

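# Preprocess the data
# Note: TF-IDF is fitted on the full dataset before the split, so test-set
# vocabulary and IDF statistics leak into training. Fitting on the training
# split only (as in Step 3) avoids this.
tfidf_vectorizer = TfidfVectorizer(max_features=1000)
X = tfidf_vectorizer.fit_transform(df['Tokenized_Review']).toarray()

# Encode labels into numerical format
# (Sarcasm_Label is already 0/1, so the encoder leaves the values unchanged)
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df['Sarcasm_Label'])

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)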


In [48]:
# Logistic Regression Hyperparameter Tuning
log_reg_params = {
    'C': [0.01, 0.1, 1, 10, 100],
    'solver': ['liblinear', 'lbfgs']
}
log_reg = LogisticRegression(max_iter=1000)
log_reg_grid = GridSearchCV(log_reg, log_reg_params, cv=5, scoring='accuracy')
log_reg_grid.fit(X_train, y_train)

In [49]:
# Evaluate Logistic Regression
log_reg_best = log_reg_grid.best_estimator_
log_reg_pred = log_reg_best.predict(X_test)
print("Logistic Regression Best Parameters:", log_reg_grid.best_params_)
print("Logistic Regression Accuracy:", accuracy_score(y_test, log_reg_pred))
print("Logistic Regression Classification Report:\n", classification_report(y_test, log_reg_pred))


Logistic Regression Best Parameters: {'C': 1, 'solver': 'liblinear'}
Logistic Regression Accuracy: 0.8115384615384615
Logistic Regression Classification Report:
               precision    recall  f1-score   support

           0       0.79      0.81      0.80       604
           1       0.83      0.81      0.82       696

    accuracy                           0.81      1300
   macro avg       0.81      0.81      0.81      1300
weighted avg       0.81      0.81      0.81      1300
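
The tuning cells fit TF-IDF once on the whole dataset and then search over the classifier only. A minimal sketch of an alternative is shown below: the vectorizer and classifier sit in one `Pipeline`, so `GridSearchCV` refits TF-IDF inside each fold and can tune vectorizer settings too. The toy reviews and the small parameter grid are hypothetical and chosen only to keep the example fast.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the review text and Sarcasm_Label
texts = [
    "oh great another sequel nobody asked for",
    "wow what a masterpiece of boredom",
    "sure because two hours of silence is thrilling",
    "brilliant plot if you enjoy watching paint dry",
    "yeah right best acting ever not",
    "the film has a strong cast and a clear story",
    "beautiful photography and a moving soundtrack",
    "the pacing is slow but the ending works",
    "a solid thriller with good performances",
    "the script is weak and the jokes fall flat",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step names prefix the parameter names: <step>__<parameter>
param_grid = {
    "tfidf__max_features": [1000, None],
    "clf__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(texts, labels)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

Including `tfidf__max_features` in the grid also makes it possible to compare the capped vocabulary against the full one within the same search.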



2. Random Forest

In [50]:
# Random Forest Hyperparameter Tuning
rf_params = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}
rf = RandomForestClassifier(random_state=42)  # fixed seed so the search is reproducible
rf_grid = GridSearchCV(rf, rf_params, cv=5, scoring='accuracy')
rf_grid.fit(X_train, y_train)


In [51]:
# Evaluate Random Forest
rf_best = rf_grid.best_estimator_
rf_pred = rf_best.predict(X_test)
print("Random Forest Best Parameters:", rf_grid.best_params_)
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))
print("Random Forest Classification Report:\n", classification_report(y_test, rf_pred))


Random Forest Best Parameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 200}
Random Forest Accuracy: 0.8246153846153846
Random Forest Classification Report:
               precision    recall  f1-score   support

           0       0.76      0.90      0.83       604
           1       0.90      0.76      0.82       696

    accuracy                           0.82      1300
   macro avg       0.83      0.83      0.82      1300
weighted avg       0.84      0.82      0.82      1300



3. SVM

In [52]:
from sklearn.model_selection import RandomizedSearchCV
import time

# SVM Hyperparameter Tuning with RandomizedSearchCV
svm_params = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto']
}
svm = SVC()

# Use RandomizedSearchCV for faster hyperparameter tuning
svm_random_grid = RandomizedSearchCV(svm, svm_params, n_iter=20, cv=5, scoring='accuracy', random_state=42, n_jobs=-1)

start_time = time.time()
svm_random_grid.fit(X_train, y_train)
end_time = time.time()


In [53]:
# Evaluate SVM
svm_best = svm_random_grid.best_estimator_
svm_pred = svm_best.predict(X_test)
print("SVM Best Parameters:", svm_random_grid.best_params_)
print("SVM Accuracy:", accuracy_score(y_test, svm_pred))
print("SVM Classification Report:\n", classification_report(y_test, svm_pred))
print(f"Time taken for RandomizedSearchCV: {end_time - start_time} seconds")


SVM Best Parameters: {'kernel': 'poly', 'gamma': 'scale', 'C': 1}
SVM Accuracy: 0.8223076923076923
SVM Classification Report:
               precision    recall  f1-score   support

           0       0.75      0.92      0.83       604
           1       0.91      0.74      0.82       696

    accuracy                           0.82      1300
   macro avg       0.83      0.83      0.82      1300
weighted avg       0.84      0.82      0.82      1300

Time taken for RandomizedSearchCV: 981.5068371295929 seconds
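
The randomized search took roughly 16 minutes, so it helps to know how large the search space is. The sketch below counts the candidates with `ParameterGrid`. It also shows a hypothetical variant, `svm_params_split`, that varies `gamma` only for the `rbf` and `poly` kernels, since `gamma` has no effect on a linear kernel.

```python
from sklearn.model_selection import ParameterGrid

# Search space from the SVM cell above
svm_params = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf", "poly"],
    "gamma": ["scale", "auto"],
}

# Hypothetical variant: gamma is listed only for the kernels that use it
svm_params_split = [
    {"kernel": ["linear"], "C": [0.1, 1, 10, 100]},
    {"kernel": ["rbf", "poly"], "C": [0.1, 1, 10, 100], "gamma": ["scale", "auto"]},
]

print(len(ParameterGrid(svm_params)))        # 24 combinations
print(len(ParameterGrid(svm_params_split)))  # 20 distinct combinations
```

`RandomizedSearchCV` also accepts a list of dictionaries like `svm_params_split`, so the split form could be passed in place of `svm_params`.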


## Conclusion
1. **Logistic Regression** had the highest accuracy before tuning (82.5%) and scored lower after tuning (81.2%). The two runs are not directly comparable: the baseline used the full TF-IDF vocabulary, while the tuned run capped it at 1000 features.
2. **SVM** improved from 81.4% to 82.2% after tuning, while **Random Forest** stayed at 82.5%, so the two now perform comparably.<br>

**Benchmarking** : Use the highest accuracy after hyperparameter tuning as the benchmark. In this case, Random Forest (82.5%) and SVM (82.2%) both reach about 82%.

**Model Selection** : Either **Random Forest** or **SVM** can be chosen depending on specific requirements (e.g., Random Forest for handling complex feature interactions, SVM for non-linear decision boundaries).
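
To make the before/after comparison concrete, the short sketch below tabulates the test accuracies printed in Steps 3 and 4, rounded to four decimals. XGBoost is left out because its accuracy output is cut off above and it was not tuned.

```python
# Test accuracies copied from the outputs above: (before tuning, after tuning)
accuracies = {
    "Logistic Regression": (0.8254, 0.8115),
    "Random Forest": (0.8246, 0.8246),
    "SVM": (0.8138, 0.8223),
}

for model, (before, after) in accuracies.items():
    delta = (after - before) * 100  # change in percentage points
    print(f"{model:<20} {before:.2%} -> {after:.2%} ({delta:+.2f} points)")
```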
