## Sarcasm Detection in Movie Reviews (Model Evaluation)
In this notebook, we evaluate several traditional machine learning models and a deep learning model for sarcasm detection. We start by importing the necessary libraries and loading the dataset, then split it into training and testing sets so that each model is trained on one part and evaluated on unseen data.

Next, we will train the models on the training data. After training each model, we will evaluate its performance using common metrics: accuracy, precision, recall, and the F1 score. Additionally, we will use cross-validation to check the robustness of our models.
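
As a rough illustration of the cross-validation step, the sketch below wraps TF-IDF and a Random Forest in one scikit-learn pipeline and scores it with stratified folds. The toy corpus, the labels, and the choice of three folds are assumptions made only so that the example runs on its own; in this notebook the inputs would be the review text and Sarcasm_Label.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus (1 = sarcastic, 0 = not sarcastic)
texts = [
    "great movie loved it", "oh great another sequel",
    "wonderful acting and story", "sure, because that made sense",
    "beautiful film", "wow, so original, not",
    "a touching story", "yeah right, best plot ever",
    "strong performances", "what a surprise, it was bad",
    "really enjoyable", "bravo, two hours wasted",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# The pipeline refits TF-IDF inside every fold, so validation rows
# never influence the vocabulary or the IDF weights
pipeline = make_pipeline(
    TfidfVectorizer(max_features=1000),
    RandomForestClassifier(n_estimators=100, random_state=42),
)

# Stratified folds keep the class ratio similar in every split
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Reporting the spread across folds alongside the mean gives a sense of how much a single train/test split could vary.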

**Traditional Machine Learning Models** <br>

1. Random Forest Classifier
2. Support Vector Machine (SVM)

**Deep Learning Models using TensorFlow/Keras** <br>

1. Simple Neural Network (Feedforward)

### Step 1: Loading the Vectorized Data

We will evaluate the machine learning and deep learning models on vectorized data. We start by importing the necessary libraries and loading the vectorized dataset.

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# File path
file_path = '/content/drive/MyDrive/IMBD/Vector_dataset.csv'

In [3]:
# Read CSV file
import pandas as pd
df = pd.read_csv(file_path)

In [4]:
# Display the first 5 rows of data
df.head()

| Unnamed: 0 | Review | Sentiment | Sarcasm | Lemmatized_Review | Tokenized_Review | Sentiment_Label | Sarcasm_Label | word2vec_vector |
|---|---|---|---|---|---|---|---|---|
| 0 | One reviewers mentioned watching 1 Oz episode ... | positive | non-sarcastic | one reviewer mention watch 1 oz episode hook .... | ['one', 'reviewer', 'mention', 'watch', '1', '... | 2 | 0 | [-0.33703893 0.63750656 0.20848949 0.110051... |
| 1 | wonderful little production. filming technique... | positive | non-sarcastic | wonderful little production . film technique u... | ['wonderful', 'little', 'production', '.', 'fi... | 2 | 0 | [-2.21933369e-01 6.40139948e-01 2.48385639e-... |
| 2 | movie groundbreaking experience! I've never se... | positive | sarcastic | movie groundbreaking experience ! I have never... | ['movie', 'groundbreaking', 'experience', '!',... | 2 | 1 | [-7.50784083e-01 8.69618461e-01 6.57767776e-... |
| 3 | thought wonderful way spend time hot summer we... | positive | non-sarcastic | think wonderful way spend time hot summer week... | ['think', 'wonderful', 'way', 'spend', 'time',... | 2 | 0 | [-0.29578843 0.66404176 0.19095987 0.130039... |
| 4 | Basically there's family little boy (Jake) thi... | negative | sarcastic | basically there be family little boy ( Jake ) ... | ['basically', 'there', 'be', 'family', 'little... | 0 | 1 | [-0.36713844 0.69574437 0.21454412 0.073285... |
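
The preview suggests that word2vec_vector was saved to CSV as the text form of a NumPy array, so pandas reads it back as a string rather than as numbers. A minimal sketch of turning such a cell back into an array follows; `parse_vector` is a hypothetical helper, and the three-value example string is a shortened stand-in for a real cell.

```python
import numpy as np

def parse_vector(cell: str) -> np.ndarray:
    """Convert a string such as '[-0.33 0.63 0.20]' into a 1-D float array."""
    # Drop the surrounding brackets, split on any whitespace (including
    # line breaks inside long vectors), and convert each token to float
    return np.array(cell.strip().strip("[]").split(), dtype=float)

# Shortened example string (assumption: real cells hold the full vector)
example = "[-0.33703893 0.63750656 0.20848949]"
vector = parse_vector(example)
print(vector.shape)

# Possible use on the DataFrame, assuming every row has the same length:
# X_w2v = np.vstack(df['word2vec_vector'].map(parse_vector))
```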


### Step 2: Checking for Class Imbalance in the Dataset
Our dataset has two target columns, Sentiment_Label and Sarcasm_Label. <br>
To check for imbalance, we count the occurrences of each class in a target column and assess whether the counts differ significantly.

In [5]:
# Check class distribution for Sarcasm_Label
sarcasm_counts = df['Sarcasm_Label'].value_counts()
print("\nSarcasm Label Distribution:")
print(sarcasm_counts)
print()

# Determine if Sarcasm_Label is imbalanced
is_imbalanced = False
for count in sarcasm_counts:
    if count < 0.2 * sarcasm_counts.sum():
        is_imbalanced = True
        break

if is_imbalanced:
    print("Sarcasm Label is imbalanced")
else:
    print("Sarcasm Label is balanced")


Sarcasm Label Distribution:
Sarcasm_Label
1    3518
0    2979
Name: count, dtype: int64

Sarcasm Label is balanced


In [6]:
# Check class distribution for Sentiment_Label
sentiment_counts = df['Sentiment_Label'].value_counts()
print("Sentiment Label Distribution:")
print(sentiment_counts)
print()

# Determine if Sentiment_Label is imbalanced
is_imbalanced = False
for count in sentiment_counts:
    if count < 0.2 * sentiment_counts.sum():
        is_imbalanced = True
        break

if is_imbalanced:
    print("Sentiment Label is imbalanced")
else:
    print("Sentiment Label is balanced")

Sentiment Label Distribution:
Sentiment_Label
0    4184
2    2300
1      13
Name: count, dtype: int64

Sentiment Label is imbalanced


**Output Explanation**:<br>

**Sarcasm_Label Distribution**: value_counts() returns the count of each unique value in the Sarcasm_Label column.<br>

**Check Imbalance for Sarcasm_Label**: The loop over sarcasm_counts visits the count of each class. If any count is less than 20% of the total ***(0.2 * sarcasm_counts.sum())***, the label is flagged as imbalanced. Both classes (3,518 and 2,979 of 6,497 reviews) are well above that threshold, so the label is reported as balanced.<br>

**Sentiment_Label Distribution**: Similarly, value_counts() returns the count of each unique value in the Sentiment_Label column.<br>

**Check Imbalance for Sentiment_Label**: The same 20% rule ***(0.2 * sentiment_counts.sum())*** is applied. Class 1 has only 13 of 6,497 reviews, far below the threshold, so the label is flagged as imbalanced.<br>

This check works for any number of classes: two for Sarcasm_Label (0 and 1) and three for Sentiment_Label (0, 1, and 2). Adjust the threshold (0.2 in this case) to suit the dataset and the number of classes.<br>
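
The two checks share the same logic, so they could be folded into one small function. The sketch below is one way to do that; `is_imbalanced` is a hypothetical helper name, and the counts are copied from the outputs above.

```python
def is_imbalanced(counts, threshold=0.2):
    """Return True if any class holds less than `threshold` of all samples."""
    total = sum(counts)
    return any(count < threshold * total for count in counts)

# Class counts copied from the two outputs above
sarcasm_counts = [3518, 2979]
sentiment_counts = [4184, 2300, 13]

print(is_imbalanced(sarcasm_counts))    # False
print(is_imbalanced(sentiment_counts))  # True

# Share of the rarest sentiment class
print(f"{min(sentiment_counts) / sum(sentiment_counts):.2%}")  # 0.20%
```

The function only iterates and sums, so it should also accept the Series returned by value_counts() directly.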

### Traditional Machine Learning Models
1. Random Forest Classifier

In [9]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

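# Vectorize text data
# Note: the vectorizer and the scaler below are fitted on all rows before the
# split, so test-set statistics leak into the features. Fitting them on the
# training rows only would give a stricter evaluation.
vectorizer = TfidfVectorizer(max_features=1000)  # Adjust max_features as needed
X = vectorizer.fit_transform(df['Tokenized_Review'].astype(str)).toarray()

y = df['Sarcasm_Label']

# Initialize scaler and standardize the TF-IDF features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)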

# Initialize Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier
rf_classifier.fit(X_train, y_train)

# Predict on test data
y_pred_rf = rf_classifier.predict(X_test)

# Evaluate the model
print("Random Forest Classifier:")
print("Accuracy:", accuracy_score(y_test, y_pred_rf))
print(classification_report(y_test, y_pred_rf))


Random Forest Classifier:
Accuracy: 0.8238461538461539
              precision    recall  f1-score   support

           0       0.76      0.91      0.83       604
           1       0.91      0.75      0.82       696

    accuracy                           0.82      1300
   macro avg       0.83      0.83      0.82      1300
weighted avg       0.84      0.82      0.82      1300
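
The F1 and weighted-average entries in this report can be reproduced from the precision, recall, and support columns. The sketch below does the arithmetic on the rounded values shown above; scikit-learn works from unrounded values, so agreement in the last digit is not guaranteed in general, although it holds for these rows.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values copied from the Random Forest report above
print(round(f1(0.76, 0.91), 2))  # class 0 -> 0.83
print(round(f1(0.91, 0.75), 2))  # class 1 -> 0.82

# Weighted average precision: each class weighted by its support
weighted_precision = (0.76 * 604 + 0.91 * 696) / (604 + 696)
print(round(weighted_precision, 2))  # 0.84
```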



In [10]:
# Reuses vectorizer, scaler, and rf_classifier from the previous cell

# Example text to predict
text = ["It's a Funny horror movie!"]

# Vectorize the text
text_vectorized = vectorizer.transform(text)

# Scale the vectorized text
text_scaled = scaler.transform(text_vectorized.toarray())

# Make predictions using the Random Forest classifier
prediction_rf = rf_classifier.predict(text_scaled)

# Format output based on predicted label
# predict() returns an array, so take its single element
label_rf = prediction_rf[0]
if label_rf == 0:
    output_rf = "Random Forest Predicted Label: not sarcastic"
elif label_rf == 1:
    output_rf = "Random Forest Predicted Label: sarcastic"
else:
    output_rf = "Unknown label"

# Print the Random Forest prediction
print(output_rf)


Random Forest Predicted Label: sarcastic


2. Support Vector Machine (SVM)

In [11]:
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

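# Initialize SVM classifier
# Note: X_train is already standardized, so the StandardScaler step in this
# pipeline changes the features very little
svm_classifier = make_pipeline(StandardScaler(with_mean=False), SVC())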

# Train the SVM classifier
svm_classifier.fit(X_train, y_train)

# Predict on test data
y_pred_svm = svm_classifier.predict(X_test)

# Evaluate the model
print("SVM Classifier:")
print("Accuracy:", accuracy_score(y_test, y_pred_svm))
print(classification_report(y_test, y_pred_svm))


SVM Classifier:
Accuracy: 0.8192307692307692
              precision    recall  f1-score   support

           0       0.76      0.89      0.82       604
           1       0.89      0.76      0.82       696

    accuracy                           0.82      1300
   macro avg       0.82      0.82      0.82      1300
weighted avg       0.83      0.82      0.82      1300
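
To put the two accuracy figures side by side, they can be converted back into counts of correctly classified test reviews. This is only arithmetic on the numbers reported above, with the test-set size taken from the support column.

```python
test_size = 1300  # support reported in both classification reports

# Accuracies copied from the Random Forest and SVM outputs
accuracies = {
    "Random Forest": 0.8238461538461539,
    "SVM": 0.8192307692307692,
}

# Number of correctly classified test reviews per model
correct = {name: round(acc * test_size) for name, acc in accuracies.items()}
print(correct)  # {'Random Forest': 1071, 'SVM': 1065}
print(correct["Random Forest"] - correct["SVM"])  # 6
```

The models differ by 6 reviews out of 1,300 on this single split, so the gap may not reflect a real difference between them.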



In [12]:
# Reuses svm_classifier, vectorizer, and scaler from the previous cells

# Example text to predict
text = ["It's a Funny horror movie!"]

# Vectorize the text
text_vectorized = vectorizer.transform(text)

# Scale the vectorized text
text_scaled = scaler.transform(text_vectorized.toarray())

# Make predictions using the SVM classifier
prediction_svm = svm_classifier.predict(text_scaled)

# Format output based on predicted label
# predict() returns an array, so take its single element
label_svm = prediction_svm[0]
if label_svm == 0:
    output_svm = "SVM Predicted Label: not sarcastic"
elif label_svm == 1:
    output_svm = "SVM Predicted Label: sarcastic"
else:
    output_svm = "Unknown label"

# Print the SVM prediction
print(output_svm)


SVM Predicted Label: sarcastic


### Deep Learning Models using TensorFlow/Keras
1. Simple Neural Network (Feedforward)

In [22]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.preprocessing import StandardScaler

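# Normalize input data
# Note: this rebinds `scaler`, replacing the scaler fitted on the raw TF-IDF
# features above. X_train is already standardized, so this second pass changes
# little, but later cells that call scaler.transform() on raw TF-IDF vectors
# now skip the original standardization.
scaler = StandardScaler(with_mean=False)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)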

# Initialize Neural Network model
model_nn = Sequential([
    Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

# Compile the model
model_nn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model_nn.fit(X_train_scaled, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate the model
loss, accuracy = model_nn.evaluate(X_test_scaled, y_test)
print("Simple Neural Network:")
print("Accuracy:", accuracy)




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Simple Neural Network:
Accuracy: 0.7923076748847961
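
The cell above reports only accuracy for the neural network. To get the same precision, recall, and F1 breakdown as the other two models, the sigmoid outputs can be thresholded and passed to classification_report. The sketch below shows the idea on made-up numbers: `probabilities` and `y_true` are hypothetical stand-ins for model_nn.predict(X_test_scaled) and y_test.

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical stand-ins for the network's sigmoid outputs and the true labels
probabilities = np.array([[0.91], [0.12], [0.67], [0.40], [0.08], [0.55]])
y_true = np.array([1, 0, 1, 1, 0, 0])

# Threshold at 0.5 and flatten the (n, 1) output to shape (n,)
y_pred_nn = (probabilities >= 0.5).astype(int).ravel()

print(y_pred_nn)
print(classification_report(y_true, y_pred_nn))
```

The 0.5 threshold is a common default rather than a requirement; a different cut-off trades precision against recall.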


In [23]:
import numpy as np

# Reuses model_nn, vectorizer, and scaler from the previous cells

# Example text to predict
text = ["It's a Funny horror movie!"]

# Vectorize the text
text_vectorized = vectorizer.transform(text)

# Normalize the vectorized text
text_scaled = scaler.transform(text_vectorized.toarray())

# Make predictions using the neural network model
prediction = model_nn.predict(text_scaled)

# Convert the sigmoid output, shape (1, 1), to a scalar 0/1 label
predicted_label = int(np.round(prediction[0, 0]))

# Format output based on predicted label
if predicted_label == 0:
    output = "Predicted Label: Not sarcastic"
elif predicted_label == 1:
    output = "Predicted Label: sarcastic"
else:
    output = "Unknown label"

# Print the prediction
print(output)


Predicted Label: Not sarcastic
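
One detail worth checking here: the neural-network training cell assigns a new StandardScaler(with_mean=False) to the name scaler, so the prediction cell scales new text with that object rather than with the scaler fitted on the raw TF-IDF features. The sketch below, on randomly generated stand-in data, illustrates how the two routes can produce different inputs for the same raw vector. The names `tfidf_scaler` and `nn_scaler` are hypothetical and used only to keep the two objects apart.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw feature matrix standing in for the TF-IDF array X
rng = np.random.default_rng(0)
X_raw = rng.random((50, 4)) * 0.3

# First scaler: fitted on raw features, as in the Random Forest cell
tfidf_scaler = StandardScaler().fit(X_raw)
X_std = tfidf_scaler.transform(X_raw)

# Second scaler: fitted on already standardized data, as in the
# neural-network cell
nn_scaler = StandardScaler(with_mean=False).fit(X_std)
print(nn_scaler.scale_)  # close to 1 for every column

new_row = X_raw[:1]
# Route matching the training data: both scalers in sequence
as_trained = nn_scaler.transform(tfidf_scaler.transform(new_row))
# Route taken when only the second scaler is applied to raw features
second_only = nn_scaler.transform(new_row)

print(np.allclose(as_trained, second_only))  # False
```

Giving each preprocessing object its own name, or bundling preprocessing and model into one pipeline, is one way to keep training and prediction inputs on the same scale.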


In [24]:
import numpy as np

# Reuses vectorizer, scaler, rf_classifier, svm_classifier, and model_nn from the previous cells

# Example reviews
sarcastic_reviews = [
    "This movie is a masterpiece of boredom and cliches. Bravo!",
    "Oh, what a surprise, another predictable plot. How original...",
    "I laughed so hard when the main character died. So much fun!",
    "This movie really makes you appreciate bad acting.",
    "Wow, the plot twist at the end was totally unexpected. Not."
]

non_sarcastic_reviews = [
    "An absolute delight from start to finish. A must-watch!",
    "This movie captivates the heart and mind with its storytelling.",
    "Incredible cinematography and outstanding performances.",
    "A beautifully crafted film that resonates long after it ends.",
    "Heartwarming and genuine, a true cinematic gem."
]

# Vectorize the reviews
sarcastic_reviews_vectorized = vectorizer.transform(sarcastic_reviews)
non_sarcastic_reviews_vectorized = vectorizer.transform(non_sarcastic_reviews)

# Normalize the vectorized reviews
sarcastic_reviews_scaled = scaler.transform(sarcastic_reviews_vectorized.toarray())
non_sarcastic_reviews_scaled = scaler.transform(non_sarcastic_reviews_vectorized.toarray())

# Make predictions using Random Forest Classifier
predictions_rf_sarcastic = rf_classifier.predict(sarcastic_reviews_scaled)
predictions_rf_non_sarcastic = rf_classifier.predict(non_sarcastic_reviews_scaled)

# Make predictions using SVM Classifier
predictions_svm_sarcastic = svm_classifier.predict(sarcastic_reviews_scaled)
predictions_svm_non_sarcastic = svm_classifier.predict(non_sarcastic_reviews_scaled)

# Make predictions using Simple Neural Network (Feedforward)
predictions_nn_sarcastic = model_nn.predict(sarcastic_reviews_scaled)
predictions_nn_non_sarcastic = model_nn.predict(non_sarcastic_reviews_scaled)

# Format predictions for readability
def format_predictions(predictions):
    # np.ravel() flattens the (n, 1) Keras output so each item is a scalar
    return ["sarcastic" if pred == 1 else "not sarcastic" for pred in np.ravel(predictions)]

# Format predictions for each model
rf_predictions_sarcastic = format_predictions(predictions_rf_sarcastic)
rf_predictions_non_sarcastic = format_predictions(predictions_rf_non_sarcastic)

svm_predictions_sarcastic = format_predictions(predictions_svm_sarcastic)
svm_predictions_non_sarcastic = format_predictions(predictions_svm_non_sarcastic)

nn_predictions_sarcastic = format_predictions(np.round(predictions_nn_sarcastic).astype(int))
nn_predictions_non_sarcastic = format_predictions(np.round(predictions_nn_non_sarcastic).astype(int))

# Print predictions for Random Forest
print("Random Forest Classifier Predictions:")
print("Sarcastic Reviews:")
for review, prediction in zip(sarcastic_reviews, rf_predictions_sarcastic):
    print(f"Review: {review} -> Prediction: {prediction}")
print("\nNon-Sarcastic Reviews:")
for review, prediction in zip(non_sarcastic_reviews, rf_predictions_non_sarcastic):
    print(f"Review: {review} -> Prediction: {prediction}")

# Print predictions for SVM
print("\nSVM Classifier Predictions:")
print("Sarcastic Reviews:")
for review, prediction in zip(sarcastic_reviews, svm_predictions_sarcastic):
    print(f"Review: {review} -> Prediction: {prediction}")
print("\nNon-Sarcastic Reviews:")
for review, prediction in zip(non_sarcastic_reviews, svm_predictions_non_sarcastic):
    print(f"Review: {review} -> Prediction: {prediction}")

# Print predictions for Simple Neural Network (Feedforward)
print("\nSimple Neural Network Predictions:")
print("Sarcastic Reviews:")
for review, prediction in zip(sarcastic_reviews, nn_predictions_sarcastic):
    print(f"Review: {review} -> Prediction: {prediction}")
print("\nNon-Sarcastic Reviews:")
for review, prediction in zip(non_sarcastic_reviews, nn_predictions_non_sarcastic):
    print(f"Review: {review} -> Prediction: {prediction}")


Random Forest Classifier Predictions:
Sarcastic Reviews:
Review: This movie is a masterpiece of boredom and cliches. Bravo! -> Prediction: not sarcastic
Review: Oh, what a surprise, another predictable plot. How original... -> Prediction: not sarcastic
Review: I laughed so hard when the main character died. So much fun! -> Prediction: not sarcastic
Review: This movie really makes you appreciate bad acting. -> Prediction: not sarcastic
Review: Wow, the plot twist at the end was totally unexpected. Not. -> Prediction: not sarcastic

Non-Sarcastic Reviews:
Review: An absolute delight from start to finish. A must-watch! -> Prediction: not sarcastic
Review: This movie captivates the heart and mind with its storytelling. -> Prediction: not sarcastic
Review: Incredible cinematography and outstanding performances. -> Prediction: not sarcastic
Review: A beautifully crafted film that resonates long after it ends. -> Prediction: not sarcastic
Review: Heartwarming and genuine, a true cinematic gem