
## Sarcasm Detection on YouTube Comments

### 1. Problem Statement:

Detecting sarcasm in YouTube comments is essential for accurate sentiment analysis. Sarcasm often expresses the opposite of literal meaning, making it challenging for traditional NLP methods to interpret correctly.

### 2. Solution Description:

Develop a machine learning model using TF-IDF vectorization and a Feedforward Neural Network (FNN) to classify YouTube comments as sarcastic or non-sarcastic.
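To make the TF-IDF step concrete, here is a minimal pure-Python sketch of how a tokenized comment can be turned into a weighted term vector. It assumes the smoothed IDF and L2 normalization that scikit-learn's `TfidfVectorizer` applies by default; the project's actual vectorizer settings are not stated here, and the function names `fit_idf` and `tfidf_vector` and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def fit_idf(corpus):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    vocab = sorted(doc_freq)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf

def tfidf_vector(tokens, vocab, idf):
    """Map one tokenized document to an L2-normalized TF-IDF vector."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]  # out-of-vocabulary tokens are ignored
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]

# Toy corpus (hypothetical, already tokenized)
corpus = [
    ["love", "stuck", "traffic"],
    ["love", "channel", "keep", "good", "work"],
    ["great", "video", "glad", "stumbled", "channel"],
]
vocab, idf = fit_idf(corpus)
vector = tfidf_vector(["love", "traffic", "monday"], vocab, idf)
```

Terms that appear in fewer documents ("traffic") receive a larger weight than terms that appear in many ("love"), which is what lets the downstream network focus on distinctive words.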

### 3. Dataset Description:

The dataset contains approximately 20,000 preprocessed YouTube comments labeled as sarcastic (1) or non-sarcastic (0). It is split into training, validation, and test sets for model development and evaluation.

### 4. Data Visualization:

#### a. Train, Test, and Validation Set Sizes

- Training Set: 15,536 comments
- Validation Set: 3,884 comments

![Training, Test, and Validation sets](https://drive.google.com/uc?export=view&id=14Vb1JppPRxpnlZF35-COJZ4cmraFfxgE)
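The two set sizes listed above can be checked with a little arithmetic: together they account for 19,420 comments, and the proportions work out to an 80/20 split. The variable names in this sketch are illustrative.

```python
train_size = 15_536
val_size = 3_884

total = train_size + val_size
train_fraction = train_size / total
val_fraction = val_size / total

print(f"Total: {total:,}")
print(f"Training share: {train_fraction:.1%}")
print(f"Validation share: {val_fraction:.1%}")
```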


#### b. Label-wise Distribution in Training Data:

- Sarcastic (1): 7,768 comments
- Non-sarcastic (0): 7,768 comments

![Label-wise distribution](https://drive.google.com/uc?export=view&id=14T9hslcR_Nf_-Pv7c3kkna-P47jgfZVB)
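Because the two classes are the same size, a classifier that always predicts a single label would score 50% accuracy on the training data, which gives a baseline for reading the results. A short sketch of that calculation, using the counts above (variable names are illustrative):

```python
label_counts = {1: 7_768, 0: 7_768}  # 1 = sarcastic, 0 = non-sarcastic

total = sum(label_counts.values())
majority_label, majority_count = max(label_counts.items(), key=lambda kv: kv[1])
baseline_accuracy = majority_count / total

print(f"Training comments: {total:,}")
print(f"Always-one-label baseline accuracy: {baseline_accuracy:.0%}")
```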


### Model Architecture:

**Feedforward Neural Network (FNN)**

An FNN passes information in one direction only, from the input layer through one or more hidden layers to the output layer. Each layer is fully connected to the next, and there are no feedback loops.

- **Input Layer**:
  - Input shape: Determined by the number of features in the input data.

- **Hidden Layers**:
  - Dense layers with varying numbers of units and ReLU activation function.
  - Dropout layers with a dropout rate of 0.3 are added after certain dense layers to prevent overfitting.

- **Output Layer**:
  - One output unit with a sigmoid activation function for binary classification.
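As a rough illustration of the layer types listed above, the sketch below runs a forward pass through one ReLU hidden layer and a single sigmoid output unit in plain Python. The layer sizes, weights, and function names are made up for the example and are far smaller than the real network; dropout is left out because it is only active during training.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical tiny network: 3 input features -> 2 hidden units -> 1 output unit
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
output_w = [[1.2, -0.7]]
output_b = [0.05]

features = [0.6, 0.0, 0.8]  # stands in for a TF-IDF vector
hidden = dense(features, hidden_w, hidden_b, relu)
probability = dense(hidden, output_w, output_b, sigmoid)[0]
label = 1 if probability > 0.5 else 0  # 1 = sarcastic, 0 = non-sarcastic
```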


- Total params: 6,641,409
- Trainable params: 6,641,409
- Non-trainable params: 0
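For a stack of dense layers, the parameter total is the sum over layers of (inputs + 1) × units, where the +1 accounts for each unit's bias; dropout layers add no parameters. A small sketch of that calculation follows. The layer sizes are hypothetical and are not meant to reproduce the 6,641,409 figure, since the exact layer widths are not listed here.

```python
def dense_param_count(input_dim, layer_units):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for units in layer_units:
        total += (fan_in + 1) * units  # weights plus one bias per unit
        fan_in = units
    return total

# Hypothetical shape: 1,000 input features, hidden layers of 64 and 32 units, 1 output unit
example_total = dense_param_count(1_000, [64, 32, 1])
print(f"{example_total:,} parameters")
```

With a wide TF-IDF input, almost all of the parameters sit in the first dense layer, so the vocabulary size largely determines the model size.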


### Training Configuration:

- **Optimizer**: Adam.
- **Loss Function**: Binary cross-entropy.
- **Metrics**: Accuracy.
- **Callbacks**:
  - **Early Stopping**: Monitors validation loss, stops training when it has not improved for 10 consecutive epochs, and restores the best weights.
  - **Reduce Learning Rate on Plateau**: Multiplies the learning rate by 0.2 when validation loss has not improved for 5 epochs.
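For reference, binary cross-entropy for a label `y` in {0, 1} and a predicted probability `p` is `-[y*log(p) + (1 - y)*log(1 - p)]`, averaged over the batch. The sketch below computes it in plain Python; the function name and the clipping constant `eps` are assumptions made for the example, not values taken from the project.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch of labels and probabilities."""
    losses = []
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)

confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(f"right: {confident_right:.3f}, wrong: {confident_wrong:.3f}")
```

The loss grows sharply when the model is confident and wrong, which is why validation loss can react to a problem before accuracy does.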

### Model Training:

The model is trained for up to 20 epochs with a batch size of 128. Early stopping and learning-rate reduction cut training short when validation loss plateaus and help limit overfitting. The epoch limit, batch size, and callback patience values may need tuning for a different dataset or if validation metrics stall during training.
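The interaction between the two callbacks can be traced with a small simulation. This is a simplified sketch of the patience logic, not Keras's implementation: it ignores options such as `min_delta` and cooldown, and the validation-loss history and the starting learning rate of 0.001 are invented for illustration.

```python
def simulate_callbacks(val_losses, lr=1e-3, stop_patience=10, lr_patience=5, factor=0.2):
    """Trace early stopping and learning-rate reduction over a validation-loss history."""
    best_loss = float("inf")
    best_epoch = None
    stop_wait = 0  # epochs since the best validation loss
    lr_wait = 0    # epochs since the last improvement or learning-rate cut
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            stop_wait = 0
            lr_wait = 0
        else:
            stop_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor  # cut the learning rate
                lr_wait = 0
            if stop_wait >= stop_patience:
                return {"stopped_epoch": epoch, "best_epoch": best_epoch, "lr": lr}
    return {"stopped_epoch": None, "best_epoch": best_epoch, "lr": lr}

# Hypothetical history: improves for 4 epochs, then plateaus for the remaining 16
history = [0.60, 0.50, 0.45, 0.44] + [0.46] * 16
result = simulate_callbacks(history)
print(result)
```

With this made-up history, the learning rate is cut at epochs 9 and 14, and training stops at epoch 14, at which point the weights from epoch 4 would be restored.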



**Classification Report:**

![Classification Report](https://drive.google.com/uc?export=view&id=14kkO2r3Qtkqlus5fvLLOmHT2GQfWSMLQ)


![Confusion Matrix](https://drive.google.com/uc?export=view&id=14_fmPjnirYTSYMsp1AowYXyu1B69siUE)


## 1. Import Required Libraries

In [None]:
# Import required libraries
from google.colab import drive
from tensorflow.keras.models import load_model
import pickle

# Mount Google Drive
drive.mount('/content/drive')




Mounted at /content/drive


## 2. Load the Pre-trained Model and TF-IDF Vectorizer


In [None]:
# Load the model from Google Drive
model_path = '/content/drive/MyDrive/path_to_my_model/model.h5'
model = load_model(model_path)

# Load the fitted TF-IDF vectorizer together with the preprocessed train/test splits
with open('/content/drive/MyDrive/sarcasm_detection/tfidf222_preprocessed_data.pkl', 'rb') as file:
    tfidf_vectorizer, X_train, X_test, y_train, y_test = pickle.load(file)
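
The unpacking above only works if the pickle holds a five-element tuple in that exact order. A minimal sketch of how such a bundle could be written and read back, using placeholder values in place of the real vectorizer and arrays (the file name and contents are hypothetical):

```python
import os
import pickle
import tempfile

# Placeholders standing in for the fitted vectorizer and the data splits
bundle = ("vectorizer", [[0.1, 0.0]], [[0.0, 0.3]], [1], [0])

path = os.path.join(tempfile.mkdtemp(), "tfidf_bundle.pkl")
with open(path, "wb") as file:
    pickle.dump(bundle, file)

# Unpack in the same order the tuple was saved
with open(path, "rb") as file:
    vectorizer, X_train, X_test, y_train, y_test = pickle.load(file)
```

Only load pickle files from a source you trust, since unpickling can execute arbitrary code.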


## 3. Predict Sarcasm in New Text Inputs


In [None]:
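# Predict whether a single text is sarcastic
def predict_sarcasm(model, vectorizer, text):
    # Vectorize the text and convert the sparse result to a dense array of shape (1, n_features)
    text_vector = vectorizer.transform([text]).toarray()
    # The model returns an array of shape (1, 1); take the scalar sigmoid output
    probability = model.predict(text_vector)[0][0]
    return "Sarcasm" if probability > 0.5 else "Not Sarcasm"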

# Input texts for prediction
new_texts = [
    "I just love getting stuck in traffic.",  # Sarcastic
    "truth toofunny willandgrace handersen 79.",  # Sarcastic
    "editing b bad il justify later life chat emoticon.",  # Sarcastic
    "truth deadgenius listen satire humor LOL.",  # Sarcastic
    "irony YouTube popular vlogger tells YouTube joke.",  # Sarcastic
    "took 2 year put gladly spend next 2 colombia learned so much.",  # Not sarcastic
    "spent 10 min playing dog looked like happy human.",  # Not sarcastic
    "enjoy reading books during my free time.",  # Not sarcastic
    "great video glad stumbled upon channel.",  # Not sarcastic
    "love channel keep good work.",  # Not sarcastic
]


# Print predictions
for text in new_texts:
    print(f"Text: {text} - Prediction: {predict_sarcasm(model, tfidf_vectorizer, text)}")


Text: I just love getting stuck in traffic. - Prediction: Sarcasm
Text: truth toofunny willandgrace handersen 79. - Prediction: Sarcasm
Text: editing b bad il justify later life chat emoticon. - Prediction: Sarcasm
Text: truth deadgenius listen satire humor LOL. - Prediction: Sarcasm
Text: irony YouTube popular vlogger tells YouTube joke. - Prediction: Sarcasm
Text: took 2 year put gladly spend next 2 colombia learned so much. - Prediction: Sarcasm
Text: spent 10 min playing dog looked like happy human. - Prediction: Not Sarcasm
Text: enjoy reading books during my free time. - Prediction: Not Sarcasm
Text: great video glad stumbled upon channel. - Prediction: Sarcasm
Text: love channel keep good work. - Prediction: Not Sarcasm
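
Comparing the predictions above with the labels given in the code comments shows where the model disagrees with the intended label. A short tally, with both lists transcribed from the cell and its output (1 = sarcastic, 0 = not sarcastic):

```python
expected = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # labels from the comments in new_texts
predicted = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]  # from the printed predictions

pairs = list(zip(expected, predicted))
matches = sum(e == p for e, p in pairs)
false_positives = [i for i, (e, p) in enumerate(pairs) if e == 0 and p == 1]
false_negatives = [i for i, (e, p) in enumerate(pairs) if e == 1 and p == 0]

print(f"{matches}/{len(expected)} predictions match the intended labels")
```

Both mismatches are non-sarcastic comments flagged as sarcastic (the sixth and ninth inputs); no sarcastic example was missed in this small sample.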
