# HANDS-ON-SESSION - Sarcasm Detection


---


This session runs a zero-shot sarcasm detection experiment with a Large Language Model. The input data comes from the **BESSTIE dataset**, a sarcasm detection dataset covering different dialects.

We perform the zero-shot experiment with the **OpenChat** model.


---




If you are using Google Colab, first set the notebook to use the free GPU:

*Go to Edit -> Notebook settings -> Hardware accelerator -> select T4 GPU*

**Stop your runtime once you have completed this workbook; otherwise you may use up the free GPU quota you need for the next hands-on sessions.**


---



#### Step 1: Install and import required libraries

In [1]:
## Step 1: Install Required Libraries
!pip install transformers
!pip install torch
!pip install datasets
!pip install tqdm
!pip install jsonlines

from transformers import pipeline
from datasets import load_dataset
from sklearn.metrics import classification_report
from tqdm import tqdm
import jsonlines
import torch
import re
import pandas as pd


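#### Step 2: Sync your Google Drive and get Hugging Face access
You can skip this step if you want to use the default Colab path; in that case, replace the `/content/drive/MyDrive/Coling2025-workshop/` paths in Steps 6 to 8 with a path that exists in your Colab session.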

In [2]:

# Sync your Google Drive to save the outputs
from google.colab import drive
drive.mount('/content/drive')

# Create a folder named Coling2025-workshop inside your Drive and use it to save your files: /content/drive/MyDrive/Coling2025-workshop/



Mounted at /content/drive


#### Step 3: Load the Model and Define the Pipeline

In [3]:

model_id = "openchat/openchat-3.5-0106"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/651 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/179 [00:00<?, ?B/s]



tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/491 [00:00<?, ?B/s]

Device set to use cuda:0
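
The shard sizes in the download log give a rough idea of why the model is loaded in `float16`. A back-of-the-envelope sketch, assuming the checkpoint stores 2 bytes per weight and that a Colab T4 offers about 16 GB of GPU memory (both are assumptions, not stated in this notebook):

```python
# Shard sizes in GB, copied from the download log above.
shard_sizes_gb = [4.94, 5.00, 4.54]

# Assumption: the checkpoint stores 16-bit weights (2 bytes per parameter).
BYTES_PER_PARAM_FP16 = 2
BYTES_PER_PARAM_FP32 = 4
# Assumption: nominal memory of a T4 GPU; the usable amount is somewhat lower.
T4_MEMORY_GB = 16

total_fp16_gb = sum(shard_sizes_gb)
approx_params_billion = total_fp16_gb / BYTES_PER_PARAM_FP16
total_fp32_gb = approx_params_billion * BYTES_PER_PARAM_FP32

print(f"float16 weights: ~{total_fp16_gb:.2f} GB")
print(f"approx. parameters: ~{approx_params_billion:.2f} billion")
print(f"float32 weights: ~{total_fp32_gb:.2f} GB")
print("fits on a T4 in float16:", total_fp16_gb < T4_MEMORY_GB)
print("fits on a T4 in float32:", total_fp32_gb < T4_MEMORY_GB)
```

This counts the weights only; generation needs some additional memory on top of that.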


In [4]:
# Check whether a CUDA GPU is visible to PyTorch (this does not verify where the model weights are placed)
import torch
if torch.cuda.is_available():
    print("GPU is available and the model is using it.")
else:
    print("GPU is not available. The model will run on CPU.")


GPU is available and the model is using it.


#### Step 4: Load the Dataset from Hugging Face

In [5]:
# Load the Datasets from Hugging Face and Select Data
dialects = [
    "mindhunter23/BESSTIE-reddit-sarcasm-uk",
    "mindhunter23/BESSTIE-reddit-sarcasm-in",
    "mindhunter23/BESSTIE-reddit-sarcasm-au"
]

In [6]:
# The dataset comes in 3 dialects, as listed above. We use the validation split for our study.
# From each dialect we select 5 data points: 3 'sarcasm' and 2 'not sarcasm' (15 data points in total).

selected_data = []

# Iterate over each dialect dataset
for dialect in dialects:
    dataset = load_dataset(dialect, split="validation")
    sarcasm_data = [entry for entry in dataset if entry['sarcasm_label'] == 1][:3]  # Select 3 'sarcasm' data points
    not_sarcasm_data = [entry for entry in dataset if entry['sarcasm_label'] == 0][:2]  # Select 2 'not sarcasm' data points

    # Combine selected data
    combined_data = sarcasm_data + not_sarcasm_data
    selected_data.extend(combined_data)

    print(f"Selected data from {dialect}:")
    for entry in combined_data:
        print(entry)

README.md:   0%|          | 0.00/226 [00:00<?, ?B/s]

reddit-sarcasm-uk-train.jsonl:   0%|          | 0.00/271k [00:00<?, ?B/s]

reddit-sarcasm-uk-valid.jsonl:   0%|          | 0.00/36.9k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1031 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/141 [00:00<?, ? examples/s]

Selected data from mindhunter23/BESSTIE-reddit-sarcasm-uk:
{'id': '1cede54', 'text': 'If he gets kicked out you reckon his moral mission bollocks will go with him? As in the whole benefits reform', 'sarcasm_label': 1}
{'id': '1ct8tcj', 'text': 'Perfect for someone with a huge amount of clothing but no prospect of sex', 'sarcasm_label': 1}
{'id': '1co1nxa', 'text': 'Got my vote for London mayor, I want a recount', 'sarcasm_label': 1}
{'id': '1d4dgk9', 'text': 'The last Gov report showed the cost to the UK of reoffending was around 18 billion a year. \n\nThe same statistics put the cost of housing a prisoner each year somewhere around 50k.\n\nIn the UK currently we have 95k people in prison.\n\nLet’s say we invest 10 of that lost 18 billion into increased prison capacity. That would house an additional 200k people. We could literally triple our prison population and still be saving money. This whole economic benefit of releasing people is a myth.\n\nPeople have this fairy tinted view tha

README.md:   0%|          | 0.00/226 [00:00<?, ?B/s]

reddit-sarcasm-in-train.jsonl:   0%|          | 0.00/325k [00:00<?, ?B/s]

reddit-sarcasm-in-valid.jsonl:   0%|          | 0.00/45.8k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1686 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/230 [00:00<?, ? examples/s]

Selected data from mindhunter23/BESSTIE-reddit-sarcasm-in:
{'id': '1d3sum0', 'text': 'Was she playing pub G?\n\nOn a serious note tho, om Shanti ', 'sarcasm_label': 1}
{'id': '1d1mar3', 'text': 'He writes the essay tho.', 'sarcasm_label': 1}
{'id': '1d4ud41', 'text': 'Just like the elephant! \n\n/S', 'sarcasm_label': 1}
{'id': '1d5kg4p', 'text': "illuminati is a myth. It's supposedly a secret society that has penetrated every institution of power and secretly controls the world.\n\nwe even have the Indian version of illuminati - the navaratnas.\n\nthese are myths people make so that they can escape from blaming themselves for their own failures.", 'sarcasm_label': 0}
{'id': '1cvjpd7', 'text': 'These terrorists want only one thing, to keep Kashmir underdeveloped so that unrest will be the norm.', 'sarcasm_label': 0}


README.md:   0%|          | 0.00/226 [00:00<?, ?B/s]

reddit-sarcasm-au-train.jsonl:   0%|          | 0.00/610k [00:00<?, ?B/s]

reddit-sarcasm-au-valid.jsonl:   0%|          | 0.00/81.1k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1763 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/241 [00:00<?, ? examples/s]

Selected data from mindhunter23/BESSTIE-reddit-sarcasm-au:
{'id': '1cxv5kt', 'text': "The last time we made the mistake of accepting no details we suffered Abbott's 2014 Austerity Budget. \n\nIn this case, Dutton imposed his own deadline of revealing details of his Nuclear Power Plants, Housing, and Immigration for 16 May during his Budget Reply speech. Then he backflipped, and now Taylor is scrambling.", 'sarcasm_label': 1}
{'id': '1d2km3k', 'text': "i peel off the printed layer and draw on them to make my own letter magnets\n\nmy brother cuts out the heads and puts them on other magnets,  \nhis plumber's head was on a peppa pig body, when i texted the photo to him he said my number would be blacklisted", 'sarcasm_label': 1}
{'id': '1czcbpf', 'text': "Dutton has said the reactors would be on the site of (former) coal stations because the poles and wires are already in place.  So, shouldn't be too hard to work out and set-off the NIMBYs so they can help shoot down the stupid idea.  The
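
The per-label selection in Step 4 can be tried without downloading anything. A minimal sketch on a made-up list of records (the `toy_dataset` entries and the `first_k_per_label` helper are hypothetical; only the `sarcasm_label` field name comes from the dataset):

```python
from itertools import islice

def first_k_per_label(records, label, k, label_key="sarcasm_label"):
    """Return the first k records whose label_key equals label, in original order."""
    matching = (r for r in records if r[label_key] == label)
    return list(islice(matching, k))  # stops scanning once k matches are found

# Made-up placeholder records, for illustration only.
toy_dataset = [
    {"id": f"toy{i}", "text": f"placeholder text {i}", "sarcasm_label": i % 2}
    for i in range(10)
]

sarcastic = first_k_per_label(toy_dataset, label=1, k=3)
not_sarcastic = first_k_per_label(toy_dataset, label=0, k=2)
selected = sarcastic + not_sarcastic

print([r["id"] for r in selected])
```

Selecting 3 sarcastic and 2 non-sarcastic items per dialect gives 9 sarcastic and 6 non-sarcastic examples across the three dialects, so the evaluation set is small and slightly imbalanced.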

#### Step 5: Define Sarcasm Detection Function

In [7]:
def detect_sarcasm(text):

    # Define the messages structure for the pipeline
    messages = [
        {"role": "system", "content": "You are a chatbot that classifies text as 'sarcasm' or 'not sarcasm'."},
        {"role": "user", "content": f"Is the following text sarcastic or not sarcastic? Text: \"{text}\" Answer with either 'sarcasm' or 'not sarcasm'."},
    ]
    outputs = pipe(messages, max_new_tokens=50)
    response = outputs[0]["generated_text"]
    return response  # Return the raw response
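
When a list of chat messages is passed to a `text-generation` pipeline, `generated_text` comes back as the whole conversation with the model's reply appended as the last message, which is why Step 7 looks for the `assistant` role. A minimal sketch of pulling out that reply, using a hand-written stand-in for the pipeline output (the `mock_outputs` value, the example text, and the `build_messages` and `get_assistant_reply` helpers are hypothetical, and no model is called):

```python
def build_messages(text):
    """Build the same system/user message pair that detect_sarcasm sends to the model."""
    return [
        {"role": "system", "content": "You are a chatbot that classifies text as 'sarcasm' or 'not sarcasm'."},
        {"role": "user", "content": f"Is the following text sarcastic or not sarcastic? Text: \"{text}\" Answer with either 'sarcasm' or 'not sarcasm'."},
    ]

def get_assistant_reply(generated_text):
    """Return the content of the last assistant message, or "" if there is none."""
    for message in reversed(generated_text):
        if message.get("role") == "assistant":
            return message.get("content", "")
    return ""

# Invented example text, for illustration only.
messages = build_messages("Oh great, another Monday.")
# Hand-written stand-in for what pipe(messages, ...) returns; the reply text is invented.
mock_outputs = [{"generated_text": messages + [{"role": "assistant", "content": "sarcasm"}]}]

reply = get_assistant_reply(mock_outputs[0]["generated_text"])
print(reply)
```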

#### Step 6: Process Data and save the predictions

In [10]:

results = []
output_file = "/content/drive/MyDrive/Coling2025-workshop/predictions_6.jsonl"

print("Processing predictions...")
for entry in tqdm(selected_data, desc="Predicting", unit="sample"):
    try:
        # Check if the text field is available
        text = entry.get("text", "")
        if not text:
            raise ValueError(f"Missing text for entry ID: {entry.get('id', 'Unknown')}")

        response = detect_sarcasm(text)
        results.append({"id": entry["id"], "text": text, "raw_response": response})

    except Exception as e:
        print(f"Error processing entry ID {entry.get('id', 'Unknown')}: {e}")

# Save all predictions to a JSONL file
with jsonlines.open(output_file, mode="w") as writer:
    writer.write_all(results)

print("Results saved to:", output_file)

Processing predictions...


Predicting: 100%|██████████| 15/15 [00:25<00:00,  1.69s/sample]


Results saved to: /content/drive/MyDrive/Coling2025-workshop/predictions_6.jsonl
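
A JSONL file holds one JSON object per line, so each prediction can be read back independently. A small sketch of the round trip, using only the standard library `json` module in place of `jsonlines` and writing to a temporary file (the record contents are placeholders):

```python
import json
import os
import tempfile

# Placeholder records shaped like the ones built in Step 6.
records = [
    {"id": "toy1", "text": "placeholder text", "raw_response": [{"role": "assistant", "content": "sarcasm"}]},
    {"id": "toy2", "text": "another placeholder", "raw_response": [{"role": "assistant", "content": "not sarcasm"}]},
]

path = os.path.join(tempfile.mkdtemp(), "predictions_demo.jsonl")

# Write: one JSON object per line.
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read: parse each non-empty line back into a dict.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)
```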


#### Step 7: Extract the predictions from the predicted raw outputs


In [16]:
input_file = "/content/drive/MyDrive/Coling2025-workshop/predictions_6.jsonl"
output_file = "/content/drive/MyDrive/Coling2025-workshop/extracted_outputs_6.csv"

# regex pattern to extract sarcasm classification
# you can change this pattern according to the language model output structure/pattern
sarcasm_pattern = r"\b(sarcasm|not sarcasm)\b"

results = []

# Process the JSONL file
with jsonlines.open(input_file, mode="r") as reader:
    for obj in reader:
        text_id = obj.get("id", "Unknown")
        raw_response = obj.get("raw_response", "")

        # Extract assistant's response content from the predicted jsonl file
        # Modify this part according to the model template
        assistant_response = ""
        if isinstance(raw_response, list):
            for message in raw_response:
                if message.get("role") == "assistant":
                    assistant_response = message.get("content", "").lower()
                    break

        # regex matching
        match = re.search(sarcasm_pattern, assistant_response)

        # Determine predicted label
        if match:
            predicted_label = match.group(1)
        else:
            predicted_label = "N/A"

        # Get the true label from the dataset (requires selected_data from Step 4 to still be in memory)
        is_sarcastic = any(item['id'] == text_id and item['sarcasm_label'] == 1 for item in selected_data)
        true_label = "sarcasm" if is_sarcastic else "not sarcasm"

        # Add to results
        results.append({
            "id": text_id,
            "text": obj.get("text", ""),
            "predicted_label": predicted_label,
            "true_label": true_label
        })

# Save the results to a CSV file
df = pd.DataFrame(results)
df.to_csv(output_file, index=False)

print(f"Extracted outputs saved to {output_file}")

Extracted outputs saved to /content/drive/MyDrive/Coling2025-workshop/extracted_outputs_6.csv
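
The regex from Step 7 only recognises the exact phrases `sarcasm` and `not sarcasm`, so a reply such as "not sarcastic" ends up as `N/A`. A small sketch that checks the pattern on a few hand-written responses and compares it with a more tolerant variant (`tolerant_pattern` and both helper functions are hypothetical additions, not part of the original notebook):

```python
import re

# Same pattern as in Step 7.
sarcasm_pattern = r"\b(sarcasm|not sarcasm)\b"

def extract_label(response):
    """Apply the Step 7 pattern to a lowercased response."""
    match = re.search(sarcasm_pattern, response.lower())
    return match.group(1) if match else "N/A"

# Hypothetical, more tolerant variant: also accepts 'sarcastic' and treats a
# directly preceding 'not' as negation.
tolerant_pattern = r"\b(not\s+)?sarcas(?:m|tic)\b"

def extract_label_tolerant(response):
    """Map a response to 'sarcasm', 'not sarcasm', or 'N/A' using the tolerant pattern."""
    match = re.search(tolerant_pattern, response.lower())
    if not match:
        return "N/A"
    return "not sarcasm" if match.group(1) else "sarcasm"

# Hand-written example responses, not real model output.
examples = ["Sarcasm", "Not sarcasm.", "This text is not sarcastic."]
for response in examples:
    print(repr(response), "->", extract_label(response), "|", extract_label_tolerant(response))
```

Neither pattern handles a negation that is separated from the keyword (for example "not really sarcasm"), so it is worth inspecting the raw responses before trusting the extracted labels.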


#### Step 8: Evaluate the extracted predictions

In [17]:
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

# Load the extracted outputs CSV file
file_path = "/content/drive/MyDrive/Coling2025-workshop/extracted_outputs_6.csv"
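# keep_default_na=False stops pandas from turning the "N/A" placeholder label into NaN
data = pd.read_csv(file_path, keep_default_na=False)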

# Ensure the predicted and true labels are present
if "predicted_label" not in data.columns or "true_label" not in data.columns:
    raise ValueError("The CSV file must contain 'predicted_label' and 'true_label' columns.")

# Evaluate the results
y_true = data["true_label"]
y_pred = data["predicted_label"]

# Generate the classification report
print("Classification Report:")
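# Passing labels explicitly keeps target_names aligned even if some predictions are "N/A"
print(classification_report(y_true, y_pred, labels=["not sarcasm", "sarcasm"], target_names=["not sarcasm", "sarcasm"]))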

# Generate the confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_true, y_pred, labels=["not sarcasm", "sarcasm"]))



Classification Report:
              precision    recall  f1-score   support

 not sarcasm       0.60      1.00      0.75         6
     sarcasm       1.00      0.56      0.71         9

    accuracy                           0.73        15
   macro avg       0.80      0.78      0.73        15
weighted avg       0.84      0.73      0.73        15


Confusion Matrix:
[[6 0]
 [4 5]]


### Explanation of Classification Report

#### **Classification Report**
The classification report provides performance metrics for each class (`not sarcasm` and `sarcasm`):

**Precision**:
   - Precision measures how many of the predicted instances of a class are correct:
     \begin{equation}
     \text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}
     \end{equation}

**Recall**:
   - Recall measures how many of the actual instances of a class are correctly identified:
     \begin{equation}
     \text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}
     \end{equation}

**F1-Score**:
   - The F1-score is the harmonic mean of precision and recall:
     \begin{equation}
     \text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     \end{equation}

**Support**:
   - The number of actual instances of each class in the evaluation data (here 6 `not sarcasm` and 9 `sarcasm`).

**Accuracy**:
   - Accuracy measures the overall correctness of predictions:
     \begin{equation}
     \text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
     \end{equation}

**Macro Average**:
   - The unweighted mean of precision, recall, and F1-score across classes; every class counts equally regardless of its support.

**Weighted Average**:
   - The mean of precision, recall, and F1-score across classes, weighted by the number of instances in each class (support).
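
The two averages can be reproduced from the per-class rows of the report. A short sketch using the precision values and supports printed in the classification report above:

```python
# Per-class precision and support, copied from the classification report above.
precision = {"not sarcasm": 0.60, "sarcasm": 1.00}
support = {"not sarcasm": 6, "sarcasm": 9}

# Macro average: plain mean over classes; every class counts equally.
macro_precision = sum(precision.values()) / len(precision)

# Weighted average: each class contributes in proportion to its support.
total_support = sum(support.values())
weighted_precision = sum(precision[c] * support[c] for c in precision) / total_support

print(f"macro avg precision:    {macro_precision:.2f}")
print(f"weighted avg precision: {weighted_precision:.2f}")
```

The weighted value is higher here because the class with the better precision (`sarcasm`) also has the larger support.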

---



#### **Confusion Matrix**
The confusion matrix shows the breakdown of predictions for each class:

|                 | Predicted: Not Sarcasm | Predicted: Sarcasm |
|-----------------|-------------------------|---------------------|
| **Actual: Not Sarcasm** | 6                       | 0                   |
| **Actual: Sarcasm**     | 4                       | 5                   |

1. **True Positives (TP)**:
   - **For `not sarcasm`**: 6 correctly classified.
   - **For `sarcasm`**: 5 correctly classified.

2. **False Positives (FP)**:
   - **For `not sarcasm`**: 4 instances of `sarcasm` were incorrectly classified as `not sarcasm`.
   - **For `sarcasm`**: No false positives.

3. **False Negatives (FN)**:
   - **For `not sarcasm`**: 0 instances of `not sarcasm` were misclassified as `sarcasm`.
   - **For `sarcasm`**: 4 instances of `sarcasm` were misclassified as `not sarcasm`.
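
The per-class counts above are enough to recompute the report by hand. A minimal sketch in plain Python, treating each class in turn as the positive class (the `per_class_metrics` helper is a hypothetical name introduced here):

```python
# Confusion matrix from above: rows are actual classes, columns are predicted classes.
labels = ["not sarcasm", "sarcasm"]
matrix = [[6, 0],
          [4, 5]]

def per_class_metrics(matrix, index):
    """Compute precision, recall and F1 for the class at the given row/column index."""
    tp = matrix[index][index]
    fp = sum(matrix[row][index] for row in range(len(matrix))) - tp  # rest of the column
    fn = sum(matrix[index]) - tp                                     # rest of the row
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

metrics = {label: per_class_metrics(matrix, i) for i, label in enumerate(labels)}
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / sum(map(sum, matrix))

for label, (p, r, f1) in metrics.items():
    print(f"{label:>12}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The rounded values agree with the classification report: the model never labels a non-sarcastic text as sarcastic, but it misses 4 of the 9 sarcastic ones.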

---


You can repeat the steps above for each dialect dataset separately (available from https://huggingface.co/mindhunter23) and compare the results across dialects.


---

