# Intro

This notebook is part of a series that aims to reuse open-source LLMs to perform a binary classification task.

Each notebook can be run independently of the others; apart from dataset_utils.py, they share no local dependencies. (As a result,
you can expect some code redundancy between notebooks.)

**The task is to detect toxic comments out of text comments retrieved from different news websites.**

For more information, see dataset_utils.py or search for 'Civil Comments dataset' online.

-----
This notebook **performs few-shot classification** via the remote OpenAI API.
Using an external service lets us use models that don't fit our environment (mostly because of GPU constraints), but it can be pricey!

In [1]:
from tqdm import tqdm

import evaluate

from utils import dataset_utils

Datasets cache is False


In [3]:
from openai import OpenAI

# Requires a token registered in the OPENAI_API_KEY environment variable,
# or pass the api_key keyword argument below
client = OpenAI()
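
Because `OpenAI()` reads the key from the environment, a missing variable only surfaces when the client is created or first used. A minimal sketch of an explicit check is shown below; `require_api_key` is a hypothetical helper written for this notebook, not part of the `openai` package.

```python
import os


def require_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key found in `env`, or raise a readable error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or pass api_key=... to OpenAI()."
        )
    return key
```

It could then be used as `client = OpenAI(api_key=require_api_key())`.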


# Load Dataset

In [4]:
comments_dataset = dataset_utils.load_sampled_ds(ds_size=200)

Map:   0%|          | 0/200 [00:00<?, ? examples/s]


In [5]:
# Our dataset already has 3 splits ready

# Our target is the 'is_toxic' binary column
# The main feature we'll use is the free text 'text' column
comments_dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit', 'is_toxic'],
        num_rows: 200
    })
    validation: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit', 'is_toxic'],
        num_rows: 200
    })
    test: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit', 'is_toxic'],
        num_rows: 200
    })
})

# Set up few-shot learning prompts

In [6]:
toxic_comments = comments_dataset.filter(lambda example: example["is_toxic"])["train"]["text"]
non_toxic_comments = comments_dataset.filter(lambda example: not example["is_toxic"])["train"]["text"]

def build_system_prompt():
    # Tune this to put more examples in the prompt; make sure the model's context window is large enough to hold everything
    N_EXAMPLES_PER_LABEL = 5

    few_shots_examples = ""
    # [Exercise] How to better sample toxic and non toxic comments here?
    for comment in toxic_comments[:N_EXAMPLES_PER_LABEL]:
        few_shots_examples += f"- '{comment}' -> is toxic. \n"
    for comment in non_toxic_comments[:N_EXAMPLES_PER_LABEL]:
        few_shots_examples += f"- '{comment}' -> is NOT toxic.\n"
    
    prefix = "You are a helpful expert that decides whether a comment is toxic or not for moderation purposes on a social media platform."
    prefix += "The definition of 'toxic comment' here would be: 'a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective'\n"
    prefix += "Please always answer by saying Yes only or No only, nothing else. \n"
    prefix += "Here is a bullet point list of a few examples with their respective labels:\n"
    
    suffix = "That's it, please do good work!"

    full_system_prompt = f"{prefix} {few_shots_examples} \n\n  {suffix}"

    return full_system_prompt


SYSTEM_PROMPT = build_system_prompt()
print(SYSTEM_PROMPT)

Filter:   0%|          | 0/200 [00:00<?, ? examples/s]


You are a helpful expert that decides whether a comment is toxic or not for moderation purposes on a social media platform.The definition of 'toxic comment' here would be: 'a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective'
Please always answer by saying Yes only or No only, nothing else. 
Here is a bullet point list of a few examples with their respective labels:
 - 'And I always thought that Eric Trump was the dumbest son.  Sorry, Eric!' -> is toxic. 
- 'Careful, your misogyny is showing.' -> is toxic. 
- 'Yeah it does, why? Because all these damn people come here on vacation or whatever reason, and fall in love and friggin move here. The problem with that is the more that have come - the less we remain what everyone loved!
I've lived here my entire life and I've watched as we have become less and less "green & beautiful" and more "concrete & cookie cutter"! The most "scandalous" thing we had g
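
One possible answer to the exercise left in `build_system_prompt`: instead of taking the first comments of each class, draw them at random with a fixed seed, skip very long comments (one example above spans several lines), and shuffle the two classes together. This is only a sketch; `sample_few_shot_examples`, `max_chars` and `seed` are hypothetical names and defaults, not part of the notebook.

```python
import random


def sample_few_shot_examples(toxic, non_toxic, n_per_label=5, max_chars=300, seed=0):
    """Return a shuffled list of (comment, is_toxic) pairs, balanced across labels."""
    rng = random.Random(seed)  # fixed seed keeps the prompt reproducible

    def pick(pool):
        # Drop very long comments so a single example cannot dominate the prompt
        short = [comment for comment in pool if len(comment) <= max_chars]
        return rng.sample(short, min(n_per_label, len(short)))

    examples = [(c, True) for c in pick(toxic)] + [(c, False) for c in pick(non_toxic)]
    rng.shuffle(examples)  # avoid listing every toxic example before the non-toxic ones
    return examples
```

The returned pairs could replace the two `for` loops over `toxic_comments[:N_EXAMPLES_PER_LABEL]` and `non_toxic_comments[:N_EXAMPLES_PER_LABEL]`.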

In [7]:
MODEL = "gpt-4o-mini"

def get_model_opinion(model, comment):
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
            },
            {
                "role": "user",
                "content": f"Is the following comment in quotes toxic? '{comment}'"
            }
        ],
        n=1,
        max_completion_tokens=1,
    )
    # Optionally, retrieve logprobs too for finer optimisations.

    # message.content can be None, so fall back to an empty string
    response = completion.choices[0].message.content or ""
    return {
        # /!\ Any answer that does not contain 'yes' (even a random string) counts as non-toxic
        "prediction": "yes" in response.lower(),
    }



In [8]:
get_model_opinion(MODEL, "I hate you, you're the worst")

{'prediction': True}
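
The comment in `get_model_opinion` mentions log-probabilities. As a hedged sketch of what that could look like: if the request is made with `logprobs=True` and `top_logprobs=5`, the candidates for the first answer token can be turned into a probability of "Yes", which allows a decision threshold other than 0.5. The helper below works on plain `(token, logprob)` pairs so it runs without an API call; `toxic_probability` is a hypothetical name.

```python
import math


def toxic_probability(top_logprobs):
    """Estimate P(yes) from (token, logprob) candidates for the first answer token.

    Only 'yes'/'no' variants are counted and their mass is renormalised.
    Returns None when neither appears among the candidates.
    """
    p_yes = 0.0
    p_no = 0.0
    for token, logprob in top_logprobs:
        word = token.strip().lower()
        if word == "yes":
            p_yes += math.exp(logprob)
        elif word == "no":
            p_no += math.exp(logprob)
    total = p_yes + p_no
    return p_yes / total if total > 0 else None
```

Assuming the response shape of the Chat Completions API, the pairs would come from `completion.choices[0].logprobs.content[0].top_logprobs`, reading each entry's `token` and `logprob` attributes.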

In [9]:
# These 600 predictions (3 splits x 200 comments) with the prompt defined above cost about $0.10, five times what the zero-shot notebook cost
comments_dataset = comments_dataset.map(
    lambda comment_data: get_model_opinion(MODEL, comment_data["text"]),
    keep_in_memory=True,
    load_from_cache_file=False,
    batched=False
)

Map:   0%|          | 0/200 [00:00<?, ? examples/s]
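
The few-shot prompt is resent with every request, so cost grows with the number of requests times the prompt length. A rough estimate can be sketched as below; the token count and the per-million-token prices are illustrative assumptions, not figures from this notebook, and should be checked against the provider's current pricing.

```python
def estimate_cost_usd(n_requests, prompt_tokens, completion_tokens,
                      usd_per_million_input, usd_per_million_output):
    """Rough API cost: every request pays for the full prompt plus its completion."""
    input_cost = n_requests * prompt_tokens * usd_per_million_input / 1_000_000
    output_cost = n_requests * completion_tokens * usd_per_million_output / 1_000_000
    return input_cost + output_cost


# Assumed values for illustration: 600 requests, ~1000 prompt tokens, 1 completion token
cost = estimate_cost_usd(600, 1000, 1,
                         usd_per_million_input=0.15, usd_per_million_output=0.60)
print(f"Estimated cost: ${cost:.4f}")
```

With the output capped at one token, nearly all of the cost comes from the prompt, so doubling `N_EXAMPLES_PER_LABEL` would roughly double the bill.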


# Evaluate

In [10]:
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])

In [11]:
clf_metrics.compute(
    references=comments_dataset["validation"]["is_toxic"],
    predictions=comments_dataset["validation"]["prediction"]
)

{'accuracy': 0.755,
 'f1': 0.44943820224719094,
 'precision': 0.29850746268656714,
 'recall': 0.9090909090909091}
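
The combination of high recall and low precision is easier to read from raw counts. The notebook does not print a confusion matrix; the counts below are inferred as the ones consistent with the validation metrics above on 200 rows, so treat them as an assumption. `binary_metrics` is a hypothetical helper that recomputes the four scores by hand.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Inferred counts (assumption): 22 toxic comments, 20 caught, 47 false alarms
validation = binary_metrics(tp=20, fp=47, fn=2, tn=131)
print(validation)
```

Read this way, the model misses few toxic comments but flags many harmless ones, which is what pulls precision and F1 down.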

# Final test

In [12]:
clf_metrics.compute(
    references=comments_dataset["test"]["is_toxic"],
    predictions=comments_dataset["test"]["prediction"]
)

{'accuracy': 0.7,
 'f1': 0.26829268292682923,
 'precision': 0.15714285714285714,
 'recall': 0.9166666666666666}
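
Accuracy on an imbalanced split is worth comparing against a trivial baseline that always predicts the majority class. The sketch below does that; the label list is an assumption (12 toxic comments out of 200, inferred from the test recall of 11/12 reported above), and `majority_baseline_accuracy` is a hypothetical helper.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = list(labels)
    positives = sum(1 for label in labels if label)
    return max(positives, len(labels) - positives) / len(labels)


# Assumed label distribution for illustration: 12 toxic, 188 non-toxic
assumed_test_labels = [True] * 12 + [False] * 188
print(majority_baseline_accuracy(assumed_test_labels))
```

In the notebook the real labels could be passed directly, as in `majority_baseline_accuracy(comments_dataset["test"]["is_toxic"])`. Under the assumed distribution, always answering "No" would score higher on accuracy than the model while detecting nothing, which is why recall, precision and F1 are reported alongside it.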