# Intro

This notebook is part of a series of notebooks that aim to reuse open-source LLMs to perform a binary classification task.

Each notebook can be run completely independently of the others: apart from dataset_utils.py, they share no local dependencies. (As a result, you can expect a little code redundancy between notebooks.)

**The task is to detect toxic comments among user comments collected from various news websites.**

For more information, see dataset_utils.py or search for 'Civil Comments dataset' online.

-----
This notebook **performs zero-shot classification** via the remote OpenAI API.
Using an external service lets us run models that don't fit in our environment (mostly because of GPU constraints), but it can be pricey!

In [1]:
from tqdm import tqdm

import evaluate

from utils import dataset_utils

Datasets cache is False


In [2]:
from openai import OpenAI

# Requires an API key set in the OPENAI_API_KEY environment variable,
# or passed with the api_key keyword argument below
client = OpenAI()
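Classifying the dataset sends one request per comment, so transient failures such as rate limits are likely over a full run. The `OpenAI` client already retries some errors on its own (see its `max_retries` option); for more control, a generic wrapper with exponential backoff could look like the minimal sketch below. `call_with_retries` is a hypothetical helper, not part of the notebook or the SDK, and in real use you would probably narrow `retry_on` to exceptions such as `openai.RateLimitError` and `openai.APIConnectionError` instead of retrying on everything.

```python
import random
import time


def call_with_retries(fn, *args, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on failure.

    Hypothetical helper: `retry_on` should be narrowed to transient errors
    (e.g. rate limits) in real use; `sleep` is injectable for testing.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2**attempt seconds, plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
```

For example, the mapping step could call `call_with_retries(get_model_opinion, MODEL, comment_data["text"])` instead of calling `get_model_opinion` directly. The jitter spreads out retries so that parallel workers don't all hit the API again at the same instant.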


# Load Dataset

In [3]:
comments_dataset = dataset_utils.load_sampled_ds(ds_size=200)

In [4]:
# Our dataset already comes with 3 splits: train, validation and test

# Our target is the 'is_toxic' binary column
# The main feature we'll use is the free text 'text' column
comments_dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit', 'is_toxic'],
        num_rows: 200
    })
    validation: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit', 'is_toxic'],
        num_rows: 200
    })
    test: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit', 'is_toxic'],
        num_rows: 200
    })
})

In [11]:
MODEL = "gpt-4o-mini"

def get_model_opinion(model, comment):
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful expert that decides whether a comment is toxic or not for moderation purposes on a social media platform. Always answer by saying 'Yes' or 'No', nothing else."
            },
            {
                "role": "user",
                "content": f"Is the following comment in quotes toxic? '{comment}'"
            }
        ],
        n=1,
        max_completion_tokens=1,
    )
    # Optionally, also request logprobs to get a confidence score rather than a hard Yes/No.

    response = completion.choices[0].message.content or ""  # content may be None
    return {
        "prediction": "yes" in response.lower(), # /!\ only answers containing 'yes' count as toxic; anything else (even random strings) counts as non-toxic
    }
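The substring check `"yes" in response.lower()` labels any answer containing "yes" as toxic (so "Yesterday" would count) and silently folds every other reply, including empty or off-format ones, into "not toxic". A stricter parser that keeps unparseable answers separate might look like this minimal sketch; `parse_yes_no` is a hypothetical helper, not something the notebook uses.

```python
from typing import Optional


def parse_yes_no(response: Optional[str]) -> Optional[bool]:
    """Map a raw answer to True ('Yes'), False ('No') or None (anything else)."""
    if response is None:
        return None  # the API may return no text content at all
    words = response.strip().lower().split()
    if not words:
        return None  # empty or whitespace-only answer
    # Only look at the first word, ignoring surrounding punctuation and quotes
    first_word = words[0].strip(".,!?'\"")
    if first_word == "yes":
        return True
    if first_word == "no":
        return False
    return None


answers = ["Yes", "No.", "yesterday", "I cannot answer that", ""]
print([parse_yes_no(a) for a in answers])  # → [True, False, None, None, None]
```

Returning `None` makes it possible to count how many answers fell outside the expected format before deciding how to label them, instead of having them quietly lower recall.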



In [12]:
get_model_opinion(MODEL, "I hate you, you're the worst")

{'prediction': True}
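The comment in `get_model_opinion` mentions logprobs. Assuming the request is made with `logprobs=True, top_logprobs=5`, the Python SDK exposes the most likely candidates for the first generated token (under `completion.choices[0].logprobs.content[0].top_logprobs`, each with `.token` and `.logprob`; check this against your SDK version). The minimal sketch below turns such `(token, logprob)` pairs into a probability of "Yes", so the decision threshold can be tuned on the validation split instead of relying on a hard answer. `yes_probability` is a hypothetical helper and the candidate values are made up for illustration.

```python
import math


def yes_probability(top_logprobs):
    """Estimate P(Yes) from (token, logprob) pairs for the first generated token.

    Only 'yes'/'no' candidates are kept (case and whitespace ignored); their
    probabilities are renormalised to sum to 1. Returns None if neither appears.
    """
    p_yes = 0.0
    p_no = 0.0
    for token, logprob in top_logprobs:
        word = token.strip().lower()
        if word == "yes":
            p_yes += math.exp(logprob)
        elif word == "no":
            p_no += math.exp(logprob)
    total = p_yes + p_no
    if total == 0.0:
        return None
    return p_yes / total


# Made-up first-token candidates (not real API output)
candidates = [("Yes", math.log(0.6)), ("No", math.log(0.3)), (" yes", math.log(0.05))]
score = yes_probability(candidates)  # (0.6 + 0.05) / (0.6 + 0.05 + 0.3)
is_toxic = score is not None and score >= 0.5  # threshold to tune on validation
```

Raising the threshold above 0.5 would typically trade some recall for precision, which is the weaker of the two scores in the evaluation below.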

In [13]:
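# One API request per comment in every split (3 x 200 = 600 requests),
# even though only validation and test are evaluated below
comments_dataset = comments_dataset.map(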
    lambda comment_data: get_model_opinion(MODEL, comment_data["text"]),
    keep_in_memory=True,
    load_from_cache_file=False,
    batched=False
)

# Evaluate

In [14]:
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])

In [15]:
clf_metrics.compute(
    references=comments_dataset["validation"]["is_toxic"],
    predictions=comments_dataset["validation"]["prediction"]
)

{'accuracy': 0.885,
 'f1': 0.5964912280701754,
 'precision': 0.4857142857142857,
 'recall': 0.7727272727272727}
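With 200 comments in the split, these four validation scores determine the confusion matrix exactly: 17 true positives, 18 false positives, 5 false negatives and 160 true negatives (so only 22 of the 200 comments are toxic). These counts are inferred from the scores above, not read from the notebook. Recomputing the metrics from them shows what each score measures; `metrics_from_counts` is an illustrative helper, not part of `evaluate`.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Binary classification metrics, with 'toxic' as the positive class."""
    precision = tp / (tp + fp)  # share of flagged comments that are really toxic
    recall = tp / (tp + fn)     # share of toxic comments that get flagged
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),  # equivalent to 2 * P * R / (P + R)
        "precision": precision,
        "recall": recall,
    }


# Counts implied by the validation scores: 200 comments, 17 + 5 = 22 of them toxic
validation = metrics_from_counts(tp=17, fp=18, fn=5, tn=160)
print(validation)
```

About 10% of the 178 non-toxic comments get flagged (18 of them), which is more than the 17 toxic comments correctly caught; that is what keeps precision below 0.5 despite the high accuracy.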

# Final test

In [16]:
clf_metrics.compute(
    references=comments_dataset["test"]["is_toxic"],
    predictions=comments_dataset["test"]["prediction"]
)

{'accuracy': 0.835,
 'f1': 0.326530612244898,
 'precision': 0.21621621621621623,
 'recall': 0.6666666666666666}
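Accuracy is hard to read on such an imbalanced dataset. The reported scores imply the confusion counts used below (inferred, 200 comments per split: 22 toxic comments in validation, 12 in test), which makes it possible to compare the model with a trivial baseline that always answers "not toxic". The sketch is an illustrative check, not part of the original evaluation.

```python
def accuracy_vs_majority(tp, fp, fn, tn):
    """Return (model accuracy, accuracy of always predicting 'not toxic')."""
    total = tp + fp + fn + tn
    positives = tp + fn
    model_accuracy = (tp + tn) / total
    # The always-'No' baseline is right on every non-toxic comment and wrong on every toxic one
    baseline_accuracy = (total - positives) / total
    return model_accuracy, baseline_accuracy


# Confusion counts implied by the reported validation and test scores
splits = {
    "validation": {"tp": 17, "fp": 18, "fn": 5, "tn": 160},
    "test": {"tp": 8, "fp": 29, "fn": 4, "tn": 159},
}

results = {name: accuracy_vs_majority(**counts) for name, counts in splits.items()}
for name, (model_acc, baseline_acc) in results.items():
    print(f"{name}: model {model_acc:.3f}, always-'No' baseline {baseline_acc:.3f}")
```

On both splits the always-"No" baseline scores a higher accuracy (0.890 vs 0.885 on validation, 0.940 vs 0.835 on test) while catching no toxic comment at all, so F1, precision and recall are the more informative scores here. With only 12 toxic comments in the test split, each one moves recall by about 8 points (1/12), so the test scores are also fairly noisy.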