# Labeling the [Civil Comments](https://huggingface.co/datasets/civil_comments) dataset using Autolabel

This dataset contains public comments collected from news websites. The task is binary classification: is the provided comment toxic or not?

## Install Autolabel
Also set up your OpenAI API key, since we'll be using gpt-3.5-turbo as our LLM for labeling.

In [1]:
!pip3 install 'refuel-autolabel[openai]'

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting refuel-autolabel[openai]
  Downloading refuel_autolabel-0.0.4-py3-none-any.whl (57 kB)
Collecting loguru>=0.5.0 (from refuel-autolabel[openai])
  Downloading loguru-0.7.0-py3-none-any.whl (59 kB)
Collecting numpy>=1.23.0 (from refuel-autolabel[openai])
  Downloading numpy-1.25.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Collecting datasets>=2.7.0 (from refuel-autolabel[openai])
  Downloading datasets-2.13.1-py3-none-any.whl (486 kB)

In [29]:
import os

# provide your own OpenAI API key here
os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxxxxxxxxxxxx'
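
A key pasted into a notebook cell is easy to leak when the notebook is shared. One alternative is to export the key in the shell before starting the notebook and only check for it here. The sketch below assumes that setup; `require_env` and `DEMO_API_KEY` are hypothetical names used for illustration and are not part of Autolabel.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return value


# Demo with a throwaway variable so the snippet runs anywhere.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook itself, the call would be `require_env("OPENAI_API_KEY")`, placed before the agent is created.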

## Download the dataset

In [2]:
from autolabel import get_data

get_data('civil_comments')

Downloading seed example dataset to "seed.csv"...


Downloading test dataset to "test.csv"...


This downloads two datasets:

* `test.csv`: This is the larger dataset we are trying to label using LLMs
* `seed.csv`: This is a small dataset where we already have human-provided labels
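
Before labeling, it can help to check how the labels are distributed in the seed file. The sketch below assumes the CSV has a header row with `example` and `label` columns, which matches the `label_column` used in the configs in this notebook. It writes a tiny made-up file in place of `seed.csv` so that it runs on its own; `label_counts` is a hypothetical helper, not an Autolabel function.

```python
import csv
import os
import tempfile
from collections import Counter


def label_counts(path: str, label_column: str = "label") -> Counter:
    """Count how often each label appears in a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row[label_column] for row in csv.DictReader(f))


# Tiny made-up file standing in for seed.csv so the snippet is self-contained.
rows = [
    {"example": "Great article, thanks!", "label": "not toxic"},
    {"example": "You are an idiot.", "label": "toxic"},
    {"example": "I disagree with this policy.", "label": "not toxic"},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_seed.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["example", "label"])
        writer.writeheader()
        writer.writerows(rows)
    counts = label_counts(demo_path)

print(counts)
```

Calling `label_counts("seed.csv")` after the download would give the same kind of summary for the real seed set.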

## Start the labeling process!
Labeling with Autolabel is a 3-step process:

* First, we specify a labeling configuration (see `config` object below)
* Next, we do a dry-run on our dataset using the LLM specified in `config` by running `agent.plan`
* Finally, we run the labeling with `agent.run`

### Experiment #1: Very simple guidelines

In [3]:
from autolabel import LabelingAgent

In [12]:
config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification", # classification task
    "dataset": {
        "label_column": "label",
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo" # the model we want to use
    },
    "prompt": {
        "task_guidelines": "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
        "labels": [ # list of labels to choose from
            "toxic",
            "not toxic"
        ],
        "example_template": "Input: {example}\nOutput: {label}"
    }
}


Let's review the configuration file above. You'll notice the following useful keys:

* `task_type`: `classification` (since it's a classification task)
* `model`: `{'provider': 'openai', 'name': 'gpt-3.5-turbo'}` (use a specific OpenAI model)
* `prompt.task_guidelines`: "Does the provided comment contain 'toxic' language? Say toxic or not toxic." (how we describe the task to the LLM)
* `prompt.labels`: `['toxic', 'not toxic']` (the two labels to choose from)
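
Since the config is a plain dictionary, a typo in a label or template only shows up once the LLM is called. The sketch below runs a few sanity checks up front. The checks are illustrative assumptions rather than Autolabel's own validation, `check_config` is a hypothetical helper, and it assumes `few_shot_examples` is given as an inline list.

```python
def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a labeling config."""
    problems = []
    prompt = config.get("prompt", {})
    labels = prompt.get("labels", [])
    if len(set(labels)) != len(labels):
        problems.append("duplicate entries in prompt.labels")
    # The template is rendered with both fields, so both placeholders should appear.
    template = prompt.get("example_template", "")
    for placeholder in ("{example}", "{label}"):
        if placeholder not in template:
            problems.append(f"example_template is missing {placeholder}")
    # Every few-shot example should use one of the declared labels.
    examples = prompt.get("few_shot_examples", [])
    for i, shot in enumerate(examples):
        if shot.get("label") not in labels:
            problems.append(f"few_shot_examples[{i}] uses unknown label {shot.get('label')!r}")
    if prompt.get("few_shot_num", 0) > len(examples):
        problems.append("few_shot_num exceeds the number of few_shot_examples")
    return problems


good = {
    "prompt": {
        "labels": ["toxic", "not toxic"],
        "example_template": "Input: {example}\nOutput: {label}",
    }
}
bad = {
    "prompt": {
        "labels": ["toxic", "not toxic"],
        "few_shot_examples": [{"example": "Nice work!", "label": "non-toxic"}],
        "few_shot_num": 1,
        "example_template": "Input: {example}\nOutput: {label}",
    }
}
print(check_config(good))
print(check_config(bad))
```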

In [13]:
# create an agent for labeling
agent = LabelingAgent(config=config)

In [14]:
# dry-run -- this tells us how much this will cost and shows an example prompt
agent.plan('test.csv')

Output()

Does the provided comment contain 'toxic' language? Say toxic or not toxic.

You will return the answer with just one element: "the correct label"

Now I want you to label the following example:
Input: [ Integrity means that you pay your debts.]

Does this apply to President Trump too?
Output: 


In [15]:
# now, do the actual labeling
labels, df, metrics = agent.run('test.csv', max_items=100)

Does the provided comment contain 'toxic' language? Say toxic or not toxic.

You will return the answer with just one element: "the correct label"

Now I want you to label the following example:
Input: Perps are showing up a lot more often these days. We need laws to reflect current day crimes. Like execute when guilty. No waiting. No mercy as well. This is one of those cases that qualify.
Output: 


toxic


y


Output()

Actual Cost: 0.0


### Experiment #2: Few-shot prompting to provide helpful examples

In [16]:
config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines":  "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
        "labels": [
            "toxic",
            "not toxic"
        ],
        "few_shot_examples": [
            {
                "example": "It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.",
                "label": "toxic"
            },
            {
                "example": "This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!",
                "label": "not toxic"
            },
            {
                "example": "This bitch is nuts. Who would read a book by a woman",
                "label": "toxic"
            },
            {
                "example": "It was a great show. Not a combo I'd of expected to be good together but it was.",
                "label": "not toxic"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 4,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}


In [17]:
# create an agent for labeling
agent = LabelingAgent(config, cache=False)

In [18]:
# dry-run -- this tells us how much this will cost and shows an example prompt
agent.plan(dataset='test.csv')

Output()

Does the provided comment contain 'toxic' language? Say toxic or not toxic.

You will return the answer with just one element: "the correct label"

Some examples with their output answers are provided below:

Input: It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.
Output: toxic

Input: This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!
Output: not toxic

Input: This bitch is nuts. Who would read a book by a woman
Output: toxic

Input: It was a great show. Not a combo I'd of expected to be good together but it was.
Output: not toxic

Now I want you to label the following example:
Input: [ Integrity means that you pay your debts.]

Does this apply to President Trump too?
Output: 
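
The prompt printed above appears to be built from the guidelines, a fixed output instruction, the few-shot examples rendered through `example_template`, and the example to label with an empty label. The sketch below reproduces that layout with plain string formatting. It imitates the printed prompt only; Autolabel's actual prompt construction may differ, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(guidelines: str, template: str, shots: list[dict], example: str) -> str:
    """Assemble a few-shot classification prompt in the layout printed by the dry run."""
    parts = [
        guidelines,
        'You will return the answer with just one element: "the correct label"',
    ]
    if shots:
        parts.append("Some examples with their output answers are provided below:")
        # Each few-shot example is rendered with its known label filled in.
        parts.extend(template.format(**shot) for shot in shots)
    # The example to label is rendered with an empty label for the model to complete.
    parts.append(
        "Now I want you to label the following example:\n"
        + template.format(example=example, label="")
    )
    return "\n\n".join(parts)


shots = [
    {
        "example": "It's ridiculous that these guys are being called 'protesters'. "
        "Being armed is a threat of violence, which makes them terrorists.",
        "label": "toxic",
    },
    {
        "example": "It was a great show. Not a combo I'd of expected to be good together but it was.",
        "label": "not toxic",
    },
]
prompt = build_prompt(
    "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
    "Input: {example}\nOutput: {label}",
    shots,
    "Does this apply to President Trump too?",
)
print(prompt)
```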


In [19]:
_, output_df, _ = agent.run('test.csv', max_items=100)

Output()

Actual Cost: 0.0358


### Experiment #3: Improving prompts with error analysis (Prompt engineering)

`agent.run()` returned an output pandas DataFrame. We'll review the mistakes and update the labeling guidelines to "teach" the LLM our content moderation policies.

In [None]:
import pandas as pd

pd.set_option('max_colwidth', None)

In [None]:
# review top 10 mistakes
display(output_df[output_df['label'] != output_df['ToxicCommentClassification_llm_label']].head(10))

Unnamed: 0,example,label,ToxicCommentClassification_llm_labeled_successfully,ToxicCommentClassification_llm_label
1,This is malfeasance by the Administrator and the Board. They are wasting our money!,not toxic,True,toxic
2,"@Rmiller101 - Spoken like a true elitist. But look out bud. The re-awakening in Europe, Brexit and now the Trump victory are just the beginning. The unwashed masses are coming for you and your ilk….",not toxic,True,toxic
4,Sorry you missed high school. Eisenhower sent troops to Vietnam after the French withdrew in 1954 and before that America was providing 50% of the cost of that war. WWI & WWII were won by Democrat Presidents and the last win a Republican had was 1865 but the did surrender in Korea and Vietnam and fail to win in desert Storm or these two wars.,not toxic,True,toxic
5,Let’s see if I understand this; Berkowitz announces a $14M surplus then he rails against Proposition 8 because it would constrain his ability to confiscate even more money from citizens. Berkowitz and his allies on the Assembly changed the tax cap language to facilitate confiscating greater amounts of citizen’s money. Now why would a mayor who professes fidelity with the taxpayers change the law to enable him to take more money from them? Probably because he plans to take more money from them. Berkowitz said if Proposition 8 passes “it severely constrains our ability to give out property tax relief.” So Berkowitz wants the ability to confiscate more of the citizen’s money to provide property relief. How about taking less of that money in the first place? Let’s hope Proposition 8 passes.,not toxic,True,toxic
8,What would Jerry Prevo think about that story? Exactly what I think about his....,not toxic,True,toxic
9,When wil the indigenous be accountable and take responsibility instead of blaming others ? They have to be accountable and pay their own way or go it alone.,not toxic,True,toxic
10,the people against O'Leary are the same people against\n\nTrump\nBrexit\n\nnot a good track record so far\n2 down and 1 to go,not toxic,True,toxic
12,"Since whining is the lifeblood of online comment sites, the Pope's sign will find little support here. The self-pitying ones with their myriad complaints would have you believe that Canada is a dreadful place to live -- why they haven't left for their perfect foreign paradise is a mystery -- when by any relative standard it remains a refuge of stability in a chaotic world, its ongoing issues not likely to be solved by endless griping from anonymous online cranks.",not toxic,True,toxic
15,IT IS DOA! As Collins announced her intentions to vote NO!,not toxic,True,toxic
23,"Kitty Piercy did her best to make the almighty ""legacy"" (city) hall larger-than-life, at the city council work session on 7-11. Move the slide time-bar to 11:18 minutes, on the webcast of that meeting:\n\nhttp://ceapps.eugene-or.gov/CEWebcast/WebCast/Play.aspx?mid=2996",not toxic,True,toxic


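Every mistake shown above is a false positive: the human label is `not toxic`, but the LLM answered `toxic`, mostly on comments that criticize politicians or policy. Based on these mistakes, we will make some adjustments to our `task_guidelines`.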
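
To see which kind of error dominates, it helps to tally (human label, LLM label) pairs rather than read rows one by one. The sketch below does that on a small made-up sample; `confusion_counts` is a hypothetical helper written for this illustration.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Tally (gold label, LLM label) pairs and compute overall accuracy."""
    pairs = list(zip(gold, predicted))
    counts = Counter(pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0
    return counts, accuracy


# Made-up labels for illustration; not taken from the actual run.
gold = ["not toxic", "not toxic", "toxic", "not toxic", "toxic"]
pred = ["toxic", "not toxic", "toxic", "toxic", "toxic"]

counts, accuracy = confusion_counts(gold, pred)
false_positives = counts[("not toxic", "toxic")]
false_negatives = counts[("toxic", "not toxic")]
print(f"accuracy={accuracy:.2f} false_positives={false_positives} false_negatives={false_negatives}")
```

With the real output, the two sequences would be `output_df["label"]` and `output_df["ToxicCommentClassification_llm_label"]`.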

In [20]:
config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines": "You are an expert at identifying toxic comments. You aim to act in a fair and balanced manner, where comments that provide fair criticism of something or someone are labelled 'not toxic'. Similarly, criticisms of policy and politicians are marked 'not toxic', unless the comment includes obscenities, racial slurs or sexually explicit material. Any comments that are sexually explicit, obscene, or insults a person, demographic or race are not allowed and labeled 'toxic'. \nYour job is to correctly label the provided input example into one of the following categories:\n{labels}",
        "labels": [
            "toxic",
            "not toxic"
        ],
        "few_shot_examples": [
            {
                "example": "It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.",
                "label": "toxic"
            },
            {
                "example": "This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!",
                "label": "not toxic"
            },
            {
                "example": "This bitch is nuts. Who would read a book by a woman",
                "label": "toxic"
            },
            {
                "example": "It was a great show. Not a combo I'd of expected to be good together but it was.",
                "label": "not toxic"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 4,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}


In [21]:
# create an agent for labeling
agent = LabelingAgent(config, cache=False)

In [22]:
# dry-run -- this tells us how much this will cost and shows an example prompt
agent.plan(dataset='test.csv')

Output()

You are an expert at identifying toxic comments. You aim to act in a fair and balanced manner, where comments that provide fair criticism of something or someone are labelled 'not toxic'. Similarly, criticisms of policy and politicians are marked 'not toxic', unless the comment includes obscenities, racial slurs or sexually explicit material. Any comments that are sexually explicit, obscene, or insults a person, demographic or race are not allowed and labeled 'toxic'. 
Your job is to correctly label the provided input example into one of the following categories:
toxic
not toxic

You will return the answer with just one element: "the correct label"

Some examples with their output answers are provided below:

Input: It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.
Output: toxic

Input: This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!
Output: not toxic


In [23]:
# now, do the actual labeling
_, output_df, _ = agent.run('test.csv', max_items=100)

Output()

Actual Cost: 0.0505


### Experiment #4: Using a different LLM

We've iterated a fair bit on prompts and few-shot examples. Let's evaluate a few different LLMs that the library supports out of the box. For example, we observe that we can boost performance even further by using `text-davinci-003`.

In [25]:
config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "text-davinci-003",
    },
    "prompt": {
        "task_guidelines": "You are an expert at identifying toxic comments. You aim to act in a fair and balanced manner, where comments that provide fair criticism of something or someone are labelled 'not toxic'. Similarly, criticisms of policy and politicians are marked 'not toxic', unless the comment includes obscenities, racial slurs or sexually explicit material. Any comments that are sexually explicit, obscene, or insults a person, demographic or race are not allowed and labeled 'toxic'. \nYour job is to correctly label the provided input example into one of the following categories:\n{labels}",
        "labels": [
            "toxic",
            "not toxic"
        ],
        "few_shot_examples": [
            {
                "example": "It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.",
                "label": "toxic"
            },
            {
                "example": "This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!",
                "label": "not toxic"
            },
            {
                "example": "This bitch is nuts. Who would read a book by a woman",
                "label": "toxic"
            },
            {
                "example": "It was a great show. Not a combo I'd of expected to be good together but it was.",
                "label": "not toxic"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 4,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}


In [26]:
# create an agent for labeling
agent = LabelingAgent(config, cache=False)

In [27]:
# dry-run -- this tells us how much this will cost and shows an example prompt
agent.plan(dataset='test.csv')

Output()

You are an expert at identifying toxic comments. You aim to act in a fair and balanced manner, where comments that provide fair criticism of something or someone are labelled 'not toxic'. Similarly, criticisms of policy and politicians are marked 'not toxic', unless the comment includes obscenities, racial slurs or sexually explicit material. Any comments that are sexually explicit, obscene, or insults a person, demographic or race are not allowed and labeled 'toxic'. 
Your job is to correctly label the provided input example into one of the following categories:
toxic
not toxic

You will return the answer with just one element: "the correct label"

Some examples with their output answers are provided below:

Input: It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.
Output: toxic

Input: This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!
Output: not toxic


In [28]:
# now, do the actual labeling
_, output_df, _ = agent.run('test.csv', max_items=100)

Output()

Actual Cost: 0.7022
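
The reported costs make the price of switching models concrete. The sketch below compares the "Actual Cost" values printed for Experiments #2 to #4 (each on 100 items) and extrapolates linearly to a larger dataset. The linear extrapolation assumes the remaining comments have similar lengths, and the 10,000-row dataset size is a hypothetical figure chosen for illustration.

```python
SAMPLE_SIZE = 100  # every run above used max_items=100

# "Actual Cost" values printed by agent.run in Experiments #2 to #4.
costs = {
    "gpt-3.5-turbo, simple guidelines": 0.0358,
    "gpt-3.5-turbo, detailed guidelines": 0.0505,
    "text-davinci-003, detailed guidelines": 0.7022,
}


def projected_cost(run_cost: float, n_items: int, sample_size: int = SAMPLE_SIZE) -> float:
    """Linearly extrapolate the cost of a sample run to n_items rows."""
    return run_cost / sample_size * n_items


# How much more expensive is text-davinci-003 with the same prompt?
ratio = (
    costs["text-davinci-003, detailed guidelines"]
    / costs["gpt-3.5-turbo, detailed guidelines"]
)
print(f"text-davinci-003 costs about {ratio:.1f}x as much as gpt-3.5-turbo")

for name, cost in costs.items():
    print(f"{name}: {projected_cost(cost, 10_000):.2f} for 10,000 items (hypothetical size)")
```

On these figures, `text-davinci-003` is roughly 14 times as expensive as `gpt-3.5-turbo` with the same prompt, so any accuracy gain has to be weighed against that.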


### Experiment #5: Using confidence scores

In [None]:
# Start computing confidence scores (using Refuel's LLMs)
os.environ['REFUEL_API_KEY'] = 'sk-xxxxxxxxxxxx'

In [None]:
config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
        "compute_confidence": True,
    },
    "prompt": {
        "task_guidelines": "You are an expert at identifying toxic comments. You aim to act in a fair and balanced manner, where comments that provide fair criticism of something or someone are labelled 'not toxic'. Similarly, criticisms of policy and politicians are marked 'not toxic', unless the comment includes obscenities, racial slurs or sexually explicit material. Any comments that are sexually explicit, obscene, or insults a person, demographic or race are not allowed and labeled 'toxic'. \nYour job is to correctly label the provided input example into one of the following categories:\n{labels}",
        "labels": [
            "toxic",
            "not toxic"
        ],
        "few_shot_examples": [
            {
                "example": "It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.",
                "label": "toxic"
            },
            {
                "example": "This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!",
                "label": "not toxic"
            },
            {
                "example": "This bitch is nuts. Who would read a book by a woman",
                "label": "toxic"
            },
            {
                "example": "It was a great show. Not a combo I'd of expected to be good together but it was.",
                "label": "not toxic"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 4,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}


In [None]:
# create an agent for labeling
agent = LabelingAgent(config, cache=False)

In [None]:
# dry-run -- this tells us how much this will cost and shows an example prompt
agent.plan('test.csv')

Output()


You are an expert at identifying toxic comments.

You aim to act in a fair and balanced manner, where comments that provide fair criticism of something or someone are labelled 'not toxic'. Similarly, criticisms of policy and politicians are marked 'not toxic', unless the comment includes obscenities, racial slurs or sexually explicit material.

Any comments that are explicit, obscene, or insults a person, demographic or race are not allowed and labeled 'toxic'.


You will return the answer with just one element: "the correct label"

Now I want you to label the following example:
Input: [ Integrity means that you pay your debts.]

Does this apply to President Trump too?
Output: 


In [None]:
# now, do the actual labeling
_, output_df, _ = agent.run('test.csv', start_index=0, max_items=100)

Metric: auroc: 0.5



You are an expert at identifying toxic comments.

You aim to act in a fair and balanced manner, where comments that provide fair criticism of something or someone are labelled 'not toxic'. Similarly, criticisms of policy and politicians are marked 'not toxic', unless the comment includes obscenities, racial slurs or sexually explicit material.

Any comments that are explicit, obscene, or insults a person, demographic or race are not allowed and labeled 'toxic'.


You will return the answer with just one element: "the correct label"

Now I want you to label the following example:
Input: Between racist Eastman and 'Slappy' Wilson the Wasilla delegation is an embarassment to the valley and the State. Both should resign. This is Trump's America, open racism in the Republican party and physical attacks on reporters.
Cue the usual racist adn commenters telling us how Eastman actually has a valid point with no evidence beyond an anecdote from the time they did a stint in the bush in 3...2...

toxic


n


Output()

Metric: auroc: 0.8875
Actual Cost: 0.0376
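
The `auroc` metric reported above is, as far as we can tell, a measure of how well the confidence score separates correct labels from incorrect ones: 0.5 means the scores carry no signal, 1.0 means every correct label scores above every incorrect one. The sketch below computes that quantity from scratch and shows the usual way to use it, keeping only labels above a confidence threshold. This is an illustration under that assumed interpretation, with made-up scores; it is not Autolabel's implementation.

```python
def auroc(scores, correct):
    """Probability that a correct label gets a higher confidence than an incorrect one."""
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at_threshold(scores, correct, threshold):
    """Accuracy and coverage when keeping only labels with confidence >= threshold."""
    kept = [c for s, c in zip(scores, correct) if s >= threshold]
    coverage = len(kept) / len(scores)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return accuracy, coverage


# Made-up confidence scores and correctness flags for illustration.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40]
correct = [True, True, False, True, False, False]

print(f"auroc={auroc(scores, correct):.3f}")
acc, cov = accuracy_at_threshold(scores, correct, threshold=0.7)
print(f"threshold=0.7 accuracy={acc:.2f} coverage={cov:.2f}")
```

Raising the threshold trades coverage for accuracy: fewer rows are auto-labeled, and the rest can be routed to human review.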
