# Labeling the [craigslist](https://huggingface.co/datasets/craigslist_bargains) dataset using Autolabel

This is a multi-class classification task: the inputs are conversations between buyers and sellers, and the goal is to classify the item being sold into one of 6 categories.

## Install Autolabel
Also set up your OpenAI API key, since we'll be using `gpt-3.5-turbo` as our LLM for labeling.

In [None]:
!pip install 'refuel-autolabel[openai]'

In [1]:
import os

# provide your own OpenAI API key here
os.environ['OPENAI_API_KEY'] = 'sk-'


## Download the dataset

This dataset can be downloaded via Autolabel.

In [2]:
from autolabel import get_data

get_data('craigslist')

Downloading example dataset from https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/craigslist/seed.csv to seed.csv...
Downloading example dataset from https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/craigslist/test.csv to test.csv...
100% [........................................] [734774/734774] bytes

This downloads two datasets:
* `test.csv`: This is the larger dataset we are trying to label using LLMs
* `seed.csv`: This is a small dataset where we already have human-provided labels
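Before labeling, it can help to check how the human-provided labels in the seed set are distributed. The following is a minimal sketch, not part of Autolabel: the helper `label_counts` is hypothetical, the `example` and `label` column names are assumptions about the CSV layout, and the rows are made up so the snippet runs without the downloaded files.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="label"):
    """Count how often each label appears in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Made-up rows standing in for seed.csv (column names are assumed)
sample = io.StringIO(
    "example,label\n"
    "Buyer: Is the bike still available?,bike\n"
    "Buyer: How many miles are on it?,car\n"
    "Buyer: Does it come with a helmet?,bike\n"
)

counts = label_counts(sample)
print(counts.most_common())
```

To run this against the real file, pass an open handle instead, for example `with open('seed.csv', newline='') as f: label_counts(f)`, adjusting `label_column` if the file uses a different header.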

## Start the labeling process!

Labeling with Autolabel is a 3-step process:
* First, we specify a labeling configuration (see `config.json` below)
* Next, we do a dry-run on our dataset using the LLM specified in `config.json` by running `agent.plan`
* Finally, we run the labeling with `agent.run`

### First labeling run

In [3]:
import json

from autolabel import LabelingAgent

In [4]:
# load the config
with open('config_craigslist.json', 'r') as f:
    config = json.load(f)

Let's review the configuration file below. You'll notice the following useful keys:
* `task_type`: `classification` (since it's a classification task)
* `model`: `{'provider': 'openai', 'name': 'gpt-3.5-turbo'}` (use a specific OpenAI model)
* `prompt.task_guidelines`: `'You are an expert at understanding conversations...'` (how we describe the task to the LLM)
* `prompt.labels`: `['housing', 'furniture', 'bike', ...]` (the full list of labels to choose from)
* `prompt.few_shot_num`: 10 (how many labeled examples to provide to the LLM)
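To see how these keys fit together, here is a rough sketch of how a few-shot prompt could be assembled from such a configuration. This is not Autolabel's actual implementation: `build_prompt`, `demo_config`, and the seed rows are hypothetical, and the sketch simply takes the first `few_shot_num` examples rather than selecting them by similarity.

```python
def build_prompt(config, seed_examples, new_example):
    """Assemble a few-shot classification prompt from a config dict."""
    prompt_cfg = config["prompt"]
    # Fill the {labels} placeholder in the guidelines, one label per line
    guidelines = prompt_cfg["task_guidelines"].format(
        labels="\n".join(prompt_cfg["labels"])
    )
    template = prompt_cfg["example_template"]
    # Render at most few_shot_num labeled examples
    shots = [
        template.format(example=text, label=label)
        for text, label in seed_examples[: prompt_cfg["few_shot_num"]]
    ]
    # The item to label uses the same template with the label left blank
    query = template.format(example=new_example, label="").rstrip()
    return "\n\n".join([guidelines, *shots, query])


# Hypothetical, trimmed-down config and seed rows for illustration only
demo_config = {
    "prompt": {
        "task_guidelines": "Classify the item being sold into one of:\n{labels}",
        "labels": ["housing", "furniture", "bike"],
        "few_shot_num": 2,
        "example_template": "Input: {example}\nOutput: {label}",
    }
}
demo_seed = [
    ("Buyer: Is the couch still for sale?", "furniture"),
    ("Buyer: How many gears does it have?", "bike"),
    ("Buyer: Is the apartment pet friendly?", "housing"),
]

prompt = build_prompt(
    demo_config, demo_seed, "Buyer: Does the rent include utilities?"
)
print(prompt)
```

The prompt ends with an unanswered `Output:` line, so the model's completion is the predicted label. Raising `few_shot_num` gives the model more context but also increases the token count, and therefore the cost, of every request.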

In [5]:
config

{'task_name': 'CraigslistConversationClassification',
 'task_type': 'classification',
 'dataset': {'label_column': 'label', 'delimiter': ','},
 'model': {'provider': 'openai', 'name': 'gpt-3.5-turbo'},
 'prompt': {'task_guidelines': 'You are an expert at understanding conversations.\n Given a text passage as input comprising of dialogue of negotiations between a seller and a buyer about the sale of an item, your task is to classify the item being sold into one of the following categories:\n{labels}',
  'output_guidelines': 'You will answer with just the the correct output label and nothing else.',
  'labels': ['housing', 'furniture', 'bike', 'phone', 'car', 'electronics'],
  'few_shot_examples': 'seed.csv',
  'few_shot_selection': 'semantic_similarity',
  'few_shot_num': 10,
  'example_template': 'Input: {example}\nOutput: {label}'}}

In [6]:
# create an agent for labeling
agent = LabelingAgent(config=config)

In [7]:
# dry-run -- this tells us how much this will cost and shows an example prompt
from autolabel import AutolabelDataset
ds = AutolabelDataset("test.csv", config=config)  # get_data() saved test.csv to the working directory
agent.plan(ds)

Output()

In [8]:
# now, do the actual labeling
ds = agent.run(ds, max_items=100)

2023-09-27 23:35:04 autolabel.labeler INFO: Task run already exists.


Output()

2023-08-13 09:27:15 openai INFO: error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-3.5-turbo in organization org-etZVkYhAIYGmLcxLmarMmAPo on tokens per min. Limit: 90000 / min. Current: 89386 / min. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False


2023-08-13 09:27:19 openai INFO: error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-3.5-turbo in organization org-etZVkYhAIYGmLcxLmarMmAPo on tokens per min. Limit: 90000 / min. Current: 89026 / min. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False


Actual Cost: 0.1017
