# Labeling the [banking](https://huggingface.co/datasets/banking77) dataset using Autolabel

This is a multi-class classification task: the inputs are customer service queries, and we have to label each one with the correct intent out of 77 possible intents.

## Install Autolabel
Also, set up your OpenAI API key, since we'll be using `gpt-3.5-turbo` as our LLM for labeling.

In [None]:
!pip install 'refuel-autolabel[openai]'

In [1]:
import os

# provide your own OpenAI API key here
os.environ['OPENAI_API_KEY'] = 'sk-XXXXXXXXXXXXXXXXXXXXXXXX'


## Download the dataset

This dataset can be downloaded directly via Autolabel.

In [2]:
from autolabel import get_data

get_data('banking')

This downloads two datasets:
* `test.csv`: This is the larger dataset we are trying to label using LLMs
* `seed.csv`: This is a small dataset where we already have human-provided labels
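Before labeling, it can help to look at the label distribution in these files. Here is a minimal stdlib sketch, assuming each CSV has a header row and a `label` column (the `label_column` in the config shown below) with comma delimiters. The helper name `label_counts` is hypothetical and not part of Autolabel.

```python
import csv
from collections import Counter


def label_counts(path: str, label_column: str = "label") -> Counter:
    """Count how often each label appears in a comma-delimited CSV with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # field names come from the header row
        return Counter(row[label_column] for row in reader)
```

For example, `label_counts("data/banking/seed.csv").most_common(5)` would list the five most frequent intents in the seed set; that path is an assumption based on where `test.csv` is read from later in this notebook.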

## Start the labeling process!

Labeling with Autolabel is a 3-step process:
* First, we specify a labeling configuration (see `config_banking.json` below)
* Next, we do a dry run on our dataset using the LLM specified in the config by running `agent.plan`
* Finally, we run the labeling with `agent.run`

### First labeling run

In [3]:
import json

from autolabel import LabelingAgent

In [4]:
# load the config
with open('config_banking.json', 'r') as f:
    config = json.load(f)

Let's review the configuration file below. You'll notice the following useful keys:
* `task_type`: `classification` (since it's a classification task)
* `model`: `{'provider': 'openai', 'name': 'gpt-3.5-turbo'}` (use a specific OpenAI model)
* `prompt.task_guidelines`: `'You are an expert at understanding bank customers support complaints and queries...` (how we describe the task to the LLM)
* `prompt.labels`: `['activate_my_card', 'age_limit', 'apple_pay_or_google_pay', ...]` (the full list of labels to choose from)
* `prompt.few_shot_num`: 10 (how many labeled examples to provide to the LLM)
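For reference, the keys listed above can also be assembled directly as a Python dict instead of being loaded from `config_banking.json`. This sketch includes only those keys. The label list is cut down to three entries for brevity (the real config lists all 77), and `render_guidelines` is a hypothetical helper that only imitates how the `{labels}` placeholder appears in the dry-run prompt (one label per line). It is not Autolabel's own templating code.

```python
import json

# Only the keys discussed above; the label list is truncated to 3 of the 77 intents.
config = {
    "task_name": "BankingComplaintsClassification",
    "task_type": "classification",
    "dataset": {"label_column": "label", "delimiter": ","},
    "model": {"provider": "openai", "name": "gpt-3.5-turbo"},
    "prompt": {
        "task_guidelines": (
            "You are an expert at understanding bank customers support complaints and queries.\n"
            "Your job is to correctly classify the provided input example into one of the following categories.\n"
            "Categories:\n{labels}"
        ),
        "labels": ["activate_my_card", "age_limit", "apple_pay_or_google_pay"],
        "few_shot_num": 10,
    },
}


def render_guidelines(cfg: dict) -> str:
    """Hypothetical helper: fill {labels} with one label per line, as in the dry-run prompt."""
    prompt = cfg["prompt"]
    return prompt["task_guidelines"].format(labels="\n".join(prompt["labels"]))


# Round-trip through JSON to check the dict is serializable, like config_banking.json.
assert json.loads(json.dumps(config)) == config
print(render_guidelines(config))
```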

In [5]:
config

{'task_name': 'BankingComplaintsClassification',
 'task_type': 'classification',
 'dataset': {'label_column': 'label', 'delimiter': ','},
 'model': {'provider': 'openai', 'name': 'gpt-3.5-turbo'},
 'prompt': {'task_guidelines': 'You are an expert at understanding bank customers support complaints and queries.\nYour job is to correctly classify the provided input example into one of the following categories.\nCategories:\n{labels}',
  'output_guidelines': 'You will answer with just the the correct output label and nothing else.',
  'labels': ['activate_my_card',
   'age_limit',
   'apple_pay_or_google_pay',
   'atm_support',
   'automatic_top_up',
   'balance_not_updated_after_bank_transfer',
   'balance_not_updated_after_cheque_or_cash_deposit',
   'beneficiary_not_allowed',
   'cancel_transfer',
   'card_about_to_expire',
   'card_acceptance',
   'card_arrival',
   'card_delivery_estimate',
   'card_linking',
   'card_not_working',
   'card_payment_fee_charged',
   'card_payment_not_r

In [6]:
# create an agent for labeling
agent = LabelingAgent(config=config)

In [8]:
# dry-run -- this tells us how much this will cost and shows an example prompt
from autolabel import AutolabelDataset
ds = AutolabelDataset("data/banking/test.csv", config=config)
agent.plan(ds)


You are an expert at understanding bank customers support complaints and queries.
Your job is to correctly classify the provided input example into one of the following categories.
Categories:
activate_my_card
age_limit
apple_pay_or_google_pay
atm_support
automatic_top_up
balance_not_updated_after_bank_transfer
balance_not_updated_after_cheque_or_cash_deposit
beneficiary_not_allowed
cancel_transfer
card_about_to_expire
card_acceptance
card_arrival
card_delivery_estimate
card_linking
card_not_working
card_payment_fee_charged
card_payment_not_recognised
card_payment_wrong_exchange_rate
card_swallowed
cash_withdrawal_charge
cash_withdrawal_not_recognised
change_pin
compromised_card
contactless_not_working
country_support
declined_card_payment
declined_cash_withdrawal
declined_transfer
direct_debit_payment_not_recognised
disposable_card_limits
edit_personal_details
exchange_charge
exchange_rate
exchange_via_app
extra_charge_on_statement
failed_transfer
fiat_currency_support
get_disposable_

In [9]:
# now, do the actual labeling
ds = agent.run(ds, max_items=100)

2023-08-13 09:26:57 openai INFO: error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-3.5-turbo in organization org-etZVkYhAIYGmLcxLmarMmAPo on tokens per min. Limit: 90000 / min. Current: 88913 / min. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False



Actual Cost: 0.1017


We get 76% accuracy on the first 100 examples. Let's see if we can use confidence scores to improve accuracy further by removing the less confident examples from our labeled set.
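The accuracy figure is the fraction of labeled rows whose LLM label exactly matches the human-provided label, so 76% means 76 of the 100 rows matched. Here is a minimal sketch of that calculation, assuming you already have the predicted and gold labels as plain Python lists. How you extract them from the dataset returned by `agent.run` is not shown here.

```python
from typing import List


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of examples whose predicted label exactly matches the human label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```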

## Compute confidence scores

In [28]:
# provide your own Refuel API key here (confidence scores are computed using Refuel's LLMs)
os.environ['REFUEL_API_KEY'] = 'sk-xxxxxxxxxxxx'

In [29]:
# enable confidence scoring by setting `compute_confidence` to True
config["model"]["compute_confidence"] = True

In [30]:
agent = LabelingAgent(config=config)

In [31]:
from autolabel import AutolabelDataset
ds = AutolabelDataset("data/banking/test.csv", config=config)
agent.plan(ds)

Generating Prompts... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 0:00:24 0:00:00
┌──────────────────────────┬─────────┐
│ Total Estimated Cost     │ $6.6836 │
│ Number of Examples       │ 1998    │
│ Average cost per example │ $0.0033 │
└──────────────────────────┴─────────┘
──────────────────────────────── Prompt Example ────────────────────────────────
You are an expert at understanding bank customers support complaints and queries.
Your job is to correctly classify the provided input example into one of the following categories.
Categories:
activate_my_card
age_limit
apple_pay_or_google_pay
atm_support
automatic_top_up
balance_not_updated_after_bank_transfer
balance_not_updated_after_cheque_or_cash_deposit
beneficiary_not_allowed
cancel_transfer
card_about_to_expire
card_acceptance
card_arrival
card_delivery_estimate
card_linking
card_not_working
card_payment_fee_charged
card_payment_not_recognised
card_payment_wrong_exchange_rate
card_swallowed
cash_withdrawal_charge
cash_withdrawa
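
The "Average cost per example" row in the dry-run table appears to be the total estimate divided by the number of rows ($6.6836 / 1998 ≈ $0.0033). The sketch below reproduces that arithmetic and scales it to a partial run such as `max_items=100`. `project_cost` is a hypothetical helper, and its projection is only as good as the estimate it starts from. For comparison, the 100-item run below reports an actual cost of $0.1356, so in this notebook the estimate is on the high side.

```python
def cost_per_example(total_cost: float, num_examples: int) -> float:
    """Average estimated cost per row, as in the dry-run table."""
    return total_cost / num_examples


def project_cost(per_example: float, num_items: int) -> float:
    """Hypothetical helper: scale a per-row estimate to a run over num_items rows."""
    return per_example * num_items


per_row = cost_per_example(6.6836, 1998)
print(f"${per_row:.4f} per example")
print(f"${project_cost(per_row, 100):.2f} projected for 100 items")
```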

In [33]:
ds = agent.run(ds, max_items=100)


2023-06-13 22:59:21 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 603d90627ab4c936108e1009bec434b8 in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False

Metric: auroc: 0.8737
Actual Cost: 0.1356
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ support ┃ threshold ┃ accuracy ┃ completion_rate ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 100     │ -inf      │ 0.74     │ 1.0             │
│ 1       │ 0.9999    │ 1.0      │ 0.01            │
│ 12      │ 0.9992    │ 1.0      │ 0.12            │
│ 13      │ 0.9991    │ 0.9231   │ 0.13            │
│ 41      │ 0.9916    │ 0.9756   │ 0.41            │
│ 42      │ 0.9912    │ 0.9524   │ 0.42            │
│ 43      │ 0.9912    │ 0.9535   │ 0.43            │
│ 44      │ 0.9901    │ 0.9318   │ 0.44            │
│ 48      │ 0.9873    │ 0.9375   │ 0.48            │
│ 49      │ 0.9872    │ 0.9184   │ 0.49            │
│ 63      │ 0.9695    │ 0.9365   │ 0.63            │
│ 64      │ 0.9664    │ 0.9219   │ 0.64            │
│ 66      │ 0.9601    │ 0.9242   │ 0.66            │
│ 67      │ 0.9587    │ 0.9104   │ 0.67            │
│ 68      │ 0.9523    │ 0.9118   │ 0.68            │
│ 69

Looking at the table above, we can see that if we set the confidence threshold at `0.9305`, we can label at 90% accuracy with a completion rate of 74%. This means we would discard all data points with a confidence score below `0.9305` (around 26% of all samples). In exchange, the labels we keep are considerably more accurate than the 74% we get when we keep every example, which gives us a much higher-quality labeled dataset.
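The table is a sweep over candidate thresholds. For each threshold, keep only the rows whose confidence is at least that value, then report how many rows survive (`support`), their accuracy, and the fraction of all rows kept (`completion_rate`). Here is a minimal sketch of that calculation and of picking the threshold that labels the most rows while meeting an accuracy target. It assumes you have per-row confidence scores and correctness flags. The names `threshold_table` and `pick_threshold` are hypothetical, and this is not Autolabel's own implementation: it emits a row for every distinct confidence value, so it will list more rows than the table above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThresholdRow:
    support: int            # rows with confidence >= threshold
    threshold: float
    accuracy: float         # accuracy over those rows only
    completion_rate: float  # support / total rows


def threshold_table(confidences: List[float], correct: List[bool]) -> List[ThresholdRow]:
    """Sweep every distinct confidence value as a threshold, plus a -inf row that keeps all rows."""
    total = len(confidences)
    if total == 0 or total != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    # Most confident first, so each prefix is exactly the set of rows kept at that threshold.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    rows: List[ThresholdRow] = []
    kept = hits = 0
    for i, (conf, ok) in enumerate(pairs):
        kept += 1
        hits += int(ok)
        # Emit a row only after the last example sharing this confidence value,
        # since a threshold keeps or drops tied examples together.
        if i + 1 == total or pairs[i + 1][0] != conf:
            rows.append(ThresholdRow(kept, conf, hits / kept, kept / total))
    # Baseline row: no filtering at all.
    rows.insert(0, ThresholdRow(total, float("-inf"), hits / total, 1.0))
    return rows


def pick_threshold(rows: List[ThresholdRow], min_accuracy: float) -> Optional[ThresholdRow]:
    """Among thresholds meeting the accuracy target, return the one that keeps the most rows."""
    eligible = [r for r in rows if r.accuracy >= min_accuracy]
    return max(eligible, key=lambda r: r.completion_rate, default=None)


# Made-up scores for six examples, purely to illustrate the calculation.
confidences = [0.99, 0.97, 0.95, 0.90, 0.80, 0.60]
correct = [True, True, True, False, True, False]
table = threshold_table(confidences, correct)
for row in table:
    print(row)
print(pick_threshold(table, min_accuracy=0.8))  # threshold 0.8 keeps 5 of 6 rows at 80% accuracy
```

Maximizing completion rate subject to a minimum accuracy mirrors the reasoning above: pick the lowest threshold that still meets the quality bar, so that as few examples as possible are discarded.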