## Exploring the SQUADv2 dataset using Autolabel

#### Set up the API keys for the providers you want to use

In [1]:
import os

# provide your own OpenAI API key here
os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxxxxxxxxxx'

#### Install the autolabel library

In [2]:
!pip install 'refuel-autolabel[openai]'

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting refuel-autolabel[openai]
  Downloading refuel_autolabel-0.0.3-py3-none-any.whl (57 kB)
Collecting loguru>=0.5.0 (from refuel-autolabel[openai])
  Downloading loguru-0.7.0-py3-none-any.whl (59 kB)
Collecting numpy>=1.23.0 (from refuel-autolabel[openai])
  Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting datasets>=2.7.0 (from refuel-autolabel[openai])
  Downloading datasets-2.13.0-py3-none-any.whl (485 kB)

#### Download the dataset

In [1]:
from autolabel import get_data

get_data('squad_v2')

Downloading seed example dataset to "seed.csv"...


Downloading test dataset to "test.csv"...


This downloads two datasets:
* `test.csv`: This is the larger dataset we are trying to label using LLMs
* `seed.csv`: This is a small dataset where we already have human-provided labels
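
Before labeling, it can help to peek at the downloaded files. The sketch below uses only the standard-library `csv` module. The column names (`question`, `context`, `answer`) are an assumption based on the few-shot examples in the config, and the in-memory sample with made-up rows stands in for `seed.csv`.

```python
import csv
import io


def summarize_csv(handle):
    """Return (column names, row count) for a CSV file-like object."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return reader.fieldnames, len(rows)


# A tiny in-memory stand-in for seed.csv (hypothetical rows, assumed columns).
sample = io.StringIO(
    "question,context,answer\n"
    "Who issued the Tamworth Manifesto?,Peel issued the Tamworth Manifesto in 1834.,Peel\n"
    "What was created in 1859?,No such event is described.,unanswerable\n"
)
columns, n_rows = summarize_csv(sample)
print(columns, n_rows)
```

To run it on the real file, pass an open handle instead, for example `with open('seed.csv', newline='') as f: summarize_csv(f)`.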

## Start the labeling process!

Labeling with Autolabel is a 3-step process:
* First, we specify a labeling configuration (see `config.json` below)
* Next, we do a dry-run on our dataset using the LLM specified in `config.json` by running `agent.plan`
* Finally, we run the labeling with `agent.run`

In [2]:
import json

from autolabel import LabelingAgent


In [3]:
# load the config
with open('config_squad_v2.json', 'r') as f:
    config = json.load(f)

Let's review the configuration file below. You'll notice the following useful keys:
* `task_type`: `question_answering` (since it's a question answering task)
* `model`: `{'provider': 'openai', 'name': 'gpt-3.5-turbo'}` (use a specific OpenAI model)
* `prompt.task_guidelines`: `'You are an expert at answering questions based on wikipedia articles...'` (how we describe the task to the LLM)
* `prompt.few_shot_num`: 3 (how many labeled examples to provide to the LLM)
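
The dotted names above (such as `prompt.few_shot_num`) refer to nested keys in the config dictionary. As a rough sketch, a small helper can resolve such a path; `get_nested` and `example_config` are hypothetical names introduced here for illustration, and the dictionary contains only the keys listed above rather than the full config.

```python
from functools import reduce


def get_nested(config, dotted_key):
    """Look up a dotted path such as 'prompt.few_shot_num' in a nested dict."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), config)


# Trimmed-down stand-in for the config, using only the keys listed above.
example_config = {
    "task_type": "question_answering",
    "model": {"provider": "openai", "name": "gpt-3.5-turbo"},
    "prompt": {"few_shot_num": 3},
}

for key in ("task_type", "model", "prompt.few_shot_num"):
    print(key, "->", get_nested(example_config, key))
```

The same helper should work on the `config` object loaded from `config_squad_v2.json`, since `json.load` returns plain nested dictionaries.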

In [4]:
config

{'task_name': 'OpenbookQAWikipedia',
 'task_type': 'question_answering',
 'dataset': {'label_column': 'answer', 'delimiter': ','},
 'model': {'provider': 'openai', 'name': 'gpt-3.5-turbo'},
 'prompt': {'task_guidelines': 'You are an expert at answering questions based on wikipedia articles. Your job is to answer the following questions using the context provided with the question. The answer is a continuous span of words from the context. Use the context to answer the question. If the question cannot be answered using the context, answer the question as unanswerable.',
  'few_shot_examples': [{'question': 'What was created by the modern Conservative Party in 1859 to define basic Conservative principles?',
    'answer': 'unanswerable',
    'context': "The modern Conservative Party was created out of the 'Pittite' Tories of the early 19th century. In the late 1820s disputes over political reform broke up this grouping. A government led by the Duke of Wellington collapsed amidst dire elec

In [7]:
# create an agent for labeling
agent = LabelingAgent(config=config)

In [9]:
from autolabel import AutolabelDataset
ds = AutolabelDataset("data/squad_v2/test.csv", config=config)
agent.plan(ds)

Output()

You are an expert at answering questions based on wikipedia articles. Your job is to answer the following questions using the context provided with the question. The answer is a continuous span of words from the context. Use the context to answer the question. If the question cannot be answered using the context, answer the question as unanswerable.

You will return the answer one element: "the correct label"


Some examples with their output answers are provided below:

Context: The modern Conservative Party was created out of the 'Pittite' Tories of the early 19th century. In the late 1820s disputes over political reform broke up this grouping. A government led by the Duke of Wellington collapsed amidst dire election results. Following this disaster Robert Peel set about assembling a new coalition of forces. Peel issued the Tamworth Manifesto in 1834 which set out the basic principles of Conservatism; – the necessity in specific cases of reform in order to survive, but an opposition 

In [10]:
ds = agent.run(ds, max_items=100)

Output()



Actual Cost: 0.1792


We reach 59% accuracy on the first 100 examples. Let's see whether confidence scores can improve accuracy further by removing the less confident examples from our labeled set.

### Compute confidence scores


In [2]:
# Start computing confidence scores (using Refuel's LLMs)
os.environ['REFUEL_API_KEY'] = 'sk-xxxxxxxxxxxxxxxx'

In [12]:
config["model"]["compute_confidence"] = True

In [13]:
agent = LabelingAgent(config=config)

In [14]:
from autolabel import AutolabelDataset
ds = AutolabelDataset("data/squad_v2/test.csv", config=config)
agent.plan(ds)

Output()

You are an expert at answering questions based on wikipedia articles. Your job is to answer the following questions using the context provided with the question. The answer is a continuous span of words from the context. Use the context to answer the question. If the question cannot be answered using the context, answer the question as unanswerable.

You will return the answer one element: "the correct label"


Some examples with their output answers are provided below:

Context: The modern Conservative Party was created out of the 'Pittite' Tories of the early 19th century. In the late 1820s disputes over political reform broke up this grouping. A government led by the Duke of Wellington collapsed amidst dire election results. Following this disaster Robert Peel set about assembling a new coalition of forces. Peel issued the Tamworth Manifesto in 1834 which set out the basic principles of Conservatism; – the necessity in specific cases of reform in order to survive, but an opposition 

In [15]:
ds = agent.run(ds, max_items=100)

Output()

Metric: auroc: 0.864
Actual Cost: 0.0095


Looking at the table above, we can see that setting the confidence threshold at `0.8449` lets us label at 80.65% accuracy with a completion rate of 65%. In other words, we would discard every data point whose confidence score is below `0.8449` (around 35% of all samples). The trade-off is a smaller but much higher-quality labeled dataset.
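
The relationship between threshold, accuracy, and completion rate can be sketched in plain Python. This is a minimal illustration, not Autolabel's own implementation: `threshold_stats` is a hypothetical helper, and the `(confidence, is_correct)` pairs are made-up values rather than the results from this run.

```python
def threshold_stats(records, threshold):
    """Return (accuracy, completion_rate) for rows kept at a confidence threshold.

    `records` is a list of (confidence, is_correct) pairs.
    """
    # Keep only the correctness flags of rows at or above the threshold.
    kept = [correct for conf, correct in records if conf >= threshold]
    completion_rate = len(kept) / len(records) if records else 0.0
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return accuracy, completion_rate


# Hypothetical scores for illustration only; not the notebook's actual results.
records = [(0.95, True), (0.90, True), (0.85, False), (0.60, False), (0.40, True)]
accuracy, completion = threshold_stats(records, threshold=0.8)
print(f"accuracy={accuracy:.2f}, completion={completion:.2f}")
```

Raising the threshold generally shrinks the kept set, so completion rate falls; accuracy on the kept rows tends to rise only when the confidence scores are well calibrated.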