## Exploring the SciQ dataset using Autolabel

#### Set up the API keys for the providers you want to use

In [1]:
import os

# provide your own OpenAI API key here
os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxxxxxxx'

#### Install the autolabel library

In [2]:
!pip install 'refuel-autolabel[openai]'





#### Download the dataset

In [3]:
from autolabel import get_data

get_data('sciq')



Downloading seed example dataset to "seed.csv"...
100% [..........................................................] 29687 / 29687

Downloading test dataset to "test.csv"...
100% [........................................................] 119128 / 119128

This downloads two datasets:
* `test.csv`: This is the larger dataset we are trying to label using LLMs
* `seed.csv`: This is a small dataset where we already have human-provided labels

## Start the labeling process!

Labeling with Autolabel is a 3-step process:
* First, we specify a labeling configuration (see `config.json` below)
* Next, we do a dry-run on our dataset using the LLM specified in `config.json` by running `agent.plan`
* Finally, we run the labeling with `agent.run`

In [4]:
import json

from autolabel import LabelingAgent

In [5]:
# load the config
with open('config_sciq.json', 'r') as f:
    config = json.load(f)

Let's review the configuration file below. You'll notice the following useful keys:
* `task_type`: `question_answering` (since it's a question answering task)
* `model`: `{'provider': 'openai', 'name': 'gpt-3.5-turbo'}` (use a specific OpenAI model)
* `prompt.task_guidelines`: `'You are an expert at answer science questions...'` (how we describe the task to the LLM)
* `prompt.few_shot_num`: 10 (how many labeled examples to provide to the LLM)
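
The dotted names in the list above (for example `prompt.few_shot_num`) refer to nested keys in the loaded dictionary. Here is a minimal sketch of reading them, assuming a dictionary shaped like the configuration used in this notebook. The `get_nested` helper and the trimmed `sample_config` are hypothetical illustrations, not part of the Autolabel API.

```python
# Hypothetical helper for illustration; not part of the Autolabel API.
def get_nested(mapping, dotted_key):
    """Follow a dotted path such as 'prompt.few_shot_num' through nested dicts."""
    value = mapping
    for part in dotted_key.split("."):
        value = value[part]
    return value


# A trimmed copy of the configuration used in this notebook.
sample_config = {
    "task_type": "question_answering",
    "model": {"provider": "openai", "name": "gpt-3.5-turbo"},
    "prompt": {"few_shot_selection": "semantic_similarity", "few_shot_num": 10},
}

for key in ("task_type", "model.name", "prompt.few_shot_num"):
    print(f"{key}: {get_nested(sample_config, key)}")
```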

In [8]:
config

{'task_name': 'ScienceQuestionAnswering',
 'task_type': 'question_answering',
 'dataset': {'label_column': 'answer', 'delimiter': ','},
 'model': {'provider': 'openai', 'name': 'gpt-3.5-turbo'},
 'prompt': {'task_guidelines': 'You are an expert at answer science questions. Your job is to answer the given question, using the options provided for each question. Choose the best answer for the question from among the options provided',
  'example_template': 'Question: {question}\nOptions: {options}\nAnswer: {answer}',
  'few_shot_examples': 'seed.csv',
  'few_shot_selection': 'semantic_similarity',
  'few_shot_num': 10}}
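
To see how `example_template` turns one labeled row into a few-shot example, the sketch below fills the template with plain `str.format`. This only illustrates the placeholder substitution; Autolabel's internal rendering may differ. The `row` dictionary is a hand-written stand-in for one row of `seed.csv`, using a question that appears in the prompt printed by `agent.plan`.

```python
# Template copied from the `example_template` key in the config above.
example_template = "Question: {question}\nOptions: {options}\nAnswer: {answer}"

# Hand-written stand-in for one seed row (assumed column names: question, options, answer).
row = {
    "question": "The majority of elements, including iron and copper, are of what type?",
    "options": ["oils", "metals", "minerals", "acids"],
    "answer": "metals",
}

# Each {placeholder} is replaced by the value of the matching key.
rendered = example_template.format(**row)
print(rendered)
```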

In [9]:
# create an agent for labeling
agent = LabelingAgent(config=config)

In [10]:
from autolabel import AutolabelDataset
ds = AutolabelDataset("data/sciq/test.csv", config=config)
agent.plan(ds)

You are an expert at answer science questions. Your job is to answer the given question, using the options provided for each question. Choose the best answer for the question from among the options provided

You will return the answer one element: "the correct label"


Some examples with their output answers are provided below:

Question: What do you call health-promoting molecules that inhibit the oxidation of other molecules?
Options: ['nutrients', 'antioxidants', 'neurotransmitters', 'hormones']
Answer: antioxidants

Question: Highly reactive nonmetals, which only accept electrons and do not give them up, make poor what?
Options: ['electricity conductors', 'insulators', 'electromagnets', 'alloys']
Answer: electricity conductors

Question: The majority of elements, including iron and copper, are of what type?
Options: ['oils', 'metals', 'minerals', 'acids']
Answer: metals

Question: A hydrogen atom with one neutron is called what?
Options: ['magnesium', 'deuterium', 'covalent', 'ioni

In [11]:
ds = agent.run(ds, max_items=100)

Actual Cost: 0.1063


We are at 94% accuracy when labeling the first 100 examples. Let's see if we can use confidence scores to improve accuracy further by removing the less confident examples from our labeled set.

### Compute confidence scores


In [17]:
# Start computing confidence scores (using Refuel's LLMs)
os.environ['REFUEL_API_KEY'] = 'xxxxxxxxxxxxxxxxx'

In [18]:
config["model"]["compute_confidence"] = True

In [19]:
agent = LabelingAgent(config=config)

In [20]:
from autolabel import AutolabelDataset
ds = AutolabelDataset("data/sciq/test.csv", config=config)
agent.plan(ds)

You are an expert at answer science questions. Your job is to answer the given question, using the options provided for each question. Choose the best answer for the question from among the options provided

You will return the answer one element: "the correct label"


Some examples with their output answers are provided below:

Question: What do you call health-promoting molecules that inhibit the oxidation of other molecules?
Options: ['nutrients', 'antioxidants', 'neurotransmitters', 'hormones']
Answer: antioxidants

Question: Highly reactive nonmetals, which only accept electrons and do not give them up, make poor what?
Options: ['electricity conductors', 'insulators', 'electromagnets', 'alloys']
Answer: electricity conductors

Question: The majority of elements, including iron and copper, are of what type?
Options: ['oils', 'metals', 'minerals', 'acids']
Answer: metals

Question: A hydrogen atom with one neutron is called what?
Options: ['magnesium', 'deuterium', 'covalent', 'ioni

In [21]:
ds = agent.run(ds, max_items=100)

2023-06-14 15:09:14 autolabel.labeler INFO: Task run already exists.


Metric: auroc: 0.5


You are an expert at answer science questions. Your job is to answer the given question, using the options provided for each question. Choose the best answer for the question from among the options provided

You will return the answer one element: "the correct label"


Some examples with their output answers are provided below:

Question: The male gametophyte releases what, which swim - propelled by their flagella - to reach and fertilize the female gamete or egg?
Options: ['sperm', 'cytoplasm', 'tadpoles', 'dna']
Answer: sperm

Question: Prophase is preceded by a preprophase stage in what type of cells?
Options: ['hair and nail cells', 'plant cells', 'egg cells', 'brain cells']
Answer: plant cells

Question: What form of reproduction creates offspring that are genetically identical to the parent?
Options: ['microscopic', 'primitive', 'sexual', 'asexual']
Answer: asexual

Question: In humans, fertilization occurs soon after the oocyte leaves this?
Options: ['placenta', 'testes', 'ovary


Metric: auroc: 0.6986
Actual Cost: 0.0268


Looking at the table above, we can see that with a confidence threshold of `0.9036`, we can label at 97.3% accuracy with a completion rate of 74%. In other words, we would discard every data point whose confidence score is below `0.9036` (around 26% of all samples). The trade-off is fewer labels in exchange for a much higher-quality labeled dataset.
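
The trade-off between accuracy and completion rate at a given threshold can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Autolabel computes its metrics: `evaluate_threshold` is a hypothetical helper, and `toy_records` contains made-up values rather than results from the SciQ run.

```python
def evaluate_threshold(records, threshold):
    """Return (accuracy, completion_rate) for rows at or above the threshold.

    records: non-empty list of (confidence, is_correct) pairs.
    """
    if not records:
        raise ValueError("records must not be empty")
    # Keep only the predictions whose confidence meets the threshold.
    kept = [is_correct for confidence, is_correct in records if confidence >= threshold]
    completion_rate = len(kept) / len(records)
    # Accuracy is measured only on the rows that were kept.
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return accuracy, completion_rate


# Made-up (confidence, is_correct) pairs for illustration only;
# these are NOT results from the SciQ run.
toy_records = [
    (0.98, True), (0.95, True), (0.93, True), (0.91, False),
    (0.88, True), (0.80, False), (0.72, True), (0.60, False),
]

accuracy, completion = evaluate_threshold(toy_records, threshold=0.90)
print(f"accuracy={accuracy:.2f}, completion_rate={completion:.2f}")
```

Raising the threshold keeps fewer rows, so the completion rate can only fall or stay the same. Accuracy on the kept rows tends to rise when the confidence scores are informative.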