# Few-shot text classification with OpenAI's GPT models

This notebook illustrates how to use OpenAI's GPT-4o model for few-shot text classification.

In [1]:
import os
from dotenv import load_dotenv
from openai import OpenAI

import re

In [2]:
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

In [3]:
MODEL = 'gpt-4o-2024-08-06' # the latest GPT-4o snapshot as of 2024-09-25

In [4]:
data_path = os.path.join('..', 'data', 'labeled', 'benoit_crowdsourced_2016', '')

In [5]:
SEED=42

## Define the task

In this example, we adapt the instructions for the economic/social/neither policy area classification task in Benoit et al. (2016):

- see [this README file](../data/labeled/benoit_crowdsourced_2016/README.md) for a description of the data and tasks covered in the paper
- see [this file](../data/labeled/benoit_crowdsourced_2016/instructions/econ_social_policy.md) for a copy of their original task instructions

In [6]:
instructions = """
This task involves reading sentences from political texts and judging whether these deal with economic or social policy.

The sentences you will be asked to interpret come from political party manifestos.
Some of these sentences will deal with economic policy; some will deal with social policy; other sentences will deal with neither economic nor social policy. We tell you below about what we mean by "economic" and "social" policy.

First, you will read a short section from a party manifesto.
For the sentence highlighted in red, enter your best judgment about whether it mainly refers to economic policy, to social policy, or to neither.

If the sentence refers to economic policy, select "economic" in the drop down menu; if it refers to social policy, select "social".
If the sentence does not refer to either policy area, select "Not Economic or Social" -- in this case you will move directly to the next sentence.

Now we need to tell you about what we mean by "economic" and "social" policy.

## Your task

Classify the input text in one of the following categories: economic, social, neither

## Response format

Only respond with the chosen category and no additional text or explanations 
"""

### Load the data

In [9]:
from utils.io import read_tabular
from utils.finetuning import split_data

fp = data_path+'benoit_crowdsourced_2016-policy_area.csv'
df = read_tabular(fp, columns=['text', 'label', 'metadata__gold'])

df = df[df.metadata__gold]
del df['metadata__gold']

id2label = {
    2: "economic",
    3: "social",
    1: "neither"
}

df.label = df.label.map(id2label)

# subset to 50 examples per label class
df = df.groupby('label').sample(n=50, random_state=SEED)

In [10]:
df.label.value_counts()

label
economic    50
neither     50
social      50
Name: count, dtype: int64

In [11]:
# split into "training" and "test" data
#  - training is used to select the few-shot exemplars
#  - test is used to evaluate few-shot classification
# note: if you wanted to compare different exemplar selection
#  strategies or prompt variants, you could add a dev split for that
data_splits = split_data(df, test_size=0.5, dev_size=None, stratify_by='label', return_dict=True)
data_splits.keys()

dict_keys(['train', 'test'])

In [12]:
data_splits['train'].label.value_counts()

label
economic    25
neither     25
social      25
Name: count, dtype: int64
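
The balanced counts above are what a stratified split produces. `split_data` comes from this repository's `utils.finetuning` module and its implementation is not shown here, so the following is only a rough, dependency-free sketch of the idea. The function `stratified_split` and the toy rows are hypothetical and are not part of the notebook's utilities.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key="label", test_size=0.5, seed=42):
    """Split a list of dicts into train/test parts, class by class.

    Hypothetical helper for illustration; not the repository's `split_data`.
    """
    # group the rows by their label
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)
    splits = {"train": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label][:]  # copy so the input order is left untouched
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        splits["test"].extend(group[:n_test])
        splits["train"].extend(group[n_test:])
    return splits

# toy data: 4 rows per class
rows = [
    {"text": f"{label} sentence {i}", "label": label}
    for label in ("economic", "social", "neither")
    for i in range(4)
]
toy_splits = stratified_split(rows)
```

Because the split is done within each class, every class contributes the same share of its rows to each part, which is why the label counts stay balanced.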

## Sample examples

There are various ways of sampling few-shot exemplars.

Here, we take the **most representative exemplars** by:

1. embedding exemplars
2. computing the centroid for each label class
3. and ranking exemplars in terms of their closeness to their class centroid
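
To make the three steps concrete without calling the embeddings API, here is a minimal sketch that uses hand-made 2-D vectors in place of real embeddings. The helper names `cosine` and `rank_by_centroid` and the toy vectors are assumptions for illustration only. It is written in plain Python so that it runs without extra packages.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_by_centroid(vectors):
    """Return the indices of `vectors`, most similar to their centroid first."""
    dim = len(vectors[0])
    # step 2: the centroid is the element-wise mean of all vectors in the class
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    # step 3: rank by cosine similarity to the centroid, descending
    sims = [cosine(centroid, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: sims[i], reverse=True)

# step 1 (stand-in): toy "embeddings" for one label class; the last one is an outlier
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
ranking = rank_by_centroid(toy_vectors)
print(ranking)
```

The outlier ends up last in the ranking, so it would only be picked as an exemplar if every other item in its class had already been used.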

In [13]:
import numpy as np
from typing import List
def embed(texts: List[str]):
    # see https://platform.openai.com/docs/guides/embeddings/what-are-embeddings?lang=python
    texts = [text.replace("\n", " ") for text in texts] 
    res = client.embeddings.create(
        input=texts,
        model='text-embedding-3-small'
    )
    embeddings = [(e.index, e.embedding) for e in res.data]
    # sort embeddings by index
    embeddings = [e[1] for e in sorted(embeddings, key=lambda x: x[0])]
    return np.array(embeddings)

In [14]:
from sklearn.metrics.pairwise import cosine_similarity

ranked_exemplars = {}
# note: the loop variable is named `group` so that it does not overwrite `df`
for label_class, group in data_splits['train'].groupby('label'):
    # get embeddings from OpenAI model
    embeddings = embed(group.text.to_list())
    # compute centroid
    centroid = embeddings.mean(axis=0)
    # compute cosine similarity between the centroid and the embeddings
    sims = cosine_similarity([centroid], embeddings)
    # rank the examples by similarity, descending
    ranked_indices = np.argsort(sims[0])[::-1]
    # add the texts to the output, most representative first
    ranked_exemplars[label_class] = group['text'].iloc[ranked_indices]

In [15]:
import random
def get_n_examples(ranked_exemplars, k, shuffle=True, random_state=SEED):
    exs = []
    for label_class, exemplars in ranked_exemplars.items():
        ex = exemplars[:k].to_list()
        ex = [{'text': text, 'label': label_class} for text in ex]
        exs.extend(ex)
    if shuffle:
        random.Random(random_state).shuffle(exs)
    return exs

In [16]:
get_n_examples(ranked_exemplars, 3)

[{'text': 'We seek the support of the British people to make this achievement truly secure, to build upon it and to extend its benefits to all.',
  'label': 'neither'},
 {'text': 'We have encouraged tougher sentences for violent criminals.',
  'label': 'social'},
 {'text': 'We will strengthen the law on public order to combat racial hatred and take firm action against the growing menace of racial attacks.',
  'label': 'social'},
 {'text': 'A Fateful Choice. For decades there was basic agreement between political parties on defence and foreign policy.',
  'label': 'neither'},
 {'text': 'We will encourage the recruitment of ethnic minorities into the police force and require action to be taken against discrimination within the force.',
  'label': 'social'},
 {'text': 'We will continue to reduce the burden of capital gains tax and inheritance tax as it is prudent to do so.',
  'label': 'economic'},
 {'text': "Labour's policy would mean not a secure Britain, but a neutralist Britain.",
  '

In [17]:
def convert_exemplars_list_to_convo(exemplars):
    convo = []
    for ex in exemplars:
        convo.append({"role": "user", "content": f"'''{ex['text']}'''"},)
        convo.append({"role": "assistant", "content": ex['label']},)
    return convo

In [18]:
convert_exemplars_list_to_convo(get_n_examples(ranked_exemplars, 3))

[{'role': 'user',
  'content': "'''We seek the support of the British people to make this achievement truly secure, to build upon it and to extend its benefits to all.'''"},
 {'role': 'assistant', 'content': 'neither'},
 {'role': 'user',
  'content': "'''We have encouraged tougher sentences for violent criminals.'''"},
 {'role': 'assistant', 'content': 'social'},
 {'role': 'user',
  'content': "'''We will strengthen the law on public order to combat racial hatred and take firm action against the growing menace of racial attacks.'''"},
 {'role': 'assistant', 'content': 'social'},
 {'role': 'user',
  'content': "'''A Fateful Choice. For decades there was basic agreement between political parties on defence and foreign policy.'''"},
 {'role': 'assistant', 'content': 'neither'},
 {'role': 'user',
  'content': "'''We will encourage the recruitment of ethnic minorities into the police force and require action to be taken against discrimination within the force.'''"},
 {'role': 'assistant', '

### A single text example

In [19]:
text_input = data_splits['test'].text.values[0]

In [21]:
# convert to conversation history
messages = [
  # system prompt
  {"role": "system", "content": instructions},
  # exemplars
  *convert_exemplars_list_to_convo(get_n_examples(ranked_exemplars, 3)),
  # user input
  {"role": "user", "content": f"'''{text_input}'''"},
]

messages

[{'role': 'system',
  'content': '\nThis task involves reading sentences from political texts and judging whether these deal with economic or social policy.\n\nThe sentences you will be asked to interpret come from political party manifestos.\nSome of these sentences will deal with economic policy; some will deal with social policy; other sentences will deal with neither economic nor social policy. We tell you below about what we mean by "economic" and "social" policy.\n\nFirst, you will read a short section from a party manifesto.\nFor the sentence highlighted in red, enter your best judgment about whether it mainly refers to economic policy, to social policy, or to neither.\n\nIf the sentence refers to economic policy, select "economic" in the drop down menu; if it refers to social policy, select "social".\nIf the sentence does not refer to either policy area, select "Not Economic or Social" -- in this case you will move directly to the next sentence.\n\nNow we need to tell you about

In [22]:
response = client.chat.completions.create(
  model=MODEL,
  messages=messages,
  # a near-zero temperature and a fixed seed make outputs more
  # (but not perfectly) reproducible
  temperature=0.001,
  seed=SEED,
)

In [23]:
# parse the response
response.choices[0].message.content

'economic'

In [20]:
data_splits['test'].label.values[0]

'economic'

### Iterate over multiple examples

Let's first define a custom function that classifies a single text (the function is named `classify_tweet`, but here it is applied to manifesto sentences):

In [25]:
from typing import List, Dict
def classify_tweet(text, model: str, system_message: str, exemplars: List[Dict]):

  # clean the text 
  text = re.sub(r'\s+', ' ', text).strip()

  # construct input

  messages = [
    # system prompt
    {"role": "system", "content": system_message},
    # exemplars
    *convert_exemplars_list_to_convo(exemplars),
    # user input
    {"role": "user", "content": f"'''{text}'''"},
  ]

  response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.001,
    seed=42
  )
  
  if len(response.choices) != 1:
      print("WARNING: Response should have one 'choice'")
      return None
  if response.choices[0].finish_reason != 'stop':
      print("WARNING: Response should have 'finish_reason' of 'stop' but got:", response.choices[0].finish_reason)
      return None

  result = response.choices[0].message.content
  
  return result

Now we can iterate over example texts:

In [26]:
from tqdm.notebook import tqdm

# classify the test texts with GPT-4o (see MODEL)
texts = data_splits['test'].text.to_list()
exemplars = get_n_examples(ranked_exemplars, 3)
classifications = [classify_tweet(text, model=MODEL, system_message=instructions, exemplars=exemplars) for text in tqdm(texts)]
classifications

['economic',
 'economic',
 'economic',
 'economic',
 'social',
 'economic',
 'neither',
 'social',
 'neither',
 'social',
 'neither',
 'economic',
 'neither',
 'neither',
 'social',
 'economic',
 'economic',
 'social',
 'neither',
 'neither',
 'social',
 'social',
 'social',
 'neither',
 'social',
 'social',
 'social',
 'economic',
 'neither',
 'neither',
 'social',
 'social',
 'economic',
 'neither',
 'economic',
 'social',
 'social',
 'social',
 'neither',
 'neither',
 'neither',
 'social',
 'economic',
 'neither',
 'social',
 'social',
 'neither',
 'economic',
 'neither',
 'neither',
 'neither',
 'neither',
 'social',
 'economic',
 'neither',
 'social',
 'social',
 'social',
 'neither',
 'social',
 'neither',
 'social',
 'social',
 'economic',
 'economic',
 'social',
 'economic',
 'economic',
 'social',
 'economic',
 'neither',
 'economic',
 'social',
 'social',
 'social']
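
In this run every response is one of the three expected labels. In general, though, `classify_tweet` returns `None` when one of its checks fails, and the model may occasionally add quotes, whitespace, or extra words. A minimal sketch of a clean-up step applied before evaluation might look like the following. The helper `normalize_prediction`, the `LABELS` tuple, and the `"invalid"` fallback value are assumptions for illustration.

```python
LABELS = ("economic", "social", "neither")

def normalize_prediction(raw, labels=LABELS, fallback="invalid"):
    """Map a raw model response to one of the allowed labels.

    Hypothetical helper: anything that is not exactly one of `labels`
    (after trimming whitespace, quotes, and a trailing period) becomes `fallback`.
    """
    if raw is None:
        return fallback
    cleaned = raw.strip().strip("'\"`.").lower()
    return cleaned if cleaned in labels else fallback

# toy responses, not taken from the run above
raw_outputs = [" Economic\n", "'social'", None, "I think this is economic."]
cleaned_outputs = [normalize_prediction(r) for r in raw_outputs]
print(cleaned_outputs)
```

Using a separate fallback value, rather than silently assigning one of the real classes, keeps invalid responses visible in the evaluation.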

In [27]:
from sklearn.metrics import classification_report

cr = classification_report(
    y_true = data_splits['test'].label.to_list(),
    y_pred = classifications
)

print(cr)

              precision    recall  f1-score   support

    economic       1.00      0.80      0.89        25
     neither       1.00      0.96      0.98        25
      social       0.81      1.00      0.89        25

    accuracy                           0.92        75
   macro avg       0.94      0.92      0.92        75
weighted avg       0.94      0.92      0.92        75
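
The report shows lower recall for `economic` and lower precision for `social`, but it does not show directly which classes get confused with each other. A confusion table answers that question. In the notebook one could pass the true labels and `classifications` to `sklearn.metrics.confusion_matrix`. The sketch below shows the same idea with the standard library only. The helper names and the toy labels are assumptions for illustration and are not the notebook's results.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

def print_confusion(y_true, y_pred, labels):
    """Print a table with true labels as rows and predicted labels as columns."""
    counts = confusion_counts(y_true, y_pred)
    width = max(len(label) for label in labels) + 2
    print(" " * width + "".join(label.rjust(width) for label in labels))
    for true_label in labels:
        cells = "".join(str(counts[(true_label, p)]).rjust(width) for p in labels)
        print(true_label.ljust(width) + cells)

# toy labels for illustration only
toy_true = ["economic", "economic", "social", "neither", "neither"]
toy_pred = ["economic", "social", "social", "neither", "neither"]
toy_counts = confusion_counts(toy_true, toy_pred)
print_confusion(toy_true, toy_pred, ["economic", "social", "neither"])
```

Off-diagonal cells are the misclassifications. In the toy data, one `economic` sentence is predicted as `social`.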

