# Zero-shot text classification with OpenAI's GPT models

This notebook illustrates how to use OpenAI's GPT models for zero-shot text classification, comparing two instruction prompts for labeling political manifesto sentences by policy area.

In [3]:
import os
from dotenv import load_dotenv
from openai import OpenAI

from utils.io import read_tabular
import re

In [2]:
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

In [4]:
## Load the data
data_path = os.path.join('..', 'data', 'labeled', 'benoit_crowdsourced_2016')

fp = os.path.join(data_path, 'benoit_crowdsourced_2016-policy_area.csv')
df = read_tabular(fp, columns=['text', 'label', 'metadata__gold'])

# subset to gold examples
df = df[df.metadata__gold]
del df['metadata__gold']

id2label = {
    2: 'economic',
    3: 'social',
    1: 'neither',
}
df.label = df.label.map(id2label)

print(df.label.value_counts())

# get twenty examples per label class
expls = df.groupby('label').sample(20, random_state=42)

label
economic    225
neither     181
social      100
Name: count, dtype: int64
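`df.groupby('label').sample(20, random_state=42)` draws the same number of sentences from every label class. That is why the evaluation sample has a support of 20 per class even though the full gold set is imbalanced. A rough plain-Python sketch of the same idea follows; the helper `sample_per_class` is hypothetical, and it will not reproduce the exact draws pandas makes for a given seed:

```python
import random
from collections import defaultdict


def sample_per_class(items, labels, n, seed=42):
    """Hypothetical helper: draw n items per label, without replacement."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)
    sample = []
    for label in sorted(by_label):
        # random.sample raises ValueError if a class has fewer than n items
        for item in rng.sample(by_label[label], n):
            sample.append((item, label))
    return sample


# toy data with the class counts printed above
labels = ["economic"] * 225 + ["neither"] * 181 + ["social"] * 100
items = [f"sentence {i}" for i in range(len(labels))]
balanced = sample_per_class(items, labels, n=20)
print(len(balanced))  # 60
```

Balancing the sample makes per-class precision and recall comparable, at the cost of no longer reflecting the class distribution of the full data.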


## Define the task

In this example, we adapt the instructions used for one of the tweet classification tasks examined in Gilardi et al. ([2023](https://www.pnas.org/doi/10.1073/pnas.2305016120)), "ChatGPT outperforms crowd workers for text-annotation tasks", to the task of classifying political manifesto sentences by policy area (economic, social, or neither).

- see [this README file](../data/labeled/gilardi_chatgpt_2023/README.md) for a description of the data and tasks covered in the paper
- see [this file](../data/labeled/gilardi_chatgpt_2023/instructions.md) for a copy of their original task instructions

In [16]:
instructions_group1 = """
Your task is to analyze sentences from political party manifestos and determine whether they primarily address economic policy, social policy, or neither.

If the sentence refers to economic policy, categorize it as "economic."
If it refers to social policy, categorize it as "social."
If the sentence does not relate to either economic or social policy, categorize it as "neither."
"""

In [21]:
categories = ["economic", "social"]

definitions = """
Sentences should be coded as "economic" if they deal with aspects of the economy, such as: Taxation, Government spending, Services provided by the government or other public bodies, Pensions, unemployment and welfare benefits, and other state benefits, Property, investment and share ownership, public or private, Interest rates and exchange rates, Regulation of economic activity, public or private, Relations between employers, workers and trade unions
Sentences should be coded as "social" if they deal with aspects of social and moral life, relationships between social groups, and matters of national and social identity. These include: Policing, crime, punishment and rehabilitation of offenders; Immigration, relations between social groups, discrimination and multiculturalism; The role of the state in regulating the social and moral behavior of individuals

"""

instructions_group2 = f"""
Act as a text classification system versatile in performing content analysis. You will read sentences from political texts and judge whether these deal with economic or social policy.
You must classify sentences into single-label categories. These are the categories: {categories}.

These are the categories' definitions: {definitions}

For each sentence in the sample, follow these instructions:

1. Carefully read the text of the sentence, paying close attention to details.
2. Assess whether the sentence belongs to any of the categories. If not, return 'neither' as your response.
3. Classify the sentence with the category it belongs to. Return only the name of the category.

Only include the selected category in your response and no further text.
Separate the classifications of individual sentences by newline characters.
"""

In [12]:
texts = expls.text.to_list()
texts[:3]

['They are no longer content that some of the most important decisions in their lives what school their children attend, for example, or whether or not to go on strike should be taken by officialdom or trade union bosses.',
 'Any extra burden on business will destroy jobs.',
 'We will increase the bonus by paying a double pension in the first week of December.']

## With GPT-4o (via the OpenAI API)

In [7]:
MODEL = 'gpt-4o-2024-08-06'  # latest GPT-4o snapshot as of 2024-09-25

### Iterate over multiple examples

Let's first define a custom function to classify a single sentence:

In [8]:
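def classify_text(text, system_message, model):
    """Classify a single text with a chat model and return the raw reply."""
    # normalize whitespace (collapse runs of spaces, tabs, and newlines)
    text = re.sub(r'\s+', ' ', text).strip()

    # construct input: the instructions as system prompt, the text as user message
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": text},
    ]

    # near-zero temperature and a fixed seed make replies (mostly) reproducible
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.001,
        seed=42,
    )

    return response.choices[0].message.content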
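The request that `classify_text` sends can be checked without calling the API. The sketch below factors the input construction into a hypothetical `build_messages` helper that applies the same whitespace cleaning and returns the same two-message list (system prompt plus user text):

```python
import re


def build_messages(text, system_message):
    """Hypothetical helper: reproduce classify_text's input construction offline."""
    # collapse runs of whitespace (spaces, tabs, newlines) into single spaces
    cleaned = re.sub(r"\s+", " ", text).strip()
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": cleaned},
    ]


msgs = build_messages("  Any extra burden\non business   will destroy jobs. ", "Classify the sentence.")
print(msgs[1]["content"])  # Any extra burden on business will destroy jobs.
```

Keeping the message construction separate from the API call makes it easy to unit-test prompts before spending tokens on them.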

Now we can iterate over example texts:

In [17]:
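from tqdm.notebook import tqdm
# classify the sample with both instruction variants (one API request per sentence)
classifications_group1 = [classify_text(text, instructions_group1, model=MODEL) for text in tqdm(texts)]
classifications_group2 = [classify_text(text, instructions_group2, model=MODEL) for text in tqdm(texts)]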

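Sending one request per sentence in a loop can run into transient API errors such as rate limits. The OpenAI Python client already retries some errors on its own (configurable through its `max_retries` option); for additional control, a minimal sketch of retrying with exponential backoff is shown below. `with_retries` and its parameters are hypothetical, and in practice you would pass the client's specific exception classes as `retry_on` rather than catching every `Exception`:

```python
import time


def with_retries(call, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Hypothetical helper: call `call()` and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
            time.sleep(base_delay * 2 ** attempt)


# usage with a stand-in function that fails twice before succeeding
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return "economic"


label_out = with_retries(flaky, base_delay=0.01)
print(label_out, attempts["n"])  # economic 3
```

In this notebook, each request could be wrapped as `with_retries(lambda: classify_text(text, instructions_group2, model=MODEL))`.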

In [14]:
from sklearn.metrics import classification_report

cr = classification_report(
    y_true=expls.label,
    y_pred=classifications_group1,
)
print(cr)

              precision    recall  f1-score   support

    economic       1.00      0.80      0.89        20
     neither       0.90      0.95      0.93        20
      social       0.78      0.90      0.84        20

    accuracy                           0.88        60
   macro avg       0.90      0.88      0.88        60
weighted avg       0.90      0.88      0.88        60
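`classification_report` treats every distinct string in `y_pred` as a class, so a reply such as `"Economic."` would show up as an extra row with zero support instead of counting as a match. The report above lists only the three expected labels, so the raw replies were clean in this run; a normalization step like the hypothetical `normalize_label` below is a cheap safeguard when switching prompts or models. Unparseable replies are mapped to an explicit `"unparsed"` label (an assumption, not part of the original notebook) so they stay visible in the report rather than silently counting as `"neither"`:

```python
import re

LABELS = ("economic", "social", "neither")


def normalize_label(raw, labels=LABELS, fallback="unparsed"):
    """Hypothetical helper: map a raw model reply onto one of the expected labels."""
    # strip surrounding whitespace, quotes, and periods, then lowercase
    cleaned = raw.strip().strip("\"'.").strip().lower()
    if cleaned in labels:
        return cleaned
    # otherwise accept the reply only if exactly one label appears as a whole word
    found = [label for label in labels if re.search(rf"\b{label}\b", cleaned)]
    return found[0] if len(found) == 1 else fallback


print(normalize_label("Economic."))  # economic
print(normalize_label("economic and social"))  # unparsed
```

Applied here, this would be `y_pred=[normalize_label(c) for c in classifications_group1]`.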



In [18]:
from sklearn.metrics import classification_report

cr = classification_report(
    y_true=expls.label,
    y_pred=classifications_group2,
)
print(cr)

              precision    recall  f1-score   support

    economic       0.95      0.90      0.92        20
     neither       1.00      0.65      0.79        20
      social       0.71      1.00      0.83        20

    accuracy                           0.85        60
   macro avg       0.89      0.85      0.85        60
weighted avg       0.89      0.85      0.85        60

