# Few-shot text classification with Llama 3.1 8B and ollama

For **running on _Google Colab_**, see notebook [llm_zeroshot_classification_ollama.ipynb](./llm_zeroshot_classification_ollama.ipynb).

In [1]:
import os
from ollama import Client

import re

In [2]:
# create the ollama client that communicates with the `ollama` server running in the background
client = Client()
MODEL = 'llama3.1:8b' # Llama 3.1 with 8B parameters, as named in the ollama model library

In [3]:
def get_available_models():
  return [m['model'] for m in client.list()['models']]

if MODEL not in get_available_models():
  print(f"Model not found. Running `client.pull('{MODEL}')` to download it.")
  client.pull(MODEL)

# confirm
assert MODEL in get_available_models()

In [4]:
data_path = os.path.join('..', 'data', 'labeled', 'benoit_crowdsourced_2016', '')

In [5]:
SEED=42

## Define the task

In this example, we adapt the instructions for the economic/social/neither policy area classification task in Benoit et al. (2016):

- see [this README file](../data/labeled/benoit_crowdsourced_2016/README.md) for a description of the data and tasks covered in the paper
- see [this file](../data/labeled/benoit_crowdsourced_2016/instructions/econ_social_policy.md) for a copy of their original task instructions

In [6]:
instructions = """
This task involves reading sentences from political texts and judging whether these deal with economic or social policy.

The sentences you will be asked to interpret come from political party manifestos.
Some of these sentences will deal with economic policy; some will deal with social policy; other sentences will deal with neither economic nor social policy. We tell you below about what we mean by "economic" and "social" policy.

First, you will read a short section from a party manifesto.
For the sentence highlighted in red, enter your best judgment about whether it mainly refers to economic policy, to social policy, or to neither.

If the sentence refers to economic policy, select "economic" in the drop down menu; if it refers to social policy, select "social".
If the sentence does not refer to either policy area, select "Not Economic or Social" -- in this case you will move directly to the next sentence.

Now we need to tell you about what we mean by "economic" and "social" policy.

## What is "economic" policy?

**"Economic" policies** deal with all aspects of the economy, including:

- Taxation
- Government spending
- Services provided by the government or other public bodies
- Pensions, unemployment and welfare benefits, and other state benefits
- Property, investment and share ownership, public or private
- Interest rates and exchange rates
- Regulation of economic activity, public or private
- Relations between employers, workers and trade unions

## What is "social" policy?

**"Social" policies** deal with aspects of social and moral life, relationships between social groups, and matters of national and social identity, including:

- Policing, crime, punishment and rehabilitation of offenders;
- Immigration, relations between social groups, discrimination and multiculturalism;
- The role of the state in regulating the social and moral behavior of individuals

## Your task

Classify the input text in one of the following categories: economic, social, neither

## Response format

Only respond with the chosen category and no additional text or explanations 
"""

### Load the data

In [7]:
from utils.io import read_tabular
from utils.finetuning import split_data

fp = data_path+'benoit_crowdsourced_2016-policy_area.csv'
df = read_tabular(fp, columns=['text', 'label'])

id2label = {
    2: "economic",
    3: "social",
    1: "neither"
}

df.label = df.label.map(id2label)

# subset to 100 examples per label class 
df = df.groupby('label').sample(n=100, random_state=SEED)

In [8]:
df.label.value_counts()

label
economic    100
neither     100
social      100
Name: count, dtype: int64

In [9]:
# split into "training" and "test" data
#  - training is used for examples
#  - test is used for few-shot classification
# note: if you had different example selection
#  strategies or variations of a prompt, you could use a dev set for comparing them
data_splits = split_data(df, test_size=0.5, dev_size=None, stratify_by='label', return_dict=True)
data_splits.keys()

dict_keys(['train', 'test'])
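
The `split_data` helper comes from this repository's `utils` module and is not shown here. As a rough sketch of what a stratified 50/50 split does, assuming it behaves like scikit-learn's `train_test_split` with `stratify`, the same idea on made-up toy data looks like this (the `toy` data frame and the `splits` dict are illustrative only):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# toy data: 4 rows per label class (hypothetical, for illustration only)
toy = pd.DataFrame({
    "text": [f"sentence {i}" for i in range(12)],
    "label": ["economic", "social", "neither"] * 4,
})

# stratify on the label so that both halves keep the same class balance
train, test = train_test_split(
    toy, test_size=0.5, stratify=toy["label"], random_state=42
)
splits = {"train": train, "test": test}
print(splits["train"].label.value_counts().to_dict())
```

Stratifying matters here because the exemplars are drawn per label class from the training half, so every class needs to be present in it.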

## Sample examples

There are various ways of sampling few-shot exemplars.

Here, we take the **most representative exemplars** by:

1. embedding exemplars
2. computing the centroid for each label class
3. and ranking exemplars in terms of their closeness to their class centroid
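
The three steps above can be sketched on toy vectors without calling any model. This is a minimal illustration, assuming cosine similarity as the closeness measure; the function name `rank_by_centroid_similarity` and the 2-d "embeddings" are made up for the example:

```python
import numpy as np

def rank_by_centroid_similarity(embeddings: np.ndarray) -> np.ndarray:
    """Return row indices ordered from most to least similar to the centroid."""
    centroid = embeddings.mean(axis=0)
    # cosine similarity = dot product of L2-normalised vectors
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ (centroid / np.linalg.norm(centroid))
    # argsort is ascending, so reverse it to get the most similar row first
    return np.argsort(sims)[::-1]

# toy 2-d "embeddings": two vectors point roughly the same way, one is an outlier
toy_embeddings = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]])
order = rank_by_centroid_similarity(toy_embeddings)
print(order)
```

The outlier row ends up last in the ranking, so it would be the last candidate picked as a few-shot exemplar for its class.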

In [10]:
client.embed?

[0;31mSignature:[0m
[0mclient[0m[0;34m.[0m[0membed[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0mmodel[0m[0;34m:[0m [0mstr[0m [0;34m=[0m [0;34m''[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0minput[0m[0;34m:[0m [0mUnion[0m[0;34m[[0m[0mstr[0m[0;34m,[0m [0mSequence[0m[0;34m[[0m[0;34m~[0m[0mAnyStr[0m[0;34m][0m[0;34m][0m [0;34m=[0m [0;34m''[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mtruncate[0m[0;34m:[0m [0mbool[0m [0;34m=[0m [0;32mTrue[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0moptions[0m[0;34m:[0m [0mOptional[0m[0;34m[[0m[0mollama[0m[0;34m.[0m[0m_types[0m[0;34m.[0m[0mOptions[0m[0;34m][0m [0;34m=[0m [0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mkeep_alive[0m[0;34m:[0m [0mUnion[0m[0;34m[[0m[0mfloat[0m[0;34m,[0m [0mstr[0m[0;34m,[0m [0mNoneType[0m[0;34m][0m [0;34m=[0m [0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m[0;34m)[0m [0;34m->[0m [0mMapping[0m[0;34m[[0m[0mstr[0m[0

In [13]:
import numpy as np
from typing import List
def embed(texts: List[str]):
    # see https://platform.openai.com/docs/guides/embeddings/what-are-embeddings?lang=python
    texts = [text.replace("\n", " ") for text in texts] 
    res = client.embed(
        input=texts,
        model=MODEL
        # alternatively, take 'mxbai-embed-large' for faster embedding
    )
    return np.array(res['embeddings'])

In [14]:
from tqdm.notebook import tqdm
from sklearn.metrics.pairwise import cosine_similarity

ranked_exemplars = {}
for label_class, group_df in tqdm(data_splits['train'].groupby('label')):
    # get embeddings from the ollama model
    # (use a separate name for the group so the global `df` is not overwritten)
    embeddings = embed(group_df.text.to_list())
    # compute centroid
    centroid = embeddings.mean(axis=0)
    # compute cosine similarity between the centroid and the embeddings
    sims = cosine_similarity([centroid], embeddings)
    # rank the examples by similarity, descending
    ranked_indices = np.argsort(sims[0])[::-1]
    # add to output
    ranked_exemplars[label_class] = group_df.iloc[ranked_indices, 0]


In [15]:
import random
def get_n_examples(ranked_exemplars, k, shuffle=True, random_state=SEED):
    exs = []
    for label_class, exemplars in ranked_exemplars.items():
        ex = exemplars[:k].to_list()
        ex = [{'text': text, 'label': label_class} for text in ex]
        exs.extend(ex)
    if shuffle:
        random.Random(random_state).shuffle(exs)
    return exs

In [16]:
get_n_examples(ranked_exemplars, 3)

[{'text': 'Liberal Democrats are determined that Britain should lead this reform.',
  'label': 'neither'},
 {'text': 'We support police initiatives to target the hard core of persistent criminals.',
  'label': 'social'},
 {'text': 'Labour will be a friend of those denied human rights and a supporter of steps to strengthen them.',
  'label': 'social'},
 {'text': 'We will focus more on the poorest, paying particular attention to development within the Commonwealth.',
  'label': 'neither'},
 {'text': 'Racial discrimination is an injustice and can have no place in a tolerant and civilised society.',
  'label': 'social'},
 {'text': 'The programme which can be quickly and easily established will allow us to start bringing down unemployment immediately.',
  'label': 'economic'},
 {'text': 'We will bring in much tighter labelling requirements for all foods, and make funding available for food research and scientific establishments.',
  'label': 'neither'},
 {'text': 'It is in sharp contrast to

In [17]:
def convert_exemplars_list_to_convo(exemplars):
    convo = []
    for ex in exemplars:
        convo.append({"role": "user", "content": f"'''{ex['text']}'''"})
        convo.append({"role": "assistant", "content": ex['label']})
    return convo

In [18]:
convert_exemplars_list_to_convo(get_n_examples(ranked_exemplars, 3))[:6]

[{'role': 'user',
  'content': "'''Liberal Democrats are determined that Britain should lead this reform.'''"},
 {'role': 'assistant', 'content': 'neither'},
 {'role': 'user',
  'content': "'''We support police initiatives to target the hard core of persistent criminals.'''"},
 {'role': 'assistant', 'content': 'social'},
 {'role': 'user',
  'content': "'''Labour will be a friend of those denied human rights and a supporter of steps to strengthen them.'''"},
 {'role': 'assistant', 'content': 'social'}]

### A single text example

In [19]:
text_input = data_splits['test'].text.values[0]

In [20]:
# convert to conversation history
messages = [
  # system prompt
  {"role": "system", "content": instructions},
  # exemplars (the slice keeps only the first 3 user/assistant pairs)
  *convert_exemplars_list_to_convo(get_n_examples(ranked_exemplars, 3))[:6],
  # user input
  {"role": "user", "content": f"'''{text_input}'''"},
]

In [21]:
# `num_predict` is Ollama's option for capping the number of generated tokens
options = {'seed': 42, 'temperature': 0.0, 'num_predict': 3}
response = client.chat(
  model=MODEL,
  messages=messages,
  options=options
)

In [33]:
# parse the response
response['message']

{'role': 'assistant', 'content': 'neither'}

In [23]:
data_splits['test'].label.values[0]

'neither'

### Iterate over multiple examples

Let's first define a custom function to classify a single text:

In [37]:
from typing import List, Dict
def classify_tweet(text, model: str, system_message: str, exemplars: List[Dict]):

  # clean the text 
  text = re.sub(r'\s+', ' ', text).strip()

  # construct input

  messages = [
    # system prompt
    {"role": "system", "content": system_message},
    # exemplars
    *convert_exemplars_list_to_convo(exemplars),
    # user input
    {"role": "user", "content": f"'''{text}'''"},
  ]

  # `num_predict` is Ollama's option for capping the number of generated tokens
  options = {'seed': 42, 'temperature': 0.0, 'num_predict': 3}
  response = client.chat(
    model=model,
    messages=messages,
    options=options
  )

  if 'message' not in response:
      print("WARNING: Response should have one 'message'")
      return None
  if not response['done'] or response['done_reason'] != 'stop':
      print("WARNING: Response should have 'done_reason' of 'stop' but got:", response['done_reason'])
      return None

  return response['message']['content']
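
The function above returns the model's reply as-is. Because the reply is free text, it may help to map it onto the three allowed categories before scoring. A minimal sketch, assuming a hypothetical helper `normalize_label` (not part of this notebook or of `ollama`) that returns `None` for missing or ambiguous replies:

```python
import re
from typing import Optional

LABELS = ("economic", "social", "neither")

def normalize_label(raw: Optional[str]) -> Optional[str]:
    """Map a raw model reply to one of LABELS, or None if missing or ambiguous."""
    if raw is None:
        return None
    # lowercase and replace anything that is not a letter with a space
    cleaned = re.sub(r"[^a-z]+", " ", raw.lower()).strip()
    # collect every allowed label that occurs as a whole word in the reply
    matches = [label for label in LABELS if re.search(rf"\b{label}\b", cleaned)]
    # accept the reply only if exactly one label was mentioned
    return matches[0] if len(matches) == 1 else None

print(normalize_label(" Economic.\n"))
```

Replies that come back as `None` can then be counted or inspected separately instead of silently entering the evaluation as an extra class.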

Now we can iterate over example texts:

In [40]:
from tqdm.notebook import tqdm

texts = data_splits['test'].text.to_list()
exemplars = get_n_examples(ranked_exemplars, 3)
classifications = [classify_tweet(text, model=MODEL, system_message=instructions, exemplars=exemplars) for text in tqdm(texts)]


In [41]:
from sklearn.metrics import classification_report

cr = classification_report(
    y_true = data_splits['test'].label.to_list(),
    y_pred = classifications
)

print(cr)

              precision    recall  f1-score   support

    economic       0.89      0.62      0.73        50
     neither       0.67      0.82      0.74        50
      social       0.70      0.76      0.73        50

    accuracy                           0.73       150
   macro avg       0.75      0.73      0.73       150
weighted avg       0.75      0.73      0.73       150
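
The report shows per-class scores but not which classes get confused with each other. A confusion matrix can complement it. The sketch below uses made-up toy labels (not results from this notebook) to show the mechanics with scikit-learn's `confusion_matrix`:

```python
from sklearn.metrics import confusion_matrix

labels = ["economic", "social", "neither"]
# toy gold labels and predictions (made up for illustration only)
y_true = ["economic", "economic", "social", "neither", "neither", "social"]
y_pred = ["economic", "neither", "social", "neither", "neither", "neither"]

# rows are true classes, columns are predicted classes, both in the order of `labels`
cm = confusion_matrix(y_true, y_pred, labels=labels)
for label, row in zip(labels, cm):
    print(f"{label:>9}", row.tolist())
```

Applied to the real data, `y_true` would be `data_splits['test'].label.to_list()` and `y_pred` would be `classifications`.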

