# LLM Prompt Experiments

**Author: Haohang Li**

The purpose of this notebook is to demonstrate the methods and results from the following paper:


*   **[Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?](https://arxiv.org/abs/2202.12837)**, Min et al., 2022

## OpenAI API Set Up

To run the notebook, you need to create an OpenAI API account and get your API key. See this [post](https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt) for step-by-step instructions.

Also, please add some credit to your OpenAI account, because the Chat Completions endpoint is not free (a few dollars will be more than enough). See the detailed pricing [here](https://openai.com/pricing). As a rule of thumb, one English word corresponds to about 1.3 tokens on average. An interactive token counter is available [here](https://platform.openai.com/tokenizer), or you can get the tokenizer directly from its [GitHub page](https://github.com/openai/tiktoken).
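The 1.3-tokens-per-word rule of thumb can be turned into a quick back-of-the-envelope estimate before sending any requests. The sketch below is only an approximation under that assumption; the function names `estimate_tokens` and `estimate_cost` are hypothetical, and the price passed in is a placeholder to be replaced with the current value from the pricing page.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate: whitespace-separated words times ~1.3 tokens per word."""
    n_words = len(text.split())
    return round(n_words * tokens_per_word)


def estimate_cost(n_tokens: int, usd_per_1k_tokens: float) -> float:
    """Cost in USD for n_tokens at a given price per 1,000 tokens."""
    return n_tokens / 1000 * usd_per_1k_tokens


sentence = "Its annual capacity is some 10,000 MW ."
n_tokens = estimate_tokens(sentence)

# PLACEHOLDER price, not a real quote: look up the current rate for your model.
placeholder_price = 0.001
print(f"~{n_tokens} tokens, ~${estimate_cost(n_tokens, placeholder_price):.6f}")
```

For an exact count, use the tokenizer linked above instead of this heuristic.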

## Package installation

In [1]:
!pip install openai
!pip install rich
!pip install datasets
!pip install scikit-learn matplotlib


In [2]:
import time
import random
import matplotlib.pyplot as plt
from openai import OpenAI
from rich import print
from typing import List, Tuple, Dict
from datasets import load_dataset
from sklearn.metrics import accuracy_score
from tqdm.auto import tqdm
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Experiment Setup


We'll use GPT to classify the sentiment of a statement. Each request sends one sentence together with `N` demonstrations, i.e., `(sentence, label)` pairs. See the slides for details.

## Financial Sentiment Dataset

In [3]:
# download dataset
dataset = load_dataset("financial_phrasebank", "sentences_allagree")["train"]
len(dataset)


# Show labels
label_mapping = dataset.features["label"].names
label_mapping

# Show one example
dataset["sentence"][0]
dataset["label"][0]

2264

['negative', 'neutral', 'positive']

'According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing .'

1

## Format input message


Each input message contains:

- The sentence whose sentiment GPT is asked to determine
- `N` demonstrations, each a `(sentence, label)` pair
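To make the message layout concrete, here is a minimal sketch that assembles the chat messages for one query without calling the API. The helper name `build_messages`, the simplified `TEMPLATE`, and the first toy demonstration are assumptions made for illustration; the wrapper class later in this notebook uses its own wording.

```python
from typing import Dict, List, Tuple

# Simplified, hypothetical template (the notebook's own template differs slightly).
TEMPLATE = "{example} The sentiment is: {label}"


def build_messages(
    demo_pairs: List[Tuple[str, str]],
    query: str,
    labels: List[str],
    system_message: str = "You are a helpful assistant.",
) -> List[Dict[str, str]]:
    """Return the chat messages: instructions, N demonstrations, then the query."""
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": "Please classify my input sentence based on the following examples."},
    ]
    # One user message per (sentence, label) demonstration.
    for sentence, label in demo_pairs:
        messages.append({"role": "user", "content": TEMPLATE.format(example=sentence, label=label)})
    # The sentence to classify goes last.
    messages.append(
        {"role": "user", "content": f'What is the sentiment of "{query}" based on the above examples? Choose from {labels}.'}
    )
    return messages


# Toy demonstrations: the first sentence is made up, the second comes from the dataset.
toy_demos = [
    ("Operating profit rose .", "positive"),
    ("Its annual capacity is some 10,000 MW .", "neutral"),
]
messages = build_messages(toy_demos, "Net sales fell .", ["negative", "neutral", "positive"])
for m in messages:
    print(m["role"], "|", m["content"])
```

With `N` demonstrations the list has `N + 3` entries: the system message, the instruction, the demonstrations, and the query.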

In [5]:
# number of demonstrations for each query
num_Demo = 16

# number of queries
num_query = 10

# random seed
random_seed = 233

# In total, we'll need 16 + 10 samples. Randomly select a subset
selected_samples = dataset.shuffle(seed=random_seed)[:(num_Demo + num_query)]

# Get sentence part
sentences = selected_samples['sentence']

# Get the labels as text (the dataset stores integer class ids)
labels = np.array(label_mapping)[selected_samples['label']]

# Combine (sentence, label) pair
queries = list(zip(sentences[0:num_query], labels[0:num_query]))
demos = list(zip(sentences[num_query:], labels[num_query:]))

queries
demos

[("The manufacture of CPPs will be undertaken at the existing Export Oriented Unit EOU at Wartsila 's factory at Khopoli , near Mumbai .",
  'neutral'),
 ("The aim is an annual improvement in Ruukki Construction 's operating profit of more than EUR 3 million USD 4.1 m starting in 2009 .",
  'positive'),
 ('The power supplies , DC power systems and inverters designed and manufactured by Efore , and systems incorporating them are used in many different applications .',
  'neutral'),
 ("The group 's operating loss was EUR 0.8 mn , down from a profit of EUR 2.5 mn in 2004 .",
  'negative'),
 ('However , the proportion of the paid standing orders grew in 2009 .',
  'positive'),
 ('Under the deal , Know IT will pay SEK90m ( USD12 .8 m-EUR8 .6 m ) in cash and stock .',
  'neutral'),
 ('Peigs www.peigs.se will become part of Sardus Latta Maltider Light Meals unit .',
  'neutral'),
 ('Its annual capacity is some 10,000 MW .', 'neutral'),
 ('Operating profit rose to EUR 5mn from EUR 2.8 mn in th

[("The fixed-term contract of Mr. Jarmo Ukonaho , the current General Manager of Incap 's Indian operations , will finish by the end of the year .",
  'neutral'),
 ('Net sales grew in the period to  x20ac 402 million $ 585US million from  x20ac 401 million in 2006 .',
  'positive'),
 ("Arvo Vuorenmaa , the Loviisa plant 's general manager said the application for the new licence was a `` standard '' procedure and that he was `` quite confident '' about approval being granted .",
  'positive'),
 ('- Cash flow from operating activities before investments was EUR 0.8 -1.2 million .',
  'neutral'),
 ('U.S.-based T Corp. is in talks with Scandinavian telecoms company TeliaSonera to sell its stake in Uzbek cellular operator Coscom , an executive at Coscom told Interfax .',
  'neutral'),
 ('( ADP News ) - Feb 11 , 2009 - Finnish management software solutions provider Ixonos Oyj ( HEL : XNS1V ) said today its net profit rose to EUR 3.5 million ( USD 4.5 m ) for 2008 from EUR 3.1 million for 20

## Endpoint Wrapper

In [6]:
OPENAI_API_KEY = "Your Key here"              # replace this by your own API key

In [7]:
class OpenAIClassifier:
    
    def __init__(self, api_key: str, 
                 model:str="gpt-3.5-turbo-1106", 
                 temperature:float=0.2):
        
        self.client = OpenAI(api_key=api_key)
        self.model = model
        self.temperature = temperature  # important parameter: governs the randomness/creativity of the generated text
        # see details: https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api-a-few-tips-and-tricks-on-controlling-the-creativity-deterministic-output-of-prompt-responses/172683

    def __call__(self, demo_pairs: List[Tuple[str, str]],
                 query: str,
                 prompt_template: str='{example}", "The sentiment is: {label}',  # see: https://github.com/Alrope123/rethinking-demonstrations/blob/main/templates.py
                 label_guidance_text: str='Please help me classify my input sentence based on the following examples. I will give you a few examples, the format is {example}", "The sentiment is: {label}',
                 system_message:str="You are a helpful assistant.") -> str:

        messages = [
            {"role": "system", "content": system_message},
            {"role": "user", "content": label_guidance_text}
        ]
        # add examples
        for sentence, label in demo_pairs:
            
            messages.append({"role": "user", "content": prompt_template.format(example=sentence, label=label)})
        
        # add the input to be classified
        messages.append({"role": "user", "content": f"What is the sentiment of {query} based on the above examples? You need to choose from {label_mapping}. If the information is not enough to decide, you can label it as neutral."})
        
        # for debugging
        #print(messages)
        
        # completion request
        ret = self.client.chat.completions.create(messages=messages, model=self.model, temperature=self.temperature)

        return ret.choices[0].message.content
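
The wrapper above sends a single request and lets any API error propagate, so one transient failure stops the whole loop. A minimal sketch of retrying with exponential backoff is shown below; `call_with_retry` is a hypothetical helper, not part of the OpenAI client, and catching every `Exception` is a simplification that should be narrowed to the client's specific error classes in real use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**attempt seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # simplification: narrow this to transient API errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Toy stand-in for an API call: fails twice, then succeeds.
calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "neutral"

delays = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)
```

In the notebook, the call would be wrapped as `call_with_retry(lambda: classifier(demo_pairs=demos, query=query))`, assuming `classifier` is an `OpenAIClassifier` instance.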

In [8]:
def parse_output(output: str) -> str:
    output = output.lower()  # normalize case so "Neutral" and "neutral" both match

    if "neutral" in output:
        return "neutral"
    elif "positive" in output:
        return "positive"
    elif "negative" in output:
        return "negative"
    else:
        raise ValueError(f"{output} fail to process")
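
`parse_output` returns the first label it finds in a fixed order, so a reply that mentions two labels (for example "positive rather than neutral") is silently read as neutral. A stricter variant, sketched here under the assumption that a valid reply mentions exactly one label, raises instead of guessing; `parse_output_strict` and `LABELS` are hypothetical names.

```python
import re
from typing import List

LABELS = ["negative", "neutral", "positive"]


def parse_output_strict(output: str, labels: List[str] = LABELS) -> str:
    """Return the single label mentioned in output; raise if none or several appear."""
    text = output.lower()
    # Match whole words only, so a label embedded in a longer word is not counted.
    found = [label for label in labels if re.search(rf"\b{re.escape(label)}\b", text)]
    if len(found) != 1:
        raise ValueError(f"expected exactly one label, found {found} in: {output!r}")
    return found[0]


print(parse_output_strict("The sentiment of the sentence is: Neutral"))
```

Ambiguous replies can then be logged and inspected by hand rather than counted as a prediction.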

In [10]:
# Initialize the classifier

gpt_35_classifier = OpenAIClassifier(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-1106")  # gpt 3.5 turbo
gpt_4_classifier = OpenAIClassifier(api_key=OPENAI_API_KEY, model="gpt-4-1106-preview")  # gpt 4 turbo

# Experiments

## A Single Query

In [11]:
# Take the first query (sentence only; drop the gold label so it is not leaked into the prompt)
query = queries[0][0]

output = gpt_35_classifier(demo_pairs=demos, query=query)
output

'Based on the given examples, the sentiment of the sentence "The manufacture of CPPs will be undertaken at the existing Export Oriented Unit EOU at Wartsila\'s factory at Khopoli, near Mumbai." would be classified as neutral.'

In [12]:
# Take the first query (sentence only; drop the gold label so it is not leaked into the prompt)
query = queries[0][0]

output = gpt_4_classifier(demo_pairs=demos, query=query)
output

'The sentiment of the sentence "The manufacture of CPPs will be undertaken at the existing Export Oriented Unit EOU at Wartsila \'s factory at Khopoli , near Mumbai." is: neutral'

## Put Everything Together

In [13]:
gpt_35_outputs = []
gpt_4_outputs = []

for pair in tqdm(queries):
    
    query = pair[0] # we only need sentences
    
    output = gpt_35_classifier(demo_pairs=demos, query=query)
    gpt_35_outputs.append(parse_output(output))
    
    output = gpt_4_classifier(demo_pairs=demos, query=query)
    gpt_4_outputs.append(parse_output(output))
    


In [14]:
# Show accuracy

_, true_labels = zip(*queries)  # unpack the gold labels from the (sentence, label) pairs
result = pd.DataFrame(queries, columns = ['sentence','label'])
result['gpt_35_output'] = gpt_35_outputs
result['gpt_4_output'] = gpt_4_outputs

result

gpt_35_acc = (result['gpt_35_output'] == result['label']).mean()
gpt_35_acc

gpt_4_acc = (result['gpt_4_output'] == result['label']).mean()
gpt_4_acc

|   | sentence | label | gpt_35_output | gpt_4_output |
|---|---|---|---|---|
| 0 | The manufacture of CPPs will be undertaken at ... | neutral | neutral | neutral |
| 1 | The aim is an annual improvement in Ruukki Con... | positive | positive | positive |
| 2 | The power supplies , DC power systems and inve... | neutral | neutral | neutral |
| 3 | The group 's operating loss was EUR 0.8 mn , d... | negative | negative | negative |
| 4 | However , the proportion of the paid standing ... | positive | neutral | positive |
| 5 | Under the deal , Know IT will pay SEK90m ( USD... | neutral | neutral | positive |
| 6 | Peigs www.peigs.se will become part of Sardus ... | neutral | neutral | neutral |
| 7 | Its annual capacity is some 10,000 MW . | neutral | neutral | neutral |
| 8 | Operating profit rose to EUR 5mn from EUR 2.8 ... | positive | positive | positive |
| 9 | Efore 's results for the last quarter showed a... | positive | positive | positive |


0.9

0.9

## Random Labels


- Does GPT try to learn the mapping between sentences and labels in the demonstrations?
- To find out, let's replace the demonstration labels with random ones and check whether accuracy drops.
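
Note that the next cell permutes the existing demonstration labels, which keeps the label distribution unchanged. Min et al. instead sample each demonstration's label uniformly at random from the label set. A minimal sketch of that variant follows; `sample_random_labels` is a hypothetical helper, and the seed value simply reuses this notebook's `random_seed`.

```python
import random
from typing import List


def sample_random_labels(n: int, label_set: List[str], seed: int = 233) -> List[str]:
    """Draw n labels independently and uniformly at random from label_set."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return [rng.choice(label_set) for _ in range(n)]


label_set = ["negative", "neutral", "positive"]
uniform_labels = sample_random_labels(16, label_set)
print(uniform_labels)
```

To use these in the experiment, pair them with the demonstration sentences, e.g. `demos = list(zip(sentences[num_query:], uniform_labels))`.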

In [15]:
random_labels = shuffle(labels[num_query:].copy())
print(f"original labels: {labels[num_query:]}, \nshuffled labels: {random_labels}")

demos = list(zip(sentences[num_query:], random_labels))

In [16]:
gpt_35_outputs = []
gpt_4_outputs = []

for pair in tqdm(queries):
    
    query = pair[0] # we only need sentences
    
    output = gpt_35_classifier(demo_pairs=demos, query=query)
    gpt_35_outputs.append(parse_output(output))
    
    output = gpt_4_classifier(demo_pairs=demos, query=query)
    gpt_4_outputs.append(parse_output(output))
    


In [17]:
_, true_labels = zip(*queries)  # unpack the gold labels from the (sentence, label) pairs
result = pd.DataFrame(queries, columns = ['sentence','label'])
result['gpt_35_output'] = gpt_35_outputs
result['gpt_4_output'] = gpt_4_outputs

result

gpt_35_acc = (result['gpt_35_output'] == result['label']).mean()
gpt_35_acc

gpt_4_acc = (result['gpt_4_output'] == result['label']).mean()
gpt_4_acc


|   | sentence | label | gpt_35_output | gpt_4_output |
|---|---|---|---|---|
| 0 | The manufacture of CPPs will be undertaken at ... | neutral | neutral | neutral |
| 1 | The aim is an annual improvement in Ruukki Con... | positive | positive | positive |
| 2 | The power supplies , DC power systems and inve... | neutral | neutral | neutral |
| 3 | The group 's operating loss was EUR 0.8 mn , d... | negative | negative | negative |
| 4 | However , the proportion of the paid standing ... | positive | neutral | neutral |
| 5 | Under the deal , Know IT will pay SEK90m ( USD... | neutral | neutral | positive |
| 6 | Peigs www.peigs.se will become part of Sardus ... | neutral | neutral | neutral |
| 7 | Its annual capacity is some 10,000 MW . | neutral | neutral | neutral |
| 8 | Operating profit rose to EUR 5mn from EUR 2.8 ... | positive | positive | positive |
| 9 | Efore 's results for the last quarter showed a... | positive | positive | positive |


0.9

0.8
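
The four accuracies can be recomputed side by side from the predictions shown in the two result tables. The sketch below hard-codes those predictions so it runs without the API; the `accuracy` helper and the dictionary layout are illustrative choices, not part of the original notebook.

```python
from typing import Dict, List

# Gold labels and predictions copied from the two result tables above.
gold: List[str] = ["neutral", "positive", "neutral", "negative", "positive",
                   "neutral", "neutral", "neutral", "positive", "positive"]

predictions: Dict[str, Dict[str, List[str]]] = {
    "original labels": {
        "gpt-3.5": ["neutral", "positive", "neutral", "negative", "neutral",
                    "neutral", "neutral", "neutral", "positive", "positive"],
        "gpt-4": ["neutral", "positive", "neutral", "negative", "positive",
                  "positive", "neutral", "neutral", "positive", "positive"],
    },
    "shuffled labels": {
        "gpt-3.5": ["neutral", "positive", "neutral", "negative", "neutral",
                    "neutral", "neutral", "neutral", "positive", "positive"],
        "gpt-4": ["neutral", "positive", "neutral", "negative", "neutral",
                  "positive", "neutral", "neutral", "positive", "positive"],
    },
}


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of positions where the prediction equals the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


summary = {
    setting: {model: accuracy(gold, preds) for model, preds in by_model.items()}
    for setting, by_model in predictions.items()
}
for setting, by_model in summary.items():
    print(setting, by_model)
```

With only 10 queries, a single changed prediction moves accuracy by 10 percentage points, so the gap between the two settings here should be read as anecdotal rather than as a measured effect.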