# Intro

In this tutorial we will be exploring in-context learning for LLM using OpenAI's and Cohere's API.

For illustration, we will look into the task of Natural Language Inference (NLI). As dataset, we will use the multi-genre dataset [MNLI](https://gluebenchmark.com/) in the GLUE benchmark, available in [huggingface](https://huggingface.co/datasets/glue).



# Preparing

In [7]:
!pip install datasets
!pip install cohere
!pip install openai
!pip install numpy seaborn pandas sklearn


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sklearn
  Downloading sklearn-0.0.post1.tar.gz (3.6 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: sklearn
  Building wheel for sklearn (setup.py) ... [?25l[?25hdone
  Created wheel for sklearn: filename=sklearn-0.0.post1-py3-none-any.whl size=2344 sha256=b0def3e81737c8aaeb824817a87d2db51bd5cffced4d21d681a2681c0004a9cf
  Stored in directory: /root/.cache/pip/wheels/14/25/f7/1cc0956978ae479e75140219088deb7a36f60459df242b1a72
Successfully built sklearn
Installing collected packages: sklearn
Successfully installed sklearn-0.0.post1


In [5]:
import numpy as np
from datasets import load_dataset
import cohere
from cohere import CohereError
import openai

import pdb

### Load the dataset

In [10]:
dataset = load_dataset("glue", 'mnli')
train = dataset['train']
dev = dataset['validation_matched']



  0%|          | 0/5 [00:00<?, ?it/s]

### Check some examples

In [11]:
# entailment (0), neutral (1), contradiction (2).
LABEL_NAMES = {
    0: "entailment",
    1: "neutral",
    2: "contradiction"
}

for idx in [0,1,9]:
  # print(train[idx])
  print("[premise]",train[idx]["premise"])
  print("[hypothesis]",train[idx]["hypothesis"])
  print("[label]",LABEL_NAMES[train[idx]["label"]])
  print()

[premise] Conceptually cream skimming has two basic dimensions - product and geography.
[hypothesis] Product and geography are what make cream skimming work. 
[label] neutral

[premise] you know during the season and i guess at at your level uh you lose them to the next level if if they decide to recall the the parent team the Braves decide to call to recall a guy from triple A then a double A guy goes up to replace him and a single A guy goes up to replace him
[hypothesis] You lose the things to the following level if the people recall.
[label] entailment

[premise] At the end of Rue des Francs-Bourgeois is what many consider to be the city's most handsome residential square, the Place des Vosges, with its stone and red brick facades.
[hypothesis] Place des Vosges is constructed entirely of gray marble.
[label] contradiction



# Using Large Language Models (LLMs)

### OpenAI's GPT-3

In order to use the API you should create an account [here](https://openai.com/api/). Create an API key and paste it below.
We will be using 'text-davince-003', and you can play with directly [here](https://https://platform.openai.com/playground/p/default-summarize?model=text-davinci-003)



In [12]:
openai_apikey=""
openai.api_key = openai_apikey

In [13]:
prompt = "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell"

openai.Completion.create(
  model="text-davinci-003",
  prompt=prompt,
  temperature=0.7,
  max_tokens=64,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)


<OpenAIObject text_completion id=cmpl-6oXaM85uGPeBe7ozkXKwM9QloIC89 at 0x7f756d1164a0> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": ", nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.\n\nIt had a perfectly round door like a porthole, painted green, with a shiny yellow brass knob in the exact middle. The"
    }
  ],
  "created": 1677503522,
  "id": "cmpl-6oXaM85uGPeBe7ozkXKwM9QloIC89",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 64,
    "prompt_tokens": 32,
    "total_tokens": 96
  }
}

### Cohere

Sign-up to cohere.com using this [link](https://dashboard.cohere.ai/welcome/register?utm_source=cohere-owned&utm_medium=event&utm_campaign=matthias-course).
Retrieve your API key and copy this to the variable below.

Find more documentation on the parameters [here](https://docs.cohere.ai/reference/generate).


In [14]:

co_apikey=""
co = cohere.Client(co_apikey)


In [15]:
prompt = "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell"
co.generate(prompt,
            model="xlarge",
            max_tokens=64,
            temperature=0.7,
            num_generations=1)

cohere.Generations {
	id: None
	generations: [cohere.Generation {
	id: 654a88af-4f01-4e53-b814-c2a1a903692a
	text: , nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort."

And so begins one of the most beloved stories in history. Now a major motion picture from director Peter Jackson, Academy Award®-
	likelihood: None
	token_likelihoods: None
}]
	return_likelihoods: None
}

## In-context learning

In [57]:
def normalize_predictions(preds):
  res = []
  for pred in preds:
    pred = pred.lower()
    if "false" in pred or "incorrect" in pred: res.append(2)
    elif "true" in pred or "correct" in pred: res.append(0)
    else:              res.append(1)
  return res

def evaluate_accuracy(gold_labels,predicted_labels):
  nn = len(gold_labels)
  if nn==0: return 0.0
  return sum(x==y for x,y in zip(gold_labels,predicted_labels)) / nn

In [59]:
idxs = np.arange(len(dev),dtype=int)
np.random.shuffle(idxs)

num_samples = 20
dev_sample = [dev[int(i)] for i in idxs[:num_samples]]


task_description = "Determine if a hypothesis sentence is True, False, or Undetermined, given a premise sentence."

### Zero-shot ICL

In [None]:
# Cohere API
predictions = []

for item in dev_sample:
  _input = f"task: {task_description}\npremise: {item['premise']}\nhypothesis: {item['hypothesis']}\nanswer:"
  output = co.generate(_input,
                model="xlarge",
                max_tokens=10,
                temperature=1.0,
                num_generations=1)[0]
  resp = output.text.strip(" ")
  predictions.append(resp)
#

In [60]:
# OpenAI API
predictions = []

for i in range(num_samples):
  _input = f"task: {task_description}\npremise: {train[i]['premise']}\nhypothesis: {train[i]['hypothesis']}\nanswer:"
  output = openai.Completion.create(
              model="text-davinci-003",
              prompt=_input,
              temperature=1.0,
              max_tokens=10,
              top_p=1.0,
              frequency_penalty=0.0,
              presence_penalty=0.0
            )
  
  print(output["choices"][0]["text"])
  resp = output["choices"][0]["text"].strip(" ")
  predictions.append(resp)
  

 False
 False
 True
 False
 True
 False
 False
 False
 True
 True
 Undetermined
 True
 True
 False
 False
 False
 False
 Undetermined
 False
 True


In [61]:
# Evaluate

gold = [item["label"] for item in dev_sample]
acc = evaluate_accuracy(gold,normalize_predictions(predictions))
print("[Accuracy]",acc)

[Accuracy] 0.55


In [62]:
print(gold)
print(normalize_predictions(predictions))

[2, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0, 2, 0]
[2, 2, 0, 2, 0, 2, 2, 2, 0, 0, 1, 0, 0, 2, 2, 2, 2, 1, 2, 0]


### Few-shot in-context learning

Few-shot in-context learning consists in providing some examples, structured in a certain way ("prompt engineering") and leverage the fact that LLM are good in pattern-matching.

In [64]:
idxs = np.arange(len(train),dtype=int)
np.random.shuffle(idxs)
train = [train[int(i)] for i in idxs]

train_sample = []
for lab in LABEL_NAMES.keys():
  for item in train:
    if item["label"] == lab:
      train_sample.append(item)
      break

exemplars = "\n".join([f"premise: {titem['premise']}\nhypothesis: {titem['hypothesis']}\nanswer: {label_idx_to_token[titem['label']]}" for titem in train_sample])
prompt = f"task: {task_description}\n{exemplars}"

In [66]:
print(prompt)

task: Determine if a hypothesis sentence is True, False, or Undetermined, given a premise sentence.
premise: The DSPB will be asked to provide any additional written information it wishes to be considered to assist the LSC President in fully and fairly entertaining all concerns and objections.
hypothesis: The LSC President may receive assistance from the DSPB.
answer: True
premise: His interest was absorbed by the adults.
hypothesis: The adults absorbed all interest he had in the ship.
answer: Undetermined
premise: yes i i felt that it certainly was i mean i was smarter than most of the people that i was working for and uh you know every time something new came up i was explaining it to them and uh i had
hypothesis: I am not very smart.
answer: False


In [68]:
label_idx_to_token = {
    2: "False",
    0: "True",
    1: "Undetermined",
}

predictions = []

for item in dev_sample:
  _input = f"{prompt}\npremise: {item['premise']}\nhypothesis: {item['hypothesis']}\nanswer:"
  
  output = openai.Completion.create(
              model="text-davinci-003",
              prompt=_input,
              temperature=1.0,
              max_tokens=10,
              top_p=1.0,
              frequency_penalty=0.0,
              presence_penalty=0.0
            )
  print(output["choices"][0]["text"])
  resp = output["choices"][0]["text"].strip(" ")
  predictions.append(resp)
  

 False
 True
 True
 False
 False
 Undetermined
 True
 True
 True
 True
 True
 True
 False
 True
 False
 False
 False
 Undetermined


RateLimitError: ignored

In [69]:
# Evaluate

gold = [item["label"] for item in dev_sample]
acc = evaluate_accuracy(gold,normalize_predictions(predictions))
print("[Accuracy]",acc)

[Accuracy] 0.65
