# Introduction to In-context learning
1. **🤖 What is in-context learning (ICL)**
2. **🎨 Prompt design**
3. **🦮 Zero-shot vs. Few-shot ICL**
4. **✋ Hands-on: Transforming a dataset into a few-shot prompt-label dataset and evaluating existing models**


In [None]:
import sys
!{sys.executable} -m pip install git+https://github.com/fewshot-goes-multilingual/promptsource transformers[sentencepiece]==4.19.1

## 1. 🤖 In-context learning (ICL)

In-context learning is a behavior of generative models in which a model performs a task it has never seen before, given only a description of that task as part of its input.

This behavior is exhibited mainly by **large language models**. Exactly why it occurs is still not fully understood, but it may be related to the latent concepts the LM acquires from pretraining on large amounts of data.

"Learning" here does not mean training: the model's weights are not updated. Instead, the model "understands" the task solely from the user's input, also known as a prompt.



In [11]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 
model_path = "gaussalgo/mt5-base-priming-QA_en-cs"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)


In [18]:
long_text = """
I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. 
I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial"
I really had to see this for myself. The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life.
In particular she wants to focus her attentions to making some sort of documentary on what the
average Swede thought about certain political issues such as the Vietnam War and race issues in the United States.
"""

# The instruction is ambiguous: we could be asking about the sentiment of the text or about its meaning
prompt = "What is meant by: {}".format(long_text)

inputs = tokenizer([prompt], return_tensors="pt", padding=True)
outputs = model.generate(**inputs.to(model.device))
outputs_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# predictions:
outputs_str

['The plot is centered around a young Swedish drama student named Lena']

## 2. 🎨 What should the prompts look like?
To train a custom in-context learner, we need text pairs consisting of a prompt and a label. The example above shows how difficult it can be to write a good prompt: there is an art to creating a prompt that works well with a given model. Below we present the [promptsource](https://github.com/bigscience-workshop/promptsource) library, which contains over 2000 prompts for 180 different English datasets.

In [19]:
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

dataset = load_dataset('super_glue', 'boolq', split="validation[:10%]")

prompts = DatasetTemplates("super_glue/boolq")
print(prompts.all_template_names) # Here you can see all available prompts for the given dataset
prompt = prompts['after_reading']


Found cached dataset super_glue (/home/nikola/.cache/huggingface/datasets/super_glue/boolq/1.0.3/bb9675f958ebfee0d5d6dc5476fafe38c79123727a7258d515c450873dbdbbed)


['GPT-3 Style', 'I wonder…', 'after_reading', 'based on the following passage', 'based on the previous passage', 'could you tell me…', 'exam', 'exercise', 'valid_binary', 'yes_no_question']
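
To make the idea of a prompt-label pair concrete, here is a minimal pure-Python sketch that does not use promptsource. The template text, the `to_prompt_label_pair` helper, and the example record are all hypothetical; only the field names (`passage`, `question`, `label`) follow the BoolQ dataset.

```python
# Hypothetical template, loosely in the spirit of the BoolQ prompts listed above;
# this is NOT the actual promptsource template text.
TEMPLATE = "Passage: {passage}\n\nQuestion: {question}? True or False?"
ANSWER_CHOICES = ["False", "True"]  # position in the list = integer label


def to_prompt_label_pair(example):
    """Turn one raw example into a (prompt, label) pair of strings."""
    prompt_text = TEMPLATE.format(
        passage=example["passage"], question=example["question"]
    )
    label_text = ANSWER_CHOICES[example["label"]]
    return prompt_text, label_text


# Made-up record that mimics the BoolQ fields
example = {
    "passage": "Prague is the capital of the Czech Republic.",
    "question": "is prague the capital of the czech republic",
    "label": 1,
}

prompt_text, label_text = to_prompt_label_pair(example)
print(prompt_text)
print("->", label_text)
```

The model receives only the prompt string; the label string is kept aside as the reference for training or evaluation.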


### 2.1 Evaluation
Let's evaluate our model on BoolQ, a dataset of yes/no questions that are answered from an accompanying passage, with the inputs built from a promptsource template. (The model was not trained on this dataset.)

In [None]:
from tqdm import tqdm

predictions = []
references = [x==1 for x in dataset["label"]]

# Get predictions
for item in tqdm(dataset):
    # apply() renders the template and returns [input_text, target_text];
    # only the input text is passed to the model
    model_input_string = prompt.apply(item)[0]
    inputs = tokenizer(model_input_string, truncation=True, return_tensors="pt")
    outputs = model.generate(**inputs.to(model.device))
    response_text = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    predictions.append(response_text)


In [22]:
# Accuracy: share of predictions that exactly match the reference label ("True" or "False")
correct_predictions = sum(pred == str(true) for pred, true in zip(predictions, references))

accuracy = correct_predictions / len(references)
print("Prediction using '%s' classifier; accuracy: %s" % (model.config.model_type, accuracy))

Prediction using 'mt5' classifier; accuracy: 0.23547400611620795
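
The comparison above is an exact string match, so an answer such as "yes" or "true." counts as wrong even when it means the same as the reference. Whether this explains part of the low score is an assumption that would need to be checked against the actual predictions. The sketch below shows one way to score more leniently; the helper names and the sets of accepted surface forms are hypothetical choices, and the toy predictions are made up.

```python
# Assumed surface forms that we are willing to read as True / False
TRUE_FORMS = {"true", "yes"}
FALSE_FORMS = {"false", "no"}


def normalize(text):
    """Strip whitespace and surrounding punctuation, then lowercase."""
    return text.strip().strip(".!?\"'").lower()


def to_bool(text):
    """Map a model answer to True/False, or None if it is neither."""
    norm = normalize(text)
    if norm in TRUE_FORMS:
        return True
    if norm in FALSE_FORMS:
        return False
    return None


def lenient_accuracy(predictions, references):
    """Accuracy after normalization; unparseable answers count as wrong."""
    correct = sum(to_bool(pred) == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# Made-up predictions and boolean references, for illustration only
toy_predictions = ["True", "yes.", " false ", "The plot is unclear"]
toy_references = [True, True, True, False]

strict = sum(p == str(r) for p, r in zip(toy_predictions, toy_references)) / len(toy_references)
lenient = lenient_accuracy(toy_predictions, toy_references)
print("strict:", strict, "lenient:", lenient)
```

Answers that map to neither label stay wrong under both metrics, so the lenient score only removes formatting mismatches.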


## 3. 🦮 Zero-shot vs few-shot in-context learning
The prompts discussed above were all "zero-shot" prompts, which means the model had to learn the task from the instruction and the text alone, without any demonstrations of what the expected output should look like.

"Few-shot" prompting means showing the model several demonstrations as part of the input prompt. These input-output examples can considerably improve the model's performance on previously unseen tasks, because the demonstrations hint at what the task is.


In [21]:
input_zero_shot = """
Question: What is the sentiment of the context: positive or negative? 
Context: I am very happy to be here today.
Answer:""
"""
input_few_shot_not_heterogenic = """
Question: What is the sentiment of the context: positive or negative? 
Context: He said, that the consert was very dull.
Answer:"negative"
Question: What is the sentiment of the context: positive or negative? 
Context: She came from school sad and lonely.
Answer:"negative"
Question: What is the sentiment of the context: positive or negative? 
Context: I am very happy to be here today.
Answer:""
"""
input_few_shot = """
Question: What is the sentiment of the context: positive or negative? 
Context: He said, that the consert was very dull.
Answer:"negative"
Question: What is the sentiment of the context: positive or negative? 
Context: She came from school smiling and singing.
Answer:"positive"
Question: What is the sentiment of the context: positive or negative? 
Context: I am very happy to be here today.
Answer:""
"""
inputs = tokenizer([input_zero_shot,input_few_shot_not_heterogenic,  input_few_shot], return_tensors="pt", padding=True)
outputs = model.generate(**inputs.to(model.device))
outputs_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# predictions:
outputs_str

['happy', 'positive or negative', 'positive']
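
The three hand-written prompts above share one layout, so they can also be assembled programmatically. The following is a minimal sketch; the helper names `format_example` and `build_few_shot_prompt` are hypothetical and not part of any library, and the whitespace only approximates the strings above.

```python
def format_example(question, context, answer=""):
    """Render one Question/Context/Answer block; an empty answer marks the query."""
    return f'Question: {question}\nContext: {context}\nAnswer:"{answer}"'


def build_few_shot_prompt(question, demonstrations, query_context):
    """Join (context, answer) demonstrations and finish with the unanswered query."""
    blocks = [format_example(question, ctx, ans) for ctx, ans in demonstrations]
    blocks.append(format_example(question, query_context))
    return "\n".join(blocks)


question = "What is the sentiment of the context: positive or negative?"
demonstrations = [
    ("He said that the concert was very dull.", "negative"),
    ("She came from school smiling and singing.", "positive"),
]
query = "I am very happy to be here today."

zero_shot = build_few_shot_prompt(question, [], query)  # no demonstrations
few_shot = build_few_shot_prompt(question, demonstrations, query)
print(few_shot)
```

Passing an empty list of demonstrations yields the zero-shot prompt, so the same function covers both settings.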

## 4. ✋ Hands-on: Creation of an evaluation dataset

* Download an existing dataset and transform it into pairs of a prompt input and a label (either by writing your own prompt or by using the promptsource library).
  * Text classification (https://huggingface.co/datasets/imdb)
  * Named Entity Recognition (https://huggingface.co/datasets/polyglot_ner/viewer/en/train)
  * Question Answering (https://huggingface.co/datasets/squad_v2)
  * or other

* Adjust the evaluation script accordingly

* Create a function that generates few-shot prompt and label pairs (each prompt includes a few demonstrations of the same task).

* Evaluate the FLAN-T5 base model on your dataset (try both zero-shot and few-shot prompts):

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 
model_path = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

In [24]:
def pick_random_demonstrations():
    # Pick random demonstrations (prompt-label pairs) from your custom dataset
    pass

def create_few_shot_prompt():
    # Use pick_random_demonstrations() to build a new few-shot prompt
    pass

# Get the model's predictions

# Evaluate (depending on your dataset you may need to change the evaluation script) 
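
One possible way to fill in the two stubs is sketched below. It is a starting point rather than a reference solution: the parameters (`pairs`, `k`, `exclude_index`, `query_index`, `seed`), the separator between demonstrations, and the toy review data are all assumptions made for illustration.

```python
import random


def pick_random_demonstrations(pairs, k, exclude_index=None, seed=None):
    """Sample k (prompt, label) pairs, never returning the pair being evaluated."""
    candidates = [pair for i, pair in enumerate(pairs) if i != exclude_index]
    # random.sample raises ValueError if k is larger than the number of candidates
    return random.Random(seed).sample(candidates, k)


def create_few_shot_prompt(pairs, query_index, k=2, seed=None):
    """Prepend k solved demonstrations to the prompt at query_index."""
    demonstrations = pick_random_demonstrations(
        pairs, k, exclude_index=query_index, seed=seed
    )
    query_prompt, query_label = pairs[query_index]
    parts = [f"{demo_prompt} {demo_label}" for demo_prompt, demo_label in demonstrations]
    parts.append(query_prompt)  # the query is left unanswered for the model
    return "\n\n".join(parts), query_label


# Made-up zero-shot (prompt, label) pairs standing in for your own dataset
pairs = [
    ("Review: A wonderful film. Sentiment:", "positive"),
    ("Review: Dull and far too long. Sentiment:", "negative"),
    ("Review: I loved every minute. Sentiment:", "positive"),
    ("Review: A waste of time. Sentiment:", "negative"),
]

few_shot_prompt, label = create_few_shot_prompt(pairs, query_index=0, k=2, seed=0)
print(few_shot_prompt)
print("expected label:", label)
```

Excluding the evaluated item from the candidates prevents its label from leaking into its own prompt, and a fixed seed keeps the sampled demonstrations reproducible between runs.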