# In-context learning (ICL)

In-context learning is a way of using a generative model in which the description of the desired task is part of the input.

During pre-training, the model is trained to "guess" the right word in context. This is done with objectives such as Masked Language Modeling (MLM) or Causal Language Modeling (CLM). Through these objectives, the model acquires an inherent understanding of the language.
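
To make the two objectives concrete, here is a minimal sketch of what their training examples might look like. It assumes whitespace tokenization and a plain `[MASK]` replacement, both simplifications; real tokenizers and masking schemes are more involved, and the function names are made up for this illustration.

```python
import random

def clm_examples(tokens):
    """Causal LM: predict each token from the tokens to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def mlm_example(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Masked LM: hide some tokens and predict them from both sides."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)   # the model sees the mask ...
            targets[i] = tok            # ... and must recover the original token
        else:
            masked.append(tok)
    return masked, targets

# Toy sentence with naive whitespace tokenization (an assumption).
tokens = "the cat sat on the mat".split()
clm_pairs = clm_examples(tokens)
masked_tokens, mlm_targets = mlm_example(tokens, mask_prob=0.3)
```

In the causal case every position yields a (left context, next token) pair, while in the masked case only the hidden positions are training targets.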

After pre-training, we would traditionally fine-tune the model with supervised learning for one specific task. This requires:
* Training data (input and label pairs)
* A task-specific layer ("head") added to the model

The resulting model is fit for that one specific task.

When we talk about ICL, we mean models that were fine-tuned on text inputs containing a description of the desired task (a prompt), paired with text outputs, across several tasks.

At inference time, we give these models a prompt. Having been trained on a variety of prompts for multiple tasks, the model is better at interpreting the task description itself and may therefore show ICL capability even on tasks it has never seen before.
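
As a rough illustration of such multi-task training data, the sketch below turns examples from three different tasks into plain text pairs. The helper `to_text_pair`, the field names, and the example texts are assumptions made for this illustration, not the format used to train any particular model.

```python
def to_text_pair(task_description, text, target):
    """Combine a task description and an input text into one prompt."""
    prompt = f"{task_description}\n{text}"
    return {"input": prompt, "target": target}

# Every task is expressed as text in, text out, so one model can learn them all.
training_pairs = [
    to_text_pair(
        "Translate from English to Czech:",
        "Good morning.",
        "Dobré ráno.",
    ),
    to_text_pair(
        "What is the sentiment of this review: positive or negative?",
        "The movie was a waste of time.",
        "negative",
    ),
    to_text_pair(
        "Answer the question using the context. Question: Who wrote Hamlet?",
        "Context: Hamlet is a tragedy written by William Shakespeare.",
        "William Shakespeare",
    ),
]
```

Because the task is stated inside the input, no task-specific head is needed: the same text-to-text model handles all three.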



In [7]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 
model_path = "gaussalgo/mt5-base-priming-QA_en-cs"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)



In [3]:
prompt = """
What is meant by: I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far between, even then it's not shot like some cheaply made porno. While my countrymen mind find it shocking, in reality sex and nudity are a major staple in Swedish cinema. Even Ingmar Bergman, arguably their answer to good old boy John Ford, had sex scenes in his films.<br /><br />I do commend the filmmakers for the fact that any sex shown in the film is shown for artistic purposes rather than just to shock people and make money to be shown in pornographic theaters in America. I AM CURIOUS-YELLOW is a good film for anyone wanting to study the meat and potatoes (no pun intended) of Swedish cinema. But really, this film doesn't have much of a plot."
"""

inputs = tokenizer([prompt], return_tensors="pt", padding=True)
outputs = model.generate(**inputs.to(model.device))
outputs_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# predictions:
outputs_str



['a young Swedish drama student named Lena']

## Zero-shot vs few-shot



The model might understand the task from the input, but it does not know how we expect it to respond. Therefore, if the model has been adjusted for such use, we can show it the format of the task through a few input-output examples and see whether it picks it up.

This approach is called in-context few-shot learning: in addition to the description of the task, we give the model a few input-output examples (demonstrations). Given these, the model has a much easier time understanding the format of the interaction we expect from it. The demonstrations are the only lead the model has for understanding the task at hand, so their selection matters: we can see that if we pick only examples with a negative sentiment, the model is unable to pick the correct label.

In this setting, we need to standardize the format of the demonstrations and of the expected prediction, so that the model can rely on it.
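
One way to keep the format consistent is to build every prompt from a single template. The sketch below follows the Question / Context / Answer layout used in this notebook; the helper names `format_example` and `build_few_shot_prompt` are made up for this illustration.

```python
QUESTION = "What is the sentiment of the context: positive or negative?"

def format_example(context, answer=""):
    """Render one example; leave the answer empty for the query to be predicted."""
    return f'Question: {QUESTION}\nContext: {context}\nAnswer:"{answer}"'

def build_few_shot_prompt(demonstrations, query_context):
    """Concatenate the demonstrations and the unanswered query in one fixed format."""
    blocks = [format_example(context, answer) for context, answer in demonstrations]
    blocks.append(format_example(query_context))
    return "\n".join(blocks)

demos = [
    ("He said that the concert was very dull.", "negative"),
    ("She came from school smiling and singing.", "positive"),
]
few_shot_prompt = build_few_shot_prompt(demos, "I am very happy to be here today.")
```

Passing an empty list of demonstrations yields the zero-shot prompt, so the same function covers both settings.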

In [4]:




input_zero_shot = """
Question: What is the sentiment of the context: positive or negative? 
Context: I am very happy to be here today.
Answer:""
"""
input_few_shot_not_heterogenic = """
Question: What is the sentiment of the context: positive or negative? 
Context: He said, that the consert was very dull.
Answer:"negative"
Question: What is the sentiment of the context: positive or negative? 
Context: She came from school sad and lonely.
Answer:"negative"
Question: What is the sentiment of the context: positive or negative? 
Context: I am very happy to be here today.
Answer:""
"""
input_few_shot = """
Question: What is the sentiment of the context: positive or negative? 
Context: He said, that the consert was very dull.
Answer:"negative"
Question: What is the sentiment of the context: positive or negative? 
Context: She came from school smiling and singing.
Answer:"positive"
Question: What is the sentiment of the context: positive or negative? 
Context: I am very happy to be here today.
Answer:""
"""
inputs = tokenizer([input_zero_shot, input_few_shot_not_heterogenic, input_few_shot], return_tensors="pt", padding=True)
outputs = model.generate(**inputs.to(model.device))
outputs_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# predictions:
outputs_str

['happy', 'positive or negative', 'positive']

## What should the prompts look like?
To train a custom in-context learner, we need text pairs of a prompt and a label. While the example above uses a single unified prompt, in training it is beneficial to create multiple prompts for one task: we want to strengthen the model's ability to understand a task from its description, not to identify the task from a template. This diversification should help the model understand never-before-seen tasks better.
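
A minimal sketch of this kind of diversification, assuming a small set of hand-written templates for a sentiment task. The templates, the helper `make_training_pair`, and the toy data are all made up for this illustration.

```python
import random

# Several phrasings of the same task; {text} is filled with the input.
TEMPLATES = [
    "What is the sentiment of the context: positive or negative?\nContext: {text}",
    "Review: {text}\nIs this review positive or negative?",
    "Decide whether the following text sounds positive or negative.\n{text}",
]

def make_training_pair(text, label, rng):
    """Render one example with a randomly chosen template."""
    template = rng.choice(TEMPLATES)
    return template.format(text=text), label

rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [
    ("I loved every minute.", "positive"),
    ("The plot made no sense.", "negative"),
]
# Render each example three times, each time with a freshly sampled template.
pairs = [make_training_pair(text, label, rng) for text, label in data for _ in range(3)]
```

Sampling the template per example means the model cannot rely on one fixed surface form to recognize the task.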

In [5]:
# TODO: PromptSource library: how to use existing PromptSource prompts with your custom data

In [6]:
#TODO evaluation script


## [Hands on] Creation of evaluation dataset 

Download an existing dataset and transform it into pairs of prompt input and label.
* Text classification (https://huggingface.co/datasets/imdb)
* Named Entity Recognition (https://huggingface.co/datasets/polyglot_ner/viewer/en/train)
* Question Answering (https://huggingface.co/datasets/squad_v2)
* or other

Using the PromptSource library, create an evaluation dataset and adjust the evaluation script.

Create a function that generates few-shot prompt and label pairs (each prompt includes a few demonstrations of the same task). Evaluate the model on your custom dataset.
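
A possible starting point for this exercise is sketched below. It assumes the dataset is a list of `(text, label)` tuples and that `predict` is any function mapping a prompt string to an output string, for example a wrapper around the tokenizer and `model.generate` calls used earlier. The function names, the prompt layout, and the toy data are assumptions for illustration only.

```python
import random

def few_shot_pairs(dataset, instruction, k=2, seed=0):
    """For each example, sample k demonstrations from the other examples."""
    rng = random.Random(seed)
    pairs = []
    for i, (text, label) in enumerate(dataset):
        pool = dataset[:i] + dataset[i + 1:]  # never demonstrate the query itself
        demos = rng.sample(pool, min(k, len(pool)))
        parts = [f"{instruction}\nContext: {t}\nAnswer: {l}" for t, l in demos]
        parts.append(f"{instruction}\nContext: {text}\nAnswer:")
        pairs.append(("\n\n".join(parts), label))
    return pairs

def exact_match_accuracy(pairs, predict):
    """Share of prompts whose prediction equals the label (case-insensitive)."""
    correct = sum(
        predict(prompt).strip().lower() == label.strip().lower()
        for prompt, label in pairs
    )
    return correct / len(pairs)

toy_dataset = [
    ("I am very happy to be here today.", "positive"),
    ("The concert was very dull.", "negative"),
    ("She came home smiling and singing.", "positive"),
    ("He felt sad and lonely.", "negative"),
]
instruction = "What is the sentiment of the context: positive or negative?"
eval_pairs = few_shot_pairs(toy_dataset, instruction, k=2)

# A dummy predictor standing in for a real model call.
accuracy = exact_match_accuracy(eval_pairs, lambda prompt: "positive")
```

Exact match is a strict metric; for generative models it may be worth normalizing the outputs further (for example stripping quotes) before comparing them with the labels.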

Then evaluate some existing models:

* https://huggingface.co/google/flan-t5-large
* https://huggingface.co/allenai/mtk-instruct-3b-def-pos

## Existing models

* T5 models (https://huggingface.co/t5-large) - pre-trained on a mixture of tasks in text-to-text format
* FLAN-T5 models (https://huggingface.co/google/flan-t5-large) - T5 fine-tuned on more than 1000 additional tasks
* mT5 models (https://huggingface.co/google/mt5-large) - pre-trained on 101 languages with no supervised tasks, so they need to be fine-tuned before use
* Tk-Instruct models (https://huggingface.co/allenai/tk-instruct-large-def-pos) - T5 fine-tuned on tasks with prompts written as in-context instructions
* mTk-Instruct models (https://huggingface.co/allenai/mtk-instruct-3b-def-pos) - multilingual version of the Tk-Instruct model
* MPT (https://huggingface.co/mosaicml/mpt-7b)
* Alpaca (https://github.com/tatsu-lab/stanford_alpaca)

