# In-context learning with Hugging Face

We will explore how an open-source transformer-based model handles in-context learning, moving from zero-shot to one-shot and few-shot prompts.

In [1]:
!pip install transformers --quiet
!pip install torch --quiet

Load all necessary libraries.

In [2]:
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

Load a pretrained model. We can experiment with various models and model sizes: https://huggingface.co/google.

In [3]:
model_name = 'google/flan-t5-large'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

## Zero-shot learning

Let's start with prompt encoding.

In [5]:
prompt = """
What is the capital of Germany?
"""

sentence_encoded = tokenizer(prompt, return_tensors='pt')

tokens = sentence_encoded['input_ids']

print(tokens)

tensor([[ 363,   19,    8, 1784,   13, 3434,   58,    3,    1]])


We can print out the actual tokens behind these IDs.

In [6]:
tokenizer.convert_ids_to_tokens(tokens[0])

['▁What', '▁is', '▁the', '▁capital', '▁of', '▁Germany', '?', '▁', '</s>']

In [7]:
completion = model.generate(tokens, max_new_tokens=50)

output = tokenizer.decode(completion[0], skip_special_tokens=True)

print(output)

berlin
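
The encode, generate, and decode steps above repeat for every prompt, so they could be wrapped in a small helper. The sketch below is one hypothetical way to do that: the name `complete` and its parameters are assumptions, not part of the notebook, and it relies only on the three calls already used above (calling the tokenizer, `model.generate`, and `tokenizer.decode`).

```python
def complete(model, tokenizer, prompt, max_new_tokens=50):
    """Hypothetical helper: encode a prompt, generate a completion, decode it."""
    # Encode the prompt into PyTorch tensors, as in the cells above.
    encoded = tokenizer(prompt, return_tensors='pt')
    # Generate a completion from the input token IDs.
    completion = model.generate(encoded['input_ids'], max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence, dropping special tokens such as </s>.
    return tokenizer.decode(completion[0], skip_special_tokens=True)
```

With the `model` and `tokenizer` loaded earlier, `print(complete(model, tokenizer, prompt))` should behave like the cell above.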


## One-shot learning

The model names the right city, but in lowercase, which is not exactly what we wanted. Let's provide the model with one example of the expected answer.

In [8]:
prompt = """What is the capital of France?
The capital of France is Paris.

What is the capital of Germany?
"""


In [9]:
sentence_encoded = tokenizer(prompt, return_tensors='pt')

tokens = sentence_encoded['input_ids']

In [10]:
completion = model.generate(tokens, max_new_tokens=50)

output = tokenizer.decode(completion[0], skip_special_tokens=True)

print(output)

Berlin


## Few-shot learning

That's better, but we want the answer as a complete sentence. Let's provide the model with one more example.

In [11]:
prompt = """What is the capital of France?
The capital of France is Paris.

What is the capital of Spain?
The capital of Spain is Madrid.

What is the capital of Germany?
"""


In [12]:
sentence_encoded = tokenizer(prompt, return_tensors='pt')

tokens = sentence_encoded['input_ids']

In [13]:
completion = model.generate(tokens, max_new_tokens=50)

output = tokenizer.decode(completion[0], skip_special_tokens=True)

print(output)

The capital of Germany is Berlin.
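
The one-shot and few-shot prompts above share one layout: each example is a question line followed by an answer line, examples are separated by a blank line, and the final question is left unanswered. A minimal sketch of building such prompts from a list of pairs follows; the helper name `build_few_shot_prompt` and its arguments are hypothetical and not part of the notebook.

```python
def build_few_shot_prompt(examples, question):
    """Hypothetical helper: format (question, answer) pairs, then the open question."""
    # Each solved example becomes "question\nanswer\n".
    blocks = [f"{q}\n{a}\n" for q, a in examples]
    # The final question has no answer; the model is expected to supply it.
    blocks.append(f"{question}\n")
    # Joining with "\n" puts one blank line between consecutive blocks.
    return "\n".join(blocks)


examples = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("What is the capital of Spain?", "The capital of Spain is Madrid."),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Germany?")
print(prompt)
```

Passing an empty list yields just the question, and passing only the first pair yields the one-shot prompt, so the same helper covers all three settings.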
