# Prompt-based learning

Prompt-based learning is a relatively new paradigm in NLP. Instead of running a supervised training process, we rely directly on the pre-training objective (such as masked language modeling, MLM) of a pre-trained language model. To make the model perform a prediction task, we only need to rewrite the original input `<X>` with a task-specific template into a textual prompt such as `<X, that is [MASK]>`, so that the model can solve the task without any further training.
This mechanism lets us exploit a language model (LM) that was pre-trained on huge amounts of text. A prompting function can be defined so that an LM performs few-shot, one-shot, or even zero-shot learning, which makes it easy to adapt the model to new scenarios with little or no labeled data.
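
As a minimal sketch of such a prompting function, the snippet below wraps a raw input in a cloze-style template. The name `make_prompt` and its parameters are hypothetical and only illustrate the idea; the default template mirrors the "Because it was [MASK]." prompt used later in this notebook.

```python
MASK = "[MASK]"  # mask token string used by bert-base-uncased

def make_prompt(x: str, template: str = "{x} Because it was {mask}.") -> str:
    """Wrap the raw input x in a cloze-style template (hypothetical helper)."""
    return template.format(x=x.strip(), mask=MASK)

print(make_prompt("I really like the film a lot."))
# I really like the film a lot. Because it was [MASK].
```

Swapping the template string is all it takes to target a different task, which is why no labeled data is needed at this stage.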

In [3]:
!pip install -q transformers datasets


In [5]:
import os
import numpy as np
import pandas as pd
from google.colab import drive

# For Google Colab Mounting
drive.mount('/content/drive')
os.chdir("/content/drive/MyDrive/akademi/Packt NLP with Transformers/TheSecondEdition/Prompting")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [6]:
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

# Checkpoint used for every experiment in this notebook
model_path = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_path)


In [7]:
# load Prompting class
from prompt import Prompting
prompting= Prompting(model=model_path)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Let the model predict some tokens

A POSITIVE example Input

In [8]:
prompt="Because it was [MASK]."
text="I really like the film a lot."

In [9]:
prompting.prompt_pred(text+prompt)[:10]

[('great', tensor(9.5558)),
 ('amazing', tensor(9.2532)),
 ('good', tensor(9.1464)),
 ('fun', tensor(8.3979)),
 ('fantastic', tensor(8.3277)),
 ('wonderful', tensor(8.2719)),
 ('beautiful', tensor(8.1584)),
 ('awesome', tensor(8.1071)),
 ('incredible', tensor(8.0140)),
 ('funny', tensor(7.8785))]
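
The `Prompting` class lives in a local `prompt` module that is not shown here. Judging only from the output above, `prompt_pred` appears to rank vocabulary tokens by their score at the `[MASK]` position. The sketch below illustrates that ranking step with a toy vocabulary and made-up scores; `top_k_tokens` is a hypothetical helper, not the class's actual code.

```python
def top_k_tokens(mask_logits, vocab, k=3):
    """Return the k (token, score) pairs with the highest score at the mask position."""
    ranked = sorted(zip(vocab, mask_logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy vocabulary and invented scores, for illustration only
toy_vocab = ["great", "bad", "good", "the"]
toy_logits = [9.5, 2.1, 9.1, 0.3]
print(top_k_tokens(toy_logits, toy_vocab, k=2))
# [('great', 9.5), ('good', 9.1)]
```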

A NEGATIVE example Input

In [10]:
text="I did not like the film."
prompting.prompt_pred(text+prompt)[:10]

[('bad', tensor(8.6784)),
 ('funny', tensor(8.1660)),
 ('good', tensor(7.9858)),
 ('awful', tensor(7.7454)),
 ('scary', tensor(7.3526)),
 ('boring', tensor(7.1553)),
 ('wrong', tensor(7.1402)),
 ('terrible', tensor(7.1296)),
 ('horrible', tensor(6.9923)),
 ('ridiculous', tensor(6.7731))]

## Producing the results for a pair of neg/pos words
Now we pass lists of negative and positive words rather than a single negative/positive word (token).

In [11]:
text="not worth watching"
prompting.compute_tokens_prob(text+prompt, token_list1=["great","amazin","good"], token_list2= ["bad","awfull","terrible"])

tensor([0.1496, 0.8504])

In [12]:
text="I strongly recommend that moview"
prompting.compute_tokens_prob(text+prompt, token_list1=["great","amazin","good"], token_list2= ["bad","awfull","terrible"])

tensor([0.9321, 0.0679])

In [13]:
text="I strongly recommend that moview"
prompting.compute_tokens_prob(text+prompt, token_list1=["good"], token_list2= ["bad"])

tensor([0.9223, 0.0777])
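
How `compute_tokens_prob` combines the two lists is not visible in this notebook. One plausible reading, stated here purely as an assumption, is that it takes the best score from each list and applies a softmax over the two. The sketch below implements that assumption with the standard library; `two_way_prob` is a hypothetical name, and the example scores are the ones printed earlier for "I did not like the film."

```python
import math

def two_way_prob(logits_by_token, token_list1, token_list2):
    """Softmax over the best score of each token list (assumed aggregation rule)."""
    # Tokens missing from the score table are skipped
    s1 = max(logits_by_token[t] for t in token_list1 if t in logits_by_token)
    s2 = max(logits_by_token[t] for t in token_list2 if t in logits_by_token)
    m = max(s1, s2)  # subtract the max for numerical stability
    e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
    return [e1 / (e1 + e2), e2 / (e1 + e2)]

# Scores copied from the negative example output earlier in this notebook
scores = {"bad": 8.6784, "good": 7.9858, "awful": 7.7454, "terrible": 7.1296}
probs = two_way_prob(scores, ["good"], ["bad", "awful", "terrible"])
print(probs)
```

Under this rule, a misspelled word that is absent from the score table simply contributes nothing, so it is worth checking that every word in the lists is a real vocabulary token.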

# Zero-shot evaluation

In [14]:
from datasets import load_dataset
dataset = load_dataset("imdb")


We can query the LM to find which words (positive or negative) it considers most suitable. First, we split the training set into positive and negative reviews.

In [15]:
pos=dataset["train"].filter(lambda e: e["label"]==1)
import pandas as pd
pd.DataFrame(pos)



| | text | label |
|---|---|---|
| 0 | Zentropa has much in common with The Third Man... | 1 |
| 1 | Zentropa is the most original movie I've seen ... | 1 |
| 2 | Lars Von Trier is never backward in trying out... | 1 |
| 3 | *Contains spoilers due to me having to describ... | 1 |
| 4 | That was the first thing that sprang to mind a... | 1 |
| ... | ... | ... |
| 12495 | A hit at the time but now better categorised a... | 1 |
| 12496 | I love this movie like no other. Another time ... | 1 |
| 12497 | This film and it's sequel Barry Mckenzie holds... | 1 |
| 12498 | 'The Adventures Of Barry McKenzie' started lif... | 1 |


In [16]:
neg=dataset["train"].filter(lambda e: e["label"]==0)
import pandas as pd
pd.DataFrame(neg)



| | text | label |
|---|---|---|
| 0 | I rented I AM CURIOUS-YELLOW from my video sto... | 0 |
| 1 | "I Am Curious: Yellow" is a risible and preten... | 0 |
| 2 | If only to avoid making this type of film in t... | 0 |
| 3 | This film was probably inspired by Godard's Ma... | 0 |
| 4 | Oh, brother...after hearing about this ridicul... | 0 |
| ... | ... | ... |
| 12495 | There just isn't enough here. There a few funn... | 0 |
| 12496 | Tainted look at kibbutz life<br /><br />This f... | 0 |
| 12497 | I saw this movie, just now, not when it was re... | 0 |
| 12498 | Any film which begins with a cowhand shagging ... | 0 |


In [17]:
prompt=" Because it was [MASK]."

Set THRESHOLD to 0.5.

In [18]:
crr=0
counter=0
THRESHOLD = 0.50
for text in pos[:200]["text"]:
  text=" ".join(text.split()[-100:])
  res=prompting.compute_tokens_prob(text+prompt, token_list1=["good"], token_list2= ["bad"])
  if res[0] > THRESHOLD:
    crr+=1
  counter+=1
p=crr
print(p)

196


In [19]:
crr=0
for text in neg[:200]["text"]:
  text=" ".join(text.split()[-100:])
  res=prompting.compute_tokens_prob(text+prompt, token_list1=["good"], token_list2= ["bad"])
  if res[0] < THRESHOLD:
    crr+=1
  counter+=1
n=crr
print(n)

30


In [20]:
(p+n)/counter

0.565
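
The overall accuracy hides a strong imbalance between the two classes. The arithmetic below breaks the reported counts (196 correct positives and 30 correct negatives, out of 200 reviews each) into per-class rates; the variable names are chosen here for illustration.

```python
p, n = 196, 30      # correct predictions reported above for positive / negative reviews
per_class = 200     # number of reviews evaluated per class

accuracy = (p + n) / (2 * per_class)
pos_rate = p / per_class   # share of positive reviews classified correctly
neg_rate = n / per_class   # share of negative reviews classified correctly

print(accuracy, pos_rate, neg_rate)
# 0.565 0.98 0.15
```

Almost every positive review is recognized, while only 15% of the negative reviews are, which suggests that the model leans toward "good" regardless of the input.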

Biased training inputs may produce biased models. We know that LMs are biased because their training data can contain racist, sexist, and bigoted language. Likewise, since the token embeddings carry such biases, the model tends to predict "good" for the masked token more often than expected.

Let us pass an empty template to the model to see the bias!

In [21]:
prompting.compute_tokens_prob("it was [MASK].", token_list1=["good"], token_list2= ["bad"])

tensor([0.8495, 0.1505])

As you can see, the model assigns the label (word) "good" to the empty template with 85% probability. Therefore, we set THRESHOLD to that value and see what happens.

In [None]:
crr=0
counter=0
THRESHOLD = 0.85
for text in pos[:200]["text"]:
  text=" ".join(text.split()[-100:])
  res=prompting.compute_tokens_prob(text+prompt, token_list1=["good"], token_list2= ["bad"])
  if res[0] > THRESHOLD:
    crr+=1
  counter+=1
p=crr
print(p)
crr=0
for text in neg[:200]["text"]:
  text=" ".join(text.split()[-100:])
  res=prompting.compute_tokens_prob(text+prompt, token_list1=["good"], token_list2= ["bad"])
  if res[0] < THRESHOLD:
    crr+=1
  counter+=1
n=crr
print(n)

In [25]:
# Zero-shot performance
(p+n)/counter

0.735
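
The two evaluation loops differ only in the threshold, so they can be folded into one function. The sketch below is one way to do that; `zero_shot_accuracy` and `score_fn` are hypothetical names, and the scores in the demo are invented so that the snippet runs without the model.

```python
def zero_shot_accuracy(pos_texts, neg_texts, score_fn, threshold):
    """Accuracy when a text is labeled positive if score_fn(text) exceeds threshold."""
    correct = sum(score_fn(t) > threshold for t in pos_texts)
    correct += sum(score_fn(t) < threshold for t in neg_texts)
    return correct / (len(pos_texts) + len(neg_texts))

# Invented P("good") scores standing in for the model, for illustration only
fake_scores = {"pos a": 0.95, "pos b": 0.90, "neg a": 0.70, "neg b": 0.90}
pos_demo, neg_demo = ["pos a", "pos b"], ["neg a", "neg b"]

print(zero_shot_accuracy(pos_demo, neg_demo, fake_scores.get, threshold=0.50))  # 0.5
print(zero_shot_accuracy(pos_demo, neg_demo, fake_scores.get, threshold=0.85))  # 0.75
```

In this notebook, `score_fn` would wrap the call to `prompting.compute_tokens_prob` on the last 100 words of a review plus the prompt and return the first element of the result.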

# Named Entity Recognition
We apply the template
>`Sentence. John is a type of [MASK]`

In [26]:
prompting.prompt_pred("John went to Paris to visit the University. John is a type of [MASK].")[:5]

[('man', tensor(8.1382)),
 ('john', tensor(7.1325)),
 ('guy', tensor(6.9672)),
 ('writer', tensor(6.4336)),
 ('philosopher', tensor(6.3823))]

John is a very common name, so it is not surprising that the model recognizes it as a person even without context. Let me use my own name, *Savaş*, since it is rarely used as a person's name in English text, the training data of the model.

In [27]:
prompting.prompt_pred("Savaş went to Paris to visit the university. Savaş is a type of [MASK].")[:5]

[('philosopher', tensor(7.6558)),
 ('poet', tensor(7.5621)),
 ('saint', tensor(7.0104)),
 ('man', tensor(6.8890)),
 ('pigeon', tensor(6.6780))]

Let's apply person-or-city classification and check whether Savaş is a city or a person. But first, we run the empty template!

In [31]:
prompting.compute_tokens_prob("It is a type of [MASK].",
                              token_list1=["person","man"], token_list2=["location","city","place"])

tensor([0.7603, 0.2397])

Well, our threshold is a probability of 0.7603 (76.03%).

In [32]:
prompting.compute_tokens_prob("Savaş went to Paris to visit the parliament. Savaş is a type of [MASK].",
                              token_list1=["person","man"], token_list2=["location","city","place"])

tensor([9.9987e-01, 1.2744e-04])

About 99.99% for a person. Savaş is not a city for sure! Let's check Paris. But Paris is very commonly known as a city, so let's change it to Laris.

In [33]:
prompting.compute_tokens_prob("Savaş went to Laris to visit the parliament. Laris is a type of [MASK].",
                              token_list1=["person","man"], token_list2=["location","city","place"])

tensor([0.3263, 0.6737])

Wonderful! Less than 76%, so Laris is a city then! Let's make it harder.

In [36]:
prompting.compute_tokens_prob("Savas went to XYZ to visit friends. XYZ is a type of [MASK].",
                              token_list1=["person","man"], token_list2=["location","city","place"])

tensor([0.5516, 0.4484])
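
The decision rule used in this section can be written out explicitly: an entity counts as a person only if its person probability exceeds the value the empty template produced. The snippet below applies that rule to the three probabilities printed above; `person_or_location` is a hypothetical helper name.

```python
PRIOR_PERSON = 0.7603  # person probability of the empty template "It is a type of [MASK]."

def person_or_location(p_person, prior=PRIOR_PERSON):
    """Label an entity as a person only if it beats the empty-template prior."""
    return "person" if p_person > prior else "location"

# First element of each tensor printed above
results = {name: person_or_location(p)
           for name, p in [("Savaş", 0.99987), ("Laris", 0.3263), ("XYZ", 0.5516)]}
print(results)
# {'Savaş': 'person', 'Laris': 'location', 'XYZ': 'location'}
```

Note that `XYZ` would be labeled a person with a plain 0.5 threshold (0.5516 > 0.5); it only comes out as a location because the comparison is made against the prior.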

# Topic Classification

In [None]:
prompting.prompt_pred("He ate a big hot dog and a pizza with some drinks. \
 He like eating especially street food. The topic is type of [MASK].")[:10]

[('interesting', tensor(7.3585)),
 ('funny', tensor(6.6522)),
 ('taboo', tensor(6.1840)),
 ('serious', tensor(5.9274)),
 ('different', tensor(5.6251)),
 ('personal', tensor(5.5419)),
 ('weird', tensor(5.3860)),
 ('political', tensor(5.3487)),
 ('unusual', tensor(5.3030)),
 ('crazy', tensor(5.2515))]

In [None]:
prompting.prompt_pred("Savas went to Paris to study \
 computer science. he started to learn basic staff like programming,\
 algorithm, operating systemvisit the parliament. The topic is a type of [MASK].")[:10]

[('mathematics', tensor(8.8438)),
 ('computer', tensor(7.9713)),
 ('programming', tensor(7.7146)),
 ('computing', tensor(7.6635)),
 ('math', tensor(7.5143)),
 ('algebra', tensor(7.1716)),
 ('computers', tensor(7.0013)),
 ('game', tensor(6.9694)),
 ('physics', tensor(6.9225)),
 ('computation', tensor(6.8152))]

# Sentence Embeddings

In [38]:
from transformers import pipeline
fe=pipeline("feature-extraction", model=model_path)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [39]:
text = "the film is ok. it means [MASK]."
# Token ids as a tensor of shape (1, sequence_length)
indexed_tokens = tokenizer(text, return_tensors="pt").input_ids
tokenized_text = tokenizer.convert_ids_to_tokens(indexed_tokens[0])
print(tokenized_text)

['[CLS]', 'the', 'film', 'is', 'ok', '.', 'it', 'means', '[MASK]', '.', '[SEP]']


In [42]:
mask_pos= (indexed_tokens[0]== tokenizer.mask_token_id).nonzero().item()
text_emb=fe(text)
mask_emb=text_emb[0][mask_pos]
len(mask_emb)

768
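
The feature-extraction pipeline returns one vector per token, so `text_emb[0]` holds 11 vectors of size 768 for this sentence, and `mask_pos` selects the one that belongs to `[MASK]`. The plain-Python snippet below reproduces the position lookup on the token list printed above, without loading the model.

```python
# Token list copied from the tokenizer output above
tokens = ['[CLS]', 'the', 'film', 'is', 'ok', '.', 'it', 'means', '[MASK]', '.', '[SEP]']

mask_pos = tokens.index('[MASK]')  # position of the mask token in the sequence
print(mask_pos, len(tokens))
# 8 11
```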