# Prompt-based learning

In [127]:
!pip install -q transformers

In [135]:
import os
import numpy as np

## Prompting Class
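The `Prompting` class below wraps a pre-trained masked language model and its tokenizer, and scores candidate tokens at the `[MASK]` position of a prompt.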

In [130]:
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

class Prompting(object):
  """
  Helper class for prompt-based learning
  with a masked language model.
  """
  def __init__(self, **kwargs):
    """
    model: name or path of a pre-trained language model from the Hugging Face Hub
    tokenizer: name or path of the tokenizer if it differs from the model; otherwise omit it
    """
    model_path = kwargs['model']
    # fall back to the model path when no separate tokenizer is given
    tokenizer_path = kwargs.get('tokenizer', model_path)
    self.model = AutoModelForMaskedLM.from_pretrained(model_path)
    self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

  def prompt_pred(self, text):
    """
      Score every vocabulary token at the [MASK] position of the text.
      Only a single [MASK] token is supported; if the text contains
      more than one, the first is used.
      Returns (token, score) pairs sorted by descending score.
    """
    # tokenize() does not add special tokens such as [CLS]/[SEP]
    tokenized_text = self.tokenizer.tokenize(text)
    indexed_tokens = self.tokenizer.convert_tokens_to_ids(tokenized_text)
    tokens_tensor = torch.tensor([indexed_tokens])
    # take the first masked token
    mask_pos = tokenized_text.index("[MASK]")
    self.model.eval()
    with torch.no_grad():
      outputs = self.model(tokens_tensor)
      predictions = outputs[0]  # logits, shape (1, seq_len, vocab_size)
    values, indices = torch.sort(predictions[0, mask_pos], descending=True)
    result = list(zip(self.tokenizer.convert_ids_to_tokens(indices.tolist()), values))
    # cache the scores so compute_tokens_prob can look tokens up
    self.scores_dict = {a: b for a, b in result}
    return result

  def compute_tokens_prob(self, text, token1, token2):
    """
    Return the scores at the [MASK] position for two given tokens.
    The scores are raw logits, not normalized probabilities.
    A token that is not in the vocabulary gets a score of 0.
    """
    _ = self.prompt_pred(text)
    score1 = self.scores_dict.get(token1, 0)
    score2 = self.scores_dict.get(token2, 0)
    return {token1: score1, token2: score2}


I use a Turkish BERT model here; you can substitute any other masked language model that uses `[MASK]` as its mask token.

In [94]:
prompting= Prompting(model="dbmdz/bert-base-turkish-cased")

Some weights of the model checkpoint at dbmdz/bert-base-turkish-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Producing the predicted tokens

In [131]:
text="Çok keyif almadım filmden"  # "I did not enjoy the film much"
propmt=", çünkü [MASK] idi."  # ", because it was [MASK]."
prompted= text + propmt
prompting.prompt_pred(prompted)[:10]

[('gereksiz', tensor(8.3082)),
 ('kötü', tensor(8.2386)),
 ('sıkıcı', tensor(8.0629)),
 ('eski', tensor(7.8365)),
 ('saçma', tensor(7.7022)),
 ('berbat', tensor(7.5442)),
 ('iğrenç', tensor(7.5263)),
 ('yeni', tensor(7.4843)),
 ('eğlenceli', tensor(7.4627)),
 ('güzel', tensor(7.4235))]

In [132]:
text="Çok keyif aldım filmden"  # "I enjoyed the film very much"
propmt=", çünkü [MASK] idi."
prompted= text + propmt
prompting.prompt_pred(prompted)[:10]

[('harika', tensor(8.9176)),
 ('güzel', tensor(8.7036)),
 ('eğlenceli', tensor(8.6776)),
 ('muhteşem', tensor(8.5812)),
 ('mükemmel', tensor(8.3756)),
 ('iyi', tensor(7.8504)),
 ('komik', tensor(7.8444)),
 ('keyifli', tensor(7.6584)),
 ('akıcı', tensor(7.5700)),
 ('süper', tensor(7.5630))]
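
The two top-10 lists suggest that several tokens, not just one, signal each sentiment. A minimal sketch of scoring a label by averaging over a small set of label words follows. The word sets, the `label_scores` helper, and the use of a plain mean are assumptions made for illustration and are not part of the `Prompting` class. The scores are copied from the first output above (the negative review) as plain floats.

```python
# Scores at the [MASK] position for the negative review, copied from the
# first output above and converted to plain floats.
scores = {
    "gereksiz": 8.3082, "kötü": 8.2386, "sıkıcı": 8.0629,
    "eğlenceli": 7.4627, "güzel": 7.4235,
}

# Hypothetical label-word sets (an assumption, chosen by hand).
label_words = {
    "negative": ["gereksiz", "kötü", "sıkıcı"],
    "positive": ["güzel", "eğlenceli"],
}


def label_scores(scores, label_words):
    """Average the scores of each label's words; skip words with no score."""
    result = {}
    for label, words in label_words.items():
        found = [scores[w] for w in words if w in scores]
        # A label with no scored words gets -inf so it can never win.
        result[label] = sum(found) / len(found) if found else float("-inf")
    return result


per_label = label_scores(scores, label_words)
predicted = max(per_label, key=per_label.get)
print(per_label, predicted)
```

With real model output, the same helper could be fed `{token: float(score) for token, score in prompting.prompt_pred(prompted)}` instead of the hand-copied dictionary.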

## Producing the results for a pair of negative/positive words

In [134]:
text="Çok keyif almadım filmden"
propmt=", çünkü [MASK] idi."
prompted= text + propmt
prompting.compute_tokens_prob(prompted, "gereksiz", "harika")


{'gereksiz': tensor(8.3082), 'harika': tensor(7.0045)}

In [126]:
text="Çok keyif aldım filmden"
propmt=", çünkü [MASK] idi."
prompted= text + propmt
prompting.compute_tokens_prob(prompted, "gereksiz", "harika")

{'gereksiz': tensor(6.8251), 'harika': tensor(8.9176)}
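
The values returned by `compute_tokens_prob` are raw scores (logits), so they are not probabilities and do not sum to 1. One way to turn a pair of scores into a decision is a softmax over just the two candidates. The sketch below assumes the scores have already been converted to floats (for example with `float(tensor)`). The `pair_softmax` helper and the negative/positive label mapping are illustrative assumptions, not part of the class above.

```python
import math


def pair_softmax(score_a, score_b):
    """Softmax over two scores; returns (p_a, p_b), which sum to 1."""
    m = max(score_a, score_b)  # subtract the max for numerical stability
    exp_a = math.exp(score_a - m)
    exp_b = math.exp(score_b - m)
    total = exp_a + exp_b
    return exp_a / total, exp_b / total


# Scores copied from the two outputs above, as plain floats.
examples = {
    "Çok keyif almadım filmden": {"gereksiz": 8.3082, "harika": 7.0045},
    "Çok keyif aldım filmden": {"gereksiz": 6.8251, "harika": 8.9176},
}

labels = {}
for text, s in examples.items():
    p_neg, p_pos = pair_softmax(s["gereksiz"], s["harika"])
    # "gereksiz" stands for the negative class, "harika" for the positive one
    labels[text] = "negative" if p_neg > p_pos else "positive"
    print(f"{text!r}: p(neg)={p_neg:.3f}, p(pos)={p_pos:.3f} -> {labels[text]}")
```

This normalizes over the two candidates only, so the result expresses the model's relative preference between them rather than its probability over the whole vocabulary.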

# Learning