# Language modelling

The exercise shows how a language model may be used to solve word-prediction tasks and to generate text.


## Tasks

1. Read the documentation on [language modelling](https://huggingface.co/transformers/task_summary.html#language-modeling) in the Transformers library.
2. Download three [Polish models](https://huggingface.co/models?filter=pl) from the Hugging Face Hub.

3. Produce predictions for the following sentences (use each model and check its top 5 predictions):


    i. (M) Warszawa to największe `[MASK]`.
    ii. (D) Te zabawki należą do `[MASK]`.
    iii. (C) Policjant przygląda się `[MASK]`.
    iv. (B) Na środku skrzyżowania widać `[MASK]`.
    v. (N) Właściciel samochodu widział złodzieja z `[MASK]`.
    vi. (Ms) Prezydent z premierem rozmawiali wczoraj o `[MASK]`.
    vii. (W) Witaj drogi `[MASK]`.
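To run every model on the same inputs, it can help to keep the seven sentences in one place with the generic `[MASK]` placeholder. The sketch below assumes that the letter in parentheses names the Polish grammatical case expected at the gap; `SENTENCES`, `CASES` and `check_single_mask` are hypothetical names introduced here, not part of the exercise.

```python
# Hypothetical helper data: the task 3 sentences keyed by their letter label.
SENTENCES = {
    "M": "Warszawa to największe [MASK].",
    "D": "Te zabawki należą do [MASK].",
    "C": "Policjant przygląda się [MASK].",
    "B": "Na środku skrzyżowania widać [MASK].",
    "N": "Właściciel samochodu widział złodzieja z [MASK].",
    "Ms": "Prezydent z premierem rozmawiali wczoraj o [MASK].",
    "W": "Witaj drogi [MASK].",
}

# Assumed reading of the letters as Polish case names
CASES = {
    "M": "nominative (mianownik)",
    "D": "genitive (dopełniacz)",
    "C": "dative (celownik)",
    "B": "accusative (biernik)",
    "N": "instrumental (narzędnik)",
    "Ms": "locative (miejscownik)",
    "W": "vocative (wołacz)",
}


def check_single_mask(sentence, placeholder="[MASK]"):
    """Raise ValueError unless the sentence contains exactly one placeholder."""
    count = sentence.count(placeholder)
    if count != 1:
        raise ValueError(f"expected one {placeholder!r}, found {count}: {sentence!r}")
    return sentence


for label, sentence in SENTENCES.items():
    check_single_mask(sentence)
    print(f"{label:>2} [{CASES[label]}]: {sentence}")
```

Keeping the placeholder generic means each model's own mask token can be substituted just before the call.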

In [None]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


In [27]:
import torch
from transformers import (AutoModel, AutoModelForMaskedLM, AutoTokenizer,
                          BertForMaskedLM, BertTokenizer, pipeline)


In [31]:
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("allegro/herbert-base-cased")

sequence = f"Warszawa to największe {tokenizer.mask_token}."



Some weights of the model checkpoint at allegro/herbert-base-cased were not used when initializing BertForMaskedLM: ['cls.sso.sso_relationship.weight', 'cls.sso.sso_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [32]:
predict(sequence)  # predict() is defined in the In [4] cell further below

['Warszawa to największe miasto.',
 'Warszawa to największe lotnisko.',
 'Warszawa to największe centrum.',
 'Warszawa to największe miasta.',
 'Warszawa to największe atrakcje.']

In [39]:
class Model:
    """Common interface: predict() prints the top 5 fillers for `[MASK]`."""

    def predict(self, sequence):
        raise NotImplementedError


class KleczekModel(Model):
    """Polbert (dkleczek/bert-base-polish-cased-v1) wrapped in a fill-mask pipeline."""

    def __init__(self, checkpoint="dkleczek/bert-base-polish-cased-v1"):
        self.model = BertForMaskedLM.from_pretrained(checkpoint)
        self.tokenizer = BertTokenizer.from_pretrained(checkpoint)
        self.nlp = pipeline("fill-mask", model=self.model, tokenizer=self.tokenizer)

    def predict(self, sequence):
        # Replace the generic placeholder with the model's own mask token
        sequence = sequence.replace("[MASK]", self.nlp.tokenizer.mask_token)
        for pred in self.nlp(sequence):  # the pipeline returns 5 candidates by default
            print(pred)


class AllegroModel(Model):
    """HerBERT (allegro/herbert-base-cased); its mask token is <mask>, not [MASK]."""

    def __init__(self, checkpoint="allegro/herbert-base-cased"):
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.model = AutoModelForMaskedLM.from_pretrained(checkpoint)

    def predict(self, sequence):
        sequence = sequence.replace("[MASK]", self.tokenizer.mask_token)
        inputs = self.tokenizer(sequence, return_tensors="pt")
        # Position of the mask token in the single input sequence
        mask_token_index = torch.where(inputs["input_ids"] == self.tokenizer.mask_token_id)[1]
        with torch.no_grad():  # inference only, no gradients needed
            token_logits = self.model(**inputs).logits

        mask_token_logits = token_logits[0, mask_token_index, :]
        top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
        for token in top_5_tokens:
            print(sequence.replace(self.tokenizer.mask_token, self.tokenizer.decode([token])))


class BertModel(KleczekModel):
    """Identical to KleczekModel (same checkpoint). Note that this name shadows
    transformers.BertModel until that class is imported again."""
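Because every wrapper exposes the same `predict(sequence)` method, the comparison in task 3 reduces to a double loop over models and sentences. The sketch below uses a hypothetical `CannedModel` stand-in (not part of the notebook) so that it runs without downloading any checkpoint; `KleczekModel()` or `AllegroModel()` could take its place, since they also print their predictions.

```python
class CannedModel:
    """Hypothetical stand-in with the same predict() interface as the wrappers above."""

    def __init__(self, answers):
        self.answers = answers  # words to put in place of [MASK]

    def predict(self, sequence):
        for word in self.answers:
            print(sequence.replace("[MASK]", word))


def run_all(models, sentences):
    """Call predict() of every model on every sentence, printing a header first.

    Returns the number of (model, sentence) pairs that were run.
    """
    runs = 0
    for name, model in models.items():
        for sentence in sentences:
            print(f"--- {name}: {sentence}")
            model.predict(sentence)
            runs += 1
    return runs


models = {"canned": CannedModel(["miasto", "lotnisko"])}
n = run_all(models, ["Warszawa to największe [MASK]."])
```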


In [35]:
model = KleczekModel()
model.predict("Adam Mickiewicz wielkim polskim [MASK] był.")

Some weights of the model checkpoint at dkleczek/bert-base-polish-cased-v1 were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


{'sequence': 'Adam Mickiewicz wielkim polskim pisarzem był.', 'score': 0.5391160249710083, 'token': 37120, 'token_str': 'p i s a r z e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim człowiekiem był.', 'score': 0.1168326586484909, 'token': 6810, 'token_str': 'c z ł o w i e k i e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim bohaterem był.', 'score': 0.06021444872021675, 'token': 17709, 'token_str': 'b o h a t e r e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim mistrzem był.', 'score': 0.05187029018998146, 'token': 14652, 'token_str': 'm i s t r z e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim artystą był.', 'score': 0.03178742155432701, 'token': 35680, 'token_str': 'a r t y s t ą'}


In [33]:
model = AllegroModel()
model.predict("Adam Mickiewicz wielkim polskim [MASK] był.")

Some weights of the model checkpoint at allegro/herbert-base-cased were not used when initializing BertForMaskedLM: ['cls.sso.sso_relationship.bias', 'cls.sso.sso_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Adam Mickiewicz wielkim polskim poetą był.
Adam Mickiewicz wielkim polskim pisarzem był.
Adam Mickiewicz wielkim polskim politykiem był.
Adam Mickiewicz wielkim polskim nie był.
Adam Mickiewicz wielkim polskim człowiekiem był.


In [40]:
model = BertModel()
model.predict("Adam Mickiewicz wielkim polskim [MASK] był.")



{'sequence': 'Adam Mickiewicz wielkim polskim pisarzem był.', 'score': 0.5391160249710083, 'token': 37120, 'token_str': 'p i s a r z e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim człowiekiem był.', 'score': 0.1168326586484909, 'token': 6810, 'token_str': 'c z ł o w i e k i e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim bohaterem był.', 'score': 0.06021444872021675, 'token': 17709, 'token_str': 'b o h a t e r e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim mistrzem był.', 'score': 0.05187029018998146, 'token': 14652, 'token_str': 'm i s t r z e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim artystą był.', 'score': 0.03178742155432701, 'token': 35680, 'token_str': 'a r t y s t ą'}




In [4]:
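def predict(sequence, top_k=5):
    """Return `sequence` with the mask replaced by each of the top_k most likely tokens.

    Relies on the global `tokenizer` and `model`; the model must have a masked-LM
    head (e.g. AutoModelForMaskedLM), otherwise its output has no `.logits`.
    """
    inputs = tokenizer(sequence, return_tensors="pt")
    # Position(s) of the mask token in the single input sequence
    mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
    with torch.no_grad():  # inference only
        token_logits = model(**inputs).logits

    mask_token_logits = token_logits[0, mask_token_index, :]
    top_tokens = torch.topk(mask_token_logits, top_k, dim=1).indices[0].tolist()
    return [sequence.replace(tokenizer.mask_token, tokenizer.decode([token]))
            for token in top_tokens]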
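`predict` ranks candidates by the raw logits at the mask position, while the `score` values reported by the `fill-mask` pipeline in other cells are softmax probabilities over the vocabulary at that position. Since softmax is monotonic, both give the same ranking. The pure-Python sketch below shows the two steps on a made-up five-word vocabulary with invented logits (assumptions for illustration only); `torch.softmax` and `torch.topk` do the same on real tensors.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Indices of the k largest values, best first (like torch.topk(...).indices)."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


# Toy vocabulary and logits at the [MASK] position (invented values)
vocab = ["miasto", "lotnisko", "centrum", "miasta", "rzeka"]
logits = [4.0, 2.5, 2.0, 1.0, -1.0]

probs = softmax(logits)
for i in top_k(probs, 3):
    print(f"{vocab[i]:<10} {probs[i]:.3f}")
```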

In [16]:
model = BertForMaskedLM.from_pretrained("dkleczek/bert-base-polish-cased-v1")
tokenizer = BertTokenizer.from_pretrained("dkleczek/bert-base-polish-cased-v1")
nlp = pipeline('fill-mask', model=model, tokenizer=tokenizer)
for pred in nlp(f"Adam Mickiewicz wielkim polskim {nlp.tokenizer.mask_token} był."):
    print(pred)




{'sequence': 'Adam Mickiewicz wielkim polskim pisarzem był.', 'score': 0.5391160249710083, 'token': 37120, 'token_str': 'p i s a r z e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim człowiekiem był.', 'score': 0.1168326586484909, 'token': 6810, 'token_str': 'c z ł o w i e k i e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim bohaterem był.', 'score': 0.06021444872021675, 'token': 17709, 'token_str': 'b o h a t e r e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim mistrzem był.', 'score': 0.05187029018998146, 'token': 14652, 'token_str': 'm i s t r z e m'}
{'sequence': 'Adam Mickiewicz wielkim polskim artystą był.', 'score': 0.03178742155432701, 'token': 35680, 'token_str': 'a r t y s t ą'}


In [2]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='Geotrend/bert-base-pl-cased',tokenizer='Geotrend/bert-base-pl-cased')
unmasker("Adam Mickiewicz wielkim polskim [MASK] był.")

[{'sequence': 'Adam Mickiewicz wielkim polskim polskim był.',
  'score': 0.22181102633476257,
  'token': 17037,
  'token_str': 'polskim'},
 {'sequence': 'Adam Mickiewicz wielkim polskimiem był.',
  'score': 0.21510043740272522,
  'token': 2231,
  'token_str': '##iem'},
 {'sequence': 'Adam Mickiewicz wielkim polskim autorem był.',
  'score': 0.07783893495798111,
  'token': 13238,
  'token_str': 'autorem'},
 {'sequence': 'Adam Mickiewicz wielkim polskim synem był.',
  'score': 0.023916326463222504,
  'token': 13328,
  'token_str': 'synem'},
 {'sequence': 'Adam Mickiewicz wielkim polskim Polaków był.',
  'score': 0.017501063644886017,
  'token': 20153,
  'token_str': 'Polaków'}]
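Two of Geotrend's five candidates are not useful whole words: `##iem` is a WordPiece continuation piece that gets glued onto the previous word (giving *polskimiem*), and *polskim* merely repeats the preceding word. A small post-filter, sketched below under the assumption that continuation pieces always start with `##` (the WordPiece convention), keeps only whole-word candidates; `whole_word_predictions` is a hypothetical helper, and the scores of the remaining candidates are not renormalised.

```python
def whole_word_predictions(predictions):
    """Drop fill-mask candidates whose token is a WordPiece continuation ("##...")."""
    return [p for p in predictions if not p["token_str"].startswith("##")]


# Abbreviated copy of the Geotrend output above (token_str and rounded score only)
geotrend = [
    {"token_str": "polskim", "score": 0.2218},
    {"token_str": "##iem", "score": 0.2151},
    {"token_str": "autorem", "score": 0.0778},
    {"token_str": "synem", "score": 0.0239},
    {"token_str": "Polaków", "score": 0.0175},
]

for p in whole_word_predictions(geotrend):
    print(p["token_str"], p["score"])
```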

In [37]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='allegro/herbert-base-cased',tokenizer="allegro/herbert-base-cased")
unmasker("Adam Mickiewicz wielkim polskim [MASK] był.")  # fails: HerBERT's mask token is <mask>, not [MASK]



PipelineException: No mask_token (<mask>) found on the input
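The exception arises because HerBERT's tokenizer uses `<mask>` as its mask token, while the sentence contains BERT's `[MASK]`. One way to keep the sentences model-agnostic is to write them with a generic placeholder and substitute each checkpoint's mask token just before the call. In the sketch, the mask tokens are the ones observed in this notebook's outputs; real code would read them from `tokenizer.mask_token` instead of hard-coding them. `MASK_TOKENS` and `localize` are hypothetical names.

```python
# Mask tokens as observed in this notebook (read tokenizer.mask_token in real code)
MASK_TOKENS = {
    "dkleczek/bert-base-polish-cased-v1": "[MASK]",
    "Geotrend/bert-base-pl-cased": "[MASK]",
    "allegro/herbert-base-cased": "<mask>",
}

PLACEHOLDER = "[MASK]"


def localize(sentence, mask_token):
    """Replace the generic placeholder with a model-specific mask token."""
    if PLACEHOLDER not in sentence:
        raise ValueError(f"no {PLACEHOLDER} in {sentence!r}")
    return sentence.replace(PLACEHOLDER, mask_token)


sentence = "Adam Mickiewicz wielkim polskim [MASK] był."
for checkpoint, mask in MASK_TOKENS.items():
    print(f"{checkpoint}: {localize(sentence, mask)}")
```

With a real pipeline the call would look like `unmasker(localize(sentence, unmasker.tokenizer.mask_token))`.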

In [21]:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-pl-cased")
# AutoModel loads the bare encoder without the masked-LM head, so its output
# has no .logits (see the AttributeError below); AutoModelForMaskedLM has one.
model = AutoModel.from_pretrained("Geotrend/bert-base-pl-cased")

sequence = f"Adam Mickiewicz wielkim polskim {tokenizer.mask_token} był."
inputs = tokenizer(sequence, return_tensors="pt")
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

token_logits = model(**inputs).logits

mask_token_logits = token_logits[0, mask_token_index, :]
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))


Some weights of the model checkpoint at Geotrend/bert-base-pl-cased were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertModel were not initialized from the model checkpoint at Geotrend/bert-base-pl-cased and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.

AttributeError: 'BaseModelOutputWithPoolingAndCrossAttentions' object has no attribute 'logits'
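`AutoModel` loads only the encoder, whose output holds hidden states (one vector per token) but no `logits`; the prediction head that maps each hidden vector to one score per vocabulary entry is attached only by classes such as `AutoModelForMaskedLM` (the warning above lists the `cls.predictions.*` weights that were dropped). The pure-Python sketch below illustrates just the final linear projection with an invented 3-dimensional hidden state and a 4-word vocabulary; BERT's real head also applies a dense layer, an activation and layer normalisation before this step.

```python
def lm_head(hidden, weights, bias):
    """Project one hidden vector onto the vocabulary: logits[v] = w_v . h + b_v."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, bias)]


# Invented numbers: hidden size 3, vocabulary of 4 words
vocab = ["poetą", "pisarzem", "politykiem", "człowiekiem"]
hidden_at_mask = [0.5, -1.0, 2.0]  # what a bare encoder returns at the mask position
weights = [[1.0, 0.0, 1.0],        # one row per vocabulary entry
           [0.0, -1.0, 0.5],
           [0.2, 0.2, 0.2],
           [-1.0, 0.0, 0.0]]
bias = [0.0, 0.1, 0.0, 0.0]

logits = lm_head(hidden_at_mask, weights, bias)
best = max(range(len(vocab)), key=lambda v: logits[v])
print(logits, vocab[best])
```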

In [26]:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('Geotrend/bert-base-pl-cased')
model = BertModel.from_pretrained("Geotrend/bert-base-pl-cased")
text = "Replace me by any [MASK] text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
# BertModel returns hidden states only (feature extraction), not token predictions
output = model(**encoded_input)


In [23]:
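# TypeError below: the model output must be indexed first (model(**inputs)[0] is the
# last hidden state), and even then a bare encoder yields hidden vectors, not vocabulary logits
model(**inputs)[0, mask_token_index, :]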

TypeError: tuple indices must be integers or slices, not tuple

In [19]:
predict(f"Warszawa to największe {tokenizer.mask_token}.")


AttributeError: 'BaseModelOutput' object has no attribute 'logits'

In [25]:
# The plT5 tokenizer requires the sentencepiece package (see the ValueError below)
tokenizer = AutoTokenizer.from_pretrained("allegro/plt5-base")
model = AutoModel.from_pretrained("allegro/plt5-base")


ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
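The plT5 tokenizer needs the `sentencepiece` package, which is missing in this environment; it can be installed with `pip install sentencepiece`, typically followed by a kernel restart. A quick pre-flight check, sketched below with the standard library's `importlib.util.find_spec`, lists the missing packages before any model is downloaded; `missing_packages` is a hypothetical helper.

```python
import importlib.util


def missing_packages(names):
    """Return the (top-level) package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_packages(["torch", "transformers", "sentencepiece"]))
```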

In [9]:
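# NameError below: BertForMaskedLM and BertTokenizer must be imported first (e.g. after a kernel restart)
model = BertForMaskedLM.from_pretrained("dkleczek/bert-base-polish-uncased-v1")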
tokenizer = BertTokenizer.from_pretrained("dkleczek/bert-base-polish-uncased-v1")


NameError: name 'BertForMaskedLM' is not defined



In [13]:
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
# AutoModel again loads no masked-LM head, hence the AttributeError below
model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
predict(f"Warszawa to największe {tokenizer.mask_token}.")


AttributeError: 'BaseModelOutputWithPoolingAndCrossAttentions' object has no attribute 'logits'

In [15]:
from transformers import XLMTokenizer, RobertaModel

tokenizer = XLMTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")

encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd – to jasne.", return_tensors='pt')
outputs = model(encoded_input)


In [14]:
predict(f"Warszawa to największe {tokenizer.mask_token}.")


AttributeError: 'BaseModelOutputWithPoolingAndCrossAttentions' object has no attribute 'logits'

In [None]:
class LanguageModel:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

In [8]:
from transformers import AutoTokenizer, AutoModel

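# plT5 is an encoder-decoder (T5) model; its tokenizer defines no mask token (see the warning below)
tokenizer = AutoTokenizer.from_pretrained("allegro/plt5-base")
model = AutoModel.from_pretrained("allegro/plt5-base")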

predict(f"Warszawa to największe {tokenizer.mask_token}.")
predict(f"Te zabawki należą do {tokenizer.mask_token}.")
predict(f"Policjant przygląda się {tokenizer.mask_token}.")
predict(f"Na środku skrzyżowania widać {tokenizer.mask_token}.")
predict(f"Właściciel samochodu widział złodzieja z {tokenizer.mask_token}.")
predict(f"Prezydent z premierem rozmawiali wczoraj o {tokenizer.mask_token}.")
predict(f"Witaj drogi {tokenizer.mask_token}.")


HBox(children=(HTML(value='Downloading'), FloatProgress(value=0.0, max=1100541913.0), HTML(value='')))




Some weights of the model checkpoint at allegro/plt5-base were not used when initializing T5Model: ['lm_head.weight']
- This IS expected if you are initializing T5Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using mask_token, but it is not set yet.


TypeError: where(): argument 'condition' (position 1) must be Tensor, not bool
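The warning `Using mask_token, but it is not set yet` explains the `TypeError`: the tokenizer defines no mask token, so `tokenizer.mask_token` is `None`, the f-string silently produces `"Warszawa to największe None."`, and comparing the input ids with the missing `mask_token_id` yields a plain `False` instead of a tensor. A guard such as the hypothetical `require_mask_token` below fails early with a clearer message; the `SimpleNamespace` objects merely stand in for real tokenizers.

```python
from types import SimpleNamespace


def require_mask_token(tokenizer):
    """Return the tokenizer's mask token, or raise if it has none."""
    mask = getattr(tokenizer, "mask_token", None)
    if mask is None:
        raise ValueError("this tokenizer has no mask token; it cannot be used for fill-mask")
    return mask


bert_like = SimpleNamespace(mask_token="[MASK]")  # stand-in for a BERT tokenizer
t5_like = SimpleNamespace(mask_token=None)        # stand-in for a tokenizer without a mask token

print(f"Warszawa to największe {require_mask_token(bert_like)}.")
try:
    require_mask_token(t5_like)
except ValueError as err:
    print("skipped:", err)
```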

In [None]:
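from transformers import AutoModel, AutoTokenizer

# Helsinki-NLP/opus-mt-pl-en is a Polish-to-English translation model (MarianMT),
# not a masked language model
model = AutoModel.from_pretrained("Helsinki-NLP/opus-mt-pl-en")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-pl-en")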



In [7]:
predict(f"Warszawa to największe {tokenizer.mask_token}.")
predict(f"Te zabawki należą do {tokenizer.mask_token}.")
predict(f"Policjant przygląda się {tokenizer.mask_token}.")
predict(f"Na środku skrzyżowania widać {tokenizer.mask_token}.")
predict(f"Właściciel samochodu widział złodzieja z {tokenizer.mask_token}.")
predict(f"Prezydent z premierem rozmawiali wczoraj o {tokenizer.mask_token}.")
predict(f"Witaj drogi {tokenizer.mask_token}.")

Using mask_token, but it is not set yet.


TypeError: where(): argument 'condition' (position 1) must be Tensor, not bool

   
4. Check the model predictions for the following sentences (using each model):


    i. Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie `[MASK]`.
    ii. Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie `[MASK]`.
   

In [19]:
predict(f"Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie {tokenizer.mask_token}.")
predict(f"Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie {tokenizer.mask_token}.")


Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie poddał.
Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie zdziwił.
Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie dowiedział.
Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie zastanawiał.
Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie przyznał.
Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie dowiedziała.
Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie przyznała.
Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie bała.
Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie zmieniła.
Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie zgodziła.
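The two sentences differ only in the speaker's gender (*wiedział* vs *wiedziała*), so a model that tracks this long-distance dependency should complete them with masculine and feminine past-tense forms respectively. The outputs above can be checked mechanically with a rough heuristic assumed here for illustration: masculine singular past forms end in *-ł* and feminine ones in *-ła*. This holds for the regular forms seen here but is not a morphological analyser; `agreement_rate` is a hypothetical helper, and it assumes the filled word is the last word of the sentence.

```python
def last_word(sentence):
    """Last word of a sentence, without the final punctuation."""
    return sentence.rstrip(".!?").split()[-1]


def agreement_rate(sentences, ending):
    """Fraction of sentences whose last (filled-in) word ends with `ending`."""
    hits = sum(last_word(s).endswith(ending) for s in sentences)
    return hits / len(sentences)


# Fillers copied from the outputs above; the rest of each sentence is identical
prefix_m = "Gdybym wiedział wtedy dokładnie to, co wiem teraz, to bym się nie "
prefix_f = "Gdybym wiedziała wtedy dokładnie to, co wiem teraz, to bym się nie "
masculine = [prefix_m + w + "." for w in ["poddał", "zdziwił", "dowiedział", "zastanawiał", "przyznał"]]
feminine = [prefix_f + w + "." for w in ["dowiedziała", "przyznała", "bała", "zmieniła", "zgodziła"]]

print("masculine -ł :", agreement_rate(masculine, "ł"))
print("feminine -ła:", agreement_rate(feminine, "ła"))
```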


5. Check the model predictions for the following sentences:


    i. `[MASK]` wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.
    ii. W wakacje odwiedziłem `[MASK]`, który jest stolicą Islandii.
    iii. Informatyka na `[MASK]` należy do najlepszych kierunków w Polsce.
   

In [20]:
predict(f"{tokenizer.mask_token} wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.")
predict(f"W wakacje odwiedziłem {tokenizer.mask_token}, który jest stolicą Islandii.")
predict(f"Informatyka na {tokenizer.mask_token} należy do najlepszych kierunków w Polsce.")



Woda wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.
Słońce wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.
Ziemia wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.
Następnie wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.
Ciało wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.
W wakacje odwiedziłem Kraków, który jest stolicą Islandii.
W wakacje odwiedziłem Oslo, który jest stolicą Islandii.
W wakacje odwiedziłem Londyn, który jest stolicą Islandii.
W wakacje odwiedziłem Gdańsk, który jest stolicą Islandii.
W wakacje odwiedziłem Toruń, który jest stolicą Islandii.
Informatyka na pewno należy do najlepszych kierunków w Polsce.
Informatyka na AGH należy do najlepszych kierunków w Polsce.
Informatyka na UW należy do najlepszych kierunków w Polsce.
Informatyka na studiach należy do najlepszych kierunków w Polsce.
Informatyka na UMK należy do najlepszy
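For the world-knowledge sentences, one simple measure is whether the expected answer appears among the top-k candidates (hit@k) and at what rank. In this sketch the expected answers (*Woda*, *Reykjavík*) are supplied by hand as an assumption; the third sentence has no single correct answer and is left out. The candidate lists are copied from the output above, and `rank_of`/`hit_at_k` are hypothetical helpers.

```python
def rank_of(answer, candidates):
    """1-based rank of `answer` among the candidates (case-insensitive), or None."""
    for rank, word in enumerate(candidates, start=1):
        if word.casefold() == answer.casefold():
            return rank
    return None


def hit_at_k(answer, candidates, k):
    """True if `answer` is among the first k candidates."""
    rank = rank_of(answer, candidates)
    return rank is not None and rank <= k


boiling = ["Woda", "Słońce", "Ziemia", "Następnie", "Ciało"]
capital = ["Kraków", "Oslo", "Londyn", "Gdańsk", "Toruń"]

print(rank_of("Woda", boiling), hit_at_k("Woda", boiling, 1))
print(rank_of("Reykjavík", capital), hit_at_k("Reykjavík", capital, 5))
```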

6. If you want to use generative models such as PapuGaPT2 (a causal, GPT-2-style model) or plT5 (an encoder-decoder T5 model), you should rephrase the last three examples, since these
   models generate text rather than score a single `[MASK]` token.
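A causal model such as PapuGaPT2 sees only the words to the left of the position it predicts, so simply cutting a masked sentence at `[MASK]` can discard the clue: in the Iceland sentence, *stolicą Islandii* comes after the gap, and in the boiling-point sentence nothing is left at all. The sketch below shows this naive truncation next to hand-written prompts that move the clue before the gap; `left_context` is a hypothetical helper and the rewrites are illustrative assumptions, not part of the original exercise.

```python
def left_context(sentence, placeholder="[MASK]"):
    """Naive causal prompt: keep only the text before the placeholder."""
    before, sep, _ = sentence.partition(placeholder)
    if not sep:
        raise ValueError(f"no {placeholder} in {sentence!r}")
    return before.rstrip()


masked = [
    "[MASK] wrze w temperaturze 100 stopni, a zamarza w temperaturze 0 stopni Celsjusza.",
    "W wakacje odwiedziłem [MASK], który jest stolicą Islandii.",
    "Informatyka na [MASK] należy do najlepszych kierunków w Polsce.",
]

# Illustrative hand-written rewrites that put the clue before the gap (hypothetical)
rewritten = [
    "W temperaturze 100 stopni Celsjusza wrze",
    "Stolicą Islandii jest",
    "Informatykę w Polsce najlepiej studiować na",
]

for original, prompt in zip(masked, rewritten):
    print(repr(left_context(original)), "->", repr(prompt))
```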
   

7. Answer the following questions:
   
   
    i. Which of the models produced the best results?
    ii. Was any of the models able to capture Polish grammar?
    iii. Was any of the models able to capture long-distance relationships between the words?
    iv. Was any of the models able to capture world knowledge?
    v. What are the most striking errors made by the models?