In [1]:
from transformers import pipeline

### Task 1, 2 - Download models

Hugging Face lets you filter models by task. Selecting the `Fill-Mask` filter narrowed the search to suitable models.

Selected models:
- [roberta](https://huggingface.co/FacebookAI/xlm-roberta-base)
- [distilbert](https://huggingface.co/distilbert/distilbert-base-multilingual-cased)
- [Polish longformer](https://huggingface.co/sdadas/polish-longformer-large-4096)

In [2]:
roberta = pipeline(task="fill-mask", model="FacebookAI/xlm-roberta-base")

Some weights of the model checkpoint at FacebookAI/xlm-roberta-base were not used when initializing XLMRobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [3]:
distilbert = pipeline(
    task="fill-mask", model="distilbert/distilbert-base-multilingual-cased"
)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [4]:
longformer = pipeline(task="fill-mask", model="sdadas/polish-longformer-large-4096")

Some weights of the model checkpoint at sdadas/polish-longformer-large-4096 were not used when initializing LongformerForMaskedLM: ['longformer.embeddings.position_ids']
- This IS expected if you are initializing LongformerForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
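
The warning repeated above says that a GPU is available but the pipelines were created without a `device` argument, so they run on CPU. Below is a minimal sketch of choosing a device, assuming PyTorch is the backend. `pick_device` is a hypothetical helper, not part of `transformers`; the `device` argument of `pipeline` is the only library feature used.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed: fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()

# Hypothetical usage, mirroring the cells above:
# roberta = pipeline(task="fill-mask", model="FacebookAI/xlm-roberta-base", device=device)
```

Passing the same `device` to all three pipelines should also silence the warning.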


### Task 3 - Check whether the models understand Polish cases

In [5]:
cases = [
    "mianownik",
    "dopełniacz",
    "celownik",
    "biernik",
    "narzędnik",
    "miejscownik",
    "wołacz",
]

queries_roberta = [
    "<mask> jest największym eksporterem wina.",
    "Bez <mask> nie damy rady dokończyć projektu",
    "Dałem <mask> dziś wolne. Bardzo się ucieszył.",
    "Zastąpiłem dziś <mask> na nocnej zmianie.",
    "Poszliśmy dziś z <mask> na plażę.",
    "Polska to najciekawszy kraj w <mask>",
    "Szanowny <mask>, chciałbym prosić Pana o przysługę.",
]

roberta

In [6]:
for query, case in zip(queries_roberta, cases):
    print(f"{case} : {roberta(query)[0]['token_str']}")

mianownik : Polska
dopełniacz : niego
celownik : mu
biernik : pracę
narzędnik : dziećmi
miejscownik : Europie
wołacz : Pan


The model gets only the last case, wołacz (vocative), wrong: it returns the nominative `Pan` where `Panie` is required.
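
Looking only at the top prediction hides how close the correct form was. The sketch below lists several candidates with their scores, using the `top_k` argument of the fill-mask pipeline. `top_tokens` is a hypothetical helper name, and rounding to three decimals is an arbitrary choice.

```python
def top_tokens(fill_mask, query, k=5):
    """Return the k most likely fillers for the mask as (token, score) pairs."""
    predictions = fill_mask(query, top_k=k)
    return [(p["token_str"], round(p["score"], 3)) for p in predictions]


# Hypothetical usage with the objects defined above:
# top_tokens(roberta, queries_roberta[-1])
```

If `Panie` appears among the top few candidates for the vocative query, the model has partly learned the form even though it ranks the nominative first.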

---

longformer

In [7]:
for query, case in zip(queries_roberta, cases):
    print(f"{case} : {longformer(query)[0]['token_str']}")

Input ids are automatically padded to be a multiple of `config.attention_window`: 512


mianownik : Polska
dopełniacz : Was
celownik : mu
biernik : kolegę
narzędnik : dziećmi
miejscownik : Europie
wołacz : Panie


Inference with the Polish longformer was visibly slower than with roberta, but it handles Polish cases more accurately: it made no mistakes, including the vocative `Panie`.

---

distilbert

In [8]:
queries_distilbert = [q.replace("<mask>", "[MASK]") for q in queries_roberta]

for query, case in zip(queries_distilbert, cases):
    print(f"{case} : {distilbert(query)[0]['token_str']}")

mianownik : Miasto
dopełniacz : tego
celownik : było
biernik : pracował
narzędnik : преко
miejscownik : Europie
wołacz : ##m


The model makes noticeable errors. In the 5th case it returns a Cyrillic token (`преко`) instead of a Polish word, and in the 7th case it returns a bare subword fragment (`##m`) rather than a whole word.
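
The queries had to be rewritten by hand because roberta and longformer expect `<mask>` while distilbert expects `[MASK]`. A small sketch of a model-agnostic alternative follows: write each query once with a neutral placeholder and substitute the mask token the tokenizer reports. `with_mask` and the `{mask}` placeholder are hypothetical names introduced here.

```python
def with_mask(template: str, mask_token: str, placeholder: str = "{mask}") -> str:
    """Insert a model-specific mask token into a query template."""
    return template.replace(placeholder, mask_token)


# Hypothetical usage: each pipeline exposes its tokenizer's mask token.
# query = with_mask("Poszliśmy dziś z {mask} na plażę.", distilbert.tokenizer.mask_token)
# distilbert(query)[0]["token_str"]
```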

### Task 4 - Long-range relationships

roberta

In [9]:
query = "Zaprosiła mnie do kina. Oboje jesteśmy introwertykami ale myślałem, że za jakiś czas, w końcu się przemogę i to ja zrobię pierwszy ruch. Nie spodziewałem się tego z <mask> strony."

roberta(query)[0]["token_str"]

'jej'

The model recognizes the subject's gender even across a long span of text.

---

longformer

In [10]:
longformer(query)[0]["token_str"]

'jej'

The result is identical to roberta's.

---

distilbert

In [11]:
distilbert(query.replace("<mask>", "[MASK]"))[0]["token_str"]

'jednej'

The model picks an incorrect word (`jednej`) that fits only the immediate local context.



---

### Task 5 - Real-world knowledge

roberta

In [12]:
query = "Pierwszy miesiąc w roku nazywa się <mask>."

roberta(query)[0]["token_str"]

'poniedziałek'

longformer

In [13]:
longformer(query)[0]["token_str"]

'styczeń'

distilbert

In [14]:
distilbert(query.replace("<mask>", "[MASK]"))[0]["token_str"]

'rok'

Only the Polish longformer predicted the word correctly.

The example below shows that roberta does have the required knowledge when prompted in English.

In [19]:
roberta("The first month of the year is called <mask>.")[0]["token_str"]

'January'

distilbert performs poorly in English as well:

In [21]:
distilbert("The first month of the year is called [MASK].")[0]["token_str"]

'Ґ'

### Task 6 - Zero-shot learning

In [15]:
queries_roberta = [
    "Analizując emocje w tekście 'Ten film był niesamowity. Naprawdę świetnie się bawiłem' możemy stwierdzić, że wypowiedź ta ma <mask> charakter",
    "Analizując emocje w tekście 'Ten film był okropny. Myślałem, że wyjdę w połowie' możemy stwierdzić, że wypowiedź ta ma <mask> charakter",
    "Analizując emocje w tekście 'Ten film był niezły. Myślę, że godny polecenia' możemy stwierdzić, że wypowiedź ta ma <mask> charakter",
    "Analizując emocje w tekście 'Ten film był straszny. Nigdy więcej go nie obejrzę' możemy stwierdzić, że wypowiedź ta ma <mask> charakter",
    "Analizując emocje w tekście 'Ten film zupełnie mi się nie podobał. Był tragiczny.' możemy stwierdzić, że wypowiedź ta ma <mask> charakter",
]

In [16]:
for query in queries_roberta:
    print(roberta(query)[0]["token_str"])

pozytywny
pozytywny
pozytywny
pozytywny
pozytywny


Zero-shot classification labels every review as positive, including the three clearly negative ones, so it produces many false positives.

In [17]:
for query in queries_roberta:
    print(roberta(query)[0]["token_str"])

pozytywny
pozytywny
pozytywny
pozytywny
pozytywny


Same as above. Note that this cell calls `roberta` again, so longformer is not actually tested here.

In [18]:
queries_distilbert = [q.replace("<mask>", "[MASK]") for q in queries_roberta]

for query in queries_distilbert:
    print(distilbert(query)[0]["token_str"])

돔
돔
돔
돔
돔


distilbert returns the same unrelated Korean token (`돔`) for every prompt, so it is unusable for this task.
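
One way to make the zero-shot comparison fairer is to restrict the prediction to the candidate labels instead of the whole vocabulary. The fill-mask pipeline accepts a `targets` argument for this. The sketch below is hedged: `classify` is a hypothetical helper, and the label `negatywny` is an assumption, since the notebook never observed it. A label that is not a single token in the model's vocabulary is reduced to its first sub-token, so the returned `token_str` may differ from the label passed in.

```python
def classify(fill_mask, prompt, labels):
    """Return the (token, score) pair the model ranks highest among `labels`."""
    predictions = fill_mask(prompt, targets=labels)
    best = max(predictions, key=lambda p: p["score"])
    return best["token_str"], best["score"]


# Hypothetical usage with the objects defined above:
# labels = ["pozytywny", "negatywny"]  # "negatywny" is an assumed label
# for query in queries_roberta:
#     print(classify(roberta, query, labels))
```

Whether this changes the all-positive result is not something the outputs above can tell us; it only removes unrelated tokens from the competition.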

### Task 7 and 8

1. Answer the following questions:
   1. Which of the models produced the best results? - The Polish longformer. The other two are general multilingual models, whereas it is reasonable to assume that the Polish longformer was trained specifically for Polish.
   1. Was any of the models able to capture Polish grammar? - Roberta and longformer both performed well, but only longformer made no mistakes.
   1. Was any of the models able to capture long-distance relationships between the words? - Roberta and longformer both captured the long-distance relationship correctly.
   1. Was any of the models able to capture world knowledge? - Only the Polish-specific model captured world knowledge in Polish. Roberta solved the same task when prompted in English.
   1. Was any of the models good at doing zero-shot classification? - None of them were. Roberta labeled every review as positive, and distilbert returned an unrelated token.
   1. What are the most striking errors made by the models? - The errors in the world-knowledge task. Roberta confused the first month of the year with the first day of the week (it answered `poniedziałek`, Monday). Distilbert answered `rok` (year), even though the word "year" already appears in the sentence.