# Text classification
The task concentrates on content-based text classification.

In [None]:
!pip install datasets
!pip install transformers
!pip install fasttext
!pip install lime


## Get acquainted with the data of the Polish Cyberbullying detection dataset. Pay special attention to the distribution of the positive and negative examples in the first task as well as distribution of the classes in the second task.

In [None]:
from datasets import load_dataset, load_metric
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score, precision_score, accuracy_score, recall_score
from sklearn.naive_bayes import GaussianNB
from transformers import pipeline, AutoTokenizer, Trainer, DataCollatorWithPadding, AutoModelForSequenceClassification, TrainingArguments
import fasttext
import numpy as np
import pandas as pd
from lime.lime_text import LimeTextExplainer

In [None]:
dataset_task1 = load_dataset("poleval2019_cyberbullying", "task01")
dataset_task2 = load_dataset("poleval2019_cyberbullying", "task02")


In [None]:
dataset_task1_train = dataset_task1['train']
dataset_task1_test = dataset_task1['test']
dataset_task2_train = dataset_task2['train']
dataset_task2_test = dataset_task2['test']
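The heading above asks for the class distribution, which the notebook never prints. A minimal sketch of how it could be computed follows; `label_distribution` is a hypothetical helper, and the toy list stands in for `dataset_task1_train['label']` or `dataset_task2_train['label']`.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: (count, share)} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (count, count / total) for label, count in sorted(counts.items())}


# Toy labels used for illustration only; in the notebook one would pass
# dataset_task1_train['label'] instead (an assumption, not run here).
toy_labels = [0, 0, 0, 1, 0, 0, 1, 0]
for label, (count, share) in label_distribution(toy_labels).items():
    print(f"label {label}: {count} examples ({share:.1%})")
```

A strongly skewed distribution would mean that accuracy alone says little about how well the minority class is detected.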

In [None]:
dataset_task1_train[:10]

{'label': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
 'text': ['Dla mnie faworytem do tytułu będzie Cracovia. Zobaczymy, czy typ się sprawdzi.',
  '@anonymized_account @anonymized_account Brawo ty Daria kibic ma być na dobre i złe',
  '@anonymized_account @anonymized_account Super, polski premier składa kwiaty na grobach kolaborantów. Ale doczekaliśmy czasów.',
  '@anonymized_account @anonymized_account Musi. Innej drogi nie mamy.',
  'Odrzut natychmiastowy, kwaśna mina, mam problem',
  'Jaki on był fajny xdd pamiętam, że spóźniłam się na jego pierwsze zajęcia i to sporo i za karę kazał mi usiąść w pierwszej ławce XD',
  '@anonymized_account No nie ma u nas szczęścia 😉',
  '@anonymized_account Dawno kogoś tak wrednego nie widziałam xd',
  '@anonymized_account @anonymized_account Zaległości były, ale ważne czy były wezwania do zapłaty z których się klub nie wywiązał.',
  '@anonymized_account @anonymized_account @anonymized_account Gdzie jest @anonymized_account . Brudziński jesteś kłamcą i marnym 

In [None]:
dataset_task2_train[:10]

{'label': [0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
 'text': ['Dla mnie faworytem do tytułu będzie Cracovia. Zobaczymy, czy typ się sprawdzi.',
  '@anonymized_account @anonymized_account Brawo ty Daria kibic ma być na dobre i złe',
  '@anonymized_account @anonymized_account Super, polski premier składa kwiaty na grobach kolaborantów. Ale doczekaliśmy czasów.',
  '@anonymized_account @anonymized_account Musi. Innej drogi nie mamy.',
  'Odrzut natychmiastowy, kwaśna mina, mam problem',
  'Jaki on był fajny xdd pamiętam, że spóźniłam się na jego pierwsze zajęcia i to sporo i za karę kazał mi usiąść w pierwszej ławce XD',
  '@anonymized_account No nie ma u nas szczęścia 😉',
  '@anonymized_account Dawno kogoś tak wrednego nie widziałam xd',
  '@anonymized_account @anonymized_account Zaległości były, ale ważne czy były wezwania do zapłaty z których się klub nie wywiązał.',
  '@anonymized_account @anonymized_account @anonymized_account Gdzie jest @anonymized_account . Brudziński jesteś kłamcą i marnym 

## Train the following classifiers on the training sets (for the task 1 and the task 2)

### Bayesian classifier with TF * IDF weighting.

In [None]:
columns = ["classifier", "accuracy", "precision", "recall", "f1"]

def to_scores_df(model, scores):
    return pd.DataFrame(data=[[
        model,
        scores[columns[1]],
        scores[columns[2]],
        scores[columns[3]],
        scores[columns[4]],
    ]], columns=columns)

scores_task1 = pd.DataFrame(data=[], columns=columns)
scores_task2 = pd.DataFrame(data=[], columns=columns)



In [None]:
def tf_idf(train, test):    
    vectorizer = TfidfVectorizer()
    vectorizer.fit(train)
    return vectorizer.transform(train).toarray(), vectorizer.transform(test).toarray()

dataset_task1_train_tfidf, dataset_task1_test_tfidf = tf_idf(dataset_task1_train['text'], dataset_task1_test['text'])
dataset_task2_train_tfidf, dataset_task2_test_tfidf = tf_idf(dataset_task2_train['text'], dataset_task2_test['text'])
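To make the weighting concrete, here is a small pure-Python sketch of TF-IDF with a smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation, which is the scheme `TfidfVectorizer` uses with its default settings as far as I know. The function name `tfidf_vectors` and the whitespace tokenisation are simplifications introduced here; the real vectorizer tokenises with a regular expression, so exact values can differ.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """TF-IDF with smoothed IDF and L2 normalisation; tokens are lower-cased whitespace splits."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1 for term, freq in df.items()}

    vectors = []
    for tokens in tokenized:
        # Raw term count times IDF, then scale the vector to unit length.
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()} if norm else {})
    return vectors


docs = ["dobry mecz", "dobry kibic", "zły mecz"]
for vec in tfidf_vectors(docs):
    print({term: round(weight, 3) for term, weight in vec.items()})
```

Terms that occur in fewer documents receive a larger weight than terms shared by many documents.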

In [None]:
gnb1 = GaussianNB()
gnb1.fit(dataset_task1_train_tfidf, dataset_task1_train['label'])
gnb2 = GaussianNB()
gnb2.fit(dataset_task2_train_tfidf, dataset_task2_train['label'])

GaussianNB()

In [None]:
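def evaluate_task1(predicted, actual):
    # scikit-learn metrics expect (y_true, y_pred). Reversing the two leaves accuracy and F1
    # unchanged but swaps precision and recall. The scores printed in this notebook came from
    # a run with the arguments reversed, so their precision and recall are interchanged.
    return {"accuracy": accuracy_score(actual, predicted), "precision": precision_score(actual, predicted), "recall": recall_score(actual, predicted), "f1": f1_score(actual, predicted)}

In [None]:
def evaluate_task2(predicted, actual):
    # Same argument order as above; macro averaging gives every class equal weight.
    return {"accuracy": accuracy_score(actual, predicted), "precision": precision_score(actual, predicted, average='macro'), "recall": recall_score(actual, predicted, average='macro'), "f1": f1_score(actual, predicted, average='macro')}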

In [None]:
gnb1_scores = evaluate_task1(gnb1.predict(dataset_task1_test_tfidf), dataset_task1_test['label'])
gnb2_scores = evaluate_task2(gnb2.predict(dataset_task2_test_tfidf), dataset_task2_test['label'])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
scores_task1 = pd.concat([scores_task1, to_scores_df("GaussianNaiveBayes", gnb1_scores)])
scores_task2 = pd.concat([scores_task2, to_scores_df("GaussianNaiveBayes", gnb2_scores)])

In [None]:
gnb1_scores

{'accuracy': 0.782,
 'f1': 0.2684563758389261,
 'precision': 0.29850746268656714,
 'recall': 0.24390243902439024}

In [None]:
gnb2_scores

{'accuracy': 0.787,
 'f1': 0.3968305029876156,
 'precision': 0.4081828647301029,
 'recall': 0.40132515731936985}

### Fasttext text classifier

In [None]:
def to_fasttext_input(dataset, filename):
    with open(filename, "w") as f:
        for label, text in zip(dataset['label'], dataset['text']):
            f.write(f"__label__{label} {text}\n")

to_fasttext_input(dataset_task1_train, 'fasttext_train1.txt')
to_fasttext_input(dataset_task1_test, 'fasttext_test1.txt')
to_fasttext_input(dataset_task2_train, 'fasttext_train2.txt')
to_fasttext_input(dataset_task2_test, 'fasttext_test2.txt')
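fastText reads one example per line, so a text that contains a line break would be split into two malformed examples. The sketch below shows one way to guard against that; `to_fasttext_line` is a hypothetical helper, not part of the notebook or of fastText.

```python
def to_fasttext_line(label, text):
    """Format one example as a single '__label__<label> <text>' line."""
    # Collapse every run of whitespace, including embedded newlines, to one space,
    # so that each example occupies exactly one line of the training file.
    single_line = " ".join(text.split())
    return f"__label__{label} {single_line}"


print(to_fasttext_line(1, "pierwsza linia\ndruga linia"))
```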

In [None]:
fasttext_model1 = fasttext.train_supervised('fasttext_train1.txt')
fasttext_model2 = fasttext.train_supervised('fasttext_train2.txt')

In [None]:
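def fasttext_scores(result):
    # model.test() returns (number of examples, precision@1, recall@1); with exactly one
    # label per example, precision@1 equals accuracy.
    return { "accuracy": result[1], "precision": None, "recall": None, "f1": None }

fasttext_model1_scores = fasttext_scores(fasttext_model1.test('fasttext_test1.txt'))
fasttext_model2_scores = fasttext_scores(fasttext_model2.test('fasttext_test2.txt'))
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
scores_task1 = pd.concat([scores_task1, to_scores_df("fastText", fasttext_model1_scores)])
scores_task2 = pd.concat([scores_task2, to_scores_df("fastText", fasttext_model2_scores)])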

In [None]:
fasttext_model1_scores

{'accuracy': 0.873, 'f1': None, 'precision': None, 'recall': None}

In [None]:
fasttext_model2_scores

{'accuracy': 0.868, 'f1': None, 'precision': None, 'recall': None}
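Precision, recall and F1 are left as `None` for fastText because `model.test` only reports aggregate figures. One possible way to fill them in for the binary task is sketched below, under the assumption that `model.predict` accepts a list of strings and returns a list of label lists plus their probabilities; both helper names are hypothetical.

```python
def fasttext_predicted_ids(model, texts, prefix="__label__"):
    """Return the top-1 predicted class id (int) for each text."""
    # fastText rejects inputs that contain newlines, so flatten them first.
    cleaned = [" ".join(text.split()) for text in texts]
    labels, _probabilities = model.predict(cleaned)
    # Each entry of `labels` holds the top-k labels for one text; k defaults to 1.
    return [int(top[0][len(prefix):]) for top in labels]


def binary_precision_recall_f1(actual, predicted, positive=1):
    """Precision, recall and F1 of the positive class, computed from raw counts."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Possible use in the notebook (assumption, not executed here):
# predicted = fasttext_predicted_ids(fasttext_model1, dataset_task1_test['text'])
# binary_precision_recall_f1(dataset_task1_test['label'], predicted)
```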

### Transformer classifier (take into account that a number of experiments should be performed for this model).

In [None]:
def fine_tuned(model_name, dataset, expected_labels):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenized_dt = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=expected_labels)
    
    training_args = TrainingArguments(
        output_dir='./results',
        learning_rate=0.00002,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=2,
        weight_decay=0.01,
        evaluation_strategy="epoch"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dt["train"],
        eval_dataset=tokenized_dt["test"],
        tokenizer=tokenizer,
        data_collator=data_collator
    )

    trainer.train()
    return model
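The heading asks for a number of experiments, while `fine_tuned` hard-codes a single configuration. A rough sketch of how a small grid of configurations could be enumerated is given below. The search space values are illustrative assumptions, and `experiment_configs` is a hypothetical helper; the keys match the `TrainingArguments` parameters already used above.

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not tuned.
search_space = {
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "num_train_epochs": [2, 3],
    "per_device_train_batch_size": [16, 32],
}


def experiment_configs(space):
    """Yield one dict of keyword arguments per combination in the grid."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(experiment_configs(search_space))
print(len(configs), "configurations; first:", configs[0])
```

Each dictionary could be unpacked into `TrainingArguments` if `fine_tuned` were extended to accept extra keyword arguments. Since the dataset ships only `train` and `test` splits, choosing among runs on a validation split carved out of `train` would keep the test scores unbiased.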

In [None]:
herbert_fine_tuned1 = fine_tuned("allegro/herbert-base-cased", dataset_task1, 2)
herbert_fine_tuned1.save_pretrained("herbert-base-cased-bullying")
# herbert_fine_tuned1 = AutoModelForSequenceClassification.from_pretrained("herbert-base-cased-bullying", local_files_only=True)

Some weights of the model checkpoint at allegro/herbert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.sso.sso_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.sso.sso_relationship.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification 

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.2336 | 0.327281 |
| 2 | 0.1729 | 0.324736 |



In [None]:
herbert_fine_tuned2 = fine_tuned("allegro/herbert-base-cased", dataset_task2, 3)
herbert_fine_tuned2.save_pretrained("herbert-base-cased-bullying_2")
# herbert_fine_tuned2 = AutoModelForSequenceClassification.from_pretrained("herbert-base-cased-bullying_2", local_files_only=True)

loading configuration file https://huggingface.co/allegro/herbert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/d24c58747dbe6b61ed3e1eb5d488dfec9332ed13dd3f8983588f30d96f6f1bde.193ae07fbea6bb9ac46f854cd03094e486dfa4483e0596fd6a159dcfaef521a5
Model config BertConfig {
  "_name_or_path": "allegro/herbert-base-cased",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 514,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "tokenizer_class": "HerbertTokenizerFast",
  "transformers_version": "4.13.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 50000
}

loading file https://huggingface.co/allegr


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.2944 | 0.406826 |
| 2 | 0.2229 | 0.362272 |



In [None]:
def compute_metrics_singleclass(p):    
    pred, labels = p
    pred = np.argmax(pred, axis=1)
    return evaluate_task1(pred, labels)

def compute_metrics_multiclass(p):    
    pred, labels = p
    pred = np.argmax(pred, axis=1)
    return evaluate_task2(pred, labels)

def evaluate_transformers(model, dataset, tokenizer_name, compute_metrics):
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    tokenized_dt = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)
    trainer = Trainer(model=model,
                      eval_dataset=tokenized_dt["test"],
                      tokenizer=tokenizer,
                      compute_metrics=compute_metrics)

    return trainer.evaluate()

In [None]:
def transformers_to_scores(eval_res):
    return {"accuracy": eval_res["eval_accuracy"], "precision": eval_res["eval_precision"], "recall": eval_res["eval_recall"], "f1": eval_res["eval_f1"]} 

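# Task 1 is binary, so it is scored with the same binary metrics as the other task 1 classifiers.
# The task 1 scores recorded below came from a run that used compute_metrics_multiclass (macro averaging).
herbert_fine_tuned1_score = transformers_to_scores(
    evaluate_transformers(herbert_fine_tuned1, dataset_task1, "allegro/herbert-base-cased", compute_metrics_singleclass))
herbert_fine_tuned2_score = transformers_to_scores(
    evaluate_transformers(herbert_fine_tuned2, dataset_task2, "allegro/herbert-base-cased", compute_metrics_multiclass))

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
scores_task1 = pd.concat([scores_task1, to_scores_df("transformers_herbert-cased", herbert_fine_tuned1_score)])
scores_task2 = pd.concat([scores_task2, to_scores_df("transformers_herbert-cased", herbert_fine_tuned2_score)])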





In [None]:
herbert_fine_tuned1_score

{'accuracy': 0.906,
 'f1': 0.731707595529221,
 'precision': 0.6807934921236772,
 'recall': 0.8719858156028368}

In [None]:
herbert_fine_tuned2_score

{'accuracy': 0.901,
 'f1': 0.48652275347353124,
 'precision': 0.461752512518451,
 'recall': 0.5434682286119986}

## Compare the results of classification on the test set. Select the appropriate measures (from accuracy, F1, macro/micro F1, MCC) to compare the results.

In [None]:
scores_task1

| | classifier | accuracy | precision | recall | f1 |
|---|---|---|---|---|---|
| 0 | GaussianNaiveBayes | 0.782 | 0.298507 | 0.243902 | 0.268456 |
| 0 | fastText | 0.873 | | | |
| 0 | transformers_herbert-cased | 0.906 | 0.680793 | 0.871986 | 0.731708 |


In [None]:
scores_task2

| | classifier | accuracy | precision | recall | f1 |
|---|---|---|---|---|---|
| 0 | GaussianNaiveBayes | 0.787 | 0.408183 | 0.401325 | 0.396831 |
| 0 | fastText | 0.868 | | | |
| 0 | transformers_herbert-cased | 0.901 | 0.461753 | 0.543468 | 0.486523 |
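MCC is among the measures named in the heading of this section but is not computed in the notebook. The sketch below shows the binary form, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), in plain Python; `binary_mcc` is a hypothetical helper and the toy labels are made up for illustration.

```python
import math


def binary_mcc(actual, predicted, positive=1):
    """Matthews correlation coefficient for a binary task, from confusion counts."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is taken as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# A classifier that always answers "negative" on imbalanced toy data:
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(binary_mcc(actual, always_negative))
```

On this toy data the always-negative classifier is right 8 times out of 10, yet its MCC is 0, which is why MCC or F1 is usually preferred over accuracy when classes are imbalanced.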


## Select 1 TP, 1 TN, 1 FP and 1 FN from your predictions (for the best classifier) and compare the decisions of each classifier on these examples using LIME.

In [None]:
def predictions(model, dataset):
    lime_tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
    tokenized = [lime_tokenizer(x, truncation=True) for x in dataset]
    trainer = Trainer(model=model, tokenizer=lime_tokenizer)
    return trainer.predict(tokenized)

In [None]:
pred = predictions(herbert_fine_tuned1, dataset_task1['test']['text'])[0]


In [None]:
pred_labels = np.argmax(pred, axis=1)
real_labels = dataset_task1["test"]["label"]

zipped = list(zip(pred_labels, real_labels))
tp = zipped.index((1, 1)) 
tn = zipped.index((0, 0))
fp = zipped.index((1, 0))
fn = zipped.index((0, 1))
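`list.index` raises `ValueError` when a pair never occurs, for example if a classifier produces no false positives. A more defensive variant could look like the sketch below; `first_index_of_each_outcome` is a hypothetical helper and assumes a binary task.

```python
def first_index_of_each_outcome(predicted, actual, positive=1):
    """Map 'tp', 'tn', 'fp', 'fn' to the first matching index, or None if absent."""
    found = {"tp": None, "tn": None, "fp": None, "fn": None}
    for i, (p, a) in enumerate(zip(predicted, actual)):
        if p == positive:
            key = "tp" if a == positive else "fp"
        else:
            key = "fn" if a == positive else "tn"
        if found[key] is None:
            found[key] = i
    return found


print(first_index_of_each_outcome([0, 1, 1, 0], [0, 1, 0, 0]))
# {'tp': 1, 'tn': 0, 'fp': 2, 'fn': None}
```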



In [None]:
dataset_task1["test"][tp]

{'label': 1,
 'text': '@anonymized_account Dokładnie, pisdzielstwo nie ma prawa rozpierdalać systemu,  sądownictwa nie mając większości'}

In [None]:
dataset_task1["test"][tn]

{'label': 0,
 'text': '@anonymized_account Spoko, jak im Duda z Morawieckim zamówią po pięć piw to wszystko będzie ok.'}

In [None]:
dataset_task1["test"][fp]

{'label': 0,
 'text': '@anonymized_account No czy Prezes nie miał racji, mówiąc,ze to są zdradzieckie mordy? No czy nie miał racji?😁😁'}

In [None]:
dataset_task1["test"][fn]

{'label': 1, 'text': '@anonymized_account Tej szmaty się nie komentuje'}

In [None]:
def explain(text):
    exp = LimeTextExplainer(class_names=["neutral", "bullying"]).explain_instance(text, lambda x: predictions(herbert_fine_tuned1, x)[0], num_features=10)
    return exp.as_list()
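LIME's `classifier_fn` is documented as returning class probabilities, whereas `predictions(...)[0]` yields raw logits. A softmax over each row converts one into the other; the pure-Python sketch below illustrates the conversion, with `softmax_rows` being a hypothetical helper.

```python
import math


def softmax_rows(logit_rows):
    """Turn each row of logits into probabilities that sum to 1."""
    probabilities = []
    for row in logit_rows:
        peak = max(row)  # subtracting the maximum keeps exp() numerically stable
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        probabilities.append([e / total for e in exps])
    return probabilities


print(softmax_rows([[2.0, 0.0], [0.0, 0.0]]))
```

In the notebook, the function passed to `explain_instance` could then be something like `lambda x: np.array(softmax_rows(predictions(herbert_fine_tuned1, x)[0]))`; this is an untested suggestion, and the LIME weights would change scale compared with the logit-based values printed in this notebook.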

In [None]:
tp_explained = explain(dataset_task1["test"][tp]["text"])
tn_explained = explain(dataset_task1["test"][tn]["text"])
fp_explained = explain(dataset_task1["test"][fp]["text"])
fn_explained = explain(dataset_task1["test"][fn]["text"])


In [None]:
print("Explain true positive", tp_explained)
print("Explain true negative", tn_explained)
print("Explain false positive", fp_explained)
print("Explain false negative", fn_explained)

Explain true positive [('pisdzielstwo', 3.3089579642559652), ('anonymized_account', 0.43402883269232967), ('rozpierdalać', 0.2353210530465218), ('nie', -0.21302804974005496), ('ma', -0.10788851315139412), ('Dokładnie', 0.05569568488557857), ('prawa', -0.0532865376809127), ('sądownictwa', -0.048188164659379217), ('większości', -0.04407593021337539), ('mając', -0.04204153949482299)]
Explain true negative [('Morawieckim', 0.3050314492453677), ('ok', -0.2306309044029568), ('Duda', 0.20996179862586928), ('zamówią', -0.1749966622772737), ('z', -0.15572882958776801), ('pięć', -0.13891380811291776), ('anonymized_account', 0.1104716392355953), ('Spoko', -0.09657377868491167), ('im', 0.09253336596699333), ('jak', 0.06349602947537006)]
Explain false positive [('mordy', 2.551825317315504), ('zdradzieckie', 1.5583004140926886), ('anonymized_account', 0.29649282151969014), ('miał', -0.13201145328032393), ('Prezes', 0.10222059316627484), ('są', -0.08914741641539663), ('racji', 0.06962432530213741), (

## Answer the following questions:


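*   Which of the classifiers works best for task 1 and task 2? *The fine-tuned HerBERT transformer for both: it reaches the highest accuracy (0.906 on task 1, 0.901 on task 2) and the highest F1 among the classifiers for which F1 was computed.*

*   Did you achieve results comparable with the results of PolEval Task? *Better for the first task; see http://2019.poleval.pl/index.php/results/*

*   Did you achieve results comparable with the Klej leaderboard? *Better; see https://klejbenchmark.com/leaderboard/*

*   Describe strengths and weaknesses of each of the compared algorithms. *The transformer gives the best results; fastText is the fastest.*

*   Do you think comparison of raw performance values on a single task is enough to assess the value of a given algorithm/model? *Yes*

*   Did SHAP show that the models use valuable features/words when performing their decision? *The explanations above were produced with LIME rather than SHAP. In them, the offensive words carry the largest weights, for example "pisdzielstwo" (3.31) in the true positive and "mordy" (2.55) and "zdradzieckie" (1.56) in the false positive.*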

    
    
    
    
    
