<a href="https://colab.research.google.com/github/s34836/WUM/blob/main/Lab_15_Language_Models_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Language models
## Using pre-trained language models with `transformers`

### Example - adapting BERT for a classification task

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, DataCollatorWithPadding
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)

for name, param in model.base_model.named_parameters():
    param.requires_grad = False

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [2]:
from datasets import load_dataset
dataset = load_dataset('rotten_tomatoes')

def preprocess_function(examples):
    return tokenizer(examples['text'], truncation=True, padding=True)

encoded_dataset = dataset.map(preprocess_function, batched=True)

README.md: 0.00B [00:00, ?B/s]

train.parquet:   0%|          | 0.00/699k [00:00<?, ?B/s]

validation.parquet:   0%|          | 0.00/90.0k [00:00<?, ?B/s]

test.parquet:   0%|          | 0.00/92.2k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/8530 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1066 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1066 [00:00<?, ? examples/s]

Map:   0%|          | 0/8530 [00:00<?, ? examples/s]

Map:   0%|          | 0/1066 [00:00<?, ? examples/s]

Map:   0%|          | 0/1066 [00:00<?, ? examples/s]

In [3]:
!pip install evaluate

Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m4.6 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: evaluate
Successfully installed evaluate-0.4.6


In [4]:
from transformers import Trainer, TrainingArguments
import evaluate

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(-1)

    accuracy_metric = evaluate.load('accuracy')
    f1_metric = evaluate.load('f1')

    acc = accuracy_metric.compute(predictions=predictions, references=labels)
    f1 = f1_metric.compute(predictions=predictions, references=labels)

    return {'accuracy': acc['accuracy'], 'f1': f1['f1']}

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    eval_strategy='steps',
    report_to='none'
)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset['train'],
    eval_dataset=encoded_dataset['validation'],
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

In [5]:
trainer.train()

Step,Training Loss,Validation Loss,Accuracy,F1
500,0.6037,0.52713,0.754221,0.704955
1000,0.5144,0.490041,0.762664,0.786137
1500,0.4711,0.445086,0.806754,0.80566
2000,0.4826,0.43313,0.817073,0.815516
2500,0.4694,0.429522,0.80863,0.801942
3000,0.4494,0.428716,0.804878,0.796875


Downloading builder script: 0.00B [00:00, ?B/s]

Downloading builder script: 0.00B [00:00, ?B/s]

TrainOutput(global_step=3201, training_loss=0.4959597577157895, metrics={'train_runtime': 123.5902, 'train_samples_per_second': 207.055, 'train_steps_per_second': 25.9, 'total_flos': 501145384597464.0, 'train_loss': 0.4959597577157895, 'epoch': 3.0})

In [6]:
predictions = trainer.predict(encoded_dataset['validation'])
metrics = compute_metrics((predictions.predictions, predictions.label_ids))

print("Accuracy:", metrics['accuracy'])
print("F1 Score:", metrics['f1'])

Accuracy: 0.8133208255159474
F1 Score: 0.8095693779904306


## Pipelines

The transformers library provides complete pipelines for common NLP tasks, such as sentiment analysis, named entity recognition, zero-shot classification, question answering, text generation, summarization, ...

In [7]:
from transformers import pipeline

model = pipeline(task="sentiment-analysis")

model.predict(["I absolutely love this movie! It's fantastic and thrilling.", "I really hated this film. It was boring and too long."])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9998844861984253},
 {'label': 'NEGATIVE', 'score': 0.9997817873954773}]

In [8]:
from transformers import pipeline

model = pipeline(task="ner")

model.predict("Albert Einstein was born in Ulm, Germany in 1879.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


config.json:   0%|          | 0.00/998 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.33G [00:00<?, ?B/s]

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


tokenizer_config.json:   0%|          | 0.00/60.0 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

Device set to use cuda:0


[{'entity': 'I-PER',
  'score': np.float32(0.9992594),
  'index': 1,
  'word': 'Albert',
  'start': 0,
  'end': 6},
 {'entity': 'I-PER',
  'score': np.float32(0.99950576),
  'index': 2,
  'word': 'Einstein',
  'start': 7,
  'end': 15},
 {'entity': 'I-LOC',
  'score': np.float32(0.99692804),
  'index': 6,
  'word': 'U',
  'start': 28,
  'end': 29},
 {'entity': 'I-LOC',
  'score': np.float32(0.99383223),
  'index': 7,
  'word': '##lm',
  'start': 29,
  'end': 31},
 {'entity': 'I-LOC',
  'score': np.float32(0.99957985),
  'index': 9,
  'word': 'Germany',
  'start': 33,
  'end': 40}]

# Task
Compare the results of the model trained in the previous example to:
1. The outputs generated by a generic sentiment-analysis pipeline.
2. The outputs generated by a generic zero-shot classification pipeline.

In [11]:
from transformers import pipeline
import numpy as np
import evaluate

sa = pipeline("sentiment-analysis")  # domyślnie: distilbert-base-uncased-finetuned-sst-2-english

# Bierzemy z oryginalnego datasetu, nie z encoded_dataset
texts_raw = dataset["validation"]["text"]
labels = np.array(dataset["validation"]["label"])

# Bezpieczne czyszczenie wejścia
texts = [t if isinstance(t, str) else "" for t in texts_raw]

out = sa(texts, batch_size=32, truncation=True)

preds = np.array([1 if x["label"].upper() in ["POSITIVE", "LABEL_1"] else 0 for x in out])

acc_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")

acc = acc_metric.compute(predictions=preds, references=labels)["accuracy"]
f1 = f1_metric.compute(predictions=preds, references=labels)["f1"]

print("Generic sentiment-analysis pipeline")
print("Accuracy:", acc)
print("F1:", f1)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


Generic sentiment-analysis pipeline
Accuracy: 0.9052532833020638
F1: 0.9055191768007483


In [12]:
from transformers import pipeline
import numpy as np
import evaluate

zsc = pipeline("zero-shot-classification")  # domyślnie zwykle: facebook/bart-large-mnli

texts_raw = dataset["validation"]["text"]
labels = np.array(dataset["validation"]["label"])
texts = [t if isinstance(t, str) else "" for t in texts_raw]

candidate_labels = ["negative", "positive"]

# Zmaterializuj wynik (czasem pipeline zwraca iterator)
out = list(zsc(texts, candidate_labels=candidate_labels, batch_size=8, truncation=True))

preds = np.array([1 if o["labels"][0] == "positive" else 0 for o in out])

acc_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")

acc = acc_metric.compute(predictions=preds, references=labels)["accuracy"]
f1 = f1_metric.compute(predictions=preds, references=labels)["f1"]

print("Generic zero-shot classification pipeline (negative/positive)")
print("Accuracy:", acc)
print("F1:", f1)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


Generic zero-shot classification pipeline (negative/positive)
Accuracy: 0.8067542213883677
F1: 0.7863070539419087
