***Fine-tuning***

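During pre-training, BERT learns two things:

- How words are used in sentences (masked language modelling, MLM)
- How sentences relate to each other in a larger document (next sentence prediction, NSP)

We can take what BERT has learned and fine-tune it on a specific task of our choosing.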

**Sequence Classification**

The architecture looks similar to the next sentence prediction task, but the input can be a single sentence.

We pass this sentence into the pre-trained BERT.

We ignore every token except the [CLS] token and add a feed-forward layer after the pooler, which we train to predict the classes of our specific task.
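As a rough illustration of that last step, the sketch below applies a linear layer and a softmax to a single vector standing in for the pooled [CLS] output. It is a toy in plain Python, not the library's implementation: the sizes, the random values, and the helper names `linear` and `softmax` are all assumptions made for the example.

```python
import math
import random


def linear(x, weight, bias):
    """Compute weight @ x + bias for a single input vector x."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
hidden_size, num_labels = 8, 3  # toy sizes; bert-base uses hidden_size=768

# Stand-in for the pooled [CLS] vector that BERT would produce.
cls_vector = [random.uniform(-1, 1) for _ in range(hidden_size)]

# Newly initialized classification head: the part that fine-tuning has to learn.
weight = [[random.uniform(-0.1, 0.1) for _ in range(hidden_size)]
          for _ in range(num_labels)]
bias = [0.0] * num_labels

logits = linear(cls_vector, weight, bias)   # one score per class
probs = softmax(logits)                     # class probabilities
predicted = max(range(num_labels), key=lambda i: probs[i])
```

Because the head starts from random weights, its predictions are meaningless until it has been trained on labelled examples for the task.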

**Token Classification**

Here we do care about the representation of every token: we pass each token's representation through a feed-forward layer that classifies it into one of however many labels we have.

An example would be named entity recognition

We are simply adding extra layers onto the pre-trained BERT model in order to perform the task we need
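The per-token idea can be sketched with hand-picked numbers: the same small linear head is applied to every token vector, and each token gets the label with the highest score. The two-dimensional vectors, the weights, and the label set below are hypothetical values chosen so the result is easy to follow; they do not come from a real model.

```python
def linear(x, weight, bias):
    """Compute weight @ x + bias for a single input vector x."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weight, bias)]


labels = ["O", "B-LOC", "I-LOC"]  # hypothetical label set

# Toy token representations (a real BERT gives 768 numbers per token).
tokens = ["i", "love", "san", "francisco"]
token_vectors = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# One shared head: a row of weights and a bias per label.
weight = [[-1.0, -1.0],  # O
          [2.0, 0.0],    # B-LOC
          [0.0, 2.0]]    # I-LOC
bias = [0.5, 0.0, 0.0]

predicted = []
for vec in token_vectors:
    scores = linear(vec, weight, bias)
    best = max(range(len(labels)), key=lambda i: scores[i])
    predicted.append(labels[best])

print(list(zip(tokens, predicted)))
```

The only difference from sequence classification is that the head runs once per token instead of once per input.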

**Question/Answering**

We pass in a question together with a piece of text (the context) that contains the answer.

The context can be long, and BERT is used to find the answer within it: for each token it predicts whether that token is the start or the end of the answer, so that the answer span can be extracted.
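One common way to turn per-token start and end scores into an answer is to pick the pair (start, end) with the highest combined score, with the end not before the start. The sketch below does this by brute force. The function name `best_span`, the length limit, and the logit values are made up for illustration and are not taken from any model or from the library's own post-processing.

```python
def best_span(start_logits, end_logits, max_answer_len=10):
    """Return (score, start, end) for the highest-scoring valid span."""
    best = None
    for s, s_score in enumerate(start_logits):
        # The end token may not come before the start token.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if best is None or score > best[0]:
                best = (score, s, e)
    return best


tokens = ["sinan", "lives", "in", "california",
          "but", "matt", "lives", "in", "boston"]

# Hypothetical scores: one start score and one end score per token.
start_logits = [0.1, 0.0, 0.2, 4.0, 0.0, 0.3, 0.0, 0.1, 1.5]
end_logits = [0.0, 0.1, 0.0, 3.5, 0.2, 0.0, 0.1, 0.0, 2.0]

score, start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)  # california
```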

In [2]:
# imports

from transformers import (pipeline,
                          BertForQuestionAnswering,
                          BertForSequenceClassification,
                          BertForTokenClassification
                          )



In [3]:
# Loading the model

bert_sq = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels= 2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
bert_sq

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

In [5]:
bert_sq.classifier

Linear(in_features=768, out_features=2, bias=True)

The BERT classifier takes the 768 features from the [CLS] token and outputs one value per label, 2 by default. We can change this by passing the "num_labels" argument when we instantiate the model.
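To make the effect of "num_labels" concrete, the small calculation below counts the parameters of a linear layer with a given input and output size. The helper `linear_param_count` is a hypothetical name used only for this illustration.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of trainable values in a linear layer: weights plus biases."""
    return in_features * out_features + (out_features if bias else 0)


two_labels = linear_param_count(768, 2)    # the head printed above
three_labels = linear_param_count(768, 3)  # e.g. positive / negative / neutral

print(two_labels, three_labels)  # 1538 2307
```

These few thousand values are the ones reported as newly initialized; everything else is loaded from the pre-trained checkpoint.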

In [6]:
# Finding a classifier from the HuggingFace model repository

finbert = pipeline('text-classification',
                  model='ProsusAI/finbert',
                  tokenizer='ProsusAI/finbert')

**Examples of financial classification**

In [7]:
finbert('Stocks rallied and the Brittish pound gained')

[{'label': 'positive', 'score': 0.7386967539787292}]

In [8]:
finbert('The stocks did ok')

[{'label': 'neutral', 'score': 0.7982069849967957}]

In [9]:
# Looking at the model in depth

finbert.model

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

We can see it is a BERT model with a classification layer on top.

In [10]:
# Instantiating a token classifier model

bert_tc = BertForTokenClassification.from_pretrained('bert-base-uncased')
bert_tc

Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForTokenClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, el

Again, after all the encoder layers we have a classifier.

In [11]:
bert_tc.classifier

Linear(in_features=768, out_features=2, bias=True)

In [12]:
# Loading a model for token classification (NER) in Turkish

custom_model = 'savasy/bert-base-turkish-ner-cased'
ner = pipeline('ner', model= custom_model, tokenizer= custom_model)

sequence = "Merhaba! Benim anim Sinan. San Francisco' dan gelyorum"
ner(sequence)

Some weights of the model checkpoint at savasy/bert-base-turkish-ner-cased were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity': 'B-LOC',
  'score': 0.9988091,
  'index': 8,
  'word': 'San',
  'start': 27,
  'end': 30},
 {'entity': 'I-LOC',
  'score': 0.9978946,
  'index': 9,
  'word': 'Francisco',
  'start': 31,
  'end': 40}]

We can see that the model finds a location: B-LOC marks the beginning of a location and I-LOC marks its continuation. This is how named entity recognition has tagged "San Francisco" within the Turkish text.
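The B-/I- tags can be merged back into whole entities using the character offsets in the output. The sketch below is one simple way to do it, with `merge_entities` as a hypothetical helper; the pipeline also accepts an `aggregation_strategy` argument that groups tokens for you, which is likely the better choice in practice.

```python
def merge_entities(tokens, text):
    """Group B-/I- tagged tokens into whole entities using character offsets."""
    entities = []
    for tok in tokens:
        prefix, _, kind = tok["entity"].partition("-")
        continues = bool(prefix == "I" and entities
                         and entities[-1]["type"] == kind)
        if continues:
            # Extend the current entity to cover this token as well.
            entities[-1]["end"] = tok["end"]
        else:
            entities.append({"type": kind,
                             "start": tok["start"],
                             "end": tok["end"]})
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
    return entities


sequence = "Merhaba! Benim anim Sinan. San Francisco' dan gelyorum"

# The two tagged tokens from the pipeline output above (scores omitted).
ner_output = [
    {"entity": "B-LOC", "index": 8, "word": "San", "start": 27, "end": 30},
    {"entity": "I-LOC", "index": 9, "word": "Francisco", "start": 31, "end": 40},
]

merged = merge_entities(ner_output, sequence)
print(merged)
# [{'type': 'LOC', 'start': 27, 'end': 40, 'text': 'San Francisco'}]
```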

In [13]:
# Question/Answering model

bert_qa = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
bert_qa

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForQuestionAnswering(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elem

At the end we have the QA outputs, which always produce 2 values for every token: one scoring it as the start of the answer and one scoring it as the end.


In [14]:
# We instantiate a RoBERTa model,
# a derivative of the BERT model that does well in QA tasks

model_name = 'deepset/roberta-base-squad2'
qa = pipeline(model= model_name, tokenizer= model_name,
              revision= 'v1.0', task='question-answering')

Some weights of the model checkpoint at deepset/roberta-base-squad2 were not used when initializing RobertaForQuestionAnswering: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [17]:
sequence = 'Where is Sinan living these days?', 'Sinan lives in California but Matt lives in Boston'
qa(*sequence)

{'score': 0.9734253883361816, 'start': 15, 'end': 25, 'answer': 'California'}
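The 'start' and 'end' values in this result are character offsets into the context string, so slicing the context with them recovers the answer. The short check below copies the context and the result from the cell above; the variable names are chosen only for this example.

```python
context = 'Sinan lives in California but Matt lives in Boston'

# Result copied from the pipeline output above.
result = {'score': 0.9734253883361816, 'start': 15, 'end': 25,
          'answer': 'California'}

# Slice the context with the returned offsets ('end' is exclusive).
span = context[result['start']:result['end']]
print(span)  # California
```

This is useful when you want to highlight the answer in the original text rather than just display it.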