This notebook summarizes the different ways to use BERT that we encountered during the course.

# Tokenization with BERT

Any text preprocessing pipeline involves a tokenization phase. The tokenizer used by Google, available with their package, is very powerful!

## Dependencies

In [0]:
!pip install bert-for-tf2
!pip install sentencepiece


In [0]:
try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import tensorflow_hub as hub

import bert

TensorFlow 2.x selected.


## Tokenization

To create the tokenizer, we first load the BERT layer: it gives us the vocabulary file and the lowercasing flag that the tokenizer needs.

In [0]:
FullTokenizer = bert.bert_tokenization.FullTokenizer
bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
                            trainable=False) # trainable=False because we won't train this layer, we just need info from it
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = FullTokenizer(vocab_file, do_lower_case)

Applying the tokenizer, then converting the tokens into ids.

In [0]:
print(tokenizer.tokenize("Roses are red, violets are blue."))
print(tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Roses are red, violets are blue.")))

['roses', 'are', 'red', ',', 'violet', '##s', 'are', 'blue', '.']
[10529, 2024, 2417, 1010, 8766, 2015, 2024, 2630, 1012]
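
The output above splits "violets" into `violet` and `##s`. The sketch below illustrates the greedy longest-match-first idea behind this kind of sub-word split. It is a simplified illustration, not the package's code: the function name `wordpiece` and the toy vocabulary are assumptions made for the example.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one lowercased word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is treated as unknown
        pieces.append(piece)
        start = end
    return pieces

# Toy vocabulary (hypothetical, for illustration only).
toy_vocab = {"roses", "are", "red", "violet", "##s", "blue"}

print(wordpiece("violets", toy_vocab))  # ['violet', '##s']
print(wordpiece("roses", toy_vocab))    # ['roses']
print(wordpiece("tulips", toy_vocab))   # ['[UNK]']
```

With the real vocabulary file the same principle applies, only with a vocabulary of tens of thousands of pieces.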


# Embedding with BERT

This time we go even further, keeping any NLP model we already have but improving the embedding layer with BERT.

## Dependencies

In [0]:
!pip install bert-for-tf2
!pip install sentencepiece

In [0]:
try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import tensorflow_hub as hub

import bert

In [0]:
import numpy as np
from collections import namedtuple

## Inputs creation

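Our BERT embedding layer needs three inputs for each sequence: the token ids, a mask (1 for real tokens, 0 for padding) and the segment ids (which sentence each token belongs to).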

In [0]:
FullTokenizer = bert.bert_tokenization.FullTokenizer
bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
                            trainable=False) # trainable=False because we won't train this layer, we just need info from it
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = FullTokenizer(vocab_file, do_lower_case)

In [0]:
def get_ids(tokens):
    return tokenizer.convert_tokens_to_ids(tokens)

def get_mask(tokens):
    return np.char.not_equal(tokens, "[PAD]").astype(int)

def get_segments(tokens):
    seg_ids = []
    current_seg_id = 0
    for tok in tokens:
        seg_ids.append(current_seg_id)
        if tok == "[SEP]":
            current_seg_id = 1-current_seg_id # turns 1 into 0 and vice versa
    return seg_ids

In [0]:
my_sent = "Roses are red, violets are blue."
my_tok_sent = tokenizer.tokenize(my_sent)

In [0]:
my_input = tf.expand_dims([get_ids(my_tok_sent),
                           get_mask(my_tok_sent),
                           get_segments(my_tok_sent)],
                          axis=0) # expand_dims to simulate batch

In [0]:
my_input[:, 0, :]

<tf.Tensor: shape=(1, 9), dtype=int32, numpy=
array([[10529,  2024,  2417,  1010,  8766,  2015,  2024,  2630,  1012]],
      dtype=int32)>
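
The single sentence above needs no padding, so its mask is all ones. The mask becomes useful once sentences of different lengths are batched together. Here is a minimal sketch of that case in plain Python; `pad_tokens` and `mask_for` are hypothetical helpers written for this example, and the fixed length of 6 is arbitrary.

```python
def pad_tokens(tokens, max_len, pad_token="[PAD]"):
    """Right-pad a token list with pad_token up to max_len (hypothetical helper)."""
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len")
    return tokens + [pad_token] * (max_len - len(tokens))

def mask_for(tokens, pad_token="[PAD]"):
    """1 for real tokens, 0 for padding; same idea as get_mask above."""
    return [int(tok != pad_token) for tok in tokens]

sent_1 = ["roses", "are", "red", "."]
sent_2 = ["violet", "##s", "are", "blue", "."]

padded = [pad_tokens(s, 6) for s in (sent_1, sent_2)]
masks = [mask_for(s) for s in padded]

print(padded[0])  # ['roses', 'are', 'red', '.', '[PAD]', '[PAD]']
print(masks[0])   # [1, 1, 1, 1, 0, 0]
print(masks[1])   # [1, 1, 1, 1, 1, 0]
```

After padding, every row has the same length, so the ids, masks and segment ids can be stacked into one batch tensor.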

## Your model

In [0]:
class YourModel(tf.keras.Model):

    def __init__(self,
                 # ...,
                 ):
        super(YourModel, self).__init__()

        self.bert_embedder = hub.KerasLayer(
            "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
            trainable=False) # we still don't want to train this layer: the learning
                             # comes from the rest of the model, this is a frozen embedding layer
            
        # ...

    # ...

    def call(self, inputs):
        _, embedded = self.bert_embedder([inputs[:, 0, :],
                                          inputs[:, 1, :],
                                          inputs[:, 2, :])
        
        # ...
    

# Fine-tuning BERT

This time we use BERT as the core of our model and we fine-tune it.
We need to decide two things:

*   Which of BERT's two outputs we will use: the first one (sentence-level representation, for classification for example) or the second one (token-level representation).
*   Which layer to add after BERT to suit our task (a simple dense layer with `nb_units=nb_classes` for classification, for instance).
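
To make the difference between the two outputs concrete, here is a minimal sketch in plain Python of what a dense layer does to each of them. Everything in it is an assumption made for illustration: the sizes are tiny (BERT-base has a hidden size of 768), the vectors are random stand-ins rather than real BERT activations, and `dense` is a hypothetical helper.

```python
import random

HIDDEN = 4      # BERT-base uses 768; kept tiny here for readability
SEQ_LEN = 3     # number of tokens in the example
NB_CLASSES = 2  # plays the role of nb_units=nb_classes

def dense(vector, weights, bias):
    """Apply one dense layer; weights has shape (len(vector), nb_units)."""
    return [sum(v * w_row[j] for v, w_row in zip(vector, weights)) + bias[j]
            for j in range(len(bias))]

random.seed(0)
weights = [[random.gauss(0.0, 0.02) for _ in range(NB_CLASSES)] for _ in range(HIDDEN)]
bias = [0.0] * NB_CLASSES

# Random stand-ins for the two outputs of ONE example.
sentence_level = [random.random() for _ in range(HIDDEN)]                         # shape (HIDDEN,)
token_level = [[random.random() for _ in range(HIDDEN)] for _ in range(SEQ_LEN)]  # shape (SEQ_LEN, HIDDEN)

sentence_scores = dense(sentence_level, weights, bias)             # one score per class
token_scores = [dense(tok, weights, bias) for tok in token_level]  # one score per class, per token

print(len(sentence_scores))                     # 2
print(len(token_scores), len(token_scores[0]))  # 3 2
```

The sentence-level output yields one prediction per input, which suits classification; the token-level output yields one prediction per token, which suits tasks such as tagging.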

## Dependencies

We will use a different package because it comes with a better optimizer for training BERT.

In [0]:
!pip install tf-models-official
!pip install tf-nightly

In [0]:
import numpy as np # used by get_mask below
import tensorflow as tf
import tensorflow_hub as hub

In [0]:
from official.nlp.bert.tokenization import FullTokenizer
from official.nlp import optimization

## Inputs

Same process as for the BERT embedding, but we need to add [CLS] and [SEP] tokens.

In [0]:
# FullTokenizer was already imported from official.nlp.bert.tokenization above
bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
                            trainable=False) # trainable=False because we won't train this layer, we just need info from it
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = FullTokenizer(vocab_file, do_lower_case)

In [0]:
def get_ids(tokens):
    return tokenizer.convert_tokens_to_ids(tokens)

def get_mask(tokens):
    return np.char.not_equal(tokens, "[PAD]").astype(int)

def get_segments(tokens):
    seg_ids = []
    current_seg_id = 0
    for tok in tokens:
        seg_ids.append(current_seg_id)
        if tok == "[SEP]":
            current_seg_id = 1-current_seg_id # turns 1 into 0 and vice versa
    return seg_ids

In [0]:
my_sent_1 = "Roses are red."
my_sent_2 = "Violets are blue."
my_tok_sent = ["[CLS]"] + tokenizer.tokenize(my_sent_1) + ["[SEP]"] + tokenizer.tokenize(my_sent_2) + ["[SEP]"]

In [0]:
my_tok_sent

['[CLS]',
 'roses',
 'are',
 'red',
 '.',
 '[SEP]',
 'violet',
 '##s',
 'are',
 'blue',
 '.',
 '[SEP]']
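
For this pair, the segment ids follow a simple rule: `[CLS]`, the first sentence and the first `[SEP]` get 0, while the second sentence and the final `[SEP]` get 1. The sketch below checks that rule against the same logic as `get_segments`, in plain Python so that it runs without the tokenizer; the token lists are copied from the output above.

```python
def get_segments(tokens):
    """Same logic as the notebook's get_segments: flip the id after each [SEP]."""
    seg_ids, current = [], 0
    for tok in tokens:
        seg_ids.append(current)
        if tok == "[SEP]":
            current = 1 - current
    return seg_ids

tokens_a = ["roses", "are", "red", "."]
tokens_b = ["violet", "##s", "are", "blue", "."]
pair = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]

# Segment 0 covers [CLS] + sentence 1 + first [SEP]; segment 1 covers sentence 2 + last [SEP].
expected = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)

print(get_segments(pair))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print([1] * len(pair))     # mask: all ones, since nothing is padded
```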

In [0]:
my_input = tf.expand_dims([get_ids(my_tok_sent),
                           get_mask(my_tok_sent),
                           get_segments(my_tok_sent)],
                          axis=0) # expand_dims to simulate batch

In [0]:
my_input[:, 0, :]

## Model

We define a simple dense-based layer that will be added after BERT.

In [0]:
class MyLayer(tf.keras.layers.Layer):

    def __init__(self,
                 nb_units):
        super(MyLayer, self).__init__()

        self.my_dense = tf.keras.layers.Dense(
            nb_units,
            kernel_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02)) # good initializer for BERT's dense layer
    
    def call(self, inputs):
        x = self.my_dense(inputs)
        
        # ... any other task specific computation.
        # For classification for instance, this would be enough.

        return x

Building our whole model.

In [0]:
class BERTModel(tf.keras.Model):
    
    def __init__(self,
                 nb_units,
                 dropout_rate):
        super(BERTModel, self).__init__()

        self.dropout_rate = dropout_rate
        
        self.bert_layer = hub.KerasLayer(
            "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
            trainable=True) # Trainable=True to tune the weights for our task!
        
        self.my_layer = MyLayer(nb_units)

    def call(self, inputs):
        # BERT returns two outputs; keep only the one that suits the task.

        # To use the first BERT output (sentence-level representation),
        # for classification for instance:
        x, _ = self.bert_layer([inputs[:, 0, :],
                                inputs[:, 1, :],
                                inputs[:, 2, :]])

        # To use the second BERT output (token-level representation) instead,
        # replace the call above with:
        # _, x = self.bert_layer([inputs[:, 0, :],
        #                         inputs[:, 1, :],
        #                         inputs[:, 2, :]])

        x = tf.nn.dropout(x, self.dropout_rate) # Might be good to add a dropout here.

        my_output = self.my_layer(x)
        
        return my_output

## Training

In [0]:
NB_UNITS = 2

DROPOUT_RATE = 0.1

BATCH_SIZE = 32
NB_EPOCHS = 5
INIT_LR = 5e-5
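# NB_BATCHES_TRAIN is not defined in this summary: set it beforehand to the
# number of training steps (batches) the optimizer will see.
WARMUP_STEPS = int(NB_BATCHES_TRAIN * 0.1) # warm up during the first 10% of the steps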

In [0]:
my_model = BERTModel(NB_UNITS,
                     DROPOUT_RATE)

We can use the optimizer provided by the package (Adam with weight decay, combined with a learning-rate warmup and decay schedule).

In [0]:
optimizer = optimization.create_optimizer(
    init_lr=INIT_LR,
    num_train_steps=NB_BATCHES_TRAIN,
    num_warmup_steps=WARMUP_STEPS)
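
To give an intuition for `num_warmup_steps`, here is a minimal sketch of a warmup-then-decay learning-rate schedule in plain Python. It assumes a linear warmup followed by a linear decay to zero; the exact formula used inside the package may differ. The function `learning_rate` and the dataset size `NB_TRAIN_EXAMPLES` are hypothetical and only serve the example.

```python
import math

def learning_rate(step, init_lr, num_train_steps, num_warmup_steps):
    """Linear warmup towards init_lr, then linear decay to 0 (assumed schedule shape)."""
    if step < num_warmup_steps:
        return init_lr * step / num_warmup_steps
    return init_lr * max(1.0 - step / num_train_steps, 0.0)

NB_TRAIN_EXAMPLES = 3200  # hypothetical dataset size
BATCH_SIZE = 32
INIT_LR = 5e-5

NB_BATCHES_TRAIN = math.ceil(NB_TRAIN_EXAMPLES / BATCH_SIZE)  # 100 batches
WARMUP_STEPS = int(NB_BATCHES_TRAIN * 0.1)                    # 10 warmup steps

for step in (0, 5, 10, 50, 100):
    lr = learning_rate(step, INIT_LR, NB_BATCHES_TRAIN, WARMUP_STEPS)
    print(step, lr)
```

The learning rate starts at zero and ramps up during warmup, which avoids large, destabilizing updates to the pretrained weights at the very start of fine-tuning.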

Let's compile.

In [0]:
# my_loss_fn and my_metric are placeholders for a task-specific loss and metric
my_model.compile(optimizer=optimizer,
                 loss=my_loss_fn,
                 metrics=[my_metric])
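
The loss and metric are left as placeholders above. For a classification task where the final dense layer outputs one raw score per class, one common choice (an assumption here, not something the course prescribes) is a sparse categorical cross-entropy computed from logits together with an accuracy metric; in Keras these correspond to `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)` and `tf.keras.metrics.SparseCategoricalAccuracy()`. The plain-Python sketch below shows what they compute; the function names are hypothetical.

```python
import math

def sparse_cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def accuracy(batch_logits, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

batch_logits = [[2.0, -1.0], [0.1, 0.3], [1.0, 0.0]]
labels = [0, 1, 1]

print(sparse_cross_entropy_from_logits([0.0, 0.0], 0))  # log(2): both classes equally likely
print(accuracy(batch_logits, labels))                   # 2 of 3 predictions are correct
```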

And now we can fit, evaluate and use our model like any other!