<a href="https://colab.research.google.com/github/seungeunlee00/JUNIA/blob/main/AML/Transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformers tutorial

This tutorial shows how to use the TensorFlow and Keras APIs to train a neural network based on the Transformer encoder.

In the first part, we build the model with Tensorflow-NLP; in the second, we use Keras-NLP, which lets us account for padding.

All links to APIs and additional tutorials can be found at the end of the notebook.

## Tensorflow-NLP

Install the tf-models-official package

In [None]:
!pip install tf-models-official

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tf-models-official
  Downloading tf_models_official-2.11.3-py2.py3-none-any.whl (2.3 MB)
Collecting seqeval
  Downloading seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Collecting py-cpuinfo>=3.3.0
  Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Collecting sacrebleu
  Downloading sacrebleu-2.3.1-py3-none-any.whl (118 kB)
Collecting tf-slim>=1.1.0
  Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)

Import APIs

In [None]:
import tensorflow as tf
from tensorflow import keras
import tensorflow_models as tfm

Download the Reuters dataset while limiting the vocabulary size

In [None]:
max_features = 20000  # Only consider the top 20k words

(x_train, y_train), (x_val, y_val) = keras.datasets.reuters.load_data(num_words=max_features)
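
The effect of limiting the vocabulary can be pictured with a small pure-Python sketch. It assumes the documented Keras behaviour in which word ids at or above num_words are replaced by an out-of-vocabulary id (2 by default); the helper `limit_vocabulary` and the toy sequence are hypothetical and only mimic that behaviour.

```python
# Hypothetical helper, not part of Keras: mimics the effect of num_words.
def limit_vocabulary(sequence, num_words, oov_char=2):
    """Replace every word id >= num_words with the out-of-vocabulary id."""
    return [w if w < num_words else oov_char for w in sequence]


toy_sequence = [1, 14, 25000, 7, 19999, 20000]   # made-up word ids
limited = limit_vocabulary(toy_sequence, num_words=20000)
print(limited)  # [1, 14, 2, 7, 19999, 2]
```

Rare words therefore do not disappear from the sequence; they all collapse onto the same id, which keeps the embedding table at max_features rows.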

In [None]:
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")

8982 Training sequences
2246 Validation sequences


In [None]:
# Class indices are 0-based, so this assumes the highest class index appears in y_train
n_classes = max(y_train) + 1
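
A quick way to see what `max(y_train) + 1` relies on is to try it on a toy label list. The labels below are hypothetical; the sketch only illustrates that the expression counts index slots (0 up to the largest label), which differs from the number of distinct labels when a class is missing.

```python
toy_labels = [0, 3, 1, 3, 0]   # hypothetical labels; class 2 never appears

n_classes_from_max = max(toy_labels) + 1    # counts slots 0..3, including the unseen class 2
n_classes_observed = len(set(toy_labels))   # counts only the labels that are present

# Class indices that have no sample at all
missing = sorted(set(range(n_classes_from_max)) - set(toy_labels))
print(n_classes_from_max, n_classes_observed, missing)  # 4 3 [2]
```

Using the maximum keeps the one-hot vectors wide enough for every index, as long as the largest class index occurs at least once in the training labels.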

Pad the sequences to a maximum length and convert the label indices to one-hot vectors

In [None]:
# Keep at most 200 words per newswire; with its default settings,
# pad_sequences pads and truncates at the start, so the last 200 words are kept
maxlen = 200

x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

y_train = tf.one_hot(y_train, n_classes)
y_val = tf.one_hot(y_val, n_classes)
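
To make the two preprocessing steps concrete, here is a minimal pure-Python sketch. The helpers `pad_pre` and `one_hot` are hypothetical stand-ins, not the Keras or TensorFlow implementations; `pad_pre` assumes the default pad_sequences settings (padding and truncation both applied at the start of the sequence, padding value 0).

```python
def pad_pre(sequence, maxlen, value=0):
    """Mimic the pad_sequences defaults: truncate and pad at the start."""
    truncated = sequence[-maxlen:]   # keep the last maxlen ids
    return [value] * (maxlen - len(truncated)) + truncated


def one_hot(index, depth):
    """Return a vector of length depth with a single 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


print(pad_pre([5, 6, 7], maxlen=5))            # [0, 0, 5, 6, 7]
print(pad_pre([1, 2, 3, 4, 5, 6], maxlen=4))   # [3, 4, 5, 6]
print(one_hot(2, depth=4))                     # [0.0, 0.0, 1.0, 0.0]
```

Short sequences receive leading zeros and long ones lose their first words, so every row ends up with exactly maxlen entries.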

Construct the model based on the tensorflow-models-official API

In [None]:
d_model = 128   # dimension of vectors in the encoder
d_ffn = 512     # dimension of vectors in the feed forward network
n_head = 4      # number of heads in multi-head attention
n_layer = 3     # number of encoder layers/blocks

Position encoding

In [None]:
class PositionEncoding(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

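    def make_values(self, n, d):
        # number of sine/cosine pairs needed to cover d features
        half = tf.cast(tf.math.ceil(d / 2), tf.int32)

        # denominators inside the sine/cosine: 1 / 10000^(2i/d) for i = 0 .. half-1
        # (the division by d must sit outside tf.range, otherwise only i = 0 is generated)
        inv_freq = 1 / 10000 ** (2 * tf.cast(tf.range(half), tf.float32) / tf.cast(d, tf.float32))

        # multiply by the position (numerator): result has shape (n, half)
        positions = tf.cast(tf.range(n), tf.float32)
        pos_enc = positions[:, tf.newaxis] * inv_freq[tf.newaxis, :]

        return pos_enc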

    def call(self, inputs):
        input_shape = tf.shape(inputs)   # batch, sequence, features

        # computes the inside of sine values
        pos_enc = self.make_values(input_shape[1], input_shape[2])
        
        # stack sine and cosine values
        pos_enc = tf.stack((tf.sin(pos_enc), tf.cos(pos_enc)), axis=-1)
        
        # reshape them to get sine in even and cosine in odd dimensions
        pos_enc = tf.reshape(pos_enc, (input_shape[1], -1))
        
        # repeat for every sample in the batch
        pos_enc = tf.repeat(tf.expand_dims(pos_enc, axis=0), input_shape[0], axis=0)
        
        # add to inputs
        return inputs + pos_enc[..., :input_shape[2]]
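
For reference, the sinusoidal scheme that the comments in this layer describe (sine in even dimensions, cosine in odd ones, with 10000^(2i/d) as denominator) can be written as a slow but explicit pure-Python table. This is a sketch for checking values by hand, not the layer itself; the function name `sinusoidal_encoding` is hypothetical.

```python
import math


def sinusoidal_encoding(n_positions, d_model):
    """PE[pos][2i] = sin(pos / 10000^(2i/d)), PE[pos][2i+1] = cos(same angle)."""
    table = []
    for pos in range(n_positions):
        row = []
        for k in range(d_model):
            i = k // 2                                   # index of the sine/cosine pair
            angle = pos / (10000 ** (2 * i / d_model))   # lower frequency for larger i
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = sinusoidal_encoding(n_positions=3, d_model=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
```

Position 0 always yields alternating 0 and 1, and each successive pair of dimensions oscillates more slowly than the previous one, which is a convenient sanity check for any implementation.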

Model architecture

In [None]:
# Input for variable-length sequences of integers (id of words)
inputs = keras.Input(shape=(None,), dtype="int32")

# Embed each integer id as a dense vector (equivalent to one-hot encoding followed by a linear layer)
x = keras.layers.Embedding(max_features, d_model)(inputs)

# Add position encoding to tokens
x = PositionEncoding()(x)

# Add encoder blocks
for i in range(n_layer):
    x = tfm.nlp.layers.TransformerEncoderBlock(num_attention_heads=n_head, inner_dim=d_ffn, inner_activation='relu')(x)

# Reduce the sequence to a single vector with global average pooling
x = keras.layers.GlobalAveragePooling1D()(x)

# Classifier layer
outputs = keras.layers.Dense(n_classes, activation="softmax")(x)

# Build the model
model = keras.Model(inputs, outputs)

# Display information on the model 
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, None)]            0         
                                                                 
 embedding (Embedding)       (None, None, 128)         2560000   
                                                                 
 position_encoding (Position  (None, None, 128)        0         
 Encoding)                                                       
                                                                 
 transformer_encoder_block (  (None, None, 128)        198272    
 TransformerEncoderBlock)                                        
                                                                 
 transformer_encoder_block_1  (None, None, 128)        198272    
  (TransformerEncoderBlock)                                      
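
The parameter counts in this summary can be reproduced by hand. The sketch below assumes a conventional encoder block layout (query, key, value and output projections with biases, two layer normalizations, and a two-layer feed-forward network); that layout is an assumption about the block's internals, supported here only by the fact that the total matches the 198272 reported above.

```python
d_model, d_ffn, max_features = 128, 512, 20000   # values from the cells above


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


embedding = max_features * d_model               # one d_model vector per word id
attention = 4 * dense_params(d_model, d_model)   # query, key, value and output projections
layer_norms = 2 * (2 * d_model)                  # scale and offset for two normalizations
feed_forward = dense_params(d_model, d_ffn) + dense_params(d_ffn, d_model)
encoder_block = attention + layer_norms + feed_forward

print(embedding, encoder_block)  # 2560000 198272
```

The embedding table dominates the model size: it holds more than four times as many parameters as the three encoder blocks combined.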
                                                             

Compile and train the model

In [None]:
# Pay attention to the hyperparameters used here
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="categorical_crossentropy", metrics=["accuracy"])

# Fit the model to the data
model.fit(x_train, y_train, batch_size=64, epochs=100, validation_data=(x_val, y_val))

Epoch 1/100
Epoch 2/100
Epoch 3/100

KeyboardInterrupt: ignored
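
A useful reference point when reading the training log is the loss of a model that predicts every class with equal probability. The sketch below computes categorical cross-entropy by hand for a one-hot target; it assumes 46 classes (the topic count documented for the Keras Reuters dataset), and the names `n_topics`, `target` and `uniform` are illustrative.

```python
import math


def categorical_crossentropy(one_hot_target, probabilities):
    """Cross-entropy for a single sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities) if t > 0)


n_topics = 46                                   # assumption: Reuters topic count
target = [1.0 if i == 3 else 0.0 for i in range(n_topics)]
uniform = [1.0 / n_topics] * n_topics           # an uninformed prediction

baseline = categorical_crossentropy(target, uniform)
print(round(baseline, 3))  # 3.829
```

A loss that stays near ln(46) after several epochs suggests the model has not learned anything yet, which is a hint to revisit the learning rate or batch size.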

## Keras-NLP

In [None]:
!pip install keras_nlp

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting keras_nlp
  Downloading keras_nlp-0.4.0-py3-none-any.whl (337 kB)
Installing collected packages: keras_nlp
Successfully installed keras_nlp-0.4.0


In [None]:
import keras_nlp

inputs = keras.Input(shape=(None,), dtype="int32")

# mask_zero=True marks id 0 (the padding value) so that mask-aware layers can ignore padded positions
x = keras.layers.Embedding(max_features, d_model, mask_zero=True)(inputs)

positional_encoding = keras_nlp.layers.SinePositionEncoding()(x)   # encode the position using Keras API

x = x + positional_encoding   # add to the tokens

for i in range(n_layer):
    x = keras_nlp.layers.TransformerEncoder(intermediate_dim=d_ffn, num_heads=n_head, activation='relu')(x)

x = keras.layers.GlobalAveragePooling1D()(x)

outputs = keras.layers.Dense(n_classes, activation="softmax")(x)

model = keras.Model(inputs, outputs)

model.summary()

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_2 (InputLayer)           [(None, None)]       0           []                               
                                                                                                  
 embedding_1 (Embedding)        (None, None, 128)    2560000     ['input_2[0][0]']                
                                                                                                  
 sine_position_encoding (SinePo  (None, None, 128)   0           ['embedding_1[0][0]']            
 sitionEncoding)                                                                                  
                                                                                                  
 tf.__operators__.add (TFOpLamb  (None, None, 128)   0           ['embedding_1[0][0]',      
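
Why masking the padding matters can be seen on average pooling alone. The pure-Python sketch below compares a plain mean with a mean restricted to non-padding positions; the token ids and the vectors are hypothetical values, and the sketch does not model whether each Keras layer in between actually propagates the mask.

```python
def mean_pool(vectors, mask=None):
    """Average a sequence of vectors, optionally skipping masked positions."""
    if mask is None:
        mask = [True] * len(vectors)
    kept = [v for v, keep in zip(vectors, mask) if keep]
    return [sum(column) / len(kept) for column in zip(*kept)]


token_ids = [0, 0, 7, 9]                                     # two padding ids, then two real tokens
vectors = [[0.5, 0.5], [0.5, 0.5], [2.0, 4.0], [4.0, 2.0]]   # hypothetical encoder outputs
mask = [t != 0 for t in token_ids]                           # True where the token is real

print(mean_pool(vectors))         # [1.75, 1.75]
print(mean_pool(vectors, mask))   # [3.0, 3.0]
```

Without the mask, the pooled vector is pulled toward whatever the model outputs at padded positions, and the distortion grows with the amount of padding in a sequence.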

In [None]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=100, validation_data=(x_val, y_val))

Epoch 1/100


KeyboardInterrupt: ignored

## Links

### APIs

Keras-NLP : https://keras.io/api/keras_nlp/

Tensorflow-NLP : https://www.tensorflow.org/api_docs/python/tfm/nlp

Keras-NLP Transformer Encoder : https://keras.io/api/keras_nlp/modeling_layers/transformer_encoder/

Keras-NLP position encoding : https://keras.io/api/keras_nlp/modeling_layers/sine_position_encoding/

Tensorflow-NLP Transformer Encoder : https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerEncoderBlock

### Tutorials

Keras NLP tutorials : https://keras.io/examples/nlp/

Example of text classification with Transformers : https://keras.io/examples/nlp/text_classification_with_transformer/


### Dataset

Keras datasets : https://keras.io/api/datasets/

Reuters newswire : https://keras.io/api/datasets/reuters/

IMDB movie review sentiment : https://keras.io/api/datasets/imdb/