# How to write a Custom Keras Model so that it can be deployed for Serving
<i> Accompanies this blog post https://medium.com/@lakshmanok/how-to-write-a-custom-keras-model-so-that-it-can-be-deployed-for-serving-7d81ace4a1f8 </i>

At some point, you will find yourself going beyond pre-built Keras capabilities and subclassing Keras layers and models.

Unfortunately, when you do that, exporting the model so that it can be easily deployed becomes quite difficult.

The documentation that explains what you have do is quite scattered. My aim with this notebook is to show you an end-to-end example of how you can solve this problem.



## NER Transformer model

To explain, I will take this NER transformer model that is one of the official Keras examples:

https://keras.io/examples/nlp/ner_transformers/

Basically, an NER model identifies entities. Suppose we have a NER model that is trained to identify names and locations. Then, it will take a sentence of the form:

<i> John went to Paris </i>

and return:

<i> NAME out out LOCATION </i>

How this model works itself isn't all that important.
Just that it involves custom Keras layers and a custom Keras model

In [2]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from collections import Counter

class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.ffn = keras.Sequential(
            [
                keras.layers.Dense(ff_dim, activation="relu"),
                keras.layers.Dense(embed_dim),
            ]
        )
        self.layernorm1 = keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = keras.layers.Dropout(rate)
        self.dropout2 = keras.layers.Dropout(rate)

    def call(self, inputs, training=False):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        self.token_emb = keras.layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.pos_emb = keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, inputs):
        maxlen = tf.shape(inputs)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        position_embeddings = self.pos_emb(positions)
        token_embeddings = self.token_emb(inputs)
        return token_embeddings + position_embeddings

class NERModel(keras.Model):
    def __init__(
        self, num_tags, vocab_size, maxlen=128, embed_dim=32, num_heads=2, ff_dim=32
    ):
        super(NERModel, self).__init__()
        self.embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
        self.transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
        self.dropout1 = layers.Dropout(0.1)
        self.ff = layers.Dense(ff_dim, activation="relu")
        self.dropout2 = layers.Dropout(0.1)
        self.ff_final = layers.Dense(num_tags, activation="softmax")

    def call(self, inputs, training=False):
        x = self.embedding_layer(inputs)
        x = self.transformer_block(x)
        x = self.dropout1(x, training=training)
        x = self.ff(x)
        x = self.dropout2(x, training=training)
        x = self.ff_final(x)
        return x

## Synthetic dataset

The training dataset for a NER model is in "[IOB2](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging))" format.
* I=Inside
* OUT=Outside
* B=Beginnning

It's pretty intuitive with an example.
In our case, the description will be tokenized as follows, assuming that we are interested only in the firm and the customer id:
<pre>
Jean   B-NAME 
Baptiste  I-NAME
Chevalier I-NAME
went OUT
to   OUT
Paris  B-LOCATION
France  I-LOCATION
</pre>

Let's create a quick synthetic dataset to train the model on

In [4]:
import random

names = [
         'Emmanuel Macaron', 'Xi Jinping', 'Joe Biden',
         'Vladimir Putin', 'Vlodymyr Zelenskyy', 'Narendra Modi', 
         'Justin Trudeau', 'Jair Bolsonaro']
junk_words = ['went to', 'visited', 'was in', 'was chased out of']
locations = ['Paris France', 'Beijing China', 'Washington DC',
             'Moscow Russia', 'Kyiv Ukraine', 'New Delhi India',
             'Ottawa Canada', 'Brasilia Brazil']

def generate_description():
  descr = []
  labels = []
  
  # name
  gen = random.choice(names).split()
  descr += gen
  labels += ['B-NAME'] + ['I-NAME'] * (len(gen) - 1)

  # verb
  gen = random.choice(junk_words).split()
  descr += gen
  labels += ['OUT'] * len(gen)

  # place
  gen = random.choice(locations).split()
  descr += gen
  labels += ['B-LOCATION'] + ['I-LOCATION'] * (len(gen) - 1)

  return ' '.join(descr), ' '.join(labels)

generate_description()

('Emmanuel Macaron went to Paris France',
 'B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION')

In [6]:
import pandas as pd

train_df = pd.DataFrame(data=[generate_description() for x in range(1000)], columns=['description', 'labels'])
valid_df = pd.DataFrame(data=[generate_description() for x in range(100)], columns=['description', 'labels'])
train_df.head()

Unnamed: 0,description,labels
0,Vlodynyr Zelenskyy was in Ottawa Canada,B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION
1,Vladimir Putin was chased out of Washington DC,B-NAME I-NAME OUT OUT OUT OUT B-LOCATION I-LOC...
2,Emmanuel Macaron was in New Delhi India,B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION I-...
3,Vlodynyr Zelenskyy visited New Delhi India,B-NAME I-NAME OUT B-LOCATION I-LOCATION I-LOCA...
4,Vlodynyr Zelenskyy was in Paris France,B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION


In [9]:
train_df.to_csv('train.csv', index=False, header=False, sep='\t')
valid_df.to_csv('valid.csv', index=False, header=False, sep='t')

In [10]:
!head train.csv

Vlodynyr Zelenskyy was in Ottawa Canada	B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION
Vladimir Putin was chased out of Washington DC	B-NAME I-NAME OUT OUT OUT OUT B-LOCATION I-LOCATION
Emmanuel Macaron was in New Delhi India	B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION I-LOCATION
Vlodynyr Zelenskyy visited New Delhi India	B-NAME I-NAME OUT B-LOCATION I-LOCATION I-LOCATION
Vlodynyr Zelenskyy was in Paris France	B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION
Vlodynyr Zelenskyy went to Kyiv Ukraine	B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION
Justin Trudeau was in Moscow Russia	B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION
Justin Trudeau was chased out of Kyiv Ukraine	B-NAME I-NAME OUT OUT OUT OUT B-LOCATION I-LOCATION
Vladimir Putin visited New Delhi India	B-NAME I-NAME OUT B-LOCATION I-LOCATION I-LOCATION
Emmanuel Macaron was in Brasilia Brazil	B-NAME I-NAME OUT OUT B-LOCATION I-LOCATION


## Train the NER Model

Again, some of the details here will become important, but for now, let's just say that we write a tf.data input pipeline to read the data and pass it into the model for training.

In [7]:
# The labels that we are interested in
NER_LABELS = ["NAME", "LOCATION"]
MAX_LEN = 16  # number of words in description

# How many times does a word have to appear in the training dataset before
# we treat it as fixed?
MIN_FREQ = 5
EMBED_DIM = 8
BATCH_SIZE = 32
NUM_EPOCHS = 10
EXPORT_PATH = "ner_model"

In [12]:
def create_vocab_lookup(filename):
  train_df = pd.read_csv(filename, sep='\t', names=['descr', 'labels'])
  all_tokens = sum(train_df["descr"].apply(str.split), [])
  print(all_tokens[:5])
  all_tokens_array = np.array(list(map(str.lower, all_tokens)))
  print(all_tokens_array[:5])

  # all the unique customer ids etc. should not be included in vocabulary
  counter = Counter(all_tokens_array)
  vocabulary = [elem for elem, cnt in counter.items() if cnt >= MIN_FREQ]
  vocabulary += [ '[PAD]' ]
  print('{} unique tokens in training file; {} occur more than {}x'.format(
      len(counter), len(vocabulary), MIN_FREQ))

  # vocabulary size is the number of words + the PAD
  # The StringLookup class will convert tokens to token IDs
  return len(vocabulary) + 1, keras.layers.StringLookup(vocabulary=vocabulary)

vocab_size, vocab_lookup_layer = create_vocab_lookup('train.csv')
print(vocab_lookup_layer(['notinvocab', 'paris', 'modi', 'xi']))

['Vlodynyr', 'Zelenskyy', 'was', 'in', 'Ottawa']
['vlodynyr' 'zelenskyy' 'was' 'in' 'ottawa']
41 unique tokens in training file; 42 occur more than 5x
tf.Tensor([ 0 20 33 36], shape=(4,), dtype=int64)


In [13]:
def make_tag_lookup_table():
    iob_labels = ["B", "I"]
    all_labels = [(label1, label2) for label2 in NER_LABELS for label1 in iob_labels]
    all_labels = ["-".join([a, b]) for a, b in all_labels]
    all_labels = ["[PAD]", "OUT"] + all_labels
    return all_labels, dict(zip(range(0, len(all_labels) + 1), all_labels))

all_labels, mapping = make_tag_lookup_table()
num_tags = len(all_labels)
print(mapping)
label_lookup_layer = keras.layers.StringLookup(vocabulary=all_labels, oov_token='[PAD]')
print(label_lookup_layer(['[PAD]', 'OUT', 'I-FIRM']))

{0: '[PAD]', 1: 'OUT', 2: 'B-NAME', 3: 'I-NAME', 4: 'B-LOCATION', 5: 'I-LOCATION'}
tf.Tensor([0 1 0], shape=(3,), dtype=int64)


In [15]:
def map_record_to_training_data(record):
    record = tf.strings.split(record, sep="\t")

    tokens = tf.strings.split(record[0])
    tokens = tf.strings.lower(tokens)
    tokens = vocab_lookup_layer(tokens)

    tags = tf.strings.split(record[1])
    tags = label_lookup_layer(tags)
    return tokens, tags

# We use `padded_batch` here because each record in the dataset has a
# different length.
batch_size = 32
train_dataset = (
    tf.data.TextLineDataset('train.csv')
    .map(map_record_to_training_data)
    .padded_batch(batch_size)
)
val_dataset = (
    tf.data.TextLineDataset('valid.csv')
    .map(map_record_to_training_data)
    .padded_batch(batch_size)
)

ner_model = NERModel(num_tags, vocab_size, embed_dim=32, num_heads=4, ff_dim=64)

class CustomNonPaddingTokenLoss(keras.losses.Loss):
    def __init__(self, name="custom_ner_loss"):
        super().__init__(name=name)

    def call(self, y_true, y_pred):
        loss_fn = keras.losses.SparseCategoricalCrossentropy(
            from_logits=True, reduction=keras.losses.Reduction.NONE
        )
        loss = loss_fn(y_true, y_pred)
        mask = tf.cast((y_true > 0), dtype=tf.float32)
        loss = loss * mask
        return tf.reduce_sum(loss) / tf.reduce_sum(mask)


loss = CustomNonPaddingTokenLoss()
ner_model.compile(optimizer="adam", loss=loss)
ner_model.fit(train_dataset, epochs=10)

Epoch 1/10


  return dispatch_target(*args, **kwargs)


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7febd887bf50>

## Export model and deploy the model

Here's where the interesting things start.
Let's try to save the model so that we can deploy it.

Typically, this is what it will involve:
<pre>
model.save(EXPORT_PATH)
</pre>
Then, we will take the saved model and give to a service such as Sagemaker or Vertex AI and it will do:
<pre>
model = saved_model.load_model(EXPORT_PATH)
model.predict(...)
</pre>
Unfortunately, because of all the custom layers and code above, this stratightforward approach won't work.

Let's see:

In [17]:
!rm -rf {EXPORT_PATH}
ner_model.save(EXPORT_PATH)
!ls -l {EXPORT_PATH}



INFO:tensorflow:Assets written to: ner_model/assets


INFO:tensorflow:Assets written to: ner_model/assets


total 544
drwxr-xr-x 2 root root   4096 Apr 15 04:30 assets
-rw-r--r-- 1 root root  19920 Apr 15 04:30 keras_metadata.pb
-rw-r--r-- 1 root root 527307 Apr 15 04:30 saved_model.pb
drwxr-xr-x 2 root root   4096 Apr 15 04:30 variables


In [18]:
model = tf.keras.models.load_model(EXPORT_PATH)

ValueError: ignored

## Problem 1: Unknown loss function

The error above:
<pre>
Unknown loss function: CustomNonPaddingTokenLoss. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.
</pre>

How do we solve this? I will show you how to register custom objects later in this notebook. But the first thing to realize is that WE DO NOT NEED TO EXPORT THE LOSS. The loss is needed only for training, not for deployment.

So, we have a much simpler thing we can do. Just remove the loss.

In [19]:
!rm -rf {EXPORT_PATH}
# remove the custom loss before saving.
temp_model = ner_model
temp_model.compile('adam', loss=None)
temp_model.save(EXPORT_PATH)
!ls -l {EXPORT_PATH}



INFO:tensorflow:Assets written to: ner_model/assets


INFO:tensorflow:Assets written to: ner_model/assets


total 468
drwxr-xr-x 2 root root   4096 Apr 15 04:34 assets
-rw-r--r-- 1 root root  19564 Apr 15 04:34 keras_metadata.pb
-rw-r--r-- 1 root root 448230 Apr 15 04:34 saved_model.pb
drwxr-xr-x 2 root root   4096 Apr 15 04:34 variables


In [20]:
model = tf.keras.models.load_model(EXPORT_PATH)

Success!   Bottom line: remove custom losses before exporting Keras models for deployment.

## Problem 2: Wrong input shape

Now, let's do what TensorFlow Serving (or the managed service that wraps TensorFlow Serving, such as Sagemaker or Keras) does: call the predict() method of the model we just loaded.

In [21]:
sample_input = [
     "Justin Trudeau went to New Delhi India",
     "Vladimir Putin was chased out of Kyiv Ukraine"
]
model.predict(sample_input)

ValueError: ignored

### Capturing the preprocessing

The error above is:
<pre>
ValueError: Exception encountered when calling layer "ner_model_1" (type NERModel).
    
    Could not find matching concrete function to call loaded from the SavedModel. Got:
      Positional arguments (2 total):
        * Tensor("inputs:0", shape=(None,), dtype=string)
        * False
      Keyword arguments: {}
    
     Expected these arguments to match one of the following 4 option(s):
    
    Option 1:
      Positional arguments (2 total):
        * TensorSpec(shape=(None, None), dtype=tf.int64, name='inputs')
        * False
      Keyword arguments: {}
</pre>

Essentially, we are trying to send in a full sentence, but our model was trained on a set of vocabulary ids. That's why the expected input is a set of integers.

We did this in our tf.data() pipeline when we called tf.strings.split(), tf.strings.lower and vocab_lookup_layer()
<pre>
def map_record_to_training_data(record):
    record = tf.strings.split(record, sep="\t")

    tokens = tf.strings.split(record[0])
    tokens = tf.strings.lower(tokens)
    tokens = vocab_lookup_layer(tokens)

    tags = tf.strings.split(record[1])
    tags = label_lookup_layer(tags)
    return tokens, tags
</pre>

We've got to do that.

How?

Well, we could use a [preprocessing container](https://cloud.google.com/blog/topics/developers-practitioners/add-preprocessing-functions-tensorflow-models-and-deploy-vertex-ai) on Vertex AI or the similar functionality on Sagemaker. But that sort of defeats our purpose of having a simple, all-in deployed Keras model.

Instead, reorganize our tf.data input pipeline. What we want is to have a function (here, I call it process_descr) that we can can call from both the tf.data pipeline and from our exported model:

In [24]:
def process_descr(descr):
  # split the string on spaces, and make it a rectangular tensor
  tokens = tf.strings.split(tf.strings.lower(descr))
  tokens = vocab_lookup_layer(tokens)
  max_len = MAX_LEN # max([x.shape[0] for x in tokens])
  input_words = tokens.to_tensor(default_value=0, shape=[tf.rank(tokens), max_len])
  return input_words

def process_tag(tags):
  # split the string on spaces, and make it a rectangular tensor
  tags = label_lookup_layer(tf.strings.split(tags))
  max_len = MAX_LEN # max([x.shape[0] for x in tags])
  tags = tags.to_tensor(default_value=0, shape=[tf.rank(tags), max_len])
  return tags

def map_record_to_training_data(record):
    record = tf.strings.split(record, sep="\t")
    return record[0], record[1]

def read_dataset(filename, batch_size=BATCH_SIZE):
  data = tf.data.TextLineDataset(filename)
  dataset = (
      data
      .map(map_record_to_training_data)
      .padded_batch(batch_size)
      .map(lambda d, t: (process_descr(d), process_tag(t)))
  )
  return dataset

print(list(read_dataset('train.csv', 3).take(1).as_numpy_iterator()))

[(array([[ 1,  2,  3,  4,  5,  6,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 7,  8,  3,  9, 10, 11, 12, 13,  0,  0,  0,  0,  0,  0,  0,  0]]), array([[2, 3, 1, 1, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [2, 3, 1, 1, 1, 1, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0]]))]


I should also come clean here. It's not exactly identical. There was a bit of Python list comprehension:
<pre>
max([x.shape[0] for x in tokens])
</pre>
The equivalent TensorFlow code would be:
<pre>
tf.reduce_max(tf.map_fn(lambda x: x.shape[0]))
</pre>
However, this function is not traceable because the batch size is not known (long story). I replaced it by MAX_LEN. A little inefficient since all batches are being padded to the MAX_LEN even if the batch contains only shorter strings.

### Train model

Let's see that we can train the model just like before (it's the same code, except that all the preprocessing code is in process_descr):

In [26]:
train_dataset = read_dataset('train.csv')
val_dataset = read_dataset('valid.csv')
ner_model = NERModel(num_tags, vocab_size,
                     maxlen=MAX_LEN, embed_dim=EMBED_DIM)
ner_model.compile(optimizer="adam", loss='sparse_categorical_crossentropy')
ner_model.fit(train_dataset, epochs=NUM_EPOCHS)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7febd4ba2350>

### Create a new model with preproocessing layer

Now, when we save the model, we have to ensure that preprocessing method gets called. One way to do that is to define a prediction signature that calls the preprocessing function.  However, that is problematic because if there are errors in your custom model, you won't get the errors (ask me how I know).

A simpler approach that tells you the errors and gives you a chance to fix them is to do what we did with the loss function. Define a new standard model that has a lambda layer that does the preprocessing before feeding it to the custom model and write that out.

In [29]:
temp_model = tf.keras.Sequential([
  tf.keras.Input(shape=[], dtype=tf.string, name='description'),
  tf.keras.layers.Lambda(process_descr),
  ner_model                            
])
temp_model.compile('adam', loss=None)
temp_model.save(EXPORT_PATH)
!ls -l {EXPORT_PATH}



AssertionError: ignored

## Problem 3: Untracked Tensor

The error message above is:
<pre>
Tried to export a function which references 'untracked' resource Tensor
</pre>

What's that about? Here's the issue:

When you write a custom Keras layer or Keras loss or Keras model, you are defining code. But when you are exporting the model, you have to make a flat file out of it. What happens to the code? It's lost! How can the prediction work then?

You need to tell Keras how to pass in all the constructor arguments etc.

The way you do that is by defining a getConfig() method that has all the constructor arguments. Basically, a custom layer that looks like this:
<pre>
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        self.token_emb = keras.layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.pos_emb = keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, inputs):
        maxlen = tf.shape(inputs)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        position_embeddings = self.pos_emb(positions)
        token_embeddings = self.token_emb(inputs)
        return token_embeddings + position_embeddings
</pre>

will have to look like this:
<pre>
@tf.keras.utils.register_keras_serializable() # 1
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim, **kwargs): # 2
        super(TokenAndPositionEmbedding, self).__init__(**kwargs) # 3
        self.token_emb = keras.layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )

        #4 save the constructor parameters for get_config()
        self.maxlen = maxlen
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim

        self.pos_emb = keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, inputs):
        maxlen = tf.shape(inputs)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        position_embeddings = self.pos_emb(positions)
        token_embeddings = self.token_emb(inputs)
        return token_embeddings + position_embeddings

    def get_config(self): # 5
        config = super().get_config()
        # save constructor args
        config['maxlen'] = self.maxlen
        config['vocab_size'] = self.vocab_size
        config['embed_dim'] = self.embed_dim
        return config
</pre>

There are 5 changes:
1. Add the annotation to register the custom layer with Keras
2. Add a **kwargs to the constructor parameter
3. Add a **kwargs to the super constructor
4. Save the constructor parameters as instance fields
5. Define a get_config method that saves the constructor args

Let's do them to our custom Layer and Model classes.

In [30]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from collections import Counter

@tf.keras.utils.register_keras_serializable() # 1
class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1, **kwargs): #2
        super(TransformerBlock, self).__init__(**kwargs) #3

        #4 save the constructor parameters for get_config() to work properly
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim
        self.rate = rate

        self.att = keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim
        )
        self.ffn = keras.Sequential(
            [
                keras.layers.Dense(ff_dim, activation="relu"),
                keras.layers.Dense(embed_dim),
            ]
        )
        self.layernorm1 = keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = keras.layers.Dropout(rate)
        self.dropout2 = keras.layers.Dropout(rate)

    def call(self, inputs, training=False):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

    def get_config(self): #5
        config = super().get_config()
        # save constructor args
        config['embed_dim'] = self.embed_dim
        config['num_heads'] = self.num_heads
        config['ff_dim'] = self.ff_dim
        config['rate'] = self.rate
        return config

@tf.keras.utils.register_keras_serializable() # 1
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim, **kwargs): #2
        super(TokenAndPositionEmbedding, self).__init__(**kwargs) #3

        #4 save the constructor parameters for get_config()
        self.maxlen = maxlen
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim    

        self.token_emb = keras.layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.pos_emb = keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, inputs):
        maxlen = tf.shape(inputs)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        position_embeddings = self.pos_emb(positions)
        token_embeddings = self.token_emb(inputs)
        return token_embeddings + position_embeddings

    def get_config(self): #5
        config = super().get_config()
        # save constructor args
        config['maxlen'] = self.maxlen
        config['vocab_size'] = self.vocab_size
        config['embed_dim'] = self.embed_dim
        return config

@tf.keras.utils.register_keras_serializable() # 1
class NERModel(keras.Model):
    def __init__(
        self, num_tags, vocab_size, maxlen=128, embed_dim=32, num_heads=2, ff_dim=32, **kwargs #2
    ):
        super(NERModel, self).__init__(**kwargs) #3

        #4 save the constructor parameters for get_config()
        self.num_tags = num_tags
        self.vocab_size = vocab_size
        self.maxlen = maxlen
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim

        self.embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
        self.transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
        self.dropout1 = layers.Dropout(0.1)
        self.ff = layers.Dense(ff_dim, activation="relu")
        self.dropout2 = layers.Dropout(0.1)
        self.ff_final = layers.Dense(num_tags, activation="softmax")

    def call(self, inputs, training=False):
        x = self.embedding_layer(inputs)
        x = self.transformer_block(x)
        x = self.dropout1(x, training=training)
        x = self.ff(x)
        x = self.dropout2(x, training=training)
        x = self.ff_final(x)
        return x

    def get_config(self): #5
        config = super().get_config()
        # save constructor args
        config['num_tags'] = self.num_tags 
        config['vocab_size'] = self.vocab_size
        config['maxlen'] = self.maxlen
        config['embed_dim'] = self.embed_dim
        config['num_heads'] = self.num_heads
        config['ff_dim'] = self.ff_dim
        return config

In [31]:
train_dataset = read_dataset('train.csv')
val_dataset = read_dataset('valid.csv')
ner_model = NERModel(num_tags, vocab_size,
                     maxlen=MAX_LEN, embed_dim=EMBED_DIM)
ner_model.compile(optimizer="adam", loss='sparse_categorical_crossentropy')
ner_model.fit(train_dataset, epochs=NUM_EPOCHS)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7febd8823d90>

In [32]:
temp_model = tf.keras.Sequential([
  tf.keras.Input(shape=[], dtype=tf.string, name='description'),
  tf.keras.layers.Lambda(process_descr),
  ner_model                            
])
temp_model.compile('adam', loss=None)
temp_model.save(EXPORT_PATH)
!ls -l {EXPORT_PATH}



AssertionError: ignored

## Problem 4: Untracked resource in Lambda Layer

We still have a problem. Even though we went through and fixed all the custom layers and models, there is still one piece of user-defined code.

The Lamda layer we are using for the preprocessing! It uses the vocabulary, and that vocab_lookup_layer is a resource that is untracked:
<pre>
def process_descr(descr):
  # split the string on spaces, and make it a rectangular tensor
  tokens = tf.strings.split(tf.strings.lower(descr))
  tokens = vocab_lookup_layer(tokens)
  max_len = MAX_LEN # max([x.shape[0] for x in tokens])
  input_words = tokens.to_tensor(default_value=0, shape=[tf.rank(tokens), max_len])
  return input_words
</pre>

Bottom line: Lambda layers are dangerous and it's difficult to realize what resources we are forgetting.

I recommend that you get rid of any Lambda layers and replace them by custom layers.

Let's do that.

In [38]:
@tf.keras.utils.register_keras_serializable(name='descr')
class PreprocLayer(layers.Layer):
    def __init__(self, vocab_lookup_layer, **kwargs):
        super(PreprocLayer, self).__init__(**kwargs)

        # save the constructor parameters for get_config() to work properly
        self.vocab_lookup_layer = vocab_lookup_layer

    def call(self, descr, training=False):
        # split the string on spaces, and make it a rectangular tensor
        tokens = tf.strings.split(tf.strings.lower(descr))
        tokens = self.vocab_lookup_layer(tokens)
        max_len = MAX_LEN # max([x.shape[0] for x in tokens])
        input_words = tokens.to_tensor(default_value=0, shape=[tf.rank(tokens), max_len])
        return input_words

    def get_config(self):
        config = super().get_config()
        # save constructor args
        config['vocab_lookup_layer'] = self.vocab_lookup_layer
        return config

In [39]:
PreprocLayer(vocab_lookup_layer)(['Joe Biden visited Paris'])

<tf.Tensor: shape=(2, 16), dtype=int64, numpy=
array([[34, 35, 19, 20,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]])>

In [40]:
temp_model = tf.keras.Sequential([
  tf.keras.Input(shape=[], dtype=tf.string, name='description'),
  PreprocLayer(vocab_lookup_layer),
  ner_model                            
])
temp_model.compile('adam', loss=None)
temp_model.save(EXPORT_PATH)
!ls -l {EXPORT_PATH}



INFO:tensorflow:Assets written to: ner_model/assets


INFO:tensorflow:Assets written to: ner_model/assets


total 716
drwxr-xr-x 2 root root   4096 Apr 15 05:19 assets
-rw-r--r-- 1 root root  24603 Apr 15 05:51 keras_metadata.pb
-rw-r--r-- 1 root root 696087 Apr 15 05:51 saved_model.pb
drwxr-xr-x 2 root root   4096 Apr 15 05:51 variables


Success!

## Prediction

Let's try predicting

In [41]:
model = tf.keras.models.load_model(EXPORT_PATH)
sample_input = [
     "Justin Trudeau went to New Delhi India",
     "Vladimir Putin was chased out of Kyiv Ukraine"
]
model.predict(sample_input)

array([[[7.6006036e-03, 4.3546227e-03, 9.7820580e-01, 1.3501652e-03,
         5.0268644e-03, 3.4619651e-03],
        [6.8284925e-03, 1.7240658e-02, 9.1373536e-04, 9.6674633e-01,
         5.9596724e-03, 2.3111277e-03],
        [2.7760639e-04, 9.9050373e-01, 1.8299931e-03, 2.9289189e-03,
         7.0543337e-04, 3.7544277e-03],
        [4.2376583e-04, 9.8965186e-01, 1.9375269e-03, 2.5036184e-03,
         8.3148218e-04, 4.6517830e-03],
        [1.4784709e-02, 1.7444469e-03, 8.8171000e-03, 1.1214550e-02,
         9.3378550e-01, 2.9653659e-02],
        [4.8861960e-03, 6.6756480e-03, 9.9279033e-03, 5.1787309e-03,
         7.7538565e-03, 9.6557772e-01],
        [3.6808266e-03, 1.0586737e-02, 9.0764472e-03, 7.3234723e-03,
         1.2727647e-02, 9.5660490e-01],
        [9.9849033e-01, 4.5291032e-05, 2.8122202e-04, 1.8797908e-04,
         3.3309078e-04, 6.6198298e-04],
        [9.9861550e-01, 6.5719221e-05, 2.2601534e-04, 2.7777068e-04,
         3.2932145e-04, 4.8567922e-04],
        [9.9868518e

Success!

## Postprocessing custom layer

But ... now that we know how to do it, let's add a postprocessor of the probabilities ...

We can't just do mapping[3] as we would in Python. We have to do it using TensorFlow functions. I won't bore you with the details, this is the relevant code:

In [43]:
mapping_lookup = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer( 
        tf.constant(list(mapping.keys())),
        tf.constant(list(mapping.values()))),
        default_value='[PAD]')
mapping_lookup.lookup( tf.constant([0, 1, 2]))

<tf.Tensor: shape=(3,), dtype=string, numpy=array([b'[PAD]', b'OUT', b'B-NAME'], dtype=object)>

In [44]:
@tf.keras.utils.register_keras_serializable(name='tagname')
class OutputTagLayer(layers.Layer):
    def __init__(self, mapping, **kwargs):
        super(OutputTagLayer, self).__init__(**kwargs)

        # save the constructor parameters for get_config() to work properly
        self.mapping = mapping

        # construct
        self.mapping_lookup = tf.lookup.StaticHashTable(
            tf.lookup.KeyValueTensorInitializer( 
                tf.range(start=0, limit=len(mapping.values()), delta=1, dtype=tf.int64),
                tf.constant(list(mapping.values()))),
                default_value='[PAD]')

    def call(self, descr_tags, training=False):
        prediction = tf.argmax(descr_tags, axis=-1)
        prediction = self.mapping_lookup.lookup(prediction)
        return prediction

    def get_config(self):
        config = super().get_config()
        # save constructor args
        config['mapping'] = self.mapping
        return config

In [45]:
OutputTagLayer(mapping)( tf.constant([
  [[0.1, 0.2, 0.7], [0.7, 0.2, 0.1], [0.2, 0.7, 0.1]],
  [[0.1, 0.2, 0.7], [0.7, 0.2, 0.1], [0.2, 0.7, 0.1]]
]))

<tf.Tensor: shape=(2, 3), dtype=string, numpy=
array([[b'B-NAME', b'[PAD]', b'OUT'],
       [b'B-NAME', b'[PAD]', b'OUT']], dtype=object)>

Now, let's put the preprocessing and postprocessing layers together:

In [46]:
temp_model = tf.keras.Sequential([
  tf.keras.Input(shape=[], dtype=tf.string, name='description'),
  PreprocLayer(vocab_lookup_layer),
  ner_model,
  OutputTagLayer(mapping)                             
])
temp_model.compile('adam', loss=None)
temp_model.save(EXPORT_PATH)
!ls -l {EXPORT_PATH}



INFO:tensorflow:Assets written to: ner_model/assets


INFO:tensorflow:Assets written to: ner_model/assets


total 740
drwxr-xr-x 2 root root   4096 Apr 15 05:19 assets
-rw-r--r-- 1 root root  25295 Apr 15 05:58 keras_metadata.pb
-rw-r--r-- 1 root root 716812 Apr 15 05:58 saved_model.pb
drwxr-xr-x 2 root root   4096 Apr 15 05:58 variables


## Predict using model

If you deploy this model, this is the input you will have to send it, and the output you will back from the endpoint (wrapped in a JSON envelope of course)

In [47]:
model = tf.keras.models.load_model(EXPORT_PATH)
sample_input = [
     "Justin Trudeau went to New Delhi India",
     "Vladimir Putin was chased out of Kyiv Ukraine"
]
model.predict(sample_input)

array([[b'B-NAME', b'I-NAME', b'OUT', b'OUT', b'B-LOCATION',
        b'I-LOCATION', b'I-LOCATION', b'[PAD]', b'[PAD]', b'[PAD]',
        b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]'],
       [b'B-NAME', b'I-NAME', b'OUT', b'OUT', b'OUT', b'OUT',
        b'B-LOCATION', b'I-LOCATION', b'[PAD]', b'[PAD]', b'[PAD]',
        b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]', b'[PAD]']], dtype=object)

Note how nice this is ... we send full sentences to the model and we get back the tag for each word in the sentence. Very understandable API!

In [48]:
# Sample inference using the trained model
sample_input = [
     "Justin Trudeau went to New Delhi India",
     "Vladimir Putin was chased out of Kyiv Ukraine"
]
predictions = model.predict(sample_input)

# print out
for idx, descr in enumerate(sample_input):
  words = descr.split()
  tags = list(predictions[idx])
  for word, tag in zip(words, tags):
    print(word, '->', tag)
  print('-'*100)

Justin -> b'B-NAME'
Trudeau -> b'I-NAME'
went -> b'OUT'
to -> b'OUT'
New -> b'B-LOCATION'
Delhi -> b'I-LOCATION'
India -> b'I-LOCATION'
----------------------------------------------------------------------------------------------------
Vladimir -> b'B-NAME'
Putin -> b'I-NAME'
was -> b'OUT'
chased -> b'OUT'
out -> b'OUT'
of -> b'OUT'
Kyiv -> b'B-LOCATION'
Ukraine -> b'I-LOCATION'
----------------------------------------------------------------------------------------------------


Copyright 2022 Valliappa Lakshmanan

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.