# BERT Embeddings Serverless Function
This notebook demonstrates how to deploy a pretrained BERT model as a serverless function that returns embeddings for given text sequences. Embeddings are meaningful, contextual representations of text in the form of ndarrays, and they are frequently used as input to various learning tasks in NLP.

## Example

In [1]:
%load_ext nb_black
import nuclio
import json
import pickle
from bert_embeddings import init_context, handler


In [2]:
# Mocked Nuclio context
context = nuclio.Context()
event = nuclio.Event(body=json.dumps(['John loves Mary']))

# Mocked context initialization
init_context(context)
outputs = pickle.loads(handler(context, event))


This is a good opportunity to inspect the outputs of this BERT model. It returns two outputs: the first is a contextual embedding for each token in the input sequence, and the second is a pooled embedding for the complete sequence.

In [3]:
print(f'embeddings per token shape: {outputs[0].shape}, pooled embeddings shape: {outputs[1].shape}')

embeddings per token shape: (1, 5, 768), pooled embeddings shape: (1, 768)



As seen, both outputs have a first dimension of size 1, which corresponds to the single sequence we passed as input, "John loves Mary". The last dimension of both is 768, which is the embedding dimension for this default BERT configuration. Note that the first output has an intermediate dimension of size 5: the three tokens of the input sequence plus the two special tokens that the tokenizer adds to mark the beginning and end of a sequence.
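The token count can be illustrated with a minimal sketch. It assumes a plain whitespace split and the conventional `[CLS]` and `[SEP]` marker names; the `toy_tokenize` helper is hypothetical and is not the tokenizer used by `bert_embeddings`. Real BERT tokenizers use WordPiece and may split a single word into several sub-word pieces, so counts can differ for other inputs.

```python
# Simplified illustration only: whitespace splitting is an assumption,
# real BERT tokenizers use WordPiece and may produce more tokens per word.
CLS, SEP = "[CLS]", "[SEP]"


def toy_tokenize(text):
    """Split on whitespace and wrap the result with start/end special tokens."""
    return [CLS] + text.split() + [SEP]


tokens = toy_tokenize("John loves Mary")
print(tokens)       # ['[CLS]', 'John', 'loves', 'Mary', '[SEP]']
print(len(tokens))  # 5
```

Under these assumptions the three words plus two special tokens give the intermediate dimension of 5 seen in the per-token output.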

## Deploy as a serverless function

In [4]:
import yaml

with open('item.yaml') as item_file:
    items = yaml.load(item_file, Loader=yaml.FullLoader)


In [5]:
from mlrun import code_to_function
fn = code_to_function(name=items["name"],
                      kind=items["spec"]["kind"],
                      filename=items["spec"]["filename"],
                      description=items["description"],
                      categories=items["categories"],
                      labels=items["labels"],
                      requirements=items["spec"]["requirements"],)

fn.export("bert_embeddings.yaml")

> 2021-02-16 08:26:35,543 [info] function spec saved to path: bert_embeddings.yaml


<mlrun.runtimes.function.RemoteRuntime at 0x7f73a0284a50>


In [6]:
addr = fn.deploy(project='nlp-servers')

> 2021-02-16 08:26:35,562 [info] Starting remote function deploy
2021-02-16 08:26:35  (info) Deploying function
2021-02-16 08:26:35  (info) Building
2021-02-16 08:26:35  (info) Staging files and preparing base images
2021-02-16 08:26:35  (info) Building processor image
2021-02-16 08:26:37  (info) Build complete
2021-02-16 08:26:56  (info) Function deploy complete
> 2021-02-16 08:26:56,350 [info] function deployed, address=default-tenant.app.vmdev36.lab.iguazeng.com:30921



### Test the function via HTTP request

In [7]:
import requests


event_data = ['the quick brown fox jumps over the lazy dog', 'Hello I am Jacob']
resp = requests.post(addr, json=json.dumps(event_data))


In [8]:
output_embeddings = pickle.loads(resp.content)


In [9]:
print(f'embeddings per token shape: {output_embeddings[0].shape}, pooled embeddings shape: {output_embeddings[1].shape}')

embeddings per token shape: (2, 11, 768), pooled embeddings shape: (2, 768)



Now the first dimension of both outputs is 2, since we passed in two sequences. The intermediate dimension of the first output is the maximum number of tokens across all input sequences (here 11: the nine words of the first sentence plus the two special tokens). Sequences with fewer tokens are padded with zero values.
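The padding behavior can be sketched as follows. This is an illustration under stated assumptions, not the function's actual implementation: the `pad_batch` helper, the `PAD_ID` value of 0, and the token ids are all hypothetical, and padding is assumed to be applied to the token id lists before they reach the model.

```python
PAD_ID = 0  # assumed padding value, matching the "zero values" described above


def pad_batch(batch, pad_id=PAD_ID):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


# Hypothetical token ids: 11 tokens for the first sentence, 6 for the second
# (word count plus two special tokens, assuming one token per word).
batch = [list(range(1, 12)), list(range(1, 7))]
padded = pad_batch(batch)

print(len(padded), len(padded[0]), len(padded[1]))  # 2 11 11
print(padded[1])  # [1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0]
```

After padding, every sequence in the batch has the same length, which is why the per-token output can be returned as a single array of shape `(2, 11, 768)`.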