## BERT Embeddings Serverless Function
This notebook presents the deployment of a pretrained BERT model as a serverless function that outputs embeddings for given text sequences. Embeddings are meaningful, contextual representations of text in the form of ndarrays, and they are frequently used as input to various learning tasks in NLP.

### Example

In [1]:
import nuclio
import json
import pickle
from bert_embeddings import init_context, handler

In [2]:
context = nuclio.Context()
event = nuclio.Event(body=json.dumps(['John loves Mary']))
init_context(context)
outputs = pickle.loads(handler(context, event))

This is a good opportunity to look at what this BERT model returns. It produces two outputs: the first is a contextual embedding for each token in the input sequence, and the second is a pooled embedding for the complete sequence.

In [3]:
print(f'embeddings per token shape: {outputs[0].shape}, pooled embeddings shape: {outputs[1].shape}')

embeddings per token shape: (1, 5, 768), pooled embeddings shape: (1, 768)


As shown, both outputs have a first dimension of size 1, which corresponds to the single sequence we passed as input, "John loves Mary". The last dimension of both is 768, the embedding dimension for this default BERT configuration. Note that the first output has an intermediate dimension of size 5, which corresponds to the number of tokens in the input sequence after the tokenizer adds two special tokens marking the beginning and end of the sequence.
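The token count of 5 can be reproduced with a rough sketch. This is a simplification, not the model's tokenizer: it assumes plain whitespace splitting, whereas BERT's WordPiece tokenizer may break rarer words into several sub-word pieces. The helper name `add_special_tokens` is hypothetical; `[CLS]` and `[SEP]` are the special tokens BERT uses to mark the beginning and end of a sequence.

```python
def add_special_tokens(text):
    """Approximate tokenization by whitespace splitting (an assumption;
    the real WordPiece tokenizer may split words into sub-word pieces)."""
    words = text.split()
    # [CLS] marks the beginning of the sequence, [SEP] marks its end.
    return ["[CLS]"] + words + ["[SEP]"]


tokens = add_special_tokens("John loves Mary")
print(tokens, len(tokens))
```

Three words plus two special tokens give the intermediate dimension of 5 seen above. For inputs containing uncommon words, the real token count may be higher than this sketch suggests.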

### Deploy as serverless function

In [4]:
from mlrun import code_to_function
fn = code_to_function("bert_embeddings", kind="nuclio",
                      filename="bert_embeddings.py",
                      description="Get BERT based embeddings for given text",
                      categories=["NLP", "BERT", "embeddings"],
                      labels={"author": "roye", "framework": "pytorch"},
                      requirements=["torch==1.6.0", "transformers==3.0.1", "nuclio"])

fn.export("bert_embeddings.yaml")

> 2021-02-15 10:33:26,557 [info] function spec saved to path: bert_embeddings.yaml


<mlrun.runtimes.function.RemoteRuntime at 0x7f23ae137d90>

In [5]:
addr = fn.deploy(project='nlp-servers')

> 2021-02-15 10:33:26,571 [info] Starting remote function deploy
2021-02-15 10:33:26  (info) Deploying function
2021-02-15 10:33:26  (info) Building
2021-02-15 10:33:26  (info) Staging files and preparing base images
2021-02-15 10:33:26  (info) Building processor image
2021-02-15 10:33:28  (info) Build complete
> 2021-02-15 10:33:47,608 [info] function deployed, address=default-tenant.app.vmdev36.lab.iguazeng.com:30921


#### Test the function via HTTP request

In [6]:
import requests


event_data = ['the quick brown fox jumps over the lazy dog', 'Hello I am Jacob']
resp = requests.post(addr, json=json.dumps(event_data))

In [7]:
output_embeddings = pickle.loads(resp.content)

In [8]:
print(f'embeddings per token shape: {output_embeddings[0].shape}, pooled embeddings shape: {output_embeddings[1].shape}')

embeddings per token shape: (2, 11, 768), pooled embeddings shape: (2, 768)


The first dimension of both outputs is now 2, since we passed in two sequences. The intermediate dimension of the first output is the maximum number of tokens across all input sequences. Sequences with fewer tokens are padded with zero values to reach that length.
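The padding step can be illustrated with a minimal sketch. It is not the function's actual implementation: `pad_batch` and `PAD_ID` are hypothetical names, the token ids are made up, and the second sequence length of 6 assumes each word of "Hello I am Jacob" maps to a single token.

```python
PAD_ID = 0  # assumed padding value, matching the "zero values" described above


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence of token ids to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Made-up token ids: 11 tokens for the first sequence, 6 for the second.
batch = [list(range(1, 12)), list(range(1, 7))]
padded = pad_batch(batch)
print([len(seq) for seq in padded])  # [11, 11]
```

After padding, every sequence in the batch has the same length, which is why the per-token output can be returned as a single array of shape (2, 11, 768).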