## BERT Embeddings Serverless Function
This notebook demonstrates how to deploy a pretrained BERT model as a serverless function that outputs embeddings for given text sequences. Embeddings are meaningful, contextual representations of text in the form of ndarrays, and they are frequently used as input to various learning tasks in NLP.

In [13]:
# import packages
import mlrun
from mlrun import import_function

import nuclio
import json
import pickle

In [2]:
# import the function
bert_function = import_function("hub://bert_embeddings").apply(mlrun.platforms.auto_mount())

In [3]:
# deploy the serving function
serving_address = bert_function.deploy(project='bert-embeddings')

> 2021-07-26 14:15:13,020 [info] Starting remote function deploy
2021-07-26 14:15:13  (info) Deploying function
2021-07-26 14:15:13  (info) Building
2021-07-26 14:15:13  (info) Staging files and preparing base images
2021-07-26 14:15:13  (info) Building processor image
2021-07-26 14:20:13  (info) Build complete
> 2021-07-26 14:20:29,412 [info] function deployed, address=default-tenant.app.app-lab-testing.iguazio-cd2.com:31940


This is a good opportunity to look at the outputs of this BERT model. It returns two outputs: the first is a contextual embedding for each token in the input sequence, and the second is a pooled embedding for the complete sequence.

#### Test the function via an HTTP request

In [14]:
import requests


event_data = ['the quick brown fox jumps over the lazy dog', 'Hello I am Jacob']
resp = requests.post(serving_address, json=json.dumps(event_data))
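
One detail of the request above is worth noting: `requests` serializes whatever is passed as `json=` itself, so passing the result of `json.dumps(...)` encodes the payload twice. The sketch below reproduces that with the standard library only. Whether the deployed handler expects the doubly encoded form depends on its code, which is not shown here, so treat this as an illustration of the wire format rather than a description of the function.

```python
import json

event_data = ['the quick brown fox jumps over the lazy dog', 'Hello I am Jacob']

# What the notebook passes as `json=`: a str containing JSON text.
inner = json.dumps(event_data)

# Approximation of what `requests` does with the `json=` argument:
# it serializes the given object once more to build the request body.
body = json.dumps(inner)

# Decoding once yields a str, not a list ...
decoded_once = json.loads(body)
# ... and decoding a second time recovers the original list.
decoded_twice = json.loads(decoded_once)

print(type(decoded_once).__name__, type(decoded_twice).__name__)
```

A receiver that decodes the body only once would therefore see a string and would need a second `json.loads` to get the list of sentences back.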

In [15]:
output_embeddings = pickle.loads(resp.content)
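
The call above treats the response body as a pickled Python object. A minimal round-trip sketch follows, using small nested lists as a hypothetical stand-in for the two arrays the function returns (the real payload is produced by the deployed function and is not reproduced here):

```python
import pickle

# Hypothetical stand-in for the function's return value: a tuple of
# (per-token embeddings, pooled embeddings). Tiny nested lists are used
# instead of real arrays, purely for illustration.
fake_outputs = (
    [[[0.1, 0.2], [0.3, 0.4]]],  # 1 sequence, 2 tokens, 2 features
    [[0.5, 0.6]],                # 1 sequence, 2 features
)

# Server side (assumed): serialize the object to bytes.
payload = pickle.dumps(fake_outputs)

# Client side: rebuild the object from the raw response bytes.
restored = pickle.loads(payload)

token_embeddings, pooled_embeddings = restored
print(len(token_embeddings[0]), len(pooled_embeddings[0]))
```

Because `pickle.loads` can execute arbitrary code while deserializing, it should only be applied to responses from an endpoint you trust.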

In [16]:
print(f'embeddings per token shape: {output_embeddings[0].shape}, pooled embeddings shape: {output_embeddings[1].shape}')

embeddings per token shape: (2, 11, 768), pooled embeddings shape: (2, 768)


The first dimension of both outputs has size two, since we passed in two sequences. The middle dimension of the first output is the maximal number of tokens across all input sequences; sequences with fewer tokens are padded with zero values.
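
The padding behaviour can be illustrated without the model. The sketch below is a deliberate simplification: it splits on whitespace instead of using BERT's WordPiece tokenizer, wraps each sequence in the `[CLS]` and `[SEP]` markers that BERT adds, and relies on a hypothetical `pad_batch` helper that is not part of the deployed function. For these two sentences it gives a maximal length of 11, which is consistent with the shape printed earlier, but it will not match the real tokenizer for arbitrary text.

```python
PAD = "[PAD]"

def pad_batch(sentences):
    """Tokenize naively and pad every sequence to the longest one."""
    # Simplified tokenization: whitespace split plus BERT's special markers.
    tokenized = [["[CLS]"] + s.split() + ["[SEP]"] for s in sentences]
    max_len = max(len(tokens) for tokens in tokenized)
    # Right-pad shorter sequences so the batch is rectangular.
    return [tokens + [PAD] * (max_len - len(tokens)) for tokens in tokenized]

batch = pad_batch(['the quick brown fox jumps over the lazy dog', 'Hello I am Jacob'])
shape = (len(batch), len(batch[0]))
print(shape)  # (2, 11)
```

The second sentence has 6 tokens under this scheme, so it receives 5 padding entries to reach the batch length of 11.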