## BERT Embeddings Serverless Function
This notebook deploys a pretrained BERT model as a serverless function that returns embeddings for given text sequences. Embeddings are meaningful, contextual representations of text, returned here as ndarrays, that are frequently used as input to learning tasks in NLP.

In [1]:
# nuclio: ignore
import nuclio

In [None]:
%%nuclio cmd -c
pip install torch
pip install transformers

### Function code

In [2]:
from transformers import BertModel, BertTokenizer
import torch
from typing import Union, List
import json
import pickle

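def init_context(context):
    # Load the tokenizer and model once at startup, not on every request
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')
    model.eval()  # inference mode: disables dropout
    setattr(context.user_data, 'tokenizer', tokenizer)
    setattr(context.user_data, 'model', model)

def handler(context, event):
    # The request body is a JSON-encoded list of strings
    docs = json.loads(event.body)
    docs = [doc.lower() for doc in docs]
    # Pad the sequences in the batch to a common length and return PyTorch tensors
    docs = context.user_data.tokenizer.batch_encode_plus(docs, pad_to_max_length=True, return_tensors='pt')
    with torch.no_grad():  # gradients are not needed for inference
        embeddings = context.user_data.model(**docs)
    # embeddings[0]: per-token embeddings, embeddings[1]: pooled embeddings
    embeddings = [embeddings[0].numpy(), embeddings[1].numpy()]
    return pickle.dumps(embeddings)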

In [None]:
# nuclio: end-code

### Local test

In [3]:
event = nuclio.Event(body=json.dumps(['John loves Mary']))
init_context(context)
outputs = pickle.loads(handler(context, event))

This is a good opportunity to inspect the outputs of this BERT model. It returns two outputs: the first is a contextual embedding for each token in the input sequence, and the second is a pooled embedding for the complete sequence.

In [4]:
print(f'embeddings per token shape: {outputs[0].shape}, pooled embeddings shape: {outputs[1].shape}')

embeddings per token shape: (1, 5, 768), pooled embeddings shape: (1, 768)


As seen, both outputs have a first dimension of size 1. This corresponds to the single sequence we passed as input, "John loves Mary". The last dimension of both is of size 768, which is the embedding dimension for this default configuration of BERT. Note that the first output has an intermediate dimension of size 5, which corresponds to the number of tokens in the input sequence after the tokenizer adds two special tokens marking the beginning and end of the sequence.
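
To make the token count concrete, the sketch below mimics how the tokenizer frames a sequence with the special tokens `[CLS]` and `[SEP]`. It is a simplification that assumes whitespace splitting, whereas the real `BertTokenizer` uses WordPiece and may split a single word into several pieces. `frame_tokens` is a hypothetical helper, not part of `transformers`.

```python
def frame_tokens(text):
    """Lowercase, split on whitespace, and add BERT's special tokens.

    Simplified stand-in for BertTokenizer: real WordPiece tokenization
    may break one word into several sub-word tokens.
    """
    return ['[CLS]'] + text.lower().split() + ['[SEP]']

tokens = frame_tokens('John loves Mary')
print(tokens)       # ['[CLS]', 'john', 'loves', 'mary', '[SEP]']
print(len(tokens))  # 5, the intermediate dimension of the first output
```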

### Deploy as serverless function

In [5]:
from mlrun import code_to_function
fn = code_to_function("bert-embeddings", kind="nuclio",
                      description="Get BERT based embeddings for given text",
                      categories=["NLP", "BERT", "embeddings"],
                      labels = {"author": "roye", "framework": "pytorch"},
                      code_output='.')

fn.export("function.yaml")

[mlrun] 2020-06-11 13:16:26,161 function spec saved to path: function.yaml


<mlrun.runtimes.function.RemoteRuntime at 0x7f1649c00128>

In [6]:
addr = fn.deploy(project='nlp-servers')

[mlrun] 2020-06-11 13:16:30,576 deploy started
[nuclio] 2020-06-11 13:16:38,751 (info) Build complete
[nuclio] 2020-06-11 13:16:58,965 (info) Function deploy complete
[nuclio] 2020-06-11 13:16:58,972 done updating nlp-servers-bert-embeddings, function address: 192.168.224.208:31596


#### Test the function via http request

In [7]:
import requests


event_data = ['the quick brown fox jumps over the lazy dog', 'Hello I am Jacob']
resp = requests.post(addr, json=json.dumps(event_data))

In [8]:
output_embeddings = pickle.loads(resp.content)

In [9]:
print(f'embeddings per token shape: {output_embeddings[0].shape}, pooled embeddings shape: {output_embeddings[1].shape}')

embeddings per token shape: (2, 11, 768), pooled embeddings shape: (2, 768)


Now the first dimension of both outputs is of size 2, since we passed in two sequences. The intermediate dimension of the first output is the maximal number of tokens across all input sequences. Sequences with fewer tokens are padded up to that length with the padding token, whose ID is 0.
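
The padding behaviour can be illustrated without loading the model. The following is a minimal sketch, assuming one token per word plus the two special tokens. `fake_encode` and `pad_batch` are hypothetical helpers, and the IDs of ordinary words are placeholders. The IDs 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` entries of the `bert-base-uncased` vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def fake_encode(text):
    """Hypothetical encoder: placeholder IDs, one token per word."""
    words = text.lower().split()
    return [CLS_ID] + [1000 + i for i in range(len(words))] + [SEP_ID]

def pad_batch(batch):
    """Pad every ID list to the longest one and build attention masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [PAD_ID] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return input_ids, attention_mask

batch = [fake_encode('the quick brown fox jumps over the lazy dog'),
         fake_encode('Hello I am Jacob')]
input_ids, attention_mask = pad_batch(batch)
print(len(input_ids[0]), len(input_ids[1]))  # 11 11
print(attention_mask[1])  # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

The per-token output still contains a row for every padded position, so a client that averages token embeddings would need to skip those rows, for example by using a mask like the one above.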