Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

### PyTorch Pretrained BERT on AzureML for NER 
This notebook contains an end-to-end walkthrough of using Azure Machine Learning service to fine-tune BERT for named entity recognition (NER). It runs the PyTorch reimplementation, developed by Hugging Face, of Google's TensorFlow repository for the BERT model. The code is based on this [amazing tutorial](https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/?unapproved=907&moderation-hash=0d668913f6fe8b07d55eb1c6b009dbe0#comment-907) by Tobias Sterbak.

You will find the following contents:

* Download the NER dataset on the remote compute and store it in Azure storage
* Speed up fine-tuning BERT on the NER dataset with AzureML GPU clusters
* Further fine-tune the NER model with the AzureML hyperparameter optimizer

### Prerequisites

Understand the architecture and terms introduced by Azure Machine Learning (AML)

Install the Python SDK, making sure to include the `notebooks` and `contrib` extras:

conda create -n azureml -y python=3.6
source activate azureml
pip install --upgrade azureml-sdk[notebooks,contrib] 
conda install ipywidgets
jupyter nbextension install --py --user azureml.widgets
jupyter nbextension enable azureml.widgets --user --py
You will need to restart Jupyter after this. Detailed instructions are [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python/?WT.mc_id=bert-notebook-abornst).

If you need a free trial account to get started, you can get one [here](https://azure.microsoft.com/en-us/offers/ms-azr-0044p/?WT.mc_id=bert-notebook-abornst).

## Initialize workspace

To create or access an Azure ML Workspace, you will need to import the AML library and the following information:
* A name for your workspace
* Your subscription id
* The resource group name

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace/?WT.mc_id=bert-notebook-abornst) object from the existing workspace you created in the Prerequisites step or create a new one. 

In [1]:
from azureml.core import Workspace

# subscription_id = ''
# resource_group  = ''
# workspace_name  = ''
#     ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
#     ws.write_config()

try:
    ws = Workspace.from_config()
    print(ws.name, ws.location, ws.resource_group, ws.location, sep='\t')
    print('Library configuration succeeded')
except Exception:
    print('Workspace not found')

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FCMPHUU4B to authenticate.
Interactive authentication successfully completed.
ariaiwork	eastus	ai_work	eastus
Library configuration succeeded
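
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file, which is what the commented-out `ws.write_config()` call produces. As a rough sketch of what that file holds, the snippet below writes one by hand with the standard library only. The three values are placeholders, the helper name `write_workspace_config` is made up for this example, and the `.azureml/` directory is an assumption about where the SDK looks by default, so check the SDK documentation for your version.

```python
import json
import os
import tempfile


def write_workspace_config(root, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three fields the notebook asks for."""
    config_dir = os.path.join(root, ".azureml")  # assumed default location
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config_path


# Placeholder values; replace them with your own before using the file.
demo_root = tempfile.mkdtemp()
config_path = write_workspace_config(
    demo_root, "<subscription-id>", "<resource-group>", "<workspace-name>"
)
with open(config_path) as f:
    loaded = json.load(f)
print(sorted(loaded))
```

Keeping these values in a file rather than in the notebook avoids committing subscription details alongside the code.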


## Compute

There are two compute options: run-once (preview) and persistent compute. For this demo we will use persistent compute; to learn more about run-once compute, check out the docs.

In [2]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your GPU cluster
cluster_name = "cluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6',
                                                           min_nodes=1,
                                                           max_nodes=4)
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)

cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned


## Upload Data

The dataset we are using comes from Kaggle and can be downloaded [here](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus). To tag your own dataset, check out the open source [doccano](https://towardsdatascience.com/text-annotation-on-a-budget-with-azure-web-apps-doccano-b29f479c0c54) annotation tool, which has one-click deployment to Azure. Be sure to export to [IOB format](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)), and replace **tags_vals** and **tag2idx** in the train script below with your own custom IOB tag values.
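
When swapping in custom IOB tags, the two structures have to agree: `tags_vals[i]` must be the tag that `tag2idx` maps to `i`, otherwise predictions are decoded to the wrong names. One way to guarantee this, sketched below, is to derive both from the labels themselves. The helper `build_tag_maps` and the sample labels are hypothetical; the extra `"X"` tag is the placeholder the train script assigns to WordPiece continuation tokens.

```python
def build_tag_maps(label_sequences, extra_tags=("X",)):
    """Build tag2idx and tags_vals so that tags_vals[tag2idx[t]] == t."""
    unique_tags = sorted({tag for seq in label_sequences for tag in seq})
    for tag in extra_tags:
        if tag not in unique_tags:
            unique_tags.append(tag)
    tag2idx = {tag: idx for idx, tag in enumerate(unique_tags)}
    tags_vals = list(unique_tags)
    return tag2idx, tags_vals


# Hypothetical labels, one list per sentence, as exported in IOB format
example_labels = [
    ["B-org", "O", "O", "B-geo"],
    ["B-per", "I-per", "O"],
]
tag2idx, tags_vals = build_tag_maps(example_labels)

# The round trip index -> tag -> index must be the identity.
assert all(tags_vals[idx] == tag for tag, idx in tag2idx.items())
print(tag2idx)
```

The same pair of structures has to be used in the scoring script, since the model only outputs indices.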

In [3]:
ds = ws.get_default_datastore()

In [4]:
ds.upload_files(["ner_dataset.csv"], relative_root='.')

Target already exists. Skipping upload for ner_dataset.csv


$AZUREML_DATAREFERENCE_workspaceblobstore

## Train File

In [5]:
%%writefile train.py

import argparse
import os
import pandas as pd
import numpy as np
from tqdm import tqdm, trange

import torch
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from pytorch_pretrained_bert import BertTokenizer, BertConfig
from pytorch_pretrained_bert import BertForTokenClassification, BertAdam

from seqeval.metrics import f1_score


from azureml.core import Run

class SentenceGetter(object):
    
    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False
        agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Word"].values.tolist(),
                                                           s["POS"].values.tolist(),
                                                           s["Tag"].values.tolist())]
        self.grouped = self.data.groupby("Sentence #").apply(agg_func)
        self.sentences = [s for s in self.grouped]
    
    def get_next(self):
        try:
            s = self.grouped["Sentence: {}".format(self.n_sent)]
            self.n_sent += 1
            return s
        except:
            return None


# let the user feed in 3 parameters: the location of the data files (from the datastore), the learning rate, and the number of epochs
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--learning_rate', type=float, default=3e-5, help='learning rate')
parser.add_argument('--epochs', type=int, default=5)
args = parser.parse_args()

data_folder = args.data_folder
print('Data folder:', data_folder)

# load data
data = pd.read_csv(os.path.join(data_folder, "ner_dataset.csv"), encoding="latin1").fillna(method="ffill")
getter = SentenceGetter(data)
sentences = [" ".join([s[0] for s in sent]) for sent in getter.sentences]
labels = [[s[2] for s in sent] for sent in getter.sentences]

# For a custom dataset, replace tags_vals and tag2idx with your own custom IOB tag values.
# tags_vals[i] must be the tag that tag2idx maps to index i.
tags_vals = ['B-art',
 'I-gpe',
 'B-per',
 'B-org',
 'B-tim',
 'B-nat',
 'B-eve',
 'I-geo',
 'I-per',
 'B-geo',
 'I-art',
 'B-gpe',
 'I-eve',
 'I-org',
 'O',
 'I-tim',
 'I-nat',
 'X']

tag2idx = {'B-art': 0,
 'B-eve': 6,
 'B-geo': 9,
 'B-gpe': 11,
 'B-nat': 5,
 'B-org': 3,
 'B-per': 2,
 'B-tim': 4,
 'I-art': 10,
 'I-eve': 12,
 'I-geo': 7,
 'I-gpe': 1,
 'I-nat': 16,
 'I-org': 13,
 'I-per': 8,
 'I-tim': 15,
 'O': 14,
 'X': 17}

# Apply Bert

MAX_LEN = 75
bs = 32

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()

print('Device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu')

# Now we tokenize all sentences

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
tokenized_texts = [tokenizer.tokenize(sent) for sent in sentences]

# Label the WordPiece continuation tokens (those starting with "##") with "X"
# so that the labels stay aligned with the subword tokens.

num_sent = len(labels)
for sent_id in range(num_sent):
    tokens_len = len(tokenized_texts[sent_id])
    for i in range(tokens_len):
        if tokenized_texts[sent_id][i][:2] == "##":
            labels[sent_id].insert(i, "X")


# Next, we cut and pad the token and label sequences to our desired length.

input_ids = pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
                          maxlen=MAX_LEN, dtype="long", truncating="post", padding="post")

tags = pad_sequences([[tag2idx.get(l) for l in lab] for lab in labels],
                     maxlen=MAX_LEN, value=tag2idx["O"], padding="post",
                     dtype="long", truncating="post")

# The Bert model supports something called attention_mask, which is similar to the masking in keras. So here we create the mask to ignore the padded elements in the sequences.

attention_masks = [[float(i>0) for i in ii] for ii in input_ids]

# Now we split the dataset to use 10% to validate the model.

tr_inputs, val_inputs, tr_tags, val_tags = train_test_split(input_ids, tags, 
                                                            random_state=2018, test_size=0.1)
tr_masks, val_masks, _, _ = train_test_split(attention_masks, input_ids,
                                             random_state=2018, test_size=0.1)

# Since we’re operating in pytorch, we have to convert the dataset to torch tensors.

tr_inputs = torch.tensor(tr_inputs)
val_inputs = torch.tensor(val_inputs)
tr_tags = torch.tensor(tr_tags)
val_tags = torch.tensor(val_tags)
tr_masks = torch.tensor(tr_masks)
val_masks = torch.tensor(val_masks)

# The last step is to define the dataloaders. We shuffle the data at training time with the RandomSampler and at test time we just pass them sequentially with the SequentialSampler.

train_data = TensorDataset(tr_inputs, tr_masks, tr_tags)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=bs)

valid_data = TensorDataset(val_inputs, val_masks, val_tags)
valid_sampler = SequentialSampler(valid_data)
valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=bs)

# Setup the Bert model for finetuning
# The pytorch-pretrained-bert package provides a BertForTokenClassification class for token-level predictions. 
# BertForTokenClassification is a fine-tuning model that wraps BertModel and adds token-level classifier on top of the BertModel. 
# The token-level classifier is a linear layer that takes as input the last hidden state of the sequence.
# We load the pre-trained bert-base-uncased model and provide the number of possible labels.

model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=len(tag2idx))

# Now we have to move the model parameters to the selected device (the GPU when available)

model.to(device)

# Before we can start the fine-tuning process, we have to setup the optimizer and add the parameters it should update. A common choice is the Adam optimizer. 
# We also add some weight_decay as regularization to the main weight matrices. 
# If you have limited resources, you can also try to just train the linear classifier on top of Bert and keep all other weights fixed.
# This will still give you a good performance.

FULL_FINETUNING = True
if FULL_FINETUNING:
    param_optimizer = list(model.named_parameters())
    no_decay = ['bias', 'gamma', 'beta']
    # torch.optim.Adam expects the key 'weight_decay' in each parameter group
    optimizer_grouped_parameters = [
        {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
         'weight_decay': 0.01},
        {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
         'weight_decay': 0.0}
    ]
else:
    param_optimizer = list(model.classifier.named_parameters()) 
    optimizer_grouped_parameters = [{"params": [p for n, p in param_optimizer]}]
optimizer = Adam(optimizer_grouped_parameters, lr=args.learning_rate)

# Finetune Bert
# First we define some metrics we want to track while training.
# We use the f1_score from the seqeval package, which scores whole entity spans,
# and a simple token-level accuracy comparable to the accuracy in keras.


def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=2).flatten()
    labels_flat = labels.flatten()
    x_indecies = [n for n,x in enumerate(labels_flat) if x==tag2idx["X"]]
    pred_flat = np.asarray([pred_flat[i] for i in range(len(pred_flat)) if i not in x_indecies])
    labels_flat = np.asarray([labels_flat[i] for i in range(len(labels_flat)) if i not in x_indecies])
        
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

# Finally, we can fine-tune the model. A few epochs should be enough. The BERT paper suggests 2 to 4 epochs.

epochs = args.epochs
max_grad_norm = 1.0

for _ in trange(epochs, desc="Epoch"):
    # TRAIN loop
    model.train()
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    for step, batch in enumerate(train_dataloader):
        # add batch to gpu
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch
        # forward pass
        loss = model(b_input_ids, token_type_ids=None,
                     attention_mask=b_input_mask, labels=b_labels)
        # backward pass
        loss.backward()
        # track train loss
        tr_loss += loss.item()
        nb_tr_examples += b_input_ids.size(0)
        nb_tr_steps += 1
        # gradient clipping
        torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=max_grad_norm)
        # update parameters
        optimizer.step()
        model.zero_grad()
    # print train loss per epoch
    print("Train loss: {}".format(tr_loss/nb_tr_steps))
    # VALIDATION on validation set
    model.eval()
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0
    predictions , true_labels = [], []
    for batch in valid_dataloader:
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch
        
        with torch.no_grad():
            tmp_eval_loss = model(b_input_ids, token_type_ids=None,
                                  attention_mask=b_input_mask, labels=b_labels)
            logits = model(b_input_ids, token_type_ids=None,
                           attention_mask=b_input_mask)
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        predictions.extend([list(p) for p in np.argmax(logits, axis=2)])
        true_labels.append(label_ids)
        
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        
        eval_loss += tmp_eval_loss.mean().item()
        eval_accuracy += tmp_eval_accuracy
        
        nb_eval_examples += b_input_ids.size(0)
        nb_eval_steps += 1
    eval_loss = eval_loss/nb_eval_steps
    print("Validation loss: {}".format(eval_loss))
    print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))
    pred_tags = [tags_vals[p_i] for p in predictions for p_i in p]
    valid_tags = [tags_vals[l_ii] for l in true_labels for l_i in l for l_ii in l_i]
    
    x_indecies = [n for n,x in enumerate(valid_tags) if x=="X"]
    pred_tags = [pred_tags[i] for i in range(len(pred_tags)) if i not in x_indecies]
    valid_tags = [valid_tags[i] for i in range(len(valid_tags)) if i not in x_indecies]


model.eval()

# save model
os.makedirs('outputs', exist_ok=True)
torch.save(model.state_dict(), 'outputs/bert_ner.model')

predictions = []
true_labels = []
eval_loss, eval_accuracy = 0, 0
nb_eval_steps, nb_eval_examples = 0, 0
for batch in valid_dataloader:
    batch = tuple(t.to(device) for t in batch)
    b_input_ids, b_input_mask, b_labels = batch

    with torch.no_grad():
        tmp_eval_loss = model(b_input_ids, token_type_ids=None,
                              attention_mask=b_input_mask, labels=b_labels)
        logits = model(b_input_ids, token_type_ids=None,
                       attention_mask=b_input_mask)
        
    logits = logits.detach().cpu().numpy()
    predictions.extend([list(p) for p in np.argmax(logits, axis=2)])
    label_ids = b_labels.to('cpu').numpy()
    true_labels.append(label_ids)
    tmp_eval_accuracy = flat_accuracy(logits, label_ids)

    eval_loss += tmp_eval_loss.mean().item()
    eval_accuracy += tmp_eval_accuracy

    nb_eval_examples += b_input_ids.size(0)
    nb_eval_steps += 1
    
pred_tags = [[tags_vals[p_i] for p_i in p] for p in predictions]
valid_tags = [[tags_vals[l_ii] for l_ii in l_i] for l in true_labels for l_i in l]
x_indecies = [[n for n,x in enumerate(t) if x=="X"] for t in valid_tags] 
pred_tags  = [[pred_tags[p][i] for i in range(len(pred_tags[p])) if i not in x_indecies[p]] for p in range(len(pred_tags))]
valid_tags = [[valid_tags[t][i] for i in range(len(valid_tags[t])) if i not in x_indecies[t]] for t in range(len(valid_tags))]

# get hold of the current run
run = Run.get_context()
run.log('loss', eval_loss/nb_eval_steps)
run.log('accuracy', eval_accuracy/nb_eval_steps)
run.log('f1', f1_score(valid_tags, pred_tags))  # seqeval expects (y_true, y_pred)
run.log('tag2idx', tag2idx)


Overwriting train.py
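
The preprocessing in `train.py` can be hard to follow inside the full script, so here is a minimal sketch of its three steps on one toy sentence, using plain Python lists instead of the BERT tokenizer and `pad_sequences`: give each `##` continuation piece the label `"X"`, cut or right-pad to a fixed length, and build the attention mask from the non-zero ids. The tokens, the vocabulary ids, and the helper names are all made up for illustration. Like the script, the sketch assumes the tokenizer only splits words into `##` pieces; punctuation that the tokenizer separates from a word is not handled.

```python
def align_labels(tokens, word_labels, subword_label="X"):
    """Give every '##' continuation piece the placeholder label."""
    aligned = []
    remaining = iter(word_labels)
    for token in tokens:
        if token.startswith("##"):
            aligned.append(subword_label)
        else:
            aligned.append(next(remaining))
    return aligned


def pad_or_truncate(values, max_len, pad_value=0):
    """Cut to max_len, then right-pad with pad_value (post-truncating, post-padding)."""
    values = list(values[:max_len])
    return values + [pad_value] * (max_len - len(values))


# Hypothetical WordPiece output for "Tobias lives in Heidelberg"
tokens = ["tobias", "lives", "in", "heidel", "##berg"]
word_labels = ["B-per", "O", "O", "B-geo"]
token_ids = [11, 12, 13, 14, 15]  # made-up vocabulary ids, all non-zero

labels = align_labels(tokens, word_labels)
input_ids = pad_or_truncate(token_ids, max_len=8)
padded_labels = pad_or_truncate(labels, max_len=8, pad_value="O")
attention_mask = [float(i > 0) for i in input_ids]

print(labels)
print(input_ids)
print(attention_mask)
```

Padding positions get id 0 and mask 0.0, so the model ignores them, while the padded labels are filled with `"O"` just as the script fills them with `tag2idx["O"]`.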


## Create an Experiment

Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment/?WT.mc_id=bert-notebook-abornst) to track all the runs in your workspace for this PyTorch tutorial. 

In [6]:
from azureml.core import Experiment
experiment_name = 'bert_test'

exp = Experiment(workspace=ws, name=experiment_name)

In [7]:
from azureml.train.dnn import PyTorch

script_params = {
    '--data-folder': ds
}
conda = ['tensorflow', 'keras', 'pytorch','scikit-learn']
pip = ["seqeval[gpu]","pytorch-pretrained-bert==0.4.0","pandas"]
pt_est = PyTorch(source_directory='.',
                 script_params=script_params,
                 compute_target=cluster,
                 conda_packages=conda,
                 pip_packages=pip,
                 entry_script='train.py',
                 use_gpu=True)

In [8]:
run = exp.submit(pt_est)

In [9]:
from azureml.widgets import RunDetails

RunDetails(run).show()


In [11]:
run.id

'bert_test_1560586302_22363fe0'

In [12]:
model = run.register_model(model_name='bert_ner', model_path='outputs/bert_ner.model')
print(model.name, model.id, model.version, sep='\t')


bert_ner	bert_ner:3	3


## Fine-Tuning BERT with Hyperparameter Tuning

We would also like to optimize our hyperparameter, `learning rate`, using Azure Machine Learning's hyperparameter tuning capabilities.

For more information on the available tuning algorithms see [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters#log-metrics-for-hyperparameter-tuning/?WT.mc_id=bert-notebook-abornst)

### Start a hyperparameter sweep
First, we will define the hyperparameter space to sweep over. In this example we will use random sampling to try different values of the learning rate and maximize our primary metric, the evaluation F1 score (`f1`).
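
HyperDrive's `loguniform(min_value, max_value)` takes its bounds in log space and returns `exp(uniform(min_value, max_value))`, so the smaller bound goes first. To make that concrete, here is a small standard-library sketch of the same idea; `sample_loguniform` is a hypothetical helper written for this example, not part of the Azure SDK, and it takes the bounds as plain values rather than logs.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw exp(u) with u uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_loguniform(1e-6, 1e-4, rng) for _ in range(1000)]

# In log space 1e-5 is the midpoint of [1e-6, 1e-4], so roughly half of the
# samples should fall below it.
fraction_below_1e5 = sum(s < 1e-5 for s in samples) / len(samples)
print(min(samples), max(samples), fraction_below_1e5)
```

Sampling in log space spreads the trials evenly across orders of magnitude, which suits learning rates better than a plain uniform draw that would concentrate almost all trials near the upper bound.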

In [None]:
from azureml.train.hyperdrive import *
import math

param_sampling = RandomParameterSampling( {
        'learning_rate': loguniform(math.log(1e-6), math.log(1e-4)),
    }
)

hyperdrive_run_config = HyperDriveRunConfig(estimator=pt_est,
                                            hyperparameter_sampling=param_sampling, 
                                            primary_metric_name='f1',
                                            primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                            max_total_runs=16,
                                            max_concurrent_runs=4)


Finally, launch the hyperparameter tuning job.

In [None]:
hyperdrive_run = exp.submit(hyperdrive_run_config)

### Monitor HyperDrive runs
We can monitor the progress of the runs with the following Jupyter widget. 

In [None]:
from azureml.widgets import RunDetails

RunDetails(hyperdrive_run).show()

### Find and register the best model
Once all the runs complete, we can find the run that produced the model with the highest evaluation f1.

In [None]:
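best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
# train.py logs 'f1' but not the learning rate, so read the script arguments
# (including --learning_rate) from the run definition instead
run_definition = best_run.get_details()['runDefinition']
arguments = run_definition.get('arguments', run_definition.get('Arguments'))
print(best_run)
print('Best Run is:\n  F1: {0:.5f} \n  Arguments: {1}'.format(
        best_run_metrics['f1'],
        arguments
     ))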
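
The `f1` metric that drives the sweep comes from seqeval, which scores whole entity spans rather than individual tokens, so it can be much lower than the token accuracy logged next to it. The sketch below is a simplified re-implementation for illustration only, not seqeval's own code; the helper names and the toy sentence are invented.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from one IOB-tagged sentence."""
    entities = set()
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if current is not None and not continues:
            entities.add((current, start, i - 1))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, etype
    return entities


def entity_f1(true_sequences, pred_sequences):
    """Micro-averaged F1 over exact span matches."""
    true_spans, pred_spans = set(), set()
    for sent_id, (true_tags, pred_tags) in enumerate(zip(true_sequences, pred_sequences)):
        true_spans |= {(sent_id,) + e for e in extract_entities(true_tags)}
        pred_spans |= {(sent_id,) + e for e in extract_entities(pred_tags)}
    correct = len(true_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(true_spans) if true_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [["B-org", "O", "B-geo", "I-geo", "O"]]
y_pred = [["B-org", "O", "B-geo", "O", "O"]]

token_accuracy = sum(t == p for t, p in zip(y_true[0], y_pred[0])) / len(y_true[0])
score = entity_f1(y_true, y_pred)
print(token_accuracy, score)
```

Here four of five tokens are right, but the truncated `geo` span counts as a miss, so only one of the two entities matches exactly.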

# Deploy as web service
Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/?WT.mc_id=bert-notebook-abornst).

To build the correct environment for ACI, provide the following:

* A scoring script to show how to use the model
* An environment file to show what packages need to be installed
* A configuration file to build the ACI
* The model you trained before

## Create scoring script
Create the scoring script, called score.py, used by the web service call to show how to use the model.

You must include two required functions in the scoring script:

* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started.
* The `run(input_data)` function, which uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.
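
Before looking at the full BERT version, it may help to see the `init()` / `run()` contract in isolation. The following is a minimal sketch with a stand-in model that tags every token as `"O"`; the `_dummy_model` helper and the `"text"` field of the JSON payload are assumptions made for this example and are not what the scoring script below uses.

```python
import json

model = None  # set once by init()


def _dummy_model(tokens):
    """Stand-in for a trained tagger: labels every token as "O"."""
    return ["O"] * len(tokens)


def init():
    """Runs once at container start-up: load the model into a global."""
    global model
    model = _dummy_model


def run(raw_data):
    """Runs per request: parse the input, predict, return JSON-serializable data."""
    text = json.loads(raw_data)["text"]  # "text" is a hypothetical field name
    tokens = text.split()
    return {"tokens": tokens, "tags": model(tokens)}


# Simulate what the web service does: one init(), then one run() per request.
init()
response = run(json.dumps({"text": "Microsoft opens labs in India"}))
print(json.dumps(response))
```

Loading the model in `init()` rather than in `run()` keeps the expensive work out of the request path.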

In [46]:
%%writefile score.py
import os, json
import numpy as np
from tqdm import tqdm, trange

import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from pytorch_pretrained_bert import BertTokenizer, BertConfig
from pytorch_pretrained_bert import BertForTokenClassification

from azureml.core.model import Model

def init():
    global tag2idx, model, device, MAX_LEN, tokenizer, tags_vals
    
    tags_vals = ['B-art',
                'I-gpe',
                'B-per',
                'B-org',
                'B-tim',
                'B-nat',
                'B-eve',
                'I-geo',
                'I-per',
                'B-geo',
                'I-art',
                'B-gpe',
                'I-eve',
                'I-org',
                'O',
                'I-tim',
                'I-nat',
                'X']
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    MAX_LEN = 75
    model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=len(tags_vals))
    # retrieve the path to the registered model file using the model name
    model_path = Model.get_model_path('bert_ner')
    model.load_state_dict(torch.load(model_path, map_location=device))
    # keep the model on the same device as the input tensors built in run()
    model.to(device)
    model.eval()
    
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
    
def run(raw_data):
    print(raw_data)
    # make prediction
    tokenized_texts = [tokenizer.tokenize(raw_data)]
    input_ids = torch.tensor(pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
                          maxlen=MAX_LEN, dtype="long", truncating="post", padding="post"), device = device)
    attention_masks = torch.tensor([[float(i>0) for i in ii] for ii in input_ids], device = device)
    with torch.no_grad():
        logits = model(input_ids, token_type_ids=None, attention_mask=attention_masks)
    logits = logits.detach().cpu().numpy()
    predictions = [list(p) for p in np.argmax(logits, axis=2)][0]
    # predictions holds tag indices, so compare the decoded tag (not the index) with 'X'
    return [tags_vals[p_i] for p_i in predictions if tags_vals[p_i] != 'X']

Overwriting score.py


## Create environment file
Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs TensorFlow and Keras (for `pad_sequences`), PyTorch, pytorch-pretrained-bert, and azureml-defaults.

In [47]:
conda = ['tensorflow', 'keras']
pip = ["azureml-defaults", "azureml-monitoring", "torch","pytorch-pretrained-bert==0.4.0"]


In [48]:
from azureml.core.conda_dependencies import CondaDependencies 


myenv = CondaDependencies.create(conda_packages=conda, pip_packages=pip)

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

Review the content of the myenv.yml file.


In [49]:
with open("myenv.yml","r") as f:
    print(f.read())

# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - azureml-defaults
  - azureml-monitoring
  - torch
  - pytorch-pretrained-bert==0.4.0
- tensorflow
- keras



## Create configuration file
Create a deployment configuration file and specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models; here we request 1 core and 2 gigabytes. If you feel you need more later, you would have to recreate the image and redeploy the service.

In [50]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=2, 
                                               tags={"data": "Sentence",  "method" : "bert"}, 
                                               description='Predict NER with Bert')


# Deploy in ACI
Estimated time to complete: about 7-8 minutes

Configure the image and deploy. The following code goes through these steps:

* Build an image using:
  * The scoring file (score.py)
  * The environment file (myenv.yml)
  * The model file
* Register that image under the workspace.
* Send the image to the ACI container.
* Start up a container in ACI using the image.
* Get the web service HTTP endpoint.

In [51]:
%%time
from azureml.core.model import Model
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage

model=Model(ws, 'bert_ner')

# configure the image
image_config = ContainerImage.image_configuration(execution_script="score.py", 
                                                  runtime="python", 
                                                  conda_file="myenv.yml")

service = Webservice.deploy_from_model(workspace=ws,
                                       name='bert-ner-srvc',
                                       deployment_config=aciconfig,
                                       models=[model],
                                       image_config=image_config)

service.wait_for_deployment(show_output=True)

Creating image
Image creation operation finished for image bert-ner-srvc:18, operation "Succeeded"
Creating service
Running.....................................................................
SucceededACI service creation operation finished, operation "Succeeded"
CPU times: user 4.88 s, sys: 396 ms, total: 5.28 s
Wall time: 15min 24s


In [52]:
print(service.get_logs())


2019-06-15T13:23:50,538429640+00:00 - rsyslog/run 
2019-06-15T13:23:50,538442240+00:00 - iot-server/run 
2019-06-15T13:23:50,539124446+00:00 - gunicorn/run 
2019-06-15T13:23:50,656866486+00:00 - nginx/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2019-06-15T13:23:55,663481005+00:00 - iot-server/finish 1 0
2019-06-15T13:23:55,664661615+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 19.6.0
Listening at: http://127.0.0.1:9090 (13)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 47
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Initializing logger
Starting up app insights client
Starting up request id generator
Starting up app insight hooks
Invoking user's init function
https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz not found in cache, downloading to /tmp/tmprdr64zmp
Using TensorFlow backend.
  0%|          | 0/407873900 [00:00<?,

In [53]:
service = ws.webservices['bert-ner-srvc']

Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.



In [54]:
print(service.scoring_uri)

http://6cbb2059-8488-4f6d-bd93-df25d6647efb.eastus.azurecontainer.io/score


## Test deployed service

In [61]:
import requests
import json

# send a sample sentence to the service for scoring
input_data = "Microsoft to launch AI Digital labs in India, will bring benefits to 1.5 lakh students."

headers = {'Content-Type':'application/json'}

# for AKS deployment you'd need to add the service key to the header as well
# api_key = service.get_key()
# headers = {'Content-Type':'application/json',  'Authorization':('Bearer '+ api_key)} 

resp = requests.post(service.scoring_uri, input_data, headers=headers)
print("prediction:", " ".join([f"{t} ({l})" for t, l in zip(input_data.split(), json.loads(resp.text))]))

prediction: Microsoft (B-org) to (O) launch (O) AI (B-org) Digital (I-org) labs (I-org) in (O) India, (B-geo) will (O) bring (O) benefits (O) to (O) 1.5 (O) lakh (O) students. (O)
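
The `zip` in the cell above pairs whitespace-separated words with per-token labels, and the two can drift apart whenever the tokenizer splits a word into several pieces (for example, a trailing comma). A possible client-side fix, sketched below under the assumption that the client can run the same tokenizer as the service, is to keep only the label of each word's first piece. `TOY_PIECES`, `toy_tokenize`, and the predictions are invented stand-ins so the example runs without BERT.

```python
# Hypothetical word-to-pieces table standing in for a real WordPiece tokenizer
TOY_PIECES = {"India,": ["india", ","]}


def toy_tokenize(word):
    """Return the pieces for one whitespace-separated word."""
    return TOY_PIECES.get(word, [word.lower()])


def first_piece_labels(words, token_labels, tokenize):
    """Pair each word with the label predicted for its first piece."""
    pairs = []
    position = 0
    for word in words:
        if position >= len(token_labels):  # word lies beyond the returned labels
            break
        pairs.append((word, token_labels[position]))
        position += len(tokenize(word))
    return pairs


words = "In India, Microsoft grows".split()
# Made-up per-token predictions for: in | india | , | microsoft | grows
token_labels = ["O", "B-geo", "O", "B-org", "O"]

naive = list(zip(words, token_labels))
aligned = first_piece_labels(words, token_labels, toy_tokenize)
print(naive)
print(aligned)
```

In the naive pairing the comma's label lands on "Microsoft"; the first-piece pairing skips it.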


# Next Steps

Hope you enjoyed this tutorial. You should now have what you need to get started fine-tuning your own state-of-the-art BERT NER models.

To learn more about the development of contextual word embeddings such as BERT, check out the second post in my series [Beyond Word Embeddings](https://towardsdatascience.com/beyond-word-embeddings-part-2-word-vectors-nlp-modeling-from-bow-to-bert-4ebd4711d0ec).