# Deploying BERT NER Models on IBM Watson Machine Learning with Hugging Face Transformers

This notebook demonstrates how to deploy a BERT NER model built with Hugging Face Transformers on the IBM Watson Machine Learning (WML) service. It covers setting up the environment, creating a model definition, training the model, persisting the trained model, and deploying and scoring it.

## Learning Goals

- Working with the Watson Machine Learning service.
- Training BERT NER models using Hugging Face.
- Saving trained models in the Watson Machine Learning repository.
- Deploying a trained model online and scoring it.

## Contents

1. [Set up the environment](#setup)
2. [Create model definition](#model_def)
3. [Train model](#training)
4. [Persist trained model](#persist)
5. [Deploy and score](#deploy)
6. [Clean up](#clean)
7. [Summary and next steps](#summary)

<a id="setup"></a>
## 1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

- Contact your Cloud Pak for Data administrator and ask for your account credentials.

### Install and import `ibm-watsonx-ai` and its dependencies

In [None]:
!pip install wget | tail -n 1
!pip install -U ibm-watsonx-ai | tail -n 1
!pip install transformers | tail -n 1
!pip install torch | tail -n 1

### Connection to WML

Authenticate to the Watson Machine Learning service on IBM Cloud Pak for Data. You need to provide the platform `url`, your `username`, and your `api_key`.

In [None]:
from ibm_watsonx_ai import Credentials, APIClient

username = 'PASTE YOUR USERNAME HERE'
api_key = 'PASTE YOUR API_KEY HERE'
url = 'PASTE THE PLATFORM URL HERE'

credentials = Credentials(
    username=username,
    api_key=api_key,
    url=url,
    instance_id="openshift",
    version="5.0"
)

client = APIClient(credentials)
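
Pasting secrets into a notebook cell makes them easy to leak when the notebook is shared. As a minimal sketch, the same three values could instead be read from environment variables. The variable names `CPD_USERNAME`, `CPD_API_KEY`, and `CPD_URL` and the helper `load_connection_settings` are hypothetical and are not part of the SDK.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
ENV_KEYS = {
    "username": "CPD_USERNAME",
    "api_key": "CPD_API_KEY",
    "url": "CPD_URL",
}


def load_connection_settings(env=os.environ):
    """Return username, api_key, and url read from environment variables."""
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in ENV_KEYS.items()}
```

The returned dictionary can then be unpacked into the constructor used above, for example `Credentials(instance_id="openshift", version="5.0", **load_connection_settings())`.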

### Working with spaces

First, you need a space to work in. If you have not created one yet, you can use `{PLATFORM_URL}/ml-runtime/spaces?context=icp4data` to create one.

- Click New Deployment Space
- Create an empty space
- Go to the space's `Settings` tab
- Copy `space_id` and paste it below

In [None]:
space_id = 'PASTE YOUR SPACE ID HERE'

You can use the `list` method to print all existing spaces.

In [None]:
client.spaces.list(limit=10)

To interact with the resources available in Watson Machine Learning, you need to set the **space** that you will be using.

In [None]:
client.set.default_space(space_id)

<a id="model_def"></a>
## 2. Create model definition

### 2.1 Prepare model definition metadata

In [None]:
model_definition_metadata = {
    client.model_definitions.ConfigurationMetaNames.NAME: "BERT NER Model",
    client.model_definitions.ConfigurationMetaNames.DESCRIPTION: "BERT model for Named Entity Recognition",
    client.model_definitions.ConfigurationMetaNames.COMMAND: "ner_train.py",
    client.model_definitions.ConfigurationMetaNames.PLATFORM: {"name": "python", "versions": ["3.11"]},
    client.model_definitions.ConfigurationMetaNames.VERSION: "1.0",
    client.model_definitions.ConfigurationMetaNames.SPACE_UID: space_id
}

### 2.2 Get sample model definition content file

In [None]:
import wget, os

filename = 'bert-ner-model.zip'

if not os.path.isfile(filename):
    filename = wget.download('URL_TO_YOUR_ZIP_FILE_CONTAINING_MODEL_DEFINITION')

!unzip -oqd . bert-ner-model.zip
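
The `!unzip` shell command is only available where the `unzip` binary is installed. As a hedged alternative, the archive could be extracted with Python's standard `zipfile` module. The helper name `extract_archive` is an assumption made for this sketch.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir="."):
    """Extract a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # extractall overwrites existing files, similar to `unzip -o`.
        archive.extractall(target)
        return archive.namelist()
```

Calling `extract_archive(filename)` also returns the list of extracted members, which makes it easy to check that the training script named in the model definition metadata (`ner_train.py`) is actually in the archive.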

### 2.3 Publish model definition

In [None]:
definition_details = client.model_definitions.store(filename, model_definition_metadata)
model_definition_id = client.model_definitions.get_id(definition_details)
print(model_definition_id)

#### List model definitions

In [None]:
client.model_definitions.list(limit=5)

<a id="training"></a>
## 3. Train model

#### **Note**: Ensure that training data is saved in a folder where Watson Machine Learning Accelerator is installed.

### 3.1 Prepare training metadata

In [None]:
training_metadata = {
    client.training.ConfigurationMetaNames.NAME: "BERT NER Training",
    client.training.ConfigurationMetaNames.DESCRIPTION: "Training BERT model for Named Entity Recognition",
    client.training.ConfigurationMetaNames.TRAINING_RESULTS_REFERENCE: {
        "name": "NER results",
        "connection": {},
        "location": {"path": f"spaces/{space_id}/assets/experiment"},
        "type": "fs"
    },
    client.training.ConfigurationMetaNames.MODEL_DEFINITION: {
        "id": model_definition_id,
        "hardware_spec": {"name": "K80", "nodes": 1},
        "software_spec": {"name": "pytorch-onnx_rt24.1-py3.11"}
    },
    client.training.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: [
        {
            "name": "training_input_data",
            "type": "fs",
            "connection": {},
            "location": {"path": "bert-ner-dataset"},
            "schema": {"id": "idmlp_schema", "fields": [{"name": "text", "type": "string"}]}
        }
    ]
}

### 3.2 Train the model in the background

In [None]:
training = client.training.run(training_metadata)

### 3.3 Get training id and status

In [None]:
training_id = client.training.get_id(training)
print(training_id)

In [None]:
client.training.get_status(training_id)['state']
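
Because the training runs in the background, the status cell usually has to be re-run until the job finishes. The sketch below polls a state-returning callable until it reaches a terminal state. The helper name `wait_for_training`, the default intervals, and the set of terminal states are assumptions; confirm the state names against what your service reports.

```python
import time

# Assumed terminal states; verify against the states your service reports.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_training(get_state, poll_seconds=30.0, timeout_seconds=3600.0,
                      sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Training still '{state}' after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the client from this notebook, a call might look like `wait_for_training(lambda: client.training.get_status(training_id)['state'])`. Passing the state lookup as a callable keeps the loop independent of the SDK, so it can be exercised without a live service.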

### 3.4 Get training details

In [None]:
import json

training_details = client.training.get_details(training_id)
print(json.dumps(training_details, indent=2))

#### List trainings

In [None]:
client.training.list(limit=5)

<a id="persist"></a>
## 4. Persist trained model

### 4.1 Publish model

In [None]:
software_spec_id = client.software_specifications.get_id_by_name('pytorch-onnx_rt24.1-py3.11')

model_meta_props = {
    client.repository.ModelMetaNames.NAME: "BERT NER Model",
    client.repository.ModelMetaNames.TYPE: "pytorch-onnx_2.1",
    client.repository.ModelMetaNames.SOFTWARE_SPEC_ID: software_spec_id
}

published_model_details = client.repository.store_model(training_id, meta_props=model_meta_props)
model_id = client.repository.get_model_id(published_model_details)

### 4.2 Get model details

In [None]:
model_details = client.repository.get_details(model_id)
print(json.dumps(model_details, indent=2))

#### List stored models

In [None]:
client.repository.list_models(limit=5)

<a id="deploy"></a>
## 5. Deploy and score

### 5.1 Create online deployment for published model

In [None]:
deployment = client.deployments.create(
    model_id, meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "BERT NER Deployment",
        client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)

scoring_url = client.deployments.get_scoring_href(deployment)
deployment_id = client.deployments.get_id(deployment)

### 5.2 Get deployment details

In [None]:
deployments_details = client.deployments.get_details(deployment_id)
print(json.dumps(deployments_details, indent=2))
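
The full details document is verbose, and often only the deployment state matters before scoring. The sketch below extracts it, assuming the state is nested under `entity`, then `status`, then `state`. That layout is an assumption; check it against the JSON printed above. The helper name `deployment_state` is hypothetical.

```python
def deployment_state(details):
    """Return the state string from a deployment details dict, or None if absent."""
    # Assumed layout: {"entity": {"status": {"state": "..."}}}
    return details.get("entity", {}).get("status", {}).get("state")
```

For example, `deployment_state(deployments_details)` returns `None` instead of raising an error when the expected keys are missing, which signals that the assumed layout does not match.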

### 5.3 Score deployed model

Prepare a sample sentence and, as a local reference, run it through a pre-trained Hugging Face NER pipeline:

In [None]:
from transformers import pipeline

# Load pre-trained model and tokenizer
ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")

# Sample text for testing
test_text = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge."

# Get NER predictions
ner_results = ner_pipeline(test_text)
print(ner_results)
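
The cell above runs the pipeline locally and does not call the deployment created in section 5.1. As a hedged sketch, a request to the online deployment could wrap the input in the `input_data` envelope that `client.deployments.score` accepts. The helper name `build_scoring_payload` and the `text` field (borrowed from the training data schema) are assumptions. Whether the deployed model accepts raw text depends on how it was exported; it may expect tokenized tensors instead.

```python
def build_scoring_payload(texts):
    """Wrap raw strings in the input_data/values envelope used for online scoring."""
    # Assumption: the deployed model takes a single "text" field per row.
    return {
        "input_data": [
            {"fields": ["text"], "values": [[text] for text in texts]}
        ]
    }
```

With the objects defined earlier, the call might look like `client.deployments.score(deployment_id, build_scoring_payload([test_text]))`. Comparing its response with `ner_results` shows whether the deployment reproduces the local predictions.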

<a id="clean"></a>
## 6. Clean up

If you want to clean up all created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments

please follow this sample [notebook](https://github.com/IBM/watson-machine-learning-samples/blob/master/cpd5.0/notebooks/python_sdk/instance-management/Machine%20Learning%20artifacts%20management.ipynb).
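
For the three assets created in this notebook, a minimal cleanup could look like the sketch below. It deletes the deployment first, then the model, then the model definition, so that nothing is removed while another asset still depends on it. The helper name `delete_created_assets` is hypothetical, and this ordering is an assumption rather than the procedure from the linked notebook.

```python
def delete_created_assets(client, deployment_id, model_id, model_definition_id):
    """Delete the deployment first, then the model, then the model definition."""
    client.deployments.delete(deployment_id)
    client.repository.delete(model_id)
    client.model_definitions.delete(model_definition_id)
```

It would be called as `delete_created_assets(client, deployment_id, model_id, model_definition_id)`, using the identifiers collected in sections 2, 4, and 5.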

<a id="summary"></a>
## 7. Summary and next steps

You have successfully deployed a BERT NER model using Hugging Face transformers on IBM Watson Machine Learning. You learned how to set up the environment, create model definitions, train the model, persist the trained model, and deploy and score the model. 

Check out the [Online Documentation](https://ibm.github.io/watsonx-ai-python-sdk/samples.html) for more samples, tutorials, and guidance on using IBM Watson Machine Learning with Hugging Face models.

### Next Steps
1. Experiment with fine-tuning the BERT NER model on a custom dataset.
2. Explore deploying other transformer models available on Hugging Face.
3. Integrate the deployed model into an application for real-world usage.