## This notebook demonstrates how to use pre-trained Hugging Face models and deploy them on Azure with AzureML services



### Deploy question-answering models from HuggingFaceHub to AzureML Online Endpoints

This sample shows how to deploy the `deepset-roberta-base-squad2` `question-answering` model from the HuggingFaceHub to an online endpoint for inference. Learn more about the `question-answering` task: https://huggingface.co/tasks/question-answering

A large set of models hosted on [Hugging Face Hub](https://huggingface.co/models) is available in the Hugging Face Hub collection in the AzureML Model Catalog. This collection is powered by the Hugging Face Hub community registry. Integration with the AzureML Model Catalog enables seamless deployment of Hugging Face Hub models in AzureML.

### Outline
* Set up pre-requisites.
* Pick a model to deploy.
* Deploy the model for real time inference.
* Try sample inference.
* Clean up resources.

### Set up pre-requisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>`, and `<SUBSCRIPTION_ID>` below.
* Connect to `HuggingFaceHub` community registry
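
Before running the cells below, it can help to confirm that the SDK packages are importable. The following is a minimal sketch: `missing_modules` is a hypothetical helper, and the module names `azure.ai.ml` and `azure.identity` are taken from the imports used in this notebook.

```python
import importlib.util


def missing_modules(module_names):
    """Return the subset of module_names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (for example `azure`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


required = ["azure.ai.ml", "azure.identity"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported missing, installing the `azure-ai-ml` and `azure-identity` packages with `pip` is the usual fix.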

In [4]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
    ClientSecretCredential,
)
from azure.ai.ml.entities import AmlCompute
import time

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

# connect to a workspace
workspace_ml_client = None
try:
    workspace_ml_client = MLClient.from_config(credential)
    subscription_id = workspace_ml_client.subscription_id
    workspace = workspace_ml_client.workspace_name
    resource_group = workspace_ml_client.resource_group_name
except Exception as ex:
    print(ex)

    # Enter the details of your workspace. Use plain assignments:
    # a trailing comma would turn each value into a tuple.
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<WORKSPACE_NAME>"

    workspace_ml_client = MLClient(
        credential, subscription_id, resource_group, workspace
    )
# Connect to the HuggingFaceHub registry
registry_ml_client = MLClient(credential, registry_name="HuggingFace")
print(registry_ml_client)

Found the config file in: /config.json


MLClient(credential=<azure.identity._credentials.default.DefaultAzureCredential object at 0x7faf643b45e0>,
         subscription_id=d5210851-d1ca-44ac-8071-a0c7191e9631,
         resource_group_name=prod-azure-ml-registry,
         workspace_name=None)


### Pick a model to deploy

Check that the model `deepset-roberta-base-squad2` exists in the `HuggingFace` registry by fetching a specific version of it.

In [5]:
model_name = "deepset-roberta-base-squad2"
foundation_model = registry_ml_client.models.get(model_name, version="17")
print(
    "\n\nUsing model name: {0}, version: {1}, id: {2} for inferencing".format(
        foundation_model.name, foundation_model.version, foundation_model.id
    )
)



Using model name: deepset-roberta-base-squad2, version: 17, id: azureml://registries/HuggingFace/models/deepset-roberta-base-squad2/versions/17 for inferencing
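
The model ID printed above encodes the registry, model name, and version. As a small sketch, the ID can be split into those parts with a regular expression. The pattern is inferred from the single ID shown in this notebook and is an assumption, not an official grammar; `parse_registry_model_id` is a hypothetical helper.

```python
import re

# Pattern inferred from the ID printed in this notebook (an assumption).
ASSET_ID_PATTERN = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)"
    r"/models/(?P<name>[^/]+)"
    r"/versions/(?P<version>[^/]+)$"
)


def parse_registry_model_id(asset_id):
    """Split a registry model ID into its registry, name, and version parts."""
    match = ASSET_ID_PATTERN.match(asset_id)
    if match is None:
        raise ValueError(f"Unrecognized registry model ID: {asset_id!r}")
    return match.groupdict()


parts = parse_registry_model_id(
    "azureml://registries/HuggingFace/models/deepset-roberta-base-squad2/versions/17"
)
print(parts)
```

This can be handy for logging which registry and version a deployment was created from.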


### Deploy the model to an online endpoint
Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model. Create an online endpoint and then create an online deployment. You need to specify the Virtual Machine instance or SKU when creating the deployment. You can find the optimal CPU or GPU SKU for a model by opening the quick deployment dialog from the model page in the AzureML Model Catalog. Specify the SKU in the `instance_type` input in deployment settings below.

Typically, online endpoints require you to provide a scoring script and a Docker container image (through an AzureML environment) in addition to the model. You don't need to provide either for Hugging Face Hub models available in the AzureML Model Catalog, because 'no code deployments' are enabled for these models: the scoring script and container image are packaged along with the model.

Learn more about Online Endpoints: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-online-endpoints

In [3]:
import time, sys
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    OnlineRequestSettings,
)

# Create online endpoint - endpoint names need to be unique in a region.
# This sample uses a fixed suffix; append `timestamp` instead if the name is already taken.
timestamp = int(time.time())
online_endpoint_name = "question-answering-" + 'OpsChatBot'
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + foundation_model.name
    + ", for question-answering task",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()
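
Because endpoint names must be unique within a region, a timestamp suffix is one common way to avoid collisions. The sketch below builds such a name and checks it against a naming rule. The rule (starts with a letter, only letters, digits, and hyphens, 3 to 32 characters) is an assumption to verify against the current Azure documentation, and `unique_endpoint_name` is a hypothetical helper.

```python
import re
import time

# Assumed naming rule: starts with a letter, then letters, digits, or hyphens,
# with a total length of 3 to 32 characters. Verify before relying on it.
NAME_RULE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def unique_endpoint_name(prefix, now=None):
    """Append a Unix timestamp to prefix and validate the resulting name."""
    timestamp = int(time.time() if now is None else now)
    name = f"{prefix}-{timestamp}"
    if not NAME_RULE.match(name):
        raise ValueError(f"Endpoint name {name!r} violates the assumed naming rule")
    return name


print(unique_endpoint_name("question-answering"))
```

The optional `now` argument only exists to make the helper deterministic in tests.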

In [4]:

demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=foundation_model.id,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()

# online endpoints can have multiple deployments with traffic split or shadow traffic. Set traffic to 100% for demo deployment
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()

Check: endpoint question-answering-OpsChatBot exists


.......................................................................................................................

ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://question-answering-opschatbot.eastus2.inference.ml.azure.com/score', 'openapi_uri': 'https://question-answering-opschatbot.eastus2.inference.ml.azure.com/swagger.json', 'name': 'question-answering-opschatbot', 'description': 'Online endpoint for deepset-roberta-base-squad2, for question-answering task', 'tags': {'DeploymentId': '1178393', 'LaunchId': '37178', 'LaunchType': 'ON_DEMAND_LAB', 'TemplateId': '7490', 'TenantId': '277'}, 'properties': {'azureml.onlineendpointid': '/subscriptions/aea5d50a-8c8c-4b2f-ac7f-dea01e3b15f2/resourcegroups/talent-acquisition-1178393/providers/microsoft.machinelearningservices/workspaces/gowtham-ml/onlineendpoints/question-answering-opschatbot', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/aea5d50a-8c8c-4b2f-ac7f-dea01e3b15f2/providers/Microsoft.MachineLearningServices/locations/eastus2/mfeOperationsStatus/oe:d89
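
When an endpoint has more than one deployment, the traffic dictionary splits requests between them. As a sketch, a split can be checked locally before it is sent to the service. The rule that percentages must total 0 or 100 is an assumption to confirm in the Azure documentation; `validate_traffic` and the deployment name `candidate` are hypothetical.

```python
def validate_traffic(traffic):
    """Check that every share is an integer percentage and the total is 0 or 100."""
    for deployment, percent in traffic.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"Invalid traffic share for {deployment!r}: {percent!r}")
    total = sum(traffic.values())
    if total not in (0, 100):
        raise ValueError(f"Traffic shares total {total}, expected 0 or 100")
    return traffic


# Hypothetical split: most traffic stays on `demo`, a small share goes to `candidate`.
split = validate_traffic({"demo": 90, "candidate": 10})
print(split)
```

A validated dictionary like this could then be assigned to `endpoint.traffic` in the same way as the single-deployment case above.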

### Try sample inference

Online endpoints expose a REST API that can be integrated into your applications. Learn how to fetch the scoring REST API and credentials for online endpoints here: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-authenticate-online-endpoint

In this example, we will use the Python SDK helper method to invoke the endpoint. 
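
For applications that do not use the SDK, the same call can be made over plain HTTP. The sketch below only builds the request, so it runs without an endpoint. It assumes key-based authentication with the key sent as a bearer token and the payload shape used later in this notebook; the URL and key are placeholders, and `build_scoring_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment_name=None):
    """Build (but do not send) a POST request for an online endpoint's scoring URI."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment_name is not None:
        # Optional header that routes the call to one specific deployment.
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


payload = {
    "inputs": {
        "question": ["Where do I live?"],
        "context": ["My name is Sarah and I live in London"],
    }
}
request = build_scoring_request(
    scoring_uri="https://example.invalid/score",  # placeholder, not a real endpoint
    api_key="<PRIMARY_KEY>",  # placeholder for the endpoint key
    payload=payload,
    deployment_name="demo",
)
print(request.get_method(), request.full_url)

# To send it against a real endpoint:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The real scoring URI appears in the endpoint details as `scoring_uri`, and the linked authentication article describes how to retrieve the key.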

In [22]:
# Get the model object from HuggingFaceHub. We can use it to check for sample test data
import urllib.request, json

raw_data = urllib.request.urlopen(
    "https://huggingface.co/api/models/" + foundation_model.tags["modelId"]
)

print("https://huggingface.co/api/models/" + foundation_model.tags["modelId"])
data = json.load(raw_data)

print('modelId ', data['modelId'], ',Author', data['author'], ',Last Modified', data['lastModified'])


https://huggingface.co/api/models/deepset/roberta-base-squad2
modelId  deepset/roberta-base-squad2 ,Author deepset ,Last Modified 2023-09-26T11:36:30.000Z


Print the sample inputs (widget data) that Hugging Face publishes for the model `deepset/roberta-base-squad2`.

In [18]:
print(json.dumps(data["widgetData"], indent=2))

[
  {
    "text": "Where do I live?",
    "context": "My name is Wolfgang and I live in Berlin"
  },
  {
    "text": "Where do I live?",
    "context": "My name is Sarah and I live in London"
  },
  {
    "text": "What's my name?",
    "context": "My name is Clara and I live in Berkeley."
  },
  {
    "text": "Which name is also used to describe the Amazon rainforest in English?",
    "context": "The Amazon rainforest (Portuguese: Floresta Amaz\u00f4nica or Amaz\u00f4nia; Spanish: Selva Amaz\u00f3nica, Amazon\u00eda or usually Amazonia; French: For\u00eat amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within

In [7]:
# check if there is sample inference data available on HuggingFaceHub for the model, else try with the backup sample data
scoring_file = "./sample_score.json"
inputs = {}
input_question = []
input_context = []
if "widgetData" in data:
    # `sample` avoids shadowing the built-in `input` function
    for sample in data["widgetData"]:
        input_question.append(sample["text"])
        input_context.append(sample["context"])
    inputs["question"] = input_question
    inputs["context"] = input_context
    # write the sample_score.json file
    score_dict = {"inputs": inputs}
    with open(scoring_file, "w") as outfile:
        json.dump(score_dict, outfile, indent=2)
else:
    scoring_file = "./sample_score_backup.json"

# print the sample scoring file
print("\n\nSample scoring file: ")
with open(scoring_file) as json_file:
    scoring_data = json.load(json_file)
    print(scoring_data)



Sample scoring file: 
{'inputs': {'question': ['Where do I live?', 'Where do I live?', "What's my name?", 'Which name is also used to describe the Amazon rainforest in English?'], 'context': ['My name is Wolfgang and I live in Berlin', 'My name is Sarah and I live in London', 'My name is Clara and I live in Berkeley.', 'The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amo
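
The payload construction above can also be wrapped in a reusable function. This is a sketch under the assumption that each widget entry carries a `text` and a `context` key, as in the samples printed earlier; `build_qa_payload` is a hypothetical helper, and entries missing either key are skipped.

```python
def build_qa_payload(widget_data):
    """Convert Hugging Face widget samples into the question/context payload."""
    questions = []
    contexts = []
    for sample in widget_data:
        # Skip incomplete samples so both lists stay aligned by index.
        if "text" not in sample or "context" not in sample:
            continue
        questions.append(sample["text"])
        contexts.append(sample["context"])
    return {"inputs": {"question": questions, "context": contexts}}


samples = [
    {"text": "Where do I live?", "context": "My name is Wolfgang and I live in Berlin"},
    {"text": "What's my name?", "context": "My name is Clara and I live in Berkeley."},
    {"text": "A question without any context"},
]
qa_payload = build_qa_payload(samples)
print(qa_payload)
```

Keeping the two lists aligned by index matters because the endpoint returns one answer per question/context pair, in order.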

In [8]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method

response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file=scoring_file,
)
response_json = json.loads(response)
print(json.dumps(response_json, indent=2))

# workspace_ml_client.online_deployments.get(demo_deployment)

[
  {
    "score": 0.9190715551376343,
    "start": 34,
    "end": 40,
    "answer": "Berlin"
  },
  {
    "score": 0.7772308588027954,
    "start": 31,
    "end": 37,
    "answer": "London"
  },
  {
    "score": 0.9326565265655518,
    "start": 11,
    "end": 16,
    "answer": "Clara"
  },
  {
    "score": 0.750623881816864,
    "start": 201,
    "end": 230,
    "answer": "Amazonia or the Amazon Jungle"
  }
]
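
Each prediction carries `start` and `end` character offsets into its context, plus a confidence `score`. As a sketch, the offsets can be used to confirm that the answer is a literal span of the context, and the score to flag weak answers. The 0.5 threshold is an arbitrary illustrative value, and `check_answers` is a hypothetical helper.

```python
def check_answers(contexts, predictions, min_score=0.5):
    """Compare each answer with the context span it points to and flag low scores."""
    results = []
    for context, prediction in zip(contexts, predictions):
        span = context[prediction["start"]:prediction["end"]]
        results.append(
            {
                "answer": prediction["answer"],
                "span_matches": span == prediction["answer"],
                "confident": prediction["score"] >= min_score,
            }
        )
    return results


# Contexts and predictions copied from the first three samples in this notebook.
contexts = [
    "My name is Wolfgang and I live in Berlin",
    "My name is Sarah and I live in London",
    "My name is Clara and I live in Berkeley.",
]
predictions = [
    {"score": 0.9190715551376343, "start": 34, "end": 40, "answer": "Berlin"},
    {"score": 0.7772308588027954, "start": 31, "end": 37, "answer": "London"},
    {"score": 0.9326565265655518, "start": 11, "end": 16, "answer": "Clara"},
]
for row in check_answers(contexts, predictions):
    print(row)
```

Offsets are useful when an application needs to highlight the answer inside the original text rather than just display it.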


Let's make the output more readable by printing it in the following format:

"Context: " 

"Question: " 

"Answer: "

In [10]:
for i in range(len(input_question)):
    print("Context: " + input_context[i])
    print("Question: " + input_question[i])
    print("Answer: " + response_json[i]["answer"])
    print("\n")

Context: My name is Wolfgang and I live in Berlin
Question: Where do I live?
Answer: Berlin


Context: My name is Sarah and I live in London
Question: Where do I live?
Answer: London


Context: My name is Clara and I live in Berkeley.
Question: What's my name?
Answer: Clara


Context: The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, G

### Delete the online endpoint
Don't forget to delete the online endpoint; otherwise the billing meter keeps running for the compute used by the endpoint.

In [2]:

# Deleting the endpoint also removes the deployments attached to it
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()

