# CPU Inference on ONNX Runtime on Azure Kubernetes Service
#### Video Vision Transformer converted to ONNX

This example shows how to deploy a video classification neural network using ONNX Runtime (CPU execution provider) on an Azure Machine Learning online endpoint. In this example we use a Video Vision Transformer (ViViT) fine-tuned with a custom dataset of 10 action classes taken from UCF101.


#### 1. Prerequisites to install
Please restart the kernel after the pip installs to sync the environment with the new modules.

In [1]:
#! pip install matplotlib onnx opencv-python
# pip install azure-ai-ml azure-identity datasets azure-cli mlflow

#### 2. Connect to Azure Machine Learning workspace

Before we dive into the code, you'll need to connect to your workspace. The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.

For this lab, we've already set up an AzureML Workspace for you. If you'd like to learn more about `Workspace`s, please reference [`AzureML's documentation`](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?view=azureml-api-2&tabs=azure-portal).

We are using the `DefaultAzureCredential` to get access to the workspace. `DefaultAzureCredential` should be capable of handling most scenarios. To learn about the other available credentials, go to [`Set up authentication`](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk&view=azureml-api-2).

In [2]:
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml import Input
from azure.ai.ml.dsl import pipeline
from dotenv import dotenv_values
from dotenv import load_dotenv
from utils.login import get_ws_client
from utils.datasets import get_labels_dataset, create_datasets
from utils.computer import create_gpu_cluster
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

### .env file to use your service principal account

```md
RESOURCE_GROUP=
SUBSCRIPTION_ID=
AZUREML_WORKSPACE_NAME=
TENANTID=
AZURE_CLIENT_ID=
AZURE_TENANT_ID=
AZURE_CLIENT_SECRET=
```
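
Every value in `.env` is read with `config.get(...)`, so a missing key silently becomes `None` and only fails later, at login time. A minimal sketch of an early check, assuming the key names from the template above; `missing_settings` is a hypothetical helper, not part of `python-dotenv` or the Azure SDK:

```python
REQUIRED_KEYS = ("SUBSCRIPTION_ID", "RESOURCE_GROUP", "AZUREML_WORKSPACE_NAME")


def missing_settings(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a config mapping."""
    return [key for key in required if not (config.get(key) or "").strip()]


# A plain dict stands in for the result of dotenv_values(".env")
example = {"SUBSCRIPTION_ID": "0000", "RESOURCE_GROUP": "", "TENANTID": "x"}
print(missing_settings(example))  # ['RESOURCE_GROUP', 'AZUREML_WORKSPACE_NAME']
```

Calling it right after `dotenv_values(".env")` and stopping when the list is non-empty would surface a configuration problem before any Azure call is made.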

In [5]:
# Load env and login to Workspace
load_dotenv(".env")
config = dotenv_values(".env")


# Enter details of your Azure Machine Learning workspace
subscription_id = config.get("SUBSCRIPTION_ID")
resource_group = config.get("RESOURCE_GROUP")
workspace = "azure-ml-2"  # or: config.get("AZUREML_WORKSPACE_NAME")

In [43]:
credential = DefaultAzureCredential()
# Check if given credential can get token successfully.
credential.get_token("https://management.azure.com/.default")


ml_client = get_ws_client(
    credential, subscription_id, resource_group, workspace
)
print(ml_client)

MLClient(credential=<azure.identity._credentials.default.DefaultAzureCredential object at 0x0000016AC0BAB790>,
         subscription_id=5a8ec57c-47f9-4bc3-aee5-9e4db1b89345,
         resource_group_name=olonok-ml,
         workspace_name=azure-ml-2)


#### Model Converted to ONNX
#### Video
LLMOps: Convert Video Classifier (ViViT) to ONNX, Inference on a CPU --> https://youtu.be/-vjr0IjH4Nc
#### Code
https://github.com/olonok69/LLM_Notebooks/tree/main/video/convert_onnx


#### Model Base 
https://huggingface.co/google/vivit-b-16x2-kinetics400

### Fine Tuning with own Dataset
#### Code
https://github.com/olonok69/LLM_Notebooks/tree/main/video/fine_tune_ViViT

#### Video
LLMOps: Fine Tune Video Classifier (ViViT) with your own data --> https://youtu.be/XNMU_bm0Xwc



In [48]:
path_model ="onnx_vivit/vivit.onnx"


#### Load Azure ML workspace



In [49]:
# Check core SDK version number
import azureml.core
from azureml.core import Workspace
print("SDK version:", azureml.core.VERSION)


SDK version: 1.57.0


In [50]:
from azureml.core import Workspace

# connect to the existing workspace using the identifiers defined above
ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace)

print(ws.subscription_id, ws.location, ws.resource_group, ws.name, sep = '\n')

5a8ec57c-47f9-4bc3-aee5-9e4db1b89345
uksouth
olonok-ml
azure-ml-2


#### Register your ONNX model with Azure ML



In [12]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes


In [13]:
path_model = "onnx_vivit/vivit.onnx"  # replace this with the location of your model file


file_model = Model(
    path= path_model,
    type=AssetTypes.CUSTOM_MODEL,
    name="vivit_classifier",
    description= "Video classification",
    tags={"type": "google/vivit-b-16x2-kinetics400", "format":"onnx", "fine_tuned": "10 classes from https://www.crcv.ucf.edu/research/data-sets/ucf101/"}
)
ml_client.models.create_or_update(file_model)



Uploading vivit.onnx (< 1 MB): 273MB [09:36, 473kB/s]



Model({'job_name': None, 'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'vivit_classifier', 'description': 'Video classification', 'tags': {'type': 'google/vivit-b-16x2-kinetics400', 'format': 'onnx', 'fine_tuned': '10 classes from https://www.crcv.ucf.edu/research/data-sets/ucf101/'}, 'properties': {}, 'print_as_yaml': True, 'id': '/subscriptions/5a8ec57c-47f9-4bc3-aee5-9e4db1b89345/resourceGroups/olonok-ml/providers/Microsoft.MachineLearningServices/workspaces/azure-ml-2/models/vivit_classifier/versions/1', 'Resource__source_path': None, 'base_path': 'D:\\repos\\onnx\\azureml', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x0000016AABF03F10>, 'serialize': <msrest.serialization.Serializer object at 0x0000016AABF41FD0>, 'version': '1', 'latest_version': None, 'path': 'azureml://subscriptions/5a8ec57c-47f9-4bc3-aee5-9e4db1b89345/resourceGroups/olonok-ml/workspaces/azure-ml-2/datas

In [14]:
models = ws.models
for name, m in models.items():
    print("Name:", name,"\tVersion:", m.version, "\tDescription:", m.description, m.tags)

Name: vivit_classifier 	Version: 1 	Description: Video classification {'type': 'google/vivit-b-16x2-kinetics400', 'format': 'onnx', 'fine_tuned': '10 classes from https://www.crcv.ucf.edu/research/data-sets/ucf101/'}


In [15]:
#model_path = ml_client.models.download("onnx_nsfw", version=1)

In [51]:
import av
import json
import base64
import io
import numpy as np
def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.
    Args:
        container (`av.container.input.InputContainer`): PyAV container.
        indices (`List[int]`): List of frame indices to decode.
    Returns:
        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            reformatted_frame = frame.reformat(width=224,height=224)
            frames.append(reformatted_frame)
    new=np.stack([x.to_ndarray(format="rgb24") for x in frames])

    return new


def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    '''
    Sample a given number of frame indices from the video.
    Args:
        clip_len (`int`): Total number of frames to sample.
        frame_sample_rate (`int`): Sample every n-th frame.
        seg_len (`int`): Maximum allowed index of sample's last frame.
    Returns:
        indices (`List[int]`): List of sampled frame indices
    '''
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

def get_key(dict, value):
    """
    return key given a value. From a dictionary
    """
    for key, val in dict.items():
        if val == value:
            return key
    return "Value not found"
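
`get_key` scans the whole dictionary on every call and returns the string "Value not found" for unknown ids. A sketch of an alternative, assuming the label ids are unique: invert the mapping once and look ids up directly. `id2label` is a name introduced here for illustration, and only three of the labels are shown.

```python
label_dic = {"ApplyEyeMakeup": 0, "ApplyLipstick": 1, "Archery": 2}

# Invert label -> id into id -> label once, instead of scanning on every lookup
id2label = {idx: label for label, idx in label_dic.items()}

print(id2label[2])                          # Archery
print(id2label.get(99, "Value not found"))  # same fallback string as get_key
```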

# Simulate a request to build the score.py file

In [53]:
file_name = "./image/6540601-uhd_2560_1440_25fps.mp4"
with open(file_name, "rb") as f:
    video = f.read()
# The with block closes the file; base64-encode the bytes for the JSON payload
im_b64 = base64.b64encode(video).decode("utf-8")

In [55]:
input_data = json.dumps({'data': im_b64})
requests_json = json.loads(input_data.encode())

In [56]:
v = base64.b64decode(requests_json.get("data").encode("utf-8"))

In [57]:
iov = io.BytesIO(v)

In [58]:
type(iov)

_io.BytesIO
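
The cells above mimic what the scoring script will receive: raw bytes are base64-encoded, wrapped in JSON under the `data` key, then decoded back and exposed as a file-like object. A self-contained sketch of that round trip, using a few placeholder bytes instead of a real video file:

```python
import base64
import io
import json

raw = b"\x00\x01\x02fake-video-bytes"  # placeholder for the .mp4 contents

# Client side: bytes -> base64 text -> JSON body
body = json.dumps({"data": base64.b64encode(raw).decode("utf-8")})

# Server side: JSON body -> base64 text -> bytes -> file-like object
decoded = base64.b64decode(json.loads(body)["data"])
stream = io.BytesIO(decoded)

print(decoded == raw)  # True
```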

In [59]:
container= av.open(io.BytesIO(v))

In [60]:
container.streams.video[0].frames

399

In [188]:
indices = sample_frame_indices(clip_len=10, frame_sample_rate=2,seg_len=container.streams.video[0].frames)

In [189]:
indices

array([44, 46, 48, 50, 52, 55, 57, 59, 61, 63], dtype=int64)
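
The indices printed above are consistent with a random draw of `end_idx = 64`: `converted_len = 10 * 2 = 20`, so `start_idx = 44`, and ten evenly spaced values between 44 and 64 are clipped to `end_idx - 1` and truncated to integers. A pure-Python sketch of that arithmetic, with the random draw replaced by a fixed argument (`spaced_indices` is a hypothetical name, and the spacing is re-derived by hand rather than calling `np.linspace`):

```python
def spaced_indices(end_idx, clip_len, frame_sample_rate):
    """Deterministic version of the sampling arithmetic for a given end_idx."""
    converted_len = clip_len * frame_sample_rate
    start_idx = end_idx - converted_len
    step = (end_idx - start_idx) / (clip_len - 1)
    raw = [start_idx + i * step for i in range(clip_len)]
    # Clip to [start_idx, end_idx - 1], then truncate like astype(np.int64)
    return [int(min(max(x, start_idx), end_idx - 1)) for x in raw]


print(spaced_indices(end_idx=64, clip_len=10, frame_sample_rate=2))
# [44, 46, 48, 50, 52, 55, 57, 59, 61, 63]
```

Note that `np.random.randint(converted_len, seg_len)` needs `seg_len` to be larger than `converted_len`, so a clip shorter than that raises a `ValueError`. The frame count reported by the container may also be 0 when the file does not record it, which would trigger the same error.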

In [191]:
video = read_video_pyav(container=container, indices=indices)

In [192]:
video.shape

(10, 224, 224, 3)

In [65]:
from transformers import VivitImageProcessor
import onnxruntime

In [66]:
ort_sess = onnxruntime.InferenceSession(
    "./onnx_vivit/vivit.onnx", providers=["CPUExecutionProvider"]
)
image_processor = VivitImageProcessor.from_pretrained("google/vivit-b-16x2-kinetics400")

input_name = ort_sess.get_inputs()[0].name
output_name = ort_sess.get_outputs()[0].name

In [67]:
label_dic = {'ApplyEyeMakeup':0, 'ApplyLipstick':1, 'Archery':2, 'BabyCrawling':3, 'BalanceBeam':4, 'BandMarching':5, 
             'BaseballPitch':6, 'Basketball':7,'BasketballDunk':8, 'BenchPress':9}

In [68]:
inputs_t = np.array(image_processor(list(video), return_tensors="pt")['pixel_values'])

In [69]:
inputs_t.shape

(1, 10, 3, 224, 224)
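
The decoded clip is channels-last, `(num_frames, height, width, 3)`, while the tensor handed to the ONNX session is `(batch, num_frames, channels, height, width)`. A small sketch of that layout change on shape tuples only. It does not reproduce the rescaling and normalization the image processor also applies, and the 4-bytes-per-value figure assumes a float32 tensor:

```python
from math import prod


def to_model_shape(video_shape, batch=1):
    """Map (frames, height, width, channels) to (batch, frames, channels, height, width)."""
    frames, height, width, channels = video_shape
    return (batch, frames, channels, height, width)


shape = to_model_shape((10, 224, 224, 3))
print(shape)            # (1, 10, 3, 224, 224)
print(prod(shape) * 4)  # bytes if stored as float32: 6021120
```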

In [70]:
outputs = ort_sess.run([output_name], {input_name: inputs_t})[0]

In [71]:
# Get Logits
logits = np.array(outputs)
# Get Probabilities
probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
# Get Predicted Class
predicted_class = np.argmax(probabilities, axis=1)

In [72]:
predicted_class

array([2], dtype=int64)
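
The probabilities above come from a direct softmax, `exp(logits) / sum(exp(logits))`. That is fine for small logits, but `exp` overflows for large ones; subtracting the row maximum first gives the same result without that risk. A pure-Python sketch on a single row of made-up logits (the values are illustrative, not model output):

```python
import math


def softmax(row):
    """Softmax of one row of logits, shifted by the max for numerical stability."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 3.0, 0.5]  # made-up values
probs = softmax(logits)
predicted = probs.index(max(probs))
print(predicted)  # 1

# Large logits would overflow math.exp without the shift
print(softmax([1000.0, 1001.0]))
```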

In [73]:
print(f"Predicted classes: {predicted_class[0]}, label: { get_key(label_dic, predicted_class[0])}")
print("\n")
output_probs = {}
print("All Probabilities:")
for prob, key in zip(probabilities[0], range(0, len(probabilities[0]))):
    label = get_key(label_dic, key)
    output_probs[label] = float(prob)

print(output_probs)

Predicted classes: 2, label: Archery


All Probabilities:
{'ApplyEyeMakeup': 0.0025050577241927385, 'ApplyLipstick': 0.0018224065424874425, 'Archery': 0.9428067803382874, 'BabyCrawling': 0.002071560360491276, 'BalanceBeam': 0.004101442638784647, 'BandMarching': 0.009591354057192802, 'BaseballPitch': 0.0006394461379386485, 'Basketball': 0.007728622294962406, 'BasketballDunk': 0.0010978842619806528, 'BenchPress': 0.02763550356030464}


# Create Scoring file

In [158]:
%%writefile onnx_vivit/score.py
import json
import numpy as np
import onnxruntime
from transformers import VivitImageProcessor
import sys
import os
from azureml.core.model import Model
import time
import av
import base64
import io


def init():
    global session, input_name, output_name, image_processor, label_dic
    model = "./vivit.onnx"
    # Load the model in onnx runtime to start the session    
    session = onnxruntime.InferenceSession(model, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name 
    image_processor = VivitImageProcessor.from_pretrained("google/vivit-b-16x2-kinetics400")
    label_dic = {'ApplyEyeMakeup':0, 'ApplyLipstick':1, 'Archery':2, 'BabyCrawling':3, 'BalanceBeam':4, 'BandMarching':5, 'BaseballPitch':6, 'Basketball':7,'BasketballDunk':8, 'BenchPress':9}
    
def run(input_data):
    '''Purpose: evaluate test input in Azure Cloud using onnxruntime.
        We will call the run function later from our Jupyter Notebook 
        so our azure service can evaluate our model input in the cloud. '''

    try:
        # We expect a video in base64 format in the attribute data. Load the data and decode the base64
        requests_json = json.loads(input_data.encode())
        v = base64.b64decode(requests_json.get("data").encode("utf-8"))
        #create container from bytes request
        container= av.open(io.BytesIO(v))
        #Sample Frames
        indices = sample_frame_indices(clip_len=10, frame_sample_rate=1,seg_len=container.streams.video[0].frames)
        # create video with sampled frames
        video = read_video_pyav(container=container, indices=indices)
        # preprocess the sampled frames into model input
        data = np.array(image_processor(list(video), return_tensors="pt")['pixel_values'])
        #### INFERENCE ONNX #####
        # pass input data to do model inference with ONNX Runtime
        start = time.time()
        r = session.run([output_name], {input_name : data})
        end = time.time()
        probabilities, predicted_class = postprocess(r[0])
        # predicted class and label
        class_label = predicted_class[0]
        label = get_key(label_dic, class_label)
        
        result = label_map(probabilities)
        result['predicted_label'] = label
        result['predicted_calss'] = int(class_label)
        
        result_dict = {"result": result,
                      "time_in_sec": [end - start]}
    except Exception as e:
        result_dict = {"error": str(e)}
    
    return json.dumps(result_dict)



def label_map(probs, threshold=.5):
    """Map each class probability (output of postprocess) to its label.
    The threshold argument is accepted but not used."""
    # dictionary mapping each label to its class id
    label_dic = {'ApplyEyeMakeup':0, 'ApplyLipstick':1, 'Archery':2, 'BabyCrawling':3, 'BalanceBeam':4, 'BandMarching':5, 
             'BaseballPitch':6, 'Basketball':7,'BasketballDunk':8, 'BenchPress':9}
    output_probs = {}
    image_preds = {}
    for prob, key in zip(probs[0], range(0, len(probs[0]))):
        label = get_key(label_dic, key)
        output_probs[label] = float(prob)

    image_preds["class_probabilities"] = output_probs
    return image_preds

def get_key(dict, value):
    """
    return key given a value. From a dictionary
    """
    for key, val in dict.items():
        if val == value:
            return key
    return "Value not found"

def postprocess(scores):
    """Convert the logits generated by the network into softmax probabilities
    and return them with the index of the most probable class."""
    logits = np.array(scores)
    probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
    predicted_class = np.argmax(probabilities, axis=1)
    
    return probabilities, predicted_class


def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.
    Args:
        container (`av.container.input.InputContainer`): PyAV container.
        indices (`List[int]`): List of frame indices to decode.
    Returns:
        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            reformatted_frame = frame.reformat(width=224,height=224)
            frames.append(reformatted_frame)
    new=np.stack([x.to_ndarray(format="rgb24") for x in frames])

    return new


def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    '''
    Sample a given number of frame indices from the video.
    Args:
        clip_len (`int`): Total number of frames to sample.
        frame_sample_rate (`int`): Sample every n-th frame.
        seg_len (`int`): Maximum allowed index of sample's last frame.
    Returns:
        indices (`List[int]`): List of sampled frame indices
    '''
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

Overwriting onnx_vivit/score.py


##### Create Endpoint

In [76]:
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
    BuildContext,
)

In [77]:

# Define an endpoint name
endpoint_name = "endpt-vivit-inference-onnx"

# Example way to define a random name
import datetime

# Create ManagedOnlineEndpoint

In [78]:


#endpoint_name = "endpt-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name = endpoint_name, 
    description="this is a endpoint for onnx inference Video ViT classification model",
    auth_mode="key"
)

# Create (or update) the endpoint once and block until the operation finishes
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://endpt-vivit-inference-onnx.uksouth.inference.ml.azure.com/score', 'openapi_uri': 'https://endpt-vivit-inference-onnx.uksouth.inference.ml.azure.com/swagger.json', 'name': 'endpt-vivit-inference-onnx', 'description': 'this is a endpoint for onnx inference Video ViT classification model', 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/5a8ec57c-47f9-4bc3-aee5-9e4db1b89345/resourcegroups/olonok-ml/providers/microsoft.machinelearningservices/workspaces/azure-ml-2/onlineendpoints/endpt-vivit-inference-onnx', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/5a8ec57c-47f9-4bc3-aee5-9e4db1b89345/providers/Microsoft.MachineLearningServices/locations/uksouth/mfeOperationsStatus/oeidp:31ae886d-ee28-4a4f-af29-64989f2a9076:c8a58ebb-305e-479b-a21c-0366f0e399ee?api-version=2022-02-01-preview'}, 'print_as_yaml': True, 'id': '/subscriptions/5a

In [117]:
#image = "mcr.microsoft.com/azureml/curated/acpt-pytorch-2.2-cuda12.1:13"

# Create Env from local Dockerfile 

In [118]:
env = Environment(
    build=BuildContext(path="docker"),
    name="docker-vivit",
    description="Environment Vivit.",
)
env_log= ml_client.environments.create_or_update(env)

In [119]:
env_log 

Environment({'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'docker-vivit', 'description': 'Environment Vivit.', 'tags': {}, 'properties': {'azureml.labels': 'latest'}, 'print_as_yaml': True, 'id': '/subscriptions/5a8ec57c-47f9-4bc3-aee5-9e4db1b89345/resourceGroups/olonok-ml/providers/Microsoft.MachineLearningServices/workspaces/azure-ml-2/environments/docker-vivit/versions/3', 'Resource__source_path': None, 'base_path': 'D:\\repos\\onnx\\azureml', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x0000016ABFDBD5D0>, 'serialize': <msrest.serialization.Serializer object at 0x0000016AC0A37E50>, 'version': '3', 'latest_version': None, 'conda_file': None, 'image': None, 'build': <azure.ai.ml.entities._assets.environment.BuildContext object at 0x0000016AC251E990>, 'inference_config': None, 'os_type': 'Linux', 'arm_type': 'environment_version', 'conda_file_path': None, 'path': None, 'datas

# Print Models

In [38]:
# for m in ml_client.models.list():
#     print(m)

In [39]:
# for env1 in ml_client.environments.list():
#     print(env1)

In [40]:
#image = "mcr.microsoft.com/azureml/curated/acpt-pytorch-2.2-cuda12.1:13"

In [96]:
# env = Environment(
#     conda_file="./conda.yml",
#     image=image
# )
# env
model = ml_client.models.get(name="vivit_classifier", version = 1)

In [97]:
env

Environment({'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'docker-vivit', 'description': 'Environment Vivit.', 'tags': {}, 'properties': {}, 'print_as_yaml': True, 'id': None, 'Resource__source_path': None, 'base_path': 'D:\\repos\\onnx\\azureml', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x0000016AC25DDD50>, 'version': '2', 'latest_version': None, 'conda_file': None, 'image': None, 'build': <azure.ai.ml.entities._assets.environment.BuildContext object at 0x0000016ABFD16BD0>, 'inference_config': None, 'os_type': None, 'arm_type': 'environment_version', 'conda_file_path': None, 'path': WindowsPath('D:/repos/onnx/azureml/docker'), 'datastore': None, 'upload_hash': None, 'translated_conda_file': None})

In [94]:
#ml_client.models.download(name="onnx_emotion", version = 1)

##### 9.3 Deploy scoring file to the endpoint

In [163]:
endpoint_name

'endpt-vivit-inference-onnx'

In [164]:
# Get Env
env = ml_client.environments.get(name="docker-vivit", version="2")

In [165]:
type(env)

azure.ai.ml.entities._assets.environment.Environment

In [166]:
env

Environment({'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'docker-vivit', 'description': 'Environment Vivit.', 'tags': {}, 'properties': {'azureml.labels': ''}, 'print_as_yaml': True, 'id': '/subscriptions/5a8ec57c-47f9-4bc3-aee5-9e4db1b89345/resourceGroups/olonok-ml/providers/Microsoft.MachineLearningServices/workspaces/azure-ml-2/environments/docker-vivit/versions/2', 'Resource__source_path': None, 'base_path': 'D:\\repos\\onnx\\azureml', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x0000016AAFBA7550>, 'serialize': <msrest.serialization.Serializer object at 0x0000016AC0A3B910>, 'version': '2', 'latest_version': None, 'conda_file': None, 'image': None, 'build': <azure.ai.ml.entities._assets.environment.BuildContext object at 0x0000016AC241B350>, 'inference_config': None, 'os_type': 'Linux', 'arm_type': 'environment_version', 'conda_file_path': None, 'path': None, 'datastore':

# Create Deployment

In [167]:


blue_deployment = ManagedOnlineDeployment(
    name="vivit-onnx",
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./onnx_vivit", scoring_script="score.py"
    ),
    instance_type="Standard_NC4as_T4_v3",
    instance_count=1,
)

ml_client.online_deployments.begin_create_or_update(blue_deployment)

Check: endpoint endpt-vivit-inference-onnx exists
Your file exceeds 100 MB. If you experience low speeds, latency, or broken connections, we recommend using the AzCopyv10 tool for this file transfer.

Example: azcopy copy 'D:\repos\onnx\azureml\onnx_vivit' 'https://azureml27454141413.blob.core.windows.net/31ae886d-ee28-4a4f-af29-64989f2a9076-pty30yvn6u01ek8f2ipgusn14s/onnx_vivit' 

See https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-v10 for more information.
Uploading onnx_vivit (348.31 MBs): 100%|#########################################################################################| 348305750/348305750 [09:37<00:00, 602982.95it/s]




<azure.core.polling._poller.LROPoller at 0x16ac0ba4b10>

.......................................................................................................................................

In [178]:
KEY = ml_client.online_endpoints.get_keys(name=endpoint_name).primary_key  # primary key of the endpoint
# Documentation: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-online-endpoints
# Troubleshooting: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-online-endpoints


In [194]:
file_name = "./image/6540601-uhd_2560_1440_25fps.mp4"
with open(file_name, "rb") as f:
    video = f.read()
# The with block closes the file; base64-encode the bytes for the JSON payload
im_b64 = base64.b64encode(video).decode("utf-8")

In [201]:
len(im_b64)

13705632
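
The 13,705,632 characters reported above are the base64 text, not the file size: base64 emits 4 characters for every 3 input bytes, so the request body is roughly a third larger than the video. A sketch of that relationship (`b64_length` is a name introduced here):

```python
import base64


def b64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Check the formula against the standard library for a few sizes
for n in (0, 1, 2, 3, 10, 1000):
    assert len(base64.b64encode(b"x" * n)) == b64_length(n)

# Largest raw size whose encoding is 13,705,632 characters long
print(13705632 // 4 * 3)  # 10279224
```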

In [195]:
input_data = json.dumps({'data': im_b64})

In [181]:
import json
import numpy as np
import onnxruntime
from transformers import VivitImageProcessor
import sys
import os
from azureml.core.model import Model
import time
import io
import av
import base64


def init():
    global session, input_name, output_name, image_processor, label_dic
    model = "onnx_vivit/vivit.onnx"
    # Load the model in onnx runtime to start the session    
    session = onnxruntime.InferenceSession(model, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name 
    image_processor = VivitImageProcessor.from_pretrained("google/vivit-b-16x2-kinetics400")
    label_dic = {'ApplyEyeMakeup':0, 'ApplyLipstick':1, 'Archery':2, 'BabyCrawling':3, 'BalanceBeam':4, 'BandMarching':5, 
             'BaseballPitch':6, 'Basketball':7,'BasketballDunk':8, 'BenchPress':9}
    
def run(input_data):
    '''Purpose: evaluate test input in Azure Cloud using onnxruntime.
        We will call the run function later from our Jupyter Notebook 
        so our azure service can evaluate our model input in the cloud. '''

    try:
        # We expect a video in base64 format in the attribute data. Load the data and decode the base64
        requests_json = json.loads(input_data.encode())
        v = base64.b64decode(requests_json.get("data").encode("utf-8"))
        #create container from bytes request
        container= av.open(io.BytesIO(v))
        #Sample Frames
        indices = sample_frame_indices(clip_len=10, frame_sample_rate=1,seg_len=container.streams.video[0].frames)
        # create video with sampled frames
        video = read_video_pyav(container=container, indices=indices)
        # preprocess the sampled frames into model input
        data = np.array(image_processor(list(video), return_tensors="pt")['pixel_values'])
        #### INFERENCE ONNX #####
        # pass input data to do model inference with ONNX Runtime
        start = time.time()
        r = session.run([output_name], {input_name : data})
        end = time.time()
        probabilities, predicted_class = postprocess(r[0])
        # predicted class and label
        class_label = predicted_class[0]
        label = get_key(label_dic, class_label)
        
        result = label_map(probabilities)
        result['predicted_label'] = label
        result['predicted_calss'] = int(class_label)
        
        result_dict = {"result": result,
                      "time_in_sec": [end - start]}
    except Exception as e:
        result_dict = {"error": str(e)}
    
    return json.dumps(result_dict)



def label_map(probs, threshold=.5):
    """Map each class probability (output of postprocess) to its label.
    The threshold argument is accepted but not used."""
    # dictionary mapping each label to its class id
    label_dic = {'ApplyEyeMakeup':0, 'ApplyLipstick':1, 'Archery':2, 'BabyCrawling':3, 'BalanceBeam':4, 'BandMarching':5, 
             'BaseballPitch':6, 'Basketball':7,'BasketballDunk':8, 'BenchPress':9}
    output_probs = {}
    image_preds = {}
    for prob, key in zip(probs[0], range(0, len(probs[0]))):
        label = get_key(label_dic, key)
        output_probs[label] = float(prob)

    image_preds["class_probabilities"] = output_probs
    return image_preds

def get_key(dict, value):
    """
    return key given a value. From a dictionary
    """
    for key, val in dict.items():
        if val == value:
            return key
    return "Value not found"

def postprocess(scores):
    """Convert the logits generated by the network into softmax probabilities
    and return them with the index of the most probable class."""
    logits = np.array(scores)
    probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
    predicted_class = np.argmax(probabilities, axis=1)
    
    return probabilities, predicted_class


def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.
    Args:
        container (`av.container.input.InputContainer`): PyAV container.
        indices (`List[int]`): List of frame indices to decode.
    Returns:
        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            reformatted_frame = frame.reformat(width=224,height=224)
            frames.append(reformatted_frame)
    new=np.stack([x.to_ndarray(format="rgb24") for x in frames])

    return new


def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    '''
    Sample a given number of frame indices from the video.
    Args:
        clip_len (`int`): Total number of frames to sample.
        frame_sample_rate (`int`): Sample every n-th frame.
        seg_len (`int`): Maximum allowed index of sample's last frame.
    Returns:
        indices (`List[int]`): List of sampled frame indices
    '''
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices

In [193]:
init()

In [196]:
run(input_data)
# body.decode()

'{"result": {"class_probabilities": {"ApplyEyeMakeup": 0.004351954907178879, "ApplyLipstick": 0.003930356819182634, "Archery": 0.8199613094329834, "BabyCrawling": 0.003192294854670763, "BalanceBeam": 0.01228354312479496, "BandMarching": 0.048293791711330414, "BaseballPitch": 0.0017063587438315153, "Basketball": 0.00959216058254242, "BasketballDunk": 0.002194389933720231, "BenchPress": 0.09449388831853867}, "predicted_label": "Archery", "predicted_calss": 2}, "time_in_sec": [1.792248249053955]}'

In [184]:
import urllib.request
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True) 

In [197]:
body = str.encode(input_data)
url = 'https://endpt-vivit-inference-onnx.uksouth.inference.ml.azure.com/score'
# Replace this with the primary/secondary key, AMLToken, or Microsoft Entra ID token for the endpoint
api_key = KEY

In [202]:
import time

In [205]:

if not api_key:
    raise Exception("A key should be provided to invoke the endpoint")


headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key), 'azureml-model-deployment': 'vivit-onnx' }
start = time.time()
req = urllib.request.Request(url, body, headers)
try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))
end = time.time()

b'"{\\"result\\": {\\"class_probabilities\\": {\\"ApplyEyeMakeup\\": 0.0026463675312697887, \\"ApplyLipstick\\": 0.0013310174690559506, \\"Archery\\": 0.9078341722488403, \\"BabyCrawling\\": 0.001594557543285191, \\"BalanceBeam\\": 0.0021795297507196665, \\"BandMarching\\": 0.0377991758286953, \\"BaseballPitch\\": 0.0008338856277987361, \\"Basketball\\": 0.004699265118688345, \\"BasketballDunk\\": 0.001007644459605217, \\"BenchPress\\": 0.04007432982325554}, \\"predicted_label\\": \\"Archery\\", \\"predicted_calss\\": 2}, \\"time_in_sec\\": [1.5316495895385742]}"'
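
For reference, the request built in the cell above is an HTTP POST whose body is the JSON string and whose headers carry the key as a bearer token; the `azureml-model-deployment` header targets a specific deployment behind the endpoint. A sketch that builds such a request without sending it. The URL and key are placeholders, not real values:

```python
import json
import urllib.request

url = "https://example.invalid/score"  # placeholder scoring URI
api_key = "<endpoint-key>"             # placeholder for the real endpoint key
body = json.dumps({"data": "<base64 video>"}).encode("utf-8")

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + api_key,
    "azureml-model-deployment": "vivit-onnx",
}
req = urllib.request.Request(url, data=body, headers=headers)

# Supplying data makes urllib issue a POST
print(req.get_method())  # POST
```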


In [206]:
end - start

27.342873096466064

In [207]:
json.loads(result.decode())

'{"result": {"class_probabilities": {"ApplyEyeMakeup": 0.0026463675312697887, "ApplyLipstick": 0.0013310174690559506, "Archery": 0.9078341722488403, "BabyCrawling": 0.001594557543285191, "BalanceBeam": 0.0021795297507196665, "BandMarching": 0.0377991758286953, "BaseballPitch": 0.0008338856277987361, "Basketball": 0.004699265118688345, "BasketballDunk": 0.001007644459605217, "BenchPress": 0.04007432982325554}, "predicted_label": "Archery", "predicted_calss": 2}, "time_in_sec": [1.5316495895385742]}'
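
Note that `json.loads` returns a string here, not a dictionary: `run` already returns `json.dumps(...)`, and the response body wraps that string in JSON once more, so a second `json.loads` is needed to reach the fields. A sketch with a shortened payload in the same shape (`parse_score_response` is a hypothetical helper, and the numbers are abbreviated from the output above):

```python
import json


def parse_score_response(raw: bytes) -> dict:
    """Decode a doubly JSON-encoded scoring response into a dictionary."""
    payload = json.loads(raw.decode("utf-8"))
    if isinstance(payload, str):  # inner JSON string produced by run()
        payload = json.loads(payload)
    if "error" in payload:        # run() reports failures under this key
        raise RuntimeError(payload["error"])
    return payload


inner = json.dumps({"result": {"predicted_label": "Archery"}, "time_in_sec": [1.53]})
raw = json.dumps(inner).encode("utf-8")  # same double encoding as the endpoint output

parsed = parse_score_response(raw)
print(parsed["result"]["predicted_label"])  # Archery
```

With the figures above, the model itself accounts for about 1.5 s of the 27.3 s round trip; the remainder is spent outside `session.run`, for example on uploading and decoding the roughly 13.7 MB request.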

##### 9.5: Delete the online endpoint
Don't forget to delete the online endpoint; otherwise the billing meter keeps running for the compute used by the endpoint.

In [147]:
ml_client.online_endpoints.begin_delete(name=endpoint_name).wait()

........................................................................