# 3. RoseTTAFold on Azure ML - Batch Endpoint

## Introduction

Azure is collaborating with the Baker Lab to expose their RoseTTAFold model as a service. This document describes how to get started exploring RoseTTAFold on Azure Machine Learning (Azure ML).

**This is notebook #3 of 3**. In this notebook, we'll create a Batch Endpoint so that the model can be called from the Azure CLI or through a REST call.

In the *first* notebook, [1-setup-workspace.ipynb](1-setup-workspace.ipynb), we ran some one-time setup steps to prepare our Azure ML Workspace with the dependency Datasets and a Compute Cluster.

In the *second* notebook, [2-run-experiment.ipynb](2-run-experiment.ipynb), we specified some amino acid sequence data and ran a RoseTTAFold job in our Azure Machine Learning workspace.

**Note.** This RoseTTAFold endpoint is not designed to run in production environments, and is strictly for non-production test environments.

In [None]:
import os
from azureml.core import Workspace

try:
    ws = Workspace.from_config()
    print(ws.name, ws.resource_group, ws.location, sep='\t')

    # Set environment variables so the Azure CLI cells below can read them
    os.environ['AZUREML_SUBSCRIPTION_ID'] = ws.subscription_id
    os.environ['AZUREML_RESOURCE_GROUP'] = ws.resource_group
    os.environ['AZUREML_WORKSPACE_NAME'] = ws.name

    print('Azure ML workspace loaded')
except Exception as e:
    # Later cells depend on `ws`, so show why the workspace could not be loaded
    print('Azure ML workspace not found:', e)

In [None]:

from azureml.core import Environment

env = Environment(name="rosettaenv")
env.docker.base_image = None
env.docker.base_dockerfile = "./Dockerfile"
env.register(workspace=ws)

In [None]:
from azureml.core.model import Model

model = Model.register(model_name="placeholdermodel",
                       model_path="placeholdermodel",
                       workspace=ws)

## Pre-requisites

In [None]:
! az version

In [None]:
# Remove old versions of the Azure ML extension 
# (Note: if these are not installed, this will print an error that you can ignore)
! az extension remove -n azure-cli-ml
! az extension remove -n ml

# Install the latest Azure ML extension
! az extension add -n ml -y

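Sign in, then set your default resource group, workspace, and active Azure subscription in the Azure CLI. The `%VAR%` syntax below is expanded by the Windows command shell; on Linux or macOS, use `$VAR` instead.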

In [None]:
! az login
! az configure --defaults group=%AZUREML_RESOURCE_GROUP% workspace=%AZUREML_WORKSPACE_NAME%
! az account set --subscription %AZUREML_SUBSCRIPTION_ID%

### Create the Endpoint

In [None]:
# OPTIONAL: Delete a previous endpoint with the same name, if one exists
! az ml batch-endpoint delete --name rosettafold-endpoint --yes

In [None]:
! az ml batch-endpoint create --name rosettafold-endpoint
! az ml batch-deployment create --file batch-deployment.yml --set-default

## Usage

The endpoint is designed to process input files containing protein sequences of the form:

```
>T1078 Tsp1, Trichoderma virens, 138 residues|
MAAPTPADKSMMAAVPEWTITNLKRVCNAGNTSCTWTFGVDTHLATATSCTYVVKANANASQASGGPVTCGPYTITSSWSGQFGPNNGFTTFAVTDFSKKLIVWPAYTDVQVQAGKVVSPNQSYAPANLPLEHHHHHH
```

**Note.** Each input should be provided in its own input file. The input file name will be used to identify the corresponding output, so use unique input names.
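
If your sequences are currently stored together in one multi-record file, the one-input-per-file rule above can be met with a small preprocessing step. The following is a minimal sketch, not part of the original notebooks: the function name `split_fasta`, the `.fa` extension, and the choice of the first header token as the file name are all assumptions made for illustration.

```python
import re
from pathlib import Path


def split_fasta(multi_fasta_text, out_dir):
    """Write each record to its own uniquely named file; return the paths written.

    Hypothetical helper: the naming scheme is an assumption, not an endpoint requirement.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group lines into records: a header line starting with ">" opens a new record.
    records = []
    for line in multi_fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line])
        elif records:
            records[-1].append(line)

    paths = []
    used_names = set()
    for record in records:
        # Use the first whitespace-delimited token of the header as the file stem.
        tokens = record[0][1:].split()
        stem = re.sub(r"[^A-Za-z0-9._-]", "_", tokens[0]) if tokens else "sequence"

        # Add a numeric suffix until the name is unique, so outputs do not collide.
        name, counter = stem, 1
        while name in used_names:
            counter += 1
            name = f"{stem}_{counter}"
        used_names.add(name)

        path = out / f"{name}.fa"
        path.write_text("\n".join(record) + "\n")
        paths.append(path)
    return paths
```

Calling `split_fasta(open("all-sequences.fa").read(), "./inputs")` (with a hypothetical input file) would leave one file per sequence in `./inputs`, ready to be passed to the endpoint.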


#### Call endpoint with single local file

```
az ml batch-endpoint invoke --name rosettafold-endpoint --input-local-path <local/path/to>/input.fa
```

In [None]:
! az ml batch-endpoint invoke --name rosettafold-endpoint --input-local-path ./inputs/my-sequence.fa


#### Call endpoint with single remote file

Given a single input file stored in Azure Blob Storage, take its URL, which has the form `https://<storage-account-name>.blob.core.windows.net/<storage-container>/<path/on/container>/input.fa`, and invoke the endpoint with:

```
az ml batch-endpoint invoke --name rosettafold-endpoint --input-path file:https://<storage-account-name>.blob.core.windows.net/<storage-container>/<path/on/container>/input.fa
```

In [None]:
! az ml batch-endpoint invoke --name rosettafold-endpoint --input-path file:https://amsaiedws3295876841.blob.core.windows.net/azureml-blobstore-febe82a7-da37-4f81-85d5-48c8a0082e47/rosetta/input_samples/inputs.fa

#### Call the endpoint with multiple files

Run this exactly as above, only now pointing to a directory containing multiple files, e.g.

```
az ml batch-endpoint invoke --name rosettafold-endpoint --input-path folder:https://<storage-account-name>.blob.core.windows.net/<storage-container>/<path/on/container>/
```

In [None]:
! az ml batch-endpoint invoke --name rosettafold-endpoint --input-path folder:https://<storage-account-name>.blob.core.windows.net/<storage-container>/<path/on/container>/


#### Reading output

When the inferencing job is complete, output files will be uploaded to the workspace's default Azure Blob Storage account at the following location:

```
https://<default-storage-account>.blob.core.windows.net/<default-container>/azureml/<run-id>/score/<input-filename>/t000.e2e.pdb
```
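
To locate the result for a particular input, the pattern above can be filled in programmatically. This is a minimal sketch assuming you already know the storage account, container, and run id; the function name `expected_output_url` is hypothetical, and whether `<input-filename>` includes the file extension is not stated here, so the value is passed through unchanged.

```python
def expected_output_url(storage_account, container, run_id, input_filename):
    """Fill in the output location pattern described above.

    Hypothetical helper: `input_filename` is used exactly as given.
    """
    return (
        f"https://{storage_account}.blob.core.windows.net/{container}"
        f"/azureml/{run_id}/score/{input_filename}/t000.e2e.pdb"
    )
```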

### Configuring Parallelism

When setting up the endpoint, we configured the instance count and the minibatch size parameters. These control how the inference jobs scale up.

- Instance count: the maximum number of nodes (VMs) that will spin up.
- Minibatch size: the maximum number of examples that will be processed at a time per node.

A minibatch is sent to each instance, where it is processed sequentially. Once a node completes its minibatch, it is sent another (assuming there are inputs remaining to be processed).
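
To get a feel for how these two parameters interact, the arithmetic can be sketched as follows. This is only an illustration of the description above, not the service's actual scheduler: the function name `plan_minibatches` is hypothetical, and the `rounds` figure assumes every minibatch takes about the same time, which will not hold for sequences of very different lengths.

```python
import math


def plan_minibatches(num_inputs, instance_count, minibatch_size):
    """Estimate how a batch job is divided, under simplifying assumptions."""
    if instance_count < 1 or minibatch_size < 1 or num_inputs < 0:
        raise ValueError("instance_count and minibatch_size must be >= 1, num_inputs >= 0")

    # Inputs are grouped into minibatches of at most `minibatch_size` files.
    minibatches = math.ceil(num_inputs / minibatch_size)

    # No more nodes can be busy than there are minibatches to hand out.
    active_nodes = min(instance_count, minibatches)

    # Rough number of waves of work if every minibatch took equally long.
    rounds = math.ceil(minibatches / instance_count)

    return {"minibatches": minibatches, "active_nodes": active_nodes, "rounds": rounds}
```

For example, 10 input files with an instance count of 2 and a minibatch size of 3 give 4 minibatches spread over 2 nodes in roughly 2 rounds.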

### REST Endpoint

Batch endpoints can also be invoked with a REST call. The steps below show an example.

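1. Get batch endpoint scoring uri:
```
scoring_uri=$(az ml batch-endpoint show --name rosettafold-endpoint --query scoring_uri -o tsv)
```

In [None]:
# 1. Get batch endpoint scoring uri. Each `!` command runs in its own subshell,
# so store the value in an environment variable that later cells can read:
scoring_uri_lines = !az ml batch-endpoint show --name rosettafold-endpoint --query scoring_uri -o tsv
os.environ['scoring_uri'] = scoring_uri_lines[0]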

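2. Get authentication token:
```
auth_token=$(az account get-access-token --resource https://ml.azure.com --query accessToken -o tsv)
```

In [None]:
# 2. Get authentication token. Each `!` command runs in its own subshell,
# so store the value in an environment variable that later cells can read:
auth_token_lines = !az account get-access-token --resource https://ml.azure.com --query accessToken -o tsv
os.environ['auth_token'] = auth_token_lines[0]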

3. Kick off inferencing job via curl (the request body must be valid JSON, so its keys and values use double quotes):

```
curl --location --request POST "$scoring_uri" --header "Authorization: Bearer $auth_token" --header "Content-Type: application/json" --data-raw '{"properties": {"dataset": {"dataInputType": "DataUrl", "Path": "https://amsaiedws3295876841.blob.core.windows.net/azureml-blobstore-febe82a7-da37-4f81-85d5-48c8a0082e47/rosetta/input_samples/inputs.fa"}}}'
```

In [None]:

# 3. Kick off inferencing job via curl (the request body must be valid JSON):
! curl --location --request POST "$scoring_uri" --header "Authorization: Bearer $auth_token" --header "Content-Type: application/json" --data-raw '{"properties": {"dataset": {"dataInputType": "DataUrl", "Path": "https://amsaiedws3295876841.blob.core.windows.net/azureml-blobstore-febe82a7-da37-4f81-85d5-48c8a0082e47/rosetta/input_samples/inputs.fa"}}}'
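
The same request can also be assembled in Python, which avoids shell quoting issues around the JSON body. This is a minimal sketch using only the standard library: the function name `build_invoke_request` is hypothetical, and the body simply mirrors the `properties`/`dataset`/`dataInputType`/`Path` structure from the curl example above rather than a separately verified schema.

```python
import json
import urllib.request


def build_invoke_request(scoring_uri, auth_token, input_url):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "properties": {
            "dataset": {
                "dataInputType": "DataUrl",
                "Path": input_url,
            }
        }
    }
    return urllib.request.Request(
        scoring_uri,
        data=json.dumps(body).encode("utf-8"),  # json.dumps emits valid, double-quoted JSON
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To submit the job, pass the returned object to `urllib.request.urlopen(...)` with the scoring URI and token obtained in steps 1 and 2.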
