# 2. RoseTTAFold on Azure ML - Run Experiment

## Introduction

Azure is collaborating with the Baker Lab to expose their RoseTTAFold model as a service. This document describes how to get started exploring RoseTTAFold on Azure Machine Learning (Azure ML).

**This is notebook #2 of 3**. In this notebook, we will run RoseTTAFold as an Experiment on Azure ML. We'll create an input file, submit a Run, check its status, and get the results.

In the *previous* notebook, [1-setup-workspace.ipynb](1-setup-workspace.ipynb), we ran some one-time setup steps to prepare our Azure ML Workspace with the dependency Datasets and a Compute Cluster.

In the *next* notebook, [3-batch-endpoint.ipynb](3-batch-endpoint.ipynb), we'll create a Batch Endpoint so that the model can be called from the Azure CLI or through a REST call.

**Note.** These RoseTTAFold endpoints are not designed to run in production environments and are strictly for non-production test environments.

## Setup

### Load Workspace Config
You must first download the config.json file from your Azure ML workspace (see previous steps in "Create a workspace"). This file should be saved locally, in the same directory as this notebook.

As in the first notebook, run the following cell to load your workspace config, and make sure it succeeds before proceeding. Watch the output, as it may ask you to open your browser and sign in.

In [None]:
from azureml.core import Workspace

try:
    ws = Workspace.from_config()
    print(ws.name, ws.location, ws.resource_group, sep='\t')
    print('Azure ML workspace loaded')
except Exception as e:
    # Report the underlying error instead of hiding it
    print('Azure ML workspace not found:', e)

### Get the Azure ML Compute Cluster Reference
In the first notebook, we created a compute cluster within our Azure Machine Learning workspace. Now, we'll get a reference to it using the `compute_name`.

In [None]:
# Specify the name for your compute cluster
compute_name = 'gpu-cluster'

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Check if a compute cluster with this name already exists. If so, use it. 
try:
    compute_target = ComputeTarget(workspace=ws, name=compute_name)
    print('Found an existing cluster with this name, use it!')
except ComputeTargetException:
    print('Could not find compute target. Please follow the setup instructions in notebook 1, and specify the correct compute_name in the box above')

### Specify input data
The RoseTTAFold job requires inputs in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format). Each input should be in its own file.

Here are the contents of a sample input file:
```
>T1078 Tsp1, Trichoderma virens, 138 residues|
MAAPTPADKSMMAAVPEWTITNLKRVCNAGNTSCTWTFGVDTHLATATSCTYVVKANANASQASGGPVTCGPYTITSSWSGQFGPNNGFTTFAVTDFSKKLIVWPAYTDVQVQAGKVVSPNQSYAPANLPLEHHHHHH
```

**Notes:** The leading `>` marks the beginning of a record and **is required**. The rest of the first line is useful for labeling the sequence and is part of the FASTA standard, but it is not used by the RoseTTAFold algorithm.
The amino acid sequence begins on the second line with no leading characters and should be continuous (no spaces, no line breaks). This sequence is the actual input to the RoseTTAFold algorithm.
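
The rules in these notes can be checked before a job is submitted. The following is a minimal sketch, assuming one record per file and uppercase one-letter codes for the 20 standard amino acids; whether RoseTTAFold accepts other symbols is not stated here, so treat the residue check as an assumption. The helper name `check_fasta` is hypothetical.

```python
# One-letter codes of the 20 standard amino acids (an assumed whitelist).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def check_fasta(text):
    """Return a list of problems with a single-record FASTA string (empty if none).

    Hypothetical helper. The checks mirror the notes above and are stricter
    than the FASTA format itself, which allows wrapped sequence lines.
    """
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        return ["first line must start with '>'"]
    if len(lines) != 2:
        return ["expected exactly one header line and one sequence line"]
    sequence = lines[1]
    # Collect any characters that are not in the assumed whitelist.
    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if unexpected:
        return [f"unexpected characters in sequence: {unexpected}"]
    return []
```

For example, `check_fasta(">demo\nMAAPTPADK\n")` returns an empty list, while a sequence split over two lines is reported as a problem.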

Specify your sequence data below as shown, then *run* the cell. This will overwrite the contents of the file `inputs/my-sequence.fa`.
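
A minimal sketch of such a cell, using plain Python file I/O (an assumption; a notebook magic such as `%%writefile` would work as well). The record is the sample shown above, and the variable names are illustrative.

```python
from pathlib import Path

# Sample record from above; replace both values with your own header and sequence.
header = ">T1078 Tsp1, Trichoderma virens, 138 residues|"
sequence = (
    "MAAPTPADKSMMAAVPEWTITNLKRVCNAGNTSCTWTFGVDTHLATATSCTYVVKANANASQASGGPVTCGPYTITSS"
    "WSGQFGPNNGFTTFAVTDFSKKLIVWPAYTDVQVQAGKVVSPNQSYAPANLPLEHHHHHH"
)

input_path = Path("inputs") / "my-sequence.fa"
input_path.parent.mkdir(parents=True, exist_ok=True)  # create inputs/ if needed
# write_text replaces any existing file content
input_path.write_text(f"{header}\n{sequence}\n", encoding="utf-8")
print(f"Wrote {input_path}")
```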

Now upload the contents of the `inputs/` directory, including `my-sequence.fa`, to your Azure ML Datastore. This creates an input Dataset, which is a reference to a subdirectory within the workspace's default Datastore.

In [None]:
from azureml.core import Dataset
from azureml.data.datapath import DataPath

ds = ws.get_default_datastore()
input_dataset = Dataset.File.upload_directory(
    src_dir='./inputs/', 
    target=DataPath(ds, 'input_data'), 
    show_progress=True, 
    overwrite=True)

### Submit the Experiment
- Define the `Environment` and specify the Dockerfile (included in the same folder as this notebook) to be used.
- Specify the inputs and outputs.
- Create a `ScriptRunConfig` that includes all of the arguments and configuration needed.
- Wrap it up in an `Experiment` and submit it to your Azure Machine Learning workspace.

This will queue a job in Azure Machine Learning, which **is a billable activity**.

A link to this job's status web page will be printed below the cell.

In [None]:
from azureml.core import Environment, ScriptRunConfig
from azureml.core.runconfig import DockerConfiguration
from azureml.core import Experiment
from azureml.data import OutputFileDatasetConfig

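# Build the environment from the local Dockerfile instead of a base image
env = Environment(name="rosettaenv")
env.docker.base_image = None
env.docker.base_dockerfile = "./Dockerfile"

# Results are written under this path, relative to the run's working directory
outputs_path = 'outputs/'

# Mount the uploaded input Dataset on the compute node and pass both paths to the script
args = ['--inputs', input_dataset.as_mount(),
        '--outputs', outputs_path]

run_config = ScriptRunConfig(source_directory='.',
                             script="score.py",
                             arguments=args,
                             compute_target=compute_target,
                             docker_runtime_config=DockerConfiguration(use_docker=True),
                             environment=env)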

run = Experiment(ws, 'RoseTTAFold-Scoring').submit(run_config)
run
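
The `arguments` list above is passed to `score.py` on the command line, with the mounted Dataset resolved to a path on the compute node. The script itself is not shown in this notebook, so the sketch below only illustrates how a script could read the two flags with `argparse`. The names `parse_args` and `list_fasta_files`, and the assumption that inputs end in `.fa`, are hypothetical and not taken from the actual `score.py`.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the --inputs and --outputs flags used in the arguments list."""
    parser = argparse.ArgumentParser(description="Illustrative argument handling")
    parser.add_argument("--inputs", required=True, help="directory containing FASTA files")
    parser.add_argument("--outputs", default="outputs/", help="directory for result files")
    return parser.parse_args(argv)


def list_fasta_files(inputs_dir):
    """Return the .fa files found directly under inputs_dir, sorted by name."""
    return sorted(Path(inputs_dir).glob("*.fa"))


# Example with an explicit argument list instead of sys.argv
parsed = parse_args(["--inputs", "inputs/", "--outputs", "outputs/"])
print(parsed.inputs, parsed.outputs)
```

Passing an explicit list to `parse_args` keeps the example runnable inside a notebook, where `sys.argv` holds the kernel's own arguments.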

In [None]:
run.wait_for_completion(show_output=True)
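
`wait_for_completion` blocks the notebook until the run finishes. For a lighter check, `run.get_status()` returns the current state as a string. The sketch below polls any status callable until it reaches a terminal state. The helper `poll_until_done` and its parameters are hypothetical, and the terminal state names `Completed`, `Failed`, and `Canceled` are assumed from the SDK documentation.

```python
import time

# Terminal run states, assumed from the azureml-core Run.get_status() documentation.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def poll_until_done(get_status, interval_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal state or max_polls is reached.

    Returns the last status seen. Hypothetical helper, not part of the SDK.
    """
    status = get_status()
    for _ in range(max_polls - 1):
        if status in TERMINAL_STATES:
            break
        sleep(interval_seconds)  # wait before asking again
        status = get_status()
    return status
```

In this notebook it could be called as `final_status = poll_until_done(run.get_status)`. The `sleep` parameter exists only so the wait can be replaced in tests.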

### Get the outputs

In [None]:
run.download_files(prefix=outputs_path)
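
To see what was downloaded, the local directory can be walked with the standard library. This sketch assumes `download_files` wrote into `outputs/` under the current working directory, which is expected when no `output_directory` is given; the helper name `list_downloaded` is hypothetical.

```python
from pathlib import Path


def list_downloaded(directory="outputs"):
    """Return (relative path, size in bytes) for each file under directory, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


# Print one line per downloaded file; prints nothing if the directory is absent.
for name, size in list_downloaded():
    print(f"{size:>12,d}  {name}")
```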