# Azure ML Python SDK Master
This notebook was compiled from several Microsoft tutorials, with additions from the online documentation and **a good blog post [here](https://medium.com/@santiagof/the-holy-bible-of-azure-machine-learning-service-a-work-through-for-the-believer-part-1-4fe8f9853492)** by a Microsoft developer named Facundo Santiago, which gives a nice overview of AMLS and from which I've also cribbed some bits.

**Microsoft's Machine Learning Services can be accessed (driven) in many different ways**, including:
- __The Machine Learning CLI__ is an extension to the Azure CLI. It provides commands for working with the Azure Machine Learning service.
- __The Azure Machine Learning SDK for Python__ is a Python package that provides programmatic access to the Azure Machine Learning service.
- __The Azure Machine Learning SDK for R__ is an R package that provides programmatic access to the Azure Machine Learning service.
- __The Azure Machine Learning Studio__ is a bit like RStudio (i.e., strives to be an IDE with a nice graphical interface).  The documentation says it's currently free for trial but the premium version may cost $10/month plus $1/hour in the future.
- __Visual Studio Code__ is an IDE by Microsoft which has a lot of very nice features for programming; there is an extension for MLS.

In addition, some computing environments have their own connectors (I think), such as:
- __Spark on Databricks__.  A platform for big data, based on Spark.
- __Spark on HDInsight__. A cloud distribution of Hadoop components, for processing massive amounts of data using open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, and R.
- __Azure Batch AI__.  A service for running large batch programs.

The Python SDK seems to be more capable (as of Jun 2020) than the CLI and less expensive and more flexible than Studio, so **this notebook focuses on the Python SDK**.

## Prerequisites
* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`.  See https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment for configuring your local machine (in addition to a DSVM and other types).

In [2]:
!pwd

/Users/johntwo/Documents/github/jcpayne/testing_aml_workflow


In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

SDK version: 1.6.0


In [4]:
from pathlib import Path

## Diagnostics
Opt-in diagnostics for better experience, quality, and security of future releases.

## Create a workspace
A workspace is a logical container for your projects.  You can have many workspaces, and multiple experiments running simultaneously within a workspace. When you create a new workspace, it automatically creates several Azure resources that are used by the workspace:
- Azure **Container Registry**: Registers docker containers that you use during training and when you deploy a model. To minimize costs, ACR is lazy-loaded until deployment images are created.
- Azure **Storage account**: Is used as the default datastore for the workspace.  By default, both a blob storage and a file share are created (see below).
- Azure Application Insights: Stores **monitoring information** about your models.
- Azure **Key Vault**: Stores secrets that are used by compute targets and other sensitive information that's needed by the workspace.

Best practices for naming:  
https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/naming-and-tagging

### Create a workspace configuration file (once)
This code writes a configuration file called `config.json` to `.azureml/config.json`

In [6]:
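# Check whether a workspace configuration file already exists
configpath = Path.cwd()/'.azureml/config.json'
configpath.exists()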

True

In [None]:
from azureml.core import Workspace

subscription_id = '6c7f51dc-3592-4952-bf04-f7b2d747bfca'
resource_group  = 'bluedot'
workspace_name  = 'jp_workspace'

configfile = Path.cwd()/'.azureml/config.json'

try:
    ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
    if configfile.exists():
        print('Existing configuration will be overwritten.')
    ws.write_config()
    print('Library configuration succeeded')
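except Exception as e:
    # Catch Exception rather than using a bare except, and show the underlying cause
    print('Workspace not found or not accessible:', e)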
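
For orientation, the file that `ws.write_config()` produces is a small JSON document. The sketch below writes and reads an equivalent file with the standard library only. The helper name `write_workspace_config` is hypothetical, the values are placeholders, and the three key names are what SDK 1.x wrote as far as I know, so check them against your own `config.json`.

```python
import json
import tempfile
from pathlib import Path

def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json shaped like the one Workspace.write_config() creates."""
    config_dir = Path(folder) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=4))
    return config_path

# Placeholder values, written to a throwaway directory (not a real workspace)
demo_dir = tempfile.mkdtemp()
demo_path = write_workspace_config(demo_dir, "<subscription-id>", "my-resource-group", "my-workspace")

# Read it back the way any JSON file is read
loaded = json.loads(demo_path.read_text())
print(sorted(loaded))
```

Because the file holds only identifiers (no secrets), it can be inspected or edited by hand when `Workspace.from_config()` picks up the wrong workspace.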

### Initialize workspace
Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step.

`Workspace.from_config()` creates a workspace object from the details stored in `config.json`.

In [7]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

Workspace name: jp_workspace
Azure region: westus2
Subscription id: 6c7f51dc-3592-4952-bf04-f7b2d747bfca
Resource group: bluedot


## Create or Attach a compute target
Excellent documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets.  You can use several different kinds of targets, including:
1. **Your local machine**.  The simplest; useful for testing.
2. **A managed AML compute cluster** (additional charges for Linux machines).  A cluster can be one or more machines.  This is a slightly more expensive option (I think) that takes some of the hassle out of the next option.  There are two types: _Run-based_ (hardware is deallocated at the end of the run) and _Persisted_, which is ultimately more flexible and useful, since it can scale up and down as needed (you can set the minimum number of nodes to zero). 
3. **An attached VM**.  ***Only Ubuntu machines are supported***.  This is a "bring your own" Azure cloud compute, which could also be a DataBricks cluster, an HDInsight cluster or a Docker VM.  You can use a pre-existing VM (in which case you simply attach to it) or you can create and provision a new VM, then attach it.  Microsoft's DSVMs (Azure Data Science Virtual Machine) are configured well by default.  You can also [create pools of shared DSVMs](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-pools) that can be [automatically scaled](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-autoscale-overview) to add VMs when the load increases and/or on a schedule; this is basically a version of what Option 2 does.  Also [Azure virtual machine scale sets](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview) let you create and manage a group of identical, load balanced VMs, and you can provide your own machine image (you can also create vertically-scaled VM scale sets, which change from weaker to more powerful machines).  I'm not sure about how easy it would be to share a job across them, though. 

**From Larry O'Brien (author of a lot of the AML docs):**  
To train using more-powerful machines, the easy thing to do is use either
- Compute _Cluster_ (spun up and down by the training script)
- Compute _Instance_ (started and stopped manually)
 
Compute Clusters are generally more appropriate as they’re likely to save you money. Compute Instances are really more about development and iteration than standalone training. For instance, if you wanted to run Jupyter remotely on a powerful machine, you’d choose a Compute Instance. Because they’re manually started and stopped, Compute Instances may be a little more responsive if you are rapidly iterating but Compute Clusters are a little safer in terms of money (they’ll timeout and spin down). You set these choices as the `ComputeTarget` for your training.

**WARNING: "Azure Machine Learning only supports virtual machines that run Ubuntu.** When you create a VM or choose an existing VM, you must select a VM that uses Ubuntu.  Azure Machine Learning also requires the virtual machine to have a public IP address."

### Alternative 1: Attach your own local machine as the compute target

In [8]:
from azureml.core.runconfig import RunConfiguration

# Edit a run configuration property on the fly.
run_local = RunConfiguration()

run_local.environment.python.user_managed_dependencies = True

### Alternative 2: Attach a managed AML Compute cluster as the compute target
This alternative does a little hand-holding to make it simple to set up and scale resources.  Santiago says _Even though Persistent Compute Targets can be created via Python, you will typically provision the target with the Azure Portal because it allows you to better manage your costs and permissions._

Regarding environments, you can use a system-built conda environment, an existing Python environment, or a Docker container.  Your VM or docker image _must_ have conda installed.  When you execute by using a Docker container, you need to have Docker Engine running on the VM (obviously). 

**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute with the given name is already in your workspace, this code will skip the creation process.

Microsoft limits AmlCompute resources (see [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits (which are ZERO GPUs for the NC machines!)).  
To request a higher quota:  
https://ms.portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest/

Note: this will check to see if a cluster is already running before creating one.  If it finds a cluster of the same name, it will use it instead of creating a new cluster.

Also, if you go to the Compute tab in Studio, you can edit the properties such as the minimum nodes (if 0, the cluster will power down when finished; if 1 it will stay alive).
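
To make the effect of the minimum-nodes setting concrete, here is a back-of-the-envelope sketch. The function `idle_cost` and the hourly rate are hypothetical (the rate is a made-up number, not an Azure price), and the arithmetic ignores the short delay before an idle cluster scales down.

```python
def idle_cost(min_nodes, idle_hours, hourly_rate):
    """Cost of nodes that stay allocated while no job is running."""
    return min_nodes * idle_hours * hourly_rate

HYPOTHETICAL_RATE = 1.0  # made-up price per node-hour, for illustration only

# With min_nodes=0 the cluster powers down, so idle time costs nothing
print(idle_cost(min_nodes=0, idle_hours=100, hourly_rate=HYPOTHETICAL_RATE))

# With min_nodes=1 one node stays alive and is billed for every idle hour
print(idle_cost(min_nodes=1, idle_hours=100, hourly_rate=HYPOTHETICAL_RATE))
```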


In [None]:
#Get a list of supported VM sizes in your region (you'll need the name for the cluster)
from azureml.core.compute import AmlCompute

AmlCompute.supported_vmsizes(ws, location='westus2')

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "gpu-cluster"

try:
    mycompute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    #max_nodes is the number of VMs you are creating
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_NC6', 
        min_nodes = 1, #SET TO 0 FOR IT TO AUTO-STOP, OR TO 1 to keep alive
        max_nodes=1) 

    # create the cluster
    mycompute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    
    # Later, in the estimator, you'll pass this target, e.g.
    # estimator = Estimator(..., compute_target=mycompute_target, ...)

    mycompute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster. 
print(mycompute_target.get_status().serialize())

The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`.

### Alternative 3: Attach a cloud VM
Attach: To attach an existing virtual machine as a compute target, you must provide the resource ID, user name, and password for the virtual machine. The resource ID of the VM can be constructed using the subscription ID, resource group name, and VM name using the following string format: 
resource_id = `/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<vm_name>` 
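
A minimal sketch of building that string in code rather than by hand, using only the format given above. The helper name `vm_resource_id` and the example values are hypothetical.

```python
def vm_resource_id(subscription_id, resource_group, vm_name):
    """Assemble the Azure resource ID of a VM from its three components."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/virtualMachines/{vm_name}"
    )

# Placeholder values, not a real subscription
resource_id = vm_resource_id("<subscription_id>", "my-resource-group", "my-vm")
print(resource_id)
```

Building the ID from variables keeps the subscription and resource group in one place, so the same values can be reused for the `Workspace` and the attached VM.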

In [None]:
from azureml.core.compute import RemoteCompute, ComputeTarget

# Create the compute config 
compute_target_name = "spearhead-DSVM"
#Create the resource_id
resource_id = "/subscriptions/6c7f51dc-3592-4952-bf04-f7b2d747bfca/resourceGroups/bluedot/providers/Microsoft.Compute/virtualMachines/Spearhead"
attach_config = RemoteCompute.attach_configuration(resource_id=resource_id,
    ssh_port=22,
    username='egdod',
    password=None,
    private_key_file="/Users/johntwo/.ssh/azure_rsa")
    #private_key_passphrase="<passphrase>")

# For password access instead of ssh
# attach_config = RemoteCompute.attach_configuration(resource_id='<resource_id>',
#                                                 ssh_port=22,
#                                                 username='<username>',
#                                                 password="<password>")

# Attach the compute
compute = ComputeTarget.attach(ws, compute_target_name, attach_config)

compute.wait_for_completion(show_output=True)

### Finish creating a run configuration
Create a run configuration for the DSVM compute target. Docker and conda are used to create and configure the training environment on the DSVM.

In [None]:
import azureml.core
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_dsvm = RunConfiguration(framework = "python")

# Set the compute target to the Linux DSVM
run_dsvm.target = compute_target_name 

# Use Docker in the remote VM
run_dsvm.environment.docker.enabled = True

# Use the GPU base image: to use the GPU in a DSVM, you must use the GPU base Docker image
# "azureml.core.runconfig.DEFAULT_GPU_IMAGE". For CPU-only training, switch to the commented-out line below.
run_dsvm.environment.docker.base_image = azureml.core.runconfig.DEFAULT_GPU_IMAGE
#run_dsvm.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE
print('Base Docker image is:', run_dsvm.environment.docker.base_image)

# Specify the CondaDependencies object
run_dsvm.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])

## Add data
### Overview of data workflow in Azure Machine Learning
There is good documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/concept-data#data-workflow). Generally, the process is:
1. **Keep data in some type of data storage _service_** (can be Azure blob, Azure fileshare, PostgreSQL database, external URL, etc.)
2. **Register the storage service as an Azure datastore**, which is basically a small file with connection details
3. **Create an Azure dataset** from the datastore, which is just packaging to make that datastore consumable by your models.

Azure recognizes two types of datasets (which are what they sound like):
1. TabularDataset
2. FileDataset

**Memory limitations**  
AML warns that you should have at least twice as much RAM as the amount of data you need to load in memory, and that a CSV file can expand 10X when loaded into RAM and compressed files can expand even further.  
- pandas will only use 1 vCPU regardless of how many are available.
- You can parallelize pandas by changing `import pandas as pd` to `import modin.pandas as pd`
- Otherwise, use Spark, which can run on a cluster if needed (the Python package [Vaex](https://towardsdatascience.com/how-to-process-a-dataframe-with-billions-of-rows-in-seconds-c8212580f447) might also be worth looking at.)
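
As a rough worked example of the two rules of thumb above (a CSV may expand about 10X in RAM, and you want about twice as much RAM as in-memory data), the sketch below multiplies them out. The function name and default factors are assumptions taken from that guidance, not a formula from Azure, and the result is a pessimistic estimate.

```python
def worst_case_ram_gb(csv_size_gb, expansion_factor=10, headroom_factor=2):
    """Pessimistic RAM estimate: on-disk size x in-memory expansion x headroom."""
    return csv_size_gb * expansion_factor * headroom_factor

# A 1.5 GB CSV could need on the order of 30 GB of RAM under these assumptions
print(worst_case_ram_gb(1.5))
```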

**Downloading vs. mounting**  
_"When you mount a dataset, you attach the files referenced by the dataset to a directory (mount point) and make it available on the compute target. When you download a dataset, all the files referenced by the dataset will be downloaded to the compute target.  If your script processes all files referenced by the dataset, and your compute disk can fit your full dataset, downloading is recommended to avoid the overhead of streaming data from storage services. If your data size exceeds the compute disk size, downloading is not possible and we recommend mounting, since only the data files used by your script are loaded at the time of processing._"
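
The guidance quoted above can be reduced to a small decision rule. This is only a sketch: `choose_access_mode` is a hypothetical helper, and it simplifies the advice by choosing mounting whenever the script reads only part of the data.

```python
import shutil

def choose_access_mode(dataset_bytes, free_disk_bytes, reads_all_files):
    """Return 'download' or 'mount' following the quoted recommendation."""
    # Download only when the script touches everything AND the data fits on disk
    if reads_all_files and dataset_bytes < free_disk_bytes:
        return "download"
    # Otherwise stream on demand, so only the files actually used are loaded
    return "mount"

# Example: compare a 10 GiB dataset with the free space on the current disk
free_bytes = shutil.disk_usage(".").free
print(choose_access_mode(10 * 1024**3, free_bytes, reads_all_files=True))
```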

Azure ML includes routines and capabilities for:
- **[Versioning data](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-version-track-datasets)** in experiments and pipelines.  Note: "_If you want to make sure that each dataset version is reproducible, we recommend that you not modify data content referenced by the dataset version. When new data comes in, save new data files into a separate data folder and then create a new dataset version to include data from that new folder._"
- **[Labeling images](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-labeling-projects)**
- **[Detecting "data drift"](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets)**, which might be caused by changing or broken sensors, natural drift like seasonal changes, or change in relationships between covariates.
- **Using Azure Open Datasets**, which are "_curated public datasets that include public-domain data for weather, census, holidays, public safety, and location that help you train machine learning models and enrich predictive solutions._" 


#### Automatically-created datastores
When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace. They're named `workspaceblobstore` and `workspacefilestore`, respectively. `workspaceblobstore` is used to store workspace artifacts and your machine learning experiment logs. `workspacefilestore` is used to store notebooks and R scripts authorized via compute instance. The `workspaceblobstore` container is set as the default datastore.

### DataStores 
A [very useful FAQ](https://docs.microsoft.com/en-us/azure/storage/files/storage-files-faq) that compares the different types of datastore. 

Azure Files and Azure Blob storage both offer ways to store large amounts of data in the cloud, but they are useful for slightly different purposes.
- Azure Blob storage is useful for massive-scale, cloud-native applications that need to store unstructured data. To maximize performance and scale, Azure Blob storage is a simpler storage abstraction than a true file system. __You can access Azure Blob storage only through REST-based client libraries (or directly through the REST-based protocol)__.
- Azure Files is specifically a file system. Azure Files has all the file abstracts that you know and love from years of working with on-premises operating systems. Like Azure Blob storage, Azure Files offers a REST interface and REST-based client libraries. Unlike Azure Blob storage, Azure Files offers SMB access to Azure file shares. By using SMB, __you can mount an Azure file share directly on Windows, Linux, or macOS, either on-premises or in cloud VMs, without writing any code or attaching any special drivers to the file system__. You also can cache Azure file shares on on-premises file servers by using Azure File Sync for quick access, close to where the data is used.
- Azure Blob storage has higher throughput speeds than an Azure file share and will scale to large numbers of jobs started in parallel. For this reason, we recommend configuring your runs to use Blob storage for transferring source code files. (from https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data)
- [BlockBlobStorage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-create-account-block-blob?tabs=azure-portal) is a specialized form of Blob storage with extra-fast access.

## Writing data
See the long discussion I started in Teams about this -- the bottom line is that you should almost always dump results to the `./outputs` folder, which will then get written to storage.  It's unclear to me whether you have to worry about running out of storage if you're generating lots of big files (e.g., making full-size images with a GAN or UNet).  And what exactly _is_ the `./outputs` directory?  

Answer from Engineering: `./outputs` "is usually a blob store mounted with blob fuse, but there is a local disk cache with upload-on-close".  i.e., it's complicated.  
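
In practice, "dump results to `./outputs`" just means writing ordinary files under that folder from the training script. A minimal sketch using only the standard library follows; the helper name `save_to_outputs` and the file name are hypothetical.

```python
from pathlib import Path

def save_to_outputs(filename, text, outputs_dir="outputs"):
    """Write a text file under the outputs folder, creating the folder if needed."""
    out_dir = Path(outputs_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    target.write_text(text)
    return target

# Anything written here during a run should be picked up with the run's outputs
saved = save_to_outputs("myfile.txt", "A line of output.\n")
print(saved)
```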



#### Default names (URI) of storage accounts
For example, if your general-purpose storage account is named mystorageaccount, then the default endpoints for that account are:  
- Blob storage: https://*mystorageaccount*.blob.core.windows.net
- Table storage: https://*mystorageaccount*.table.core.windows.net
- Queue storage: https://*mystorageaccount*.queue.core.windows.net
- Azure Files: https://*mystorageaccount*.file.core.windows.net

Each resource has a corresponding base URI, which refers to the resource itself.
- For the storage account, the base URI includes the name of the account only:  
    https://myaccount.blob.core.windows.net
- For a container, the base URI includes the name of the account and the name of the container:  
    https://myaccount.blob.core.windows.net/mycontainer
- For a blob, the base URI includes the name of the account, the name of the container, and the name of the blob:  
    https://myaccount.blob.core.windows.net/mycontainer/myblob
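
The three base-URI forms above follow one pattern, which the sketch below spells out. The function name `storage_uri` is hypothetical, and the account, container, and blob names are the same placeholders used in the list.

```python
def storage_uri(account, container=None, blob=None):
    """Build the base URI for a storage account, a container, or a blob."""
    if blob is not None and container is None:
        raise ValueError("a blob URI needs a container name")
    uri = f"https://{account}.blob.core.windows.net"
    if container is not None:
        uri += f"/{container}"
    if blob is not None:
        uri += f"/{blob}"
    return uri

print(storage_uri("myaccount"))
print(storage_uri("myaccount", "mycontainer"))
print(storage_uri("myaccount", "mycontainer", "myblob"))
```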

#### Shared Access Signature (SAS)
See the SAS overview:
https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview (scroll down to Best Practices)

In [5]:
#List all datastores registered in the current workspace:
datastores = ws.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)

workspacefilestore AzureFile
workspaceblobstore AzureBlob


In [None]:
#Get the default datastore
datastore = ws.get_default_datastore()

#Change the default datastore (pass the name of a datastore already registered in the workspace)
ws.set_default_datastore('your datastore name')

In [None]:
#Upload data (target_path is root by default)
datastore.upload(src_dir='your source file or directory',
                 target_path=None,
                 overwrite=True,
                 show_progress=True)

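#Upload multiple files (the file paths here are placeholders)
datastore.upload_files(files=['your/first/file', 'your/second/file'],
                       target_path=None,
                       overwrite=True,
                       show_progress=True)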

#Download.  NOTE: target_path is the local path; prefix is the remote path (should be called 'sourcepath')
datastore.download(target_path='your target path',
                   prefix='your prefix',
                   show_progress=True)

### Create and register Datasets
See [details](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets)

A single "Dataset" can be created from multiple sources.  Generally, the process is:
1. Create a list of tuples of the form `(datastore, path)` that specifies the sources
2. Use that list to define a dataset.  
NOTE: you can use wildcards in the pathnames

In [None]:
from azureml.core import Workspace, Datastore, Dataset

datastore_name = 'your datastore name'  # e.g. 'jpworkspace3957796892'

# get existing workspace
workspace = Workspace.from_config()
    
# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)

# create a TabularDataset from 3 file paths in datastore (note multiple sources: one dataset!)
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')] #note wildcard
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)

# Create a FileDataset from a single source (root directory)
datastore_paths = [(datastore, 'TA25')]
animal_ds = Dataset.File.from_files(path=datastore_paths)

#Create a FileDataset in one step, using wildcards
dataset = Dataset.File.from_files('https://dprepdata.blob.core.windows.net/demo/green-small/*.csv')

# Create a FileDataset from public image and label files (i.e., data at public web urls)
web_paths = ['https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
             'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz']
mnist_ds = Dataset.File.from_files(path=web_paths)

In [None]:
#Register a dataset (titanic_ds is assumed to be an unregistered dataset created as in the cell above)
titanic_ds = titanic_ds.register(workspace=workspace,
                                 name='titanic_ds',
                                 description='titanic training data')
# Register a new version of titanic_ds
titanic_ds = titanic_ds.register(workspace = workspace,
                                 name = 'titanic_ds',
                                 description = 'new titanic training data',
                                 create_new_version = True)

#List the datasets registered to a workspace (get_all is called on the Dataset class)
Dataset.get_all(workspace)

### Mount or download
Note: For mounting a remote file system, Linux uses Blobfuse.  For OS X there is Fuse (https://osxfuse.github.io)

In [None]:
# MOUNT ("mounted_path" is the mount point on the target machine)
import os
import tempfile
mounted_path = tempfile.mkdtemp() #mkdtemp "creates a temporary directory in the most secure manner possible."

# mount dataset onto the mounted_path of a Linux-based compute
mount_context = dataset.mount(mounted_path)
mount_context.start()

print(os.listdir(mounted_path))
print (mounted_path)

# DOWNLOAD
from azureml.opendatasets import MNIST

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

mnist_file_dataset = MNIST.get_file_dataset()
mnist_file_dataset.download(data_folder, overwrite=True)

mnist_file_dataset = mnist_file_dataset.register(workspace=ws,
                                                 name='mnist_opendataset',
                                                 description='training and test dataset',
                                                 create_new_version=True)

In [None]:
#Other non-working ways of connecting to a datastore
# ds = Datastore.get(ws, datastore_name='workspaceblobstore')
# ds_mounted = ds.as_mount()
# dp = DataPath.create_from_data_reference(ds_mounted)
# rootdir = dp.path_on_datastore()
# print('dp',dp)
# print('rootdir',rootdir)

# with ds.as_mount() as mount_context:
#        # list top level mounted files and folders in the dataset
#        os.listdir(mount_context.mount_point)

### Access data inside a script

In [None]:
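# Pass the dataset object as an input with the name 'titanic'.
# This is a fragment: it is one argument of the estimator you create for training,
# e.g. Estimator(..., inputs=[titanic_ds.as_named_input('titanic')], ...)
inputs = [titanic_ds.as_named_input('titanic')]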




In [None]:
import os
#Create a folder for scripts
script_folder = os.path.join(os.getcwd(), "sklearn-mnist")
os.makedirs(script_folder, exist_ok=True)

In [None]:
%%writefile $script_folder/train.py
# %%writefile is a cell magic, so it has to be the first line of its own cell. Like nbdev, it writes
# the contents of the CELL (not the whole notebook) into a usable file called train.py.

from azureml.core import Dataset, Run

run = Run.get_context()
workspace = run.experiment.workspace

dataset_name = 'titanic_ds'

# Get a dataset by name
titanic_ds = Dataset.get_by_name(workspace=workspace, name=dataset_name)

# Load a TabularDataset into pandas DataFrame
df = titanic_ds.to_pandas_dataframe()

## Train model on the remote compute
SEE VERY USEFUL notebook [here](https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb)  
Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. 

### Create a project directory
Create a directory on your local machine that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [None]:
import os

project_folder = './pytorch-birds'
os.makedirs(project_folder, exist_ok=True)

### Download training data
The dataset we will use (located on a public blob [here](https://azureopendatastorage.blob.core.windows.net/testpublic/temp/fowl_data.zip) as a zip file) consists of about 120 training images each for turkeys and chickens, with 100 validation images for each class. The images are a subset of the [Open Images v5 Dataset](https://storage.googleapis.com/openimages/web/index.html). We will download and extract the dataset as part of our training script `pytorch_train.py`

## Prepare the training script
See https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-convert-ml-experiment-to-production?view=azure-ml-py.  It goes through the steps:
- Remove non-essential code
- Refactor all code into functions
- Set up a main() function
- Convert Jupyter notebooks to .py files
- Combine related functions in .py files
- Create unit tests for each file

This is for deployment, but may be useful: https://github.com/microsoft/MLOpsPython/blob/master/docs/custom_model.md
- Fix directory structure
- Update train and/or eval script if needed
- Customize the container: https://github.com/microsoft/MLOpsPython/blob/master/docs/custom_container.md
    - Provision a container registry
    - Create a registry service connection
    - Update the environment definition
    - Create/modify container build pipeline

### Parse arguments that are passed in
```python
import argparse
import os
import numpy as np

#Instantiate the parser
parser = argparse.ArgumentParser()

#You have to ADD each argument before you can parse it.  "dest" sets the attribute name on args;
#argparse would already turn --data-folder into args.data_folder, so here it just makes the name explicit.
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder')
parser.add_argument("--split", default="train")
parser.add_argument("--samples", type=int, default=10)
parser.add_argument("--scale", type=float, default=1.0)

#Call the parser
args = parser.parse_args()

#Ways to access it
print('Data folder is at:', args.data_folder)
print('List all files: ', os.listdir(args.data_folder))
X = np.load(os.path.join(args.data_folder, 'features.npy'))
y = np.load(os.path.join(args.data_folder, 'labels.npy'))
```
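
To check how the parser behaves without launching a run, you can hand `parse_args` an explicit list of strings. The sketch below is a trimmed-down version of the parser above with `dest` left out on purpose, to show that argparse derives `data_folder` from `--data-folder` by itself; the path is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser()
# No dest given: argparse converts the hyphen to an underscore for the attribute name
parser.add_argument("--data-folder", type=str, help="data folder")
parser.add_argument("--samples", type=int, default=10)

# Passing a list simulates the command line: script.py --data-folder /tmp/data
args = parser.parse_args(["--data-folder", "/tmp/data"])
print(args.data_folder, args.samples)
```

The same trick is handy in unit tests for a training script, since it avoids touching `sys.argv`.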

### Add logging to training script
A "Run" will log certain things by default (most files are put in `./outputs` by default), but you have to add code to your script to get specific things added to the output.
NOTE: print() statements do produce output, which appears to end up in the run's log files.

#### Prepare to log
```Python
import os
from azureml.core.run import Run

#You need this line to get the 'run' object so you can log to it
run = Run.get_context()

#If you're going to put stuff directly into the special outputs directory, make sure it exists first
#(harmless if it already does, and needed when the script runs somewhere the folder hasn't been created)
os.makedirs('./outputs', exist_ok=True)
```
#### Log a value, list, tuple, or table
```python
run.log(name, value, description='') #Log a value
run.log_list(name, value, description='') #Log a list
run.log_row(name, description=None, **kwargs) #Logs a tuple (if called once) or a table (if called multiple)
run.log_table(name, value, description='') #Log a whole table
```
#### Log an image to the run (either as a file or a Matplotlib plot)
```python
# Create a plot
# (%matplotlib inline is a notebook magic; leave it out of a .py training script)
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
scale_factor = 2  # placeholder value
angle = np.linspace(-3, 3, 50) * scale_factor
plt.plot(angle, np.tanh(angle), label='tanh')
plt.legend(fontsize=12)
plt.title('Hyperbolic Tangent', fontsize=16)
plt.grid(True)
#Log it
run.log_image(name='Hyperbolic Tangent', plot=plt)
#To log an arbitrary image file, use the form
run.log_image(name='My image', path='./image_path.png')
```
#### Upload a file into the run artifacts 
This is only needed if it's not automatically included.
```python
file_name = 'outputs/myfile.txt'
with open(file_name, "w") as f:
    f.write('A line of output.\n')
run.upload_file(name = file_name, path_or_stream = file_name)
```
#### Save a snapshot of a whole directory to the run object
```python
run.take_snapshot('./somedir')
```
#### End logging
```python
run.complete() #End logging
```
Metrics are necessary for hyperparameter tuning (see "Tune model hyperparameters" notebook).
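
One way to keep a training script runnable both inside and outside Azure ML is to wrap the logging call. The sketch below is an assumption-laden pattern rather than anything from the SDK: `log_metric` is a hypothetical helper, the loss values are invented for illustration, and when `azureml` is not installed it simply prints.

```python
try:
    from azureml.core.run import Run
    run = Run.get_context()  # returns an offline run object when not inside a submitted run
except ImportError:
    run = None  # SDK not installed: fall back to printing

def log_metric(name, value):
    """Log a scalar to the run if one is available, otherwise print it."""
    if run is not None:
        run.log(name, value)
    else:
        print(f"{name}: {value}")
    return name, value

# Invented loss values, standing in for a real training loop
for loss in [0.9, 0.5, 0.3]:
    log_metric("loss", loss)
```

Calling `run.log` with the same name repeatedly, as in the loop, is what builds up a series for that metric.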

### Convert notebook to .py file

In [None]:
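# The leading "!" runs this as a shell command from the notebook; replace <mynotebook> with your notebook's name
!jupyter nbconvert --to script <mynotebook>.ipynb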

### Copy it to your project directory

In [None]:
import shutil
shutil.copy('pytorch_train.py', project_folder)

### Turn the training notebook into a .py file
From the command line:
```bash
#One file:
jupyter nbconvert --to script <mynotebook>.ipynb

#All notebooks in a directory:
jupyter nbconvert --to script /<path>/*.ipynb
```

### Copy it to the project folder
**What goes in the project folder?**
- The script file.
- Config files, which should be included automatically by installing the trident package.

**What goes in the data store?**
- The tiled image files: copy them to the blob store.



### Create an experiment
Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace.  An experiment is associated with a workspace and is a way to group information from multiple runs.  Each run has an `Estimator` or `ScriptRunConfig` that defines a `source_directory` containing the main script and any other assets it needs (additional Python files, model weights, etc.). From Santiago: _The folder will typically be associated with a code repository. This is not required, but it will allow you to collaborate with others on the same experiment. The repository can be hosted in any service, from GitHub to Azure DevOps._

In [None]:
from azureml.core import Experiment

experiment_name = 'trident_tz_expt'
experiment = Experiment(ws, name=experiment_name)

## Environments
There are three types of environments:
- Curated: provided by Azure ML (for some of the managed services).
- User-managed: you set up the environment yourself.  It must include `azureml-defaults` with version >= 1.0.45 as a pip dependency.
- System-managed: Conda manages the Python environment for you.  This is the default.

It is not strictly necessary to create an environment, since when you submit a job, one will be created for you.  
- Environments are automatically registered to your workspace when you submit an experiment. They can also be manually registered;
- You can fetch environments from your workspace to use for training or deployment, or to edit;
- With versioning, you can see changes to your environments over time, which ensures reproducibility;
- You can build Docker images automatically from your environments.

See: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments for more detail

#### Create an environment

In [None]:
from azureml.core.environment import Environment

Environment(name="myenv") #Instantiate an Environment object

#Create it from an existing conda environment
myenv = Environment.from_existing_conda_environment(name = "myenv",
                                                    conda_environment_name = "mycondaenv")

#### Add packages to an environment

In [None]:
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment(name="myenv")

#Make a section of the environment that holds conda dependencies (see last line for how it's added)
conda_dep = CondaDependencies()

# Install conda package
conda_dep.add_conda_package("numpy==1.17.0")

# Install pip package (Note: it's also added to conda dependencies)
conda_dep.add_pip_package("pillow")

# Adds dependencies to PythonSection of myenv
myenv.python.conda_dependencies=conda_dep


#### Other environment operations

In [None]:
myenv.register(workspace=ws) #Register an environment with a workspace
Environment.list(workspace=ws) #List environments associated with a workspace (pass the Workspace object, not its name)
Environment.get(workspace=ws, name="myenv", version="1") #Find an existing environment by name+version
run.get_environment() #Get the environment that was associated with a Run (called on a Run instance)

### Optional: put the environment into a Docker container
This is done by default, but you can choose details if you control it.
_Note: the string below refers to a "base image"_: see https://github.com/Azure/AzureML-Containers.
It specifies the CUDA type and the Ubuntu version but not much else.

#### Find the name of your container registry

In [None]:
ws.get_details() #look through it for a registry.
#If it exists, the last element of a string like this is the container registry name (e.g., 'jpworkspacea09a28ea'):
#'/subscriptions/6c7f51dc-3592-4952-bf04-f7b2d747bfca/resourceGroups/bluedot/providers/Microsoft.ContainerRegistry/registries/jpworkspacea09a28ea'

In [None]:
# Creates the environment inside a Docker container.
myenv.docker.enabled = True

# Optional: specify a custom Docker base image and registry if you don't want to use the defaults
myenv.docker.base_image="your_base-image"
myenv.docker.base_image_registry="your_registry_location"
#trident_env.docker.base_image_registry=container_registry

#More detail on specifying a container registry
#container_registry = ContainerRegistry()
#container_registry.address = 'jpworkspacea09a28ea.azurecr.io'
#container_registry.username = 'yasiyu'
#container_registry.password = registry_pw

# Specify docker steps as a string (optional). Alternatively, load the string from a file.
dockerfile = r"""
FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04
RUN echo "Hello from custom container!"
"""

# Optional: Set base image to None, because the image is defined by dockerfile.
myenv.docker.base_image = None
myenv.docker.base_dockerfile = dockerfile

## Prepare to submit the job
**There are several options for submitting jobs, including:**
1. **Submit as an `Estimator`**.  An Estimator encapsulates an environment and a target.  There are many types of Estimators.
2. **Submit as a `ScriptRunConfig`**.

Santiago recommends the Estimator because it's simpler: "_Estimator is an abstraction that allows you to build a Script Run Configuration based on high-level specifications._"

### Create an Estimator
**An `Estimator` encapsulates an environment and a target**. By default, an Estimator doesn't come with an environment, but Azure provides Estimators for both TensorFlow and PyTorch that come with packages installed. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). Parameters:  
- `source_directory`: a folder on your local machine that must contain all files needed (except data) and must be <300 MB in size.  It gets copied to a Docker container when the job is run, so you also have to specify:
- `entry_script`: the main script;
- `script_params`: any command-line params to be added to the main script call;
- `compute_target`: the machine(s) to run the job on;
- Data can be found and attached or downloaded in the training script, or you can pass the name of a directory
- `use_gpu`: Default False. Indicates if the hardware supports GPU. If true, the image deployed in the virtual machine will have all the drivers and distributions to support GPUs.
- `use_docker`: Default True. Indicates if the job will be submitted as a Docker image into the compute target.
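
Because the whole `source_directory` is copied, it can be worth checking its size before submitting. The following is a minimal sketch using only the standard library; the helper names `directory_size_mb` and `check_project_folder` are made up for illustration, and the 300 MB limit is simply the figure quoted in the list above.

```python
import os

SIZE_LIMIT_MB = 300  # limit quoted above for source_directory (assumed to be MiB here)

def directory_size_mb(path):
    """Return the total size of all regular files under path, in MB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for file_name in files:
            full_path = os.path.join(root, file_name)
            if os.path.isfile(full_path):  # skips broken symlinks
                total_bytes += os.path.getsize(full_path)
    return total_bytes / (1024 * 1024)

def check_project_folder(path, limit_mb=SIZE_LIMIT_MB):
    """Return (size_mb, ok), where ok is True when the folder is under the limit."""
    size_mb = directory_size_mb(path)
    return size_mb, size_mb < limit_mb

# Example (hypothetical folder name):
# size_mb, ok = check_project_folder('./my-proj')
```

If the folder turns out to be too large, the usual fix is to move data and model checkpoints out of it and into a datastore.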

For distributed training, you must 1) specify a cluster as your compute target and 2) set the `distributed_training` parameter (called `distributed_backend` in older SDK versions), which specifies how the nodes of the training cluster will communicate.
See https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch

In [None]:
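from azureml.core.runconfig import MpiConfiguration #for distributed training
from azureml.train.dnn import PyTorch, TensorFlow
from azureml.train.estimator import Estimator
#Note: horovod is imported inside the training script, not in this notebook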

#These are literally command-line parameters to be used with the main .py script file when it is called.
#Example 1: point to a local data directory
script_params = {
    '--data_dir': '/somepath',
    '--num_epochs': 30,
    '--output_dir': './outputs'
}
#Example 2. Here, you've already defined ds as a DataStore, so we mount it.
script_params = {
    '--data-folder': ds.as_mount(),
    '--regularization': 0.8
}

project_folder = './my-proj' #no trailing comma, which would turn this into a tuple

#Using the default Estimator() class, which does not have packages installed by default.  
estimator = Estimator(source_directory=project_folder,
                    entry_script='train.py',              
                    script_params=script_params,
                    # pass dataset object as an input with name 'titanic'
                    inputs=[titanic_ds.as_named_input('titanic')],
                    compute_target=compute_target,
                    pip_packages=['scikit-learn'],
                    conda_packages=['scikit-learn'])

#A PyTorch Estimator, containing all of the packages needed for PyTorch
estimator = PyTorch(source_directory=project_folder,
                    #Here the dataset reference is inside "script_params"
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='pytorch_train.py',
                    use_gpu=True,
                    pip_packages=['pillow==5.4.1'])

#A Pytorch estimator for distributed training
estimator= PyTorch(source_directory=project_folder,
                      compute_target=compute_target,
                      script_params=script_params,
                      entry_script='script.py',
                      node_count=2,
                      process_count_per_node=1,
                      distributed_training=MpiConfiguration(),
                      framework_version='1.13', #check that this version is supported for PyTorch in your SDK release
                      use_gpu=True)

#A TensorFlow Estimator with all needed TensorFlow packages, here set up for distributed processing. 
estimator= TensorFlow(source_directory=project_folder,
                      compute_target=compute_target,
                      script_params=script_params,
                      entry_script='tf_horovod_word2vec.py',
                      node_count=2,
                      process_count_per_node=1,
                      distributed_backend='mpi',
                      use_gpu=True)

The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. Please note the following:
- We passed our training data reference `ds_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `fowl_data` on our datastore.
- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by Azure ML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory.

To leverage the Azure VM's GPU for training, we set `use_gpu=True`.
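
On the script side, the `script_params` entries arrive as ordinary command-line options. Here is a minimal sketch of how an entry script might read them with `argparse`; the argument names mirror the first `script_params` example above, while the defaults and the `parse_args` helper are assumptions made for illustration.

```python
import argparse

def parse_args(argv=None):
    """Parse the command-line arguments passed in via script_params."""
    parser = argparse.ArgumentParser(description='Training script arguments')
    parser.add_argument('--data_dir', type=str, required=True,
                        help='path to the (mounted or local) training data')
    parser.add_argument('--num_epochs', type=int, default=30,
                        help='number of epochs to train')
    parser.add_argument('--output_dir', type=str, default='./outputs',
                        help='where to write the trained model')
    return parser.parse_args(argv)

# Simulate the arguments that a script_params dictionary would turn into
args = parse_args(['--data_dir', '/somepath', '--num_epochs', '30'])
print(args.data_dir, args.num_epochs, args.output_dir)
```

Inside a real training script you would call `parse_args()` with no argument so that it reads `sys.argv`.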

In [None]:
from azureml.train.estimator import Estimator

script_params = {
    '--data-folder': ds.as_mount(),
    '--regularization': 0.8
}

sk_est = Estimator(source_directory='./my-sklearn-proj',
                   script_params=script_params,
                   compute_target=compute_target,
                   entry_script='train.py',
                   conda_packages=['scikit-learn'])

### Create a ScriptRunConfig
I think in some ways this may be more flexible because you can create the environment in detail.  See https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets

In [None]:
from azureml.core import ScriptRunConfig, Experiment
from azureml.core.environment import Environment

exp = Experiment(name="myexp", workspace = ws)
# Instantiate environment
myenv = Environment(name="myenv")

# Add training script to the runconfig.  This is where the source_directory is specified, which may contain other stuff as well.
runconfig = ScriptRunConfig(source_directory=".", script="train.py")

# GPU support: Azure automatically detects and uses the NVIDIA Docker extension when it is available.

#For a RunConfig setup:
# run_config = RunConfiguration()
# run_config.environment.docker.enabled = True
# run_config.environment.docker.base_image='tfodapi112:190905'
# run_config.environment.docker.base_image_registry=container_registry
# run_config.target = compute_target # specify the compute target

# Attach compute target to run config
runconfig.run_config.target = "local"

# Attach environment to run config
runconfig.run_config.environment = myenv

## Submit the run
As the Run is executed, it goes through the following stages:
- **Preparing**: A docker image is created according to the PyTorch estimator. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.
- **Scaling**: The cluster attempts to scale up if the Batch AI cluster requires more nodes to execute the run than are currently available.
- **Running**: All scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the entry_script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.
- **Post-Processing**: The ./outputs folder of the run is copied over to the run history.  

Note: a `Run` can have children (useful for doing a set of hyperparameter testing runs).  See https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-runs for details

#### Alternative 1: Running an Estimator

```python
from azureml.widgets import RunDetails

run = <experiment>.submit(<estimator>)
RunDetails(run).show() #uses azureml.widgets
#Or wait til the run is done:
run.wait_for_completion(show_output = True)


# to get more details of your run
print(run)
print(run.get_details())
```

#### Alternative 2: Running a ScriptRunConfig

```python
import os
from azureml.core import ScriptRunConfig
from azureml.widgets import RunDetails

#Generic form:
#run = <experiment>.submit(<runconfig>)

#Define the run. To run it on a different resource, the only thing you have to change is the run_config
#(run_amlcompute is a RunConfiguration you have already set up for your compute target)
script_folder = os.getcwd()
src = ScriptRunConfig(source_directory = script_folder, script = 'train.py', run_config = run_amlcompute)

#Submit the run
run = my_experiment.submit(src)
RunDetails(run).show() #uses azureml.widgets
#Or wait til the run is done:
#run.wait_for_completion(show_output = True)
```

### Monitor your run
You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.

#### Use a Jupyter widget
```python
from azureml.widgets import RunDetails
RunDetails(run).show()
```
Alternatively, you can block until the script has completed training before running more code.
```python
run.wait_for_completion(show_output=True)
```
#### Get run status:
```python
#Get run status:
print(<run>.get_status())
```
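
If you would rather poll than block, a small loop around `get_status()` is enough. This is only a sketch: `wait_for_status` is a hypothetical helper that accepts any zero-argument callable (for example `run.get_status`), so it can be tried without the SDK, and the set of terminal states is an assumption based on the status strings a run typically reports.

```python
import time

TERMINAL_STATES = {'Completed', 'Failed', 'Canceled'}  # assumed terminal statuses

def wait_for_status(get_status, poll_seconds=15, timeout_seconds=3600):
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'run still in state {!r} after {} s'.format(status, timeout_seconds))
        time.sleep(poll_seconds)

# Usage with a real run (not executed here):
# final_status = wait_for_status(run.get_status)
```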
#### Cancel a run, or mark a run failed
```python
<run>.cancel() #cancel
<run>.fail() #mark failed
```

### Accessing run output
#### From the portal
In a notebook, you can get a summary with a link to the Azure portal simply by evaluating the object as the last line of a cell:
```python
run
#or
experiment
```
#### From a notebook or script
```python
import os
from azureml.core import Run

# access the run id for use later
run_id = run.id

#Attach to a run
fetched_run = Run(<experiment>, run_id)
#or list every run in the experiment (get_runs() accepts filters such as tags, not a run id)
all_runs = list(<experiment>.get_runs())

#Get all run metrics
fetched_run.get_metrics()

#Get a particular metric
fetched_run.get_metrics(name = "scale factor")

#Get the filenames of files that were uploaded to the run
fetched_run.get_file_names()

#Download those files to your local machine
os.makedirs('files', exist_ok=True)

for f in fetched_run.get_file_names():
    dest = os.path.join('files', f.split('/')[-1]) #using just the filename and dropping the path
    print('Downloading file {} to {}...'.format(f, dest))
    fetched_run.download_file(f, dest)
```

## Register the model
Once you've trained the model, you can register it to your workspace. Model registration lets you store and version your models in your workspace to simplify model management and deployment.

In [None]:
model = run.register_model(model_name='pt-dnn', model_path='outputs/')

## Tune model hyperparameters
Now that we've seen how to do a simple PyTorch training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities.

### Start a hyperparameter sweep
First, we will define the hyperparameter space to sweep over. Since our training script uses a learning rate schedule to decay the learning rate every several epochs, let's tune the initial learning rate and the momentum parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`best_val_acc`).
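
To make "random sampling" concrete, here is a plain-Python sketch of what drawing configurations from such a search space amounts to. It is an illustration only, not how HyperDrive is implemented; the ranges match the ones used in the configuration cell below, and `sample_configs` and the seed are assumptions.

```python
import random

# Search space: each hyperparameter is drawn uniformly from (low, high)
search_space = {
    'learning_rate': (0.0005, 0.005),
    'momentum': (0.9, 0.99),
}

def sample_configs(space, n_runs, seed=None):
    """Draw n_runs independent random configurations from the search space."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        for _ in range(n_runs)
    ]

# One configuration per child run
for config in sample_configs(search_space, n_runs=8, seed=0):
    print(config)
```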

Then, we specify the early termination policy used to stop poorly performing runs early. Here we use the `BanditPolicy`, which terminates any run that doesn't fall within the slack factor of the best-performing run on our primary evaluation metric. In this tutorial, we apply this policy every epoch (since we report our `best_val_acc` metric every epoch and `evaluation_interval=1`). Note that we delay the first policy evaluation until after the first `10` epochs (`delay_evaluation=10`).
Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available.
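
The slack-factor rule can be sketched in a few lines. This is a simplified reading of the documented behavior for a metric that is being maximized, not HyperDrive's implementation: a run is stopped when its metric falls below `best / (1 + slack_factor)`, and the policy is only applied at multiples of `evaluation_interval` once `delay_evaluation` intervals have been reached. The function name and its arguments are hypothetical.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor,
                            interval, evaluation_interval=1, delay_evaluation=0):
    """Decide whether a run would be cut at this interval (goal: maximize)."""
    if interval < delay_evaluation:
        return False  # still inside the initial grace period
    if interval % evaluation_interval != 0:
        return False  # the policy is not applied at this interval
    threshold = best_metric / (1 + slack_factor)
    return run_metric < threshold

# With the settings used in this section and a best accuracy of 0.80 so far
for acc in (0.65, 0.72):
    stop = bandit_should_terminate(acc, best_metric=0.80, slack_factor=0.15,
                                   interval=12, delay_evaluation=10)
    print(acc, 'terminate' if stop else 'keep')
```

A larger `slack_factor` lowers the threshold, so fewer runs are stopped early.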

In [None]:
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, uniform, PrimaryMetricGoal

param_sampling = RandomParameterSampling( {
        'learning_rate': uniform(0.0005, 0.005),
        'momentum': uniform(0.9, 0.99)
    }
)

early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)

hyperdrive_config = HyperDriveConfig(estimator=estimator,
                                     hyperparameter_sampling=param_sampling, 
                                     policy=early_termination_policy,
                                     primary_metric_name='best_val_acc',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=8,
                                     max_concurrent_runs=4)

Finally, launch the hyperparameter tuning job.

In [None]:
# start the HyperDrive run
hyperdrive_run = experiment.submit(hyperdrive_config)

### Monitor HyperDrive runs
You can monitor the progress of the runs with the following Jupyter widget. 

In [None]:
from azureml.widgets import RunDetails
RunDetails(hyperdrive_run).show()

Or block until the HyperDrive sweep has completed:

In [None]:
hyperdrive_run.wait_for_completion(show_output=True)

In [None]:
assert(hyperdrive_run.get_status() == "Completed")

### Warm start a Hyperparameter Tuning experiment and resuming child runs
Finding the best hyperparameter values for your model is often an iterative process that needs multiple tuning runs, each learning from the previous ones. Reusing knowledge from these previous runs accelerates the hyperparameter tuning process, reducing the cost of tuning the model and potentially improving the primary metric of the resulting model. When warm starting a hyperparameter tuning experiment with Bayesian sampling, trials from the previous run are used as prior knowledge to intelligently pick new samples, so as to improve the primary metric. Additionally, when using Random or Grid sampling, any early termination decisions will leverage metrics from the previous runs to identify poorly performing training runs.

Azure Machine Learning allows you to warm start your hyperparameter tuning run by leveraging knowledge from up to 5 previously completed hyperparameter tuning parent runs. 

Additionally, there might be occasions when individual training runs of a hyperparameter tuning experiment are cancelled due to budget constraints or fail for other reasons. It is now possible to resume such individual training runs from the last checkpoint (assuming your training script handles checkpoints). Resuming an individual training run will use the same hyperparameter configuration and mount the storage used for that run. The training script should accept the "--resume-from" argument, which contains the checkpoint or model files from which to resume the training run. You can also resume individual runs as part of an experiment that spends additional budget on hyperparameter tuning. Any budget remaining after resuming the specified training runs is used for exploring additional configurations.
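
What "handles checkpoints" might look like inside a training script is sketched below. Only the `--resume-from` argument name comes from the text above; the checkpoint file name, its JSON layout, and the helper functions are assumptions made so that the example runs with the standard library alone.

```python
import argparse
import json
import os

CHECKPOINT_FILE = 'checkpoint.json'  # hypothetical file name

def parse_args(argv=None):
    """Accept an optional --resume-from directory alongside the output directory."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--resume-from', type=str, default=None,
                        help='directory containing checkpoint files to resume from')
    parser.add_argument('--output_dir', type=str, default='./outputs')
    return parser.parse_args(argv)

def save_checkpoint(output_dir, epoch, state):
    """Write the latest checkpoint so that a later run can resume from it."""
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, CHECKPOINT_FILE), 'w') as f:
        json.dump({'epoch': epoch, 'state': state}, f)

def load_start_state(resume_from):
    """Return (start_epoch, state) from a checkpoint directory, or a fresh start."""
    if resume_from:
        path = os.path.join(resume_from, CHECKPOINT_FILE)
        if os.path.isfile(path):
            with open(path) as f:
                checkpoint = json.load(f)
            return checkpoint['epoch'] + 1, checkpoint['state']
    return 0, {}
```

A training loop would call `load_start_state(args.resume_from)` once at startup and `save_checkpoint(...)` at the end of every epoch.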

For more information on warm starting and resuming hyperparameter tuning runs, please refer to the [Hyperparameter Tuning for Azure Machine Learning documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters) 

### Find and register the best model
Once all the runs complete, we can find the run that produced the model with the highest accuracy.

In [None]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
print(best_run)

In [None]:
print('Best Run is:\n  Validation accuracy: {0:.5f} \n  Learning rate: {1:.5f} \n  Momentum: {2:.5f}'.format(
        best_run_metrics['best_val_acc'][-1],
        best_run_metrics['lr'],
        best_run_metrics['momentum'])
     )

Finally, register the model from your best-performing run to your workspace. The `model_path` parameter takes in the relative path on the remote VM to the model file in your `outputs` directory. In the next section, we will deploy this registered model as a web service.

In [None]:
model = best_run.register_model(model_name = 'pytorch-birds', model_path = 'outputs/model.pt')
print(model.name, model.id, model.version, sep = '\t')

## Pipelines
Pipelines are an additional piece of complication in this whole workflow, but they offer some advantages:
- Unattended execution
- Reliably coordinated computation across heterogeneous and scalable computes and storages. 
- Reusability: They can be triggered from external systems via simple REST calls.
- Tracking and versioning of data sources, inputs, and outputs

A very simple, easy example of a single pipeline step here:  
https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb  
A fuller example here:  
https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.ipynb  
and here (near the bottom):  
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-classification

## Deploy model as web service
Once you have your trained model, you can deploy the model on Azure. In this tutorial, we will deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI). For more information on deploying models using Azure ML, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-deploy-and-where).

### Create scoring script

First, we will create a scoring script that will be invoked by the web service call. Note that the scoring script must have two required functions:
* `init()`: In this function, you typically load the model into a `global` object. This function is executed only once when the Docker container is started. 
* `run(input_data)`: In this function, the model is used to predict a value based on the input data. The input and output typically use JSON as serialization and deserialization format, but you are not limited to that.

Refer to the scoring script `pytorch_score.py` for this tutorial. Our web service will use this file to predict whether an image is a chicken or a turkey. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service.
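
The overall shape of a scoring script can be sketched as follows. This is not the tutorial's `pytorch_score.py`: the placeholder model simply sums each input row so that the example runs anywhere, and a real script would load the registered model file inside `init()` instead. The `'data'` key matches the payload used when testing the web service later in this notebook.

```python
import json

model = None  # populated once by init()

def _placeholder_model(rows):
    """Stand-in for a real model: returns the sum of each input row."""
    return [sum(row) for row in rows]

def init():
    """Called once when the container starts: load the model into a global."""
    global model
    model = _placeholder_model

def run(input_data):
    """Called per request: parse the JSON payload and return a serializable result."""
    try:
        data = json.loads(input_data)['data']
        return {'result': model(data)}
    except Exception as exc:  # report the error instead of crashing the service
        return {'error': str(exc)}

# Local smoke test before deploying
init()
print(run(json.dumps({'data': [[1, 2, 3], [4, 5, 6]]})))
```

Calling `init()` and `run()` by hand like this is a cheap way to catch bugs before waiting several minutes for a deployment.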

### Create environment file
Then, we will need to create an environment file (`myenv.yml`) that specifies all of the scoring script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image by Azure ML. In this case, we need to specify `azureml-defaults`, `torch` and `torchvision`.

In [None]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies.create(pip_packages=['azureml-defaults', 'torch', 'torchvision>=0.5.0'])

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())
    
print(myenv.serialize_to_string())

### Deploy to ACI container
We are ready to deploy. Create an inference configuration, which specifies the inferencing environment and scripts. Then create a deployment configuration to specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of `1` core and `1` gigabyte of RAM is sufficient for many models. This cell will run for about 7-8 minutes.

In [None]:
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig
from azureml.core.webservice import Webservice
from azureml.core.model import Model
from azureml.core.environment import Environment


myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="pytorch_score.py", environment=myenv)

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={'data': 'birds',  'method':'transfer learning', 'framework':'pytorch'},
                                               description='Classify turkey/chickens using transfer learning with PyTorch')

service = Model.deploy(workspace=ws, 
                           name='aci-birds', 
                           models=[model], 
                           inference_config=inference_config, 
                           deployment_config=aciconfig)
service.wait_for_deployment(True)
print(service.state)

If your deployment fails for any reason and you need to redeploy, make sure to delete the service before you do so: `service.delete()`

**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**

In [None]:
service.get_logs()

Get the web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.

In [None]:
print(service.scoring_uri)

### Test the web service
Finally, let's test our deployed web service. We will send the data as a JSON string to the web service hosted in ACI and use the SDK's `run` API to invoke the service. Here we will take an image from our validation data to predict on.

In [None]:
import json
from PIL import Image
import matplotlib.pyplot as plt

%matplotlib inline
plt.imshow(Image.open('test_img.jpg'))

In [None]:
import torch
from torchvision import transforms
    
def preprocess(image_file):
    """Preprocess the input image."""
    data_transforms = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    image = Image.open(image_file)
    image = data_transforms(image).float() #already a torch tensor after ToTensor()
    image = image.unsqueeze(0) #add a batch dimension
    return image.numpy()

In [None]:
input_data = preprocess('test_img.jpg')
result = service.run(input_data=json.dumps({'data': input_data.tolist()}))
print(result)

## Clean up
Once you no longer need the web service, you can delete it with a simple API call.

In [None]:
service.delete()