The goal of this notebook is to submit a training job to Azure Machine Learning (AML) Studio. The code for model initialization is contained in the code.zip folder.
Running this notebook requires only a small compute resource, but you need to create a large GPU cluster in AML Studio for the actual model training.

Why submit a training job instead of running the training directly in the notebook?
Because we may want to train the LSTM model with different combinations of hyperparameters. AML Studio makes it easier to keep track of the training pipeline and its results. In addition to auto-logging, you can write custom logs for the key metrics you care about.
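As a minimal sketch of what custom logging could look like, assume the training script collects per-epoch metrics as a dict of lists (the shape of a Keras `History.history` object) and uses MLflow tracking. The helper name `log_epoch_metrics`, the stand-in logger, and the sample numbers are all hypothetical. The logging function is passed in as an argument, so inside a real job you could hand it `mlflow.log_metric`, provided MLflow is installed in the training environment.

```python
def log_epoch_metrics(history, log_fn):
    """Send every per-epoch value in `history` to `log_fn(key, value, step=epoch)`.

    `history` maps a metric name to a list with one value per epoch.
    Returns the number of values logged.
    """
    count = 0
    for name, values in history.items():
        for epoch, value in enumerate(values):
            log_fn(name, float(value), step=epoch)
            count += 1
    return count


# Stand-in logger for a dry run: it only collects what would be logged.
recorded = []


def record(key, value, step=None):
    recorded.append((key, value, step))


# Made-up numbers, purely for illustration.
sample_history = {"loss": [0.9, 0.6], "val_loss": [1.0, 0.7]}
n_logged = log_epoch_metrics(sample_history, record)
```

Keeping the logger injectable means the same helper can be exercised locally without any tracking backend.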

#### Package Installation

In [None]:
!pip install azure-ai-ml
!pip install azure-identity

In [1]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import os
from azure.ai.ml import command
from azure.ai.ml import Input

#### Set Workspace Variables

In [8]:
# For illustration only; substitute your own resources
subscription_id="UNKNOWN"
resource_group_name="UNKNOWN"
workspace_name="UNKNOWN"

code_folder = './esri_train'
experiment_name="esri-lstm"
env_name='esri-lstm-docker'
env_version='latest'
compute_cluster_name='UNKNOWN'  # GPU



#### Configure Input Style
Add your own hyperparameters as inputs to the LSTM model training pipeline.

In [None]:
# default
# arch = "lstm(nodes,dropout)"
# arch = "lstm(50,0.2);dense(40);lstm(50,0.01)"

arch = "lstm(25,0.2,0)"

inputs = {
    "data": Input(type="uri_file", path="UNKNOWN",), # newseq05
    # "lstm_nodes": 100,
    # "dropout": 0.2,
    # "batch_size": 4,
    "batch_length": 500,
    "input_length": 5,
    "epochs": 10,
    "embedding_vector_length": 32,
    "validation_split":0.2,
    "arch":arch,
}

# Build the job display name from the configuration so runs are easier to compare
job_display_name = f"{arch}-epochs_{inputs['epochs']}-{compute_cluster_name}-batch_{inputs['batch_length']}"
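The `arch` string packs the layer layout into a single input. `main.py` is not shown in this notebook, so the following is only a sketch of how such a string might be decoded. It assumes the format implied by the comments above: layers separated by `;`, each written as `name(arg,arg,...)`. The function name `parse_arch` is hypothetical, and the meaning of each argument (including the third value in `lstm(25,0.2,0)`) is left uninterpreted.

```python
import re

# One layer spec looks like "lstm(50,0.2)": a name followed by numeric arguments.
_LAYER_PATTERN = re.compile(r"^\s*([A-Za-z_]+)\s*\(([^()]*)\)\s*$")


def parse_arch(arch):
    """Split an arch string into a list of (layer_name, [numeric args]) pairs."""
    layers = []
    for token in arch.split(";"):
        if not token.strip():
            continue  # tolerate a trailing or doubled ';'
        match = _LAYER_PATTERN.match(token)
        if match is None:
            raise ValueError(f"Cannot parse layer spec: {token!r}")
        name, raw_args = match.groups()
        # Arguments are kept as plain floats; their meaning is up to the caller.
        args = [float(a) for a in raw_args.split(",") if a.strip()]
        layers.append((name.lower(), args))
    return layers


single = parse_arch("lstm(25,0.2,0)")
stacked = parse_arch("lstm(50,0.2);dense(40);lstm(50,0.01)")
```

Validating the string this way before submission would catch a malformed `arch` in the notebook rather than minutes into a GPU job.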


#### Setup Training Job

In [9]:
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id=subscription_id, 
    resource_group_name=resource_group_name, 
    workspace_name=workspace_name, 
)


job = command(
    compute=compute_cluster_name,
    environment=f"{env_name}@{env_version}",
    code= code_folder,  # location of source code
    command='python main.py --data ${{inputs.data}} --input_length ${{inputs.input_length}}  --epochs ${{inputs.epochs}}  --embedding_vector_length ${{inputs.embedding_vector_length}}  --batch_length ${{inputs.batch_length}}   --validation_split ${{inputs.validation_split}} --arch "${{inputs.arch}}"',
    # command="python main.py --data ${{inputs.data}} --lstm_nodes ${{inputs.lstm_nodes}}  --input_length ${{inputs.input_length}}  --epochs ${{inputs.epochs}}  --embedding_vector_length ${{inputs.embedding_vector_length}}  --batch_size ${{inputs.batch_size}}  --dropout ${{inputs.dropout}}  --validation_split ${{inputs.validation_split}}",
    inputs=inputs,
    experiment_name=experiment_name,
    display_name=job_display_name,
)

# Print the job to review how the training run is configured
print(job)


type: command
inputs:
  data:
    type: uri_file
    path: azureml:new_seq05:1
  batch_length: 500
  input_length: 5
  epochs: 10
  embedding_vector_length: 32
  validation_split: 0.2
  arch: lstm(25,0.2,0)
environment: azureml:esri-lstm-docker@latest
component:
  name: azureml_anonymous
  version: '1'
  display_name: lstm(25,0.2,0)-epochs_10-gpu2-batch_500
  type: command
  inputs:
    data:
      type: uri_file
    batch_length:
      type: integer
      default: '500'
    input_length:
      type: integer
      default: '5'
    epochs:
      type: integer
      default: '10'
    embedding_vector_length:
      type: integer
      default: '32'
    validation_split:
      type: number
      default: '0.2'
    arch:
      type: string
      default: lstm(25,0.2,0)
  command: python main.py --data ${{inputs.data}} --input_length ${{inputs.input_length}}  --epochs
    ${{inputs.epochs}}  --embedding_vector_length ${{inputs.embedding_vector_length}}  --batch_length
    ${{inputs.batch_len
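The job output above shows each input with an inferred type (integer, number, string), and the command passes them to `main.py` as flags. Since `main.py` itself is not included here, the following is a hypothetical sketch of an `argparse` setup that would accept the same flags. The defaults mirror the values used in this notebook, and the file path in the example call is a placeholder.

```python
import argparse


def build_parser():
    """Argument parser accepting the flags used in the job command (sketch)."""
    parser = argparse.ArgumentParser(description="LSTM training entry point (sketch)")
    parser.add_argument("--data", type=str, required=True,
                        help="Path to the training file; filled in by the job at run time")
    parser.add_argument("--input_length", type=int, default=5)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--embedding_vector_length", type=int, default=32)
    parser.add_argument("--batch_length", type=int, default=500)
    parser.add_argument("--validation_split", type=float, default=0.2)
    parser.add_argument("--arch", type=str, default="lstm(25,0.2,0)")
    return parser


# Simulate the command line the job would produce (placeholder data path).
args = build_parser().parse_args(
    ["--data", "/tmp/example.csv", "--epochs", "3", "--arch", "lstm(25,0.2,0)"]
)
```

The `arch` value is quoted in the job command because it contains parentheses, which a shell would otherwise try to interpret.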

In [10]:
# Submit the job 
# Click the link to view training status in AML studio
ml_client.create_or_update(job)


Experiment,Name,Type,Status,Details Page
esri-lstm,tidy_oxygen_689ck2m9dl,command,Starting,Link to Azure Machine Learning studio
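Because the point of submitting jobs is to compare hyperparameter combinations, it can help to generate the configurations in a loop. The sketch below only builds the input dictionaries and display names; it does not contact Azure. The helper `build_sweep`, the grid values, and the cluster name `gpu-cluster` are illustrative assumptions.

```python
from itertools import product


def build_sweep(base_inputs, grid, compute_name):
    """Return (display_name, inputs) pairs for every combination in `grid`."""
    keys = list(grid)
    runs = []
    for values in product(*(grid[k] for k in keys)):
        inputs = dict(base_inputs)  # copy so the base config is not mutated
        inputs.update(zip(keys, values))
        display_name = (
            f"{inputs['arch']}-epochs_{inputs['epochs']}"
            f"-{compute_name}-batch_{inputs['batch_length']}"
        )
        runs.append((display_name, inputs))
    return runs


base = {
    "batch_length": 500,
    "input_length": 5,
    "epochs": 10,
    "embedding_vector_length": 32,
    "validation_split": 0.2,
    "arch": "lstm(25,0.2,0)",
}
# Hypothetical values to sweep over.
grid = {"arch": ["lstm(25,0.2,0)", "lstm(50,0.2,0)"], "epochs": [10, 20]}

runs = build_sweep(base, grid, "gpu-cluster")
for display_name, _ in runs:
    print(display_name)
```

Each generated dictionary, with the `data` input added, could then be passed to `command(...)` and submitted with `ml_client.create_or_update(job)` as in the cell above.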


In [7]:
# List the names of the environments available in the workspace
for env in ml_client.environments.list():
    print(env.name)

esri-lstm-docker
esri-lstm-env
aml-scikit-learn
AzureML-AI-Studio-Development
AzureML-ACPT-pytorch-1.13-py38-cuda11.7-gpu
AzureML-ACPT-pytorch-1.12-py38-cuda11.6-gpu
AzureML-ACPT-pytorch-1.12-py39-cuda11.6-gpu
AzureML-ACPT-pytorch-1.11-py38-cuda11.5-gpu
AzureML-ACPT-pytorch-1.11-py38-cuda11.3-gpu
AzureML-responsibleai-0.21-ubuntu20.04-py38-cpu
AzureML-responsibleai-0.20-ubuntu20.04-py38-cpu
AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu
AzureML-tensorflow-2.6-ubuntu20.04-py38-cuda11-gpu
AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu
AzureML-sklearn-1.0-ubuntu20.04-py38-cpu
AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu
AzureML-pytorch-1.9-ubuntu18.04-py37-cuda11-gpu
AzureML-pytorch-1.8-ubuntu18.04-py37-cuda11-gpu
AzureML-sklearn-0.24-ubuntu18.04-py37-cpu
AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu
AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu
AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu
AzureML-Triton
AzureML-Designer-Score
AzureML-VowpalWabbit-8.8.0
AzureML-PyTorch-1.3
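The listing mixes environments created in this workspace with ones whose names start with `AzureML-`. As a small sketch, assuming that prefix marks the curated environments (as the listing suggests), the names can be split so the custom ones are easier to spot. The helper `split_environments` is hypothetical and works on a plain list of names, here a short sample copied from the output above.

```python
def split_environments(names, curated_prefix="AzureML-"):
    """Separate environment names into (custom, curated) lists by prefix."""
    custom = [n for n in names if not n.startswith(curated_prefix)]
    curated = [n for n in names if n.startswith(curated_prefix)]
    return custom, curated


# A few names taken from the listing above.
sample_names = [
    "esri-lstm-docker",
    "esri-lstm-env",
    "aml-scikit-learn",
    "AzureML-Triton",
    "AzureML-sklearn-1.0-ubuntu20.04-py38-cpu",
]

custom, curated = split_environments(sample_names)
print(custom)
```

In the notebook, the same helper could be fed the `name` attributes collected from `ml_client.environments.list()`.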