# Info

Train and deploy the SVD model into an Azure ML Workspace.

In [1]:
# Connect to workspace
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient
import yaml

with open('azure_config.yml', 'r') as file:
  azure_config = yaml.safe_load(file)

subscription_id = azure_config['azure_workspace_details']['sub_id']
resource_group = azure_config['azure_workspace_details']['rg_name']
workspace = azure_config['azure_workspace_details']['ws_name']

ml_client = MLClient(
  credential=DefaultAzureCredential(),
  subscription_id=subscription_id,
  resource_group_name=resource_group,
  workspace_name=workspace
)

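The connection cell expects `azure_config.yml` to contain an `azure_workspace_details` mapping with the keys `sub_id`, `rg_name`, and `ws_name`. As a minimal sketch, the check below fails early with a readable message when one of them is missing. The helper name `workspace_details` and the placeholder values are assumptions for illustration, not part of the notebook.

```python
REQUIRED_KEYS = ("sub_id", "rg_name", "ws_name")


def workspace_details(config: dict) -> tuple:
    """Return (subscription_id, resource_group, workspace) or raise KeyError."""
    details = config.get("azure_workspace_details") or {}
    # Treat absent and empty values the same way: both are unusable.
    missing = [key for key in REQUIRED_KEYS if not details.get(key)]
    if missing:
        raise KeyError(f"azure_config.yml is missing: {', '.join(missing)}")
    return tuple(details[key] for key in REQUIRED_KEYS)


# Placeholder values only (hypothetical); a real file holds the actual identifiers.
example_config = {
    "azure_workspace_details": {
        "sub_id": "<subscription-id>",
        "rg_name": "<resource-group>",
        "ws_name": "<workspace-name>",
    }
}

subscription_id, resource_group, workspace = workspace_details(example_config)
```

The dictionary passed in has the same shape as the one returned by `yaml.safe_load(file)` in the cell above.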


# Setup

The job needs two runtime resources in the Azure ML workspace: an environment and a compute cluster.

## Conda Environment

Keep an up-to-date Conda environment YAML file in the `src` folder. Generate it from the command line with `conda env export > conda-env.yml`.

In [2]:
from azure.ai.ml.entities import Environment

env_name = 'svd-env'

try: 
  env = ml_client.environments.get(env_name, version=6)
  print(f"Existing environment named {env_name} already exists w/ version {env.version}")
except Exception: 
  print("Creating a new environment...")
  env_docker_conda = Environment(
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="./src/conda-env.yml",
    name=env_name,
    description="Environment created for SVD modeling",
  )

  ml_client.environments.create_or_update(env_docker_conda)

Existing environment named svd-env already exists w/ version 6


## Compute Cluster

In [3]:
from azure.ai.ml.entities import AmlCompute

# Name assigned to the compute cluster
cpu_compute_target = "aml-cluster"

try:
  cpu_cluster = ml_client.compute.get(cpu_compute_target)
  print(f"Using existing cluster named {cpu_compute_target}")
except Exception:
  print("Creating a new cpu compute target...")

  cpu_cluster = AmlCompute(
    name=cpu_compute_target,
    type="amlcompute",
    size="STANDARD_DS11_V2",
    min_instances=0,
    max_instances=2,
    idle_time_before_scale_down=120,
    tier="Dedicated",
  )

  # begin_create_or_update returns a poller; .result() waits for the cluster and returns it
  cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()

Using existing cluster named aml-cluster


# Create Jobs

Run a job from the script `train_svd_params.py` using the local ratings Parquet file (data assets are not registered here). MLflow in the script logs the hyperparameters and the primary metric, RMSE.
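The training script itself is not shown in this notebook. As a rough sketch of the interface the job command relies on, the script would need to accept the four command-line flags and log a metric named `RMSE` so the sweep can find it. The function names `parse_args` and `log_run` and the default values are assumptions; `mlflow.log_params` and `mlflow.log_metric` are the standard MLflow tracking calls.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags passed by the job command (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Train an SVD model")
    parser.add_argument("--training_data", type=str, required=True)
    parser.add_argument("--n_epochs", type=int, default=20)
    parser.add_argument("--lr_all", type=float, default=0.005)
    parser.add_argument("--reg_all", type=float, default=0.02)
    return parser.parse_args(argv)


def log_run(args, rmse):
    """Log hyperparameters and the primary metric to the active MLflow run."""
    import mlflow  # imported here so argument parsing works without mlflow installed

    mlflow.log_params(
        {"n_epochs": args.n_epochs, "lr_all": args.lr_all, "reg_all": args.reg_all}
    )
    # The metric name must match primary_metric in the sweep job exactly.
    mlflow.log_metric("RMSE", rmse)


# Example: the same flags the job command passes for the initial run.
args = parse_args(
    ["--training_data", "ratings_combined.parquet",
     "--n_epochs", "20", "--lr_all", "0.005", "--reg_all", "0.02"]
)
```

Model fitting and the RMSE computation are left out on purpose, since they depend on how the script loads the ratings data.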

## Initial Run

In [4]:
from azure.ai.ml import command
from azure.ai.ml.entities import Data
from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.sweep import Choice

experiment_name='svd_training'

job = command(
  code='./src',
  command='python train_svd_params.py --training_data ratings_combined.parquet --n_epochs ${{inputs.n_epochs}} --lr_all ${{inputs.lr_all}} --reg_all ${{inputs.reg_all}}',
  inputs={
    'n_epochs':20,
    'lr_all':0.005,
    'reg_all':0.02
  },
  environment='svd-env:6',
  compute='aml-cluster',
  display_name='svd-train',
  experiment_name=experiment_name
)

returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Job created at:", aml_url)



Job created at: https://ml.azure.com/runs/olive_beet_8jlx0h1gfq?wsid=/subscriptions/d11a1a82-5001-4aca-ad0c-9632beca4a9f/resourcegroups/rg-ec/workspaces/mlw-ec&tid=e39e8e96-a6c1-42d4-9c0f-bf7830e0d1e6
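Azure ML fills the `${{inputs.<name>}}` placeholders in the `command` string from the job's `inputs` dictionary when the job runs. The sketch below only imitates that substitution locally to show the command line the script ends up receiving. `render_command` is a hypothetical helper, not an SDK function.

```python
import re


def render_command(template: str, inputs: dict) -> str:
    """Replace each ${{inputs.name}} placeholder with the matching input value."""
    def substitute(match):
        return str(inputs[match.group(1)])

    return re.sub(r"\$\{\{inputs\.(\w+)\}\}", substitute, template)


template = (
    "python train_svd_params.py --training_data ratings_combined.parquet "
    "--n_epochs ${{inputs.n_epochs}} --lr_all ${{inputs.lr_all}} "
    "--reg_all ${{inputs.reg_all}}"
)
rendered = render_command(template, {"n_epochs": 20, "lr_all": 0.005, "reg_all": 0.02})
print(rendered)
# python train_svd_params.py --training_data ratings_combined.parquet --n_epochs 20 --lr_all 0.005 --reg_all 0.02
```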


## Sweep Job for Hyperparameter Tuning

The sweep uses a discrete search space for the following hyperparameters:

* `n_epochs` (default 20): [20, 50]
* `lr_all` (default 0.005): [0.001, 0.005, 0.01, 0.1, 0.2]
* `reg_all` (default 0.02): [0.001, 0.02, 0.1, 1, 10]
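A grid over these three lists has 2 × 5 × 5 = 50 combinations, which matches the `max_total_trials=50` limit set on the sweep job. The short check below enumerates the grid locally. It is an illustration only; the sweep itself is generated by Azure ML.

```python
from itertools import product

search_space = {
    "n_epochs": [20, 50],
    "lr_all": [0.001, 0.005, 0.01, 0.1, 0.2],
    "reg_all": [0.001, 0.02, 0.1, 1, 10],
}

# One dict per trial: the Cartesian product of all candidate values.
grid = [dict(zip(search_space, combo)) for combo in product(*search_space.values())]

print(len(grid))  # 50
print(grid[0])    # {'n_epochs': 20, 'lr_all': 0.001, 'reg_all': 0.001}
```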

In [5]:
from azure.ai.ml.sweep import Choice
from azure.ai.ml.sweep import MedianStoppingPolicy

cmd_job_for_sweep = job(
  n_epochs=Choice(values=[20, 50]),
  lr_all=Choice(values=[0.001, 0.005, 0.01, 0.1, 0.2]),
  reg_all=Choice(values=[0.001, 0.02, 0.1, 1, 10])
)

sweep_job = cmd_job_for_sweep.sweep(
  compute="aml-cluster",
  sampling_algorithm="grid",
  primary_metric="RMSE",
  goal="Minimize"
)
sweep_job.set_limits(
  max_total_trials=50,
  max_concurrent_trials=4,
  timeout=7200
)
sweep_job.experiment_name = "svd-sweep"
sweep_job.early_termination = MedianStoppingPolicy(delay_evaluation=5, evaluation_interval=1)

returned_sweep_job = ml_client.create_or_update(sweep_job)
aml_url = returned_sweep_job.studio_url
print("Sweep job created at:", aml_url)

Sweep job created at: https://ml.azure.com/runs/bold_octopus_qgdw7blg80?wsid=/subscriptions/d11a1a82-5001-4aca-ad0c-9632beca4a9f/resourcegroups/rg-ec/workspaces/mlw-ec&tid=e39e8e96-a6c1-42d4-9c0f-bf7830e0d1e6
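The sweep sets `MedianStoppingPolicy(delay_evaluation=5, evaluation_interval=1)`. As a simplified illustration of the idea, and not Azure ML's implementation, the function below stops a trial once its best reported RMSE is worse than the median of the other trials' running averages, and never before `delay_evaluation` values have been reported. The function name and the sample values are assumptions. The policy can only act if the script logs RMSE more than once during training.

```python
from statistics import mean, median


def should_stop(trial_values, peer_values, delay_evaluation=5):
    """Simplified median-stopping check for a metric that is minimized."""
    step = len(trial_values)
    if step < delay_evaluation:
        return False  # too early: the policy is not evaluated yet
    # Running average of each peer trial over the same number of intervals.
    peer_averages = [mean(values[:step]) for values in peer_values if len(values) >= step]
    if not peer_averages:
        return False  # nothing to compare against
    # For a minimized metric, "worse" means larger than the median.
    return min(trial_values) > median(peer_averages)


peers = [[0.9] * 5, [1.0] * 5, [1.1] * 5]      # made-up RMSE histories
stop_weak = should_stop([1.2] * 5, peers)      # best RMSE 1.2 vs. median 1.0
stop_strong = should_stop([0.95] * 5, peers)   # best RMSE 0.95 vs. median 1.0
stop_early = should_stop([1.2] * 4, peers)     # only 4 values reported so far
```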
