# MLOps in Azure

A notebook version of [README.md](../README.md)

## Prerequisites

* Azure subscription
* Resource group `learn`
* ML workspace `mlops` (and `learn-prod`)
* Compute `cheapest-instance` and `cheapest-cluster`
* [`azure-cli`](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) with the `ml` extension (`az extension add -n ml`)
* Configure the default resource group and ML workspace:

In [None]:
!az configure --defaults workspace=mlops group=learn

## 0: Create Azure resources

### 0.1: Create Azure ML workspace

TODO: add `azure-cli` command or check the corresponding [section in README](../README.md#01-create-azure-ml-workspace).

### 0.2: Create Compute instance

TODO: add `azure-cli` command or check the corresponding [section in README](../README.md#02-create-compute-instance).

### 0.3: Create Data asset

In [None]:
!az ml data create --name diabetes-dev-dir --type uri_folder --path experimentation/data
!az ml data create --name diabetes-dev-table --type mltable --path experimentation/data
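
A `uri_folder` asset accepts any folder, but the `mltable` type expects an `MLTable` file inside the folder. If the second command above fails, a quick local check can help. The following is a minimal sketch. `check_mltable_folder` is a hypothetical helper that only inspects the local file system and does not talk to Azure ML:

```python
import tempfile
from pathlib import Path


def check_mltable_folder(folder) -> bool:
    """Return True if `folder` is a directory that contains an `MLTable` file."""
    path = Path(folder)
    return path.is_dir() and (path / "MLTable").is_file()


# Demo on a throwaway folder so the cell runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    print(check_mltable_folder(tmp))  # → False (no MLTable file yet)
    (Path(tmp) / "MLTable").write_text("")  # empty placeholder, for the demo only
    print(check_mltable_folder(tmp))  # → True
```

To check the real data, call `check_mltable_folder("experimentation/data")`. This assumes the same relative path as the cell above.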

## 1: Create an ML job

In [None]:
!az ml job create --file job.yml

## 2: Use GH Actions for model training

### 2.1: Create Service Principal

The command below registers an app (service principal) with the *Contributor* role, scoped to these resources:
* *mlops* ML workspace
* *learn-mlops* ML registry
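
Each `--scopes` value is an Azure Resource Manager resource ID. Building them from their parts means the subscription ID is typed only once. The following is a minimal sketch. The helper names are hypothetical, and the values are the ones used in the cell below:

```python
SUBSCRIPTION_ID = "b42b69cf-27c0-4ee2-99a0-a718ebd91945"
PROVIDER = "Microsoft.MachineLearningServices"


def workspace_scope(subscription_id: str, resource_group: str, workspace: str) -> str:
    """ARM resource ID of an Azure ML workspace."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/{PROVIDER}/workspaces/{workspace}"
    )


def registry_scope(subscription_id: str, resource_group: str, registry: str) -> str:
    """ARM resource ID of an Azure ML registry."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/{PROVIDER}/registries/{registry}"
    )


scopes = [
    workspace_scope(SUBSCRIPTION_ID, "learn", "mlops"),
    registry_scope(SUBSCRIPTION_ID, "DefaultResourceGroup-eastus2", "learn-mlops"),
]
# Space-separated, ready to pass after `--scopes`.
scope_args = " ".join(scopes)
print(scope_args)
```

In IPython, `!` commands expand `{expression}`, so writing `--scopes {scope_args}` in a `!az ad sp create-for-rbac` cell would pass both IDs.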

In [None]:
!az ad sp create-for-rbac \
    --name github-aml-sp \
    --role contributor \
    --scopes \
        /subscriptions/b42b69cf-27c0-4ee2-99a0-a718ebd91945/resourceGroups/learn/providers/Microsoft.MachineLearningServices/workspaces/mlops \
        /subscriptions/b42b69cf-27c0-4ee2-99a0-a718ebd91945/resourceGroups/DefaultResourceGroup-eastus2/providers/Microsoft.MachineLearningServices/registries/learn-mlops \
    --sdk-auth

> IMPORTANT: the output contains the client secret. Do not commit it: clear this cell's output before committing the notebook!

### 2.2: Create GitHub Action Secret

* **mslearn-mlops** repository > Settings > Secrets and variables > Actions > New repository secret
* Name: **AZURE_CREDENTIALS**
* Secret: [**\<output from the previous command\>**](#2.1:-create-service-principal)
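
The `azure/login@v1` action reads the `creds` secret as JSON. It needs the `clientId`, `clientSecret`, `subscriptionId` and `tenantId` fields, which the `--sdk-auth` output includes. The following is a minimal sketch that checks a value before you save it as the secret. `missing_credential_keys` is a hypothetical helper, and the sample uses placeholder values, never real credentials:

```python
import json

REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_keys(secret_text: str) -> list:
    """Return the required keys that are absent or empty in a JSON object string.

    Raises json.JSONDecodeError (a ValueError) if the text is not valid JSON.
    """
    creds = json.loads(secret_text)
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


# Placeholder values only: never paste a real secret into the notebook.
sample = json.dumps({
    "clientId": "<client-id>",
    "clientSecret": "<client-secret>",
    "subscriptionId": "<subscription-id>",
    "tenantId": "<tenant-id>",
})
print(missing_credential_keys(sample))  # → []
```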


### 2.3: Create Compute cluster

TODO: add `azure-cli` command; for now, check the corresponding [section in README](../README.md#23-create-compute-cluster).


### 2.4: Run GH Action manually

* [mslearn-mlops action](https://github.com/ficinator/mslearn-mlops/actions/workflows/02-manual-trigger-job.yml) > Run workflow > wait for the run to finish

## 3: Trigger workflow on PR

In [None]:
%%writefile ../.github/workflows/03-trigger-ml-job-pr.yml
name: Trigger AML job on PR

on: [pull_request]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - name: Check out repo
      uses: actions/checkout@main
    - name: Azure login
      uses: azure/login@v1
      with:
        creds: ${{secrets.AZURE_CREDENTIALS}}
    - name: Set up azure-cli
      run: |
        az extension add -n ml -y
        az configure --defaults workspace=mlops group=learn
    - name: Trigger ML job
      run: az ml job create --file src/job.yml

## 4: Trigger linting and unit testing on PR

### 4.1: Python linting in VS Code

TODO...

### 4.2: Update code checks workflow

In [None]:
%%writefile ../.github/workflows/04-code-checks.yml
name: Code checks

on: [pull_request]

jobs:
  linting:
    runs-on: ubuntu-latest
    steps:
    - name: Check out repo
      uses: actions/checkout@main
    - name: Use Python version 3.8
      uses: actions/setup-python@v3
      with:
        python-version: '3.8'
    - name: Install Flake8
      run: |
        python -m pip install flake8
    - name: Run linting tests
      run: | 
        flake8 \
          --max-line-length=88 \
          --extend-ignore=E203 \
          src \
          tests

  unit-tests:
    runs-on: ubuntu-latest
    steps:
    - name: Check out repo
      uses: actions/checkout@main
    - name: Use Python version 3.8
      uses: actions/setup-python@v3
      with:
        python-version: '3.8'
    - name: Install requirements
      run: |
        python -m pip install -r requirements.txt
    - name: Run unit tests
      run: |
        pytest
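
The `--max-line-length=88` and `--extend-ignore=E203` flags match the output of the Black formatter, assuming the project formats its code with Black. Black allows 88-character lines. It also puts spaces around `:` in slices with complex bounds, which pycodestyle (used by flake8) can report as E203, whitespace before ':'. The following is a small Black-style example of such a slice:

```python
def middle(items, offset):
    """Return `items` with `offset` elements dropped from each end."""
    lower, upper = 0, len(items)
    # Black formats complex slice bounds with spaces around ':';
    # pycodestyle may flag the space before ':' as E203 unless it is ignored.
    return items[lower + offset : upper - offset]


print(middle([1, 2, 3, 4, 5], 1))  # → [2, 3, 4]
```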


## 5: Use environments

The term *environment* appears in several different contexts:
* **GitHub environment**: a set of protection rules, variables and secrets
* **Azure environment**: a resource group
* **Azure ML workspace**: a place to run experiments, train and deploy models
* **Azure ML environment**: a collection of Python packages (on top of a Docker image) used to run a script

### 5.1: Create a GitHub environment

Check the corresponding section in [README](../README.md#51-create-an-environment).

### 5.2: Create data asset and training component

In [None]:
!mkdir -p data components

In [None]:
%%writefile data/diabetes-prod.yml
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: diabetes-prod-folder
type: uri_folder
path: ../production/data

In [None]:
!az ml data create --file data/diabetes-prod.yml --workspace-name learn-prod
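
YAML files reference registered assets with the `azureml:` prefix. There are two forms:
* `azureml:<name>:<version>` pins a version, as the pipeline job does with `azureml:diabetes-prod-folder:1`.
* `azureml:<name>@latest` picks the latest version, as the training component does for its environment.

The following hypothetical helper builds both forms:

```python
def asset_ref(name, version=None, label=None):
    """Build an `azureml:` asset reference for job or component YAML.

    Pass either an explicit version ("azureml:name:1") or a label
    ("azureml:name@latest"), not both.
    """
    if (version is None) == (label is None):
        raise ValueError("pass exactly one of version or label")
    if version is not None:
        return f"azureml:{name}:{version}"
    return f"azureml:{name}@{label}"


print(asset_ref("diabetes-prod-folder", version=1))
# → azureml:diabetes-prod-folder:1
print(asset_ref("AzureML-sklearn-1.0-ubuntu20.04-py38-cpu", label="latest"))
# → azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
```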

In [None]:
%%writefile components/train.yml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: train_diabetes_model
display_name: Train Diabetes Classification Model

environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest

inputs:
  training_data:
    type: uri_folder
  reg_rate:
    type: number

code: ../model
command: >-
  python train.py
  --training_data ${{inputs.training_data}}
  --reg_rate ${{inputs.reg_rate}}
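
The component's `command` runs `python train.py --training_data <folder> --reg_rate <number>`, so `train.py` in `../model` must accept those two arguments. The script itself is not shown in this notebook. The following is a minimal sketch of just its argument parsing, assuming `argparse`. The `parse_args` function and the 0.01 default are assumptions:

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments passed by the component's `command`."""
    parser = argparse.ArgumentParser(description="Train diabetes model")
    # `${{inputs.training_data}}` is replaced by a local (mounted) folder path.
    parser.add_argument("--training_data", type=str, required=True)
    # `${{inputs.reg_rate}}` is a number input; 0.01 mirrors the pipeline default.
    parser.add_argument("--reg_rate", type=float, default=0.01)
    return parser.parse_args(argv)


# Pass an explicit list: inside Jupyter, sys.argv holds the kernel's own arguments.
args = parse_args(["--training_data", "/mnt/data", "--reg_rate", "0.1"])
print(args.training_data, args.reg_rate)  # → /mnt/data 0.1
```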

In [None]:
!az ml component create --file components/train.yml

### 5.3: Create production pipeline job

In [None]:
!mkdir -p jobs

In [None]:
%%writefile jobs/pipeline-prod.yml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

inputs:
  training_data: 
    path: azureml:diabetes-prod-folder:1
    mode: ro_mount
  reg_rate: 0.01

jobs:
  train:
    component: ../components/train.yml
    inputs:
      training_data: ${{parent.inputs.training_data}}
      reg_rate: ${{parent.inputs.reg_rate}}

compute: azureml:cheapest-cluster
experiment_name: diabetes-prod-data-example
description: Train a classification model on diabetes data using a registered dataset as input.
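
Inside `jobs`, `${{parent.inputs.<name>}}` binds a step input to the pipeline-level input of the same name. The component then substitutes `${{inputs.<name>}}` into its `command`. The toy resolver below only illustrates that two-step substitution with plain strings. It is not how Azure ML resolves bindings: the real service also mounts the data, so `training_data` becomes a local path rather than the asset reference.

```python
import re

PLACEHOLDER = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def resolve(template, context):
    """Replace `${{a.b.c}}` placeholders with values looked up in a nested dict."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


# Pipeline-level inputs from jobs/pipeline-prod.yml.
parent = {"inputs": {"training_data": "azureml:diabetes-prod-folder:1", "reg_rate": 0.01}}
# Step 1: the `train` step binds its inputs to the parent's inputs.
step_inputs = {
    "training_data": resolve("${{parent.inputs.training_data}}", {"parent": parent}),
    "reg_rate": resolve("${{parent.inputs.reg_rate}}", {"parent": parent}),
}
# Step 2: the component substitutes its inputs into `command`.
command = (
    "python train.py --training_data ${{inputs.training_data}}"
    " --reg_rate ${{inputs.reg_rate}}"
)
resolved = resolve(command, {"inputs": step_inputs})
print(resolved)
# → python train.py --training_data azureml:diabetes-prod-folder:1 --reg_rate 0.01
```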


In [None]:
!az ml job create --file jobs/pipeline-prod.yml --workspace-name learn-prod

## 6: Deploy the model

TODO

## 7: Use a registry to share ML resources

### 7.1: Create a registry

* [Azure ML](https://ml.azure.com/home?tid=6571d690-b42e-4b19-90e7-d85b945aa165) > Registries > + Create
* Name: **learn-mlops**
* Subscription: **Azure subscription 1**
* Resource group: **DefaultResourceGroup-eastus2**
* Next
* Primary region: **East US 2**
* Additional regions (optional): **Germany West Central**
* Next > Create > wait for it...

### 7.2: Add a component to the registry

Use the component from [5.2](#52-create-data-asset-and-training-component).

In [None]:
!az ml component create --file components/train.yml --registry-name learn-mlops
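
Once registered, the component can be referenced from any workspace by its registry URI, `azureml://registries/<registry>/components/<name>/versions/<version>`, as the pipeline job in 7.3 does. The following hypothetical helper builds that URI. It uses the name from `components/train.yml` and assumes this is version 1 of the component:

```python
def registry_component_uri(registry: str, name: str, version) -> str:
    """Build the azureml:// URI of a component stored in an Azure ML registry."""
    return f"azureml://registries/{registry}/components/{name}/versions/{version}"


uri = registry_component_uri("learn-mlops", "train_diabetes_model", 1)
print(uri)
# → azureml://registries/learn-mlops/components/train_diabetes_model/versions/1
```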

### 7.3: Use the component in the pipeline job

In [None]:
%%writefile jobs/pipeline-prod.yml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

inputs:
  training_data: 
    path: azureml:diabetes-prod-folder:1
    mode: ro_mount
  reg_rate: 0.01

jobs:
  train:
    component: azureml://registries/learn-mlops/components/train_diabetes_model/versions/1
    inputs:
        training_data: ${{parent.inputs.training_data}}
        reg_rate: ${{parent.inputs.reg_rate}}
    compute: azureml:cheapest-cluster

experiment_name: diabetes-prod-data-example
description: Train a classification model on diabetes data using a registered dataset as input.


In [None]:
!az ml job create --file jobs/pipeline-prod.yml --workspace-name learn-prod

### 7.4: Create a GitHub Actions workflow

In [None]:
%%writefile ../.github/workflows/07-train-dev-prod.yml
name: Train dev and prod

on:
  push:
    branches:
    - main

jobs:
  experiment:
    runs-on: ubuntu-latest
    environment: dev
    steps:
    - name: Check out repo
      uses: actions/checkout@main
    - name: Azure login
      uses: azure/login@v1
      with:
        creds: ${{secrets.AZURE_CREDENTIALS}}
    - name: Set up azure-cli
      run: |
        az extension add -n ml -y
        az configure --defaults workspace=mlops group=learn
    - name: Trigger ML job
      run: |
        az ml job create \
        --file src/jobs/pipeline-dev.yml \
        --stream

  production:
    runs-on: ubuntu-latest
    environment: prod
    needs: experiment
    steps:
    - name: Check out repo
      uses: actions/checkout@main
    - name: Azure login
      uses: azure/login@v1
      with:
        creds: ${{secrets.AZURE_CREDENTIALS}}

    - name: Set up azure-cli
      run: |
        az extension add -n ml -y
        az configure --defaults workspace=learn-prod group=learn
    - name: Trigger ML job
      run: |
        az ml job create \
        --file src/jobs/pipeline-prod.yml \
        --stream
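
`needs: experiment` makes the `production` job start only after `experiment` succeeds. The `prod` GitHub environment can also add protection rules, such as required reviewers, before that job runs. GitHub resolves this ordering itself. The sketch below only models the `needs` graph with Python's `graphlib` (Python 3.9+) to show the resulting order:

```python
from graphlib import TopologicalSorter

# Job name -> jobs listed under its `needs:` key in 07-train-dev-prod.yml.
needs = {
    "experiment": [],
    "production": ["experiment"],
}

# static_order() yields every job after all the jobs it needs.
order = list(TopologicalSorter(needs).static_order())
print(order)  # → ['experiment', 'production']
```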