# Overview - Why MLOps

In this first notebook, we will create and deploy a model by hand. A model trained in the cloud needs its data stored somewhere the training code can reach, so we will start by using the default datastore associated with your Azure Machine Learning (AML) workspace.

No MLOps automation is involved at this stage: every step is run manually, but by the end a model will be deployed for inferencing.



In [2]:
experiment_name = 'AML_Automation_ManualRun'
training_folder = 'training'
conda_yml_file = '../configuration/environment.yml'
training_dataset = 'diabetes.csv'
model_name = 'diabetes_model'

In [3]:
import os
# Create a folder for the training script files
os.makedirs(training_folder, exist_ok=True)

print(training_folder)

training


## Connect to your workspace

To get started, connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

In [4]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.37.0 to work with mm-aml-dev-ops
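
`Workspace.from_config()` reads the workspace details from a `config.json` file, typically located in the working directory or an `.azureml` subfolder. As a minimal sketch of what that file contains, the snippet below writes an example with the three expected keys. The values are placeholders (assumptions), and the file is written to a temporary folder under the hypothetical name `config.example.json` so that no real configuration is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values (assumptions): replace with your own subscription,
# resource group, and workspace name.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

def write_example_config(folder):
    """Write the placeholder workspace settings to a JSON file and return its path."""
    target = Path(folder) / "config.example.json"
    target.write_text(json.dumps(workspace_config, indent=2))
    return target

# Write into a temporary folder so an existing config.json is left untouched
config_path = write_example_config(tempfile.mkdtemp())
print(config_path.read_text())
```

To use a file like this for real, it would be saved as `config.json` with actual values; the Azure portal also offers a download of this file for a workspace.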


To train a model, the training data must first be available to the training script. Upload the diabetes CSV file to the workspace's default datastore.

In [4]:
# Get the default datastore
default_ds = ws.get_default_datastore()


default_ds.upload_files(files=['../data/diabetes.csv'], # Upload the diabetes csv files in /data
                       target_path='diabetes-data/', # Put it in a folder path in the datastore
                       overwrite=True, # Replace existing files of the same name
                       show_progress=True)

"datastore.upload_files" is deprecated after version 1.0.69. Please use "FileDatasetFactory.upload_directory" instead. See Dataset API change notice at https://aka.ms/dataset-deprecation.


Uploading an estimated of 1 files
Uploading ../data/diabetes.csv
Uploaded ../data/diabetes.csv, 1 files out of an estimated total of 1
Uploaded 1 files


$AZUREML_DATAREFERENCE_bbff299931164f5fbfd7cd8257744bc2

In [5]:
conda_yml_file = '../configuration/environment.yml'

In [6]:
%%writefile $conda_yml_file
name: experiment_env
dependencies:
- python=3.6.2
- scikit-learn
- ipykernel
- matplotlib
- pandas
- pip
- pip:
  - azureml-defaults
  - pyarrow

Overwriting ../configuration/environment.yml


In [8]:
%%writefile $training_folder/training.py

# Import libraries
from azureml.core import Run, Workspace, Datastore, Dataset
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Get the experiment run context
run = Run.get_context()

# load the diabetes dataset
print("Loading Data...")
ws = run.experiment.workspace
ds = ws.get_default_datastore()
tab_data_set = Dataset.Tabular.from_delimited_files(path=(ds, 'diabetes-data/*.csv'))
diabetes = tab_data_set.to_pandas_dataframe()


# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

print('****************')
print(type(X_train))
print(type(X_test))
print('****************')
# Set regularization hyperparameter
reg = 0.01

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))  # built-in float; the np.float alias is deprecated
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the trained model in the outputs folder
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()

Overwriting training/training.py
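
The training script logs a regularization rate, an accuracy, and an AUC. As a rough illustration of what those three numbers mean, the sketch below computes them from scratch in plain Python. The labels and scores are made up for illustration (they are not from `diabetes.csv`), and the helper names `accuracy` and `auc` are hypothetical. The AUC here uses the pairwise definition: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, which is equivalent to the area under the ROC curve.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

def auc(y_true, scores):
    """Probability that a positive outscores a negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for six patients
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
# Threshold the probabilities at 0.5 to get class predictions
y_pred = [1 if s >= 0.5 else 0 for s in scores]

reg = 0.01
C = 1 / reg  # scikit-learn's C is the inverse of the regularization strength

print('C:', C)
print('Accuracy:', accuracy(y_true, y_pred))
print('AUC:', auc(y_true, scores))
```

Accuracy depends on the 0.5 threshold, while AUC depends only on how the scores rank the examples, which is why the two metrics can differ noticeably for the same model.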


In [9]:
from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.widgets import RunDetails

# Create a Python environment for the experiment (from a .yml file)
env = Environment.from_conda_specification("experiment_env", conda_yml_file)

# Create a script config
script_config = ScriptRunConfig(source_directory=training_folder,
                                script='training.py',
                                environment=env) 

# submit the experiment run

experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=script_config)

# Show the running experiment run in the notebook widget
RunDetails(run).show()

# Block until the experiment run has completed
run.wait_for_completion()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

{'runId': 'AML_Automation_ManualRun_1643586195_31ce55ae',
 'target': 'local',
 'status': 'Finalizing',
 'startTimeUtc': '2022-01-30T23:43:16.670509Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'local',
  'ContentSnapshotId': '83984cb9-d5e2-4c9f-a240-5e11197b6217',
  'azureml.git.repository_uri': 'https://github.com/memasanz/AMLHackathonWithMLOps.git',
  'mlflow.source.git.repoURL': 'https://github.com/memasanz/AMLHackathonWithMLOps.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': '50315573bf46439a76a891a01dc41f3e719db2ad',
  'mlflow.source.git.commit': '50315573bf46439a76a891a01dc41f3e719db2ad',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'training.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': [],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'local',
  'dataReferences': {},
  'data'

In [10]:
# Get logged metrics and files
metrics = run.get_metrics()
for key in metrics.keys():
    print(key, metrics.get(key))
print('\n')
for file in run.get_file_names():
    print(file)

Regularization Rate 0.01
Accuracy 0.774
AUC 0.848485994328076


azureml-logs/60_control_log.txt
azureml-logs/70_driver_log.txt
logs/azureml/25397_azureml.log
logs/azureml/dataprep/backgroundProcess.log
logs/azureml/dataprep/backgroundProcess_Telemetry.log
outputs/diabetes_model.pkl


In [11]:
from azureml.core import Model

# Register the model
run.register_model(model_path='outputs/diabetes_model.pkl', model_name='diabetes_model',
                   tags={'AUC':run.get_metrics()['AUC']},
                   properties={'AUC': run.get_metrics()['AUC'], 'Accuracy': run.get_metrics()['Accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

diabetes_model version: 2
	 AUC : 0.848485994328076
	 AUC : 0.848485994328076
	 Accuracy : 0.774


diabetes_model_remote version: 12
	 AUC : 0.8484934573859395
	 AUC : 0.8484934573859395
	 Accuracy : 0.774


diabetes_model_remote version: 11
	 AUC : 0.848485994328076
	 AUC : 0.848485994328076
	 Accuracy : 0.774


diabetes_model_remote version: 10
	 AUC : 0.2
	 AUC : 0.848485994328076
	 Accuracy : 0.774


diabetes_model_remote version: 9
	 AUC : 0.2
	 AUC : 0.848485994328076
	 Accuracy : 0.774


diabetes_model_remote version: 8
	 AUC : 0.2
	 AUC : 0.848485994328076
	 Accuracy : 0.774


diabetes_model_remote version: 7
	 AUC : 0.2
	 AUC : 0.8484934573859395
	 Accuracy : 0.774


diabetes_model_remote version: 6
	 AUC : 0.2
	 AUC : 0.8484934573859395
	 Accuracy : 0.774


diabetes_model_remote version: 5
	 AUC : 0.2
	 AUC : 0.8484934573859395
	 Accuracy : 0.774


diabetes_model_remote version: 4
	 AUC : 0.2
	 AUC : 0.8484934573859395
	 Accuracy : 0.774


diabetes_model_remote version: 3
	 A

## Deploy Endpoint

In [16]:
import os

# Create a folder for the deployment files
deployment_folder = './service'
os.makedirs(deployment_folder, exist_ok=True)
print(deployment_folder, 'folder created.')

# Set path for scoring script
script_file = 'score_diabetes.py'
script_path = os.path.join(deployment_folder,script_file)

./service folder created.


In [17]:
%%writefile $script_path
import json
import joblib
import numpy as np
import os

# Called when the service is loaded
def init():
    global model
    # Get the path to the deployed model file and load it
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'diabetes_model.pkl')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Get the corresponding classname for each prediction (0 or 1)
    classnames = ['not-diabetic', 'diabetic']
    predicted_classes = []
    for prediction in predictions:
        predicted_classes.append(classnames[prediction])
    # Return the predictions as JSON
    return json.dumps(predicted_classes)

Writing ./service/score_diabetes.py
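
Before deploying, it can help to check the request and response contract of the scoring script locally. The sketch below mirrors the `run` function's contract (a JSON body with a `data` key in, a JSON list of class names out) but swaps the trained model for a stub. `StubModel` and its glucose threshold are purely hypothetical stand-ins, not the behavior of the real model.

```python
import json

class StubModel:
    """Hypothetical stand-in for the trained model: flags glucose above 150."""
    def predict(self, rows):
        # Column 1 is PlasmaGlucose in the training feature order
        return [1 if row[1] > 150 else 0 for row in rows]

model = StubModel()

def run(raw_data):
    # Parse the request body: {"data": [[feature values], ...]}
    data = json.loads(raw_data)['data']
    predictions = model.predict(data)
    # Map each 0/1 prediction to its class name
    classnames = ['not-diabetic', 'diabetic']
    return json.dumps([classnames[p] for p in predictions])

# Two hypothetical patients, each with the eight features used in training
request_body = json.dumps({"data": [[2, 180, 74, 24, 21, 23.9, 1.49, 22],
                                     [0, 100, 70, 20, 30, 25.0, 0.30, 40]]})
response = run(request_body)
print(response)
```

Each inner list must supply the features in the same order the training script used, since the model receives a plain array without column names.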


In [18]:
model = ws.models[model_name]
print(model.name, 'version', model.version)

diabetes_model version 1


In [19]:
from azureml.core import Environment, Model
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

# Configure the scoring environment
service_env = Environment(name='service-env')
python_packages = ['scikit-learn', 'azureml-defaults', 'azure-ml-api-sdk']
for package in python_packages:
    service_env.python.conda_dependencies.add_pip_package(package)
inference_config = InferenceConfig(source_directory=deployment_folder,
                                   entry_script=script_file,
                                   environment=service_env)

# Configure the web service container
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# Deploy the model as a service
print('Deploying model...')
service_name = "diabetes-service-local-training"
service = Model.deploy(ws, service_name, [model], inference_config, deployment_config, overwrite=True)
service.wait_for_deployment(True)
print(service.state)

Deploying model...
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-01-28 15:31:08+00:00 Creating Container Registry if not exists..
2022-01-28 15:41:08+00:00 Registering the environment..
2022-01-28 15:41:12+00:00 Building image..
2022-01-28 15:46:54+00:00 Generating deployment configuration.
2022-01-28 15:46:55+00:00 Submitting deployment to compute..
2022-01-28 15:47:00+00:00 Checking the status of deployment diabetes-service-local-training..
2022-01-28 15:49:01+00:00 Checking the status of inference endpoint diabetes-service-local-training.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


In [None]:
print(service.state)

## Use the web service
With the service deployed, you can now consume it from a client application.

In [10]:
import json
import requests
from azureml.core.webservice import Webservice


# Get the service endpoint
service = Webservice(workspace=ws, name='diabetes-service-local-training')
print(service.state)
url = service.scoring_uri
print(url)
headers = {'Content-Type':'application/json'}


def MakePrediction():
    endpoint_url = url
    x_new = [[2,180,74,24,21,23.9091702,1.488172308,22]]
    input_json = json.dumps({"data": x_new})
    print(input_json)
    body = input_json
    r = requests.post(endpoint_url, headers=headers, data=body)
    return (r.json())


results = MakePrediction()
print(results)

Healthy
http://cc70226a-aa99-417a-910f-16e35ce50768.eastus.azurecontainer.io/score
{"data": [[2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22]]}
["not-diabetic"]
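
The printed result above shows double quotes, which suggests that it is a string containing JSON rather than a Python list, presumably because the scoring script already returns `json.dumps(...)`. A minimal sketch of a helper that copes with either case follows; `decode_predictions` is a hypothetical name, not part of any SDK.

```python
import json

def decode_predictions(payload):
    """Return a list of class names from a payload that may be JSON-encoded once or twice."""
    result = payload
    # Decode up to two layers: the HTTP body, then the string returned by run()
    for _ in range(2):
        if isinstance(result, str):
            result = json.loads(result)
    if not isinstance(result, list):
        raise ValueError('Unexpected response: {!r}'.format(result))
    return result

# A string holding a JSON list, as the printed result above appears to be
print(decode_predictions('["not-diabetic"]'))
# The same value with one more layer of JSON encoding
print(decode_predictions(json.dumps('["not-diabetic"]')))
```

With this helper, the client code can work with a real list (for example, indexing the first prediction) regardless of how many times the body was encoded.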
