# Predicting Car Battery Failure

Your goal in this notebook is to **predict how much time a car battery has left until it is expected to fail**. You are provided with training data that includes telemetry from different vehicles, as well as the expected battery life that remains. From this, you will train a model that, given just the vehicle telemetry, predicts the expected battery life.

You will use compute resources provided by Azure Machine Learning (AML) to **remotely** train a **set** of models using **Automated Machine Learning**, evaluate the performance of each model, and pick the best-performing model to deploy as a web service hosted by **Azure Container Instances**.

Because you will be using the Azure Machine Learning SDK, you will be able to provision all your required Azure resources directly from this notebook, without having to use the Azure Portal to create any resources.

## Setup
To begin, you will need to provide the following information about your Azure Subscription. 

In the following cell, be sure to set the values for `subscription_id`, `resource_group`, `workspace_name` and `workspace_region` as directed by the comments (*these values can be acquired from the Azure Portal*). The name of the pre-created CPU cluster (`cluster_name`) is set in the cell after that.

**Be sure to replace XXXXX in the values below with your unique identifier.**

To get these values, do the following:
1. Navigate to the Azure Portal and log in with the credentials provided.
2. From the left-hand menu, under Favorites, select `Resource Groups`.
3. In the list, select the resource group with the name similar to `tech-immersion-XXXXX`.
4. From the Overview tab, capture the desired values.

Execute the following cell by selecting the `>|Run` button in the command bar above.


In [1]:
#Provide the Subscription ID of your existing Azure subscription
subscription_id = "ca9a2d87-d8dc-48d4-b027-17e79b799b00" # <- needs to be the subscription with the resource group

#Provide values for the existing Resource Group 
resource_group = "tech-immersion-226564" # <- replace XXXXX with your unique identifier

#Provide the Workspace Name and Azure Region of the Azure Machine Learning Workspace
workspace_name = "gpu-tech-immersion-aml-226564" # <- replace XXXXX with your unique identifier (should be lowercase)
workspace_region = "westcentralus" # <- region of your resource group
#other options for region include eastus, westcentralus, southeastasia, australiaeast, westeurope
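
Before moving on, it can help to sanity-check the values you just entered. The following is a minimal sketch, not part of the Azure Machine Learning SDK: the helper name `check_workspace_settings` is hypothetical, and it only checks the points mentioned above (a well-formed subscription GUID, no leftover `XXXXX` placeholder, a lowercase workspace name).

```python
import uuid


def check_workspace_settings(subscription_id, resource_group, workspace_name):
    """Return a list of problems found; an empty list means the values look plausible.

    Hypothetical helper for illustration only; it does not contact Azure.
    """
    problems = []

    # A subscription ID is a GUID, so uuid.UUID should be able to parse it.
    try:
        uuid.UUID(subscription_id)
    except ValueError:
        problems.append("subscription_id is not a valid GUID")

    # The XXXXX placeholder must be replaced with your unique identifier.
    for label, value in (("resource_group", resource_group),
                         ("workspace_name", workspace_name)):
        if "XXXXX" in value:
            problems.append(label + " still contains the XXXXX placeholder")

    # The comments in the cell above ask for a lowercase workspace name.
    if workspace_name != workspace_name.lower():
        problems.append("workspace_name should be lowercase")

    return problems


print(check_workspace_settings(
    "ca9a2d87-d8dc-48d4-b027-17e79b799b00",
    "tech-immersion-226564",
    "gpu-tech-immersion-aml-226564"))
```

An empty list means none of these simple checks failed; it does not guarantee that the resources exist in your subscription.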

In [2]:
# constants, you can leave these values as they are or experiment with changing them after you have completed the notebook once
experiment_name = 'automl-regression'
project_folder = './automl-regression'

# this is the URL to the CSV file containing the training data
data_url = "https://databricksdemostore.blob.core.windows.net/data/connected-car/training-formatted.csv"

# this is the URL to the CSV file containing a small set of test data
test_data_url = "https://databricksdemostore.blob.core.windows.net/data/connected-car/fleet-formatted.csv"

# provide the pre-created CPU machine learning compute found under the Compute > Training Clusters section
cluster_name = "aml-compute-cpu"
aci_service_name ='contoso-service'

### Import required packages

The Azure Machine Learning SDK provides a comprehensive set of capabilities that you can use directly within a notebook, including:
- Creating a **Workspace** that acts as the root object to organize all artifacts and resources used by Azure Machine Learning.
- Creating **Experiments** in your Workspace that capture versions of the trained model along with any desired model performance telemetry. Each time you train a model and evaluate its results, you can capture that run (model and telemetry) within an Experiment.
- Creating **Compute** resources that can be used to scale out model training, so that while your notebook may be running in a lightweight container in Azure Notebooks, your model training can actually occur on a powerful cluster that can provide large amounts of memory, CPU or GPU. 
- Using **Automated Machine Learning (AutoML)** to automatically train multiple versions of a model using a mix of different ways to prepare the data and different algorithms and hyperparameters (algorithm settings) in search of the model that performs best according to a performance metric that you specify. 
- Packaging a Docker **Image** that contains everything your trained model needs for scoring (prediction) in order to run as a web service.
- Deploying your Image to either Azure Kubernetes Service or Azure Container Instances, effectively hosting the **Web Service**.

In Azure Notebooks, all of the libraries needed for Azure Machine Learning are pre-installed. To use them, you just need to import them. Run the following cell to do so:

In [3]:
import logging
import os
import random
import re
import urllib.request

import numpy as np
import pandas as pd
from sklearn import datasets

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import Webservice, AksWebservice
from azureml.core.image import Image
from azureml.core.model import Model
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from azureml.data.azure_storage_datastore import AzureBlobDatastore
from azureml.core import Dataset

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.14.0


## Create and connect to an Azure Machine Learning Workspace

Run the following cell to create a new Azure Machine Learning **Workspace** and save the configuration to disk (next to the Jupyter notebook). 

**Important Note**: You will be prompted to log in by the text that is output below the cell. Be sure to navigate to the URL displayed and enter the code that is provided. Once you have entered the code, return to this notebook and wait for the output to read `Workspace configuration succeeded`.

In [4]:
# By using the exist_ok param, if the workspace already exists you get a reference to the existing workspace
# allowing you to re-run this cell multiple times as desired (which is fairly common in notebooks).
ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group, 
    location = workspace_region,
    exist_ok = True)

ws.write_config()
print('Workspace configuration succeeded')


Workspace configuration succeeded


## Create a Workspace Experiment

Notice in the first line of the cell below, we can re-load the config we saved previously and then display a summary of the environment.

In [5]:
ws = Workspace.from_config()

# Display a summary of the current environment 
output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Project Directory'] = project_folder
pd.set_option('display.max_colwidth', -1)
pd.DataFrame(data=output, index=['']).T

Unnamed: 0,Unnamed: 1
SDK version,1.14.0
Subscription ID,ca9a2d87-d8dc-48d4-b027-17e79b799b00
Workspace,gpu-tech-immersion-aml-226564
Resource Group,tech-immersion-226564
Location,westcentralus
Project Directory,./automl-regression


Next, create a new Experiment. 

In [6]:
experiment = Experiment(ws, experiment_name)

## Get and explore the Vehicle Telemetry Data

Run the following cell to download and examine the vehicle telemetry data. The model you will build will try to predict how many days until the battery has a freeze event. Which features (columns) do you think will be useful?

In [7]:
data = pd.read_csv(data_url)
data

Unnamed: 0,Survival_In_Days,Province,Region,Trip_Length_Mean,Trip_Length_Sigma,Trips_Per_Day_Mean,Trips_Per_Day_Sigma,Battery_Rated_Cycles,Manufacture_Month,Manufacture_Year,...,Sensor_Reading_52,Sensor_Reading_53,Sensor_Reading_54,Sensor_Reading_55,Sensor_Reading_56,Sensor_Reading_57,Sensor_Reading_58,Sensor_Reading_59,Sensor_Reading_60,Sensor_Reading_61
0,1283,Bretagne,West,18.10325,6.034416,4.733162,1.183291,275,M8,Y2010,...,16.418910,17.441310,24.718290,11.812310,19.437210,15.079740,16.982440,18.893610,13.590000,14.510940
1,1427,Occitanie,South,14.63707,4.879023,4.325950,1.081487,250,M8,Y2014,...,14.703280,16.154500,27.789550,22.292230,29.158610,21.739530,23.830780,19.480210,10.264120,18.009700
2,1436,Auvergne_Rhone_Alpes,South,14.50564,4.835215,4.418737,1.104684,250,M9,Y2018,...,22.389700,21.834420,28.743260,26.313940,15.589060,15.317560,19.613730,28.397800,19.807990,15.425770
3,894,Martinique,West,20.85052,6.950172,4.284968,1.071242,200,M10,Y2003,...,2.794836,13.993500,15.524580,6.298875,11.355190,14.396860,2.890394,6.362495,10.916070,10.004320
4,1539,Reunion,South,11.57959,3.859862,4.561532,1.140383,200,M10,Y2007,...,26.631860,26.116980,18.011900,25.257760,25.320780,26.894640,18.863220,25.744930,24.027720,23.657220
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,1774,Provence_Alpes_Cote_d_Azure,South,19.72264,6.574214,4.394478,1.098619,200,M6,Y2017,...,33.689770,14.716640,19.794820,20.966630,20.486930,24.942620,28.590580,22.982010,21.230130,23.120810
9996,1078,Martinique,West,22.00332,7.334442,4.232257,1.058064,200,M7,Y2002,...,11.976990,19.494010,14.296590,20.887060,21.960480,23.866730,15.518700,22.327410,25.118430,17.900190
9997,778,Reunion,South,11.60729,3.869097,4.758821,1.189705,300,M8,Y2006,...,22.362340,20.986320,23.823440,24.835710,20.677540,19.468390,25.733030,18.123240,20.128850,25.896020
9998,1772,Marseille,South,13.12306,4.374355,4.922841,1.230710,300,M8,Y2010,...,6.414015,6.227712,10.510530,5.051135,4.304516,2.131638,11.710170,1.592314,4.341129,3.215770


## Remotely train multiple models using Auto ML and Azure ML Compute

In the following cells, you will *not* train the model against the data you just downloaded using the resources provided by the VM. Instead, you will deploy an Azure ML Compute cluster that will download the data and use Auto ML to train multiple models, evaluate the performance and allow you to retrieve the best model that was trained. In other words, all of the training will be performed remotely with respect to this notebook. 


As you will see, this is done almost entirely through configuration, with very little code required.

### Create Azure Machine Learning TabularDataset

Download the training dataset to the `project_folder`, and then upload the data to the default workspace datastore, which is backed by Azure Blob storage. Next, using the training data saved in the default workspace datastore, we will create an unregistered TabularDataset pointing to the path in the datastore. This dataset reference will allow us to seamlessly access the training data during model training without worrying about connection strings or data paths.

In [8]:
# create project folder
if not os.path.exists(project_folder):
    os.makedirs(project_folder)

# download the training dataset from data_url to the project folder
urllib.request.urlretrieve(data_url, os.path.join(project_folder, 'training-formatted.csv'))

# upload training dataset to default workspace datastore
datastore = ws.get_default_datastore()
datastore.upload_files(files = [os.path.join(project_folder, 'training-formatted.csv')],
                       target_path = 'train-dataset/tabular/',
                       overwrite = True,
                       show_progress = True)

# create TabularDataset reference
dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 
                                                        'train-dataset/tabular/training-formatted.csv')])

# target or label column name
target_column_name = 'Survival_In_Days'

# preview the first 5 rows of the dataset
dataset.take(5).to_pandas_dataframe()

Uploading an estimated of 1 files
Uploading ./automl-regression/training-formatted.csv
Uploaded ./automl-regression/training-formatted.csv, 1 files out of an estimated total of 1
Uploaded 1 files


Unnamed: 0,Survival_In_Days,Province,Region,Trip_Length_Mean,Trip_Length_Sigma,Trips_Per_Day_Mean,Trips_Per_Day_Sigma,Battery_Rated_Cycles,Manufacture_Month,Manufacture_Year,...,Sensor_Reading_52,Sensor_Reading_53,Sensor_Reading_54,Sensor_Reading_55,Sensor_Reading_56,Sensor_Reading_57,Sensor_Reading_58,Sensor_Reading_59,Sensor_Reading_60,Sensor_Reading_61
0,1283,Bretagne,West,18.10325,6.034416,4.733162,1.183291,275,M8,Y2010,...,16.41891,17.44131,24.71829,11.81231,19.43721,15.07974,16.98244,18.89361,13.59,14.51094
1,1427,Occitanie,South,14.63707,4.879023,4.32595,1.081487,250,M8,Y2014,...,14.70328,16.1545,27.78955,22.29223,29.15861,21.73953,23.83078,19.48021,10.26412,18.0097
2,1436,Auvergne_Rhone_Alpes,South,14.50564,4.835215,4.418737,1.104684,250,M9,Y2018,...,22.3897,21.83442,28.74326,26.31394,15.58906,15.31756,19.61373,28.3978,19.80799,15.42577
3,894,Martinique,West,20.85052,6.950172,4.284968,1.071242,200,M10,Y2003,...,2.794836,13.9935,15.52458,6.298875,11.35519,14.39686,2.890394,6.362495,10.91607,10.00432
4,1539,Reunion,South,11.57959,3.859862,4.561532,1.140383,200,M10,Y2007,...,26.63186,26.11698,18.0119,25.25776,25.32078,26.89464,18.86322,25.74493,24.02772,23.65722


### Create AML Compute Cluster

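Now you are ready to create the compute cluster. Run the following cell to create a new compute cluster (or retrieve the existing cluster if it already exists). The code below will create a *CPU based* cluster where each node in the cluster is of the size `STANDARD_DS3_V2`, and the cluster will have at most *1* node. If a cluster with the name in `cluster_name` already exists, it is reused as-is, so the VM size reported in the output may differ from the one specified here.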

In [9]:
### Create AML CPU based Compute Cluster
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                           max_nodes=1)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# Use the 'status' property to get a detailed status for the current AmlCompute. 
print(compute_target.status.serialize())

Found existing compute target.
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 1, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2020-09-24T12:34:50.207000+00:00', 'errors': None, 'creationTime': '2020-09-24T12:32:42.794589+00:00', 'modifiedTime': '2020-09-24T12:33:00.480140+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 1, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}


## Run our Experiment on AML Compute

### Instantiate an Automated ML Config

Run the following cell to configure the Auto ML run. In short, you are configuring the training of a regression model that will attempt to predict the value of the label column (`Survival_In_Days`) based on all the other columns in the data set. The run is configured to try at most 3 iterations, where no iteration can run longer than 5 minutes.

Additionally, the data will be automatically pre-processed (featurized) in different ways as a part of the automated model training (as indicated by the `featurization` attribute having a value of `'auto'`). This is a very powerful feature of Auto ML as it tries many best-practice approaches for you, and saves you a lot of time and effort in the process.

The goal of Auto ML in this case is to find the best model, as measured by the normalized root mean squared error metric (as indicated by the `primary_metric` attribute). The error is a measure of how far the model's predictions are from the values provided as the "answer" in the training data. In short, AutoML will try to get the error as low as possible when trying its combination of approaches.
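
To make the metric more concrete, here is a minimal sketch of a normalized root mean squared error in plain Python. It assumes that normalization means dividing the RMSE by the range (max minus min) of the actual label values; treat it as an illustration of the idea, not as the AutoML implementation. The function name `normalized_rmse` and the predicted values are made up for the example.

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the actual values (assumed normalization)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Mean of the squared differences between actual and predicted values.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    rmse = math.sqrt(mse)

    # Dividing by the label range makes the error comparable across scales.
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("the actual values must not all be identical")
    return rmse / value_range


# Actual Survival_In_Days values from the first rows of the training data,
# paired with hypothetical predictions.
actual = [1283, 1427, 1436, 894, 1539]
predicted = [1300, 1400, 1450, 950, 1500]
print(normalized_rmse(actual, predicted))
```

A value of 0 means the predictions match the actual values exactly; larger values mean larger errors relative to the spread of the label.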

The TabularDataset you created earlier is supplied to the AutoMLConfig through `training_data`, together with the name of the label column, so the remote cluster can read the training data directly from the datastore. The actual execution of this training will occur on the compute cluster you created previously.

In general, the AutoMLConfig is very flexible, allowing you to specify all of the following:
- Task type (classification, regression, forecasting)
- Number of algorithm iterations and maximum time per iteration
- Accuracy metric to optimize
- Algorithms to blacklist (skip)/whitelist (include)
- Number of cross-validations
- Compute targets
- Training data
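
The "number of cross-validations" setting refers to k-fold cross-validation: the training rows are split into k folds, and each fold is used once for validation while the remaining folds are used for training. The sketch below illustrates that splitting with plain Python and contiguous folds. It is only an illustration of the general technique; how AutoML actually assigns rows to folds is not shown in this notebook, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (training_indices, validation_indices) for each of n_folds folds."""
    if not 2 <= n_folds <= n_rows:
        raise ValueError("n_folds must be between 2 and n_rows")

    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]

    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


# Ten rows split into five folds, mirroring n_cross_validations = 5.
for training, validation in k_fold_indices(10, 5):
    print("train:", training, "validate:", validation)
```

Every row ends up in exactly one validation fold, so each model is always evaluated on rows it did not see during that fold's training.
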

Run the following cell to create the configuration.

In [10]:
automl_config = AutoMLConfig(task = 'regression',
                             iterations = 3,
                             iteration_timeout_minutes = 5, 
                             max_cores_per_iteration = 10,
                             featurization='auto',
                             primary_metric='normalized_root_mean_squared_error',
                             n_cross_validations = 5,
                             debug_log = 'automl.log',
                             verbosity = logging.DEBUG,
                             training_data = dataset, 
                             label_column_name=target_column_name,
                             compute_target = compute_target,
                             path = project_folder)

### Run the Experiment

Run the following cell to execute the experiment on the remote compute cluster.

This will remotely train multiple models, evaluate them, and allow you to review the performance characteristics of each one, as well as to pick the *best model* that was trained and download it.

In [11]:
remote_run = experiment.submit(automl_config, show_output=False)
remote_run

Running on remote or ADB.


Experiment,Id,Type,Status,Details Page,Docs Page
automl-regression,AutoML_44b40174-81cc-406c-9424-df0569de9f92,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


Once the above cell completes, the run has been submitted but will likely still have a status of **NotStarted** or **Preparing**. To view the training status updates as they happen, run the next cell and wait for the run status to reach the **Completed** state before continuing.

*Note: The first time you run this, it will take about 15 minutes to complete as the cluster is configured and then the AutoML job is run*

In [12]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

### List the Experiments from your Workspace

Using the Azure Machine Learning SDK, you can retrieve any of the experiments in your Workspace and drill into the details of any runs the experiment contains. Run the following cell to explore the number of runs by experiment name.

In [13]:
ws = Workspace.from_config()
experiment_list = Experiment.list(workspace=ws)

summary_df = pd.DataFrame(index = ['No of Runs'])
pattern = re.compile('^AutoML_[^_]*$')
for experiment in experiment_list:
    all_runs = list(experiment.get_runs())
    automl_runs = []
    for run in all_runs:
        if(pattern.match(run.id)):
            automl_runs.append(run)    
    summary_df[experiment.name] = [len(automl_runs)]
    
pd.set_option('display.max_colwidth', -1)
summary_df.T

Unnamed: 0,No of Runs
automl-regression,3
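
The regular expression in the cell above is what separates a parent Automated ML run from its child iterations: a parent run id has the form `AutoML_<guid>`, while each child iteration appends an underscore and an index (for example `_0`), which `[^_]*$` rejects. A small sketch of that filter follows; the helper name `is_parent_automl_run` is hypothetical.

```python
import re

# Same pattern as in the cell above: "AutoML_" followed by no further underscores.
PARENT_RUN_PATTERN = re.compile('^AutoML_[^_]*$')


def is_parent_automl_run(run_id):
    """Return True for a parent AutoML run id, False for child iterations."""
    return PARENT_RUN_PATTERN.match(run_id) is not None


print(is_parent_automl_run("AutoML_44b40174-81cc-406c-9424-df0569de9f92"))    # True
print(is_parent_automl_run("AutoML_44b40174-81cc-406c-9424-df0569de9f92_0"))  # False
```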


### List the Automated ML Runs for the Experiment

Similarly, you can view all of the Automated ML runs recorded for the experiment:

In [14]:
import json
proj = ws.experiments[experiment_name]
summary_df = pd.DataFrame(index = ['Type', 'Status', 'Primary Metric', 'Iterations', 'Compute', 'Name'])
pattern = re.compile('^AutoML_[^_]*$')
all_runs = list(proj.get_runs(properties={'azureml.runsource': 'automl'}))
for run in all_runs:
    if(pattern.match(run.id)):
        properties = run.get_properties()
        tags = run.get_tags()
        amlsettings = json.loads(properties['AMLSettingsJsonString'])
        if 'iterations' in tags:
            iterations = tags['iterations']
        else:
            iterations = properties['num_iterations']
        summary_df[run.id] = [amlsettings['task_type'], run.get_details()['status'], properties['primary_metric'], iterations, properties['target'], amlsettings['name']]
    
from IPython.display import HTML
projname_html = HTML("<h3>{}</h3>".format(proj.name))

from IPython.display import display
display(projname_html)
display(summary_df.T)

Unnamed: 0,Type,Status,Primary Metric,Iterations,Compute,Name
AutoML_44b40174-81cc-406c-9424-df0569de9f92,regression,Running,normalized_root_mean_squared_error,3,aml-compute-cpu,automl-regression
AutoML_798fcded-6cf2-478b-b773-c16781206c3e,regression,Completed,normalized_root_mean_squared_error,1000,aml-compute-cpu,automl-regression
AutoML_6fe87304-83fd-4abf-89b1-b21edf1bef1e,regression,Canceled,spearman_correlation,1000,aml-compute-cpu,automl-regression


### Display Automated ML Run Details
For a particular run, you can display the details of how the run performed against the performance metric. The Azure Machine Learning SDK includes a built-in widget that graphically summarizes the run. 

Execute the following cell to see it.

In [15]:
run_id = remote_run.id

from azureml.widgets import RunDetails

experiment = Experiment(ws, experiment_name)
ml_run = AutoMLRun(experiment=experiment, run_id=run_id)

RunDetails(ml_run).show() 

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

### Get the best run and the trained model

At this point you have multiple runs, each with a different trained model. How can you get the model that performed the best? Run the following cells to learn how.

In [23]:
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)

Run(Experiment: automl-regression,
Id: AutoML_44b40174-81cc-406c-9424-df0569de9f92_0,
Type: azureml.scriptrun,
Status: Completed)
RegressionPipeline(pipeline=Pipeline(memory=None,
                                     steps=[('datatransformer',
                                             DataTransformer(enable_dnn=None,
                                                             enable_feature_sweeping=None,
                                                             feature_sweeping_config=None,
                                                             feature_sweeping_timeout=None,
                                                             featurization_config=None,
                                                             force_text_dnn=None,
                                                             is_cross_validation=None,
                                                             is_onnx_compatible=None,
                                                             

You can query for the best run when evaluated using a specific metric. 

In [24]:
# show run and model by a specific metric
lookup_metric = "root_mean_squared_error"
best_run, fitted_model = remote_run.get_output(metric = lookup_metric)
print(best_run)
print(fitted_model)

Run(Experiment: automl-regression,
Id: AutoML_44b40174-81cc-406c-9424-df0569de9f92_0,
Type: azureml.scriptrun,
Status: Completed)
RegressionPipeline(pipeline=Pipeline(memory=None,
                                     steps=[('datatransformer',
                                             DataTransformer(enable_dnn=None,
                                                             enable_feature_sweeping=None,
                                                             feature_sweeping_config=None,
                                                             feature_sweeping_timeout=None,
                                                             featurization_config=None,
                                                             force_text_dnn=None,
                                                             is_cross_validation=None,
                                                             is_onnx_compatible=None,
                                                             

You can retrieve a specific iteration from a run.

In [25]:
# show run and model from iteration 0
iteration = 0
first_run, first_model = remote_run.get_output(iteration=iteration)
print(first_run)
print(first_model)

Run(Experiment: automl-regression,
Id: AutoML_44b40174-81cc-406c-9424-df0569de9f92_0,
Type: azureml.scriptrun,
Status: Completed)
RegressionPipeline(pipeline=Pipeline(memory=None,
                                     steps=[('datatransformer',
                                             DataTransformer(enable_dnn=None,
                                                             enable_feature_sweeping=None,
                                                             feature_sweeping_config=None,
                                                             feature_sweeping_timeout=None,
                                                             featurization_config=None,
                                                             force_text_dnn=None,
                                                             is_cross_validation=None,
                                                             is_onnx_compatible=None,
                                                             

At this point you now have a model you could use for predicting the time until battery failure. You would typically use this model in one of two ways:
- Use the model file within other notebooks to batch score predictions.
- Deploy the model file as a web service that applications can call. 

In the following, you will explore the latter option to deploy the best model as a web service.

### Download the best model 
With a run object in hand, it is trivial to download the model. 

In [26]:
# fetch the best model
best_run.download_file("outputs/model.pkl",
                       output_file_path = "./model.pkl")

## Deploy the Model as a Web Service

Azure Machine Learning provides a Model Registry that acts like a version-controlled repository for each of your trained models. To version a model, you register it using the SDK. Run the following cell to register the best model with Azure Machine Learning.

In [27]:
# register the model for deployment
model = Model.register(model_path = "model.pkl",
                       model_name = "model.pkl",
                       tags = {'area': "auto", 'type': "regression"},
                       description = "Contoso Auto model to predict battery failure",
                       workspace = ws)

print(model.name, model.description, model.version)

Registering model model.pkl
model.pkl Contoso Auto model to predict battery failure 2


Once you have a model added to the registry in this way, you can deploy web services that pull their model directly from this repository when they first start up.

### Create Scoring File

Azure Machine Learning SDK gives you control over the logic of the web service, so that you can define how it retrieves the model and how the model is used for scoring. This is an important bit of flexibility. For example, you often have to prepare any input data before sending it to your model for scoring. You can define this data preparation logic (as well as the model loading approach) in the scoring file. 

Run the following cell to create a scoring file that will be included in the Docker Image that contains your deployed web service.

In [32]:
%%writefile scoring_service.py
import pickle
import json
import numpy
import pandas as pd
import azureml.train.automl
import joblib  # sklearn.externals.joblib is deprecated and removed in newer scikit-learn releases
from azureml.core.model import Model

def init():
    global model
    model_path = Model.get_model_path('model.pkl') # 'model.pkl' is the name the model was registered under
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)

def run(rawdata):
    try:
        data = pd.read_json(rawdata,orient="split")
        result = model.predict(data)
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    return json.dumps({"result":result.tolist()})

Overwriting scoring_service.py
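
The `run` function above expects its input in the pandas "split" JSON orientation, which is an object with `columns`, `index` and `data` keys. The following sketch builds and decodes such a payload with only the standard library so you can see its shape. The helper names are hypothetical, and only three of the telemetry columns are used to keep the example short.

```python
import json


def to_split_payload(columns, rows):
    """Serialize rows into the 'split' layout: columns, index and data."""
    return json.dumps({
        "columns": list(columns),
        "index": list(range(len(rows))),
        "data": [list(row) for row in rows],
    })


def rows_from_split_payload(payload):
    """Decode a 'split' payload back into one dict per row."""
    parsed = json.loads(payload)
    return [dict(zip(parsed["columns"], row)) for row in parsed["data"]]


# A single vehicle, using a few columns from the training data shown earlier.
payload = to_split_payload(
    ["Province", "Region", "Trip_Length_Mean"],
    [["Bretagne", "West", 18.10325]])
print(payload)
print(rows_from_split_payload(payload))
```

In the notebook itself, `DataFrame.to_json(orient="split")` produces this layout for you, and `pd.read_json(..., orient="split")` in the scoring file reads it back.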


### Environments

Azure ML environments are an encapsulation of the environment where your machine learning training happens. They define Python packages, environment variables, Docker settings and other attributes in declarative fashion. Environments are versioned: you can update them and retrieve old versions to revisit and review your work.

Environments allow you to:
* Encapsulate dependencies of your training process, such as Python packages and their versions.
* Reproduce the Python environment on your local computer in a remote run on a VM or ML Compute cluster.
* Reproduce your experimentation environment in production setting.
* Revisit and audit the environment in which an existing model was trained.

The environment, compute target and training script together form the run configuration: the full specification of a training run.

#### Create and register your environment

You can manage environments by registering them. This allows you to track their versions, and reuse them in future runs. For example, once you've constructed an environment that meets your requirements, you can register it and use it in other experiments so as to standardize your workflow.

If you register the environment with the same name, the version number is increased by one. Note that Azure ML keeps track of differences between the versions, so if you re-register an identical version, the version number is not increased.

In [34]:
from azureml.core import Environment
myEnv = Environment.from_conda_specification('myenv', './automl_dependencies.yml')
myEnv.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20200821.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "myenv",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-forge"
      

### Deploy ACI Hosted Web Service

If you want more control over how your model is run, if it uses another framework, or if it has special runtime requirements, you can instead specify your own environment and scoring method. Custom environments can be used for any model you want to deploy.

In the previous section, you specified the model's runtime environment by creating an [Environment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object from a conda specification file that lists the [CondaDependencies](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model.

In the following cells you will use the Azure Machine Learning SDK to package the model and scoring script in a container, and deploy that container to an Azure Container Instance.

Run the following cell: you may be waiting 20-25 minutes for completion, while the Running tag adds progress dots.

In [None]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

inference_config = InferenceConfig(entry_script='scoring_service.py', environment=myEnv)
aci_config = AciWebservice.deploy_configuration(
    cpu_cores = 1, 
    memory_gb = 1, 
    tags = {'name': 'aci-cluster'}, 
    description = 'Scoring web service.')

from azureml.core import Webservice

service_name = 'predict-battery-life'

webservice = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aci_config, 
                       overwrite=True)
webservice.wait_for_deployment(show_output=True)
print(webservice.state)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.............................

### Test the deployed web service

With the deployed web service ready, you are now ready to test calling the service with some car telemetry to see the scored results. There are three ways to approach this:
1. You could use the `Webservice` object that you acquired in the previous cell to call the service directly.
2. You could use the `Webservice` class to get a reference to a deployed web service by name.
3. You could use any client capable of making a REST call.
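
As an illustration of the third option, the sketch below prepares an HTTP POST request with Python's standard library. It assumes the service accepts a JSON body without authentication and that you pass in the service's scoring URL (for example from the `scoring_uri` property of the `Webservice` object); the URL and payload used here are placeholders, and the helper name `build_scoring_request` is hypothetical. The request is only built, not sent.

```python
import urllib.request


def build_scoring_request(scoring_uri, payload_json):
    """Prepare (but do not send) a JSON POST request for a scoring endpoint."""
    return urllib.request.Request(
        scoring_uri,
        data=payload_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL and payload for illustration only.
request = build_scoring_request(
    "http://example.invalid/score",
    '{"columns": [], "index": [], "data": []}')
print(request.get_method(), request.full_url)

# To actually call a deployed service you would send the request, for example:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```
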

In this notebook, we will take the first approach. Run the following cell to invoke the service through the `webservice` object from the previous cell, using some sample car telemetry.

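The output of this cell will be a JSON string whose `result` field is an array of numbers, where each number represents the expected battery lifetime in days for the corresponding row of vehicle data. If scoring fails, the string contains an `error` field instead.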

In [None]:
%%time
# load some test vehicle data that the model has not seen
test_data = pd.read_csv(test_data_url)

# prepare the data and select five vehicles
test_data = test_data.drop(columns=["Car_ID", "Battery_Age"])
test_data.rename(columns={'Twelve_hourly_temperature_forecast_for_next_31_days_reversed': 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first'}, inplace=True)
test_data_json = test_data.iloc[:5, 0:73].to_json(orient="split")
prediction = webservice.run(input_data = test_data_json)
print(prediction)