# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl.utilities import get_primary_metrics
from azureml.train.automl import AutoMLConfig
from azureml.pipeline.steps import AutoMLStep
from azureml.widgets import RunDetails
import pickle
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice.aci import AciWebservice
from azureml.core.webservice import Webservice
import os

## Dataset

### Overview

Task: predict house prices in King County (a regression problem).

Data: the dataset is taken from https://www.kaggle.com/harlfoxem/housesalesprediction

In [3]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'exp1'

experiment=Experiment(ws, experiment_name)

In [4]:
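try:
    # Reuse the cluster if it already exists in the workspace
    cluster = ComputeTarget(workspace=ws, name="compute11")
    print("Cluster exists")
except ComputeTargetException:
    # Otherwise provision a new cluster with up to 4 nodes
    config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cluster = ComputeTarget.create(ws, "compute11", config)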

cluster.wait_for_completion()

In [5]:
train_data = Dataset.get_by_name(ws, name="housedata")

train_data.take(5).to_pandas_dataframe()

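| Unnamed: 0 | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | view | condition | grade | sqft_above | sqft_basement | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | 0 | 3 | 7 | 1180 | 0 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | 0 | 3 | 7 | 2170 | 400 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | 0 | 3 | 6 | 770 | 0 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | 0 | 5 | 7 | 1050 | 910 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | 0 | 3 | 8 | 1680 | 0 | 47.6168 | -122.045 | 1800 | 7503 |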


## AutoML Configuration

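The task is set to regression, as the problem statement requires. The primary metric is normalized root mean squared error (`normalized_root_mean_squared_error`), which suits a continuous target such as `price`. Three-fold cross-validation (`n_cross_validations`) gives a more reliable estimate of how the model generalizes and helps detect overfitting. `max_concurrent_iterations` is set to 4; its value must be less than or equal to the number of nodes provided when the compute cluster was created (`max_nodes=4`). The experiment times out after 30 minutes, and early stopping is enabled.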
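To make the configured primary metric, `normalized_root_mean_squared_error`, more concrete, here is a minimal sketch of how such a value can be computed by hand. It assumes the normalization divides the RMSE by the range of the true target values; the helper name `normalized_rmse` and the predictions are hypothetical and only illustrate the arithmetic. The prices are the five rows shown in the dataset preview.

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the true values (assumed normalization)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("the true values must not all be identical")
    return math.sqrt(mse) / value_range


# Prices from the dataset preview above
prices = [221900.0, 538000.0, 180000.0, 604000.0, 510000.0]
# Hypothetical predictions: every price overestimated by 10,000
predicted = [p + 10000.0 for p in prices]

print(normalized_rmse(prices, predicted))
```

Lower values are better. Dividing by the target range makes the score comparable across datasets with different price scales.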

In [8]:
automl_settings = {
    "experiment_timeout_minutes": 30,
    "task": "regression", 
    "primary_metric": "normalized_root_mean_squared_error",
    "training_data": train_data,
    "label_column_name": "price",
    "n_cross_validations": 3,
    "enable_early_stopping": True,
    "max_cores_per_iteration": -1,
    "max_concurrent_iterations": 4,
    "compute_target": cluster
}


automl_config = AutoMLConfig(**automl_settings)
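
The constraint between `max_concurrent_iterations` and the cluster size can be checked before submitting the run. The following is a small sketch, not part of the Azure ML SDK: `check_concurrency` is a hypothetical helper, and the node count is passed in by hand (4, matching `max_nodes` in the cluster configuration above).

```python
def check_concurrency(settings, max_nodes):
    """Return the requested concurrency, or raise if it exceeds the node count."""
    requested = settings.get("max_concurrent_iterations", 1)
    if requested > max_nodes:
        raise ValueError(
            f"max_concurrent_iterations={requested} exceeds max_nodes={max_nodes}"
        )
    return requested


# Values mirror the settings and cluster configuration used in this notebook
example_settings = {"max_concurrent_iterations": 4}
print(check_concurrency(example_settings, max_nodes=4))
```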

In [9]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [10]:
RunDetails(remote_run).show()


In [11]:
remote_run.wait_for_completion(show_output=True)



****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more about high cardinality feature handling: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary de

{'runId': 'AutoML_b9d7d298-0956-4f81-8e0d-86aef7096950',
 'target': 'compute11',
 'status': 'Completed',
 'startTimeUtc': '2021-02-12T17:41:23.149865Z',
 'endTimeUtc': '2021-02-12T18:07:27.836937Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'normalized_root_mean_squared_error',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'compute11',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"8516b795-e35e-4f2c-afd8-eff3a5f50a77\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"UI/02-12-2021_053636_UTC/kc_house_data.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-138714\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"5a4ab2ba-6c

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [12]:
best_auto_run, best_auto_model = remote_run.get_output()
best_auto_model._final_estimator

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


PreFittedSoftVotingRegressor(estimators=[('0',
                                          Pipeline(memory=None,
                                                   steps=[('maxabsscaler',
                                                           MaxAbsScaler(copy=True)),
                                                          ('lightgbmregressor',
                                                           LightGBMRegressor(boosting_type='gbdt',
                                                                             class_weight=None,
                                                                             colsample_bytree=1.0,
                                                                             importance_type='split',
                                                                             learning_rate=0.1,
                                                                             max_depth=-1,
                                                                  

In [13]:
#TODO: Save the best model

model_name = "automl_model.pkl"
with open(model_name, 'wb') as f:
    pickle.dump(best_auto_model,f)

## Model Deployment




In [14]:
model = Model.register(ws, model_path='./automl_model.pkl', model_name='houseprice_prediction')

Registering model houseprice_prediction


In [15]:
env = best_auto_run.get_environment()
inference_config = InferenceConfig(entry_script = "score.py", environment = env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

In [None]:
# env = best_auto_run.get_environment()
# entry_script='score.py'
# best_auto_run.download_file('outputs/scoring_file_v_1_0_0.py', entry_script)

In [None]:
# model = remote_run.register_model(model_name=best_auto_run.properties['model_name'], 
#                                            description='AutoML model')
# inference_config = InferenceConfig(entry_script = entry_script, environment = env)
#  deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

In [19]:
service = Model.deploy(ws,'deploy1',[model], inference_config, deployment_config)
service.wait_for_deployment(True)
print("State: " + service.state)
print("Scoring URI: " + service.scoring_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running......................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
State: Healthy
Scoring URI: http://c3459647-297c-4f7d-84bb-88d7239127c9.southcentralus.azurecontainer.io/score


In [20]:
%run endpoint.py

{"result": [248290.61697906838, 826442.737010149]}
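
The contents of `endpoint.py` are not shown in this notebook. As a rough sketch of what such a script might do, the code below builds a JSON request and posts it to the scoring URI using only the standard library. The `{"data": [...]}` payload shape, the helper names `build_payload` and `score`, and the choice of feature columns are assumptions; the sample row reuses the first row of the dataset preview without the `price` label.

```python
import json
import urllib.request


def build_payload(rows):
    """Encode feature rows as a JSON body (assumed shape: {"data": [...]})."""
    return json.dumps({"data": rows}).encode("utf-8")


def score(scoring_uri, rows, timeout=30):
    """POST the rows to the endpoint and return the raw response text."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# First row of the dataset preview, with the label column (price) removed
sample_row = {
    "Unnamed: 0": 0, "bedrooms": 3, "bathrooms": 1.0, "sqft_living": 1180,
    "sqft_lot": 5650, "floors": 1.0, "view": 0, "condition": 3, "grade": 7,
    "sqft_above": 1180, "sqft_basement": 0, "lat": 47.5112, "long": -122.257,
    "sqft_living15": 1340, "sqft_lot15": 5650,
}

# Inspect the request body without contacting the service
print(build_payload([sample_row]).decode("utf-8"))
```

With a live service, `score(service.scoring_uri, [sample_row])` would send the request. The call is left out here because the endpoint is deleted at the end of the notebook.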


In [21]:
service.get_logs()

'2021-02-12T18:30:07,802688956+00:00 - iot-server/run \n2021-02-12T18:30:07,804193652+00:00 - gunicorn/run \n2021-02-12T18:30:07,806289287+00:00 - rsyslog/run \n2021-02-12T18:30:07,810555862+00:00 - nginx/run \n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)

In [22]:
service.update(enable_app_insights=True)

In [25]:
service.delete()