# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [25]:
import joblib
from azureml.core import Model
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.model import InferenceConfig
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.core import Dataset, Environment, Experiment, Workspace
from azureml.train.hyperdrive.parameter_expressions import uniform, choice

## Dataset
This project uses the Iris dataset from the [UCI repository](https://archive.ics.uci.edu/ml/datasets/Iris). It is a classification problem: given four measurements of a flower, predict the class of Iris plant. The dataset contains 50 samples for each of the three classes, 150 samples in total.

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()

# Get dataset
dataset = Dataset.get_by_name(ws, name='iris')
iris = dataset.to_pandas_dataframe()
print("Shape of data: ", str(iris.shape))
iris.head()

Shape of data:  (150, 5)


,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),labels
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [3]:
# Choose a name for the experiment
experiment_name = 'udacity_capstone_project'

experiment = Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
udacity_capstone_project,quick-starts-ws-129170,Link to Azure Machine Learning studio,Link to Documentation


## Create or Use Compute Target

In [66]:
from azureml.core.compute_target import ComputeTargetException

vm_size = "Standard_DS12_V2"
compute_cluster_name = "udacity-cc"

try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=compute_cluster_name)
    print("Compute cluster found.")
except ComputeTargetException:
    # No cluster with this name yet, so provision one with up to 4 nodes
    print("Creating compute cluster...")
    compute_cluster_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
        max_nodes=4)
    compute_target = ComputeTarget.create(workspace=ws, name=compute_cluster_name,
        provisioning_configuration=compute_cluster_config)
    compute_target.wait_for_completion(show_output=True)

Creating compute cluster...
CreatingAmlCompute is getting created. Consider calling wait_for_completion() first


Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.

**Because this is a classification problem that is simple for an algorithm to learn, I chose logistic regression. The hyperparameters I tuned, and the reasons for choosing them, are listed below.**

- C: The inverse of the regularization strength, so smaller values mean stronger regularization. Regularization adds a penalty on large coefficients, which shrinks the influence of less significant features and guards against overfitting.

- max_iter: The maximum number of iterations the solver may take to converge. We do not know in advance how many iterations convergence needs, so this value is worth tuning.

As the termination policy I used BanditPolicy with `evaluation_interval=2` and `slack_factor=0.15`: every two evaluation intervals, any run whose primary metric (accuracy) is lower than the best run's metric divided by 1.15 is cancelled. I set "max_total_runs" to 10 because the dataset is easy for the algorithm to learn.
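To make the slack factor concrete, here is a small worked sketch of the cutoff for a metric that is being maximized. The helper names `bandit_cutoff` and `should_terminate` are hypothetical and only illustrate the arithmetic; they are not part of the Azure ML SDK.

```python
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Lowest metric a run may report without being cancelled (maximize goal)."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric: float, best_metric: float, slack_factor: float) -> bool:
    """True when the run falls below the slack-adjusted best metric."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# With a best accuracy of 1.0 and slack_factor=0.15 the cutoff is 1 / 1.15
cutoff = bandit_cutoff(best_metric=1.0, slack_factor=0.15)
print(round(cutoff, 4))                    # 0.8696
print(should_terminate(0.85, 1.0, 0.15))   # True: 0.85 is below the cutoff
print(should_terminate(0.90, 1.0, 0.15))   # False: 0.90 is within the slack
```

So with the settings used here, a run would only be cancelled early if its accuracy dropped below roughly 0.87 while another run had already reached 1.0.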

In [9]:
# TODO: Create an early termination policy. This is not required if you are using Bayesian sampling.
early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=2)

#TODO: Create the different params that you will be using during training
param_sampling = RandomParameterSampling({
    "C": uniform(0.1, 1),
    "max_iter": choice(50, 100, 150)
})

#TODO: Create your estimator and hyperdrive config
estimator = SKLearn(
    source_directory="./", 
    compute_target=compute_target,
    script_params={'--input_data_name': "iris"}, 
    entry_script="train.py"
)

hyperdrive_run_config = HyperDriveConfig(
    estimator=estimator,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name="Accuracy",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=10,
    max_concurrent_runs=4
)
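The `train.py` entry script is not shown in this notebook. As a rough sketch, and assuming the sampled values reach the script as `--C` and `--max_iter` command-line arguments alongside the `--input_data_name` value from `script_params`, its argument handling might look like the following. The default values and the `parse_args` helper are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments a HyperDrive child run is assumed to receive."""
    parser = argparse.ArgumentParser(description="Train logistic regression on iris")
    parser.add_argument("--input_data_name", type=str, default="iris",
                        help="Name of the registered dataset")
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Simulate the arguments of one sampled configuration
args = parse_args(["--input_data_name", "iris", "--C", "0.4", "--max_iter", "50"])
print(args.input_data_name, args.C, args.max_iter)  # iris 0.4 50
```

The parsed values would then be passed to the model and logged under the metric names that appear later in the run output, with "Accuracy" matching `primary_metric_name`.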

In [10]:
#TODO: Submit your experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config, show_output=True)



## Run Details
TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [11]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)

RunId: HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79
Web View: https://ml.azure.com/experiments/udacity_capstone_project/runs/HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79?wsid=/subscriptions/5781be4e-7862-42f9-8ae8-e879c711039b/resourcegroups/aml-quickstarts-129170/workspaces/quick-starts-ws-129170

Streaming azureml-logs/hyperdrive.txt

<START>[2020-12-04T14:36:04.894635][API][INFO]Experiment created<END>
<START>[2020-12-04T14:36:06.8022612Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>
<START>[2020-12-04T14:36:07.798950][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>
<START>[2020-12-04T14:36:08.029457][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>

Execution Summary
RunId: HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79
Web View: https://ml.azure.com/experiments/udacity_capstone_project/runs/HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79?wsid=/s

{'runId': 'HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79',
 'target': 'udacity-cc',
 'status': 'Completed',
 'startTimeUtc': '2020-12-04T14:36:04.355064Z',
 'endTimeUtc': '2020-12-04T14:40:41.950204Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': 'd9421f07-c020-4c57-b9f1-602630542f75',
  'score': '1.0',
  'best_child_run_id': 'HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79_0',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg129170.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=GuQYAMIobYDVvsd7tR2CEsSoSIZ5a9d%2FgbhCdk316B4%3D&st=2020-12-04T14%3A31%3A02Z&se=2020-12-04T22%3A41%3A02Z&sp=r'}}

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [27]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_run.id)
best_run.get_metrics()

HD_8e0dc70e-a0ca-4a64-8616-b71cf98d2d79_0


{'Regularization Strength:': 0.4034112191815217,
 'Max iterations:': 50,
 'Accuracy': 1.0}

In [29]:
#TODO: Save the best model
best_run.download_file(name="outputs/model.joblib", output_file_path="./hyperdrive_model.joblib")

# Register
model = best_run.register_model(
    model_name='hd_model', 
    model_path='outputs/model.joblib',
    model_framework=Model.Framework.SCIKITLEARN,
    model_framework_version="0.20.3"
)
model

Model(workspace=Workspace.create(name='quick-starts-ws-129170', subscription_id='5781be4e-7862-42f9-8ae8-e879c711039b', resource_group='aml-quickstarts-129170'), name=hd_model, id=hd_model:5, version=5, tags={}, properties={})

In [26]:
joblib.load("./hyperdrive_model.joblib")

The sklearn.linear_model.logistic module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
Trying to unpickle estimator LogisticRegression from version 0.20.3 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.
From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.


LogisticRegression(C=0.4034112191815217, class_weight=None, dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=50, multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

## Model Deployment

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [30]:
env = Environment.get(ws, "AzureML-Minimal").clone("udacity_ml")

for pip_package in ["scikit-learn"]:
    env.python.conda_dependencies.add_pip_package(pip_package)

inference_config = InferenceConfig(entry_script='score.py',
                                    environment=env)
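The `score.py` entry script referenced above is not included in this notebook. A minimal sketch of what such a script could look like is given here, assuming the usual `init()` / `run()` entry points and a model file named `model.joblib` under the `AZUREML_MODEL_DIR` directory. The file name and the `predict_rows` helper are assumptions, not the author's actual script.

```python
import json
import os

model = None  # populated once by init() when the container starts


def init():
    """Load the registered model; called once when the service starts."""
    global model
    import joblib  # imported here so the module loads even without joblib

    # AZUREML_MODEL_DIR is set inside the deployed container; the file name
    # "model.joblib" is an assumption based on the registered model path.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    model = joblib.load(os.path.join(model_dir, "model.joblib"))


def predict_rows(fitted_model, raw_data):
    """Parse a JSON list of feature rows and return the predicted labels."""
    rows = json.loads(raw_data)
    return [str(label) for label in fitted_model.predict(rows)]


def run(raw_data):
    """Entry point called for every scoring request."""
    try:
        return predict_rows(model, raw_data)
    except Exception as exc:  # report the failure to the caller instead of crashing
        return {"error": str(exc)}
```

Returning plain strings keeps the response JSON-serializable, and returning an error dictionary makes failed requests easier to diagnose from the client side.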

In [33]:
env.save_to_directory("./env/", overwrite=True)

In [67]:
aci_config = AciWebservice.deploy_configuration(cpu_cores=2, memory_gb=2,
                                                enable_app_insights=True, auth_enabled=True) 

service_name = 'best-model-service'
service = Model.deploy(ws, service_name, [model], inference_config=inference_config, 
                       deployment_config=aci_config, overwrite=True)
service.wait_for_deployment(show_output = True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.........................
Succeeded
ACI service creation operation finished, operation "Succeeded"


TODO: In the cell below, send a request to the web service you deployed to test it.

In [70]:
x = iris.iloc[15, :-1].tolist()
y = iris.loc[15, "labels"]
print(x)
print(y)

[5.7, 4.4, 1.5, 0.4]
Iris-setosa


In [71]:
import json
input_payload = json.dumps([x])

output = service.run(input_payload)
print(f"Predicted: {output}\nExpected: {y}")

Predicted: ['Iris-setosa']
Expected: Iris-setosa
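The call above goes through the SDK. Because the service was deployed with `auth_enabled=True`, a client outside the SDK would need to send the key in an `Authorization` header. The sketch below only builds such a request with the standard library and does not send it; the URL and key are placeholders, and in practice they would come from `service.scoring_uri` and `service.get_keys()`.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, rows):
    """Build an authenticated POST request carrying feature rows as JSON."""
    body = json.dumps(rows).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",  # key-based auth for the ACI service
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values; replace with service.scoring_uri and a key from service.get_keys()
request = build_scoring_request(
    "http://example.invalid/score", "<primary-key>", [[5.7, 4.4, 1.5, 0.4]]
)
print(request.get_method(), request.full_url)  # POST http://example.invalid/score
```

Passing the request to `urllib.request.urlopen(request)` would send it, and the response body would contain the JSON-encoded prediction.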


TODO: In the cell below, print the logs of the web service and delete the service

In [47]:
print(service.get_logs())

/bin/bash: /azureml-envs/azureml_c7a4628cc1f7736f5280b86fa3e2ddbc/lib/libtinfo.so.5: no version information available (required by /bin/bash)
bash: /azureml-envs/azureml_c7a4628cc1f7736f5280b86fa3e2ddbc/lib/libtinfo.so.5: no version information available (required by bash)
2020-12-04T15:32:59,347753010+00:00 - iot-server/run 
2020-12-04T15:32:59,349819597+00:00 - gunicorn/run 
2020-12-04T15:32:59,349022464+00:00 - rsyslog/run 
2020-12-04T15:32:59,377438670+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_c7a4628cc1f7736f5280b86fa3e2ddbc/lib/libcrypto.so.1.0.0

In [72]:
service.delete()
compute_target.delete()

Current provisioning state of AmlCompute is "Deleting"

