# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig
#from azureml.pipeline.core import PipelineData, TrainingOutput
#from azureml.pipeline.core import Pipeline
from azureml.widgets import RunDetails


import pandas as pd
import requests
import logging
import os
import csv

## ML setup + Dataset

### Overview

Retrieve dataset from kaggle (https://www.kaggle.com/ruthgn/wine-quality-data-set-red-white-wine?select=wine-quality-white-and-red.csv). 

Task: classification of wine quality

In [2]:
url_wine_data="https://docs.google.com/spreadsheets/d/e/2PACX-1vQ0_ymTF3299kfZvr0KJq5JMLX7ZK4yRg9RXYTqEsMqm2eeUrABv_4MVQMzrqfw1CsbmcrnTqIluMA0/pub?output=csv"
wine_data = pd.read_csv(url_wine_data)
description_data = "This data set contains records related to red and white variants of the Portuguese Vinho Verde wine."
print(wine_data.head())
print("")
print("INFO")
print(wine_data.info())
print("")
print("DESCRIPTION")
print(wine_data.describe())

    type  fixed acidity  volatile acidity  citric acid  residual sugar  \
0  white            7.0              0.27         0.36            20.7   
1  white            6.3              0.30         0.34             1.6   
2  white            8.1              0.28         0.40             6.9   
3  white            7.2              0.23         0.32             8.5   
4  white            7.2              0.23         0.32             8.5   

   chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  \
0      0.045                 45.0                 170.0   1.0010  3.00   
1      0.049                 14.0                 132.0   0.9940  3.30   
2      0.050                 30.0                  97.0   0.9951  3.26   
3      0.058                 47.0                 186.0   0.9956  3.19   
4      0.058                 47.0                 186.0   0.9956  3.19   

   sulphates  alcohol  quality  
0       0.45      8.8        6  
1       0.49      9.5        6  
2       0.4

### Workspace and Experiment

In [3]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'Capstone_project'

experiment=Experiment(ws, experiment_name)


print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code E9VVAHDM7 to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
Workspace name: quick-starts-ws-164108
Azure region: southcentralus
Subscription id: f9d5a085-54dc-4215-9ba6-dad5d86e60a0
Resource group: aml-quickstarts-164108


### Compute Cluster

In [4]:
# Create compute cluster
# Use vm_size = "STANDARD_DS2_V2" 

cpu_cluster_name = 'auto-ml'
try:
    compute_target = ComputeTarget(workspace=ws,name=cpu_cluster_name)
    print('existing cluster found and will be used')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2', max_nodes=4)
    compute_target = ComputeTarget.create(ws,cpu_cluster_name,compute_config)
    compute_target.wait_for_completion(show_output=True)

InProgress....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


### Dataset

In [5]:
#Getting default ds
data_store = ws.get_default_datastore()
print('get default datastore')

#Registering dataframe as dataset in AzureML
ds = Dataset.Tabular.register_pandas_dataframe(wine_data,data_store,'Wine_Data_Project3',description=description_data)
ds = ds.register(workspace=ws, name='Wine_Data_Project3', description=description_data)

print("dataset registered.")

Method register_pandas_dataframe: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


get default datastore
Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/a3a6b08c-7aa2-4f74-aa73-c68d0c5ac307/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.
dataset registered.


In [6]:
ds.take(5).to_pandas_dataframe()

Unnamed: 0,type,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,white,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,white,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,white,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,white,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,white,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6


## AutoML Configuration

TODO: Explain why you chose the automl settings and cofiguration you used below.

In [7]:
# TODO: Put your automl settings here
automl_settings = {
       "n_cross_validations": 5,
       "primary_metric": 'AUC_weighted',
       "enable_early_stopping": True,
       "experiment_timeout_hours": 1.0,
       "max_concurrent_iterations": 4,
       "verbosity": logging.INFO,
   }

# TODO: Put your automl config here
automl_config = AutoMLConfig(
    compute_target = compute_target,
    task='classification',
    training_data=ds,
    label_column_name='quality',
    **automl_settings)

In [8]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config, show_output=True)

Submitting remote run.
No run_configuration provided, running on auto-ml with default configuration
Running on remote compute: auto-ml


Experiment,Id,Type,Status,Details Page,Docs Page
Capstone_project,AutoML_f205d8b0-a033-4b0a-810b-d5b1b2d129f9,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+---------------------------------+--------------------------------------+
|Size of the smallest class       |Name/Label of the smallest class |Number of samples in the training data|
|5                                |9    

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [9]:
RunDetails(remote_run).show()
remote_run.wait_for_completion()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

{'runId': 'AutoML_f205d8b0-a033-4b0a-810b-d5b1b2d129f9',
 'target': 'auto-ml',
 'status': 'Completed',
 'startTimeUtc': '2021-11-21T07:30:50.738109Z',
 'endTimeUtc': '2021-11-21T07:54:02.592211Z',
 'services': {},
   'message': 'No scores improved over last 20 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'auto-ml',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"e1100ed1-1647-49a5-9663-6d4c7d0585a0\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [10]:
best_automl_run, best_automl_model = remote_run.get_output()
print("Best model: ", best_automl_model)
print("Best estimator: ", best_automl_model.steps[-1])

Package:azureml-automl-runtime, training version:1.35.1, current version:1.34.0
Package:azureml-core, training version:1.35.0.post1, current version:1.34.0
Package:azureml-dataprep, training version:2.23.2, current version:2.22.2
Package:azureml-dataprep-rslex, training version:1.21.2, current version:1.20.1
Package:azureml-dataset-runtime, training version:1.35.0, current version:1.34.0
Package:azureml-defaults, training version:1.35.0, current version:1.34.0
Package:azureml-interpret, training version:1.35.0, current version:1.34.0
Package:azureml-mlflow, training version:1.35.0, current version:1.34.0
Package:azureml-pipeline-core, training version:1.35.0, current version:1.34.0
Package:azureml-responsibleai, training version:1.35.0, current version:1.34.0
Package:azureml-telemetry, training version:1.35.0, current version:1.34.0
Package:azureml-train-automl-client, training version:1.35.0, current version:1.34.0
Package:azureml-train-automl-runtime, training version:1.35.1, current

Best model:  Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
)), ('logisticregression', LogisticRegression(C=2222.996482526191, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=None, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False))], verbose=False))], meta_learner=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100, multi_class='auto', n_jobs=None, penalty='l2', random_state=None, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False), training_cv_fol

In [17]:
model_name = best_automl_run.properties['model_name']
model_name

'AutoMLf205d8b0a37'

In [37]:
#TODO: Save the best model
best_model = best_automl_run.register_model(model_name='automl-model', model_path = '/outputs/model.pkl')

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [12]:
# deployment config
from azureml.core.webservice import Webservice, AciWebservice
from azureml.core.model import Model
my_deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=True, enable_app_insights=True)

In [27]:
best_automl_run.download_file('outputs/scoring_file_v_1_0_0.py','scoreScript.py')

In [38]:
best_automl_run.download_file('outputs/model.pkl','model.pkl')

In [45]:
# inference config
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
#environment
#my_env = Environment(name="AzureML-AutoML")
my_env= best_automl_run.get_environment()

#script_file_name='scoring_easy.py'
my_inference_config = InferenceConfig(environment=my_env, source_directory='./', entry_script="scoreScript.py")

# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py


In [46]:
service = Model.deploy(workspace=ws,  
                       name="automldeployment-4", 
                       models=[best_model], 
                       inference_config=my_inference_config, 
                       deployment_config=my_deployment_config)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-11-21 08:54:53+00:00 Creating Container Registry if not exists.
2021-11-21 08:54:53+00:00 Registering the environment.
2021-11-21 08:54:53+00:00 Use the existing image.
2021-11-21 08:54:53+00:00 Generating deployment configuration.
2021-11-21 08:54:54+00:00 Submitting deployment to compute.
2021-11-21 08:54:58+00:00 Checking the status of deployment automldeployment-4..
2021-11-21 08:58:25+00:00 Checking the status of inference endpoint automldeployment-4.
Failed


ERROR:azureml.core.webservice.webservice:Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 139e4aae-2a0e-4cea-a397-ad362e1f88e1
More information can be found using '.get_logs()'
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Error in entry script, FileNotFoundError: [Errno 2] No such file or directory: '/var/azureml-app/azureml-models/automl-model/2/model.pkl', please run print(service.get_logs()) to get details.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Error in entry script, FileNotFoundError: [Errno 2] No such file or directory: '/var/azureml-app/azureml-models/automl-model/2/model.pkl', please run print(service.get_logs()) to get details."
    }
  ]
}



WebserviceException: WebserviceException:
	Message: Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 139e4aae-2a0e-4cea-a397-ad362e1f88e1
More information can be found using '.get_logs()'
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Error in entry script, FileNotFoundError: [Errno 2] No such file or directory: '/var/azureml-app/azureml-models/automl-model/2/model.pkl', please run print(service.get_logs()) to get details.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Error in entry script, FileNotFoundError: [Errno 2] No such file or directory: '/var/azureml-app/azureml-models/automl-model/2/model.pkl', please run print(service.get_logs()) to get details."
    }
  ]
}
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Service deployment polling reached non-successful terminal state, current service state: Failed\nOperation ID: 139e4aae-2a0e-4cea-a397-ad362e1f88e1\nMore information can be found using '.get_logs()'\nError:\n{\n  \"code\": \"AciDeploymentFailed\",\n  \"statusCode\": 400,\n  \"message\": \"Aci Deployment failed with exception: Error in entry script, FileNotFoundError: [Errno 2] No such file or directory: '/var/azureml-app/azureml-models/automl-model/2/model.pkl', please run print(service.get_logs()) to get details.\",\n  \"details\": [\n    {\n      \"code\": \"CrashLoopBackOff\",\n      \"message\": \"Error in entry script, FileNotFoundError: [Errno 2] No such file or directory: '/var/azureml-app/azureml-models/automl-model/2/model.pkl', please run print(service.get_logs()) to get details.\"\n    }\n  ]\n}"
    }
}

In [48]:
service.get_logs()

'2021-11-21T09:02:49,503576600+00:00 - iot-server/run \n2021-11-21T09:02:49,506654700+00:00 - rsyslog/run \n2021-11-21T09:02:49,518116800+00:00 - gunicorn/run \nDynamic Python package installation is disabled.\nStarting HTTP server\n2021-11-21T09:02:49,613083200+00:00 - nginx/run \nrsyslogd: /azureml-envs/azureml_705720c76ff57b57c77d577152dabb18/lib/libuuid.so.1: no version information available (required by rsyslogd)\nEdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...\n2021-11-21T09:02:49,895561900+00:00 - iot-server/finish 1 0\n2021-11-21T09:02:49,901391700+00:00 - Exit code 1 is normal. Not restarting iot-server.\nStarting gunicorn 20.1.0\nListening at: http://127.0.0.1:31311 (403)\nUsing worker: sync\nworker timeout is set to 300\nBooting worker with pid: 431\nSPARK_HOME not set. Skipping PySpark Initialization.\nGenerating new fontManager, this may take some time...\nInitializing logger\n2021-11-21 09:02:52,718 | root | INFO | Starting up app insights clien

In [None]:
print(service.state)

TODO: In the cell below, send a request to the web service you deployed to test it.

In [None]:
import requests
import json


# get the keys:
primary_key, secondary_key = service.get_keys()

scoring_uri = service.scoring_uri
# If the service is authenticated, set the key or token
key = primary_key

# Two sets of data to score, so we get two results back
data = {"data":
        [
          {
            "type": white,
            "fixed acidity": 7,
            "volatile acidity": 0.27,
            "citric acid": 0.36,
            "residual sugar": 20.7,
            "chlorides": 0.045,
            "free sulfur dioxide": 45,
            "total sulfur dioxide": 170,
            "desnsity": 1.001,
            "pH": 3,
            "sulphates": 0.45,
            "alcohol": 8.8,
          },
          {
            "type": red,
            "fixed acidity": 6,
            "volatile acidity": 0.31,
            "citric acid": 0.47,
            "residual sugar": 3.6,
            "chlorides": 0.067,
            "free sulfur dioxide": 18,
            "total sulfur dioxide": 42,
            "desnsity": 0.99549,
            "pH": 3.39,
            "sulphates": 0.66,
            "alcohol": 11,
          },
      ]
    }
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())




TODO: In the cell below, print the logs of the web service and delete the service

In [None]:
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
