# Automated ML

Import Dependencies. In the cell below, I import all the dependencies that I will need to complete the project.

In [None]:


from azureml.core import Workspace, Experiment

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.core.dataset import Dataset

from azureml.train.automl import AutoMLConfig

from azureml.widgets import RunDetails

from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig, Model

import json
import requests


## Dataset

### Overview

Employee turnover impacts all organizations. The IBM HR Attrition Case Study aims to identify the factors contributing to employee turnover and forecast those individuals who are more likely to depart from the company.

With a dataset comprising 35 columns, I intend to utilize Microsoft Azure's AutoML feature to develop and evaluate various models for predicting employee turnover. Subsequently, I will deploy the most effective model as a web service for further interaction.

Get data. In the cell below, I write code to access the data I will be using in this project.

In [None]:
ws = Workspace.from_config()

experiment_name = 'AutoML'

experiment=Experiment(ws, experiment_name)

In [None]:
key = "Employee Attrition"
description_text = "IBM HR Analytics Employee Attrition & Performance"

# Reuse the dataset if it is already registered in the workspace.
if key in ws.datasets:
    dataset = ws.datasets[key]
else:
    # Otherwise load the CSV from GitHub and register it under `key`.
    url = "https://raw.githubusercontent.com/eljandoubi/Azure-Machine-Learning-Engineer/main/attrition-dataset.csv"
    dataset = Dataset.Tabular.from_delimited_files(url)
    dataset = dataset.register(workspace=ws, name=key, description=description_text)


df = dataset.to_pandas_dataframe()
df.describe()

In [None]:
cluster_name = "automl-vs-hpyer"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target, using it!')
    
except ComputeTargetException:
    print('Creating a new compute target!')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',min_nodes=1, max_nodes=8)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
    
cpu_cluster.wait_for_completion(show_output=True)
 
cpu_cluster.get_status().serialize()

## AutoML Configuration

This task is a binary classification problem: the target variable is `Attrition`, with possible outcomes `True` or `False`. The experiment times out after 30 minutes and runs at most 8 iterations concurrently, which matches the cluster's maximum of 8 nodes. The primary evaluation metric is `AUC_weighted`.
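To make the primary metric concrete, here is a minimal pure-Python sketch of `AUC_weighted` for a binary target: a one-vs-rest AUC is computed for each class, and the two values are averaged with weights proportional to the number of true instances of each class. The labels and scores are made-up illustration values, and the helper names `binary_auc` and `auc_weighted` are hypothetical; they are not part of the Azure ML SDK, which computes this metric internally.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_weighted(labels, positive_scores):
    """Support-weighted mean of the one-vs-rest AUC of each class (binary case)."""
    n_total = len(labels)
    n_pos = sum(1 for y in labels if y)
    n_neg = n_total - n_pos
    # Class True is scored with P(True); class False with flipped labels and 1 - P(True).
    auc_true = binary_auc(labels, positive_scores)
    auc_false = binary_auc([not y for y in labels], [1.0 - s for s in positive_scores])
    return (n_pos * auc_true + n_neg * auc_false) / n_total


# Made-up labels (True = employee left) and predicted probabilities of attrition.
example_labels = [True, False, False, True, False, False]
example_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

example_auc = auc_weighted(example_labels, example_scores)
print(round(example_auc, 3))
```

In the binary case the two one-vs-rest AUCs coincide, so the weighted average equals the ordinary AUC; the weighting only changes the result when there are more than two classes.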

In [None]:
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 8,
    "primary_metric": 'AUC_weighted'
}

# AutoML configuration
automl_config = AutoMLConfig(compute_target=cpu_cluster,
                             task="classification",
                             training_data=dataset,
                             label_column_name="Attrition",
                             path='./automl',
                             enable_early_stopping=True,
                             featurization='auto',
                             debug_log="automl_errors.log",
                             **automl_settings
                             )

In [None]:
autoML_run = experiment.submit(automl_config)

## Run Details

In the cells below, I wait for the AutoML run to complete and use the `RunDetails` widget to show its iterations.

In [None]:
autoML_run.wait_for_completion(show_output=True)

In [None]:
RunDetails(autoML_run).show()

## Best Model

In the cells below, I retrieve the best model from the AutoML run, display its properties and metrics, and register it.



In [None]:
best_run, fitted_model = autoML_run.get_output()
best_run_metrics = best_run.get_metrics()

In [None]:
fitted_model

In [None]:
best_run_metrics

In [None]:
print('Best Run Id: ', best_run.id)
print('AUC_weighted of Best Run is:', best_run_metrics['AUC_weighted'])

In [None]:
automodel = best_run.register_model(model_name='automl_model', 
                                    model_path='outputs/model.pkl',
                                    tags={'Method':'AutoML'},
                                    properties={'AUC_weighted': best_run_metrics['AUC_weighted']})

automodel

## Model Deployment

In the cells below, I download the scoring script and environment file generated by AutoML, create an inference config, and deploy the registered model as a web service on Azure Container Instances.
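The entry script passed to `InferenceConfig` must define two functions: `init()`, called once when the service container starts, and `run()`, called for every request. The sketch below illustrates that contract only; it is not the AutoML-generated `score.py`. `StubModel` and its prediction rule are hypothetical stand-ins for the registered model, and the `OverTime` column and the `{"result": ...}` response shape are assumptions made for illustration.

```python
import json

model = None  # populated once by init()


class StubModel:
    """Hypothetical stand-in for the deserialized AutoML model."""

    def predict(self, records):
        # Made-up rule for illustration only: flag employees who work overtime.
        return [record.get("OverTime") == "Yes" for record in records]


def init():
    """Runs once at container start; a real script would load model.pkl here."""
    global model
    model = StubModel()


def run(raw_data):
    """Runs per request: parse the JSON body, predict, return a JSON string."""
    try:
        records = json.loads(raw_data)["data"]
        return json.dumps({"result": model.predict(records)})
    except Exception as exc:  # report the failure to the caller instead of crashing
        return json.dumps({"error": str(exc)})


# Simulate what the service does: one init() call, then one request.
init()
response = run(json.dumps({"data": [{"OverTime": "Yes"}, {"OverTime": "No"}]}))
print(response)
```

Catching the exception inside `run()` means a malformed request comes back as an error message in the response body rather than as an unhandled failure in the container.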

In [None]:
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'score.py')
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'env.yml')

In [None]:
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               description='Predict employee attrition')

In [None]:
inference_config = InferenceConfig(entry_script="score.py", environment=best_run.get_environment())

service = Model.deploy(workspace=ws, 
                       name='automl-webservice', 
                       models=[automodel], 
                       inference_config=inference_config, 
                       deployment_config=aciconfig)

In [None]:
service.wait_for_deployment(show_output=True)

In [None]:
service.update(enable_app_insights=True)

In [None]:
print("Service State: ",service.state)
scoring_uri = service.scoring_uri
scoring_uri

In the cell below, I send a request to the web service I deployed to test it.
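The request body is a JSON object whose `data` key holds one record per row, with the label column removed. Here is a minimal standard-library sketch of that payload. The rows are made up, `build_payload` is a hypothetical helper, and the column names other than `Attrition` are assumptions about the dataset.

```python
import json


def build_payload(records, label_column="Attrition"):
    """Drop the label from each record and wrap the rows under a 'data' key."""
    features = [
        {name: value for name, value in record.items() if name != label_column}
        for record in records
    ]
    return json.dumps({"data": features})


# Made-up rows for illustration; the notebook samples real rows from the dataset.
sample_records = [
    {"Age": 41, "OverTime": "Yes", "Attrition": True},
    {"Age": 29, "OverTime": "No", "Attrition": False},
]

payload = build_payload(sample_records)
print(payload)
```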

In [None]:
test_df = df.sample(4)
test_df.pop('Attrition')

input_data = json.dumps({'data': test_df.to_dict(orient='records')})
input_data

In [None]:
headers = {'Content-Type': 'application/json'}

resp = requests.post(scoring_uri, input_data, headers=headers)

resp.text

In the cells below, I print the logs of the web service and then delete the service.

In [None]:
print(service.get_logs())

In [None]:
service.delete()