# Automated ML Embedding
In this notebook, we'll use Azure ML to train a model using our graph embedding as an additional feature.

In [None]:
%pip install graphdatascience
%pip install azure-ai-ml
%pip install azureml-mlflow
%pip install mlflow
%pip install mltable

## Azure ML Connection
Let's set up our Azure ML connection.

### Import the required libraries

In [83]:
# Import required libraries
from azure.identity import DefaultAzureCredential
from azure.identity import AzureCliCredential
from azure.ai.ml import automl, Input, MLClient

from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.automl import (
    classification,
    ClassificationPrimaryMetrics,
    ClassificationModels,
)

### Workspace details

To connect to a workspace, we need three identifiers: a subscription ID, a resource group, and a workspace name. We pass these to the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. This tutorial uses [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python). Check the [configuration notebook](../../configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

We first try the default workspace configuration (available out of the box on Compute Instances) or any `config.json` file you have copied into the folder structure.
If no `config.json` is found, you need to enter the `subscription_id`, `resource_group`, and `workspace` manually when creating the `MLClient`.

In [84]:
credential = DefaultAzureCredential()
ml_client = None
try:
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<YOUR_SUBSCRIPTION_ID>"
    resource_group = "<YOUR_RESOURCE_GROUP>"
    workspace = "<YOUR_AZURE_ML_WORKSPACE>"
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)

Found the config file in: /config.json
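
For reference, the workspace configuration file that `MLClient.from_config` looks for is a small JSON document with three keys. The sketch below writes one to a temporary folder. The helper name `write_workspace_config` and the placeholder values are illustrative assumptions, not part of the original notebook.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a workspace config.json into `folder` and return its path (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(folder) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# Placeholders only; replace them with your own workspace details.
demo_dir = tempfile.mkdtemp()
config_path = write_workspace_config(
    demo_dir,
    "<YOUR_SUBSCRIPTION_ID>",
    "<YOUR_RESOURCE_GROUP>",
    "<YOUR_AZURE_ML_WORKSPACE>",
)
print(config_path.read_text())
```

Assuming a recent `azure-ai-ml` release, passing `path=str(config_path)` to `MLClient.from_config` should point the client at this file instead of relying on the default search locations.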


In [None]:
workspace = ml_client.workspaces.get(name=ml_client.workspace_name)

output = {}
output["Workspace"] = ml_client.workspace_name
output["Subscription ID"] = ml_client.subscription_id
output["Resource Group"] = workspace.resource_group
output["Location"] = workspace.location
output

## Upload Training Data MLTable
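Now we're going to prepare the training data for upload to Azure ML. Each data split gets an MLTable definition file that tells Azure ML how to read its CSV.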

In [86]:
# MLTable definition file
with open('./data/training-mltable-folder/MLTable', 'w') as f:
    f.write("""
        paths:
            - file: ./train.csv
        transformations:
            - read_delimited:
                    delimiter: ','
                    encoding: 'ascii'
    """)

# MLTable definition file
with open('./data/test-mltable-folder/MLTable', 'w') as f:
    f.write("""
        paths:
            - file: ./test.csv
        transformations:
            - read_delimited:
                    delimiter: ','
                    encoding: 'ascii'
    """)
    
# MLTable definition file
with open('./data/validation-mltable-folder/MLTable', 'w') as f:
    f.write("""
        paths:
            - file: ./validate.csv
        transformations:
            - read_delimited:
                    delimiter: ','
                    encoding: 'ascii'
    """)

In [87]:
# Create the MLTable input for the training dataset

my_training_data_input = Input(
    type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"
)
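
The three MLTable definition files written earlier differ only in the CSV file they reference, so they can be generated by one helper. The following is a minimal sketch: the function name `write_mltable` and the temporary base folder are assumptions for illustration, and the YAML body mirrors the definitions used in this notebook.

```python
import tempfile
import textwrap
from pathlib import Path


def write_mltable(folder, csv_name, delimiter=",", encoding="ascii"):
    """Create `folder` if needed and write an MLTable file that reads `csv_name` (hypothetical helper)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    definition = textwrap.dedent(f"""\
        paths:
          - file: ./{csv_name}
        transformations:
          - read_delimited:
              delimiter: '{delimiter}'
              encoding: '{encoding}'
        """)
    path = folder / "MLTable"
    path.write_text(definition)
    return path


# A temporary folder stands in for ./data so the sketch can run anywhere.
base = Path(tempfile.mkdtemp())
splits = {"training": "train.csv", "test": "test.csv", "validation": "validate.csv"}
written = {
    name: write_mltable(base / f"{name}-mltable-folder", csv_name)
    for name, csv_name in splits.items()
}
print(written["training"].read_text())
```

Creating the folder inside the helper avoids a `FileNotFoundError` when the target directory does not exist yet. The CSV files themselves still need to be placed next to each `MLTable` file.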

In [88]:
# General job parameters
compute_name = "azureml-compute"
max_trials = 5
exp_name = "claim-prediction-experiment-2"

## Setting up the Classification Job
After uploading the dataset, you can invoke AutoML to find the best ML pipeline to train a model on this dataset.

In [89]:
# Create the AutoML classification job with the related factory-function.

classification_job = automl.classification(
    compute=compute_name,
    experiment_name=exp_name,
    training_data=my_training_data_input,
    target_column_name="target",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True,
    tags={"classification_task": "insurance-fraud"},
)

# Limits are all optional
classification_job.set_limits(
    timeout_minutes=600,
    trial_timeout_minutes=20,
    max_trials=max_trials,
    # max_concurrent_trials = 4,
    # max_cores_per_trial=-1,
    enable_early_termination=True,
)

# Training properties are optional
classification_job.set_training(
    blocked_training_algorithms=["LogisticRegression"],
    enable_onnx_compatible_models=True,
)
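
To see how the limits interact, the sketch below computes a rough upper bound on the time spent in trials. It assumes that trials run in waves of `max_concurrent_trials` (taken as 1 here, which is an assumption of this sketch), that every trial runs until its timeout, and that the overall `timeout_minutes` caps the total. The function name `worst_case_trial_minutes` is hypothetical, and the bound ignores setup, featurization, and explainability steps.

```python
import math


def worst_case_trial_minutes(
    max_trials, trial_timeout_minutes, timeout_minutes, max_concurrent_trials=1
):
    """Upper bound on minutes spent in trials under the assumptions stated above."""
    # Number of sequential waves needed to run all trials.
    waves = math.ceil(max_trials / max_concurrent_trials)
    # The overall experiment timeout caps the total.
    return min(waves * trial_timeout_minutes, timeout_minutes)


# Values taken from the set_limits call in this notebook.
budget = worst_case_trial_minutes(
    max_trials=5, trial_timeout_minutes=20, timeout_minutes=600
)
print(budget)  # 100
```

With the values used here, the trial timeouts rather than the 600-minute experiment timeout determine the bound.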

In [None]:
# Submit the AutoML job
returned_job = ml_client.jobs.create_or_update(
    classification_job
)  # submit the job to the backend

print(f"Created job: {returned_job}")

# Get a URL for the status of the job
# returned_job.services["Studio"].endpoint

## Launching the AutoML Job

This job takes close to 15 minutes.

In [None]:
ml_client.jobs.stream(returned_job.name)

In [None]:
# Get a URL for the status of the job
returned_job.services["Studio"].endpoint

In [94]:
print(returned_job.name)

zen_foot_z2cn39t9vt


# It's optional to proceed to the sections below

## Retrieve the Best Trial (Best Model's trial/run)
Use the `MlflowClient` to access the results (such as models, artifacts, and metrics) of a previously completed AutoML trial.

## Initialize MLflow Client
The models and artifacts produced by AutoML can be accessed through the MLflow interface.
Initialize the MLflow client here and set Azure ML as its tracking backend.

*IMPORTANT*: you need to have the latest MLflow packages installed:

    pip install azureml-mlflow

    pip install mlflow

### Obtain the tracking URI for MLflow

In [None]:
import mlflow

# Obtain the tracking URI from MLClient
MLFLOW_TRACKING_URI = ml_client.workspaces.get(
    name=ml_client.workspace_name
).mlflow_tracking_uri

print(MLFLOW_TRACKING_URI)

In [None]:
# Set the MLFLOW TRACKING URI

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

print("\nCurrent tracking uri: {}".format(mlflow.get_tracking_uri()))

In [40]:
from mlflow.tracking.client import MlflowClient

# Initialize MLflow client
mlflow_client = MlflowClient()

### Get the AutoML parent Job

In [41]:
job_name = returned_job.name

# Example of providing a specific job name/ID
# job_name = "b4e95546-0aa1-448e-9ad6-002e3207b4fc"

# Get the parent run
mlflow_parent_run = mlflow_client.get_run(job_name)

print("Parent Run: ")
print(mlflow_parent_run)

Parent Run: 
<Run: data=<RunData: metrics={'AUC_macro': 0.5490186868098261,
 'AUC_micro': 0.9419168915699154,
 'AUC_weighted': 0.5490186868098261,
 'accuracy': 0.9360261698193714,
 'average_precision_score_macro': 0.5117692757642988,
 'average_precision_score_micro': 0.9295816055434472,
 'average_precision_score_weighted': 0.8904370258383925,
 'balanced_accuracy': 0.5,
 'f1_score_macro': 0.4834762412568109,
 'f1_score_micro': 0.9360261698193714,
 'f1_score_weighted': 0.9050998571251208,
 'log_loss': 0.23782531949216668,
 'matthews_correlation': 0.0,
 'norm_macro_recall': 0.0,
 'precision_score_macro': 0.4680130849096857,
 'precision_score_micro': 0.9360261698193714,
 'precision_score_weighted': 0.8761581666869324,
 'recall_score_macro': 0.5,
 'recall_score_micro': 0.9360261698193714,
 'recall_score_weighted': 0.9360261698193714,
 'weighted_accuracy': 0.9953316698607239}, params={}, tags={'automl_best_child_run_id': 'serene_goat_40s2k8j8yd_4',
 'classification_task': 'sec',
 'fit_time':

In [42]:
# Print parent run tags. 'automl_best_child_run_id' tag should be there.
print(mlflow_parent_run.data.tags)

{'classification_task': 'sec', 'model_explain_run': 'best_run', 'pipeline_id': '', 'score': '', 'predicted_cost': '', 'fit_time': '', 'training_percent': '', 'iteration': '', 'run_preprocessor': '', 'run_algorithm': '', 'automl_best_child_run_id': 'serene_goat_40s2k8j8yd_4', 'model_explain_best_run_child_id': 'serene_goat_40s2k8j8yd_4', 'mlflow.rootRunId': 'serene_goat_40s2k8j8yd', 'mlflow.runName': 'serene_goat_40s2k8j8yd', 'mlflow.user': 'Ezhil Vezhavendhan'}


## Get the AutoML best child run

In [43]:
# Get the best model's child run

best_child_run_id = mlflow_parent_run.data.tags["automl_best_child_run_id"]
print("Found best child run id: ", best_child_run_id)

best_run = mlflow_client.get_run(best_child_run_id)

print("Best child run: ")
print(best_run)

Found best child run id:  serene_goat_40s2k8j8yd_4
Best child run: 
<Run: data=<RunData: metrics={'AUC_macro': 0.5490186868098261,
 'AUC_micro': 0.9419168915699154,
 'AUC_weighted': 0.5490186868098261,
 'accuracy': 0.9360261698193714,
 'average_precision_score_macro': 0.5117692757642988,
 'average_precision_score_micro': 0.9295816055434472,
 'average_precision_score_weighted': 0.8904370258383925,
 'balanced_accuracy': 0.5,
 'f1_score_macro': 0.4834762412568109,
 'f1_score_micro': 0.9360261698193714,
 'f1_score_weighted': 0.9050998571251208,
 'log_loss': 0.23782531949216668,
 'matthews_correlation': 0.0,
 'norm_macro_recall': 0.0,
 'precision_score_macro': 0.4680130849096857,
 'precision_score_micro': 0.9360261698193714,
 'precision_score_weighted': 0.8761581666869324,
 'recall_score_macro': 0.5,
 'recall_score_micro': 0.9360261698193714,
 'recall_score_weighted': 0.9360261698193714,
 'weighted_accuracy': 0.9953316698607239}, params={}, tags={'mlflow.parentRunId': 'serene_goat_40s2k8j8y
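
The lookup of `automl_best_child_run_id` relies on the parent job having finished and recorded that tag. A slightly more defensive variant is sketched below. It works on a plain tags dictionary, the helper name `get_best_child_run_id` is an assumption, and the example tags are copied from the output shown earlier.

```python
def get_best_child_run_id(tags):
    """Return the best child run ID from parent-run tags, or raise a descriptive error."""
    run_id = tags.get("automl_best_child_run_id")
    if not run_id:
        raise LookupError(
            "Tag 'automl_best_child_run_id' is missing or empty; "
            "the AutoML parent job may not have completed yet."
        )
    return run_id


# Tags as printed for the parent run in this notebook (subset).
example_tags = {
    "automl_best_child_run_id": "serene_goat_40s2k8j8yd_4",
    "mlflow.rootRunId": "serene_goat_40s2k8j8yd",
}
best_id = get_best_child_run_id(example_tags)
print(best_id)
```

In the notebook, the same helper could be called with `mlflow_parent_run.data.tags` in place of `example_tags`.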

## Get best model run's metrics

Read the metrics logged for the best child run from `best_run.data.metrics`.

In [44]:
best_run.data.metrics

{'precision_score_macro': 0.4680130849096857,
 'accuracy': 0.9360261698193714,
 'average_precision_score_micro': 0.9295816055434472,
 'balanced_accuracy': 0.5,
 'recall_score_macro': 0.5,
 'recall_score_weighted': 0.9360261698193714,
 'recall_score_micro': 0.9360261698193714,
 'f1_score_macro': 0.4834762412568109,
 'precision_score_micro': 0.9360261698193714,
 'AUC_micro': 0.9419168915699154,
 'matthews_correlation': 0.0,
 'norm_macro_recall': 0.0,
 'log_loss': 0.23782531949216668,
 'AUC_weighted': 0.5490186868098261,
 'precision_score_weighted': 0.8761581666869324,
 'average_precision_score_macro': 0.5117692757642988,
 'f1_score_micro': 0.9360261698193714,
 'f1_score_weighted': 0.9050998571251208,
 'weighted_accuracy': 0.9953316698607239,
 'AUC_macro': 0.5490186868098261,
 'average_precision_score_weighted': 0.8904370258383925}
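
The metrics above combine high accuracy with a `balanced_accuracy` of 0.5 and a `matthews_correlation` of 0.0. That pattern is what a classifier produces when it predicts the same class for every row of an imbalanced dataset. The sketch below reproduces the pattern with hypothetical class counts: the 9,360 negatives and 640 positives are assumed for illustration and are not taken from the dataset, and the function name is also an assumption.

```python
import math


def majority_class_metrics(n_negative, n_positive):
    """Metrics for a classifier that always predicts the negative class."""
    tn, fp, fn, tp = n_negative, 0, n_positive, 0
    total = tn + fp + fn + tp

    accuracy = (tp + tn) / total
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    # Undefined precision (no positive predictions) is treated as 0.
    precision_pos = tp / (tp + fp) if (tp + fp) else 0.0
    precision_neg = tn / (tn + fn) if (tn + fn) else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
        "precision_macro": (precision_pos + precision_neg) / 2,
        "mcc": mcc,
    }


metrics = majority_class_metrics(n_negative=9360, n_positive=640)
print(metrics)
```

Note that `precision_score_macro` in the output above is exactly half of `accuracy`, which matches this scenario. If that is what happened here, a metric such as `AUC_weighted` may be a more informative `primary_metric` than `accuracy` for an imbalanced target.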