# Azure AutoML Training on Subset 2
### Supporting: *Leveraging AutoML for Advanced Network Traffic Analysis and Intrusion Detection by Enhancing Security with a Multi-Feature IDS Dataset*

**Author:** Chibuike S Abana, Doctoral Candidate, George Washington University  
**Date of Experiment:** March 22, 2025  

**Purpose:**  
This notebook documents the final experiment using Microsoft Azure Machine Learning AutoML on `training_subset2.csv`, part of the enhanced IDS dataset. Azure AutoML automates the end-to-end model development process, including data preprocessing, algorithm selection, and hyperparameter optimization. The experiment aims to evaluate the platform’s effectiveness in building high-performance intrusion detection models and contributes to the comparative framework analysis in the research.

**License:** MIT License  
This code and related materials are made available under the MIT License. You may use, modify, and distribute them with proper attribution. Refer to the `LICENSE` section for complete terms.

In [1]:
import logging

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import azureml.core
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset


In [None]:
ws = Workspace.from_config()

# choosing a name for the experiment
experiment_name = "subset2_final_azure_praxis_03_22"

experiment = Experiment(ws, experiment_name)

## Loading Dataset

In [3]:
# training dataset (subset 2)

from azureml.core import Workspace, Dataset, Datastore

subscription_id = '7170b4e7-9dbe-4589-ac0c-eee9334e27f0'
resource_group = 'G34918111-rg'
workspace_name = 'azure_myphd_praxis'

workspace = Workspace(subscription_id, resource_group, workspace_name)

datastore = Datastore.get(workspace, "workspaceblobstore")
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, 'UI/2025-03-22/training_subset2.csv'))

training_data = dataset
# convert to a pandas DataFrame for inspection
mytraining_data = dataset.to_pandas_dataframe()

{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe'}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe', 'activityApp': 'TabularDataset'}


In [4]:
mytraining_data.head()

Unnamed: 0,Protocol,Flow Duration,Total Fwd Packet,Total Bwd packets,Total Length of Fwd Packet,Total Length of Bwd Packet,Fwd Packet Length Max,Fwd Packet Length Min,Fwd Packet Length Mean,Fwd Packet Length Std,...,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min,Total TCP Flow Time,Label
0,6,4838,5,5,310,935,310,0,62.0,138.636215,...,0.0,0.0,0,0,0.0,0.0,0,0,4838,DDoS-HOIC
1,6,117880,5,5,302,935,302,0,60.4,135.058506,...,0.0,0.0,0,0,0.0,0.0,0,0,117880,DoS Hulk
2,6,959813,8,5,1164,935,291,0,145.5,155.546043,...,0.0,0.0,0,0,0.0,0.0,0,0,959813,DoS Hulk
3,6,152344,5,5,354,935,354,0,70.8,158.313613,...,0.0,0.0,0,0,0.0,0.0,0,0,152344,DoS Hulk
4,6,11674,4,1,77,0,46,0,19.25,23.056091,...,0.0,0.0,0,0,0.0,0.0,0,0,172114889,Benign


In [5]:
# Loading testing dataset
datastore = Datastore.get(workspace, "workspaceblobstore")
dataset2 = Dataset.Tabular.from_delimited_files(path=(datastore, 'UI/2025-03-22/testing.csv'))
testing_data = dataset2
mytesting_data = dataset2.to_pandas_dataframe()
mytesting_data.head()

{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe'}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe', 'activityApp': 'TabularDataset'}


Unnamed: 0,Protocol,Flow Duration,Total Fwd Packet,Total Bwd packets,Total Length of Fwd Packet,Total Length of Bwd Packet,Fwd Packet Length Max,Fwd Packet Length Min,Fwd Packet Length Mean,Fwd Packet Length Std,...,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min,Total TCP Flow Time,Label
0,17,11284,1,1,33,94,33,33,33.0,0.0,...,0.0,0.0,0,0,0.0,0.0,0,0,0,Benign
1,6,24209,4,2,77,31,46,0,19.25,23.056091,...,0.0,0.0,0,0,0.0,0.0,0,0,170385666,Benign
2,6,218096,3,3,0,0,0,0,0.0,0.0,...,0.0,0.0,0,0,0.0,0.0,0,0,218096,Benign
3,6,1033,5,5,161,488,161,0,32.2,72.001389,...,0.0,0.0,0,0,0.0,0.0,0,0,1033,Benign
4,6,4238522,11,7,1148,1581,677,0,104.363636,202.294475,...,0.0,0.0,0,0,0.0,0.0,0,0,4238522,Benign


In [6]:
mytesting_data['Label'].value_counts()

Benign                      5002433
DoS Hulk                     216379
DDoS-HOIC                    129875
DDoS-LOIC-HTTP                34720
Botnet Ares                   17150
SSH-BruteForce                11303
DoS GoldenEye                  2707
DoS Slowloris                  1019
DDoS-LOIC-UDP                   303
Web Attack - Brute Force         16
Web Attack - XSS                 14
Name: Label, dtype: int64

In [7]:
mytraining_data['Label'].value_counts()

Benign                      750000
DoS Hulk                    721264
DDoS-HOIC                   432917
DDoS-LOIC-HTTP              115731
Botnet Ares                  57169
SSH-BruteForce               37679
DoS GoldenEye                25000
DDoS-LOIC-UDP                25000
DoS Slowloris                24994
Web Attack - XSS             10447
Web Attack - Brute Force     10406
Name: Label, dtype: int64
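
The training counts above can be turned into class shares and an imbalance ratio. The following is a minimal sketch, assuming the counts are copied by hand from the `value_counts()` output rather than read from `mytraining_data`; the variable names are illustrative only.

```python
# Training-label counts copied from the value_counts() output above.
train_counts = {
    "Benign": 750000,
    "DoS Hulk": 721264,
    "DDoS-HOIC": 432917,
    "DDoS-LOIC-HTTP": 115731,
    "Botnet Ares": 57169,
    "SSH-BruteForce": 37679,
    "DoS GoldenEye": 25000,
    "DDoS-LOIC-UDP": 25000,
    "DoS Slowloris": 24994,
    "Web Attack - XSS": 10447,
    "Web Attack - Brute Force": 10406,
}

total = sum(train_counts.values())
largest = max(train_counts, key=train_counts.get)
smallest = min(train_counts, key=train_counts.get)

# Ratio of the most frequent class to the least frequent class.
imbalance_ratio = train_counts[largest] / train_counts[smallest]

# Print each class with its share of the training rows, largest first.
for label, n in sorted(train_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<26} {n:>8} {100 * n / total:6.2f}%")

print(f"total rows: {total}")
print(f"{largest} vs {smallest}: {imbalance_ratio:.1f} to 1")
```

With the real DataFrame, `mytraining_data['Label'].value_counts(normalize=True)` gives the same shares directly.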

In [8]:
# Loading validation dataset
datastore = Datastore.get(workspace, "workspaceblobstore")
dataset3 = Dataset.Tabular.from_delimited_files(path=(datastore, 'UI/2025-03-22/validation.csv'))
validation_data = dataset3
myvalidation_data = dataset3.to_pandas_dataframe()
myvalidation_data.head()

{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe'}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe', 'activityApp': 'TabularDataset'}


Unnamed: 0,Protocol,Flow Duration,Total Fwd Packet,Total Bwd packets,Total Length of Fwd Packet,Total Length of Bwd Packet,Fwd Packet Length Max,Fwd Packet Length Min,Fwd Packet Length Mean,Fwd Packet Length Std,...,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min,Total TCP Flow Time,Label
0,6,117060497,35,29,2552,6419,405,0,72.914286,83.162977,...,374939.0,420152.949886,672032,77846,58116350.0,232072.445585,58280453,57952253,117060497,Benign
1,6,117268395,16,18,1005,5592,362,0,62.8125,101.487746,...,327000.0,241901.229844,498050,155950,58229200.0,370398.088335,58491106,57967284,117268395,Benign
2,6,61394552,15,14,667,3391,333,0,44.466667,99.398093,...,211008.666667,346886.613892,919088,69357,10009840.0,37887.43089,10031069,9936831,61394552,Benign
3,6,5117283,9,8,1308,2364,436,0,145.333333,218.0,...,0.0,0.0,0,0,0.0,0.0,0,0,5117283,Benign
4,6,1700526,8,7,1144,1581,677,0,143.0,227.969923,...,0.0,0.0,0,0,0.0,0.0,0,0,1700526,Benign


In [9]:
myvalidation_data['Label'].value_counts()

Benign                      3334954
DoS Hulk                     144253
DDoS-HOIC                     86584
DDoS-LOIC-HTTP                23146
Botnet Ares                   11434
SSH-BruteForce                 7536
DoS GoldenEye                  1805
DoS Slowloris                   679
DDoS-LOIC-UDP                   202
Web Attack - Brute Force         10
Web Attack - XSS                  9
Name: Label, dtype: int64

# Creating the compute target for model training

In [10]:
from azureml.core.compute import AmlCompute, ComputeTarget

cpu_compute_target = "mycpu-cluster"

# Checking if the compute target already exists in the workspace
cts = ws.compute_targets
if cpu_compute_target in cts and cts[cpu_compute_target].type == 'AmlCompute':
    print(f"Compute target '{cpu_compute_target}' exists.")
    compute_target = cts[cpu_compute_target]
    print(f"Details: {compute_target}")
else:
    print(f"Compute target '{cpu_compute_target}' does not exist.")
    print("Creating a new CPU compute target...")
    provisioning_config = AmlCompute.provisioning_configuration(vm_size="Standard_E8s_v3", min_nodes=0, max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_compute_target, provisioning_config)

    print('Checking cluster status...')
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    print(f"Compute target '{cpu_compute_target}' created successfully.")

# Printing compute target details
print(f"Compute target '{cpu_compute_target}' details:")
print(compute_target)



Compute target 'mycpu-cluster' exists.
Details: AmlCompute(workspace=Workspace.create(name='azure_myphd_praxis', subscription_id='7170b4e7-9dbe-4589-ac0c-eee9334e27f0', resource_group='g34918111-rg'), name=mycpu-cluster, id=/subscriptions/7170b4e7-9dbe-4589-ac0c-eee9334e27f0/resourceGroups/g34918111-rg/providers/Microsoft.MachineLearningServices/workspaces/azure_myphd_praxis/computes/mycpu-cluster, type=AmlCompute, provisioning_state=Succeeded, location=eastus2, tags={})
Compute target 'mycpu-cluster' details:
AmlCompute(workspace=Workspace.create(name='azure_myphd_praxis', subscription_id='7170b4e7-9dbe-4589-ac0c-eee9334e27f0', resource_group='g34918111-rg'), name=mycpu-cluster, id=/subscriptions/7170b4e7-9dbe-4589-ac0c-eee9334e27f0/resourceGroups/g34918111-rg/providers/Microsoft.MachineLearningServices/workspaces/azure_myphd_praxis/computes/mycpu-cluster, type=AmlCompute, provisioning_state=Succeeded, location=eastus2, tags={})


## Creating the AutoML configuration

In [11]:
# setting up the AutoML configuration (maximum training time: 5 hours)
target_label ='Label'
automl_settings = {
    "primary_metric":'AUC_weighted',
    "enable_early_stopping": True,
    "verbosity": logging.INFO,
    "enable_stack_ensemble":True,
    "enable_voting_ensemble":True,
    "model_explainability": True,
    "enable_code_generation": True,
    "experiment_timeout_hours": 5,  

}
automl_config = AutoMLConfig(
    task='classification',
    compute_target=cpu_compute_target,
    debug_log='automl_praxis_errors.log',
    training_data=training_data,
    validation_data = validation_data,
    test_data=testing_data,
    label_column_name= target_label,
    featurization='auto',
    **automl_settings
)


In [None]:
# running the automl experiment
automl_run = experiment.submit(automl_config, show_output=True)
automl_run.wait_for_completion(show_output=True) 

Submitting remote run.
No run_configuration provided, running on mycpu-cluster with default configuration
Running on remote compute: mycpu-cluster


Experiment,Id,Type,Status,Details Page,Docs Page
subset2_final_azure_praxis_03_22,AutoML_aea54d4d-b1d9-42a5-8b13-7a096210e8bf,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+------------------------------+--------------------------------+--------------------------------------+
|Size of the smallest c

# Reloading the AutoML run after disconnection to confirm its status

In [1]:
import logging

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import azureml.core
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset


In [2]:
ws = Workspace.from_config()

# choosing a name for your experiment
experiment_name = "subset2_final_azure_praxis_03_22"

experiment = Experiment(ws, experiment_name)

In [3]:


from azureml.train.automl.run import AutoMLRun


run_id = 'AutoML_aea54d4d-b1d9-42a5-8b13-7a096210e8bf' 
 


In [4]:
automl_run = AutoMLRun(experiment=experiment, run_id=run_id)
automl_run.wait_for_completion(show_output=True) 

Experiment,Id,Type,Status,Details Page,Docs Page
subset2_final_azure_praxis_03_22,AutoML_aea54d4d-b1d9-42a5-8b13-7a096210e8bf,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation




********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+------------------------------+--------------------------------+--------------------------------------+
|Size of the smallest class    |Name/Label of the smallest class|Number of samples in the training data|
|10406                         |Web Attack - Brute Force        |2210607                               |
+------------------------------+--------------------------------+--------------------------------------+

********************************************************************

{'runId': 'AutoML_aea54d4d-b1d9-42a5-8b13-7a096210e8bf',
 'target': 'mycpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2025-03-23T04:43:10.010286Z',
 'endTimeUtc': '2025-03-23T14:15:19.932257Z',
 'services': {},
   'message': 'Experiment timeout reached, hence experiment stopped. Current experiment timeout: 5 hour(s) 0 minute(s)'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'mycpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"683d0e6b-9da9-450d-888e-90b69c2de007\\"}, \\"validation_data\\": {\\"datasetId\\": \\"558c064b-2660-4a08-8ed3-0317c08fdc33\\"}, \\"test_data\\": {\\"datasetId\\": \\"807d7148-ac48-4cc2-b29b-e2929d5e00ad\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_tas
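
The wall-clock duration of the parent run can be read off the `startTimeUtc` and `endTimeUtc` fields in the details above. A minimal sketch, assuming the two timestamps are copied verbatim from that output; it says nothing about how the time was split between featurization, model selection, and explanation.

```python
from datetime import datetime

# Timestamp layout used by the run details (UTC, microsecond precision).
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

start = datetime.strptime("2025-03-23T04:43:10.010286Z", FMT)
end = datetime.strptime("2025-03-23T14:15:19.932257Z", FMT)

elapsed = end - start

# Break the elapsed time into whole hours, minutes, and seconds.
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes, seconds = divmod(remainder, 60)

print(f"Parent run wall-clock time: {hours}h {minutes}m {seconds}s")
```

This comes to 9h 32m 9s for the whole parent run, which is an end-to-end figure and not directly comparable to the 5-hour `experiment_timeout_hours` setting quoted in the run message.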

# Getting the Result Details

In [5]:

# Retrieving the best model 
best_run, fitted_model = automl_run.get_output()

test_run = next(best_run.get_children(type='automl.model_test'))
test_run.wait_for_completion(show_output=True, wait_post_processing=True)

RunId: c07a17e3-616f-4bbe-8c86-68e53d4700fb
Web View: https://ml.azure.com/runs/c07a17e3-616f-4bbe-8c86-68e53d4700fb?wsid=/subscriptions/7170b4e7-9dbe-4589-ac0c-eee9334e27f0/resourcegroups/g34918111-rg/workspaces/azure_myphd_praxis&tid=d689239e-c492-40c6-b391-2c5951d31d14

Execution Summary
RunId: c07a17e3-616f-4bbe-8c86-68e53d4700fb
Web View: https://ml.azure.com/runs/c07a17e3-616f-4bbe-8c86-68e53d4700fb?wsid=/subscriptions/7170b4e7-9dbe-4589-ac0c-eee9334e27f0/resourcegroups/g34918111-rg/workspaces/azure_myphd_praxis&tid=d689239e-c492-40c6-b391-2c5951d31d14



{'runId': 'c07a17e3-616f-4bbe-8c86-68e53d4700fb',
 'target': 'mycpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2025-03-23T14:18:49.070707Z',
 'endTimeUtc': '2025-03-23T14:44:54.769028Z',
 'services': {},
 'properties': {'azureml.runsource': 'automl',
  'parentRunId': 'AutoML_aea54d4d-b1d9-42a5-8b13-7a096210e8bf_20',
  '_azureml.ComputeTargetType': 'amlctrain',
  '_azureml.ClusterName': 'mycpu-cluster',
  'ContentSnapshotId': 'cc63f103-3577-41e3-9021-7c450455dba1',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'inference_results': "['ExperimentRun/dcid.c07a17e3-616f-4bbe-8c86-68e53d4700fb/predictions/predictions.csv']"},
 'inputDatasets': [{'dataset': {'id': '807d7148-ac48-4cc2-b29b-e2929d5e00ad'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'test_data', 'mechanism': 'Direct'}}],
 'outputDatasets': [{'identifier': {'savedId': 'e815e3ee-557c-4dbd-86c2-75a06b0fb8ec'},
   'outputType': 'Reference',
 

In [6]:
# Get model metrics
automl_run_metrics = automl_run.get_metrics()
for name, value in automl_run_metrics.items():
    print(f"{name}: {value}")

experiment_status_description: ['Gathering dataset statistics.', 'Generating features for the dataset.', 'Beginning to fit featurizers and featurize the dataset.', 'Completed fit featurizers and featurizing the dataset.', 'Performing class balancing sweeping', 'Beginning model selection.', 'Best run model explanations started', 'Model explanations data setup completed', 'Choosing LightGBM as the surrogate model for explanations', 'Computation of engineered features started', 'Computation of engineered features completed', 'Computation of raw features started', 'Computation of raw features completed', 'Best run model explanations completed']
experiment_status: ['DatasetEvaluation', 'FeaturesGeneration', 'DatasetFeaturization', 'DatasetFeaturizationCompleted', 'DatasetBalancing', 'ModelSelection', 'BestRunExplainModel', 'ModelExplanationDataSetSetup', 'PickSurrogateModel', 'EngineeredFeatureExplanations', 'EngineeredFeatureExplanations', 'RawFeaturesExplanations', 'RawFeaturesExplanation

In [7]:
# Get test metrics
test_run_metrics = test_run.get_metrics()
for name, value in test_run_metrics.items():
    print(f"{name}: {value}")

precision_score_weighted: 0.9999970459817586
average_precision_score_micro: 0.9999999873269212
precision_score_macro: 0.9999337304377768
precision_score_micro: 0.9999970457460682
recall_score_micro: 0.9999970457460682
weighted_accuracy: 0.9999974079087451
accuracy: 0.9999970457460682
log_loss: 0.021115712854875172
recall_score_weighted: 0.9999970457460682
AUC_weighted: 0.9999999903836737
AUC_micro: 0.9999999986676851
norm_macro_recall: 0.9997053338457871
matthews_correlation: 0.9999795750657372
average_precision_score_macro: 0.9999990463057606
f1_score_macro: 0.9998327246084937
AUC_macro: 0.9999999988500181
recall_score_macro: 0.9997321216779883
average_precision_score_weighted: 0.9999999969908266
f1_score_weighted: 0.9999970453953799
balanced_accuracy: 0.9997321216779883
f1_score_micro: 0.9999970457460682
accuracy_table: aml://artifactId/ExperimentRun/dcid.c07a17e3-616f-4bbe-8c86-68e53d4700fb/accuracy_table
confusion_matrix: aml://artifactId/ExperimentRun/dcid.c07a17e3-616f-4bbe-8c86-
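
Two of the reported test metrics are tied together by a simple formula. As far as I recall, the Azure ML documentation defines `norm_macro_recall` as `(recall_score_macro - R) / (1 - R)`, with `R = 1 / C` for a `C`-class problem; treat that definition as an assumption to confirm against the official metric reference. The sketch below applies it to the values printed above, taking `C = 11` from the label counts shown earlier.

```python
# Values copied from the test metrics printed above.
recall_score_macro = 0.9997321216779883
reported_norm_macro_recall = 0.9997053338457871
balanced_accuracy = 0.9997321216779883

# Number of classes, taken from the label counts in this notebook.
num_classes = 11

# Expected macro recall of a random classifier (assumed definition: 1 / C).
random_recall = 1 / num_classes

# Rescale so that random guessing maps to 0 and a perfect model maps to 1.
norm_macro_recall = (recall_score_macro - random_recall) / (1 - random_recall)

print(f"computed norm_macro_recall: {norm_macro_recall:.13f}")
print(f"reported norm_macro_recall: {reported_norm_macro_recall:.13f}")
print(f"balanced_accuracy equals macro recall: {balanced_accuracy == recall_score_macro}")
```

The macro-averaged scores weight all 11 classes equally, so they are the ones to watch for the rare web-attack classes; the weighted and micro scores are dominated by the Benign rows.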

In [8]:
# Get test predictions as a Dataset
test_run_details = test_run.get_details()
dataset_id = test_run_details['outputDatasets'][0]['identifier']['savedId']
test_run_predictions = Dataset.get_by_id(ws, dataset_id)
predictions_df = test_run_predictions.to_pandas_dataframe()

{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe'}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe', 'activityApp': 'TabularDataset'}


In [9]:
predictions_df.head(200)

Unnamed: 0,Label_orig,Label_predicted,Benign_predicted_proba,Botnet Ares_predicted_proba,DDoS-HOIC_predicted_proba,DDoS-LOIC-HTTP_predicted_proba,DDoS-LOIC-UDP_predicted_proba,DoS GoldenEye_predicted_proba,DoS Hulk_predicted_proba,DoS Slowloris_predicted_proba,...,Fwd Seg Size Min_orig,Active Mean_orig,Active Std_orig,Active Max_orig,Active Min_orig,Idle Mean_orig,Idle Std_orig,Idle Max_orig,Idle Min_orig,Total TCP Flow Time_orig
0,Benign,Benign,0.99,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,8,0.00,0.00,0,0,0.00,0.00,0,0,0
1,Benign,Benign,0.98,0.00,0.00,0.01,0.00,0.00,0.00,0.00,...,20,0.00,0.00,0,0,0.00,0.00,0,0,170385666
2,Benign,Benign,0.99,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,20,0.00,0.00,0,0,0.00,0.00,0,0,218096
3,Benign,Benign,0.98,0.01,0.00,0.00,0.00,0.00,0.00,0.00,...,20,0.00,0.00,0,0,0.00,0.00,0,0,1033
4,Benign,Benign,0.99,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,20,0.00,0.00,0,0,0.00,0.00,0,0,4238522
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,Benign,Benign,0.97,0.00,0.00,0.01,0.02,0.00,0.00,0.00,...,20,405154.00,0.00,405154,405154,58794664.00,0.00,58794664,58794664,61391483
196,Benign,Benign,0.98,0.00,0.00,0.00,0.01,0.00,0.00,0.00,...,20,61906.50,87509.41,123785,28,29973916.50,34119666.42,54100164,5847669,60071863
197,Benign,Benign,0.98,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,20,4790307.00,0.00,4790307,4790307,59179589.00,0.00,59179589,59179589,64065818
198,Benign,Benign,0.98,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,20,189148.00,0.00,189148,189148,57967059.00,0.00,57967059,57967059,59355366


In [10]:
predictions_df.tail(50)

Unnamed: 0,Label_orig,Label_predicted,Benign_predicted_proba,Botnet Ares_predicted_proba,DDoS-HOIC_predicted_proba,DDoS-LOIC-HTTP_predicted_proba,DDoS-LOIC-UDP_predicted_proba,DoS GoldenEye_predicted_proba,DoS Hulk_predicted_proba,DoS Slowloris_predicted_proba,...,Fwd Seg Size Min_orig,Active Mean_orig,Active Std_orig,Active Max_orig,Active Min_orig,Idle Mean_orig,Idle Std_orig,Idle Max_orig,Idle Min_orig,Total TCP Flow Time_orig
5415869,Benign,Benign,0.94,0.0,0.05,0.0,0.0,0.0,0.0,0.0,...,20,0.0,0.0,0,0,0.0,0.0,0,0,2768824
5415870,Benign,Benign,0.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,20,0.0,0.0,0,0,0.0,0.0,0,0,1115
5415871,Benign,Benign,0.97,0.0,0.01,0.0,0.01,0.0,0.0,0.0,...,20,81121.5,4909.44,84593,77650,58583225.0,693634.98,59073699,58092751,117406388
5415872,Benign,Benign,0.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,8,0.0,0.0,0,0,0.0,0.0,0,0,0
5415873,Benign,Benign,0.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,20,0.0,0.0,0,0,0.0,0.0,0,0,3440677
5415874,Benign,Benign,0.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,8,0.0,0.0,0,0,0.0,0.0,0,0,0
5415875,Benign,Benign,0.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,20,0.0,0.0,0,0,0.0,0.0,0,0,1135
5415876,Benign,Benign,0.96,0.0,0.0,0.0,0.02,0.0,0.0,0.01,...,20,499557.5,412454.68,791207,207908,57115100.5,2142340.51,58629964,55600237,116737454
5415877,Benign,Benign,0.99,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,20,0.0,0.0,0,0,0.0,0.0,0,0,831263
5415878,Benign,Benign,0.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,20,0.0,0.0,0,0,0.0,0.0,0,0,3442992


# Featurization transparency

In [11]:
import json
# Download the featurization summary JSON file locally
best_run.download_file(
    "outputs/featurization_summary.json", "featurization_summary.json"
)

# Render the JSON as a pandas DataFrame
with open("featurization_summary.json", "r") as f:
    records = json.load(f)

pd.DataFrame.from_records(records)

Unnamed: 0,RawFeatureName,TypeDetected,Dropped,EngineeredFeatureCount,Transformations,TransformationParams
0,Protocol,Categorical,No,4,[StringCast-CharGramCountVectorizer],"{'Transformer1': {'Input': ['Protocol'], 'Tran..."
1,FIN Flag Count,Categorical,No,14,[StringCast-CharGramCountVectorizer],"{'Transformer1': {'Input': ['FIN Flag Count'],..."
2,SYN Flag Count,Categorical,No,25,[StringCast-CharGramCountVectorizer],"{'Transformer1': {'Input': ['SYN Flag Count'],..."
3,RST Flag Count,Categorical,No,1,[ModeCatImputer-StringCast-LabelEncoder],"{'Transformer1': {'Input': ['RST Flag Count'],..."
4,CWR Flag Count,Categorical,No,11,[StringCast-CharGramCountVectorizer],"{'Transformer1': {'Input': ['CWR Flag Count'],..."
...,...,...,...,...,...,...
74,Idle Max,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['Idle Max'], 'Tran..."
75,Idle Min,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['Idle Min'], 'Tran..."
76,Total TCP Flow Time,Numeric,No,1,[MeanImputer],{'Transformer1': {'Input': ['Total TCP Flow Ti...
77,Fwd URG Flags,Ignore,Yes,0,[],"{'Transformer1': {'Input': ['Fwd URG Flags'], ..."
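
The featurization summary can be condensed into a per-type tally of engineered features and a list of dropped columns. A minimal sketch, assuming each record is a dict keyed by the column names displayed above; `visible_records` is a hypothetical hand-copied stand-in that holds only the nine rows visible in the truncated table, so its totals are partial.

```python
from collections import defaultdict

# Hand-copied from the visible rows of the table above (the elided rows are omitted).
visible_records = [
    {"RawFeatureName": "Protocol", "TypeDetected": "Categorical", "Dropped": "No", "EngineeredFeatureCount": 4},
    {"RawFeatureName": "FIN Flag Count", "TypeDetected": "Categorical", "Dropped": "No", "EngineeredFeatureCount": 14},
    {"RawFeatureName": "SYN Flag Count", "TypeDetected": "Categorical", "Dropped": "No", "EngineeredFeatureCount": 25},
    {"RawFeatureName": "RST Flag Count", "TypeDetected": "Categorical", "Dropped": "No", "EngineeredFeatureCount": 1},
    {"RawFeatureName": "CWR Flag Count", "TypeDetected": "Categorical", "Dropped": "No", "EngineeredFeatureCount": 11},
    {"RawFeatureName": "Idle Max", "TypeDetected": "Numeric", "Dropped": "No", "EngineeredFeatureCount": 1},
    {"RawFeatureName": "Idle Min", "TypeDetected": "Numeric", "Dropped": "No", "EngineeredFeatureCount": 1},
    {"RawFeatureName": "Total TCP Flow Time", "TypeDetected": "Numeric", "Dropped": "No", "EngineeredFeatureCount": 1},
    {"RawFeatureName": "Fwd URG Flags", "TypeDetected": "Ignore", "Dropped": "Yes", "EngineeredFeatureCount": 0},
]

# Sum the engineered-feature counts for each detected type.
engineered_by_type = defaultdict(int)
for record in visible_records:
    engineered_by_type[record["TypeDetected"]] += record["EngineeredFeatureCount"]

# Collect the raw features that featurization dropped.
dropped = [r["RawFeatureName"] for r in visible_records if r["Dropped"] == "Yes"]

for type_name, count in engineered_by_type.items():
    print(f"{type_name}: {count} engineered features")
print(f"dropped: {dropped}")
```

Running the same loop over the full `records` list loaded from `featurization_summary.json` would cover all 78 raw features, provided the JSON keys match the displayed column names.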


# Result

In [12]:

from azureml.widgets import RunDetails

RunDetails(automl_run).show()

2025-03-23 15:46:58.950858: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-03-23 15:46:59.719260: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-03-23 15:46:59.969864: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-23 15:47:01.750756: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

# Retrieve the Best Model's explanation


In [13]:
best_run, fitted_model = automl_run.get_output()
best_run
fitted_model

In [14]:
best_run = automl_run.get_best_child()
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
subset2_final_azure_praxis_03_22,AutoML_aea54d4d-b1d9-42a5-8b13-7a096210e8bf_20,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [15]:
best_run.properties

{'runTemplate': 'automl_child',
 'pipeline_id': '__AutoML_Ensemble__',
 'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'AUC_weighted\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'subset2_final_azure_praxis_03_22\',\'compute_target\':\'mycpu-cluster\',\'subscription_id\':\'7170b4e7-9dbe-4589-ac0c-eee9334e27f0\',\'region\':\'eastus2\',\'spark_service\':None}","ensemble_run_id":"AutoML_aea54d4d-b1d9-42a5-8b13-7a096210e8bf_20","experiment_name":"subset2_final_azure_praxis_03_22","workspace_name":"azure_myphd_praxis","subscription_id":"7170b4e7-9dbe-4589-ac0c-eee9334e27f0","resource_group_name":"g34918111-rg"}}]}',
 'training_percent': '100',
 'predicted_cost': None,
 'iteration': '20',
 '_aml_system_scenario_identification': 'Remote.Child',
 '_az