# Azure Machine Learning Jobs Cost Calculator

## Before you start

You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook. Run the cell below to verify that it is installed.

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.

In [None]:
pip install azure-ai-ml azure-identity

In [None]:
pip show azure-ai-ml

## Connect to your workspace

With the required SDK packages installed, you're ready to connect to your workspace.

To connect to a workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. Because you're working on a compute instance managed by Azure Machine Learning, you can use the default values to connect to the workspace.
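
Outside a compute instance, the same three identifiers are usually supplied through a `config.json` file, which `MLClient.from_config` reads. The sketch below writes such a file. It assumes the key names `subscription_id`, `resource_group`, and `workspace_name`; the `write_workspace_config` helper and the placeholder values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json into `directory` and return the path to the file."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    target = Path(directory) / "config.json"
    target.write_text(json.dumps(config, indent=2))
    return target


# Placeholder identifiers (assumptions). A temporary directory is used so the
# sketch cannot overwrite a real config.json.
config_path = write_workspace_config(
    tempfile.mkdtemp(),
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    workspace_name="<workspace-name>",
)
print(config_path.read_text())
```

In practice you would place the file in the notebook's working directory and replace the placeholders with your own values.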

In [1]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()


In [69]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

Found the config file in: /config.json


## Assumptions

1. Jobs are tagged with `envType`, using the values `test-env-1`, `test-env-2`, and `test-env-3`.
![Jobs List](img/azure_ml_studio_jobs_list.png)

2. A compute cluster named `aml-cluster` with 0-2 `Standard_DS3_v2` VMs.
![Compute Cluster](img/azure_ml_studio_cluster.png)


## List job runs

In [24]:
runs = list(ml_client.jobs.list())
len(runs)

69

## Print job data

In [25]:
for run in runs[0:1]:
    print(run)

type: command
outputs:
  default:
    mode: rw_mount
    type: uri_folder
    path: azureml://datastores/workspaceartifactstore/ExperimentRun/dcid.dynamic_apple_3kpvlg1l83
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
resources:
  instance_count: 1
  shm_size: 2g
component:
  name: dynamic_apple_3kpvlg1l83
  display_name: diabetes-pythonv2-train
  tags:
    envType: test-env-1
    _aml_system_ComputeTargetStatus: '{"AllocationState":"steady","PreparingNodeCount":0,"RunningNodeCount":0,"CurrentNodeCount":2}'
  type: command
  outputs:
    default:
      type: uri_folder
      mode: rw_mount
  command: python diabetes-training.py
  environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
  code: azureml:/subscriptions/a647c11c-fe4c-43c4-b672-963b71adab36/resourceGroups/my-machine-learning-rg-eastus2-112024/providers/Microsoft.MachineLearningServices/workspaces/my-azure-ml-ws-eastus2-112024/codes/17926021-1345-41b9-919e-cb1b7af5f700/versions/1
  creati

In [26]:
from datetime import datetime

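def calc_job_duration(job_start_date, job_end_date):
    """Return the job duration in whole seconds from two UTC timestamp strings."""
    start_date = datetime.strptime(job_start_date, "%Y-%m-%d %H:%M:%S")
    end_date = datetime.strptime(job_end_date, "%Y-%m-%d %H:%M:%S")
    difference = end_date - start_date
    # total_seconds() includes whole days; .seconds would drop them
    return int(difference.total_seconds())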
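
When working with a `timedelta`, note that its `.seconds` attribute holds only the seconds component (0 to 86399) and excludes whole days, while `total_seconds()` covers the full span. The sketch below illustrates the difference; the start timestamp is taken from the job shown in this notebook, and the end timestamp is a made-up value one day later.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

start = datetime.strptime("2025-03-23 12:13:04", TIME_FORMAT)
# Hypothetical end time: one day and 8 seconds after the start
end = datetime.strptime("2025-03-24 12:13:12", TIME_FORMAT)

delta = end - start
seconds_component = delta.seconds        # ignores the whole day
full_span = int(delta.total_seconds())   # includes the whole day

print(seconds_component)  # 8
print(full_span)          # 86408
```

For jobs shorter than a day both values agree, which is why the difference does not show up in the short runs listed here.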

In [27]:
for run in runs[0:1]:
    print(f"name: {run.component.name}")
    print(f"type: {run.type}")
    print(f"display_name: {run.component.display_name}")
    print(f"envType: {run.component.tags['envType']}")
    print(f"compute: {run.compute}")
    print(f"StartTimeUtc: {run.properties['StartTimeUtc']}")
    print(f"EndTimeUtc: {run.properties['EndTimeUtc']}")
    print(f"Job Duration (Seconds): {calc_job_duration(run.properties['StartTimeUtc'], run.properties['EndTimeUtc'])}")

name: dynamic_apple_3kpvlg1l83
type: command
display_name: diabetes-pythonv2-train
envType: test-env-1
compute: aml-cluster
StartTimeUtc: 2025-03-23 12:13:04
EndTimeUtc: 2025-03-23 12:13:12
Job Duration (Seconds): 8


## Prepare a list of jobs data relevant for costs

In [28]:
runs_list = []
for run in runs:
    # Keep only command jobs that carry the envType tag
    if run.type == 'command' and 'envType' in run.component.tags:
        job = dict(
            name=run.component.name,
            display_name=run.component.display_name,
            tag=run.component.tags['envType'],
            compute=run.compute,
            duration_sec=calc_job_duration(run.properties['StartTimeUtc'], run.properties['EndTimeUtc']),
        )
        runs_list.append(job)

In [29]:
len(runs_list)

39
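
The loop above reads `StartTimeUtc` and `EndTimeUtc` directly from `run.properties`. A job that is still running or never started may lack one of them (an assumption; all jobs in this notebook had both). A defensive variant, sketched here with a hypothetical `safe_job_duration` helper and plain dictionaries standing in for `run.properties`, returns `None` instead of raising a `KeyError`:

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def safe_job_duration(properties):
    """Return the duration in seconds, or None if a timestamp is missing."""
    start = properties.get("StartTimeUtc")
    end = properties.get("EndTimeUtc")
    if not start or not end:
        return None
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


# Stand-ins for run.properties: one finished job, one without an end time
finished = {"StartTimeUtc": "2025-03-23 12:13:04", "EndTimeUtc": "2025-03-23 12:13:12"}
unfinished = {"StartTimeUtc": "2025-03-23 12:13:04"}

print(safe_job_duration(finished))    # 8
print(safe_job_duration(unfinished))  # None
```

Jobs for which the helper returns `None` can then be skipped before they are added to `runs_list`.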

In [30]:
runs_list

[{'name': 'dynamic_apple_3kpvlg1l83',
  'display_name': 'diabetes-pythonv2-train',
  'tag': 'test-env-1',
  'compute': 'aml-cluster',
  'duration_sec': 8},
 {'name': 'olden_fish_4q2vxwl4bh',
  'display_name': 'diabetes-pythonv2-train',
  'tag': 'test-env-1',
  'compute': 'aml-cluster',
  'duration_sec': 16},
 {'name': 'busy_arm_ml2zxk4rzb',
  'display_name': 'diabetes-pythonv2-train',
  'tag': 'test-env-1',
  'compute': 'aml-cluster',
  'duration_sec': 9},
 {'name': 'affable_window_nlp1l4vt7h',
  'display_name': 'diabetes-pythonv2-train',
  'tag': 'test-env-1',
  'compute': 'aml-cluster',
  'duration_sec': 9},
 {'name': 'tidy_vinegar_6x5hwk37jr',
  'display_name': 'diabetes-pythonv2-train',
  'tag': 'test-env-1',
  'compute': 'aml-cluster',
  'duration_sec': 8},
 {'name': 'icy_board_bhyg1zt02q',
  'display_name': 'diabetes-pythonv2-train',
  'tag': 'test-env-1',
  'compute': 'aml-cluster',
  'duration_sec': 9},
 {'name': 'icy_cat_1cfbrjrfxv',
  'display_name': 'diabetes-pythonv2-train'

## Create a dataframe to help us calculate

In [31]:
import pandas as pd

In [32]:
df = pd.DataFrame(runs_list)

In [33]:
df

| | name | display_name | tag | compute | duration_sec |
|---|---|---|---|---|---|
| 0 | dynamic_apple_3kpvlg1l83 | diabetes-pythonv2-train | test-env-1 | aml-cluster | 8 |
| 1 | olden_fish_4q2vxwl4bh | diabetes-pythonv2-train | test-env-1 | aml-cluster | 16 |
| 2 | busy_arm_ml2zxk4rzb | diabetes-pythonv2-train | test-env-1 | aml-cluster | 9 |
| 3 | affable_window_nlp1l4vt7h | diabetes-pythonv2-train | test-env-1 | aml-cluster | 9 |
| 4 | tidy_vinegar_6x5hwk37jr | diabetes-pythonv2-train | test-env-1 | aml-cluster | 8 |
| 5 | icy_board_bhyg1zt02q | diabetes-pythonv2-train | test-env-1 | aml-cluster | 9 |
| 6 | icy_cat_1cfbrjrfxv | diabetes-pythonv2-train | test-env-1 | aml-cluster | 9 |
| 7 | loving_star_3l6thwgfd7 | diabetes-pythonv2-train | test-env-3 | aml-cluster | 9 |
| 8 | frank_double_wp877r58sg | diabetes-pythonv2-train | test-env-3 | aml-cluster | 9 |
| 9 | ivory_bucket_8hpcsyscys | diabetes-pythonv2-train | test-env-3 | aml-cluster | 10 |


In [34]:
# Sum only the numeric duration column; summing the whole frame would
# concatenate the string columns (name, display_name, compute)
totals_df = df.groupby('tag', as_index=False)['duration_sec'].sum()
totals_df

| | tag | duration_sec |
|---|---|---|
| 0 | test-env-1 | 138 |
| 1 | test-env-2 | 228 |
| 2 | test-env-3 | 481 |


## Total seconds per Tag

In [35]:
totals_df[['tag','duration_sec']] 

| | tag | duration_sec |
|---|---|---|
| 0 | test-env-1 | 138 |
| 1 | test-env-2 | 228 |
| 2 | test-env-3 | 481 |
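
As a cross-check on the per-tag totals, the same aggregation can be reproduced without pandas. The sketch below uses only five of the records listed earlier, so its totals are smaller than the full ones shown above; the `sample_runs` name is introduced here for illustration.

```python
from collections import defaultdict

# A small subset of the job records shown earlier in this notebook
sample_runs = [
    {"name": "dynamic_apple_3kpvlg1l83", "tag": "test-env-1", "duration_sec": 8},
    {"name": "olden_fish_4q2vxwl4bh", "tag": "test-env-1", "duration_sec": 16},
    {"name": "busy_arm_ml2zxk4rzb", "tag": "test-env-1", "duration_sec": 9},
    {"name": "loving_star_3l6thwgfd7", "tag": "test-env-3", "duration_sec": 9},
    {"name": "frank_double_wp877r58sg", "tag": "test-env-3", "duration_sec": 9},
]

# Accumulate the duration of every job under its envType tag
seconds_per_tag = defaultdict(int)
for job in sample_runs:
    seconds_per_tag[job["tag"]] += job["duration_sec"]

totals = dict(seconds_per_tag)
print(totals)  # {'test-env-1': 33, 'test-env-3': 18}
```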


In [36]:
COMPUTE_CLUSTER_NAME = "aml-cluster"

In [37]:
compute_obj = ml_client.compute.get(COMPUTE_CLUSTER_NAME)

In [38]:
compute_obj

AmlCompute({'type': 'amlcompute', 'created_on': None, 'provisioning_state': 'Succeeded', 'provisioning_errors': None, 'name': 'aml-cluster', 'description': None, 'tags': None, 'properties': {}, 'print_as_yaml': False, 'id': '/subscriptions/a647c11c-fe4c-43c4-b672-963b71adab36/resourceGroups/my-machine-learning-rg-eastus2-112024/providers/Microsoft.MachineLearningServices/workspaces/my-azure-ml-ws-eastus2-112024/computes/aml-cluster', 'Resource__source_path': '', 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/compute-instance-cpu-1a/code/eitansela-azureml-examples/cost_calculator', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x7f18a621f340>, 'resource_id': None, 'location': 'eastus2', 'size': 'Standard_DS3_v2', 'min_instances': 0, 'max_instances': 2, 'idle_time_before_scale_down': 600.0, 'identity': None, 'ssh_public_access_enabled': True, 'ssh_settings': <azure.ai.ml.entities._compute.aml_compute.AmlComputeSshSettings object at 0x7f1

In [39]:
compute_obj.size, compute_obj.location

('Standard_DS3_v2', 'eastus2')

## Azure Retail Prices API

In [63]:
import requests

api_url = "https://prices.azure.com/api/retail/prices?api-version=2021-10-01-preview"
# OData filter: consumption price for the cluster's VM size and region,
# excluding Spot, Low Priority, and Windows meters
query = (
    f"armRegionName eq '{compute_obj.location}' and armSkuName eq '{compute_obj.size}' "
    "and priceType eq 'Consumption' "
    "and contains(meterName, 'Spot') eq false "
    "and contains(meterName, 'Low Priority') eq false "
    "and contains(productName, 'Windows') eq false"
)
response = requests.get(api_url, params={'$filter': query})
response.raise_for_status()  # fail early on an HTTP error
json_data = response.json()
json_data

{'BillingCurrency': 'USD',
 'CustomerEntityId': 'Default',
 'CustomerEntityType': 'Retail',
 'Items': [{'currencyCode': 'USD',
   'tierMinimumUnits': 0.0,
   'retailPrice': 0.229,
   'unitPrice': 0.229,
   'armRegionName': 'eastus2',
   'location': 'US East 2',
   'effectiveStartDate': '2016-09-01T00:00:00Z',
   'meterId': '51b394a1-a487-4d04-883b-a38c04b1d9eb',
   'meterName': 'DS3 v2',
   'productId': 'DZH318Z0BQ4C',
   'skuId': 'DZH318Z0BQ4C/015F',
   'productName': 'Virtual Machines DSv2 Series',
   'skuName': 'DS3 v2',
   'serviceName': 'Virtual Machines',
   'serviceId': 'DZH313Z7MMC8',
   'serviceFamily': 'Compute',
   'unitOfMeasure': '1 Hour',
   'type': 'Consumption',
   'isPrimaryMeterRegion': False,
   'armSkuName': 'Standard_DS3_v2'}],
 'NextPageLink': None,
 'Count': 1}
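
The rest of the notebook reads `Items[0]` directly, which relies on the filter matching exactly one meter. A minimal sketch of a stricter version follows; `build_price_filter` and `pick_hourly_price` are hypothetical helpers, and `sample_response` is a trimmed copy of the response shown above. It fails loudly if the filter matches zero or several meters, or if the unit is not hourly.

```python
def build_price_filter(region, sku):
    """Build the OData filter used in this notebook for a region and VM size."""
    clauses = [
        f"armRegionName eq '{region}'",
        f"armSkuName eq '{sku}'",
        "priceType eq 'Consumption'",
        "contains(meterName, 'Spot') eq false",
        "contains(meterName, 'Low Priority') eq false",
        "contains(productName, 'Windows') eq false",
    ]
    return " and ".join(clauses)


def pick_hourly_price(payload):
    """Return the retail price if the payload holds exactly one hourly meter."""
    items = payload.get("Items", [])
    if len(items) != 1:
        raise ValueError(f"expected exactly one price item, got {len(items)}")
    item = items[0]
    if item.get("unitOfMeasure") != "1 Hour":
        raise ValueError(f"unexpected unit of measure: {item.get('unitOfMeasure')}")
    return item["retailPrice"]


# Trimmed copy of the response shown above
sample_response = {
    "BillingCurrency": "USD",
    "Items": [
        {
            "retailPrice": 0.229,
            "armRegionName": "eastus2",
            "meterName": "DS3 v2",
            "unitOfMeasure": "1 Hour",
            "armSkuName": "Standard_DS3_v2",
        }
    ],
    "NextPageLink": None,
    "Count": 1,
}

print(build_price_filter("eastus2", "Standard_DS3_v2"))
print(pick_hourly_price(sample_response))  # 0.229
```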

## Price per hour

In [64]:
json_data["Items"][0]["retailPrice"]

0.229

In [65]:
def calculate_price(price_per_hour, seconds):
    # Convert the hourly price to a per-second rate and scale by the duration
    return price_per_hour * seconds / 3600

## Calculate total prices per Tag

In [67]:
price_per_hour = json_data["Items"][0]["retailPrice"]
totals_df['total_price'] = totals_df['duration_sec'].apply(lambda seconds: calculate_price(price_per_hour, seconds))

In [68]:
totals_df[['tag', 'duration_sec', 'total_price']] 

| | tag | duration_sec | total_price |
|---|---|---|---|
| 0 | test-env-1 | 138 | 0.008778 |
| 1 | test-env-2 | 228 | 0.014503 |
| 2 | test-env-3 | 481 | 0.030597 |
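
The totals above follow from `price_per_hour * seconds / 3600`. The sketch below reproduces them from the per-tag seconds and the 0.229 USD hourly price. The `job_cost` helper and its `node_count` parameter are hypothetical additions. The jobs here ran with `instance_count: 1`, and the sketch assumes cost scales linearly with the number of nodes and ignores idle time before the cluster scales down.

```python
PRICE_PER_HOUR = 0.229  # USD, retail price of Standard_DS3_v2 in eastus2 (from the API response)


def job_cost(price_per_hour, seconds, node_count=1):
    """Estimate cost as hourly price x node-hours (assumes linear scaling per node)."""
    return price_per_hour * seconds * node_count / 3600


# Total seconds per tag, as computed earlier in this notebook
seconds_per_tag = {"test-env-1": 138, "test-env-2": 228, "test-env-3": 481}

cost_per_tag = {
    tag: round(job_cost(PRICE_PER_HOUR, seconds), 6)
    for tag, seconds in seconds_per_tag.items()
}
print(cost_per_tag)
# {'test-env-1': 0.008778, 'test-env-2': 0.014503, 'test-env-3': 0.030597}
```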
