# Azure Job Dashboard

This notebook shows how to build a dashboard of Azure Machine Learning jobs and use it to manage them (for instance, to inspect or kill them).

Recall that Azure Machine Learning uses the `Experiment` class to represent a computation task, and the `Run` class to represent one of its (possibly many) executions.

## Preliminaries

The code below loads the default workspace (this can be changed) and spawns dummy runs for demonstration purposes (this step can be skipped).

```python
from azureml.core import Workspace, Experiment, Run

# Load the workspace described by the local config.json
ws = Workspace.from_config()
```

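```python
# Two test experiments; each start_logging call creates an interactive run
exp1 = Experiment(ws, name='mskorski_test_job_1')
exp2 = Experiment(ws, name='mskorski_test_job_2')
# snapshot_directory=None: no snapshot of the local folder is taken
r11 = exp1.start_logging(snapshot_directory=None)
r12 = exp1.start_logging(snapshot_directory=None)
r21 = exp2.start_logging(snapshot_directory=None)
r22 = exp2.start_logging(snapshot_directory=None)
r23 = exp2.start_logging(snapshot_directory=None)
# Cancel one run so that more than one status shows up in the dashboard
r23.cancel()
```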

## Utilities

The core contribution is a generator that yields every run in the workspace along with its properties. Records are `namedtuple`s, which makes postprocessing (for example with pandas) convenient.

```python
from collections import namedtuple

# Record type for one dashboard row
# NOTE: extend the fields if more run details are needed
RunRecord = namedtuple('Run_Description', ['job_name', 'run_id', 'run_status'])

def get_runs_summary(ws):
    """Summarise all runs under a given workspace: experiment name, run id and run status.

    Args:
        ws (azureml.core.Workspace): Azure workspace to look into.

    Yields:
        Run_Description: one record per run.
    """
    for exp_name, exp_obj in ws.experiments.items():
        # get_runs() returns top-level runs only (include_children=False by default)
        for run_obj in exp_obj.get_runs():
            yield RunRecord(exp_name, run_obj.id, run_obj.status)
```
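
Since `get_runs_summary` only touches `ws.experiments`, `get_runs()`, `.id` and `.status`, it can be exercised offline with duck-typed stand-ins. The sketch below assumes nothing about the SDK beyond those attributes; the stub objects (built with `types.SimpleNamespace`), the helper `make_experiment` and the job names and run ids are all hypothetical, made up for illustration.

```python
from collections import namedtuple
from types import SimpleNamespace

RunRecord = namedtuple('Run_Description', ['job_name', 'run_id', 'run_status'])

def get_runs_summary(ws):
    """Same generator as above, repeated so this cell runs on its own."""
    for exp_name, exp_obj in ws.experiments.items():
        for run_obj in exp_obj.get_runs():
            yield RunRecord(exp_name, run_obj.id, run_obj.status)

def make_experiment(*runs):
    """Hypothetical stub: an object whose get_runs() iterates over fake runs."""
    run_objs = [SimpleNamespace(id=run_id, status=status) for run_id, status in runs]
    return SimpleNamespace(get_runs=lambda: iter(run_objs))

# Hypothetical stand-in for azureml.core.Workspace: only .experiments is used
fake_ws = SimpleNamespace(experiments={
    'demo_job_a': make_experiment(('run-1', 'Running'), ('run-2', 'Completed')),
    'demo_job_b': make_experiment(('run-3', 'Canceled')),
})

records = list(get_runs_summary(fake_ws))
for rec in records:
    print(rec)
```

This makes it easy to test downstream processing (filtering, tabulation) without creating real runs.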

## Examples

### Build Dashboard and Inspect Runs

Let's display the jobs in a detailed table:

```python
import pandas as pd

# Collect all runs, keep only the test jobs and tabulate them
runs = get_runs_summary(ws)
runs = filter(lambda t: 'mskorski_test_job' in t.job_name, runs)
summary_df = pd.DataFrame(runs)
summary_df
```

|   | job_name | run_id | run_status |
|---|---|---|---|
| 0 | mskorski_test_job_1 | 51a97b96-66a4-4e55-bfbb-996cd654e244 | Running |
| 1 | mskorski_test_job_1 | 21f6f6cc-6312-4918-ba3b-65cd17c847a6 | Running |
| 2 | mskorski_test_job_2 | 362dff70-95ee-4b2f-ac13-dcb343f6bea6 | Canceled |
| 3 | mskorski_test_job_2 | ad972397-2d65-4203-b834-4ff7cbbbadb6 | Running |
| 4 | mskorski_test_job_2 | c272ca00-479c-431e-8fe3-a41be9a0c8c9 | Running |
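
The column headers in this table come from the namedtuple itself: given namedtuple records and no explicit `columns`, pandas uses the fields of the first record as column names, and it also accepts an iterator such as the `filter` object above. A small offline check, using two records copied from the table:

```python
from collections import namedtuple

import pandas as pd

RunRecord = namedtuple('Run_Description', ['job_name', 'run_id', 'run_status'])

# Two records copied from the dashboard output, wrapped in an iterator
records = iter([
    RunRecord('mskorski_test_job_1', '51a97b96-66a4-4e55-bfbb-996cd654e244', 'Running'),
    RunRecord('mskorski_test_job_2', '362dff70-95ee-4b2f-ac13-dcb343f6bea6', 'Canceled'),
])

# pandas materialises the iterator and names the columns after the tuple fields
df = pd.DataFrame(records)
print(df.columns.tolist())  # ['job_name', 'run_id', 'run_status']
```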


Now let's take an aggregated view, counting runs per job and status:

```python
summary_df.groupby(['job_name', 'run_status']).size().unstack('run_status')
```

| job_name | Canceled | Running |
|---|---|---|
| mskorski_test_job_1 | NaN | 2.0 |
| mskorski_test_job_2 | 1.0 | 2.0 |
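
The floats and the missing value appear because `mskorski_test_job_1` has no canceled run: `unstack` fills absent (job, status) pairs with NaN, which forces a float dtype. A sketch on a hand-built frame mirroring the statuses above, assuming integer counts are preferred: passing `fill_value=0` keeps them as integers, and `pd.crosstab` gives the same table in one call.

```python
import pandas as pd

# Statuses copied from the dashboard table above
summary_df = pd.DataFrame({
    'job_name': ['mskorski_test_job_1'] * 2 + ['mskorski_test_job_2'] * 3,
    'run_status': ['Running', 'Running', 'Canceled', 'Running', 'Running'],
})

# fill_value=0: a missing (job, status) pair counts as 0 instead of NaN
counts = (summary_df.groupby(['job_name', 'run_status']).size()
          .unstack('run_status', fill_value=0))
print(counts)

# Equivalent cross-tabulation of job names against statuses
cross = pd.crosstab(summary_df['job_name'], summary_df['run_status'])
print(cross)
```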


### Kill Jobs

Let's use the dashboard to kill running jobs with matching names.

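```python
# Boolean mask: test jobs that are still running (plain substring match)
jobs_to_kill = (summary_df['job_name'].str.contains('mskorski_test_job', regex=False)
                & (summary_df['run_status'] == 'Running'))
# Look each selected run up by its id and cancel it
for run_id in summary_df.loc[jobs_to_kill, 'run_id']:
    run = ws.get_run(run_id)
    run.cancel()
```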
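
Since cancelling cannot be undone, it can help to separate selecting the runs from cancelling them and to review the selection first. Below is a sketch of a hypothetical helper, `select_runs_to_cancel`, that only returns the matching run ids. Its `statuses` parameter is an assumption added for illustration, for instance to also catch runs that are still `'Queued'`. The frame and run ids are made up.

```python
import pandas as pd

def select_runs_to_cancel(summary_df, name_pattern, statuses=('Running',)):
    """Hypothetical helper: ids of runs whose job name contains `name_pattern`
    (plain substring, no regex) and whose status is one of `statuses`."""
    mask = (summary_df['job_name'].str.contains(name_pattern, regex=False)
            & summary_df['run_status'].isin(statuses))
    return summary_df.loc[mask, 'run_id'].tolist()

# Dry run on a hand-built frame with made-up ids
demo_df = pd.DataFrame({
    'job_name': ['mskorski_test_job_1', 'mskorski_test_job_2', 'other_job'],
    'run_id': ['id-1', 'id-2', 'id-3'],
    'run_status': ['Running', 'Canceled', 'Running'],
})
to_cancel = select_runs_to_cancel(demo_df, 'mskorski_test_job')
print(to_cancel)  # ['id-1']

# After reviewing the list, the cancellation itself would be, with a real workspace:
# for run_id in to_cancel:
#     ws.get_run(run_id).cancel()
```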

```python
# Rebuild the summary to confirm that the runs were cancelled
runs = get_runs_summary(ws)
runs = filter(lambda t: 'mskorski_test_job' in t.job_name, runs)
summary_df = pd.DataFrame(runs)
summary_df.groupby(['job_name', 'run_status']).size().unstack('run_status')
```

| job_name | Canceled |
|---|---|
| mskorski_test_job_1 | 2 |
| mskorski_test_job_2 | 3 |
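
A cancellation request may take a moment to be reflected in the reported status, so instead of re-running the check by hand one could poll until a condition holds. The sketch below is generic and SDK-agnostic: `wait_until` is a hypothetical helper, and the offline demo uses a fake predicate. With a real workspace, the predicate could rebuild the summary and check that no test run is still `'Running'` (see the commented example).

```python
import time

def wait_until(predicate, timeout=60.0, interval=5.0):
    """Hypothetical helper: call `predicate()` every `interval` seconds until it
    returns True (-> True) or `timeout` seconds have elapsed (-> False)."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Offline demo: a fake status check that succeeds on its third call
calls = {'n': 0}

def all_cancelled():
    calls['n'] += 1
    return calls['n'] >= 3

print(wait_until(all_cancelled, timeout=5.0, interval=0.01))  # True

# With a real workspace, e.g.:
# wait_until(lambda: not any(r.run_status == 'Running'
#                            for r in get_runs_summary(ws)
#                            if 'mskorski_test_job' in r.job_name))
```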
