# showq dynamic analysis

## Preprocessing

The output of the Adaptive Computing Moab `showq` command must first be preprocessed with `scripts/showq2csv.py`, which creates a CSV file that pandas can import.

## Prerequisites

Import required modules.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
%matplotlib inline

Ensure that modules are reloaded automatically when modified.

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
from lib.vsc.moab.nb_utils import extract_category

## Analysis

Load the current and the previous showq data.

In [5]:
curr_df = pd.read_csv('data/showq_1528968510.csv', parse_dates=['time_stamp', 'datetime'])

In [6]:
prev_df = pd.read_csv('data/showq_1528967910.csv', parse_dates=['time_stamp', 'datetime'])
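
The two file names appear to embed the time of each snapshot. As a minimal sketch, assuming that the number in the file name is a Unix epoch in seconds (the helper `epoch_from_filename` is hypothetical and not part of the repository), the interval between the two snapshots can be computed as follows:

```python
from datetime import datetime, timezone
from pathlib import Path


def epoch_from_filename(file_name):
    """Extract the integer embedded in a name like 'showq_1528968510.csv'.

    Assumption: the part after the last underscore is a Unix epoch in seconds.
    """
    return int(Path(file_name).stem.split('_')[-1])


curr_epoch = epoch_from_filename('data/showq_1528968510.csv')
prev_epoch = epoch_from_filename('data/showq_1528967910.csv')

# Interval between the two snapshots, in seconds
interval = curr_epoch - prev_epoch
print(interval)  # 600

# The current epoch as a timezone-aware UTC datetime
print(datetime.fromtimestamp(curr_epoch, tz=timezone.utc))
```

An interval of 600 seconds matches the `time_stamp` values in the output further down, which are 10 minutes apart (11:18:30 and 11:28:30).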

Which jobs have started since the last showq epoch? These are the jobs that are currently active, and were previously either idle or blocked.

In [7]:
curr_running_df = extract_category(curr_df, 'ActiveJob')

In [8]:
prev_idle_df = extract_category(prev_df, 'EligibleJob')

In [9]:
prev_blocked_df = extract_category(prev_df, 'BlockedJob')
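
The implementation of `extract_category` is not shown in this notebook. Judging by the output below, it selects the rows of one category and returns only the columns that are relevant to it. The following is a hypothetical stand-in under that assumption, run on toy data, and not the actual function from `lib.vsc.moab.nb_utils`:

```python
import pandas as pd


def extract_category_sketch(df, category):
    """Hypothetical stand-in for extract_category.

    Keeps the rows of one category and drops the columns that contain
    no data at all for that category.
    """
    subset = df[df['category'] == category]
    return subset.dropna(axis='columns', how='all')


# Toy data: 'remaining' only applies to active jobs,
# 'time_in_queue' only to jobs that are not running yet.
toy_df = pd.DataFrame({
    'category': ['ActiveJob', 'EligibleJob', 'BlockedJob'],
    'job_id': ['1', '2', '3'],
    'remaining': [3600.0, None, None],
    'time_in_queue': [None, 120.0, 900.0],
})

active_df = extract_category_sketch(toy_df, 'ActiveJob')
print(list(active_df.columns))  # ['category', 'job_id', 'remaining']
```

A row filter of this kind keeps the original row labels, which would be consistent with the non-contiguous index reported for `prev_blocked_df` at the end of this notebook.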

In [10]:
curr_running_df.merge(prev_idle_df, on='job_id', how='inner', suffixes=('', '_prev'))[curr_running_df.columns]

| | category | time_stamp | job_id | user_id | state | procs | remaining | start_time | walltime_limit |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ActiveJob | 2018-06-14 11:28:30 | 20842191 | vsc31821 | Running | 20 | 35429 | 2018-06-14 11:19:22 | 35977.0 |
| 1 | ActiveJob | 2018-06-14 11:28:30 | 20842195 | vsc31835 | Running | 80 | 172601 | 2018-06-14 11:25:34 | 172777.0 |
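
The transition check used here is an inner merge on `job_id`, followed by a selection of the columns of the current data frame. The sketch below reproduces the pattern on toy data; the helper name `transitioned` and the sample job IDs are made up for illustration:

```python
import pandas as pd


def transitioned(curr_df, prev_df, key='job_id'):
    """Return the rows of curr_df whose key also occurs in prev_df."""
    merged = curr_df.merge(prev_df, on=key, how='inner',
                           suffixes=('', '_prev'))
    # Columns from prev_df that clash get the '_prev' suffix;
    # keep only the columns of the current snapshot.
    return merged[curr_df.columns]


prev_idle = pd.DataFrame({'job_id': ['a', 'b', 'c'],
                          'state': ['Idle'] * 3})
curr_running = pd.DataFrame({'job_id': ['b', 'c', 'd'],
                             'state': ['Running'] * 3})

started = transitioned(curr_running, prev_idle)
print(started)
```

Jobs `b` and `c` occur in both snapshots, so they are the ones reported as started. Job `d` is running but was not idle before, and job `a` is no longer running or was never started.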


So two jobs that were idle are now running. Which jobs were blocked and are now running?

In [11]:
curr_running_df.merge(prev_blocked_df, on='job_id', how='inner', suffixes=('', '_prev'))[curr_running_df.columns]

| | category | time_stamp | job_id | user_id | state | procs | remaining | start_time | walltime_limit |
|---|---|---|---|---|---|---|---|---|---|


Which jobs were blocked, and are now idle?

In [12]:
curr_idle_df = extract_category(curr_df, 'EligibleJob')

In [13]:
curr_idle_df.merge(prev_blocked_df, on='job_id', how='inner', suffixes=('', '_prev'))[curr_idle_df.columns]

| | category | time_stamp | job_id | user_id | state | procs | walltime_limit | queue_time | time_in_queue |
|---|---|---|---|---|---|---|---|---|---|


How many jobs are idle, and how many were idle in the previous epoch?

In [15]:
curr_idle_df.job_id.count(), prev_idle_df.job_id.count()

(116, 117)
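
The two counts only show the net change. Two previously idle jobs started, so at most 115 of the 117 jobs remained idle, and at least one job must have entered the idle state to arrive at 116. A minimal sketch of how such turnover could be made explicit with set operations on the job IDs, using toy data rather than the actual showq output:

```python
import pandas as pd

prev_idle = pd.DataFrame({'job_id': ['a', 'b', 'c']})
curr_idle = pd.DataFrame({'job_id': ['b', 'c', 'd']})

prev_ids = set(prev_idle['job_id'])
curr_ids = set(curr_idle['job_id'])

left_idle = prev_ids - curr_ids      # idle before, no longer idle
entered_idle = curr_ids - prev_ids   # idle now, not idle before
stayed_idle = prev_ids & curr_ids    # idle in both snapshots

# Both snapshots hold three idle jobs, yet one left and one entered.
print(len(prev_ids), len(curr_ids), left_idle, entered_idle)
```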

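Combine the previously idle and the previously blocked jobs into a single data frame of inactive jobs.

In [16]: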
prev_inactive_df = pd.concat((prev_idle_df, prev_blocked_df), ignore_index=True)

In [17]:
prev_inactive_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197 entries, 0 to 196
Data columns (total 9 columns):
category          197 non-null object
time_stamp        197 non-null datetime64[ns]
job_id            197 non-null object
user_id           197 non-null object
state             197 non-null object
procs             197 non-null int64
walltime_limit    197 non-null int64
queue_time        197 non-null datetime64[ns]
time_in_queue     197 non-null float64
dtypes: datetime64[ns](2), float64(1), int64(2), object(4)
memory usage: 13.9+ KB


In [18]:
prev_inactive_df.tail()

| | category | time_stamp | job_id | user_id | state | procs | walltime_limit | queue_time | time_in_queue |
|---|---|---|---|---|---|---|---|---|---|
| 192 | BlockedJob | 2018-06-14 11:18:30 | 20840637 | vsc31179 | BatchHold | 20 | 259200 | 2018-06-11 15:06:26 | 245524.0 |
| 193 | BlockedJob | 2018-06-14 11:18:30 | 20840638 | vsc31179 | BatchHold | 20 | 259200 | 2018-06-11 15:08:05 | 245425.0 |
| 194 | BlockedJob | 2018-06-14 11:18:30 | 20832595 | vsc30957 | NotQueued | 20 | 1800000 | 2018-05-30 16:57:52 | 1275638.0 |
| 195 | BlockedJob | 2018-06-14 11:18:30 | 20832596 | vsc30957 | NotQueued | 20 | 1800000 | 2018-05-30 16:57:55 | 1275635.0 |
| 196 | BlockedJob | 2018-06-14 11:18:30 | 20832597 | vsc30957 | NotQueued | 20 | 1800000 | 2018-05-30 16:57:57 | 1275633.0 |


In [19]:
prev_blocked_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 80 entries, 329 to 408
Data columns (total 9 columns):
category          80 non-null object
time_stamp        80 non-null datetime64[ns]
job_id            80 non-null object
user_id           80 non-null object
state             80 non-null object
procs             80 non-null int64
walltime_limit    80 non-null int64
queue_time        80 non-null datetime64[ns]
time_in_queue     80 non-null float64
dtypes: datetime64[ns](2), float64(1), int64(2), object(4)
memory usage: 6.2+ KB
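
The two `info()` outputs differ in their index. The concatenated data frame has a `RangeIndex` from 0 to 196 (117 idle plus 80 blocked jobs gives 197 rows), while `prev_blocked_df` keeps the row labels 329 to 408. The sketch below illustrates the effect of `ignore_index=True` on toy data, assuming the category data frames are plain row selections of the full data frame:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['ActiveJob', 'EligibleJob', 'BlockedJob', 'BlockedJob'],
    'job_id': ['1', '2', '3', '4'],
})
idle = df[df['category'] == 'EligibleJob']
blocked = df[df['category'] == 'BlockedJob']

# Without ignore_index, the original row labels are preserved.
kept = pd.concat((idle, blocked))
# With ignore_index=True, the rows are renumbered from 0.
renumbered = pd.concat((idle, blocked), ignore_index=True)

print(list(kept.index), list(renumbered.index))  # [1, 2, 3] [0, 1, 2]
```

Renumbering avoids duplicate or misleading row labels when data frames from different sources are stacked, at the cost of losing the link to the row positions in the original CSV file.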
