## Import the data

This notebook relies on data imported as follows:

```
./epmt -v submit ./sample/query/18431
./epmt -v submit ./sample/query/30385
```

In [1]:
# import the query api module
import epmt_query as eq

{'host': 'localhost', 'password': 'example', 'user': 'postgres', 'dbname': 'EPMT', 'provider': 'postgres'}


## Basic Queries

The API has only a few queries: `get_jobs`, `get_procs` and `get_thread_metrics`.

Each of these operates at a distinct level: job, process and thread.

### Job Query

In [4]:
# let's get jobs, let's try first without any filter
# we purposely set fmt='terse' to get just the job ids list
eq.get_jobs(fmt='terse')

[u'18431', u'30385', u'32046']

In [5]:
# above we got a list of job ids. That's not terribly useful.
# let's get a pandas dataframe of the actual job objects
eq.get_jobs(jobids = ['18431', '30385'], fmt='pandas')

Unnamed: 0,account,cpu_time,duration,end,env_changes_dict,exitcode,info_dict,jobid,jobname,jobscriptname,ppr,queue,sessionid,start,submit,tags,updated_at,user
0,,27058683,45905753.0,2019-06-04 20:10:26.838998,{},0,[],18431,18431,18431,,,,2019-06-04 20:09:40.933245,,{},2019-06-04 14:41:22.192377,tushar
1,,63523,97869.0,2019-06-04 20:14:24.878195,{},0,[],30385,30385,30385,,,,2019-06-04 20:14:24.780326,,{},2019-06-04 14:44:53.964737,tushar


In [6]:
# if you prefer dealing with python lists and dictionaries,
# you can set fmt='dict' or just leave it out (as it's the default)
# below we get a list containing two dictionaries: one for each job
eq.get_jobs(jobids = ['18431', '30385'])

[{'account': None,
  'cpu_time': 27058683,
  'duration': 45905753.0,
  'end': datetime.datetime(2019, 6, 4, 20, 10, 26, 838998),
  'env_changes_dict': {},
  'exitcode': 0,
  'info_dict': [],
  'jobid': u'18431',
  'jobname': u'18431',
  'jobscriptname': u'18431',
  'ppr': None,
  'queue': None,
  'sessionid': None,
  'start': datetime.datetime(2019, 6, 4, 20, 9, 40, 933245),
  'submit': None,
  'tags': {},
  'updated_at': datetime.datetime(2019, 6, 4, 14, 41, 22, 192377),
  'user': u'tushar'},
 {'account': None,
  'cpu_time': 63523,
  'duration': 97869.0,
  'end': datetime.datetime(2019, 6, 4, 20, 14, 24, 878195),
  'env_changes_dict': {},
  'exitcode': 0,
  'info_dict': [],
  'jobid': u'30385',
  'jobname': u'30385',
  'jobscriptname': u'30385',
  'ppr': None,
  'queue': None,
  'sessionid': None,
  'start': datetime.datetime(2019, 6, 4, 20, 14, 24, 780326),
  'submit': None,
  'tags': {},
  'updated_at': datetime.datetime(2019, 6, 4, 14, 44, 53, 964737),
  'user': u'tushar'}]
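
As a quick sanity check on units, the `start`, `end`, `duration` and `cpu_time` values printed above can be compared in plain Python. This is only a sketch: it copies the values from the output rather than calling the API, and the names `elapsed_us` and `cpu_fraction` are introduced here purely for illustration.

```python
from datetime import datetime

# Values copied from the get_jobs output above for job 18431
job = {
    'jobid': '18431',
    'cpu_time': 27058683,
    'duration': 45905753.0,
    'start': datetime(2019, 6, 4, 20, 9, 40, 933245),
    'end': datetime(2019, 6, 4, 20, 10, 26, 838998),
}

# Wallclock time recomputed from the timestamps, in whole microseconds
delta = job['end'] - job['start']
elapsed_us = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
print(elapsed_us == job['duration'])

# Fraction of the wallclock time spent on CPU (user+system)
cpu_fraction = job['cpu_time'] / job['duration']
print(round(cpu_fraction, 2))
```

For this job the recomputed elapsed time equals `duration`, which is consistent with `duration` being reported in microseconds.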

### Process Query

In [8]:
# If you want to get the processes belonging to a job
# here each row in the pandas dataframe contains one job process
# again, you can use the 'terse' fmt option to get just the list of database ids of the processes
eq.get_procs(['18431'], fmt='pandas')

Unnamed: 0,PERF_COUNT_SW_CPU_CLOCK,args,cancelled_write_bytes,delayacct_blkio_time,duration,end,exclusive_cpu_time,exename,exitcode,gen,...,time_oncpu,time_waiting,timeslices,updated_at,user,user+system,usertime,vol_ctxsw,wchar,write_bytes
0,2595549,./test-process-tree.sh,0,0,45848263.0,2019-06-04 14:40:26.835416,8940,bash,0,0,...,8940761,96932,6,2019-06-04 14:41:41.103896,tushar,8940,4470,5,0,0
1,1046856,./test-process-tree.sh,0,0,35373741.0,2019-06-04 14:40:16.821898,3947,bash,0,0,...,3947356,4787,3,2019-06-04 14:41:59.645593,tushar,3947,3947,2,0,0
2,1208717443,/etc -exec stat {} ;,0,0,35350474.0,2019-06-04 14:40:16.807963,1291036,find,1,0,...,1291037628,3901716,3652,2019-06-04 14:41:24.858969,tushar,1291036,149926,3614,246,0
3,894723,/etc/ssl/certs/8b59b1ad.0,0,0,901.0,2019-06-04 14:39:44.848089,6790,stat,0,0,...,6790409,0,1,2019-06-04 14:41:22.483269,tushar,6790,3395,0,0,0
4,839139,/etc/speech-dispatcher/modules/epos-generic.conf,0,0,845.0,2019-06-04 14:40:16.337848,8589,stat,0,0,...,8589298,59372,3,2019-06-04 14:41:22.497194,tushar,8589,5726,0,0,0
5,667088,/etc/ssl/certs/455f1b52.0,0,0,672.0,2019-06-04 14:39:46.602664,4080,stat,0,0,...,4080777,0,1,2019-06-04 14:41:22.509718,tushar,4080,0,0,0,0
6,599278,/etc/alternatives/LOCK.7.gz,0,0,604.0,2019-06-04 14:40:06.241266,7440,stat,0,0,...,7441143,0,1,2019-06-04 14:41:22.522131,tushar,7440,3720,0,0,0
7,564519,/etc/glusterfs,0,0,569.0,2019-06-04 14:39:51.551017,8175,stat,0,0,...,8176194,0,1,2019-06-04 14:41:22.533441,tushar,8175,5450,0,0,0
8,955785,/etc/fonts/conf.d/40-nonlatin.conf,0,0,963.0,2019-06-04 14:39:53.886599,8658,stat,0,0,...,8658841,30614,1,2019-06-04 14:41:22.544705,tushar,8658,5772,0,0,0
9,1018917,/etc/brltty/Input/mm/common.kti,0,0,1027.0,2019-06-04 14:39:58.203046,9202,stat,0,0,...,9202984,0,1,2019-06-04 14:41:22.555840,tushar,9202,9202,0,0,0


In [9]:
# suppose you want to filter all processes by tags
eq.get_procs(tags = {'app':'w', 'phase': 'load'}, fmt='terse')

[3619, 3622]

In [10]:
# we could have got the process metadata and metric sums if we used fmt='pandas' or no fmt
# below, each row in the dataframe represents a single process
# You will observe that thread-level metrics (such as usertime, systemtime) are
# already aggregated and available as columns below
eq.get_procs(tags = {'app':'w', 'phase': 'load'}, fmt='pandas')

Unnamed: 0,PERF_COUNT_SW_CPU_CLOCK,args,cancelled_write_bytes,delayacct_blkio_time,duration,end,exclusive_cpu_time,exename,exitcode,gen,...,time_oncpu,time_waiting,timeslices,updated_at,user,user+system,usertime,vol_ctxsw,wchar,write_bytes
0,264920,load,0,0,8583.0,2019-06-04 14:44:24.867268,9898,grep,0,0,...,9898934,0,2,2019-06-04 14:44:54.233242,tushar,9898,4949,1,71,0
1,7906450,,0,0,7911.0,2019-06-04 14:44:24.867006,17448,w.procps,0,0,...,17449098,0,1,2019-06-04 14:44:54.266024,tushar,17448,13086,0,0,0


### Thread Query

In [11]:
# How about getting the thread metrics for these two processes?
eq.get_thread_metrics([3619, 3622])

Unnamed: 0,tid,start,end,usertime,systemtime,rssmax,minflt,majflt,inblock,outblock,...,syscr,syscw,read_bytes,write_bytes,cancelled_write_bytes,time_oncpu,time_waiting,timeslices,rdtsc_duration,PERF_COUNT_SW_CPU_CLOCK
0,27608,1559659464858685,1559659464867268,4949,4949,5172,596,0,6,0,...,72,1,3328,0,0,9898934,0,2,22253488,264920
0,27607,1559659464859095,1559659464867006,13086,4362,6472,765,0,6,0,...,696,0,3328,0,0,17449098,0,1,20512108,7906450
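
The `start` and `end` columns in the thread table are integers that appear to be microseconds since the Unix epoch. A minimal sketch, assuming that interpretation and UTC, converts them to `datetime` objects and cross-checks the first thread against the process rows shown earlier. `us_to_datetime` is a hypothetical helper, not part of `epmt_query`.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def us_to_datetime(us):
    """Convert microseconds since the Unix epoch to a naive UTC datetime."""
    return EPOCH + timedelta(microseconds=us)

# First thread row copied from the output above
thread = {'tid': 27608, 'start': 1559659464858685, 'end': 1559659464867268,
          'usertime': 4949, 'systemtime': 4949}

end_dt = us_to_datetime(thread['end'])
elapsed_us = thread['end'] - thread['start']
cpu_us = thread['usertime'] + thread['systemtime']

print(end_dt)      # 2019-06-04 14:44:24.867268
print(elapsed_us)  # 8583
print(cpu_us)      # 9898
```

These three values match the `end`, `duration` and `user+system` columns of the `grep` process row in the previous output.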


## Getting familiar with useful metrics and keys

`get_jobs` and `get_procs` accept `fltr` and `order` options that
filter and sort the output based on schema columns.

In [12]:
# below we keep only those processes of the job that exceed a certain
# wallclock time, and then sort them by the exclusive cpu time (user+system)
# fltr can be a lambda function or a string
eq.get_procs('18431', fltr = lambda p: p.duration > 100000, order = 'desc(p.exclusive_cpu_time)', fmt='pandas')

Unnamed: 0,PERF_COUNT_SW_CPU_CLOCK,args,cancelled_write_bytes,delayacct_blkio_time,duration,end,exclusive_cpu_time,exename,exitcode,gen,...,time_oncpu,time_waiting,timeslices,updated_at,user,user+system,usertime,vol_ctxsw,wchar,write_bytes
0,1208717443,/etc -exec stat {} ;,0,0,35350474.0,2019-06-04 14:40:16.807963,1291036,find,1,0,...,1291037628,3901716,3652,2019-06-04 14:41:24.858969,tushar,1291036,149926,3614,246,0
1,440452151,/usr,0,0,443123.0,2019-06-04 14:39:41.444421,453101,find,0,0,...,453102131,2715435,50,2019-06-04 14:41:32.842824,tushar,453101,194754,0,13661217,0
2,140609,10,0,0,10000270.0,2019-06-04 14:40:26.834114,10896,sleep,0,0,...,10896060,15084,2,2019-06-04 14:41:57.092744,tushar,10896,3632,1,0,0
3,2595549,./test-process-tree.sh,0,0,45848263.0,2019-06-04 14:40:26.835416,8940,bash,0,0,...,8940761,96932,6,2019-06-04 14:41:41.103896,tushar,8940,4470,5,0,0
4,1046856,./test-process-tree.sh,0,0,35373741.0,2019-06-04 14:40:16.821898,3947,bash,0,0,...,3947356,4787,3,2019-06-04 14:41:59.645593,tushar,3947,3947,2,0,0
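
Conceptually, `fltr` selects rows and `order` sorts the survivors. The sketch below mimics that on a plain list of dictionaries, using a few values copied from the process outputs in this notebook. It only illustrates the semantics and makes no claim about how `epmt_query` evaluates these options internally; the names `long_running` and `ranked` are illustrative.

```python
# A few (exename, duration, exclusive_cpu_time) values copied from the outputs above
procs = [
    {'exename': 'bash',  'duration': 45848263.0, 'exclusive_cpu_time': 8940},
    {'exename': 'find',  'duration': 35350474.0, 'exclusive_cpu_time': 1291036},
    {'exename': 'stat',  'duration': 901.0,      'exclusive_cpu_time': 6790},
    {'exename': 'sleep', 'duration': 10000270.0, 'exclusive_cpu_time': 10896},
]

# Plain-Python analogue of fltr = lambda p: p.duration > 100000
long_running = [p for p in procs if p['duration'] > 100000]

# Plain-Python analogue of order = 'desc(p.exclusive_cpu_time)'
ranked = sorted(long_running, key=lambda p: p['exclusive_cpu_time'], reverse=True)

print([p['exename'] for p in ranked])  # ['find', 'sleep', 'bash']
```

The short-lived `stat` process is dropped by the duration filter, and the remaining processes come out ordered from most to least CPU time.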


### Useful metrics and keys

Below are some of the most useful keys in no particular order:

#### Job Keys
 - duration: this is the wallclock time in microseconds
 - cpu_time: user+system time aggregated across all processes of the job
 - start:    start time in microseconds since epoch
 - end:      end time in microseconds since epoch
 - jobid:    database id for job (unique)
 - exitcode: return code from job
 - tags:     dict of key/value pairs
 - processes:list of processes belonging to job
 
#### Process Keys
 - duration: this is the wallclock time in microseconds
 - exclusive_cpu_time: user+system time for process (aggregated across its threads)
 - inclusive_cpu_time: user+system time for the process and *all its descendants*
 - start:    start time in microseconds since epoch
 - end:      end time in microseconds since epoch
 - tags:     dict of key/value pairs
 - threads_df: json serialized dataframe of process threads (ADVANCED)
 - threads_sums: key/value pairs consisting of sums of thread metrics (ADVANCED)
 - numtids:  number of threads
 - exename
 - args
 - pid
 - ppid
 - id:       database ID for process
 - exitcode
 - parent
 - children
 - ancestors
 - descendants
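
The difference between `exclusive_cpu_time` and `inclusive_cpu_time` is easiest to see on a small process tree. The sketch below uses a made-up four-process tree (the pids and times are hypothetical, not taken from the sample jobs) and a hypothetical helper, `inclusive_cpu_time`, that adds a process's own time to that of all its descendants. It illustrates the definition above, not how the database computes the column.

```python
# Hypothetical process tree: pid -> own (exclusive) CPU time and child pids
tree = {
    1: {'exename': 'bash',  'exclusive_cpu_time': 100, 'children': [2, 3]},
    2: {'exename': 'find',  'exclusive_cpu_time': 400, 'children': [4]},
    3: {'exename': 'sleep', 'exclusive_cpu_time': 10,  'children': []},
    4: {'exename': 'stat',  'exclusive_cpu_time': 5,   'children': []},
}

def inclusive_cpu_time(pid):
    """Own CPU time plus the inclusive time of every child, recursively."""
    proc = tree[pid]
    return proc['exclusive_cpu_time'] + sum(
        inclusive_cpu_time(child) for child in proc['children'])

print(inclusive_cpu_time(1))  # 515
print(inclusive_cpu_time(2))  # 405
print(inclusive_cpu_time(3))  # 10
```

For a leaf process the two values coincide; for the root they differ by the total CPU time of everything it spawned.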
 
#### Thread Keys
 - usertime
 - systemtime
 - user+system
 - rssmax
 - majflt
 - read_bytes
 - write_bytes

In [None]:
# Now let's do some more queries that use some of these fields