## Data Introduction

In [2]:
import os
os.chdir('..')
print(f'Current working directory is {os.getcwd()}')

Current working directory is C:\Users\Gubbz\Documents\NSS\NSS_Projects\accre-pumpkin-pie


In [3]:
import pandas as pd
import re

In [4]:
jobs = pd.read_csv("data/fullsample.csv")
jobs.head(5)

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
0,30616928,RUNNING,2021-07-31T22:15:00,Unknown,2048Mn,0,10:04:00,67-22:14:22,1,1,production,0:0
1,30853133,COMPLETED,2021-08-06T11:36:09,2021-09-05T11:36:32,262144Mn,20604.62M,30-00:00:00,30-00:00:23,1,1,cgw-platypus,0:0
2,30858137,COMPLETED,2021-08-06T19:04:39,2021-09-05T19:04:53,204800Mn,57553.77M,30-00:00:00,30-00:00:14,1,32,cgw-tbi01,0:0
3,30935078,COMPLETED,2021-08-09T16:52:51,2021-09-07T20:52:55,65536Mn,20577.96M,29-04:00:00,29-04:00:04,1,8,cgw-platypus,0:0
4,31364111_2,COMPLETED,2021-08-17T07:45:07,2021-09-10T16:45:24,16384Mn,9733.43M,24-09:00:00,24-09:00:17,1,1,production,0:0


REQTIME - Time the user requested for the job
USEDTIME - Time the job actually ran
BEGIN - Date/time the job started running
END - Date/time the job completed, failed, or was cancelled

In [5]:
jobs.shape

(7395885, 12)

In [6]:
jobs['BEGIN']

0          2021-07-31T22:15:00
1          2021-08-06T11:36:09
2          2021-08-06T19:04:39
3          2021-08-09T16:52:51
4          2021-08-17T07:45:07
                  ...         
7395880    2020-10-31T23:39:00
7395881    2020-10-31T23:39:13
7395882    2020-10-31T23:46:29
7395883    2020-10-31T23:49:44
7395884    2020-10-31T23:56:49
Name: BEGIN, Length: 7395885, dtype: object
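`BEGIN` and `END` are stored as plain strings (`dtype: object`), so date arithmetic needs a conversion first. The sketch below assumes every value is either an ISO-8601 timestamp like `2021-07-31T22:15:00` or the placeholder `Unknown` seen for the still-running job; `errors='coerce'` turns anything unparseable into `NaT`. The helper `parse_job_times` and the `ELAPSED` column are hypothetical names, not part of the dataset.

```python
import pandas as pd

def parse_job_times(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with BEGIN/END parsed as datetimes plus an ELAPSED column."""
    out = df.copy()
    for col in ['BEGIN', 'END']:
        # Values such as 'Unknown' (e.g. RUNNING jobs) cannot be parsed and become NaT
        out[col] = pd.to_datetime(out[col], format='%Y-%m-%dT%H:%M:%S', errors='coerce')
    # Wall-clock time between BEGIN and END; NaT whenever either side is missing
    out['ELAPSED'] = out['END'] - out['BEGIN']
    return out

# Two rows copied from the head of fullsample.csv
sample = pd.DataFrame({
    'BEGIN': ['2021-07-31T22:15:00', '2021-08-06T11:36:09'],
    'END': ['Unknown', '2021-09-05T11:36:32'],
})
parsed = parse_job_times(sample)
```

For the second row, `ELAPSED` comes out as 30 days 00:00:23, which matches its `USEDTIME` of `30-00:00:23`.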


The fullsample dataset contains job records, with one row per job.

Each job gets a unique ID, contained in the **JOBID** column.

Some jobs are submitted as arrays of similar jobs. These are listed with an underscore in the JOBID, where the number after the underscore is the task number. For example, JOBID 31781951 was an array job with 10 tasks.

In [9]:
jobs[jobs['JOBID'].str.startswith('31781951_')]

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
533,31781951_1,COMPLETED,2021-08-30T12:51:30,2021-09-08T02:17:41,16384Mn,10234.37M,12-00:00:00,8-13:26:11,1,12,production,0:0
534,31781951_2,COMPLETED,2021-08-30T12:51:30,2021-09-07T18:04:48,16384Mn,10247.40M,12-00:00:00,8-05:13:18,1,12,production,0:0
535,31781951_3,COMPLETED,2021-08-31T09:14:29,2021-09-08T16:36:06,16384Mn,10064.47M,12-00:00:00,8-07:21:37,1,12,production,0:0
536,31781951_4,COMPLETED,2021-09-01T01:59:50,2021-09-08T08:48:28,16384Mn,10004.80M,12-00:00:00,7-06:48:38,1,12,production,0:0
537,31781951_5,COMPLETED,2021-09-02T00:09:27,2021-09-08T23:58:57,16384Mn,9858.72M,12-00:00:00,6-23:49:30,1,12,production,0:0
538,31781951_6,COMPLETED,2021-09-02T16:19:55,2021-09-10T11:16:57,16384Mn,10065.06M,12-00:00:00,7-18:57:02,1,12,production,0:0
539,31781951_7,COMPLETED,2021-09-02T22:26:08,2021-09-10T18:48:31,16384Mn,10092.55M,12-00:00:00,7-20:22:23,1,12,production,0:0
540,31781951_8,COMPLETED,2021-09-03T10:54:14,2021-09-11T09:32:28,16384Mn,10146.98M,12-00:00:00,7-22:38:14,1,12,production,0:0
541,31781951_9,COMPLETED,2021-09-04T22:54:03,2021-09-12T16:16:04,16384Mn,10050.81M,12-00:00:00,7-17:22:01,1,12,production,0:0
542,31781951_10,COMPLETED,2021-09-06T06:54:35,2021-09-14T13:02:37,16384Mn,10042.53M,12-00:00:00,8-06:08:02,1,12,production,0:0
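Because array tasks share the part of the JOBID before the underscore, it can help to split that column into a base ID and a task number. This is a minimal sketch assuming IDs are either plain digits (`30616928`) or `digits_digits` (`31781951_10`); anything else comes back as missing. `split_job_id`, `BASE_JOBID`, and `TASK` are hypothetical names.

```python
import pandas as pd

def split_job_id(job_ids: pd.Series) -> pd.DataFrame:
    """Split IDs like '31781951_10' into a base job ID and an array task number."""
    # Group 1: digits before the optional underscore; group 2: task number after it
    parts = job_ids.astype(str).str.extract(r'^(?P<BASE_JOBID>\d+)(?:_(?P<TASK>\d+))?$')
    # Non-array jobs have no task number, so TASK stays missing (<NA>)
    parts['TASK'] = pd.to_numeric(parts['TASK']).astype('Int64')
    return parts

ids = pd.Series(['30616928', '31781951_1', '31781951_10'])
parts = split_job_id(ids)
```

With this, `jobs[split_job_id(jobs['JOBID'])['BASE_JOBID'] == '31781951']` selects all tasks of one array by exact match rather than by substring.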


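Jobs can end in several different states, by far the most common being 'COMPLETED'. Cancellations are often recorded together with a numeric user ID (e.g. 'CANCELLED by 9201'), which is why there are 145 distinct values.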

In [11]:
jobs['STATE'].value_counts()

STATE
COMPLETED              7375084
CANCELLED                 9055
FAILED                    3766
CANCELLED by 9201         1776
OUT_OF_MEMORY             1739
                        ...   
CANCELLED by 891323          1
CANCELLED by 889553          1
CANCELLED by 793827          1
CANCELLED by 790983          1
CANCELLED by 907426          1
Name: count, Length: 145, dtype: int64
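Most of those 145 values are `CANCELLED by <number>` variants. A sketch for collapsing them into a single `CANCELLED` label, assuming the variant always has exactly that `CANCELLED by <digits>` shape (`normalize_state` is a hypothetical helper):

```python
import pandas as pd

def normalize_state(states: pd.Series) -> pd.Series:
    """Collapse 'CANCELLED by <id>' variants into a single 'CANCELLED' label."""
    # Only whole values of the form 'CANCELLED by <digits>' are rewritten;
    # every other state is returned unchanged
    return states.str.replace(r'^CANCELLED by \d+$', 'CANCELLED', regex=True)

states = pd.Series(['COMPLETED', 'CANCELLED', 'CANCELLED by 9201', 'OUT_OF_MEMORY'])
normalized = normalize_state(states)
```

Running `normalize_state(jobs['STATE']).value_counts()` should then give a much shorter table.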

The **BEGIN** field indicates when the job was started (initiated on a compute node).

The **END** field indicates when the job ended (completed, failed, or was cancelled while running).

The **REQMEM** field is the amount of memory requested in megabytes. It can be per-core/CPU (Mc) or per-node (Mn).



In [13]:
# Jobs where memory was requested per core.
jobs[jobs['REQMEM'].str[-2:] == 'Mc'].head()

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
501,31776583_1,COMPLETED,2021-08-30T10:16:59,2021-09-01T02:04:11,4096Mc,1792.43M,14-00:00:00,1-15:47:12,1,1,production,0:0
502,31776584_12,COMPLETED,2021-08-30T10:17:00,2021-09-01T00:20:15,4096Mc,1792.43M,14-00:00:00,1-14:03:15,1,1,production,0:0
915,31793401_958,COMPLETED,2021-08-31T19:36:46,2021-09-01T00:37:11,4096Mc,2788.05M,05:00:00,05:00:25,1,1,production,0:0
916,31793401_987,COMPLETED,2021-08-31T20:33:46,2021-09-01T00:02:57,4096Mc,2779.27M,05:00:00,03:29:11,1,1,production,0:0
4727,31813223_1296,COMPLETED,2021-08-31T19:42:46,2021-09-01T00:43:15,4096Mc,2786.44M,05:00:00,05:00:29,1,1,production,0:0


In [14]:
# Jobs where memory was requested per node.
jobs[jobs['REQMEM'].str[-2:] == 'Mn'].head()

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
0,30616928,RUNNING,2021-07-31T22:15:00,Unknown,2048Mn,0,10:04:00,67-22:14:22,1,1,production,0:0
1,30853133,COMPLETED,2021-08-06T11:36:09,2021-09-05T11:36:32,262144Mn,20604.62M,30-00:00:00,30-00:00:23,1,1,cgw-platypus,0:0
2,30858137,COMPLETED,2021-08-06T19:04:39,2021-09-05T19:04:53,204800Mn,57553.77M,30-00:00:00,30-00:00:14,1,32,cgw-tbi01,0:0
3,30935078,COMPLETED,2021-08-09T16:52:51,2021-09-07T20:52:55,65536Mn,20577.96M,29-04:00:00,29-04:00:04,1,8,cgw-platypus,0:0
4,31364111_2,COMPLETED,2021-08-17T07:45:07,2021-09-10T16:45:24,16384Mn,9733.43M,24-09:00:00,24-09:00:17,1,1,production,0:0


The **USEDMEM** column is the amount of memory used in MB per node.
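`REQMEM` and `USEDMEM` are text with unit suffixes (`16384Mn`, `4096Mc`, `9733.43M`, or just `0`), so they need parsing before any comparison. The sketch below assumes a per-core request (`Mc`) applies to every core counted in `CPUS` and a per-node request (`Mn`) applies to every node counted in `NODES`; that interpretation and the new column names are assumptions, not something the dataset documents.

```python
import pandas as pd

def parse_memory(df: pd.DataFrame) -> pd.DataFrame:
    """Add numeric memory columns (MB) derived from REQMEM and USEDMEM."""
    out = df.copy()
    # REQMEM: a number, 'M', then 'c' (per core) or 'n' (per node)
    parts = out['REQMEM'].astype(str).str.extract(r'^(\d+(?:\.\d+)?)M([cn])$')
    amount = parts[0].astype(float)
    unit = parts[1]
    # Assumed scaling: per-core requests times CPUS, per-node requests times NODES
    multiplier = out['CPUS'].where(unit == 'c', out['NODES'])
    out['REQMEM_MB'] = amount
    out['REQMEM_TOTAL_MB'] = amount * multiplier
    # USEDMEM: strip the trailing 'M' ('0' has none) and convert to float
    out['USEDMEM_MB'] = out['USEDMEM'].astype(str).str.rstrip('M').astype(float)
    return out

sample = pd.DataFrame({
    'REQMEM': ['4096Mc', '16384Mn', '2048Mn'],
    'USEDMEM': ['1792.43M', '9733.43M', '0'],
    'CPUS': [12, 1, 1],
    'NODES': [1, 2, 1],
})
mem = parse_memory(sample)
```

Since USEDMEM is reported per node, for multi-node jobs it is closer in meaning to the per-node request than to `REQMEM_TOTAL_MB`.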

The requested time (**REQTIME**) and used time (**USEDTIME**) columns are formatted as d-hh:mm:ss, or as hh:mm:ss for jobs shorter than one day.
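To compare requested and used time numerically, both formats can be converted to seconds. This sketch assumes only the two formats described above occur; any other value (for example a bare `mm:ss`) raises `ValueError` instead of being guessed at. `slurm_time_to_seconds` and the `_S` columns are hypothetical names.

```python
import pandas as pd

def slurm_time_to_seconds(value: str) -> int:
    """Convert 'd-hh:mm:ss' or 'hh:mm:ss' to a total number of seconds."""
    days = 0
    if '-' in value:
        # Everything before the dash is the number of days
        day_part, value = value.split('-', 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(':')]
    if len(parts) != 3:
        raise ValueError(f'unexpected time format: {value!r}')
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

# Values taken from the per-core REQMEM examples above
times = pd.DataFrame({'REQTIME': ['14-00:00:00', '05:00:00'],
                      'USEDTIME': ['1-15:47:12', '03:29:11']})
for col in ['REQTIME', 'USEDTIME']:
    times[col + '_S'] = times[col].map(slurm_time_to_seconds)
# Fraction of the requested time that the job actually used
times['TIME_USED_FRACTION'] = times['USEDTIME_S'] / times['REQTIME_S']
```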

**NODES** is the number of servers used for the job. Most jobs run on a single node. For multi-node jobs, memory usage (USEDMEM) is the maximum over all nodes.

**CPUS** is the total number of CPU cores allocated to the job, and for multi-node jobs, this includes all nodes.

Most jobs are run in the "production" or "nogpfs" partitions. The "debug" and "sam" partitions are for test jobs that are expected to be short, and the "maxwell", "pascal", and "turing" partitions provide GPU resources.

In [16]:
jobs['PARTITION'].value_counts()

PARTITION
production              7019578
nogpfs                   147229
pascal                   124453
sam                       64967
turing                    21424
maxwell                   11278
cgw-maizie                 4309
debug                      1616
cgw-platypus                379
cgw-dsi-gw                  228
cgw-capra1                  157
cgw-dougherty1              125
cgw-horus                    61
cgw-cqs1                     28
cgw-hanuman                  21
cgw-sideshowbob              14
cgw-vm-qa-flatearth1          9
cgw-tbi01                     8
cgw-rocksteady                1
Name: count, dtype: int64
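For summaries, the partitions can be grouped following the description above. This is only a sketch: the grouping covers the partitions named in the text, and everything else (including all `cgw-*` partitions, which are not described here) falls into a catch-all `other` group by assumption. `PARTITION_GROUPS` and `partition_group` are hypothetical names.

```python
import pandas as pd

# Grouping taken from the partition description; unlisted partitions map to 'other'
PARTITION_GROUPS = {
    'production': 'general',
    'nogpfs': 'general',
    'debug': 'test',
    'sam': 'test',
    'maxwell': 'gpu',
    'pascal': 'gpu',
    'turing': 'gpu',
}

def partition_group(partitions: pd.Series) -> pd.Series:
    """Map each PARTITION value to a coarse group label."""
    return partitions.map(PARTITION_GROUPS).fillna('other')

groups = partition_group(pd.Series(['production', 'pascal', 'sam', 'cgw-maizie']))
```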

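The **EXITCODE** gives the [exit code](https://www.agileconnection.com/article/overview-linux-exit-codes) for the job, with "0:0" indicating a successful job. Slurm reports exit codes as two numbers separated by a colon: the first is the exit status returned by the job's script, and the second is the number of the signal that terminated the job, if it was killed by one. For example, "1:0" means the script exited with status 1, while "0:15" means the job was stopped by signal 15 (SIGTERM), which is consistent with the CANCELLED jobs shown below.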

In [18]:
jobs['EXITCODE'].value_counts()

EXITCODE
0:0      7384480
1:0         4958
0:15        1887
0:125       1739
0:9         1361
2:0          508
0:7          389
121:0         89
127:0         88
13:0          68
24:0          67
0:11          35
38:0          32
28:0          29
6:0           27
126:0         24
0:6           18
0:2           16
7:0           12
29:0          12
16:0           9
59:0           8
9:0            4
0:40           4
8:0            3
125:0          3
0:105          2
76:0           1
85:0           1
0:12           1
30:0           1
0:98           1
43:0           1
3:0            1
67:0           1
4:0            1
11:0           1
0:8            1
103:0          1
116:0          1
Name: count, dtype: int64
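Slurm documents this field as `exit_code:signal`, so splitting it into two integer columns makes filtering easier (for example, all jobs killed by a given signal). A minimal sketch, assuming every value has exactly one colon; `split_exit_code`, `EXIT_STATUS`, and `SIGNAL` are hypothetical names.

```python
import pandas as pd

def split_exit_code(codes: pd.Series) -> pd.DataFrame:
    """Split 'exit:signal' strings such as '0:15' into two integer columns."""
    parts = codes.str.split(':', expand=True).astype(int)
    parts.columns = ['EXIT_STATUS', 'SIGNAL']
    return parts

codes = split_exit_code(pd.Series(['0:0', '1:0', '0:15']))
```

For example, `jobs[split_exit_code(jobs['EXITCODE'])['SIGNAL'] == 15]` should select the same rows as the `'0:15'` filter below.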

In [19]:
jobs[jobs['EXITCODE'] == '1:0']

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
18,31418105,NODE_FAIL,2021-08-19T10:09:50,2021-09-17T08:45:10,92160Mn,0,41-16:00:00,28-22:35:20,1,8,cgw-dougherty1,1:0
31996,31934490,FAILED,2021-09-01T09:08:52,2021-09-01T09:24:08,92160Mn,65881.35M,3-08:00:00,00:15:16,1,1,maxwell,1:0
32199,31934755,FAILED,2021-09-01T09:35:01,2021-09-01T15:37:14,20000Mn,13323.77M,1-08:00:00,06:02:13,1,1,production,1:0
32204,31934760,FAILED,2021-09-01T09:35:02,2021-09-01T13:07:26,20000Mn,10697.71M,1-08:00:00,03:32:24,1,1,production,1:0
32205,31934762,FAILED,2021-09-01T09:35:04,2021-09-01T16:37:19,20000Mn,20336.22M,1-08:00:00,07:02:15,1,1,production,1:0
...,...,...,...,...,...,...,...,...,...,...,...,...
7381905,25455341,FAILED,2020-10-29T23:21:16,2020-10-31T05:21:14,10240Mc,27792.37M,1-06:00:00,1-05:59:58,29,4,production,1:0
7381940,25455788,FAILED,2020-10-30T00:26:40,2020-10-30T20:26:43,5120Mc,28869.91M,20:00:00,20:00:03,22,4,production,1:0
7387598,25469985_10,FAILED,2020-10-30T13:46:33,2020-10-30T13:46:54,8192Mn,0,1-00:00:00,00:00:21,1,4,production,1:0
7387599,25469985_11,FAILED,2020-10-30T13:46:26,2020-10-30T13:46:37,8192Mn,11.43M,1-00:00:00,00:00:11,1,4,production,1:0


In [20]:
jobs[jobs['EXITCODE'] == '0:15']

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
42,31669402,CANCELLED,2021-08-28T10:53:59,2021-09-05T10:53:57,65536Mn,5229.75M,8-00:00:00,7-23:59:58,9,10,production,0:15
1023,31798622,CANCELLED,2021-08-31T02:46:09,2021-09-05T02:46:09,40960Mn,1440.95M,5-00:00:00,5-00:00:00,4,4,turing,0:15
1029,31798672,CANCELLED,2021-08-31T02:53:48,2021-09-05T02:54:09,40960Mn,1438.46M,5-00:00:00,5-00:00:21,4,4,turing,0:15
32185,31934719,CANCELLED,2021-09-01T09:31:02,2021-09-01T09:31:07,20000Mn,0,1-08:00:00,00:00:05,1,1,production,0:15
36103,31940094,CANCELLED,2021-09-01T14:55:55,2021-09-09T14:56:17,65536Mn,5054.56M,8-00:00:00,8-00:00:22,9,10,production,0:15
...,...,...,...,...,...,...,...,...,...,...,...,...
7387593,25469985_5,CANCELLED,2020-10-30T13:45:59,2020-10-31T13:46:06,8192Mn,4969.53M,1-00:00:00,1-00:00:07,1,4,production,0:15
7387594,25469985_6,CANCELLED,2020-10-30T13:45:57,2020-10-31T13:46:06,8192Mn,4976.77M,1-00:00:00,1-00:00:09,1,4,production,0:15
7387595,25469985_7,CANCELLED,2020-10-30T13:45:59,2020-10-31T13:46:06,8192Mn,4954.54M,1-00:00:00,1-00:00:07,1,4,production,0:15
7387596,25469985_8,CANCELLED,2020-10-30T13:45:59,2020-10-31T13:46:06,8192Mn,4951.88M,1-00:00:00,1-00:00:07,1,4,production,0:15


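The slurm_wrapper_ce5.log and slurm_wrapper_ce6.log files contain logs of jobs submitted from the Open Science Grid. Each line records a timestamp, the user, the retry number, the time taken, the return code, and the Slurm command that was run (e.g. sbatch, scontrol, or sacct).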

## CE5

In [23]:
ce5 = pd.read_csv('data/slurm_wrapper_ce5.log',
                  header=None,
                  delimiter=' - ',
                  engine='python')
ce5 = ce5.rename(columns={0:'Time_Stamp', 1:'User', 2:'Retry', 3:'Length_of_Time', 4:'ReturnCode', 5:'Command'})

ce5.head()

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command
0,2020-10-16 08:15:39.278699,user 0,retry 0,time 0.07347559928894043,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
1,2020-10-16 08:18:08.313309,user 0,retry 0,time 0.18363237380981445,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
2,2020-10-16 08:22:48.128689,user 0,retry 0,time 0.07547116279602051,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
3,2020-10-16 08:25:13.257408,user 0,retry 0,time 0.09484362602233887,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
4,2020-10-16 08:31:01.460723,user 0,retry 0,time 0.07498788833618164,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."


In [24]:
# .copy() makes an independent DataFrame, so the assignment below does not raise SettingWithCopyWarning
ce5_nine_two_oh_four = ce5[ce5['User'] == 'user 9204'].copy()
ce5_nine_two_oh_four['Length_of_Time'] = ce5_nine_two_oh_four['Length_of_Time'].str.replace('time ', '').astype(float)
ce5_nine_two_oh_four.head()


Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command
136,2020-10-16 08:34:42.779719,user 9204,retry 0,5.240251,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
198,2020-10-16 08:35:50.747136,user 9204,retry 0,8.597585,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
361,2020-10-16 08:41:00.160523,user 9204,retry 0,4.925761,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
362,2020-10-16 08:41:01.419377,user 9204,retry 0,0.102166,returncode 0,"command ['/usr/bin/sacct', '-j', '24995424', '..."
478,2020-10-16 08:53:15.711346,user 9204,retry 0,0.142444,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."


In [25]:
ce5_nine_two_oh_four = ce5_nine_two_oh_four[ce5_nine_two_oh_four['Length_of_Time'] >= 15]

In [26]:
ce5_nine_two_oh_four.head()

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command
5223,2020-10-16 13:09:43.208448,user 9204,retry 0,22.331895,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
6330,2020-10-16 13:54:35.894156,user 9204,retry 0,16.478916,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
6757,2020-10-16 14:24:51.502580,user 9204,retry 0,21.227667,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
7706,2020-10-16 15:11:47.522653,user 9204,retry 0,23.678934,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
9118,2020-10-16 16:00:16.038701,user 9204,retry 0,18.315583,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"


In [27]:
ce5_nine_two_oh_four = ce5_nine_two_oh_four[ce5_nine_two_oh_four['Command'].str.contains('sbatch')]

In [28]:
ce5_nine_two_oh_four.head()

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command
49958,2020-10-18 06:53:44.272915,user 9204,retry 0,20.038464,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
49972,2020-10-18 06:54:04.322412,user 9204,retry 1,20.048906,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
50467,2020-10-18 07:47:25.825172,user 9204,retry 0,20.082628,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
50473,2020-10-18 07:47:45.871008,user 9204,retry 1,20.045221,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
50582,2020-10-18 07:53:33.972840,user 9204,retry 0,20.041486,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."


In [29]:
ce5_nine_two_oh_four['Server'] = 'ce5'
ce5_nine_two_oh_four

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command,Server
49958,2020-10-18 06:53:44.272915,user 9204,retry 0,20.038464,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
49972,2020-10-18 06:54:04.322412,user 9204,retry 1,20.048906,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
50467,2020-10-18 07:47:25.825172,user 9204,retry 0,20.082628,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
50473,2020-10-18 07:47:45.871008,user 9204,retry 1,20.045221,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
50582,2020-10-18 07:53:33.972840,user 9204,retry 0,20.041486,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
...,...,...,...,...,...,...,...
4661384,2021-09-24 19:13:14.894282,user 9204,retry 0,20.051321,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4726331,2021-10-02 08:14:16.557499,user 9204,retry 0,19.083227,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4731181,2021-10-02 18:29:08.267199,user 9204,retry 0,20.043146,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4731399,2021-10-02 18:57:09.500701,user 9204,retry 0,15.495682,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5


## CE6



In [31]:
ce6 = pd.read_csv('data/slurm_wrapper_ce6.log',
                  header=None,
                  delimiter=' - ',
                  engine='python')
ce6 = ce6.rename(columns={0:'Time_Stamp', 1:'User', 2:'Retry', 3:'Length_of_Time', 4:'ReturnCode', 5:'Command'})

ce6.head()

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command
0,2020-10-16 10:37:44.163454,user 9202,retry 0,time 0.08495402336120605,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
1,2020-10-16 10:37:44.206654,user 9202,retry 0,time 0.08943057060241699,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
2,2020-10-16 10:37:44.218760,user 9202,retry 0,time 0.05928945541381836,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
3,2020-10-16 10:37:44.256403,user 9202,retry 0,time 0.038695573806762695,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
4,2020-10-16 10:37:44.611603,user 9202,retry 0,time 0.03343677520751953,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."


In [32]:
# .copy() makes an independent DataFrame, so later assignments do not raise SettingWithCopyWarning
ce6_nine_two_oh_four = ce6[ce6['User'] == 'user 9204'].copy()
ce6_nine_two_oh_four.head()

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command
13,2020-10-16 10:38:29.869156,user 9204,retry 0,time 0.06946611404418945,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
20,2020-10-16 10:39:44.355935,user 9204,retry 0,time 8.835923194885254,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
36,2020-10-16 10:40:51.756875,user 9204,retry 0,time 6.003079652786255,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job']"
37,2020-10-16 10:40:55.596886,user 9204,retry 0,time 0.14368605613708496,returncode 0,"command ['/usr/bin/sacct', '-j', '24997282', '..."
307,2020-10-16 11:08:28.127242,user 9204,retry 0,time 2.43306303024292,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."


In [33]:
ce6_nine_two_oh_four['Length_of_Time'] = ce6_nine_two_oh_four['Length_of_Time'].str.replace('time ', '').astype(float)


In [34]:
ce6_nine_two_oh_four = ce6_nine_two_oh_four[ce6_nine_two_oh_four['Length_of_Time'] >= 15]

For this project, we are interested in log entries from user 9204 (the test user) where the command is '/usr/bin/sbatch', the return code is non-zero, and the recorded time is at least 15. Together, these conditions indicate that the scheduler became unresponsive at that point in time.
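The CE5 and CE6 cells repeat the same steps, so the criteria above can also be packed into one function applied to each log. This is a sketch, not the notebook's exact pipeline: it applies the non-zero return code filter up front (the notebook's final cell keeps only `returncode 1`, which gives the same rows only if 1 is the only non-zero code), and it assumes no command contains the separator ` - `. The raw line layout in the demo is reconstructed from the parsed columns, and `unresponsive_sbatch_events` is a hypothetical name.

```python
import io
import pandas as pd

LOG_COLUMNS = ['Time_Stamp', 'User', 'Retry', 'Length_of_Time', 'ReturnCode', 'Command']

def unresponsive_sbatch_events(log_source, server: str, min_time: float = 15) -> pd.DataFrame:
    """Filter one slurm_wrapper log down to slow, failed sbatch calls by user 9204."""
    log = pd.read_csv(log_source, header=None, names=LOG_COLUMNS,
                      sep=' - ', engine='python')
    # Numeric part of 'time 20.03...', computed without modifying a slice
    seconds = pd.to_numeric(log['Length_of_Time'].str.replace('time ', '', regex=False))
    mask = (
        (log['User'] == 'user 9204')
        & (seconds >= min_time)
        & log['Command'].str.contains('/usr/bin/sbatch', regex=False)
        & (log['ReturnCode'] != 'returncode 0')
    )
    # .assign returns a new DataFrame, so no SettingWithCopyWarning is raised
    return log.loc[mask].assign(Length_of_Time=seconds[mask], Server=server)

# Two made-up lines in the assumed raw format: only the first meets every criterion
demo_log = io.StringIO(
    "2020-10-18 06:53:44.272915 - user 9204 - retry 0 - time 20.038464 - returncode 1 - command ['/usr/bin/sbatch', '/tmp/job_a']\n"
    "2020-10-18 06:56:00.000000 - user 9204 - retry 0 - time 22.3 - returncode 0 - command ['/usr/bin/scontrol', 'show', 'job']\n"
)
demo_events = unresponsive_sbatch_events(demo_log, 'ce5')

# Possible use on the real files:
# events = pd.concat([
#     unresponsive_sbatch_events('data/slurm_wrapper_ce5.log', 'ce5'),
#     unresponsive_sbatch_events('data/slurm_wrapper_ce6.log', 'ce6'),
# ], ignore_index=True)
```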

In [36]:
ce6_nine_two_oh_four = ce6_nine_two_oh_four[ce6_nine_two_oh_four['Command'].str.contains('sbatch')]

In [37]:
ce6_nine_two_oh_four['Server'] = 'ce6'
ce6_nine_two_oh_four

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command,Server
11319,2020-10-16 22:38:52.542223,user 9204,retry 0,19.019137,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
36913,2020-10-18 06:16:25.392946,user 9204,retry 0,20.037672,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
37605,2020-10-18 06:38:44.172473,user 9204,retry 0,20.038736,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
39075,2020-10-18 07:47:32.241050,user 9204,retry 0,20.018348,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
39356,2020-10-18 08:08:49.366063,user 9204,retry 0,20.030497,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
...,...,...,...,...,...,...,...
4662070,2021-09-24 12:56:56.057323,user 9204,retry 0,19.568814,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
4662752,2021-09-24 13:29:48.498748,user 9204,retry 0,20.085085,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
4667202,2021-09-24 20:59:45.540176,user 9204,retry 0,16.153547,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
4737128,2021-10-02 19:03:06.524282,user 9204,retry 0,15.063486,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6


In [78]:
ce6_nine_two_oh_four.loc[11319,'Command']

"command ['/usr/bin/sbatch', '/tmp/condor_g_scratch.0x5572a740ce90.3390891/bl_c7e97aa70fdc']"

In [80]:
ce6_nine_two_oh_four.loc[36913,'Command']

"command ['/usr/bin/sbatch', '/tmp/condor_g_scratch.0x5572a7c77310.3390891/bl_23341e2dd5ae']"

In [38]:
server_merged = pd.concat([ce5_nine_two_oh_four, ce6_nine_two_oh_four], ignore_index=True)
server_merged

Unnamed: 0,Time_Stamp,User,Retry,Length_of_Time,ReturnCode,Command,Server
0,2020-10-18 06:53:44.272915,user 9204,retry 0,20.038464,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
1,2020-10-18 06:54:04.322412,user 9204,retry 1,20.048906,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
2,2020-10-18 07:47:25.825172,user 9204,retry 0,20.082628,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
3,2020-10-18 07:47:45.871008,user 9204,retry 1,20.045221,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4,2020-10-18 07:53:33.972840,user 9204,retry 0,20.041486,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
...,...,...,...,...,...,...,...
4112,2021-09-24 12:56:56.057323,user 9204,retry 0,19.568814,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
4113,2021-09-24 13:29:48.498748,user 9204,retry 0,20.085085,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
4114,2021-09-24 20:59:45.540176,user 9204,retry 0,16.153547,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
4115,2021-10-02 19:03:06.524282,user 9204,retry 0,15.063486,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6


Time_Stamp - Date and time (24-hour clock) at which the command was recorded in the log

In [39]:
server_merged['Time_Stamp'] = pd.to_datetime(server_merged['Time_Stamp'])
server_merged['Date'] = server_merged['Time_Stamp'].dt.strftime("%Y-%m-%d")
server_merged['Time'] = server_merged['Time_Stamp'].dt.strftime("%H:%M:%S")
server_merged = server_merged.drop(columns='Time_Stamp')

In [40]:
# Move Date and Time to the front, then sort chronologically
server_list = server_merged.columns.tolist()
new_order = ['Date', 'Time'] + [col_name for col_name in server_list if col_name not in ['Date', 'Time']]
server_merged = server_merged.reindex(columns=new_order)
server_merged = server_merged.sort_values(['Date', 'Time']).reset_index(drop=True)

In [41]:
server_merged = server_merged[server_merged['ReturnCode'] == 'returncode 1']
server_merged

Unnamed: 0,Date,Time,User,Retry,Length_of_Time,ReturnCode,Command,Server
1,2020-10-18,06:16:25,user 9204,retry 0,20.037672,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
2,2020-10-18,06:38:44,user 9204,retry 0,20.038736,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce6
3,2020-10-18,06:53:44,user 9204,retry 0,20.038464,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4,2020-10-18,06:54:04,user 9204,retry 1,20.048906,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
5,2020-10-18,07:47:25,user 9204,retry 0,20.082628,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
...,...,...,...,...,...,...,...,...
4108,2021-09-24,18:14:35,user 9204,retry 0,20.041436,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4109,2021-09-24,19:13:14,user 9204,retry 0,20.051321,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4111,2021-10-02,08:14:16,user 9204,retry 0,19.083227,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
4112,2021-10-02,18:29:08,user 9204,retry 0,20.043146,returncode 1,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",ce5
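To see how these events cluster in time, one option is to count them per day and per server. A short sketch assuming the `Date` and `Server` columns built above; `events_per_day` is a hypothetical helper and the sample rows are illustrative only.

```python
import pandas as pd

def events_per_day(events: pd.DataFrame) -> pd.DataFrame:
    """Count events per Date, with one column per Server (missing combinations become 0)."""
    return (
        events.groupby(['Date', 'Server'])
        .size()
        .unstack('Server', fill_value=0)
    )

sample = pd.DataFrame({
    'Date': ['2020-10-18', '2020-10-18', '2020-10-18', '2021-09-24'],
    'Server': ['ce6', 'ce6', 'ce5', 'ce5'],
})
counts = events_per_day(sample)
```

Applied to `server_merged`, this gives a day-by-server table that can be sorted or plotted to find the worst days.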


In [42]:
server_merged.to_csv('data/final_log_csv.csv')