## Data Introduction

In [1]:
import pandas as pd
from tqdm import tqdm
import numpy as np

In [2]:
jobs = pd.read_csv("../data/fullsample.csv")

jobs = jobs[jobs['END'] != 'Unknown']
jobs = jobs[jobs['STATE'] == 'COMPLETED']
jobs['BEGIN'] = pd.to_datetime(jobs['BEGIN'])
jobs['END'] = pd.to_datetime(jobs['END'])
jobs['REQTIME'] = pd.to_timedelta(jobs['REQTIME'])
jobs['USEDTIME'] = pd.to_timedelta(jobs['USEDTIME'])

jobs.head(5)

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
1,30853133,COMPLETED,2021-08-06 11:36:09,2021-09-05 11:36:32,262144Mn,20604.62M,-125 days +00:00:00,-126 days +23:59:37,1,1,cgw-platypus,0:0
2,30858137,COMPLETED,2021-08-06 19:04:39,2021-09-05 19:04:53,204800Mn,57553.77M,-125 days +00:00:00,-126 days +23:59:46,1,32,cgw-tbi01,0:0
3,30935078,COMPLETED,2021-08-09 16:52:51,2021-09-07 20:52:55,65536Mn,20577.96M,-121 days +00:00:00,-122 days +23:59:56,1,8,cgw-platypus,0:0
4,31364111_2,COMPLETED,2021-08-17 07:45:07,2021-09-10 16:45:24,16384Mn,9733.43M,-101 days +15:00:00,-101 days +14:59:43,1,1,production,0:0
5,31364111_3,COMPLETED,2021-08-17 07:45:07,2021-09-06 16:17:34,16384Mn,9708.04M,-101 days +15:00:00,-84 days +07:27:33,1,1,production,0:0


In [3]:
jobs['END'].value_counts()

END
2021-07-12 11:36:02    312
2021-02-17 16:45:58    297
2021-02-17 16:45:57    284
2021-02-25 23:56:42    274
2021-06-16 22:34:57    247
                      ... 
2021-06-26 13:15:40      1
2021-06-26 14:25:14      1
2021-06-26 14:15:18      1
2021-06-26 13:16:56      1
2020-10-31 23:49:43      1
Name: count, Length: 4100858, dtype: int64

The fullsample dataset contains job records, with one row per job.

Each job gets a unique ID, contained in the **JOBID** column.

Some jobs are submitted as arrays of similar jobs. These are listed with an underscore in the JOBID, where the number after the underscore is the task number. For example, JOBID 31781951 was an array job with 10 tasks.

In [4]:
jobs[jobs['JOBID'].str.contains('31781951')]

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
533,31781951_1,COMPLETED,2021-08-30 12:51:30,2021-09-08 02:17:41,16384Mn,10234.37M,-50 days,-34 days +02:33:49,1,12,production,0:0
534,31781951_2,COMPLETED,2021-08-30 12:51:30,2021-09-07 18:04:48,16384Mn,10247.40M,-50 days,-34 days +10:46:42,1,12,production,0:0
535,31781951_3,COMPLETED,2021-08-31 09:14:29,2021-09-08 16:36:06,16384Mn,10064.47M,-50 days,-34 days +08:38:23,1,12,production,0:0
536,31781951_4,COMPLETED,2021-09-01 01:59:50,2021-09-08 08:48:28,16384Mn,10004.80M,-50 days,-30 days +13:11:22,1,12,production,0:0
537,31781951_5,COMPLETED,2021-09-02 00:09:27,2021-09-08 23:58:57,16384Mn,9858.72M,-50 days,-26 days +00:10:30,1,12,production,0:0
538,31781951_6,COMPLETED,2021-09-02 16:19:55,2021-09-10 11:16:57,16384Mn,10065.06M,-50 days,-30 days +01:02:58,1,12,production,0:0
539,31781951_7,COMPLETED,2021-09-02 22:26:08,2021-09-10 18:48:31,16384Mn,10092.55M,-50 days,-31 days +23:37:37,1,12,production,0:0
540,31781951_8,COMPLETED,2021-09-03 10:54:14,2021-09-11 09:32:28,16384Mn,10146.98M,-50 days,-31 days +21:21:46,1,12,production,0:0
541,31781951_9,COMPLETED,2021-09-04 22:54:03,2021-09-12 16:16:04,16384Mn,10050.81M,-50 days,-30 days +02:37:59,1,12,production,0:0
542,31781951_10,COMPLETED,2021-09-06 06:54:35,2021-09-14 13:02:37,16384Mn,10042.53M,-50 days,-34 days +09:51:58,1,12,production,0:0
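
The array-job naming described above can be separated into a base job ID and a task number. The following is a minimal sketch on a small hypothetical sample; the column names `BASE_JOBID` and `TASK` are illustrative and not part of the dataset. Matching on the base ID exactly avoids also selecting unrelated jobs whose ID merely contains the same digits, which a substring search could do.

```python
import pandas as pd

# Hypothetical sample of JOBID values in the format described above.
sample = pd.DataFrame(
    {"JOBID": ["30853133", "31781951_1", "31781951_10", "131781951"]}
)

# Split on the first underscore: base ID, then task number (missing for non-array jobs).
parts = sample["JOBID"].str.split("_", n=1, expand=True)
sample["BASE_JOBID"] = parts[0]
sample["TASK"] = pd.to_numeric(parts[1])

# Exact match on the base ID; the unrelated job 131781951 is not selected.
array_tasks = sample[sample["BASE_JOBID"] == "31781951"]
print(array_tasks)
```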


Jobs can be in several different states, the most common being 'COMPLETED'. Because the loading cell above keeps only completed jobs, that is the only state remaining here.

In [5]:
jobs['STATE'].value_counts()

STATE
COMPLETED    7375084
Name: count, dtype: int64

The **BEGIN** field indicates when the job was started (initiated on a compute node).

The **END** field indicates when the job ended (completed, failed, or was cancelled while running).

The **REQMEM** field is the amount of memory requested in megabytes. It can be per-core/CPU (Mc) or per-node (Mn).



In [6]:
# Jobs where memory was requested per core.
jobs[jobs['REQMEM'].str[-2:] == 'Mc'].head()

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
501,31776583_1,COMPLETED,2021-08-30 10:16:59,2021-09-01 02:04:11,4096Mc,1792.43M,-59 days +16:00:00,-5 days +04:12:48,1,1,production,0:0
502,31776584_12,COMPLETED,2021-08-30 10:17:00,2021-09-01 00:20:15,4096Mc,1792.43M,-59 days +16:00:00,-5 days +05:56:45,1,1,production,0:0
915,31793401_958,COMPLETED,2021-08-31 19:36:46,2021-09-01 00:37:11,4096Mc,2788.05M,0 days 05:00:00,0 days 05:00:25,1,1,production,0:0
916,31793401_987,COMPLETED,2021-08-31 20:33:46,2021-09-01 00:02:57,4096Mc,2779.27M,0 days 05:00:00,0 days 03:29:11,1,1,production,0:0
4727,31813223_1296,COMPLETED,2021-08-31 19:42:46,2021-09-01 00:43:15,4096Mc,2786.44M,0 days 05:00:00,0 days 05:00:29,1,1,production,0:0


In [7]:
# Jobs where memory was requested per node.
jobs[jobs['REQMEM'].str[-2:] == 'Mn'].head()

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
1,30853133,COMPLETED,2021-08-06 11:36:09,2021-09-05 11:36:32,262144Mn,20604.62M,-125 days +00:00:00,-126 days +23:59:37,1,1,cgw-platypus,0:0
2,30858137,COMPLETED,2021-08-06 19:04:39,2021-09-05 19:04:53,204800Mn,57553.77M,-125 days +00:00:00,-126 days +23:59:46,1,32,cgw-tbi01,0:0
3,30935078,COMPLETED,2021-08-09 16:52:51,2021-09-07 20:52:55,65536Mn,20577.96M,-121 days +00:00:00,-122 days +23:59:56,1,8,cgw-platypus,0:0
4,31364111_2,COMPLETED,2021-08-17 07:45:07,2021-09-10 16:45:24,16384Mn,9733.43M,-101 days +15:00:00,-101 days +14:59:43,1,1,production,0:0
5,31364111_3,COMPLETED,2021-08-17 07:45:07,2021-09-06 16:17:34,16384Mn,9708.04M,-101 days +15:00:00,-84 days +07:27:33,1,1,production,0:0


The **USEDMEM** column is the amount of memory used, in MB per node.
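
To compare requested and used memory, both columns first need to be turned into numbers. The sketch below works on two illustrative rows in the same format as the data (the second row's CPUS value is hypothetical). The new column names are made up for this example, and the conversion of a per-core request to a per-node figure by multiplying by cores per node is an assumption, not something the dataset documentation states.

```python
import pandas as pd

# Illustrative rows in the same format as the dataset (second row is hypothetical).
sample = pd.DataFrame(
    {
        "REQMEM": ["262144Mn", "4096Mc"],
        "USEDMEM": ["20604.62M", "1792.43M"],
        "NODES": [1, 1],
        "CPUS": [1, 4],
    }
)

# Separate the numeric part from the trailing 'Mc' (per core) or 'Mn' (per node).
req = sample["REQMEM"].str.extract(r"^(?P<value>\d+(?:\.\d+)?)M(?P<unit>[cn])$")
sample["REQMEM_MB"] = req["value"].astype(float)
sample["REQMEM_PER"] = req["unit"].map({"c": "core", "n": "node"})

# USEDMEM carries a trailing 'M' only.
sample["USEDMEM_MB"] = sample["USEDMEM"].str.rstrip("M").astype(float)

# Assumption: a per-core request scales with the number of cores on each node.
cores_per_node = sample["CPUS"] / sample["NODES"]
sample["REQMEM_MB_PER_NODE"] = sample["REQMEM_MB"].where(
    sample["REQMEM_PER"] == "node", sample["REQMEM_MB"] * cores_per_node
)
print(sample)
```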

The requested time (**REQTIME**) and used time (**USEDTIME**) columns are in d-hh:mm:ss format, or hh:mm:ss for jobs shorter than one day.
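
The negative REQTIME and USEDTIME values in the outputs above suggest that passing d-hh:mm:ss strings straight to `pd.to_timedelta` treats the hyphen as a minus sign: job 30853133 runs from 2021-08-06 11:36:09 to 2021-09-05 11:36:32, yet its USEDTIME is shown as `-126 days +23:59:37`. The sketch below is one possible workaround, assuming the raw strings look like `30-00:00:23`; `parse_slurm_duration` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def parse_slurm_duration(values):
    """Parse 'd-hh:mm:ss' or 'hh:mm:ss' strings into Timedeltas (hypothetical helper)."""
    text = pd.Series(values).astype(str)
    # Rewrite 'd-hh:mm:ss' as 'd days hh:mm:ss' so the day count is not read as a sign.
    return pd.to_timedelta(text.str.replace("-", " days ", regex=False))

# Assumed raw values: one multi-day duration and one shorter than a day.
durations = parse_slurm_duration(["30-00:00:23", "05:00:00"])
print(durations)
```

Applied to the REQTIME and USEDTIME columns before any arithmetic, a conversion like this would keep the durations positive and comparable with `END - BEGIN`.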

**NODES** is the number of servers used for the job. Most jobs run on a single node. For multi-node jobs, memory usage is the maximum over all nodes.

**CPUS** is the total number of CPU cores allocated to the job, and for multi-node jobs, this includes all nodes.

Most jobs are run in the "production" or "nogpfs" partition. The "debug" and "sam" partitions are for test jobs that are expected to be short, and the "maxwell", "pascal", and "turing" partitions are for GPU resources.

In [8]:
jobs['PARTITION'].value_counts()

PARTITION
production              7002182
nogpfs                   146659
pascal                   122963
sam                       64965
turing                    20638
maxwell                   10980
cgw-maizie                 4265
debug                      1434
cgw-platypus                371
cgw-dsi-gw                  227
cgw-capra1                  151
cgw-dougherty1              112
cgw-horus                    61
cgw-cqs1                     26
cgw-hanuman                  21
cgw-sideshowbob              13
cgw-vm-qa-flatearth1          9
cgw-tbi01                     7
Name: count, dtype: int64
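
The partition roles described above can be attached to each job as a coarse category. This is a small sketch: the group labels (`general`, `test`, `gpu`, `other`) are invented for illustration, and treating every partition not mentioned in the description (such as the `cgw-` ones) as `other` is an assumption.

```python
import pandas as pd

# Grouping taken from the description above; the labels themselves are hypothetical.
PARTITION_GROUPS = {
    "production": "general",
    "nogpfs": "general",
    "debug": "test",
    "sam": "test",
    "maxwell": "gpu",
    "pascal": "gpu",
    "turing": "gpu",
}

sample = pd.Series(["production", "sam", "turing", "cgw-platypus"], name="PARTITION")

# Partitions without a described role fall back to "other" (assumption).
groups = sample.map(PARTITION_GROUPS).fillna("other")
print(groups.value_counts())
```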

The **EXITCODE** gives the [exit code](https://www.agileconnection.com/article/overview-linux-exit-codes) for the job, with "0:0" indicating a successful job. Exit codes have two numbers: a non-zero first number indicates a problem on the server side, and a non-zero second number indicates a problem on the user side.
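
Splitting the two parts of the exit code into integer columns makes them easier to filter on than the raw string. A minimal sketch, using the three codes that this notebook queries; the column names `EXIT_FIRST`, `EXIT_SECOND`, and `SUCCESS` are hypothetical.

```python
import pandas as pd

# Exit codes in the 'first:second' format described above.
sample = pd.DataFrame({"EXITCODE": ["0:0", "1:0", "0:15"]})

# Split on the colon and convert both parts to integers.
codes = sample["EXITCODE"].str.split(":", expand=True).astype(int)
sample["EXIT_FIRST"] = codes[0]
sample["EXIT_SECOND"] = codes[1]

# A job counts as successful only when both numbers are zero.
sample["SUCCESS"] = (codes[0] == 0) & (codes[1] == 0)
print(sample)
```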

In [9]:
jobs['EXITCODE'].value_counts()

EXITCODE
0:0    7375084
Name: count, dtype: int64

In [10]:
jobs[jobs['EXITCODE'] == '1:0']

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE


In [11]:
jobs[jobs['EXITCODE'] == '0:15']

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE


In [12]:
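def df_to_datelist(df1):
    """Return the timestamps of log rows that indicate an unresponsive scheduler."""
    df = df1.copy(deep = True)

    # Column 3 looks like 'time <number>'; strip the label and convert to float.
    df[3] = df[3].str.replace('time ', '')
    df[3] = df[3].astype(float)

    # Keep the test user's commands that took at least 15 and returned code 1.
    df = df[df[1] == 'user 9204']
    df = df[df[3] >= 15]
    df = df[df[4] == "returncode 1"]

    # Keep only rows whose command (column 5) mentions sbatch.
    df = df[df[5].apply(lambda x: 'sbatch' in x)]

    # Column 0 holds the timestamp of each log entry.
    return pd.to_datetime(df[0]).to_list()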

The slurm_wrapper_ce5.log and slurm_wrapper_ce6.log files contain logs of jobs submitted from the Open Science Grid.

In [13]:
"""
chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    # chunk is a DataFrame. To "process" the rows in the chunk:
    for index, row in chunk.iterrows():
        print(row)
"""


'\nchunksize = 10 ** 6\nfor chunk in pd.read_csv(filename, chunksize=chunksize):\n    # chunk is a DataFrame. To "process" the rows in the chunk:\n    for index, row in chunk.iterrows():\n        print(row)\n'

In [14]:
%%time
ce5 = pd.read_csv('../data/slurm_wrapper_ce5.log',
                  header=None,
                  delimiter=' - ',
                  engine='python')

ce6 = pd.read_csv('../data/slurm_wrapper_ce6.log',
                  header=None,
                  delimiter=' - ',
                  engine='python')

ce5.head()

errors_ce6 = df_to_datelist(ce6)
errors_ce5 = df_to_datelist(ce5)
all_errors = errors_ce5 + errors_ce6


CPU times: total: 6.97 s
Wall time: 33.2 s


In [15]:
all_errors[0]

Timestamp('2020-10-18 06:53:44.272915')

For this project, we are interested in jobs from user 9204 (the test user) where the command starts with '/usr/bin/squeue', the returncode is non-zero, and the time is greater than 15. These conditions indicate that the scheduler became unresponsive at that point in time.

In [16]:
jobs['END'].value_counts()



END
2021-07-12 11:36:02    312
2021-02-17 16:45:58    297
2021-02-17 16:45:57    284
2021-02-25 23:56:42    274
2021-06-16 22:34:57    247
                      ... 
2021-06-26 13:15:40      1
2021-06-26 14:25:14      1
2021-06-26 14:15:18      1
2021-06-26 13:16:56      1
2020-10-31 23:49:43      1
Name: count, Length: 4100858, dtype: int64

In [17]:
jobs.loc[1]['END'] - jobs.loc[1]['BEGIN']

Timedelta('30 days 00:00:23')

In [18]:
jobs.loc[1]['END'] - jobs.loc[1]['REQTIME']

Timestamp('2022-01-08 11:36:32')

In [19]:
jobs.loc[1]['END'] - all_errors[1]

Timedelta('322 days 04:42:27.677588')

In [20]:
all_errors

[Timestamp('2020-10-18 06:53:44.272915'),
 Timestamp('2020-10-18 06:54:04.322412'),
 Timestamp('2020-10-18 07:47:25.825172'),
 Timestamp('2020-10-18 07:47:45.871008'),
 Timestamp('2020-10-18 07:53:33.972840'),
 Timestamp('2020-10-18 16:02:01.338468'),
 Timestamp('2020-10-18 20:52:15.737852'),
 Timestamp('2020-10-19 00:23:37.945125'),
 Timestamp('2020-10-19 00:23:57.979047'),
 Timestamp('2020-10-19 01:01:02.211847'),
 Timestamp('2020-10-19 01:01:22.392363'),
 Timestamp('2020-10-19 01:23:30.029296'),
 Timestamp('2020-10-19 01:23:50.057180'),
 Timestamp('2020-10-19 02:49:45.479887'),
 Timestamp('2020-10-19 03:38:35.181625'),
 Timestamp('2020-10-19 03:59:19.595905'),
 Timestamp('2020-10-19 17:54:52.404285'),
 Timestamp('2020-10-19 21:46:47.528895'),
 Timestamp('2020-10-20 00:18:40.214164'),
 Timestamp('2020-10-22 21:19:15.358639'),
 Timestamp('2020-10-28 14:24:14.349391'),
 Timestamp('2020-10-29 09:39:11.896357'),
 Timestamp('2020-10-29 09:39:32.287111'),
 Timestamp('2020-11-01 12:04:57.38

In [42]:
def count_jobs_before_interr(all_errors_func = all_errors, jobs_func = jobs, typeTime = 'h', countTime = 1, on = 'END'):
    """
    Calculates the number of jobs occurring within a specified time window 
    relative to each error timestamp, based on the relationship specified 
    (BEGIN, DURING, END, or ALL). Returns a DataFrame where each row corresponds 
    to an error and the number of jobs meeting the specified criteria.

    Parameters:
    ----------
    all_errors_func : pd.Series or iterable
        A list or Series of error timestamps. Each timestamp is used as 
        a reference point to count the jobs within the specified time window.

    jobs_func : pd.DataFrame
        A DataFrame containing job details with at least the following columns:
        - 'BEGIN': The start times of jobs.
        - 'END': The end times of jobs.

    typeTime : str, optional
        The unit of time for the countTime parameter. Accepted values are:
        - 'm': Minutes
        - 'h': Hours (default)
        - 'd': Days

    countTime : float, optional
        The size of the time window in the units specified by typeTime. 
        For example:
        - countTime=1 with typeTime='h' means a 1-hour window.
        - countTime=30 with typeTime='m' means a 30-minute window.

    on : str, optional
        Defines the relationship between the jobs and the error timestamp. 
        Accepted values are:
        - 'BEGIN': Count jobs whose start times fall within the time window 
                   before the error.
        - 'DURING': Count jobs that were active (spanning) during the error.
        - 'END': Count jobs whose end times fall within the time window 
                 before the error. (Default)
        - 'ALL': Generates a DataFrame with counts for all relationships:
            - 'Start Count': Number of jobs starting within the time window.
            - 'During Count': Number of jobs spanning the error timestamp.
            - 'End Count': Number of jobs ending within the time window.

    Returns:
    -------
    pd.DataFrame
        - For 'BEGIN', 'DURING', or 'END': A DataFrame where each row corresponds 
          to an error and its associated count of jobs based on the specified criteria.
        - For 'ALL': A DataFrame with columns 'Interruption Time', 'Start Count', 
          'During Count', and 'End Count'.

    Notes:
    -----
    - If an invalid value for `on` is provided, the function defaults to 'END' 
      and prints a warning message.
    - The 'ALL' option adds comprehensive job counts across all specified 
      relationships to the error timestamps.
    """
    
    
    
#     error_min_time = all_errors.min() - pd.Timedelta(hours=time_hours)
#         error_max_time = all_errors.max()

#          Filter jobs within the global range
#         jobs_filtered = jobs[(jobs['BEGIN'] <= error_max_time) & (jobs['END'] >= error_min_time)]
    
    time_dict = {
        'm': 60,
        'h': 1,
        'd': 1/24    
    }
    time_hours = countTime / time_dict[typeTime]
    # Use the function argument here, not the global all_errors list.
    error_min_time = min(all_errors_func) - pd.Timedelta(hours=time_hours)
    error_max_time = max(all_errors_func)
    on = on.strip().upper()
    all_errors_func = sorted(all_errors_func)
    
    if on == 'BEGIN':
        
        jobs_copy = jobs_func.copy(deep = True)
        jobs_copy = jobs_copy[(jobs_copy['BEGIN'] <= error_max_time) & (jobs_copy['BEGIN'] >= error_min_time)]
        jobs_copy = jobs_copy.sort_values('BEGIN')
        job_counts_for_interrupt = {}
        last_error_date = all_errors_func[0]
        for i, error in enumerate(tqdm(all_errors_func, desc="Processing Errors")):
            hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            if (error - last_error_date).days >= 30:
                jobs_copy = jobs_copy[jobs_copy['BEGIN'] >= hour_less_than_given]
                print(last_error_date)
                last_error_date = error
                
            
            count = ((jobs_copy['BEGIN'] > hour_less_than_given) & (jobs_copy['BEGIN'] <= error)).sum()
            job_counts_for_interrupt[error] = count
    
    elif on == 'DURING':
        
        jobs_copy = jobs_func.copy(deep = True)
        jobs_copy = jobs_copy[(jobs_copy['BEGIN'] <= error_max_time) & (jobs_copy['END'] >= error_min_time)]
        jobs_copy = jobs_copy.sort_values('END')
        job_counts_for_interrupt = {}
        last_error_date = all_errors_func[0]
        for i, error in enumerate(tqdm(all_errors_func, desc="Processing Errors")):
            hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            if (error - last_error_date).days >= 30:
                jobs_copy = jobs_copy[jobs_copy['END'] >= error]
                print(last_error_date)
                last_error_date = error
            #hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            count = ((jobs_copy['END'] > error) & (jobs_copy['BEGIN'] < error)).sum()
            job_counts_for_interrupt[error] = count
            
    elif on == 'END':
        
        jobs_copy = jobs_func.copy(deep = True)
        jobs_copy = jobs_copy[(jobs_copy['END'] <= error_max_time) & (jobs_copy['END'] >= error_min_time)]
        jobs_copy = jobs_copy.sort_values('END')
        job_counts_for_interrupt = {}
        last_error_date = all_errors_func[0]
        for i, error in enumerate(tqdm(all_errors_func, desc="Processing Errors")):
            hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            if (error - last_error_date).days >= 30:
                jobs_copy = jobs_copy[jobs_copy['END'] >= hour_less_than_given]
                print(last_error_date)
                last_error_date = error
            hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            count = ((jobs_copy['END'] > hour_less_than_given) & (jobs_copy['END'] <= error)).sum()
            job_counts_for_interrupt[error] = count
        
    elif on == 'ALL':
        
        jobs_copy = jobs_func.copy(deep = True)
        jobs_copy = jobs_copy[((jobs_copy['END'] <= error_max_time) & (jobs_copy['END'] >= error_min_time)) | ((jobs_copy['BEGIN'] <= error_max_time) & (jobs_copy['BEGIN'] >= error_min_time)) | ((jobs_copy['BEGIN'] <= error_max_time) & (jobs_copy['END'] >= error_min_time))]
        jobs_copy = jobs_copy.sort_values('END')
        
        job_counts_for_interrupt_begin = {}
        job_counts_for_interrupt_during = {}
        job_counts_for_interrupt_end = {}
        last_error_date = all_errors_func[0]

        for i, error in enumerate(tqdm(all_errors_func, desc="Processing Errors")):
            hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            if (error - last_error_date).days >= 30:
                jobs_copy = jobs_copy[jobs_copy['END'] >= hour_less_than_given]
                print(last_error_date)
                last_error_date = error

            countbegin = ((jobs_copy['BEGIN'] > hour_less_than_given) & (jobs_copy['BEGIN'] <= error)).sum()
            countduring = ((jobs_copy['END'] > error) & (jobs_copy['BEGIN'] < error)).sum()
            countend = ((jobs_copy['END'] > hour_less_than_given) & (jobs_copy['END'] <= error)).sum()
            
            job_counts_for_interrupt_begin[error] = countbegin
            job_counts_for_interrupt_during[error] = countduring
            job_counts_for_interrupt_end[error] = countend
            
        df1 =  pd.DataFrame(job_counts_for_interrupt_begin.items())
        #df.rename(columns={'A': 'a', 'B': 'c'}, inplace=True)
        df1.rename(columns = {1:'Start Count', 0:'Interruption Time'}, inplace = True)
        df1['During Count'] = job_counts_for_interrupt_during.values()
        df1['End Count'] = job_counts_for_interrupt_end.values()
        
        return df1
        
            
    else:
        
        
        
        print(f'Your "ON" variable of "{on}" was not found to be (BEGIN, END, DURING, or ALL), so defaulted to END.')
        
        jobs_copy = jobs_func.copy(deep = True)
        jobs_copy = jobs_copy[(jobs_copy['END'] <= error_max_time) & (jobs_copy['END'] >= error_min_time)]
        jobs_copy = jobs_copy.sort_values('END')
        job_counts_for_interrupt = {}
        last_error_date = all_errors_func[0]
        for i, error in enumerate(tqdm(all_errors_func, desc="Processing Errors")):
            hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            if (error - last_error_date).days >= 30:
                jobs_copy = jobs_copy[jobs_copy['END'] >= hour_less_than_given]
                print(last_error_date)
                last_error_date = error
            hour_less_than_given = error - pd.Timedelta(hours=time_hours)
            count = ((jobs_copy['END'] > hour_less_than_given) & (jobs_copy['END'] <= error)).sum()
            job_counts_for_interrupt[error] = count

        
    
    
    return pd.DataFrame(job_counts_for_interrupt.items())
   

In [23]:
# sort each list/dictionary by date (jobs by start or end depending) and then only go through specific dates each iteration.
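
The idea in the comment above, sorting once and then looking only at the relevant range for each error, can be sketched with `np.searchsorted`, which replaces the per-error scan of the whole DataFrame with two binary searches on a sorted array. This is an illustrative alternative rather than the notebook's method; `count_in_window` is a hypothetical helper, and it counts events in the half-open window `(error - window, error]`, the same bounds used for the 'BEGIN' and 'END' counts.

```python
import numpy as np
import pandas as pd

def count_in_window(event_times, error_times, window):
    """Count events in (error - window, error] for each error (hypothetical helper)."""
    # Sort the event timestamps once so that binary search can be used.
    events = np.sort(pd.to_datetime(pd.Series(event_times)).to_numpy())
    errors = pd.to_datetime(pd.Series(error_times)).to_numpy()
    # Number of events at or before the start of each window.
    lo = np.searchsorted(events, errors - window.to_timedelta64(), side="right")
    # Number of events at or before each error.
    hi = np.searchsorted(events, errors, side="right")
    return hi - lo

# Small made-up example: three job end times and two error timestamps.
ends = ["2021-01-01 00:10", "2021-01-01 00:50", "2021-01-01 02:00"]
errors = ["2021-01-01 01:00", "2021-01-01 02:00"]
counts = count_in_window(ends, errors, pd.Timedelta(hours=1))
print(counts)
```

With the real data, `event_times` would be `jobs['END']` or `jobs['BEGIN']` and `error_times` would be `all_errors`.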

In [50]:
dfall = count_jobs_before_interr(all_errors, jobs, 'h',1,'ALL')
dfall

Processing Errors:   3%|▎         | 91/3296 [00:09<10:44,  4.97it/s]

2020-10-18 06:16:25.392946


Processing Errors:  35%|███▍      | 1145/3296 [01:44<05:26,  6.59it/s]

2020-11-18 15:03:14.439449


Processing Errors:  49%|████▉     | 1615/3296 [02:23<04:13,  6.64it/s]

2020-12-18 15:18:59.450549


Processing Errors:  53%|█████▎    | 1735/3296 [02:33<03:25,  7.61it/s]

2021-01-17 16:28:49.469932


Processing Errors:  63%|██████▎   | 2061/3296 [02:56<02:29,  8.28it/s]

2021-02-16 16:58:11.049951


Processing Errors:  67%|██████▋   | 2193/3296 [03:04<01:37, 11.29it/s]

2021-03-19 08:29:39.070946


Processing Errors:  79%|███████▉  | 2611/3296 [03:26<00:48, 14.24it/s]

2021-04-18 14:59:17.312041


Processing Errors:  85%|████████▍ | 2791/3296 [03:35<00:29, 16.90it/s]

2021-05-20 08:10:47.902061


Processing Errors:  92%|█████████▏| 3046/3296 [03:46<00:13, 19.16it/s]

2021-06-19 11:57:52.170544


Processing Errors:  95%|█████████▌| 3138/3296 [03:49<00:05, 30.67it/s]

2021-07-19 12:36:10.601915


Processing Errors: 100%|██████████| 3296/3296 [03:51<00:00, 14.23it/s] 

2021-08-20 12:03:28.102533





Unnamed: 0,Interruption Time,Start Count,During Count,End Count
0,2020-10-18 06:16:25.392946,180,3083,199
1,2020-10-18 06:38:44.172473,174,3063,205
2,2020-10-18 06:53:44.272915,202,3065,215
3,2020-10-18 06:54:04.322412,202,3065,214
4,2020-10-18 07:47:25.825172,274,3064,271
...,...,...,...,...
3291,2021-09-24 18:14:35.862916,764,1730,455
3292,2021-09-24 19:13:14.894282,552,1956,330
3293,2021-10-02 08:14:16.557499,619,3520,626
3294,2021-10-02 18:29:08.267199,299,3550,344


In [51]:
jobs['end_day_hour'] = 

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
1,30853133,COMPLETED,2021-08-06 11:36:09,2021-09-05 11:36:32,262144Mn,20604.62M,-125 days +00:00:00,-126 days +23:59:37,1,1,cgw-platypus,0:0
2,30858137,COMPLETED,2021-08-06 19:04:39,2021-09-05 19:04:53,204800Mn,57553.77M,-125 days +00:00:00,-126 days +23:59:46,1,32,cgw-tbi01,0:0
3,30935078,COMPLETED,2021-08-09 16:52:51,2021-09-07 20:52:55,65536Mn,20577.96M,-121 days +00:00:00,-122 days +23:59:56,1,8,cgw-platypus,0:0
4,31364111_2,COMPLETED,2021-08-17 07:45:07,2021-09-10 16:45:24,16384Mn,9733.43M,-101 days +15:00:00,-101 days +14:59:43,1,1,production,0:0
5,31364111_3,COMPLETED,2021-08-17 07:45:07,2021-09-06 16:17:34,16384Mn,9708.04M,-101 days +15:00:00,-84 days +07:27:33,1,1,production,0:0
...,...,...,...,...,...,...,...,...,...,...,...,...
7395880,25493434,COMPLETED,2020-10-31 23:39:00,2020-10-31 23:40:46,2000Mn,0.09M,-9 days +16:00:00,0 days 00:01:46,1,1,sam,0:0
7395881,25493435,COMPLETED,2020-10-31 23:39:13,2020-10-31 23:40:38,2000Mn,187.92M,-9 days +16:00:00,0 days 00:01:25,1,1,sam,0:0
7395882,25493476,COMPLETED,2020-10-31 23:46:29,2020-10-31 23:49:43,4096Mc,803.97M,0 days 12:00:00,0 days 00:03:14,1,1,production,0:0
7395883,25493515,COMPLETED,2020-10-31 23:49:44,2020-10-31 23:51:40,2000Mn,0.09M,-9 days +16:00:00,0 days 00:01:56,1,1,sam,0:0
