# Project Overview
The Advanced Computing Center for Research and Education (ACCRE) operates Vanderbilt University's high-performance computing cluster. Jobs submitted to ACCRE are managed by the [Slurm scheduler](https://slurm.schedmd.com/documentation.html), which tracks compute and memory resources.

ACCRE staff have hypothesized that the scheduler sometimes becomes unresponsive because it is processing large bursts of job completions. This especially affects automated job submitters, such as members of the Open Science Grid.

The goal is to evaluate whether the data support the hypothesis that bursts of job completions contribute to scheduler unresponsiveness.

**Datasets:**
* fullsample.csv: Contains slurm job records. Job completions correspond to jobs in the "COMPLETED" state with exit code "0:0".  
* slurm_wrapper_ce5.log, slurm_wrapper_ce6.log: These log files contain every slurm command executed by the CE5 and CE6 servers (gateways to the Open Science Grid).
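
The completion definition above can be sketched as a boolean filter. This is a minimal illustration on a small hypothetical frame standing in for fullsample.csv; the names `toy_jobs` and `completions` are assumptions and are not used elsewhere in the notebook.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of fullsample.csv
toy_jobs = pd.DataFrame({
    "JOBID": ["1", "2", "3", "4"],
    "STATE": ["COMPLETED", "FAILED", "COMPLETED", "CANCELLED"],
    "EXITCODE": ["0:0", "1:0", "0:15", "0:0"],
})

# A job completion is a job in the COMPLETED state with exit code 0:0
is_completion = (toy_jobs["STATE"] == "COMPLETED") & (toy_jobs["EXITCODE"] == "0:0")
completions = toy_jobs.loc[is_completion]
print(completions["JOBID"].tolist())
```

Only the first toy row satisfies both conditions.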

Unresponsive periods are indicated by "sbatch" commands from user 9204 that have:  
* return code = 1
* execution time > 15 seconds
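
As a rough sketch, the criteria above translate into a boolean mask. The rows below are hypothetical and assume the log fields have already been split into columns named `USER`, `RUNTIME`, `RETURNCODE`, and `COMMAND`, with `RUNTIME` already numeric.

```python
import pandas as pd

# Hypothetical log rows (values invented for illustration)
toy_logs = pd.DataFrame({
    "USER": ["user 9204", "user 9204", "user 9201", "user 9204"],
    "RUNTIME": [20.5, 0.07, 30.0, 16.2],
    "RETURNCODE": ["returncode 1", "returncode 0", "returncode 1", "returncode 0"],
    "COMMAND": [
        "command ['/usr/bin/sbatch', 'a.sh']",
        "command ['/usr/bin/sbatch', 'b.sh']",
        "command ['/usr/bin/sbatch', 'c.sh']",
        "command ['/usr/bin/sbatch', 'd.sh']",
    ],
})

# sbatch commands issued by user 9204
is_sbatch_9204 = (toy_logs["USER"] == "user 9204") & toy_logs["COMMAND"].str.contains(
    "/usr/bin/sbatch", regex=False
)

# Unresponsive: return code of exactly 1 and more than 15 seconds of execution time
unresponsive = (
    is_sbatch_9204
    & (toy_logs["RETURNCODE"] == "returncode 1")
    & (toy_logs["RUNTIME"] > 15)
)
print(unresponsive.tolist())
```

The third row is slow and failed but belongs to another user, and the fourth is slow but returned 0, so only the first row is flagged.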

## Phase 1: Explore the Data
**Objectives:**
* Understand the purpose of each dataset.  
* Inspect column types, sizes, and example rows.  

**Notebook Sections:**
* Code: Load each dataset, preview rows, summarize columns.  
* Markdown: Notes on data quality and initial observations.  

In [1]:
# IMPORT PYTHON LIBRARIES
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.formula.api as smf

### Explore Job Data

In [2]:
# READ fullsample.csv
jobs_df = pd.read_csv("../data/fullsample.csv")

In [3]:
# Display dataframe information
jobs_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7395885 entries, 0 to 7395884
Data columns (total 12 columns):
 #   Column     Dtype 
---  ------     ----- 
 0   JOBID      object
 1   STATE      object
 2   BEGIN      object
 3   END        object
 4   REQMEM     object
 5   USEDMEM    object
 6   REQTIME    object
 7   USEDTIME   object
 8   NODES      int64 
 9   CPUS       int64 
 10  PARTITION  object
 11  EXITCODE   object
dtypes: int64(2), object(10)
memory usage: 677.1+ MB


In [4]:
# Display head and tail data
jobs_df

Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE
0,30616928,RUNNING,2021-07-31T22:15:00,Unknown,2048Mn,0,10:04:00,67-22:14:22,1,1,production,0:0
1,30853133,COMPLETED,2021-08-06T11:36:09,2021-09-05T11:36:32,262144Mn,20604.62M,30-00:00:00,30-00:00:23,1,1,cgw-platypus,0:0
2,30858137,COMPLETED,2021-08-06T19:04:39,2021-09-05T19:04:53,204800Mn,57553.77M,30-00:00:00,30-00:00:14,1,32,cgw-tbi01,0:0
3,30935078,COMPLETED,2021-08-09T16:52:51,2021-09-07T20:52:55,65536Mn,20577.96M,29-04:00:00,29-04:00:04,1,8,cgw-platypus,0:0
4,31364111_2,COMPLETED,2021-08-17T07:45:07,2021-09-10T16:45:24,16384Mn,9733.43M,24-09:00:00,24-09:00:17,1,1,production,0:0
...,...,...,...,...,...,...,...,...,...,...,...,...
7395880,25493434,COMPLETED,2020-10-31T23:39:00,2020-10-31T23:40:46,2000Mn,0.09M,2-00:00:00,00:01:46,1,1,sam,0:0
7395881,25493435,COMPLETED,2020-10-31T23:39:13,2020-10-31T23:40:38,2000Mn,187.92M,2-00:00:00,00:01:25,1,1,sam,0:0
7395882,25493476,COMPLETED,2020-10-31T23:46:29,2020-10-31T23:49:43,4096Mc,803.97M,12:00:00,00:03:14,1,1,production,0:0
7395883,25493515,COMPLETED,2020-10-31T23:49:44,2020-10-31T23:51:40,2000Mn,0.09M,2-00:00:00,00:01:56,1,1,sam,0:0


#### JOBID
Each row is a job with a unique ID. Jobs submitted as an array of similar jobs have an ID containing an underscore, where the number after the underscore is the task number. For example, JOBID 31781951 was an array job with 10 parts.
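
To make the ID convention concrete, here is a small sketch that separates the base job ID from the array task number. The variable names (`base_id`, `task_id`, `is_array`) are illustrative assumptions; the sample IDs are taken from the previews in this notebook.

```python
import pandas as pd

job_ids = pd.Series(["30616928", "31364111_2", "27209123_6864"])

# Split on the first underscore: column 0 is the base ID, column 1 the task number
parts = job_ids.str.split("_", n=1, expand=True)
base_id = parts[0]
task_id = parts[1]  # missing for jobs that are not part of an array

# Array tasks are the IDs that contain an underscore
is_array = job_ids.str.contains("_", regex=False)
print(pd.DataFrame({"BASE": base_id, "TASK": task_id, "IS_ARRAY": is_array}))
```

Grouping by `base_id` would then give the number of tasks recorded for each array job.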

In [5]:
# Inspect the JOBID values
jobs_df['JOBID'].value_counts().head()

JOBID
30616928         1
27209123_6864    1
27209123_6827    1
27209123_6826    1
27209123_6825    1
Name: count, dtype: int64

#### STATE
Jobs can be in several different states, the most common being 'COMPLETED'.

In [6]:
# Inspect the STATE values
jobs_df['STATE'].value_counts().head()

STATE
COMPLETED            7375084
CANCELLED               9055
FAILED                  3766
CANCELLED by 9201       1776
OUT_OF_MEMORY           1739
Name: count, dtype: int64

#### BEGIN
Indicates when the job was started (initiated on a compute node).

In [7]:
# Inspect the BEGIN values
jobs_df['BEGIN'].value_counts().head()

BEGIN
2020-12-11T08:26:59    579
2020-12-11T08:38:00    579
2020-12-11T08:36:01    577
2020-12-11T09:46:44    576
2020-12-11T09:28:45    572
Name: count, dtype: int64

#### END
Indicates when the job ended (completed, failed, or was cancelled while running).

In [8]:
# Inspect the END values
jobs_df['END'].value_counts().head()

END
Unknown                651
2020-12-18T09:41:23    559
2021-10-02T22:48:56    551
2021-07-12T11:36:02    312
2021-10-03T13:36:05    311
Name: count, dtype: int64

#### USEDMEM
The amount of memory used in MB per node.

In [9]:
# Inspect USEDMEM values
jobs_df['USEDMEM'].value_counts().head()

USEDMEM
0           1099732
0.09M         65651
6.23M         26712
6.24M         19920
1637.41M       8863
Name: count, dtype: int64

#### USEDTIME
The used time is formatted as d-hh:mm:ss, or hh:mm:ss for jobs shorter than one day.
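
A minimal sketch of how this format could be converted into a duration, should run time be needed as a numeric feature. The helper `slurm_time_to_timedelta` is hypothetical and not part of the notebook; it assumes every value has exactly one of the two layouts described above.

```python
import pandas as pd

def slurm_time_to_timedelta(value: str) -> pd.Timedelta:
    """Convert 'd-hh:mm:ss' or 'hh:mm:ss' into a Timedelta (hypothetical helper)."""
    # rpartition leaves `days` empty when there is no day component
    days, _, clock = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return pd.Timedelta(days=int(days or 0), hours=hours, minutes=minutes, seconds=seconds)

for raw in ["00:01:46", "30-00:00:23", "67-22:14:22"]:
    print(raw, "->", slurm_time_to_timedelta(raw))
```

Applied to a column, this would be `jobs_df['USEDTIME'].map(slurm_time_to_timedelta)`, although values outside the two layouts would raise an error and would need separate handling.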

In [10]:
# Inspect USEDTIME values
jobs_df['USEDTIME'].value_counts().head()

USEDTIME
00:00:07    41436
00:00:08    39442
00:00:10    39327
00:00:06    38977
00:00:09    38476
Name: count, dtype: int64

#### EXITCODE
The [exit code](https://www.agileconnection.com/article/overview-linux-exit-codes) for the job, with "0:0" indicating a successful job. Slurm reports exit codes as two numbers separated by a colon: the first is the exit status returned by the job script, and the second is the number of the signal that terminated the job, if any (for example, "0:9" indicates a job killed by signal 9, SIGKILL).
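
The two-part code can be split into integer columns for filtering. This is only a sketch on sample values from this dataset; the column names `FIRST` and `SECOND` are placeholders chosen for illustration.

```python
import pandas as pd

exit_codes = pd.Series(["0:0", "1:0", "0:15", "0:125", "0:9"])

# Split "a:b" into two integer columns
parts = exit_codes.str.split(":", expand=True).astype(int)
parts.columns = ["FIRST", "SECOND"]

# A job succeeded only when both numbers are zero
succeeded = (parts["FIRST"] == 0) & (parts["SECOND"] == 0)
print(parts.assign(SUCCEEDED=succeeded))
```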

In [11]:
# Inspect EXITCODE values
jobs_df['EXITCODE'].value_counts().head()

EXITCODE
0:0      7384480
1:0         4958
0:15        1887
0:125       1739
0:9         1361
Name: count, dtype: int64

### Explore CE5 logs

In [12]:
# READ slurm_wrapper_ce5.log
ce5_df = pd.read_csv('../data/slurm_wrapper_ce5.log', header=None, delimiter=' - ', engine='python')

In [13]:
# Display dataframe information
ce5_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4770893 entries, 0 to 4770892
Data columns (total 6 columns):
 #   Column  Dtype 
---  ------  ----- 
 0   0       object
 1   1       object
 2   2       object
 3   3       object
 4   4       object
 5   5       object
dtypes: object(6)
memory usage: 218.4+ MB


In [14]:
# Display the first rows
ce5_df.head()

Unnamed: 0,0,1,2,3,4,5
0,2020-10-16 08:15:39.278699,user 0,retry 0,time 0.07347559928894043,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
1,2020-10-16 08:18:08.313309,user 0,retry 0,time 0.18363237380981445,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
2,2020-10-16 08:22:48.128689,user 0,retry 0,time 0.07547116279602051,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
3,2020-10-16 08:25:13.257408,user 0,retry 0,time 0.09484362602233887,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
4,2020-10-16 08:31:01.460723,user 0,retry 0,time 0.07498788833618164,returncode 0,"command ['/usr/bin/sacct', '-u', 'appelte1', '..."
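
Each field in the preview carries a text prefix ("user ", "retry ", "time ", "returncode "). The sketch below shows one way to turn such fields into typed columns. The two sample lines are hypothetical: they assume the raw log separates fields with " - ", as the `delimiter` argument above implies, and the command arguments are invented.

```python
import io
import pandas as pd

# Hypothetical raw log lines in the assumed " - "-separated layout
sample = (
    "2020-10-16 08:15:39.278699 - user 0 - retry 0 - time 0.07347559928894043"
    " - returncode 0 - command ['/usr/bin/sacct', '-u', 'someuser']\n"
    "2020-10-16 10:37:44.163454 - user 9202 - retry 0 - time 0.08495402336120605"
    " - returncode 0 - command ['/usr/bin/scontrol', 'show', 'job']\n"
)

names = ["TIMESTAMP", "USER", "RETRY", "RUNTIME", "RETURNCODE", "COMMAND"]
toy = pd.read_csv(io.StringIO(sample), header=None, sep=" - ", engine="python", names=names)

# Strip the text prefixes and convert each field to a numeric or datetime type
toy["TIMESTAMP"] = pd.to_datetime(toy["TIMESTAMP"])
for column, prefix in [("USER", "user "), ("RETRY", "retry "), ("RETURNCODE", "returncode ")]:
    toy[column] = toy[column].str.replace(prefix, "", regex=False).astype(int)
toy["RUNTIME"] = toy["RUNTIME"].str.replace("time ", "", regex=False).astype(float)

print(toy.dtypes)
```

Numeric columns take less memory than object columns and allow direct comparisons such as `RUNTIME > 15`.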


In [15]:
# Inspect column 0 values
ce5_df[0].value_counts().head()

0
2021-03-04 14:18:01.059810    2
2021-03-18 11:10:18.373655    2
2021-03-06 22:34:13.472133    2
2020-10-21 20:49:16.303212    2
2021-07-12 11:43:35.543998    2
Name: count, dtype: int64

In [16]:
# Inspect column 1 values
ce5_df[1].value_counts().head()

1
user 9201    3093747
user 9202     639795
user 9203     386689
user 9221     312727
user 9219     178075
Name: count, dtype: int64

In [17]:
# Inspect column 2 values
ce5_df[2].value_counts().head()

2
retry 0    4345805
retry 1     369962
retry 2      55126
Name: count, dtype: int64

In [18]:
# Inspect column 3 values
ce5_df[3].value_counts().head()

3
time 0.10300111770629883     19
time 0.020383119583129883    19
time 0.10220503807067871     18
time 0.10304713249206543     18
time 0.1036684513092041      18
Name: count, dtype: int64

In [19]:
# Inspect column 4 values
ce5_df[4].value_counts().head()

4
returncode 0      4053244
returncode 1       697666
returncode 140      13735
returncode 255       6242
returncode 8            6
Name: count, dtype: int64

In [20]:
# Inspect column 5 values
ce5_df[5].value_counts().head()

5
command ['/usr/bin/scontrol', 'show', 'job']                      551116
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'cmspilot']       59796
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'lscpilot']       56818
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'uscmslocal']     24164
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'cmslocal']       23133
Name: count, dtype: int64

### Explore CE6 logs

In [22]:
# READ slurm_wrapper_ce6.log
ce6_df = pd.read_csv('../data/slurm_wrapper_ce6.log', header=None, delimiter=' - ', engine='python')

In [23]:
# Display dataframe information
ce6_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4776520 entries, 0 to 4776519
Data columns (total 6 columns):
 #   Column  Dtype 
---  ------  ----- 
 0   0       object
 1   1       object
 2   2       object
 3   3       object
 4   4       object
 5   5       object
dtypes: object(6)
memory usage: 218.7+ MB


In [24]:
# Display the first rows
ce6_df.head()

Unnamed: 0,0,1,2,3,4,5
0,2020-10-16 10:37:44.163454,user 9202,retry 0,time 0.08495402336120605,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
1,2020-10-16 10:37:44.206654,user 9202,retry 0,time 0.08943057060241699,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
2,2020-10-16 10:37:44.218760,user 9202,retry 0,time 0.05928945541381836,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
3,2020-10-16 10:37:44.256403,user 9202,retry 0,time 0.038695573806762695,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."
4,2020-10-16 10:37:44.611603,user 9202,retry 0,time 0.03343677520751953,returncode 0,"command ['/usr/bin/scontrol', 'show', 'job', '..."


In [25]:
# Inspect column 0 values
ce6_df[0].value_counts().head()

0
2021-09-24 04:20:47.691512    2
2020-10-21 05:35:11.193169    2
2021-02-28 10:07:50.419558    2
2021-02-27 17:54:52.010808    2
2020-11-23 00:26:07.162025    2
Name: count, dtype: int64

In [26]:
# Inspect column 1 values
ce6_df[1].value_counts().head()

1
user 9201    2710665
user 9202     653123
user 9203     448291
user 9219     440225
user 9221     369354
Name: count, dtype: int64

In [27]:
# Inspect column 2 values
ce6_df[2].value_counts().head()

2
retry 0    4299816
retry 1     425174
retry 2      51530
Name: count, dtype: int64

In [28]:
# Inspect column 3 values
ce6_df[3].value_counts().head()

3
time 0.10167908668518066    16
time 0.10226058959960938    15
time 0.10218477249145508    15
time 0.10299229621887207    15
time 0.10190343856811523    14
Name: count, dtype: int64

In [29]:
# Inspect column 4 values
ce6_df[4].value_counts().head()

4
returncode 0      4165185
returncode 1       598974
returncode 140      11252
returncode 255       1105
returncode 8            4
Name: count, dtype: int64

In [30]:
# Inspect column 5 values
ce6_df[5].value_counts().head()

5
command ['/usr/bin/scontrol', 'show', 'job']                      987351
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'cmspilot']       57020
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'lscpilot']       54705
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'uscmslocal']     24166
command ['/usr/bin/squeue', '-o', '%i %T', '-u', 'cmslocal']       23972
Name: count, dtype: int64

## Phase 2: Clean and Transform the Data
**Objectives:**
* Extract job completions from fullsample.csv.  
* Parse CE5 and CE6 logs to identify unresponsive events.  
* Create analysis-ready features (time windows, completion counts, unresponsiveness indicators).  
* Optionally include other features (currently running jobs or resource usage, time-of-day).  

**Notebook Sections:**
* Code: Filtering and transforming datasets.  
* Markdown: Document preprocessing steps and reasoning.  
* Code: Combine datasets into a single dataset suitable for analysis.

### Clean and transform log data

In [31]:
# Concatenate ce5 and ce6 logs
logs_df = pd.concat([ce5_df, ce6_df])
logs_df.shape

(9547413, 6)

In [32]:
# Rename log columns
logs_df = logs_df.rename(columns={0: "TIMESTAMP", 1: "USER", 2: "RETRY", 3: "RUNTIME", 4: "RETURNCODE", 5: "COMMAND"})
logs_df.columns

Index(['TIMESTAMP', 'USER', 'RETRY', 'RUNTIME', 'RETURNCODE', 'COMMAND'], dtype='object')

In [40]:
# Convert RUNTIME values to floats
logs_df['RUNTIME'] = logs_df['RUNTIME'].str.replace("time ", "").astype(float)
logs_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9547413 entries, 0 to 4776519
Data columns (total 6 columns):
 #   Column      Dtype  
---  ------      -----  
 0   TIMESTAMP   object 
 1   USER        object 
 2   RETRY       object 
 3   RUNTIME     float64
 4   RETURNCODE  object 
 5   COMMAND     object 
dtypes: float64(1), object(5)
memory usage: 509.9+ MB


In [41]:
# Convert TIMESTAMP values to datetime objects
logs_df['TIMESTAMP'] = pd.to_datetime(logs_df['TIMESTAMP'], format="ISO8601")
logs_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9547413 entries, 0 to 4776519
Data columns (total 6 columns):
 #   Column      Dtype         
---  ------      -----         
 0   TIMESTAMP   datetime64[ns]
 1   USER        object        
 2   RETRY       object        
 3   RUNTIME     float64       
 4   RETURNCODE  object        
 5   COMMAND     object        
dtypes: datetime64[ns](1), float64(1), object(4)
memory usage: 509.9+ MB


In [42]:
# Keep only sbatch commands from user 9204, sorted by time
sbatch_9204_logs_df = (
    logs_df
    .loc[(logs_df['USER'] == "user 9204") & (logs_df['COMMAND'].str.contains("/usr/bin/sbatch"))]
    .sort_values("TIMESTAMP")
    .reset_index(drop=True)
)
sbatch_9204_logs_df

Unnamed: 0,TIMESTAMP,USER,RETRY,RUNTIME,RETURNCODE,COMMAND
0,2020-10-16 08:34:42.779719,user 9204,retry 0,5.240251,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
1,2020-10-16 08:53:15.711346,user 9204,retry 0,0.142444,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
2,2020-10-16 09:04:56.472464,user 9204,retry 0,0.066345,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
3,2020-10-16 09:24:26.694758,user 9204,retry 0,0.072883,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
4,2020-10-16 09:34:24.594440,user 9204,retry 0,0.076409,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
...,...,...,...,...,...,...
61242,2021-10-07 21:30:31.591816,user 9204,retry 0,0.030386,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
61243,2021-10-07 21:39:15.524139,user 9204,retry 0,0.033684,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
61244,2021-10-07 21:44:59.008524,user 9204,retry 0,0.029954,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."
61245,2021-10-07 21:53:47.800229,user 9204,retry 0,0.030737,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr..."


In [44]:
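# Create a column for unresponsive logs
# Note: this flags every non-zero return code, which is broader than the "return code = 1" criterion in the Project Overview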
sbatch_9204_logs_df['UNRESPONSIVE'] = ((sbatch_9204_logs_df['RUNTIME'] > 15) & (sbatch_9204_logs_df['RETURNCODE'] != "returncode 0"))
sbatch_9204_logs_df['UNRESPONSIVE'].value_counts()

UNRESPONSIVE
False    57951
True      3296
Name: count, dtype: int64

### Clean and transform job data

In [45]:
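# Convert END values to datetime objects and sort them
# Note: sort_values().reset_index(drop=True) reorders END on its own, so each END value no longer lines up with the JOBID, STATE, etc. on its row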
jobs_df['END'] = pd.to_datetime(jobs_df['END'], format="ISO8601", errors="coerce").sort_values().reset_index(drop=True)
jobs_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7395885 entries, 0 to 7395884
Data columns (total 12 columns):
 #   Column     Dtype         
---  ------     -----         
 0   JOBID      object        
 1   STATE      object        
 2   BEGIN      object        
 3   END        datetime64[ns]
 4   REQMEM     object        
 5   USEDMEM    object        
 6   REQTIME    object        
 7   USEDTIME   object        
 8   NODES      int64         
 9   CPUS       int64         
 10  PARTITION  object        
 11  EXITCODE   object        
dtypes: datetime64[ns](1), int64(2), object(9)
memory usage: 677.1+ MB
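
If END needs to stay attached to the rest of each job record (for example, to filter completions by STATE and EXITCODE afterwards), one option is to sort the whole frame by END, not the END column alone. The sketch below uses a small hypothetical frame and is an alternative for illustration, not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for a few job records
toy_jobs = pd.DataFrame({
    "JOBID": ["a", "b", "c"],
    "END": ["2021-09-05T11:36:32", "Unknown", "2020-10-31T23:40:46"],
})

# Parse END; values such as "Unknown" become NaT
toy_jobs["END"] = pd.to_datetime(toy_jobs["END"], format="ISO8601", errors="coerce")

# Drop unknown END times, then sort entire rows so JOBID stays with its END
toy_jobs = toy_jobs.dropna(subset=["END"]).sort_values("END").reset_index(drop=True)
print(toy_jobs)
```

The result is still sorted by END, which time-based rolling windows require, and each END value remains on the row of the job it belongs to.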


In [46]:
# Remove unknown END times; copy so that later column assignments modify an independent frame
jobs_df = jobs_df.dropna(subset=["END"]).copy()
jobs_df.shape

(7395234, 12)

In [47]:
# Create a column with the rolling count of jobs ending in the 1-minute window up to each END time
jobs_df['JOBS_ENDING_PER_MIN'] = jobs_df.rolling(window="1min", on="END")['END'].count()
jobs_df.head()


Unnamed: 0,JOBID,STATE,BEGIN,END,REQMEM,USEDMEM,REQTIME,USEDTIME,NODES,CPUS,PARTITION,EXITCODE,JOBS_ENDING_PER_MIN
0,30616928,RUNNING,2021-07-31T22:15:00,2020-10-01 00:10:15,2048Mn,0,10:04:00,67-22:14:22,1,1,production,0:0,1.0
1,30853133,COMPLETED,2021-08-06T11:36:09,2020-10-01 00:12:58,262144Mn,20604.62M,30-00:00:00,30-00:00:23,1,1,cgw-platypus,0:0,1.0
2,30858137,COMPLETED,2021-08-06T19:04:39,2020-10-01 00:13:31,204800Mn,57553.77M,30-00:00:00,30-00:00:14,1,32,cgw-tbi01,0:0,2.0
3,30935078,COMPLETED,2021-08-09T16:52:51,2020-10-01 00:17:23,65536Mn,20577.96M,29-04:00:00,29-04:00:04,1,8,cgw-platypus,0:0,1.0
4,31364111_2,COMPLETED,2021-08-17T07:45:07,2020-10-01 00:17:53,16384Mn,9733.43M,24-09:00:00,24-09:00:17,1,1,production,0:0,2.0
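
A time-based rolling window counts the events in the 60 seconds leading up to (and including) each timestamp; it is not a count per calendar minute. The toy example below illustrates this with five hypothetical end times, using a Series indexed by time for brevity.

```python
import pandas as pd

# Five hypothetical job end times, already sorted
ends = pd.to_datetime([
    "2020-10-01 00:00:00",
    "2020-10-01 00:00:30",
    "2020-10-01 00:00:59",
    "2020-10-01 00:01:10",
    "2020-10-01 00:03:00",
])

# One observation per job end; the rolling count covers the window (t - 1min, t]
ones = pd.Series(1, index=ends)
trailing_counts = ones.rolling("1min").count()
print(trailing_counts)
```

The fourth end time (00:01:10) gets a count of 3 even though it is the only job ending in its calendar minute, because the jobs at 00:00:30 and 00:00:59 fall inside its trailing window.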


### Get count of jobs ending per minute for each log

In [48]:
def get_job_end_count_by_timestamp(timestamp) -> float:
    """Return the maximum rolling job-end count among jobs that ended in the same calendar minute as `timestamp`.

    Params:
        timestamp (pd.Timestamp): time of the log entry
    Returns:
        float: the maximum JOBS_ENDING_PER_MIN for that minute, or NaN if no job ended in it
    """
    same_minute = jobs_df['END'].dt.floor("min") == timestamp.floor("min")
    return jobs_df.loc[same_minute, 'JOBS_ENDING_PER_MIN'].max()
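
Calling this function once per log entry scans the full jobs table each time. A possible alternative, sketched here on hypothetical toy frames, computes the per-minute maximum once and then looks up each log's minute; `max_per_minute` is an illustrative name.

```python
import pandas as pd

# Hypothetical job end times with their rolling counts
toy_jobs = pd.DataFrame({
    "END": pd.to_datetime([
        "2020-10-16 08:34:05", "2020-10-16 08:34:50", "2020-10-16 08:53:10",
    ]),
    "JOBS_ENDING_PER_MIN": [1.0, 2.0, 1.0],
})

# Hypothetical sbatch log timestamps
toy_logs = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2020-10-16 08:34:42.779719",
        "2020-10-16 08:53:15.711346",
        "2020-10-16 09:04:56.472464",
    ]),
})

# Maximum rolling count within each calendar minute, computed once
max_per_minute = (
    toy_jobs.groupby(toy_jobs["END"].dt.floor("min"))["JOBS_ENDING_PER_MIN"].max()
)

# Look up each log's minute; minutes in which no job ended map to NaN
toy_logs["JOBS_END_PER_MIN"] = toy_logs["TIMESTAMP"].dt.floor("min").map(max_per_minute)
print(toy_logs)
```

This produces the same values as the per-row lookup, including NaN for minutes with no job endings, while touching the jobs table only once.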
    

In [49]:
# Look up the job-end count for the minute of each sbatch log entry
sbatch_9204_logs_df['JOBS_END_PER_MIN'] = sbatch_9204_logs_df['TIMESTAMP'].apply(func=get_job_end_count_by_timestamp)
sbatch_9204_logs_df

Unnamed: 0,TIMESTAMP,USER,RETRY,RUNTIME,RETURNCODE,COMMAND,UNRESPONSIVE,JOBS_END_PER_MIN
0,2020-10-16 08:34:42.779719,user 9204,retry 0,5.240251,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,8.0
1,2020-10-16 08:53:15.711346,user 9204,retry 0,0.142444,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,16.0
2,2020-10-16 09:04:56.472464,user 9204,retry 0,0.066345,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,12.0
3,2020-10-16 09:24:26.694758,user 9204,retry 0,0.072883,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,15.0
4,2020-10-16 09:34:24.594440,user 9204,retry 0,0.076409,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,129.0
...,...,...,...,...,...,...,...,...
61242,2021-10-07 21:30:31.591816,user 9204,retry 0,0.030386,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,
61243,2021-10-07 21:39:15.524139,user 9204,retry 0,0.033684,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,
61244,2021-10-07 21:44:59.008524,user 9204,retry 0,0.029954,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,
61245,2021-10-07 21:53:47.800229,user 9204,retry 0,0.030737,returncode 0,"command ['/usr/bin/sbatch', '/tmp/condor_g_scr...",False,


In [54]:
# Convert the boolean flag to 0/1 for use as the regression outcome
sbatch_9204_logs_df['UNRESPONSIVE'] = sbatch_9204_logs_df['UNRESPONSIVE'].astype(int)

In [56]:
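# Save the prepared sbatch logs
sbatch_9204_logs_df.to_csv("../data/sbatch_9204_logs_df.csv", index=False)

# Once this has run, you can load the saved data instead of re-running the steps above
# sbatch_9204_logs_df = pd.read_csv("../data/sbatch_9204_logs_df.csv", parse_dates=["TIMESTAMP"])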

## Phase 3: Analyze and Visualize
**Objectives:**
* Explore the relationship between job completions and unresponsiveness.  
* Create visualizations and basic summary statistics.  

**Notebook Sections:**
* Code: Time-series plots, scatterplots, boxplots, summary statistics.
* Markdown: Interpret the visualizations and describe patterns.  
* Code: Fit a simple logistic regression to test the hypothesis.
* Markdown: Summarize the results and draw conclusions from the model.  
* Optional: Explore additional factors (e.g., day of week).

In [55]:
# Fit a logistic regression of unresponsiveness on the job-end count
unresponsive_end_model = smf.logit(formula="UNRESPONSIVE ~ JOBS_END_PER_MIN", data=sbatch_9204_logs_df).fit()
unresponsive_end_model.summary()

Optimization terminated successfully.
         Current function value: 0.178930
         Iterations 7


| Dep. Variable: | UNRESPONSIVE | No. Observations: | 55293 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 55291 |
| Method: | MLE | Df Model: | 1 |
| Date: | Mon, 15 Dec 2025 | Pseudo R-squ.: | 0.0001341 |
| Time: | 09:04:38 | Log-Likelihood: | -9893.6 |
| converged: | True | LL-Null: | -9894.9 |
| Covariance Type: | nonrobust | LLR p-value: | 0.1033 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -3.0718 | 0.024 | -129.515 | 0.000 | -3.118 | -3.025 |
| JOBS_END_PER_MIN | -0.0010 | 0.001 | -1.582 | 0.114 | -0.002 | 0.000 |


### HYPOTHESIS: The Slurm scheduler becomes unresponsive more often when many jobs end at the same time.

### NULL HYPOTHESIS: The number of jobs ending per minute has no association with scheduler unresponsiveness.

### ANALYSIS: The coefficient for jobs ending per minute (-0.0010) has a p-value of 0.114, which is above the 0.05 threshold for statistical significance, and the pseudo R-squared (0.0001341) is close to zero. Therefore we cannot reject the null hypothesis: this model does not show that the number of jobs ending per minute contributes to scheduler unresponsiveness.
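
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. The short calculation below works through the fitted values reported in the summary table. It is a reading aid under the assumption that the model is specified as shown, and because the slope is not statistically significant, the numbers are descriptive only.

```python
import math

# Coefficients copied from the model summary above
intercept = -3.0718
slope = -0.0010

# Odds ratio for one additional job ending per minute, and for 100 additional jobs
odds_ratio_per_job = math.exp(slope)
odds_ratio_per_100_jobs = math.exp(100 * slope)

# Predicted probability of an unresponsive sbatch call when JOBS_END_PER_MIN is 0
baseline_probability = 1 / (1 + math.exp(-intercept))

print(f"Odds ratio per job: {odds_ratio_per_job:.4f}")
print(f"Odds ratio per 100 jobs: {odds_ratio_per_100_jobs:.4f}")
print(f"Baseline probability: {baseline_probability:.4f}")
```

An odds ratio just below 1 means the estimated odds of unresponsiveness fall slightly as more jobs end per minute, which is the opposite direction from the hypothesis, though the confidence interval for the coefficient includes zero.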