# Model Time Analysis (BETA)

### READ BEFORE STARTING:

### YOU HAVE TO SET UP YOUR LOG FILES TO RUN THIS CODE.
#### Option 1) Editing submit.py so all trained models output their logs in a location for the program.
1) Open submit.py and go to line 64. If the numbering differs in your copy, look for the line that starts with f'log... (it should be somewhere around lines 60-65)
2) Change that line to the following: ```f'log = {os.path.join("condor", "logs", f"{args.model_name}.log")}',```
3) Retrain any models you want to analyze so that their logs are written to the new location
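
To see what the new line from step 2 evaluates to, here is a minimal sketch. The `args` object below is a hypothetical stand-in for the arguments parsed by submit.py, and `comp_baseline` is simply the example model name used later in this notebook.

```python
import os
from types import SimpleNamespace

# Hypothetical stand-in for the parsed command-line arguments in submit.py
args = SimpleNamespace(model_name="comp_baseline")

# The same expression as in step 2 (without the trailing comma)
log_line = f'log = {os.path.join("condor", "logs", f"{args.model_name}.log")}'
print(log_line)
```

On Linux this prints `log = condor/logs/comp_baseline.log`, so each model's log ends up in a file named after the model.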
#### Option 2) Manually input the logs yourself
1) Open the terminal, and ssh into submit
2) Run the following command: ```cd /scratch/YOUR_NAME/icetop-cnn/condor/logs/``` (Take note of the YOUR_NAME field, be sure to change it to your own username!)
3) Run ```ls``` to see all of the logs
4) Once you've found a log you want to copy, run the following command: ```cat DESIRED_LOG.log``` (This will display all the file contents)
5) Keep the terminal open. On the left-hand side of the screen, navigate to: ```./home/YOUR_NAME/icetop-cnn/condor```
6) Once there, create a new folder named ```logs```
7) Open up the new directory, and create a new file ```DESIRED_LOG.log```, where DESIRED_LOG is the same name as the log you opened in the terminal
8) Open the new file, copy the contents from the terminal, and paste them into the new log file
9) Repeat for every log you want, then update submit.py (see Option 1) so you never have to do this again!
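
If both directories are reachable from the same machine, the manual copy-and-paste in steps 4 through 8 could be replaced by a short script. This is a minimal sketch, not part of the repository: `copy_logs` is a hypothetical helper, and the example paths in the comment are assumptions that still need YOUR_NAME replaced.

```python
import os
import shutil
from glob import glob

def copy_logs(src_dir, dst_dir):
    """Copy every .log file from src_dir into dst_dir and return the copied names."""
    os.makedirs(dst_dir, exist_ok=True)  # create the logs folder if it is missing
    copied = []
    for path in sorted(glob(os.path.join(src_dir, "*.log"))):
        shutil.copy2(path, dst_dir)
        copied.append(os.path.basename(path))
    return copied

# Example call (assumed paths, edit before running):
# copy_logs("/scratch/YOUR_NAME/icetop-cnn/condor/logs",
#           "/home/YOUR_NAME/icetop-cnn/condor/logs")
```

If the two locations are not visible from the same machine, the manual steps above still apply.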

#### THIS IS A BETA!! If you encounter any bugs or errors please report them to me!!

## Import Required Modules

In [30]:
%matplotlib inline

import json
import os
from glob import glob

import matplotlib.pyplot as plt
import numpy as np

from utils import get_cuts, get_event_parameters, get_training_assessment_cut

from datetime import datetime as dt
from datetime import timedelta as td

## Model and Assessment Selection

In [35]:
# The keys will be the names of the models you wish to analyze.
# The values will be the nuclei to assess for each model.
MODEL_NAMES_AND_NUCLEI = {
    'comp_baseline': 'phof',
}

## Training Resource Consumption

### Access the Condor Logs

In [36]:
# Get the directory of where all the logs are located
LOG_DIR = os.path.join(os.getcwd(), 'condor', 'logs')

# Get every log file in the log directory, with its full file path
ALL_LOGS = glob(os.path.join(LOG_DIR, '*.log'))

# Create a second list that contains only the model name for each log
# (the file name without its directory or '.log' extension)
LOG_NAMES = []
for log in ALL_LOGS:
    LOG_NAMES.append(os.path.splitext(os.path.basename(log))[0])

# Create a third list that only contains the logs of the models that are being assessed 
MODEL_LOGS = sorted(set(LOG_NAMES).intersection(MODEL_NAMES_AND_NUCLEI))
print(MODEL_LOGS)
# Create a list of file objects, one per log
# This is a function so the time and memory analyses can be run separately,
# each opening the files fresh and closing them properly afterwards
def get_log_files():
    LOG_FILES = []
    for log in MODEL_LOGS:
        LOG_FILES.append(
            open(os.path.join(LOG_DIR, log + '.log'))
        )
    return LOG_FILES

[]
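
The empty list printed above means that no log in `LOG_DIR` matched a key of `MODEL_NAMES_AND_NUCLEI`, so the cells that follow have nothing to analyze. A minimal sketch of a check that reports which models are missing a log; `find_missing_logs` is a hypothetical helper and is not provided by `utils`.

```python
import os

def find_missing_logs(log_dir, model_names):
    """Return the model names that have no '<name>.log' file in log_dir."""
    return sorted(
        name for name in model_names
        if not os.path.isfile(os.path.join(log_dir, name + ".log"))
    )

# Example with the names used in this notebook:
# missing = find_missing_logs(LOG_DIR, MODEL_NAMES_AND_NUCLEI)
# if missing:
#     print("No log found for:", missing)
```

If a model shows up as missing, check that the notebook is run from the directory that contains `condor/logs` and that the log setup described at the top has been completed.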


### Assess Time Needed to Train

In [37]:
#Read each log file line by line
#The end time comes from a line like: 005 (238358931.000.000) 2025-09-11 21:57:58 Job terminated.
##which appears right above the run-time stats
#The start time comes from a line like: 040 (238358931.000.000) 2025-09-11 19:19:24 Started transferring input files
##which is when the input files start being transferred
#The training time is the end time minus the start time

# Track what job is currently being worked on 
jobCount = 1
# Create list to store dicts of modelNameJobNum and CPU Time
JOB_INFO = []
# Go model by model 
for file in get_log_files():
    # On new file, reset variables to initial state
    jobCount = 1
    # Go line by line 
    for line in file.readlines():
        # Get the start time
        if line.find("Started transferring input files") != -1:  
            #Calculate year, month, day of the start
            year  = int(line[ line.find('-')-4 :  line.find('-')    ])
            month = int(line[ line.find('-')+1 : line.rfind('-')    ]) 
            day   = int(line[line.rfind('-')+1 : line.rfind('-') + 3])
            #Calculate hour, minute, second of the start
            hr    = int(line[ line.find(':')-2 :  line.find(':')    ])
            mn    = int(line[ line.find(':')+1 : line.rfind(':')    ])
            sc    = int(line[line.rfind(':')+1 : line.rfind(':') + 3])
            #Create the datetime object for the start
            startTime = dt(year, month, day, hr, mn, sc)        
        # Get the end time
        elif line.find("Job terminated") != -1 and line.find("of its own accord") == -1:
            #Calculate year, month, day of the end
            year  = int(line[ line.find('-')-4 :  line.find('-')    ])
            month = int(line[ line.find('-')+1 : line.rfind('-')    ]) 
            day   = int(line[line.rfind('-')+1 : line.rfind('-') + 3])
            #Calculate hour, minute, second of the end
            hr    = int(line[ line.find(':')-2 :  line.find(':')    ])
            mn    = int(line[ line.find(':')+1 : line.rfind(':')    ])
            sc    = int(line[line.rfind(':')+1 : line.rfind(':') + 3])
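            #Create the datetime object for the end
            #The one-second offset is applied with a timedelta so that a seconds value of 0 does not raise a ValueError
            endTime = dt(year, month, day, hr, mn, sc) - td(seconds=1)

            #Calculate the time difference between endTime and startTime
            #(this is the elapsed time between the two log events)
            CPUTime = endTime - startTime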
            #Get the name of the model from the file name (no directory, no '.log' extension)
            fileName = os.path.splitext(os.path.basename(file.name))[0]
            #Append a dictionary into JOB_INFO
            JOB_INFO.append({
                "modelAndJob" : fileName + ", Iteration "+str(jobCount),
                "CPUTimeSeconds" : CPUTime.total_seconds()
            })
            #Iterate jobCount
            jobCount+=1
        # Lines without a start or end event are skipped
    # Close the file
    file.close()

for job in JOB_INFO:
    print("Time to train " + job["modelAndJob"] + ":")
    print("\tCPU Runtime: " + str(td(seconds=job["CPUTimeSeconds"])))
    print()
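
The `find`/`rfind` slicing in the cell above can also be expressed with a regular expression and `datetime.strptime`. This is a minimal sketch, assuming every event line carries a timestamp in the `YYYY-MM-DD HH:MM:SS` form shown in the comments at the top of that cell; `parse_event_time` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches a timestamp such as 2025-09-11 19:19:24
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def parse_event_time(line):
    """Return the first timestamp in line as a datetime, or None if there is none."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")

start = parse_event_time(
    "040 (238358931.000.000) 2025-09-11 19:19:24 Started transferring input files")
end = parse_event_time(
    "005 (238358931.000.000) 2025-09-11 21:57:58 Job terminated.")
print(end - start)  # 2:38:34
```

Unlike the cell above, this sketch does not shift the end time back by one second, so its result for the same two lines is one second longer.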