## Create Job Scripts for SAC-SMA Runs

Notebook 2/X

NOTE: NEED SACSMA ENVIRONMENT ACTIVATED TO SEND OFF RUNS.

In notebook 1/X we created configuration files for both Kratzert's NeuralHydrology (NH) Long Short-Term Memory model and Nearing's SACSMA-SNOW17 (SAC-SMA) model. In this notebook, we create slurm job scripts that point to the configuration files made _specifically_ for the SAC-SMA model and send them off.

In [1]:
#Automatically reload modules; ensures most recent versions
%load_ext autoreload
%autoreload 2

### Import Libraries

In [2]:
#Import Python Libraries
import os
import glob
import subprocess
import pickle as pkl
from pathlib import Path

### Define Parameters

Similar to the create_climate_experiments notebook (1/X), the most important parameters need to be defined first. As a reminder, 'inputs' refers to the nature of the inputs (static or dynamic; fixed to 'dynamic' for SAC-SMA further below), 'exp_type' refers to the experiment type (extreme or random), 'forcing' refers to the source of forcing data, and 'years' refers to whether we use all years or only the years that are also available for the National Water Model.

Defining these parameters directs the notebook to the corresponding folder containing the configuration files created in the first notebook.

##### Most Important Experiment Parameters

In [3]:
#########################################################################################

#If files already exist, should they be overwritten by this notebook?
overwrite = False

#Explicitly define if you want to send off the created/defined runs
send_runs = False

#Specify experiment type; options include 'extreme' and 'random' (random experiments often used as benchmark)
exp_type = 'extreme'

#Specify ONE forcing data source; 5 options: daymet, nldas, nldas_extended, maurer, maurer_extended
forcing = "daymet"

#Specify years to use for experiments; 'all' or 'nwm'
years = 'nwm'

#########################################################################################
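
Because a misspelled parameter would silently point the notebook at a folder that does not exist, it can help to validate the three values before going further. The following is a minimal sketch, not part of the original workflow; `ALLOWED` and `check_params` are hypothetical names, and the option spellings are assumed from the comments in the cell above.

```python
# Allowed values, assumed from the parameter comments in the cell above
ALLOWED = {
    'exp_type': {'extreme', 'random'},
    'forcing': {'daymet', 'nldas', 'nldas_extended', 'maurer', 'maurer_extended'},
    'years': {'all', 'nwm'},
}


def check_params(**params):
    """Raise ValueError if any parameter is not one of its allowed values."""
    for name, value in params.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} is not one of {sorted(ALLOWED[name])}")
    return params


# Example call with the values used in this notebook
check_params(exp_type='extreme', forcing='daymet', years='nwm')
```

If any value is outside its allowed set, the call fails immediately with a message listing the valid options.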

Again, just like the first notebook, additional experiment parameters are defined below. The SAC-SMA models are trained and tested on a CPU node, so run characteristics need to be specified, including the maximum number of model runs, the number of DDS (Dynamically Dimensioned Search) trials, the fraction of the node to use, and which optimization algorithm to use.

##### Additional Experiment Parameters

In [4]:
#########################################################################################

max_model_runs = 1e4
dds_trials = 1
use_cores_frac = .90
algorithm = 'DDS'

#########################################################################################
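
How the SAC-SMA code turns `use_cores_frac` into a number of parallel workers is not shown in this notebook. As an illustration only, one plausible interpretation is sketched below; `n_workers` and `total_cores` are hypothetical names, and the rounding rule (round down, at least one worker) is an assumption.

```python
import os


def n_workers(use_cores_frac, total_cores=None):
    """Number of worker processes for a given fraction of the node's cores."""
    if not 0 < use_cores_frac <= 1:
        raise ValueError('use_cores_frac must be in (0, 1]')
    if total_cores is None:
        # os.cpu_count() may return None on some platforms; fall back to 1
        total_cores = os.cpu_count() or 1
    # Round down, but always keep at least one worker
    return max(1, int(total_cores * use_cores_frac))


print(n_workers(0.90, total_cores=40))  # prints 36
```

Note also that `max_model_runs = 1e4` is a float; it is cast with `int()` before being written into the job script.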

##### Paths

In [5]:
#########################################################################################

#Path to working directory
working_dir = Path(os.getcwd()) 

#Path to run_configs directory (../configs/run_configs)
run_configs_dir = working_dir / 'nh_lstm' / 'configs' / 'run_configs' 

#Path to job_scripts directory (../sacsma/job_scripts)
job_scripts_path = working_dir / 'sacsma' / 'job_scripts'

#########################################################################################

You should not have to edit anything below this cell.

### Explicit Warnings

In [7]:
#If we want to overwrite the job scripts, that's fine, but...
if overwrite:
    #Warn us! (bold red text; reset the formatting afterwards)
    print('\033[91m' + '\033[1m' + 'Job scripts to be overwritten.' + '\033[0m')

### Load Source Data

Source data for this notebook consists of the SAC-SMA experiment configuration files we want to make slurm jobs for.

In [None]:
#SAC-SMA experiments are only created with the 'dynamic' input tag for directory structure consistency;
#the static/dynamic input distinction does not apply to SAC-SMA
inputs = 'dynamic'

In [8]:
#Navigate to SAC-SMA experiment directory
exp_config_dir = run_configs_dir / 'sacsma' / inputs / exp_type / forcing / years

#Glob to get list of files (ending in .yml) in exp_config_dir
config_files = list(exp_config_dir.glob('*.yml'))

#Print number of experiment configurations in exp_config_dir
print(f'There are {len(config_files)} experiments.')

There are 40 experiments.


### Create and Save Slurm Job Script Files

Now we loop through all of the configuration files in config_files, read the "dummy", or representative, slurm file (run_job.slurm), and replace its dummy variables. The dummy variables include the experiment name, the path to the configuration file, and the optimizer settings defined above (maximum model runs and algorithm). Note that the experiment name is sourced from the configuration file name and that the job file is saved to the job_scripts directory.

In [11]:
#Directory the job scripts are written to (relative to the current directory)
job_dir = Path('job_scripts')

#If job scripts for this experiment have not yet been made or if we want to overwrite them...
existing_jobs = list(job_dir.glob(f'**/sacsma_{inputs}_{exp_type}_{forcing}_{years}_*.slurm'))
if len(existing_jobs) == 0 or overwrite:

    #Open dummy slurm file (read once; it is the same for every experiment)
    with open('run_job.slurm', 'r') as file:
        template = file.read()

    #For every config file...
    for config_file in config_files:

        #Extract experiment name from config file name
        exp_name = config_file.name

        #Replace dummy experiment name
        filedata = template.replace('dummy', exp_name)

        #Replace dummy config filepath
        filedata = filedata.replace('${1}', str(config_file))

        #Replace dummy max model runs
        filedata = filedata.replace('${2}', str(int(max_model_runs)))

        #Replace dummy algorithm
        filedata = filedata.replace('${3}', algorithm)

        #Extract config name (file name without extension)
        conf = config_file.stem

        #Define path to experiment job script file
        job_file = job_dir / f'{conf}.slurm'

        #Write the job file
        with open(job_file, 'w') as file:
            file.write(filedata)
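
To make the substitution step easier to inspect without touching any files, the same replacements can be applied to an in-memory string. This is a minimal sketch: `fill_template` is a hypothetical helper, and the two-line template is invented for illustration, since the real contents of run_job.slurm are not shown in this notebook.

```python
def fill_template(template, exp_name, config_path, max_model_runs, algorithm):
    """Return the template with its placeholders replaced, mirroring the loop above."""
    filled = template.replace('dummy', exp_name)
    filled = filled.replace('${1}', str(config_path))
    filled = filled.replace('${2}', str(int(max_model_runs)))
    filled = filled.replace('${3}', algorithm)
    return filled


# Hypothetical template; the real run_job.slurm may look different
template = "#SBATCH --job-name=dummy\npython run.py ${1} ${2} ${3}\n"
print(fill_template(template, 'exp_0', 'configs/exp_0.yml', 1e4, 'DDS'))
```

Keep in mind that `str.replace` substitutes every occurrence, so the word "dummy" should appear in the template only where the experiment name belongs.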

To double check, save a list of the experiment slurm files in job_scripts and print how many there are.

In [13]:
#Make list of slurm files in job_scripts folder for defined experiment
job_files = list(Path('job_scripts').glob(f'**/sacsma_{inputs}_{exp_type}_{forcing}_{years}_*.slurm'))

#Print number of job_files
print(f'There are {len(job_files)} experiments.')    

There are 40 experiments.
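
Matching counts do not guarantee that every configuration file has its own job script. A stricter check compares file names, as in the sketch below; `missing_job_scripts` is a hypothetical helper, and it assumes that a job script shares its stem (file name without extension) with its configuration file, as in the loop above. The example paths are made up.

```python
from pathlib import Path


def missing_job_scripts(config_files, job_files):
    """Return sorted config stems that have no job script with the same stem."""
    job_stems = {Path(f).stem for f in job_files}
    return sorted(Path(c).stem for c in config_files if Path(c).stem not in job_stems)


# Made-up example: exp_1 has a config file but no job script
configs = [Path('a/exp_0.yml'), Path('a/exp_1.yml')]
jobs = [Path('job_scripts/exp_0.slurm')]
print(missing_job_scripts(configs, jobs))  # prints ['exp_1']
```

In the notebook, calling it with `config_files` and `job_files` should return an empty list when every experiment is covered.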


You should now have a job_scripts folder of runs ready to be run... but be warned (refer below).

### Run SAC-SMA Experiments

##### WARNING: BE CAREFUL WHEN RUNNING NEXT CELL

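The next cell loops through the corresponding experiment files in the job_scripts directory and submits each one to the CPU node with sbatch. Nothing is submitted unless send_runs is set to True, so set it deliberately and make sure the SACSMA environment is activated first.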

In [14]:
#If we defined that we wanted runs to be sent off to the cpu...
if send_runs:

    #For every slurm file...
    for file in job_files:

        #Define run command
        run_cmd = f"sbatch {file}"

        #Define path to write log file to (named after the job script, without extension)
        log_file = Path('log_files') / file.stem

        #Execute command and send off runs; sbatch output and errors go to the log file
        with open(log_file, 'w') as f:
            subprocess.Popen(run_cmd, stderr=subprocess.STDOUT, stdout=f, shell=True)
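
Since the output of each sbatch call ends up in a log file, those logs can be checked afterwards to confirm that every submission was accepted. The sketch below assumes sbatch's default confirmation message ("Submitted batch job <id>"); `parse_job_id` is a hypothetical helper and the job ID in the example is made up.

```python
import re


def parse_job_id(sbatch_output):
    """Return the job ID from sbatch's confirmation line, or None if absent."""
    match = re.search(r'Submitted batch job (\d+)', sbatch_output)
    return int(match.group(1)) if match else None


# Made-up example of a log file's contents
print(parse_job_id('Submitted batch job 123456\n'))  # prints 123456
```

A return value of None for a log file suggests that the corresponding submission failed and that the log contains an error message instead.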