exp.bootstrp

This repo is a bootstrap for experiments. It includes helper functions and scripts for PyTorch training and the SLURM job scheduler.

Basic idea: create a Python script (the experiment) that accepts command-line arguments. Provide arg_lists, and SLURM jobs are generated from the cross product of the given argument lists.

Quick Start

First, set up your ssh workflow. Then let's kick-start our experiment.

ssh prince
git clone git@github.com:evcu/exp.bootstrp.git
mv exp.bootstrp my_exp
cd my_exp

First, debug the experiment in an interactive session. prince_slurm_bootstrap.sh loads the required modules; update it as needed. Personally, I use python3 with pip --user packages. Call the script with install the first time you run it.

srun -t2:30:00 --mem=5000 --gres=gpu:1 --pty /bin/bash

. ./prince_slurm_bootstrap.sh install
cd experiments/cifar10/
python main.py --epoch 1

Once we are sure that the main script works, we can start creating automated experiments with the create_experiment_jobs.py script. The first thing to do is to update some of the SLURM fields in experiments/default_conf.yaml. For example, replace NET_ID with your net_id if you are a fellow NYU student using prince. You may need to change this file completely if you are working on another system or have different requirements.

[Image: log]

Note that each element of the experiment key in the yaml file is itself a dictionary that holds argument lists for <exp_name>/main.py. The values in these argument lists are combined as a cross product with the other lists in the dictionary to generate all possible combinations.
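The cross product described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in create_experiment_jobs.py: the helper names expand_arg_lists and to_command are hypothetical, and so is the lr argument (only --epoch appears in this README).

```python
from itertools import product


def expand_arg_lists(arg_lists):
    """Return one dict per combination of the given argument lists."""
    names = list(arg_lists)
    value_lists = [arg_lists[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def to_command(args, script="main.py"):
    """Format one argument combination as a command line."""
    flags = " ".join(f"--{name} {value}" for name, value in args.items())
    return f"python {script} {flags}"


# Hypothetical argument lists; in the repo these would come from the yaml file.
arg_lists = {"lr": [0.1, 0.01], "epoch": [1, 10]}

combos = expand_arg_lists(arg_lists)
commands = [to_command(combo) for combo in combos]
for command in commands:
    print(command)
```

Two lists of two values each yield four commands, one per job. Adding another list multiplies the job count by its length, so it is worth checking the total before submitting.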

Now we can generate experiment scripts.

cd ../
python create_experiment_jobs.py --debug

If they all look good, you can create the experiment folder and submit the jobs.

python create_experiment_jobs.py
bash /scratch/ue225/my_project/exps/cifar10/cifarLR_03.26/submit_all.sh

which would output something like this:

[Image: log]

Let's say you want to define a new experiment. You would do so by creating a new folder experiments/new_folder/ and an experiments/new_folder/main.py script that is intended to be run. The main.py script should accept at least the --log_folder and --conf_file flags. Then you can change exp_name in experiments/default_conf.yaml to new_folder and create new experiments.
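As a rough sketch of the minimum interface described above, a new main.py could parse its flags with argparse as follows. Only the flag names --log_folder, --conf_file, and --epoch come from this README; the defaults, help texts, and the parse_args helper are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags a new experiment script is expected to accept."""
    parser = argparse.ArgumentParser(description="Skeleton experiment script")
    # Required by the job generator according to this README.
    parser.add_argument("--log_folder", type=str, default="./logs",
                        help="where to write logs (default is an assumption)")
    parser.add_argument("--conf_file", type=str, default=None,
                        help="path to a configuration file")
    # Example of an experiment-specific argument.
    parser.add_argument("--epoch", type=int, default=1,
                        help="number of training epochs")
    return parser.parse_args(argv)


# Passing an explicit list keeps the demo independent of the real command line;
# in an actual main.py you would call parse_args() with no argument.
args = parse_args(["--log_folder", "/tmp/my_logs", "--epoch", "2"])
print(args.log_folder, args.conf_file, args.epoch)
```

Any further arguments listed in the yaml file would need a matching add_argument call, since argparse rejects unknown flags by default.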

Features

Visualizing Tensorboard Events

There are several options:

  • You can copy the logs with scp, like
scp -r prince:/scratch/ue225/my_project/exps/cifar10/cifarLR_03.26/tb_logs ./
  • You can open a tunnel to prince, run tensorboard there, and connect to it through port forwarding. See my remote Jupyter and port forwarding notes (https://evcu.github.io/notes/port-forwarding/).
  • You can use sshfs to sync the logs into your local file system. Details here

Then read your results:

[Images: log]

Contribution

I am excited to collaborate and learn from you if you have figured out better ways of experimenting or want to add text/code to this repo. Please create an issue or reach out to me.

TODO

  • Change create_experiments so that the defaults are perhaps included in the experiment.yaml and dumped.
  • Source code needs to be copied!
