# Scratch vs. project

On the Alliance (formerly Compute Canada) clusters, you have both a scratch directory and a project directory. I typically use the scratch directory because data loading is faster there. However, be aware that files on scratch that have not been used for roughly 60 days are deleted automatically. Therefore, save any files you want to keep in the project directory or copy them to CANFAR.
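To spot files at risk before they are purged, a small script can list everything under scratch that has not been accessed recently. This is a sketch under two assumptions: it uses the last access time (`st_atime`), whereas the cluster's actual purge criterion may rely on a different timestamp, and the `~/scratch` path in the usage note is an assumption about where your scratch space is mounted.

```python
import time
from pathlib import Path


def stale_files(root, max_age_days=60, now=None):
    """Return files under `root` not accessed within `max_age_days` days.

    Uses the access time (st_atime) as a rough proxy; the cluster's real
    purge policy may use another timestamp, so treat this as a warning list.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            stale.append(path)
    return sorted(stale)
```

For example, `stale_files(os.path.expanduser("~/scratch"), max_age_days=50)` would list candidates with a ten-day safety margin, which you could then move to the project directory or CANFAR.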


# Set up a virtual environment

Create a virtual environment in your project directory (replace `wiltonf` with your own username in the paths below):

`module load python/3.9.6`

`python -m venv /home/wiltonf/project/wiltonf/torchnet`

Activate the new environment and load the HDF5 module:

`source /home/wiltonf/project/wiltonf/torchnet/bin/activate`

`module load hdf5/1.10.6`

Install the required packages in this environment:

`pip install torch torchvision torchsummary`

`pip install h5py seaborn`
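Before training, it can help to confirm that the virtual environment is active and that the packages above are importable. The following is a minimal sketch (the file you save it in, e.g. a hypothetical `check_env.py`, is up to you); it relies on the standard-library fact that inside a venv `sys.prefix` differs from `sys.base_prefix`.

```python
import importlib.util
import sys

# Import names of the packages installed with pip above.
REQUIRED = ("torch", "torchvision", "torchsummary", "h5py", "seaborn")


def in_virtualenv():
    """True when running inside a venv such as the torchnet environment."""
    # In a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the base interpreter (the python module).
    return sys.prefix != sys.base_prefix


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(in_virtualenv(), missing_packages())` after `source .../torchnet/bin/activate` should print `True` and an empty list if the setup succeeded.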

## Training the Network

### Option 1

1. The model architecture and hyper-parameters are set within a configuration file in [the config directory](./configs). For instance, I have already created the [original configuration file](./configs/starnet_1.ini). You can copy this file under a new name and change whichever parameters you choose.

2. If you created a config file called `starnet_2.ini` in Step 1, you can train this model by running `python train_starnet.py starnet_2 -v 1000 -ct 5.00`. This displays the training progress every 1000 batch iterations and saves the model every 5 minutes. Running the same command again continues training the network if a model from previous training iterations is already saved in the [model directory](./models).
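Step 1 can also be scripted with Python's `configparser`. This is a sketch only: the section name `TRAINING` and key `batch_size` in the usage note are hypothetical, so check `configs/starnet_1.ini` for the real names. Unknown keys raise an error so that a typo does not silently add a new parameter.

```python
import configparser


def derive_config(src, dst, overrides):
    """Copy an .ini config from `src` to `dst`, applying overrides.

    `overrides` maps section -> {key: new value}. Keys must already exist.
    """
    # interpolation=None keeps values containing '%' as plain text.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep key case (the default lowercases keys)
    if not config.read(src):
        raise FileNotFoundError(src)
    for section, options in overrides.items():
        for key, value in options.items():
            if not config.has_option(section, key):
                raise KeyError(f"{section}.{key} not found in {src}")
            config.set(section, key, str(value))
    with open(dst, "w") as f:
        config.write(f)
    return config
```

For example, `derive_config("configs/starnet_1.ini", "configs/starnet_2.ini", {"TRAINING": {"batch_size": 64}})`. Note that `ConfigParser.write` does not preserve comments from the original file, so copying by hand may be preferable if the comments matter.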

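The "save every few minutes, resume on re-run" behavior described in Step 2 follows a common pattern. The sketch below illustrates that pattern with the standard library only and is not the code in `train_starnet.py`: `verbose_iters` and `cp_minutes` play the roles of `-v` and `-ct`, while `state_path`, `step`, and the pickled dict are hypothetical stand-ins for the saved model and a single batch update (a real PyTorch script would save model and optimizer state dicts instead).

```python
import os
import pickle
import time


def train(state_path, total_iters, verbose_iters=1000, cp_minutes=5.0,
          step=lambda state: None, clock=time.monotonic):
    """Resumable training loop with time-based checkpoints (sketch)."""
    # Resume from an existing checkpoint if one is present.
    if os.path.exists(state_path):
        with open(state_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"iteration": 0}

    def save():
        # Write to a temporary file first so an interrupted write
        # cannot corrupt the previous checkpoint.
        tmp = state_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, state_path)

    last_cp = clock()
    while state["iteration"] < total_iters:
        step(state)  # stand-in for one batch of training
        state["iteration"] += 1
        if state["iteration"] % verbose_iters == 0:
            print(f"iteration {state['iteration']}/{total_iters}")
        # Checkpoint based on elapsed wall time, not iteration count.
        if clock() - last_cp >= cp_minutes * 60:
            save()
            last_cp = clock()
    save()  # always save once training finishes
    return state
```

Because the loop reads `iteration` back from the checkpoint, re-running the same call after an interruption picks up from the last save, much like re-running the Option 1 command.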
### Option 1b

To run this command-line training on a GPU, first request an interactive GPU node from your terminal. The following command requests one GPU, 12 CPU cores, and 16000 MB of memory for three hours:

`salloc --time=03:00:00 --gres=gpu:1 --cpus-per-task=12 --account=def-sfabbro --mem=16000M`
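Inside the allocation, SLURM exposes the requested core count through the `SLURM_CPUS_PER_TASK` environment variable (set because `--cpus-per-task` was given). A small helper like the sketch below could read it, for example to size a PyTorch `DataLoader`'s `num_workers` or the value passed to `torch.set_num_threads`; whether `train_starnet.py` exposes such a setting is not stated here, so treat this as an assumption.

```python
import os


def slurm_cpus(default=1):
    """Number of CPU cores granted via --cpus-per-task.

    Falls back to `default` outside a SLURM allocation (variable unset)
    or if the value cannot be parsed as an integer.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    try:
        return int(value) if value else default
    except ValueError:
        return default
```

With the `salloc` command above, `slurm_cpus()` would return 12; on a login node it returns the default.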

### Option 2

Alternatively, if you are working on the Compute Canada (Alliance) clusters, you can use the `cc/launch_starnet.py` script to create a new configuration file and launch a set of training jobs in one step.

1. Edit the [load modules file](./cc/module_loads.txt) so that it contains the commands needed to load your own environment (e.g. the required modules and your virtual environment with PyTorch).
2. Then, to copy the [original configuration](./configs/starnet_1.ini) but use, say, a batch size of 64 spectra, run `python cc/launch_starnet.py starnet_2 -bs 64`. This will launch one 3-hour job on the GPU nodes to carry out the training. To see the other parameters that can be changed, run `python cc/launch_starnet.py -h`.
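To show how the load modules file fits into a batch job, here is a hypothetical sketch that assembles a SLURM batch script from it. It is not what `cc/launch_starnet.py` actually writes: the defaults simply mirror the `salloc` request from Option 1b, and the final `python train_starnet.py` line is an assumption about what such a job would run.

```python
from pathlib import Path


def make_job_script(model_name, module_loads_path, hours=3, gpus=1,
                    cpus=12, mem="16000M", account="def-sfabbro"):
    """Return the text of a SLURM batch script that trains `model_name`.

    Hypothetical sketch; the real launcher may generate scripts differently.
    """
    # Commands that load modules and activate the virtual environment.
    loads = Path(module_loads_path).read_text().strip()
    lines = [
        "#!/bin/bash",
        f"#SBATCH --time={hours:02d}:00:00",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --account={account}",
        f"#SBATCH --mem={mem}",
        "",
        loads,
        "",
        f"python train_starnet.py {model_name}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file such as `job.sh` and running `sbatch job.sh` would submit it to the scheduler.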

### Model Grid Search

To launch many models at once and run a grid search over different configuration parameters, check out the [grid search file](./cc/launch_starnet_gridsearch.py) for some ideas.
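The core of a grid search is iterating over every combination of parameter values. The helper below is a hypothetical dry-run sketch, not the contents of `launch_starnet_gridsearch.py`: it only builds command strings with `itertools.product`, `-bs` is the only flag taken from the text above, and any other flag should be looked up with `python cc/launch_starnet.py -h`.

```python
import itertools


def grid_commands(base_name, grid, flags):
    """Yield (model_name, command) for every combination in `grid`.

    grid:  {parameter: [values, ...]}
    flags: {parameter: command-line flag}, e.g. {"batch_size": "-bs"}
    """
    keys = list(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    for i, values in enumerate(combos, start=1):
        name = f"{base_name}_{i}"  # one new config/model name per combination
        args = " ".join(f"{flags[k]} {v}" for k, v in zip(keys, values))
        yield name, f"python cc/launch_starnet.py {name} {args}"
```

For example, `grid_commands("starnet_gs", {"batch_size": [32, 64, 128]}, {"batch_size": "-bs"})` yields three commands, the first being `python cc/launch_starnet.py starnet_gs_1 -bs 32`. Each could be run by hand or with `subprocess.run(cmd.split(), check=True)` once the list looks right.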