# Model setup and installation

I keep the basic framework of the code on GitHub. When sharing with collaborators, I explain that I work in my /project/ directory. The framework can be cloned directly from my GitHub page as below.

In [None]:
cd /expanse/lustre/projects/sio134/gmooers/
git clone https://github.com/gmooers96/VAE_Workflow.git

I prefer to use my own environments rather than load modules. I generally first install miniconda:

You can download the .sh installer from https://docs.conda.io/en/latest/miniconda.html and then run it from the command line:

In [None]:
bash Miniconda3-latest-Linux-x86_64.sh

Once miniconda is installed, I set up my environments.

The cloned repo contains two environment files that should have all the packages you need to both train the neural network and do any post-processing. The first one:

MOOERS_GPU_ENV.yml

can be used to train the model. The CPU environment (MOOERS_CPU_ENV.yml) can be used for post-processing. You can set the environments up like so:

In [None]:
conda env create -f MOOERS_CPU_ENV.yml -n CPU
conda env create -f MOOERS_GPU_ENV.yml -n GPU

Note that each of these can take ~1 hour to set up.

At this point, you can try out the neural networks. This repository contains two neural networks that the Pritchard Group often uses in our research. The first is a single-channel VAE. The files for it are:

- train_fully_conv.py
- model_config/config_1.json
- Bash_Scripts/vae_1.sh
- sample_fully_conv_improved.py

The location of the training data is set in the config file (model_config/config_1.json). More specifically, you should care about lines 8-15:

In [None]:
    "data": {
        "training_data_path": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Big_Randomized_Trackable/Multi_Sim_Randomized_Space_Time_W_Training.npy",
        "test_data_path": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Big_Randomized_Trackable/Multi_Sim_Randomized_Space_Time_W_Test.npy",
        "train_labels": "/fast/gmooers/Preprocessed_Data/Centered_50_50/Y_Train.npy",
        "test_labels": "/fast/gmooers/Preprocessed_Data/Centered_50_50/Improved_Y_Test.npy",
        "max_scalar": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Big_Randomized_Trackable/Multi_Sim_Randomized_Space_Time_Max_Scalar.npy",
        "min_scalar": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Big_Randomized_Trackable/Multi_Sim_Randomized_Space_Time_Min_Scalar.npy"
    },

The essential entries are:
- training_data_path
- test_data_path
- max_scalar
- min_scalar
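
Before launching a long training job, it can be worth confirming that the essential paths above actually exist on your system. The following is a minimal sketch and not part of the repository: the function name check_data_paths is hypothetical, and it assumes only that the config is valid JSON with a top-level "data" block like the one shown.

```python
import json
from pathlib import Path

# Keys the text above lists as essential for the single-channel VAE.
ESSENTIAL_KEYS = ("training_data_path", "test_data_path", "max_scalar", "min_scalar")


def check_data_paths(config_path):
    """Map each essential key to True/False depending on whether its file exists.

    Hypothetical helper for illustration; it is not in the repository.
    """
    with open(config_path) as f:
        config = json.load(f)

    data = config["data"]
    missing = [key for key in ESSENTIAL_KEYS if key not in data]
    if missing:
        raise KeyError(f"config is missing data keys: {missing}")

    return {key: Path(data[key]).is_file() for key in ESSENTIAL_KEYS}
```

Calling check_data_paths("model_config/config_1.json") from the repository root returns one entry per key; any False value points to a path you need to edit before training.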

The data is already scaled and ready to go. In the config file you can adjust any hyperparameters (batch size, learning rate, filter size, etc.), and they will be read into the training script automatically. You can launch the model from the command line like so:

In [None]:
conda activate {name of the GPU Environment}
python3 train_fully_conv.py --id 1

But given that the VAE takes hundreds of epochs (typically over 24 hours) to train, I usually rely on the Expanse queue. You can submit the model directly to the queue via:

In [None]:
cd Bash_Scripts
sbatch vae_1.sh

You may need to change line 22 of the script depending on what you name your GPU environment (the default is below; if you created the environment with -n GPU as above, change it to GPU):

In [None]:
source activate GPU2

For analysis, the training script will automatically save a diagram of the encoder and decoder for the VAE architecture you specify in the config file to:

In [None]:
model_graphs/model_diagrams/

Additionally, upon successful completion of training, the loss curves will be saved to:

In [None]:
model_graphs/losses/

In my experience, you want to see the reconstruction loss curve decrease immediately and settle at a minimum, while the validation loss curve begins to overfit after several hundred epochs (the code saves the best model, so this is not a problem).
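
Late overfitting is harmless because only the best checkpoint is kept. As a rough illustration, assuming "best" means lowest validation loss (the repository's own checkpoint logic may differ, and best_epoch is a hypothetical helper), picking that epoch from a recorded validation curve looks like this:

```python
def best_epoch(val_losses):
    """Return (epoch_index, loss) for the lowest validation loss.

    Hypothetical helper; epochs are counted from 0.
    """
    if not val_losses:
        raise ValueError("val_losses is empty")
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_idx, val_losses[best_idx]


# Toy curve (made-up numbers): improves, bottoms out, then overfits.
curve = [0.90, 0.55, 0.41, 0.38, 0.40, 0.47]
print(best_epoch(curve))  # (3, 0.38)
```

Everything after the selected epoch can overfit freely without affecting the saved model.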

Since this is a (linearly) annealing VAE, the KL divergence behaves differently: it will spike for the first several epochs, then begin to decrease as we weight the KL term in the loss function more heavily with each passing epoch.
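
To make the annealing behaviour concrete, here is a minimal sketch of a linear KL weight schedule. The names kl_weight, anneal_epochs, and max_weight are assumptions for illustration; the actual schedule and its parameters live in the training script and config and may differ.

```python
def kl_weight(epoch, anneal_epochs, max_weight=1.0):
    """Linearly ramp the KL weight from 0 up to max_weight over anneal_epochs.

    Illustrative only; the parameter names are not taken from the repository.
    """
    if anneal_epochs <= 0:
        return max_weight
    return max_weight * min(1.0, epoch / anneal_epochs)


def total_loss(reconstruction_loss, kl_loss, epoch, anneal_epochs):
    """Annealed VAE objective: reconstruction term plus the weighted KL term."""
    return reconstruction_loss + kl_weight(epoch, anneal_epochs) * kl_loss


# At epoch 0 the KL term has zero weight, so nothing holds it down yet.
for epoch in (0, 5, 10, 20):
    print(epoch, kl_weight(epoch, anneal_epochs=10))
```

Because the weight starts near zero, the optimizer initially ignores the KL term and it is free to grow; as the weight ramps up, the term is penalized more and starts to shrink.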

Further analysis of the model can be done using the sampling script:

In [None]:
python3 sample_fully_conv_improved.py --id 1

It is a bit of a clunky script, with some residual hardcoding I have never had the time to correct. Basically, there are two built-in functions you can use to analyze the trained VAE. On line 594, uncomment this function:

In [None]:
sample_latent_space_var(encoder_result.vae_encoder, train_data, test_data, args.id, dataset_min, dataset_max,  args.dataset_type)

When the sampling script is run now, it will create a visualization of the latent space (you can hardcode the final dimensionality within the function itself; I recommend two or three). In my experience, this has been the best way to tell whether our VAE has trained successfully for representation learning.
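
For intuition about what reducing the latent space to two or three dimensions involves, here is a small stand-in using PCA with NumPy. This is not the repository's sample_latent_space_var function, whose projection method may be different; project_latent and the array shapes are assumptions for illustration.

```python
import numpy as np


def project_latent(z_mean, n_components=2):
    """Project latent vectors of shape (n_samples, latent_dim) with PCA.

    Hypothetical helper: a plain PCA stand-in, not the repository's method.
    """
    centered = z_mean - z_mean.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Random numbers standing in for encoder outputs (100 samples, 8 latent dims).
rng = np.random.default_rng(0)
fake_latents = rng.normal(size=(100, 8))
print(project_latent(fake_latents, n_components=2).shape)  # (100, 2)
```

The resulting two or three columns can be passed straight to a scatter plot and colored by label to check whether the latent space separates the regimes you care about.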

Another option is to uncomment line 593 and visualize reconstructions of specific vertical velocity fields from the VAE decoder:

In [None]:
reconstruct_targets_paper(vae, test_data, [2, 15, 66 , 85, 94], args.id, dataset_max, dataset_min)

The other model is the multichannel VAE. The procedure to train and analyze it should be almost identical to the above. The files for it are:

- train_fully_conv_multichannel.py
- model_config/config_3.json
- Bash_Scripts/vae_3.sh
- sample_fully_conv_improved_multichannel.py

Note that in the config file, it pulls from three variables (vertical velocity again, but also temperature and water vapor):

In [None]:
    "data": {
        "training_data_path": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Centered_50_50/Space_Time_W_Training.npy",
        "training_data_path_T": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/T_Variable/Space_Time_Anon_T_Training.npy",
        "training_data_path_Q": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/Q_Variable/Space_Time_Anon_Q_Training.npy",
        "training_data_path_W": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/W_Variable/Space_Time_W_Training.npy",
        "test_data_path_T": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/T_Variable/Space_Time_Anon_T_Test.npy",
        "test_data_path_Q": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/Q_Variable/Space_Time_Anon_Q_Test.npy",
        "test_data_path_W": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/W_Variable/Space_Time_W_Test.npy",
        "train_labels": "/Preprocessed_Data/Centered_50_50/Y_Train.npy",
        "test_labels": "/Preprocessed_Data/Centered_50_50/Improved_Y_Test.npy",
        "max_scalar_t": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/T_Variable/Space_Time_Anon_Max_Scalar.npy",
        "min_scalar_t": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/T_Variable/Space_Time_Anon_Min_Scalar.npy",
        "max_scalar_q": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/Q_Variable/Space_Time_Anon_Max_Scalar.npy",
        "min_scalar_q": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/Q_Variable/Space_Time_Anon_Min_Scalar.npy",
        "max_scalar_w": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/W_Variable/Space_Time_Max_Scalar.npy",
        "min_scalar_w": "/expanse/lustre/projects/sio134/gmooers/CBRAIN-CAM/MAPS/Preprocessed_Data/Trackable_Data/W_Variable/Space_Time_Min_Scalar.npy"
    }
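
Conceptually, the three variables in the config above become separate channels of a single input array. The sketch below shows one way that could look with NumPy. The channel order (W, T, Q), the channels-last layout, the array shapes, and the function name stack_channels are all assumptions; the actual handling is in train_fully_conv_multichannel.py.

```python
import numpy as np


def stack_channels(w, t, q):
    """Stack three (n_samples, height, width) arrays into one channels-last array.

    Hypothetical helper; channel order and layout are assumptions.
    """
    if not (w.shape == t.shape == q.shape):
        raise ValueError(f"shape mismatch: {w.shape}, {t.shape}, {q.shape}")
    return np.stack([w, t, q], axis=-1)


# Small dummy arrays in place of the .npy files (shapes chosen arbitrarily).
w = np.zeros((4, 30, 128))
t = np.ones((4, 30, 128))
q = np.full((4, 30, 128), 2.0)
print(stack_channels(w, t, q).shape)  # (4, 30, 128, 3)
```

In practice each array would come from np.load on the corresponding path in the config, with each variable scaled using its own max and min scalar files.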

The multichannel VAE will also take much longer to train, though it needs fewer epochs.