# SYSNet

## Installation
We recommend the Anaconda distribution for Python. This section describes how to set up a deep learning and data science environment for analyzing cosmological datasets. We use conda to install packages, and recommend conda over pip. The installation is divided into four steps:<br/>
1. Install/update Conda <br/>
2. Install Pytorch <br/>
3. Install miscellaneous packages <br/>
4. Install _SYSNet_ <br/>
Throughout this note, we use '$>' to denote the commands that ought to be executed in the terminal.


### 1. Conda
First, check whether Conda is installed on your system with the command `$> which conda`. If it is not, follow the instructions below to install it:<br/>
1.a Visit https://docs.conda.io/projects/conda/en/latest/index.html <br/>(or https://docs.conda.io/en/latest/miniconda.html#linux-installers for linux) <br/>

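On Linux, we would execute the following commands to download the installer, print its checksum (to compare against the hash listed on the Miniconda page), and run it:<br/>
1.b `$> wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh` <br/>
1.c `$> sha256sum Miniconda3-latest-Linux-x86_64.sh` <br/>
1.d `$> bash Miniconda3-latest-Linux-x86_64.sh` <br/>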
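
Step 1.c prints the SHA-256 digest of the installer so that it can be compared with the hash published on the Miniconda download page. The same check can be scripted. The following is a minimal sketch using only the Python standard library; the helper name `sha256_of_file` and the `expected` placeholder are illustrative assumptions and are not part of Conda or _SYSNet_.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    installer = Path("Miniconda3-latest-Linux-x86_64.sh")
    # Placeholder: paste the hash listed on the Miniconda download page here.
    expected = "<hash copied from the Miniconda download page>"
    if installer.exists():
        actual = sha256_of_file(installer)
        print("OK" if actual == expected else f"MISMATCH: {actual}")
    else:
        print(f"{installer} not found; run step 1.b first")
```
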

### 2. Pytorch
We recommend taking a look at the Pytorch [website](https://pytorch.org) to learn more about the framework. Installing Pytorch on machines with a GPU differs from installing it on CPU-only machines. For instance, to set up on the **Ohio Supercomputer Center (OSC)**, you should execute the next two commands to load the CUDA library. For other supercomputers, e.g., NERSC, you may need to read the documentation to see how you can load the CUDA library. <br/>
2.a `$> module spider cuda # on OSC` <br/>
2.b `$> module load cuda/10.1.168 # on OSC`

On CPU-only machines, skip steps 2.a and 2.b. Then create the conda environment (e.g., called _sysnet_):<br/>
2.c `$> conda create -n sysnet python=3.8 scikit-learn`<br/>

Once your environment is created, you must activate it and use the appropriate Pytorch installation command to install Pytorch. For instance: <br/>
2.d `$> conda activate sysnet` <br/>
2.e `$> conda install pytorch torchvision -c pytorch`

**Note**: On the OSC (see https://www.osc.edu/resources/available_software/software_list/cuda and https://www.osc.edu/supercomputing/batch-processing-at-osc/monitoring-and-managing-your-job), the last step instead becomes:<br/>
2.e `$> conda install pytorch torchvision cudatoolkit=10.1 -c pytorch # on OSC`
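
After step 2.e it can be useful to confirm that Pytorch imports inside the activated environment and whether it sees a GPU. The snippet below is a minimal sketch of such a check; the function name `describe_torch` is an assumption made for illustration, while `torch.__version__` and `torch.cuda.is_available()` are standard Pytorch attributes.

```python
import importlib.util


def describe_torch():
    """Report whether Pytorch is importable and which device it would use."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "device": None}

    import torch  # imported lazily, only when the package is present

    device = "cuda" if torch.cuda.is_available() else "cpu"
    return {"installed": True, "version": torch.__version__, "device": device}


if __name__ == "__main__":
    print(describe_torch())
```

On a CPU-only machine the reported device should be `cpu`; on a GPU node with the CUDA module loaded it should be `cuda`.
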

### 3. Miscellaneous
After installation of `Pytorch`, execute the following commands to install the required packages:<br/>
3.a `$> conda install git jupyter ipykernel ipython mpi4py`<br/>
3.b `$> conda install -c conda-forge fitsio healpy absl-py pytables`

Use the following command to add your env kernel (e.g., _sysnet_) to Jupyter:

3.c `$> python -m ipykernel install --user --name=sysnet --display-name "python (sysnet)"`
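
Some of the packages above are imported under a different name than the one used to install them (for example, `absl-py` is imported as `absl` and `pytables` as `tables`). The following sketch lists which of the required modules cannot be found in the current environment; `REQUIRED_MODULES` and `find_missing` are hypothetical names introduced here for illustration.

```python
import importlib.util

# Conda package name -> Python import name, for the packages from steps 2 and 3.
REQUIRED_MODULES = {
    "scikit-learn": "sklearn",
    "pytorch": "torch",
    "mpi4py": "mpi4py",
    "fitsio": "fitsio",
    "healpy": "healpy",
    "absl-py": "absl",
    "pytables": "tables",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the sorted package names whose import name cannot be located."""
    return sorted(
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = find_missing()
    print("all packages found" if not missing else f"missing: {missing}")
```
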

### 4. SYSNet
_SYSNet_ is currently under development and is not yet pip installable, so the only way to set it up is to clone the git repository. First, go to the directory where you want to keep it (e.g., 'test' under the home directory):
```
$> cd $HOME
$> mkdir test
$> cd test
```
Then, clone the repo:<br/>
`$> git clone https://github.com/mehdirezaie/sysnetdev.git` <br/>


After cloning, pull from the master branch to make sure the local repo is up to date. To do so, go to the root directory of the _SYSNet_ software and pull from origin master: <br/>
```
$> cd sysnetdev
$> git pull origin master
```
Then, add the absolute path of the sysnetdev directory to the environment variable `PYTHONPATH` (replace the example path below with your own):<br/>
```
$> export PYTHONPATH=/Users/mehdi/test/sysnetdev:${PYTHONPATH}
```
Congratulations! You have successfully set up _SYSNet_. You should add this line to `${HOME}/.bashrc` or `${HOME}/.bash_profile`, so that the environment variable is set every time you open a new shell. Alternatively, to use the pipeline in a Jupyter Notebook without setting `PYTHONPATH`, insert the path at the top of the notebook:<br/>
```
import sys
sys.path.insert(0, '/Users/mehdi/test/sysnetdev')
```
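
The notebook workaround silently does nothing useful if the path is mistyped, and it adds a duplicate entry each time the cell is re-run. A slightly more defensive variant is sketched below; `add_repo_to_path` is a hypothetical helper and not part of _SYSNet_.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Prepend repo_dir to sys.path once; raise if the directory is missing."""
    repo = Path(repo_dir).expanduser().resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"SYSNet directory not found: {repo}")
    if str(repo) not in sys.path:
        sys.path.insert(0, str(repo))
    return repo


# Example call, assuming the 'test' directory used in this guide:
# add_repo_to_path("~/test/sysnetdev")
```
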
#### Test installation
Navigate to the 'scripts' directory, and run the script app.py:<br/>
```
$> cd scripts
$> python app.py -ax {0..17}
```
The last command will train the network for one epoch. Use `python app.py --help` to see the full command line interface.
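
Note that `{0..17}` relies on Bash brace expansion: the shell turns it into the eighteen integers 0 through 17 before `app.py` sees them. Shells without brace expansion, such as dash, pass the literal string instead. The sketch below builds the same argument list explicitly, which may be convenient when launching the test from a Python script. Only the `-ax` flag is taken from the command above; `build_test_command` is a hypothetical helper.

```python
import subprocess
import sys


def build_test_command(n_axes=18, script="app.py"):
    """Build the argument list equivalent to `python app.py -ax {0..17}`."""
    axes = [str(i) for i in range(n_axes)]
    return [sys.executable, script, "-ax", *axes]


if __name__ == "__main__":
    cmd = build_test_command()
    print(" ".join(cmd))
    # Uncomment to launch the run from inside the 'scripts' directory:
    # subprocess.run(cmd, check=True)
```
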

## Using SYSNet in a Jupyter notebook


In [1]:

    import argparse

    import sys
    sys.path.append('/Users/mehdi/github/sysnetdev') # add the path to SYSNet
    import sysnet

In [2]:

    config = sysnet.sources.io.Config('../scripts/config.yaml')

In [4]:

    config.restore_model

In [11]:

    # modeling (feature selection and regression)
    pipeline = sysnet.SYSNet(config)
    pipeline.run()

    logging in ../output/model_test/train.log
    # --- inputs params ---
    input_path: ../input/eBOSS.ELG.NGC.DR7.table.fits
    output_path: ../output/model_test
    restore_model: None
    batch_size: 4098
    nepochs: 2
    nchains: 1
    find_lr: False
    find_structure: False
    find_l1: False
    do_kfold: False
    normalization: z-score
    model: dnn
    optim: adamw
    axes: [0, 1, 2]
    do_rfe: False
    eta_min: 1e-05
    learning_rate: 0.001
    nn_structure: [4, 20]
    l1_alpha: -1.0
    loss: mse
    loss_kwargs: {'reduction': 'sum'}
    optim_kwargs: {'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0.0, 'amsgrad': False}
    device: cpu
    pipeline initialized in 0.260 s
    data loaded in 1.013 sec
    # running pipeline ...
    # training and evaluation
    partition_0 with (4, 20, 3, 1)
    base_train_loss: 0.140
    base_valid_loss: 0.142
    base_test_loss: 0.144
    # running training and evaluation with seed: 2664485226


    RuntimeError: File doesn't exist ../output/model_test/model_0_2664485226/None.pth.tar

SYSNet (add cross reference) <br/>
    1. init <br/>
        1.a initialize logger <br/>
        1.b initialize loss function (L108) <br/>
        1.c initialize model (Regression or Poisson Regression) <br/>
        1.d initialize result collector <br/>
        1.e initialize optimizer (e.g., AdamW or SGD) <br/>
        1.f set the device (CPU or GPU) <br/>
        1.g initialize the data loader (with or without K-fold partitioning) <br/>
        1.h set the paths to the outputs (NN-prediction, metrics, best model weights, etc)
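
Several of the initialization steps listed above (logger, loss function, model, optimizer, and device) can be sketched in plain Pytorch as follows. This is an illustration under stated assumptions, not the actual _SYSNet_ implementation: the function name `build_components` is hypothetical, `nn_structure: [4, 20]` is read as four hidden layers of 20 units, the ReLU activation is an assumption, and the loss and optimizer settings are copied from the parameters printed in the log above.

```python
import logging

import torch
from torch import nn


def build_components(n_features=3, n_layers=4, n_units=20, learning_rate=1e-3):
    """Create a logger, loss, model, optimizer, and device for a small DNN."""
    logger = logging.getLogger("sysnet_sketch")  # 1.a logger
    loss_fn = nn.MSELoss(reduction="sum")        # 1.b loss function

    # 1.c fully connected regression model (ReLU activation is an assumption)
    layers, width = [], n_features
    for _ in range(n_layers):
        layers += [nn.Linear(width, n_units), nn.ReLU()]
        width = n_units
    layers.append(nn.Linear(width, 1))
    model = nn.Sequential(*layers)

    # 1.f pick the GPU when one is visible, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # 1.e optimizer, with the keyword arguments shown in the log
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=learning_rate,
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0.0,
        amsgrad=False,
    )
    return logger, loss_fn, model, optimizer, device


if __name__ == "__main__":
    logger, loss_fn, model, optimizer, device = build_components()
    features = torch.zeros(5, 3, device=device)  # dummy batch of 5 samples
    print(model(features).shape)
```
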
        

2.  make a working Jupyter example
    - we just need to provide 

Next meeting agenda
```
I. Run without L1
 Tune NN structure
 Tune Learning Rate

II. Run with L1
 Tune L1 Scale
 Run with L1
 Compare with I

III. Run with L1+L2
 Tune L1 and L2 scales
 Run with L1+L2
 compare with I & II
```

## Data Pre-processing