# Pytorch-template

A GitHub repository that provides a basic template to make writing PyTorch code easier (click the image to open it).  
    
[![pytorch-template](https://user-images.githubusercontent.com/13328380/54335673-12111600-466d-11e9-866a-3b52dc7a125b.png)](https://github.com/victoresque/pytorch-template)

# Folder Structure

```bash
pytorch-template/
│
├── train.py - main script to start training
├── test.py - evaluation of trained model
├── config.json - config file
│
├── base/ - abstract base classes
│   ├── base_data_loader.py - abstract base class for data loaders
│   ├── base_model.py - abstract base class for models
│   └── base_trainer.py - abstract base class for trainers
│
├── data_loader/ - anything about data loading goes here
│   └── data_loaders.py
│
├── data/ - default directory for storing input data
│
├── model/ - models, losses, and metrics
│   ├── loss.py
│   ├── metric.py
│   └── model.py
│
├── saved/ - default checkpoints folder
│   └── runs/ - default logdir for tensorboardX
│
├── trainer/ - trainers
│   └── trainer.py
│
└── utils/
    ├── util.py
    ├── logger.py - class for train logging
    ├── visualization.py - class for tensorboardX visualization support
    └── ...
```

# Just getting started

Try running the template with the commands below.  

## Train

```bash
> python3 train.py -c config.json
```

![getting_started](https://user-images.githubusercontent.com/13328380/54335901-bdba6600-466d-11e9-8607-c4c666e70727.png)  

<br/>

## Train Resume

```bash
> python3 train.py --resume <path/to/checkpoint>
```
     
![resume](https://user-images.githubusercontent.com/13328380/54336176-b051ab80-466e-11e9-829f-5e95f01ceebf.png)
     

<br/>

## Using Multi GPU

```bash
> python3 train.py --device 2,3 -c config.json

# or

> CUDA_VISIBLE_DEVICES=2,3 python3 train.py -c config.json
```
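
The two forms above presumably have the same effect. Here is a minimal sketch, assuming the `--device` flag simply copies its value into `CUDA_VISIBLE_DEVICES` before any CUDA work starts. The `parse_and_apply_device` helper and the `dest` names are hypothetical, not code taken from the template.

```python
import argparse
import os


def parse_and_apply_device(argv):
    """Parse CLI flags and expose only the requested GPUs (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="training entry point (sketch)")
    parser.add_argument("-c", dest="config", default=None, help="path to config file")
    parser.add_argument("--resume", default=None, help="path to checkpoint")
    parser.add_argument("--device", default=None, help="GPU indices, e.g. 2,3")
    args = parser.parse_args(argv)

    if args.device:
        # Must be set before CUDA is initialised, otherwise it has no effect.
        os.environ["CUDA_VISIBLE_DEVICES"] = args.device
    return args


args = parse_and_apply_device(["--device", "2,3", "-c", "config.json"])
print(args.config, os.environ["CUDA_VISIBLE_DEVICES"])
```

With this approach the process only sees the listed GPUs, which it then addresses as devices 0 and 1.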

<br/>

## Test

```bash
> python3 test.py --resume <path/to/checkpoint>
```

![test](https://user-images.githubusercontent.com/13328380/54336271-f3ac1a00-466e-11e9-8779-e012b5c73901.png)


# config.json

Let's take a look inside `config.json`!

```json
{
  "name": "Mnist_LeNet",        // training session name
  "n_gpu": 1,                   // number of GPUs to use for training.
  
  "arch": {
    "type": "MnistModel",       // name of model architecture to train
    "args": {

    }                
  },
  "data_loader": {
    "type": "MnistDataLoader",         // selecting data loader
    "args":{
      "data_dir": "data/",             // dataset path
      "batch_size": 64,                // batch size
      "shuffle": true,                 // shuffle training data before splitting
      "validation_split": 0.1,         // validation data ratio
      "num_workers": 2                 // number of cpu processes to be used for data loading
    }
  },
  "optimizer": {
    "type": "Adam",
    "args":{
      "lr": 0.001,                     // learning rate
      "weight_decay": 0,               // (optional) weight decay
      "amsgrad": true
    }
  },
  "loss": "nll_loss",                  // loss
  "metrics": [
    "my_metric", "my_metric2"          // list of metrics to evaluate
  ],                         
  "lr_scheduler": {
    "type": "StepLR",                   // learning rate scheduler
    "args":{
      "step_size": 50,          
      "gamma": 0.1
    }
  },
  "trainer": {
    "epochs": 100,                     // number of training epochs
    "save_dir": "saved/",              // checkpoints are saved in save_dir/name
    "save_freq": 1,                    // save checkpoints every save_freq epochs
    "verbosity": 2,                    // 0: quiet, 1: per epoch, 2: full
  
    "monitor": "min val_loss",         // mode and metric for model performance monitoring. set 'off' to disable.
    "early_stop": 10,                  // number of epochs to wait before early stop. set 0 to disable.
  
    "tensorboardX": true,              // enable tensorboardX visualization support
    "log_dir": "saved/runs"            // directory to save log files for visualization
  }
}
```
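
The `//` annotations in the listing above are there for explanation only. Python's `json` module rejects comments, so a file loaded with it must not contain them. Here is a minimal sketch, assuming you want to keep an annotated copy and strip the comments before parsing. `strip_line_comments` is a hypothetical helper, not part of the template.

```python
import json


def strip_line_comments(text):
    """Remove // comments that sit outside of JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif line.startswith("//", i):
                cut = i  # everything from here on is a comment
                break
        cleaned.append(line[:cut])
    return "\n".join(cleaned)


annotated = """
{
  "name": "Mnist_LeNet",   // training session name
  "n_gpu": 1               // number of GPUs to use for training
}
"""
config = json.loads(strip_line_comments(annotated))
print(config["name"], config["n_gpu"])
```

The string tracking matters because a naive search for `//` would also cut values such as URLs.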

# Basic Parsing concept for `config.json`

```python
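# Assumes `module_arch` is the imported model module (e.g. model/model.py)
# and `config` is the dict parsed from config.json.
def get_instance(module, name, config, *args):
    # Look up the class named by config[name]['type'] inside `module`, then
    # instantiate it with *args and the keyword arguments in config[name]['args'].
    print(config[name]['type'])
    return getattr(module, config[name]['type'])(*args, **config[name]['args'])

model = get_instance(module_arch, 'arch', config)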
```  
    
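The `get_instance` function takes its arguments in the order `module`, `name` (the config key), `config`, and `*args`.  
It uses the key-value pairs under that key in the `*.json` file to look up the class in the module, pass the arguments to it, and return the resulting instance.  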
    
<br/>

The basic parsing structure is therefore as follows.
```json
"key" : {
    "type": "class name",
    "args": {
        "parameter1" : value1,
        "parameter2" : value2,
        ...
        "parameterN" : valueN
    }
}
```
   
- key : a specific component (data loader, model, optimizer, and so on).
- type : must match the class name exactly (the class with this name is looked up and used).
- args : the arguments passed to that class.
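
The three fields can be exercised without any real model. Here is a minimal sketch, assuming a stand-in namespace in place of the real `model/model.py` module. `TinyModel`, its `hidden` parameter, and `fake_module` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


class TinyModel:
    """Hypothetical stand-in for a model class defined in model/model.py."""

    def __init__(self, hidden=10):
        self.hidden = hidden


# Stand-in for the imported module; getattr() works the same on a real module.
fake_module = SimpleNamespace(TinyModel=TinyModel)


def get_instance(module, name, config, *args):
    entry = config[name]
    cls = getattr(module, entry["type"])  # "type" must match the class name
    return cls(*args, **entry["args"])    # "args" become keyword arguments


config = {"arch": {"type": "TinyModel", "args": {"hidden": 32}}}
model = get_instance(fake_module, "arch", config)
print(type(model).__name__, model.hidden)

# A "type" that names no class in the module fails at lookup time.
try:
    get_instance(fake_module, "arch", {"arch": {"type": "Tinymodel", "args": {}}})
except AttributeError as err:
    print("lookup failed:", err)
```

Because the lookup is a plain `getattr`, a typo in `type` shows up as an `AttributeError` as soon as the config is used.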

## Session Name

```json
{
  "name": "Mnist_LeNet",        // training session name
  "n_gpu": 1                    // number of GPUs to use for training.
}
```

`name` is the name of the training session.   
Any name will do.  
    
`n_gpu`, however, must be set to the correct number of GPUs;   
if it is 0, training runs on the CPU.
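
The `n_gpu` rule can be sketched as follows. This is an assumption about how such a setting is typically resolved, not the template's actual code: `resolve_device` is a hypothetical helper, and the number of available GPUs is passed in explicitly (in real code it would come from `torch.cuda.device_count()`).

```python
def resolve_device(n_gpu_requested, n_gpu_available):
    """Return (device_name, gpu_ids) for a requested GPU count (hypothetical helper)."""
    if n_gpu_requested > 0 and n_gpu_available == 0:
        print("Warning: no GPU available, falling back to CPU.")
        n_gpu_requested = 0
    if n_gpu_requested > n_gpu_available:
        print(f"Warning: {n_gpu_requested} GPUs requested, only {n_gpu_available} available.")
        n_gpu_requested = n_gpu_available
    device_name = "cuda:0" if n_gpu_requested > 0 else "cpu"
    return device_name, list(range(n_gpu_requested))


print(resolve_device(0, 4))   # n_gpu = 0: CPU even if GPUs exist
print(resolve_device(2, 4))   # n_gpu = 2: the first two visible GPUs
print(resolve_device(2, 0))   # no GPU on the machine: falls back to CPU
```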

## Dataloader

```json
{
  "data_loader": {
    "type": "MnistDataLoader",         // selecting data loader
    "args":{
      "data_dir": "data/",             // dataset path
      "batch_size": 64,                // batch size
      "shuffle": true,                 // shuffle training data before splitting
      "validation_split": 0.1,         // validation data ratio
      "num_workers": 2                 // number of cpu processes to be used for data loading
    }
  }
}
```
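
How `shuffle` and `validation_split` interact can be sketched with plain index lists. This is a minimal sketch, assuming the validation set is cut from a shuffled list of sample indices using a fixed seed. `split_indices` and its `seed` parameter are hypothetical and not taken from the template.

```python
import random


def split_indices(n_samples, validation_split, shuffle=True, seed=0):
    """Split range(n_samples) into (train_idx, valid_idx) (hypothetical helper)."""
    indices = list(range(n_samples))
    if shuffle:
        # A local RNG keeps the split reproducible without touching global state.
        random.Random(seed).shuffle(indices)
    n_valid = int(n_samples * validation_split)
    return indices[n_valid:], indices[:n_valid]


train_idx, valid_idx = split_indices(1000, 0.25)
print(len(train_idx), len(valid_idx))
```

Shuffling before the cut keeps the validation set from being just the first block of the dataset, which matters when samples are stored in class order.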

## Optimizer

```json
{
  "optimizer": {
    "type": "Adam",
    "args":{
      "lr": 0.001,                     // learning rate
      "weight_decay": 0,               // (optional) weight decay
      "amsgrad": true
    }
  }
}
```
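
The optimizer entry follows the same `type`/`args` pattern, with one difference: the parameters to optimize cannot be written in JSON, so they have to be supplied separately as a positional argument. Here is a minimal sketch, assuming the class is resolved by name and the config `args` are forwarded as keywords. `FakeAdam` and `fake_optim` are hypothetical stand-ins so the example runs without PyTorch (in real code the module would be `torch.optim`).

```python
from types import SimpleNamespace


class FakeAdam:
    """Hypothetical stand-in mirroring a few keyword arguments of torch.optim.Adam."""

    def __init__(self, params, lr=0.001, weight_decay=0, amsgrad=False):
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay
        self.amsgrad = amsgrad


fake_optim = SimpleNamespace(Adam=FakeAdam)

config = {
    "optimizer": {
        "type": "Adam",
        "args": {"lr": 0.001, "weight_decay": 0, "amsgrad": True},
    }
}

entry = config["optimizer"]
trainable_params = [0.5, -1.2]  # placeholder for the model's trainable parameters
optimizer = getattr(fake_optim, entry["type"])(trainable_params, **entry["args"])
print(optimizer.lr, optimizer.amsgrad, len(optimizer.params))
```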

## Loss

```json
{
  "loss": "nll_loss",                  // loss
  "metrics": [
    "my_metric", "my_metric2"          // list of metrics to evaluate
  ]
}
```
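
Unlike the other entries, `loss` and `metrics` are plain strings rather than `type`/`args` objects, which suggests they name functions instead of classes. Here is a minimal sketch, assuming each string is resolved with `getattr` on the corresponding module (`model/loss.py`, `model/metric.py`). The function bodies and the `fake_*` namespaces are hypothetical.

```python
from types import SimpleNamespace


def nll_loss(output, target):
    """Hypothetical pure-Python stand-in: mean negative log-probability of the target."""
    return -sum(row[t] for row, t in zip(output, target)) / len(target)


def my_metric(output, target):
    """Hypothetical metric: top-1 accuracy."""
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == t
        for row, t in zip(output, target)
    )
    return correct / len(target)


fake_loss_module = SimpleNamespace(nll_loss=nll_loss)
fake_metric_module = SimpleNamespace(my_metric=my_metric)

config = {"loss": "nll_loss", "metrics": ["my_metric"]}

# Resolve the names from the config into callables.
loss_fn = getattr(fake_loss_module, config["loss"])
metric_fns = [getattr(fake_metric_module, name) for name in config["metrics"]]

# Two samples, three classes; each row holds log-probabilities.
output = [[-0.1, -2.5, -3.0], [-1.9, -0.2, -2.8]]
target = [0, 2]
print(loss_fn(output, target))
print({fn.__name__: fn(output, target) for fn in metric_fns})
```

Adding a new metric then amounts to defining a function in the metric module and listing its name in `metrics`.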

## lr scheduler

```json
{
  "lr_scheduler": {
    "type": "StepLR",                   // learning rate scheduler
    "args":{
      "step_size": 50,          
      "gamma": 0.1
    }
  }
}
```
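
With `StepLR`, the learning rate is multiplied by `gamma` once every `step_size` epochs. The resulting schedule for the values above can be worked out in plain Python. `step_lr` is a hypothetical helper written for illustration (real code would use `torch.optim.lr_scheduler.StepLR`), and the base rate of 0.001 is taken from the optimizer settings.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate in effect at a 0-based epoch under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 49, 50, 99):
    print(epoch, step_lr(0.001, step_size=50, gamma=0.1, epoch=epoch))
```

With 100 training epochs and `step_size` 50, the rate drops by a factor of ten exactly once, at the halfway point.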

## Trainer

```json
{
  "trainer": {
    "epochs": 100,                     // number of training epochs
    "save_dir": "saved/",              // checkpoints are saved in save_dir/name
    "save_freq": 1,                    // save checkpoints every save_freq epochs
    "verbosity": 2,                    // 0: quiet, 1: per epoch, 2: full
  
    "monitor": "min val_loss",         // mode and metric for model performance monitoring. set 'off' to disable.
    "early_stop": 10,                  // number of epochs to wait before early stop. set 0 to disable.
  
    "tensorboardX": true,              // enable tensorboardX visualization support
    "log_dir": "saved/runs"            // directory to save log files for visualization
  }
}
```
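
The `monitor` and `early_stop` settings can be read as a small state machine. Here is a minimal sketch, assuming `monitor` is a string of the form `"<mode> <metric>"` and that training stops once the metric has failed to improve for `early_stop` consecutive epochs. The `EarlyStopper` class is hypothetical, not the template's trainer, and the exact stopping condition there may differ.

```python
import math


class EarlyStopper:
    """Track a monitored metric and signal when to stop (hypothetical helper)."""

    def __init__(self, monitor="min val_loss", early_stop=10):
        if monitor == "off":
            self.mode, self.metric = "off", None
        else:
            self.mode, self.metric = monitor.split()
            if self.mode not in ("min", "max"):
                raise ValueError(f"unknown monitor mode: {self.mode}")
        self.best = math.inf if self.mode == "min" else -math.inf
        # early_stop = 0 disables stopping, so treat it as "wait forever".
        self.patience = early_stop if early_stop > 0 else math.inf
        self.bad_epochs = 0

    def update(self, log):
        """Feed one epoch's metrics; return True if training should stop."""
        if self.mode == "off":
            return False
        value = log[self.metric]
        improved = value < self.best if self.mode == "min" else value > self.best
        if improved:
            self.best, self.bad_epochs = value, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopper("min val_loss", early_stop=2)
for epoch, val_loss in enumerate([0.9, 0.7, 0.8, 0.75, 0.6], start=1):
    if stopper.update({"val_loss": val_loss}):
        print(f"stopping at epoch {epoch}, best val_loss = {stopper.best}")
        break
```

The same `best` value is what a trainer would use to decide which checkpoint to keep as the best model.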