# 02. Model Training

The model is trained following the instructions at https://github.com/acids-ircam/RAVE (RAVE: A variational autoencoder for fast and high-quality neural audio synthesis, by Antoine Caillon and Philippe Esling).

Training a RAVE model usually involves 3 steps: dataset preparation, model training, and model export.

It is recommended to run training directly from the terminal and to use TensorBoard to monitor training.

#### RAVE Parameters

After installation, run `rave train --helpfull` to list all the parameters.

Key Parameters: 
- batch: Batch size (default: '8')
- channels: number of audio channels (default: '0')
- ckpt: Path to previous checkpoint of the run
- config: RAVE configuration to use (default: "['v2.gin']")
- db_path: Preprocessed dataset path
- out_path: Output folder (default: 'runs/')
- gpu: GPU to use
- max_steps: Maximum number of training steps (default: '6000000')
- n_signal: Number of audio samples to use during training (default: '131072')
- name: Name of the run
- save_every: save every n steps (default: '500000')
- val_every: Checkpoint model every n steps (default: '10000')
- workers: Number of workers to spawn for dataset loading (default: '8')
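When launching runs from a notebook, it can help to keep the parameters in one place and assemble the command from them. The following is a minimal sketch, not part of RAVE itself: the helper `build_rave_train_command` and the run name `my_model` are hypothetical, and the paths are placeholders. Only flags listed above are used.

```python
import shlex


def build_rave_train_command(params):
    """Turn a dict of flag names and values into a `rave train` argument list."""
    command = ["rave", "train"]
    for flag, value in params.items():
        if value is None:
            continue  # leave the flag out so RAVE falls back to its default
        command += [f"--{flag}", str(value)]
    return command


# Hypothetical run settings; the paths are placeholders, not real locations.
params = {
    "config": "v2",
    "db_path": "/dataset/path",
    "out_path": "/model/out",
    "name": "my_model",
    "channels": 1,
    "batch": 8,
    "gpu": None,  # unset, so no --gpu flag is emitted
}

command = build_rave_train_command(params)
print(shlex.join(command))  # copy-pasteable shell command
```

The resulting list can be passed to `subprocess.run(command, check=True)`, or the printed string can be pasted into a terminal.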

#### Installation

In [None]:
# pip3 install torch torchvision torchaudio
# pip install wget
# pip install acids-rave
# conda install ffmpeg

#### TensorBoard - Monitor Training

Revisiting TensorBoard to access all the metrics is not possible without all the checkpoints, which have been removed from this repository because of the GitHub file size limit.

A separate local repository has been zipped and uploaded, which can be used to access all the training metrics. Please download it using the links provided:
- piano model:
- techno model:

In [1]:
import torch
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
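TensorBoard reads its metrics from event files whose names start with `events.out.tfevents.`, so a quick way to check whether a downloaded or local run folder can still be inspected is to look for those files. This is a minimal sketch using only the standard library; the helper `find_event_files` is hypothetical, and `runs` is assumed to be the log folder because it is the default output folder listed above.

```python
from pathlib import Path


def find_event_files(run_dir):
    """Return the TensorBoard event files under run_dir, oldest first."""
    files = Path(run_dir).rglob("events.out.tfevents.*")
    return sorted(files, key=lambda p: p.stat().st_mtime)


root = Path("runs")  # assumed log folder; change it to your own out_path
if root.is_dir():
    for event_file in find_event_files(root):
        print(event_file)
```

If the list is not empty, running `tensorboard --logdir runs` in a terminal should show the logged metrics.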

### 1. Dataset Preparation

Prepare the dataset using the following command:

`rave preprocess --input_path $dataset --output_path $preprocessed_dataset --channels $channels`

- input_path: the location of your preprocessed dataset (e.g., sounds/preprocessed_audio/piano-dataset.mp3)
- output_path: the destination path where you want to save the dataset after it has been prepared for training
- channels: the number of audio channels in the input files. 
  - It is recommended to use mono (1 channel) for compatibility, as the current version of RAVE has issues processing multichannel audio and may not be able to export a stereo version successfully.
  - Processing multichannel audio can significantly increase training time.
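If the source recordings are stereo, one option is to downmix them to mono with ffmpeg before running `rave preprocess`. The sketch below only assembles the ffmpeg call; the helper names and the file names in the usage line are hypothetical, and it assumes ffmpeg is on the PATH (it is installed in the Installation cell above). `-ac 1` sets one output channel and `-ar` sets the output sample rate.

```python
import subprocess


def mono_ffmpeg_command(src, dst, sample_rate=None):
    """Build an ffmpeg argument list that downmixes src to a mono file dst."""
    command = ["ffmpeg", "-i", str(src), "-ac", "1"]
    if sample_rate is not None:
        command += ["-ar", str(sample_rate)]  # optional resampling
    command.append(str(dst))
    return command


def convert_to_mono(src, dst, sample_rate=None):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(mono_ffmpeg_command(src, dst, sample_rate), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(mono_ffmpeg_command("stereo-input.mp3", "mono-output.wav", 44100))
```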

In [10]:
# my example on mac
! rave preprocess --input_path '/Users/irenex/Documents/GitHub/ai4media-project/sounds/preprocessed_audio/piano-dataset.mp3' --output_path '/Users/irenex/Documents/GitHub/ai4media-project/sounds/rave_processed_audio/piano' --channels 1

### 2. Training

Start training the model using the following command:

`rave train --config $config --db_path /dataset/path --out_path /model/out --name $name_of_your_model --channels X`

[Key parameters]
- config: model architecture selected for training, e.g., v2
  - RAVEv2 has many different configurations. The most commonly used one at the moment is v2, an improved continuous model (faster, higher quality).
- db_path: the location of the RAVE-processed dataset (should be the same as the output_path in 1. Dataset Preparation, unless you moved the directory)
- out_path: the destination path where you want to save the model and its checkpoints 
- name: the name of your model
- channels: the number of input audio channels 

[Optional parameters]
- val_every: checkpoint model every n steps (default is 10000)
- batch: Batch size (default is 8)
  - Although a powerful GPU allows a larger batch size, a larger batch does not automatically produce a better model. The optimal batch size has to be determined by experiment, and it can differ between training stages (a smaller batch may be better for the first training stage, a larger one for the second).
  - Using the default setting of 8 is suggested to ensure reasonably good convergence (at least for first-time users).
  - There has been a lot of discussion about batch size:
  - https://github.com/acids-ircam/RAVE/issues/234
  - https://github.com/acids-ircam/RAVE/issues/22 
  - https://discord.com/channels/987249093124452400/1120704270812053595 
- gpu: explicitly set the gpu that will be used for this training (e.g., gpu -1 means all gpu; gpu 0 means using the first gpu)
- workers: the number of workers to spawn for dataset loading (default is 8; it can be adjusted to the number of CPU cores on your machine, although it made no noticeable difference to training speed in my case)

In my case, I used an Nvidia GeForce RTX 4080 to train the model. Although RAVE trains faster than other models, a complete model still takes at least 3 days on a GPU (according to Discord, a good model normally needs at least 3 million steps). Training on a CPU is not recommended.
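The two figures above (about 3 million steps in about 3 days) imply a rough training speed, which can be used to estimate how long a run still needs. This is back-of-the-envelope arithmetic only: both numbers are lower bounds from the text, actual speed depends on the GPU, dataset and configuration, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_steps_per_second(total_steps, days):
    """Average speed needed to finish total_steps within the given days."""
    return total_steps / (days * SECONDS_PER_DAY)


def days_remaining(current_step, target_step, steps_per_second):
    """Estimated days left to reach target_step at a constant speed."""
    remaining = max(target_step - current_step, 0)
    return remaining / (steps_per_second * SECONDS_PER_DAY)


rate = required_steps_per_second(3_000_000, 3)
print(f"{rate:.1f} steps/s")  # 11.6 steps/s
print(f"{days_remaining(1_000_000, 3_000_000, rate):.1f} days")  # 2.0 days
```

The speed observed in your own run can be substituted for `rate` to get an estimate for your hardware.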

In [None]:
# my example on wins
! rave train --config v2 --db_path '/Users/irenex/Documents/GitHub/ai4media-project/sounds/rave_processed_audio/piano' --out_path '/Users/irenex/Documents/GitHub/ai4media-project/rave_ckpt/piano' --name rav_piano --channels 1 --workers 16 --gpu 0

### 2. Resume Training

RAVE automatically saves a checkpoint every 10000 steps (default setting), and training can be resumed from any of these checkpoints.

Resume training the model using the following command:

`rave train --config $config --db_path /dataset/path --name $name_of_your_model --ckpt /dataset/ckpt --out_path /model/out --channels X`
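Before resuming, it can be useful to check which checkpoint in a run folder is the most recent. The sketch below assumes that checkpoints are stored as `.ckpt` files inside a `checkpoints` directory, which is the layout suggested by the example path in this notebook; the helper `latest_checkpoint` is hypothetical. The example in this notebook passes the checkpoints directory itself to `--ckpt`, so the helper is meant for inspection rather than as a required step.

```python
from pathlib import Path


def latest_checkpoint(checkpoint_dir):
    """Return the most recently modified .ckpt file, or None if there is none."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    candidates = list(directory.glob("*.ckpt"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Placeholder path; replace it with the checkpoints folder of your own run.
print(latest_checkpoint("/model/out/checkpoints"))
```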

In [None]:
# my example on wins
! rave train --config v2 --db_path '/Users/irenex/Documents/GitHub/ai4media-project/sounds/rave_processed_audio/piano' --name rav_piano --ckpt '/Users/irenex/Documents/GitHub/ai4media-project/rave_ckpt/rav_piano_e18d54798e/version_0/checkpoints' --out_path '/Users/irenex/Documents/GitHub/ai4media-project/rave_ckpt/piano' --channels 1 --workers 16 --gpu 0

### 3. Export

Once trained, the model can be exported to a TorchScript (.ts) file using the following command:

`rave export --run /path/to/your/run (--streaming)`

*Setting the --streaming flag will enable cached convolutions, making the model compatible with realtime processing

Each checkpoint can be exported as its own model, which is useful for comparing the model at different training steps.

In [None]:
# my example on Mac
! rave export --run /Users/irenex/Desktop/ravmodel/rav_techno_e18d54798e/epoch4059/version_2/ --streaming