<a href="https://colab.research.google.com/github/spwilson2/sample-generator-ez/blob/main/RAVEv2_neutone.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAVE (v2) Training + Export to neutone

## About

This version of the RAVE notebook was created by Naotake Masuda for [neutone](https://neutone.space/) @ Qosmo loosely based on the [original RAVE notebook](https://colab.research.google.com/drive/1aK8K186QegnWVMAhfnFRofk_Jf7BBUxl?usp=sharing&pli=1#scrollTo=fwb2J-Nxb4po) by Antoine Caillon and [RAVE v2 notebook](https://colab.research.google.com/drive/1ih-gv1iHEZNuGhHPvCHrleLNXvooQMvI?usp=sharing) by Moisés Horta.

With this version of the RAVE notebook, you can train RAVE models and then export them to .nm files for use as a timbre transfer effect in the neutone VST plugin.

If you have any questions or comments, feel free to post them in [our discord](https://discord.com/invite/zaUbtyxDRZ). You can find tips on training RAVE models [on our blog](https://neutone.space/2022/07/15/neural-timbre-transfer-effects-for-neutone/) (the descriptions of arguments there are mostly for RAVE v1).

Also, check out the [Colab notebook for DDSP+neutone](https://colab.research.google.com/drive/15FuafmtGWEyvTOOQbN1AMIQRhGLy23Pg?usp=sharing). DDSP is more limited in the types of sound it can handle, but it is faster to train.

## CREDITS

The RAVE algorithm was developed by Antoine Caillon and Philippe Esling, STMS Laboratory (IRCAM, CNRS, Sorbonne University, Ministry of Culture and Communication) and is licensed by IRCAM.

<img src='https://drive.google.com/uc?id=1-1AL6CuNocQnA4wvV3lqsPgU54BKGsQ4' width="200"/>

## Check GPU

Make sure your Colab runtime is using the GPU instead of CPU!

To use GPUs:
`Menu bar: Runtime->Runtime type->GPU`

Now we can check the GPU card with `nvidia-smi`.

- V100/A100: GOOD (Colab pro users only, expensive)
- Tesla T4: OK
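
If you would rather check for a GPU in code than read the table by eye, a minimal sketch could look like the following. It assumes `nvidia-smi` is on the PATH when a GPU runtime is active and uses its `--query-gpu` option; `list_gpus` is a hypothetical helper name, not part of Colab or RAVE.

```python
import shutil
import subprocess


def list_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA tools installed: most likely a CPU runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []  # the tool exists but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    print("GPU(s) found:", ", ".join(gpus))
else:
    print("No GPU found: switch the runtime type to GPU before training.")
```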

In [None]:
!nvidia-smi

Wed Jun  7 10:05:39 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    25W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

## Install requirements



In [None]:
!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /content/miniconda
# necessary for data loading
!/content/miniconda/bin/conda install -y 'ffmpeg<5'
# installing rave to conda environment via pip
!/content/miniconda/bin/pip install acids-rave

## Training data

Put your audio dataset (a folder containing "wav", "mp3", "opus", "aac", or "flac" files) somewhere in your Drive and specify the path to it in `input_dataset` in the settings cell. You can navigate through the Drive via the folder icon on the left sidebar (Google Drive content should be under `drive/MyDrive`). By right-clicking on a folder and selecting "copy path", you can copy the full path of the folder. Make sure the path doesn't contain any whitespace!

### Tips

- Data preprocessing may be required for good results.
    - Gain normalization is necessary if the original data is relatively quiet.
        - A model trained on quiet sounds can behave erratically when it is fed loud sounds as input.
- Gather a good audio dataset.
    - Recording a long solo performance of a certain instrument is effective and often leads to clean results.
        - For example, RAVE.drumkit was trained on a large dataset of many performances using a single drum kit.
    - Recording environment should ideally be similar across the dataset.
    - Some amount of variety in the data is good, but too much variety brings poor results.
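
As an illustration of the gain normalization tip, here is a minimal peak-normalization sketch on a plain list of samples. Peak normalization is only one simple way to normalize gain, the -1 dBFS target is an assumed value, and `peak_normalize` is a hypothetical helper. In practice you would apply the same scaling to arrays loaded with your audio library of choice.

```python
def peak_normalize(samples, target_db=-1.0):
    """Scale samples (floats in [-1, 1]) so the loudest one sits at target_db dBFS."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    target_peak = 10 ** (target_db / 20)  # dBFS -> linear amplitude
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet = [0.0, 0.05, -0.1, 0.02]  # a quiet recording peaking at 0.1
normalized = peak_normalize(quiet)
print("peak before:", max(abs(s) for s in quiet))
print("peak after: ", max(abs(s) for s in normalized))
```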

## Settings

- save_directory
    - This is the directory where all the model checkpoints and logs are saved.
    - You can also set this to somewhere on your Google Drive (ex. /content/drive/MyDrive) so that it is never lost.
        - The log files can become very large (>5 GB), so the storage limit might be a problem.
    
- run_name
    - The logs and checkpoints are saved under `[save_directory]/runs/[run_name]/`
- architecture: "v2", "v1", "discrete", "onnx", "raspberry"
    - v2 corresponds to the new RAVE model, while v1 corresponds to the original one.
    - See [original repository](https://github.com/acids-ircam/RAVE) for more details
- regularization_type: "default", "wasserstein", "spherical"
    - different regularization techniques for the v2 model
    - See [original repository](https://github.com/acids-ircam/RAVE) for more details
- sampling_rate: sampling rate of the model, typically set to 48kHz or 44.1kHz. The plugin will resample when the model is used in other sampling rates.
- no_latency_mode: When turned off, output quality is improved but latency of about 0.5s or more will be introduced, which may be undesirable for a vst plugin.
- validation_every: This sets the interval (in training steps) for saving model checkpoints.

In [None]:
import os
input_dataset = "/content/drive/MyDrive/AUDIO_FOLDER"  #@param {type:"string"}
if ' ' in input_dataset:
    print('WARNING: whitespaces not allowed in input dataset path')
    # https://github.com/acids-ircam/RAVE/issues/190
save_directory = "/content/RAVEruns/"  #@param {type:"string"}
run_name = "testrun"  #@param {type:"string"}
# input_dataset = "/content/drive/MyDrive/shakuhachi"  #@param {type:"string"}
sampling_rate = 48000  #@param {type:"integer"}
no_latency_mode = True  #@param {type:"boolean"}
architecture = "v2" #@param ["v2", "v1", "discrete", "onnx", "raspberry" ]
regularization_type = "wasserstein" #@param ["default", "wasserstein", "spherical"]
# regularization_strength = 0.01  #@param {type:"slider", min:0.01, max:1, step:0.001}
validation_every = 15000  #@param {type:"integer"}

os.makedirs(save_directory, exist_ok=True)
%cd $save_directory
run_name = run_name.replace(" ", "_").lower()
preprocess_dir = os.path.join('/content/preprocessed', run_name)
os.makedirs(preprocess_dir, exist_ok=True)


/content/RAVEruns


## Start tensorboard

Upon first running this cell, you might not see any result. After running the training cell, you can come back to this cell to see the training progress by hitting the refresh button in the top right corner.

### Audio

You can listen to the reconstruction results in the audio tab. The audio consists of the original segment followed by the model reconstruction. The model reconstruction should sound like the original.

In [None]:
# Setup tensorboard
%load_ext tensorboard
%tensorboard --logdir . --bind_all

## Training

Training takes about a day for the first stage (which is 1 million steps, where number of steps = number of epochs * number of batches in the dataset), and 3 days or more for the second stage (depends on your GPU). You can cut off training anytime if the reconstruction results sound good enough for you.
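
To make the step arithmetic concrete, the sketch below turns a step budget into a number of epochs using the relation above. The dataset size and batch size are hypothetical values chosen for illustration, not numbers taken from RAVE.

```python
import math


def epochs_for_steps(total_steps, num_examples, batch_size):
    """Number of passes over the dataset needed to reach total_steps."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return math.ceil(total_steps / batches_per_epoch)


# Hypothetical dataset: 20,000 preprocessed chunks, batch size 8.
# That gives 2,500 batches per epoch, so 1 million steps is 400 epochs.
print(epochs_for_steps(1_000_000, 20_000, 8))  # 400
```

A smaller dataset means fewer batches per epoch, so the same step budget corresponds to many more passes over the same audio.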

First, every audio file in your `input_dataset` folder is resampled to the target sampling rate and compiled into a database under `/content/preprocessed/[run_name]/` by the `rave preprocess` command. Then the `rave train` command trains the model on that database.

### Colab limitations

Since there are limits to how long you can keep a Colab notebook open, this cell may be disconnected during training. If so, you can run this cell again (or, if that doesn't work, start from the top cell) to resume training from where it was before getting cut off.
There are [some tricks](https://stackoverflow.com/questions/57113226/how-to-prevent-google-colab-from-disconnecting) to prevent disconnection.

On free tiers for Colab, you might hit a limit of GPU time per month during training. Colab pro (\$9.99/mo) or pro+ (\$49.99) may be required for training more models.

### Train from scratch

In [None]:
!/content/miniconda/bin/rave preprocess --input_path $input_dataset --output_path $preprocess_dir --sampling_rate $sampling_rate
train_arg = f"""--config {architecture} --config {regularization_type} \
--db_path {preprocess_dir} --name {run_name} --val_every {validation_every} \
--override SAMPLING_RATE={sampling_rate}"""
if no_latency_mode:
    train_arg += " --config causal"
!/content/miniconda/bin/rave train $train_arg

### ...or resume training

In [None]:
from google.colab import drive
drive.mount('/content/drive')

#### Find Last run

Use the next cell to find the last saved checkpoint or find it manually from the left sidebar.


In [None]:
import os, glob, itertools, time
used_save_dir = "/content/RAVEruns" #@param {type:"string"}
used_run_name = "testrun" #@param {type:"string"}
runs_dir = os.path.join(used_save_dir, 'runs')
# Collect checkpoints from every run folder whose name starts with used_run_name.
# glob returns an empty list (instead of raising) if runs_dir does not exist yet.
ckpts = glob.glob(os.path.join(runs_dir, used_run_name + '*', '**', '*.ckpt'), recursive=True)
if len(ckpts)>0:
    latest_ckpt = max(ckpts, key=os.path.getctime)
    print(f'Latest ckpt is: {latest_ckpt}')
    print(f'at {time.ctime(os.path.getctime(latest_ckpt))} (UTC)')
else:
    print('No checkpoint found')

#### Resume training

Make sure you run the settings cell (above) with the same settings used during previous training. Fill in `resume_ckpt` with the path to the checkpoint to resume from.
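
The train-from-scratch cell and the resume cell assemble the same argument string and differ only in the `--ckpt` option. As a sketch, that logic could live in one helper so the two cannot drift apart. `build_train_args` is a hypothetical function, and it uses only the options that already appear in this notebook's cells.

```python
def build_train_args(architecture, regularization_type, db_path, run_name,
                     val_every, sampling_rate, causal=False, ckpt=None):
    """Assemble the argument string passed to `rave train` in this notebook."""
    args = [
        "--config", architecture,
        "--config", regularization_type,
        "--db_path", db_path,
        "--name", run_name,
        "--val_every", str(val_every),
    ]
    if ckpt:
        args += ["--ckpt", ckpt]  # only present when resuming
    args += ["--override", f"SAMPLING_RATE={sampling_rate}"]
    if causal:
        args += ["--config", "causal"]  # what no_latency_mode adds
    return " ".join(args)


settings = dict(
    architecture="v2", regularization_type="wasserstein",
    db_path="/content/preprocessed/testrun", run_name="testrun",
    val_every=15000, sampling_rate=48000, causal=True,
)
print(build_train_args(**settings))
print(build_train_args(**settings, ckpt="PATH/TO/CHECKPOINT"))
```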

In [None]:
resume_ckpt = "PATH/TO/CHECKPOINT" #@param {type:"string"}
!/content/miniconda/bin/rave preprocess --input_path $input_dataset --output_path $preprocess_dir --sampling_rate $sampling_rate
train_arg = f"""--config {architecture} --config {regularization_type} \
--db_path {preprocess_dir} --name {run_name} --val_every {validation_every} --ckpt {resume_ckpt} \
--override SAMPLING_RATE={sampling_rate}"""
if no_latency_mode:
    train_arg += " --config causal"
!/content/miniconda/bin/rave train $train_arg

## Export to neutone

Once you're done training, you can export to torchscript (.ts) then neutone model format (.nm).
If you're growing impatient or don't have the time, you can pause training and export mid-training.

`final_res_folder`: folder containing model versions and config.gin file (ex. `/content/drive/MyDrive/RAVEruns/runs/RUNNAME_2b26dfad3c`)

In [None]:
# install neutone to Colab runtime
!pip install neutone_sdk torch==1.13.1

In [None]:
# export to torchscript first
final_res_folder = "/content/RAVEruns/runs/testrun_cfefbe3eab" #@param {type:"string"}
!/content/miniconda/bin/rave export --run $final_res_folder --streaming true

### Define RAVEModel Wrapper

Fill in the information about your model in `get_model_name`, `get_model_authors`, etc.
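
The wrapper in the next cell also exposes four knobs, each delivered as a value between 0 and 1, and remaps three of them before editing the latent variable. The sketch below mirrors that remapping on plain floats so you can see what a knob position means. `map_knobs` is a hypothetical helper, and `latent_size=16` is an assumed value; the real one comes from the exported model.

```python
def map_knobs(edit_index, scale, offset, latent_size):
    """Mirror the knob-to-latent mapping of do_forward_pass on plain floats."""
    clamped = min(max(edit_index, 0.0), 0.99)  # keep the index inside the latent
    idx_z = int(clamped * latent_size)         # which latent dimension to edit
    z_scale = scale * 2                        # 0~1 -> 0~2
    z_offset = offset * 2 - 1                  # 0~1 -> -1~1
    return idx_z, z_scale, z_offset


# Default knob positions leave dimension 0 untouched (scale 1, offset 0).
print(map_knobs(0.0, 0.5, 0.5, latent_size=16))  # (0, 1.0, 0.0)
# Knobs at their extremes: last dimension, doubled, shifted down by 1.
print(map_knobs(1.0, 1.0, 0.0, latent_size=16))  # (15, 2.0, -1.0)
```

The remaining knob, "Chaos", is not remapped: it directly scales Gaussian noise that is added to the whole latent.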

In [None]:
from pathlib import Path
from typing import Dict, List

import torch
from torch import Tensor
from neutone_sdk import WaveformToWaveformBase, NeutoneParameter
from neutone_sdk.utils import load_neutone_model, save_neutone_model


class RAVEModelWrapper(WaveformToWaveformBase):
    def get_model_name(self) -> str:
        return "RAVE.example"  # <-EDIT THIS

    def get_model_authors(self) -> List[str]:
        return ["Author Name"]  # <-EDIT THIS

    def get_model_short_description(self) -> str:
        return "RAVE model trained on xxx sounds."  # <-EDIT THIS

    def get_model_long_description(self) -> str:
        return (  # <-EDIT THIS
            "RAVE timbre transfer model trained on xxx sounds. Useful for xxx sounds."
        )

    def get_technical_description(self) -> str:
        return "RAVE model proposed by Caillon, Antoine et al."

    def get_technical_links(self) -> Dict[str, str]:
        return {
            "Paper": "https://arxiv.org/abs/2111.05011",
            "Code": "https://github.com/acids-ircam/RAVE",
        }

    def get_tags(self) -> List[str]:
        return ["timbre transfer", "RAVE"]

    def get_model_version(self) -> str:
        return "1.0.0"

    def is_experimental(self) -> bool:
        """
        set to True for models in experimental stage
        (status shown on the website)
        """
        return True  # <-EDIT THIS

    def get_neutone_parameters(self) -> List[NeutoneParameter]:
        return [
            NeutoneParameter(
                name="Chaos", description="Magnitude of latent noise", default_value=0.0
            ),
            NeutoneParameter(
                name="Z edit index",
                description="Index of latent dimension to edit",
                default_value=0.0,
            ),
            NeutoneParameter(
                name="Z scale",
                description="Scale of latent variable",
                default_value=0.5,
            ),
            NeutoneParameter(
                name="Z offset",
                description="Offset of latent variable",
                default_value=0.5,
            ),
        ]

    def is_input_mono(self) -> bool:
        return False  # <-Set to False for stereo (each channel processed separately)

    def is_output_mono(self) -> bool:
        return False  # <-Set to False for stereo (each channel processed separately)

    def get_native_sample_rates(self) -> List[int]:
        return [48000]  # <-EDIT THIS

    def get_native_buffer_sizes(self) -> List[int]:
        return [2048]

    def get_citation(self) -> str:
        return """Caillon, A., & Esling, P. (2021). RAVE: A variational autoencoder for fast and high-quality neural audio synthesis. arXiv preprint arXiv:2111.05011."""

    @torch.no_grad()
    def do_forward_pass(self, x: Tensor, params: Dict[str, Tensor]) -> Tensor:
        # parameters edit the latent variable
        z = self.model.encode(x.unsqueeze(1))
        noise_amp = params["Chaos"]
        z = torch.randn_like(z) * noise_amp + z
        # add offset / scale
        idx_z = int(
            torch.clamp(params["Z edit index"], min=0.0, max=0.99)
            * self.model.latent_size
        )
        z_scale = params["Z scale"] * 2  # 0~1 -> 0~2
        z_offset = params["Z offset"] * 2 - 1  # 0~1 -> -1~1
        z[:, idx_z] = z[:, idx_z] * z_scale + z_offset
        out = self.model.decode(z)
        out = out.squeeze(1)
        return out  # (n_channels, sample_size); channels are treated as a batch

In [None]:
import glob
ts_files = glob.glob(os.path.join(final_res_folder, '*.ts'))
ts_file = max(ts_files, key=os.path.getctime)
# Load model and wrap
model = torch.jit.load(ts_file)
wrapper = RAVEModelWrapper(model)
audio_sample_pairs=None

In [None]:
#@title Audio sample paths for example input
#@markdown These will be used as an example input/output pair to be saved with the model.
#@markdown This doesn't matter if you're using the models locally.
#@markdown Leave these empty to use default.

from neutone_sdk.audio import (
    AudioSample,
    AudioSamplePair,
    render_audio_sample,
)
import torchaudio

example_input1 = '' #@param {type:"string"}
example_input2 = '' #@param {type:"string"}
example_inputs = [example_input1, example_input2]

if example_input1 == '' and example_input2 == '':
    audio_sample_pairs=None
else:
    audio_sample_pairs=[]
    for sound_path in example_inputs:
        wave, sr = torchaudio.load(sound_path)
        wave = wave.mean(0, keepdim=True)
        input_sample = AudioSample(wave, sr)
        rendered_sample = render_audio_sample(wrapper, input_sample)
        audio_sample_pairs.append(AudioSamplePair(input_sample, rendered_sample))

In [None]:
#@title Save neutone model
neutone_save_dir = '/content/drive/MyDrive/neutone/' #@param {type:"string"}
save_neutone_model(
        wrapper, Path(neutone_save_dir) / run_name, freeze=False, dump_samples=True, submission=True, audio_sample_pairs=audio_sample_pairs
)

## Use model in neutone!

You can download the .nm file from Google Drive (under `{neutone_save_dir}`) and load it in neutone (via the "load your own" button at the top of the model selection screen).

If you're satisfied with your model, consider submitting it to us via GitHub (the link is in the output of `save_neutone_model`) or showing it off in the [neutone discord](https://discord.com/invite/zaUbtyxDRZ)!
