# NNSVS

Before running this notebook, download kiritan_singing.zip from the [Tohoku Kiritan singing database login page for researchers](https://zunko.jp/kiridev/login.php) and extract it in the same directory as this notebook.
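
The extraction step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `extract_archive` and the use of the current directory as the destination are assumptions for illustration, and the internal layout of the archive is not checked here.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the archive's member names."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found; download it first")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


# Usage (assumes kiritan_singing.zip sits next to this notebook):
# extract_archive("kiritan_singing.zip", ".")
```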

## Install requirements

In [None]:
! git clone https://github.com/r9y9/hts_engine_API
! cd hts_engine_API/src && python3 waf configure --prefix=/usr/ && python3 waf build > /dev/null 2>&1 && python3 waf install
! git clone https://github.com/r9y9/sinsy
! cd sinsy/src/ && mkdir -p build && cd build && cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=/usr/ .. && make -j > /dev/null 2>&1 && make install

In [None]:
! git clone https://github.com/r9y9/pysinsy
! cd pysinsy && export SINSY_INSTALL_PREFIX=/usr/ && pip3 install .
! git clone https://github.com/r9y9/nnmnkwii
! cd nnmnkwii && pip3 install .
! git clone https://github.com/r9y9/nnsvs
! cd nnsvs && pip3 install .

## Setup

In [None]:
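import pysptk
import pyworld

sample_rate = 48000  # Hz
frame_period = 5  # ms
# FFT size used by WORLD's CheapTrick spectral envelope estimator
fftlen = pyworld.get_cheaptrick_fft_size(sample_rate)
# Frequency-warping parameter for mel-cepstrum analysis at this sample rate
alpha = pysptk.util.mcepalpha(sample_rate)
# Frame shift in samples
hop_length = int(0.001 * frame_period * sample_rate)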
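
As a rough check of the arithmetic above: a frame period of 5 ms at 48 kHz corresponds to a hop of 240 samples. The sketch below repeats that calculation without `pysptk` or `pyworld`. The helper `num_frames` and its ceiling-based frame count are illustrative assumptions, not necessarily how NNSVS or WORLD count frames.

```python
import math

sample_rate = 48000  # Hz, as in the cell above
frame_period = 5     # ms, as in the cell above

# 0.001 converts milliseconds to seconds; multiplying by the
# sample rate gives the number of samples per frame shift.
hop_length = int(0.001 * frame_period * sample_rate)


def num_frames(num_samples, hop):
    """Hypothetical helper: frames needed to cover num_samples at the given hop."""
    return math.ceil(num_samples / hop)


print(hop_length)
print(num_frames(sample_rate, hop_length))  # frames in one second of audio
```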

## Setup models

In [None]:
KIRITAN_SINGING_00_SVS_WORLD_ROOT="nnsvs/egs/kiritan_singing/00-svs-world/"

In [None]:
! sed -i 's#[$]HOME\/data#\/workspace#g' $KIRITAN_SINGING_00_SVS_WORLD_ROOT/run.sh
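
The `sed` call above rewrites every literal `$HOME/data` in `run.sh` to `/workspace`, so the recipe reads and writes data under `/workspace`. Its effect on a single line can be sketched in Python as follows; `redirect_data_root` is a hypothetical helper, and the example line is made up, since the actual contents of `run.sh` are not shown in this notebook.

```python
def redirect_data_root(script_text, new_root="/workspace"):
    """Replace every literal `$HOME/data` with new_root, like the sed call above."""
    # str.replace treats `$` literally, which mirrors sed's `[$]` character class.
    return script_text.replace("$HOME/data", new_root)


# Hypothetical line for illustration only.
example = 'db_root="$HOME/data/kiritan_singing"'
print(redirect_data_root(example))
```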

## Data download

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage -1 --stop-stage -1

## Data preparation

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && rm -rf downloads/kiritan_singing/kiritan_singing_extra
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT/downloads/kiritan_singing && git clone https://github.com/r9y9/kiritan_singing_extra

In [None]:
! mkdir -p /usr/local/lib/sinsy
! ln -sfn /usr/lib/sinsy/dic /usr/local/lib/sinsy/dic

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage 0 --stop-stage 0

## Feature extraction

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage 1 --stop-stage 1

## Training timelag/duration/acoustic models

### - Timelag model

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage 2 --stop-stage 2

### - Phoneme duration model

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage 3 --stop-stage 3

### - Acoustic model

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage 4 --stop-stage 4

## Synthesis

### - Generate features from timelag/duration/acoustic models

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage 5 --stop-stage 5

### - Synthesize waveforms

In [None]:
! cd $KIRITAN_SINGING_00_SVS_WORLD_ROOT && ./run.sh --stage 6 --stop-stage 6
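
Every step from data download to waveform synthesis calls `run.sh` with the same value for `--stage` and `--stop-stage`. As a sketch, the repeated shell lines could be wrapped in a small helper. `build_stage_command` and `run_stage` are hypothetical names, and only the flags already used in this notebook are assumed to exist.

```python
import subprocess

KIRITAN_SINGING_00_SVS_WORLD_ROOT = "nnsvs/egs/kiritan_singing/00-svs-world/"


def build_stage_command(stage):
    """Return the argument list for running exactly one recipe stage."""
    return ["./run.sh", "--stage", str(stage), "--stop-stage", str(stage)]


def run_stage(stage, recipe_root=KIRITAN_SINGING_00_SVS_WORLD_ROOT):
    """Run one stage inside the recipe directory; raises if run.sh fails."""
    subprocess.run(build_stage_command(stage), cwd=recipe_root, check=True)


# Usage: stages 2 to 4 train the timelag, duration, and acoustic models.
# for stage in (2, 3, 4):
#     run_stage(stage)
```

Using `check=True` stops a loop over stages at the first failure instead of continuing with missing outputs.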

## Generated samples

In [None]:
from glob import glob
from os.path import join

from IPython.display import Audio, display

sample_rate = 48000
# Collect every synthesized waveform under the experiment directory
synthesized_wav_paths = sorted(
    glob(
        join(KIRITAN_SINGING_00_SVS_WORLD_ROOT, "exp/kiritan/synthesis/**/label_phone_score/*.wav"),
        recursive=True,
    )
)

for wav_path in synthesized_wav_paths:
    print(wav_path)
    display(Audio(wav_path, rate=sample_rate))