# NNSVS vs. Sinsy

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/r9y9/nnsvs/blob/master/notebooks/nnsvs_vs_sinsy.ipynb)


This notebook presents audio samples comparing NNSVS and [Sinsy](https://www.sinsy.jp/).

## Models

- **sinsy_f00001j**: Sinsy's HMM-based SVS system.
- **sinsy_f00001j_dnn_beta4**: Sinsy's DNN-based SVS system.
- **nnsvs_yoko**: An NNSVS-based system trained on the publicly available version of the nit-song070 database. Specifically, we used 29 of its 31 songs for training. Note that model parameters were initialized from models pre-trained on the kiritan_singing database (49 songs for training). The system therefore used 49 + 29 = 78 songs in total for training.

## Notes

- **Training data**: According to [Sinsy's latest paper](https://arxiv.org/abs/2108.02776), the authors seem to have used 60 songs (out of 70) for training. Since the publicly available version of the nit-song070 dataset contains only a subset of the full dataset, we are unable to train NNSVS models under the same training data conditions.
- **Date**: Sinsy samples were generated on 2022/03/27 using https://www.sinsy.jp/.

## Preparation

In [None]:
%%capture
try:
    import nnsvs
except ImportError:
    ! pip install git+https://github.com/r9y9/nnsvs

In [None]:
%pylab inline
%load_ext autoreload
%autoreload
import IPython
from IPython.display import Audio
from scipy.io import wavfile
import pysinsy
from nnmnkwii.io import hts
from urllib.request import urlretrieve
import tempfile

In [None]:
from nnsvs.pretrained import create_svs_engine
import nnsvs

In [None]:
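def svs_display(model, xml_file):
    """Synthesize singing voice from a MusicXML file with an NNSVS model and play it."""
    engine = create_svs_engine(model)
    # Convert the score into HTS-style full-context labels
    contexts = pysinsy.extract_fullcontext(xml_file)
    labels = hts.HTSLabelFile.create_from_contexts(contexts)
    wav, sr = engine.svs(labels)
    IPython.display.display(Audio(wav, rate=sr))


def wav_display(url):
    """Download a WAV file from a URL and play it."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        urlretrieve(url, f.name)
        # wavfile.read returns (sample_rate, data), in that order
        sr, wav = wavfile.read(f.name)
    IPython.display.display(Audio(wav, rate=sr))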
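
The `wav_display` helper above reopens a `NamedTemporaryFile` by name while it is still open, which works on Unix-like systems but not on Windows. A minimal sketch of a more portable variant, assuming a temporary directory is acceptable, could look like the following. The name `fetch_and_read` and its `reader` parameter are hypothetical and are not part of NNSVS or this notebook.

```python
import os
import tempfile
from urllib.request import urlretrieve


def fetch_and_read(url, reader):
    """Download `url` into a temporary directory and return `reader(path)`.

    `reader` is any callable that takes a file path and loads the file fully
    into memory (hypothetical parameter, for illustration only).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # The target file is not held open by this process while
        # urlretrieve writes to it, unlike with NamedTemporaryFile.
        path = os.path.join(tmpdir, "sample.wav")
        urlretrieve(url, path)
        # The directory is deleted on exit, so the reader must finish
        # loading the data before returning.
        return reader(path)
```

In the notebook, this could be used as `sr, wav = fetch_and_read(url, wavfile.read)`, since `scipy.io.wavfile.read` loads the samples into memory by default.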

## Sample 1: げんこつ山のタヌキさん

In [None]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/qq6w7bbcc5ikcdf/sinsy_song070_f00001j_063.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("song070_f00001_063"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/4epe08wqebyuh4g/sinsy_song070_f00001j_dnn_beta4_063.wav?dl=1")

## Sample 2: Get Over

In [None]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/kam9kju97umi6li/sinsy_f00001j_get_over.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("get_over"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/7st0acvguvbdoaj/sinsy_f00001j_dnn_beta4_get_over.wav?dl=1")

## Sample 3: 雪

In [None]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/ho5xgkil8r3f3ed/sinsy_yuki_f00001j.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("yuki"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/jo2ool0nytzxln2/sinsy_yuki_f00001j_dnn_beta4.wav?dl=1")
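
The three sample cells above follow the same pattern: the Sinsy HMM sample, the NNSVS synthesis, then the Sinsy DNN sample. As a rough sketch, the same playback order could be driven from a table. The names `SAMPLES` and `playback_plan` are hypothetical; the titles, example names, and URLs are copied from the cells above.

```python
# Sample metadata copied from the cells above.
SAMPLES = [
    {
        "title": "げんこつ山のタヌキさん",
        "xml": "song070_f00001_063",
        "hmm_url": "https://www.dropbox.com/s/qq6w7bbcc5ikcdf/sinsy_song070_f00001j_063.wav?dl=1",
        "dnn_url": "https://www.dropbox.com/s/4epe08wqebyuh4g/sinsy_song070_f00001j_dnn_beta4_063.wav?dl=1",
    },
    {
        "title": "Get Over",
        "xml": "get_over",
        "hmm_url": "https://www.dropbox.com/s/kam9kju97umi6li/sinsy_f00001j_get_over.wav?dl=1",
        "dnn_url": "https://www.dropbox.com/s/7st0acvguvbdoaj/sinsy_f00001j_dnn_beta4_get_over.wav?dl=1",
    },
    {
        "title": "雪",
        "xml": "yuki",
        "hmm_url": "https://www.dropbox.com/s/ho5xgkil8r3f3ed/sinsy_yuki_f00001j.wav?dl=1",
        "dnn_url": "https://www.dropbox.com/s/jo2ool0nytzxln2/sinsy_yuki_f00001j_dnn_beta4.wav?dl=1",
    },
]


def playback_plan(sample):
    """Return (label, kind, source) tuples in the order used by the cells above.

    kind is "wav" for a pre-generated Sinsy sample (source is a URL) and
    "svs" for on-the-fly NNSVS synthesis (source is an example XML name).
    """
    return [
        ("sinsy_f00001j", "wav", sample["hmm_url"]),
        ("nnsvs_yoko", "svs", sample["xml"]),
        ("sinsy_f00001j_dnn_beta4", "wav", sample["dnn_url"]),
    ]


# Possible use inside this notebook, with the helpers defined earlier:
# for sample in SAMPLES:
#     for label, kind, source in playback_plan(sample):
#         print(label)
#         if kind == "wav":
#             wav_display(source)
#         else:
#             svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file(source))
```

Keeping the plan as plain data separates the sample list from the display helpers, so adding a fourth song would only require one more entry in `SAMPLES`.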