# NNSVS demos

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/r9y9/nnsvs/blob/master/notebooks/Demos.ipynb)


This notebook demonstrates singing voice synthesis (SVS) with nnsvs. All models were trained using https://github.com/r9y9/nnsvs/, and the recipes to reproduce the experiments are included in that repository.

## Preparation

### Download MusicXML files

In [None]:
! [ ! -e kiritan_singing ] && git clone -q https://github.com/r9y9/kiritan_singing

### Install nnsvs

In [None]:
%%capture
try:
    import nnsvs
except ImportError:
    ! pip install git+https://github.com/r9y9/nnsvs

### Imports

In [None]:
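# %pylab is discouraged; enable inline plots and import pyplot explicitly
%matplotlib inline
%load_ext autoreload
%autoreload
import matplotlib.pyplot as plt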
import IPython
from IPython.display import Audio
import os
import numpy as np
import torch
import librosa
import librosa.display
import pysinsy
from nnmnkwii.io import hts
from nnsvs.svs import SPSVS

## kiritan_singing

Database: https://zunko.jp/kiridev/login.php

### Setup pre-trained model

In [None]:
model_dir = "20220321_kiritan_timelag_mdn_duration_mdn_acoustic_resf0conv"
! [ ! -e {model_dir} ] && curl -q -LO https://www.dropbox.com/s/87rqto5l5rpav2n/{model_dir}.zip
! unzip -qq -o {model_dir}.zip

### Run synthesis

In [None]:
# NOTE: 01.xml and 05.xml were not included in the training data
contexts = pysinsy.extract_fullcontext("kiritan_singing/musicxml/05.xml")
labels = hts.HTSLabelFile.create_from_contexts(contexts)

engine = SPSVS(model_dir)
wav, sr = engine.svs(labels)

# Trim the long silence at the beginning (librosa.effects.trim also removes trailing silence)
# NOTE: this is generally not needed
wav = librosa.effects.trim(wav.astype(np.float64), top_db=40)[0]

Audio(wav, rate=sr)
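
The `top_db=40` argument above keeps only the region whose level is within 40 dB of the loudest frame. The following is a simplified pure-Python sketch of that idea, not librosa's implementation: the function name `trim_silence`, the uncentered framing, and the default frame sizes are assumptions made for illustration.

```python
import math


def trim_silence(samples, top_db=40.0, frame_length=2048, hop_length=512):
    """Drop leading/trailing frames quieter than `top_db` below the loudest frame.

    Hypothetical helper for illustration; librosa.effects.trim uses centered
    frames and NumPy arrays, so its boundaries can differ slightly.
    """
    if not samples:
        return []

    # Root-mean-square level of each frame
    rms = []
    for start in range(0, len(samples), hop_length):
        frame = samples[start:start + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))

    peak = max(rms)
    if peak == 0.0:
        return []  # the whole signal is silent

    # A level of -top_db dB relative to the peak, expressed as an amplitude ratio
    threshold = peak * 10.0 ** (-top_db / 20.0)
    loud = [i for i, level in enumerate(rms) if level > threshold]

    # Convert the first and last loud frame indices back to sample positions
    begin = loud[0] * hop_length
    end = min(len(samples), (loud[-1] + 1) * hop_length)
    return samples[begin:end]
```

A larger `top_db` lowers the threshold, so quieter passages are kept as signal and less audio is trimmed.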

In [None]:
fig, ax = plt.subplots(figsize=(14,2))
# Pass sr by keyword: recent librosa versions make it keyword-only
librosa.display.waveshow(wav.astype(np.float32), sr=sr, ax=ax)
ax.set_xlabel("Time [sec]")
ax.set_ylabel("Amplitude")
plt.tight_layout()
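
To keep the synthesized audio after the notebook session ends, the waveform can be written to disk. The sketch below uses only the Python standard library; `save_wav` is a hypothetical helper, and it assumes a mono signal passed as a plain list of numbers (for example `wav.tolist()`). It peak-normalizes before converting to 16-bit PCM, which changes the absolute loudness.

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono samples to a 16-bit PCM WAV file (hypothetical helper)."""
    # Peak-normalize so the input scale (e.g. [-1, 1] or int16 range) does not matter
    peak = max((abs(x) for x in samples), default=0.0)
    scale = 32767.0 / peak if peak > 0 else 0.0
    pcm = [int(round(x * scale)) for x in samples]

    with wave.open(path, "wb") as f:
        f.setnchannels(1)                  # mono
        f.setsampwidth(2)                  # 2 bytes = 16-bit samples
        f.setframerate(int(sample_rate))
        # Little-endian signed 16-bit integers
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

In this notebook it could be called as `save_wav("kiritan_05.wav", wav.tolist(), sr)`, where the file name is only an example.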

## nit-song070

Database: http://hts.sp.nitech.ac.jp/archives/2.3/HTS-demo_NIT-SONG070-F001.tar.bz2

### Setup pre-trained model

In [None]:
model_dir = "20220322_yoko_timelag_mdn_duration_mdn_acoustic_resf0conv"

! [ ! -e {model_dir} ] && curl -q -LO https://www.dropbox.com/s/l1wo9dewfuk3s1v/{model_dir}.zip
! unzip -qq -o {model_dir}.zip

### Run synthesis

In [None]:
# NOTE: the same score from kiritan_singing is used as input for this model
contexts = pysinsy.extract_fullcontext("kiritan_singing/musicxml/05.xml")
labels = hts.HTSLabelFile.create_from_contexts(contexts)

engine = SPSVS(model_dir)
wav, sr = engine.svs(labels)

# Trim leading and trailing silence
# NOTE: this is generally not needed
wav = librosa.effects.trim(wav.astype(np.float64), top_db=40)[0]

Audio(wav, rate=sr)

In [None]:
fig, ax = plt.subplots(figsize=(14,2))
# Pass sr by keyword: recent librosa versions make it keyword-only
librosa.display.waveshow(wav.astype(np.float32), sr=sr, ax=ax)
ax.set_xlabel("Time [sec]")
ax.set_ylabel("Amplitude")
plt.tight_layout()

## References

- nnsvs: https://github.com/r9y9/nnsvs