nsf-torch

This is an unofficial implementation of the [neural source-filter model]¹ proposed by Wang et al. The original implementation is [project-CURRENNT]². The model takes sequences of fundamental frequency (F0) and sequences of context vectors, like those used in the WaveNet model.
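
To make the input layout concrete, the following is a minimal sketch of how frame-level F0 and context vectors can be combined into one tensor. The shapes mirror the constants used in the Training section, and placing F0 in the first channel follows the note there. The helper name `build_input` is hypothetical and is not part of this repository.

```python
import torch


def build_input(f0, context):
    """Concatenate frame-level F0 and context vectors along the feature axis.

    f0:      (batch, frames, 1)
    context: (batch, frames, context_dim)
    returns: (batch, frames, 1 + context_dim), with F0 in channel 0
    """
    if f0.dim() != 3 or f0.size(-1) != 1:
        raise ValueError("f0 must have shape (batch, frames, 1)")
    if context.dim() != 3 or context.shape[:2] != f0.shape[:2]:
        raise ValueError("context must share batch and frame sizes with f0")
    return torch.cat((f0, context), dim=-1)


# Same sizes as the constants in the Training section
batch_size, context_length, input_dim = 8, 80, 81
waveform_length = 16000

# Placeholder values: a constant 220 Hz F0 track and all-zero context vectors
f0 = torch.full((batch_size, context_length, 1), 220.0)
context = torch.zeros(batch_size, context_length, input_dim - 1)
x = build_input(f0, context)

# With these sizes, each context vector covers 16000 / 80 = 200 waveform samples
samples_per_frame = waveform_length // context_length
print(tuple(x.shape), samples_per_frame)
```

The shape checks are only there to catch mismatched frame counts early, which is an easy mistake when F0 and acoustic features come from different extraction tools.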

Requirements

PyTorch
LibROSA (optional: to load audio files)
Kaldi (optional: to extract fundamental frequency and acoustic features)

Usage

This section describes how to train the model and how to generate waveforms with it.

Training

The model (NSFModel) and two loss functions (spectral_amplitude_distance and phase_distance) are defined in model.py and losses.py, respectively. The following code is a minimal example of a training procedure; it uses placeholder data, so the trained result is meaningless.

import torch

from model import NSFModel
from losses import spectral_amplitude_distance, phase_distance

# Constants
sampling_rate = 16000
waveform_length = 16000 # per single sample
context_length = 80     # the number of context vectors per single sample
input_dim = 81          # F0 and some acoustic features (e.g. MFCC)
batch_size = 8

# Initializing the model
model = NSFModel(input_dim, waveform_length)

# Defining the loss functions
dft_bins, frame_length, frame_shift = 512, 320, 80
Ls = spectral_amplitude_distance(dft_bins, frame_length, frame_shift)
Lp = phase_distance(dft_bins, frame_length, frame_shift)
criterion = lambda y_pred, y: Ls(y_pred, y) + Lp(y_pred, y)

# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Simple data generator
def generate_data():
    for batch in range(16):
        # Prepare the F0 and context vectors.
        # To extract F0 from real data, `compute-kaldi-pitch-feats` might be helpful.
        # To extract context vectors (e.g. MFCC), please refer to https://kaldi-asr.org/doc/feat.html
        # The following only shows the shapes of the input tensors; random values are
        # used because torch.Tensor(...) would leave the memory uninitialized.
        F0 = torch.rand(batch_size, context_length, 1)
        c = torch.rand(batch_size, context_length, input_dim - 1)
        # Note: in the original implementation, F0 comes last, but this follows the paper
        x = torch.cat((F0, c), -1)
        # Prepare the natural waveforms: the following only shows the shape
        y = torch.rand(batch_size, waveform_length) * 2 - 1
        yield x, y

# Training procedure
for epoch in range(100):
    for step, (x, y) in enumerate(generate_data()):
        # Predict a waveform; the natural waveform (y) is passed
        # so that the model can estimate the best initial phase
        y_pred = model(x, y)
        # Compute loss
        loss = criterion(y_pred, y)
        # Update the weights
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
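
For readers who want a feel for what the two loss terms measure, here is a rough sketch of STFT-based amplitude and phase distances written with `torch.stft`. It follows my reading of the general form described in the NSF paper and is not the code in losses.py. The function names, the Hann window, the `eps` constant, and averaging instead of summing over frames and bins are all assumptions made for illustration.

```python
import torch


def stft_parts(wave, n_fft=512, frame_length=320, frame_shift=80):
    """Return the real and imaginary STFT parts of a (batch, samples) waveform."""
    window = torch.hann_window(frame_length, dtype=wave.dtype, device=wave.device)
    spec = torch.stft(wave, n_fft=n_fft, hop_length=frame_shift,
                      win_length=frame_length, window=window,
                      return_complex=True)
    return spec.real, spec.imag


def spectral_amplitude_sketch(y_pred, y, eps=1e-5):
    """Mean squared log-ratio between natural and predicted power spectra."""
    re_p, im_p = stft_parts(y_pred)
    re_t, im_t = stft_parts(y)
    power_p = re_p ** 2 + im_p ** 2 + eps
    power_t = re_t ** 2 + im_t ** 2 + eps
    return 0.5 * torch.mean(torch.log(power_t / power_p) ** 2)


def phase_sketch(y_pred, y, eps=1e-5):
    """Mean of 1 - cos(phase difference) over all frames and frequency bins."""
    re_p, im_p = stft_parts(y_pred)
    re_t, im_t = stft_parts(y)
    dot = re_p * re_t + im_p * im_t
    norm = torch.sqrt((re_p ** 2 + im_p ** 2) * (re_t ** 2 + im_t ** 2) + eps)
    return torch.mean(1.0 - dot / norm)


# Placeholder waveforms with the same shape as in the training example
y = torch.randn(2, 16000)
y_pred = torch.randn(2, 16000, requires_grad=True)
loss = spectral_amplitude_sketch(y_pred, y) + phase_sketch(y_pred, y)
loss.backward()  # both terms are differentiable with respect to y_pred
```

The amplitude term ignores phase entirely, while the phase term ignores magnitude, which is one way to see why the training example sums the two.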

Generating waveforms

To generate waveforms, pass the model a batch of fundamental frequency sequences and context vectors, that is, the same input x as in training but without the natural waveform.

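# Import librosa and torch, define the constants, load the trained model, prepare F0 and context vectors...
# x: (F0, context_vectors) as in the previous subsection
# The batch of predicted waveforms is flattened into one long signal.
y_pred = model(x).detach().numpy().reshape(batch_size*waveform_length)
# Note: librosa.output.write_wav was removed in librosa 0.8.0; with newer versions,
# soundfile.write('predicted_waveform.wav', y_pred, sampling_rate) is one alternative.
librosa.output.write_wav('predicted_waveform.wav', y_pred, sr=sampling_rate)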
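
Since LibROSA is listed as optional, the waveform can also be saved with the Python standard library alone. The following is a small sketch that writes mono float samples as 16-bit PCM; the helper name `write_wav_16bit`, the clipping to [-1, 1], and the output path are assumptions for illustration and are not part of this repository.

```python
import array
import math
import os
import sys
import tempfile
import wave


def write_wav_16bit(path, samples, sampling_rate=16000):
    """Write mono float samples, clipped to [-1, 1], as a 16-bit PCM WAV file."""
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sampling_rate)
        f.writeframes(pcm.tobytes())


# Stand-in for a predicted waveform: one second of a 220 Hz tone
tone = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
out_path = os.path.join(tempfile.gettempdir(), "predicted_waveform_0.wav")
write_wav_16bit(out_path, tone)
```

With a batch from the model, one option is to loop over `model(x).detach().numpy()` and write each row to its own file, which avoids concatenating unrelated utterances into a single signal.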

TODO

documentation
make it a library (if it is convenient)

