- Title: Wav2wav: Wave-to-Wave Voice Conversion
- Author(s): Changhyeon Jeong, Hyung-pil Chang, In-Chul Yoo, Dongsuk Yook
- Published In: Applied Sciences (MDPI)
- Publication Year: 2024
- Article URL: https://www.mdpi.com/2076-3417/14/10/4251
This repository provides the source code associated with the paper "Wav2wav: Wave-to-Wave Voice Conversion." The purpose of this project is to implement a novel voice conversion architecture that integrates the feature extractor, feature converter, and vocoder into a single module trained in an end-to-end manner.
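As a rough illustration of what "integrated into a single module" means in practice, the sketch below chains a feature extractor, a feature converter, and a vocoder inside one `nn.Module`, so that waveform-level losses can backpropagate through all three stages. The class name, layer choices, and dimensions are illustrative only and are not taken from this repository; the actual architecture is defined in `models.py`.

```python
import torch
import torch.nn as nn

class Wav2WavSketch(nn.Module):
    """Illustrative only: the real model lives in models.py."""
    def __init__(self, feat_dim=80):
        super().__init__()
        # Feature extractor: raw waveform -> frame-level features
        self.extractor = nn.Sequential(
            nn.Conv1d(1, feat_dim, kernel_size=1024, stride=256, padding=512),
            nn.LeakyReLU(0.1),
        )
        # Feature converter: source-speaker features -> target-speaker features
        self.converter = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, kernel_size=5, padding=2),
            nn.LeakyReLU(0.1),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=5, padding=2),
        )
        # Vocoder: converted features -> raw waveform (upsample back to audio rate)
        self.vocoder = nn.Sequential(
            nn.ConvTranspose1d(feat_dim, 1, kernel_size=1024, stride=256, padding=384),
            nn.Tanh(),
        )

    def forward(self, wav):          # wav: (batch, 1, samples)
        feats = self.extractor(wav)  # (batch, feat_dim, frames)
        feats = self.converter(feats)
        return self.vocoder(feats)   # (batch, 1, ~samples)
```

The point of this structure is that gradients from losses defined on the output waveform flow through the vocoder, converter, and extractor together, so no separately trained vocoder stage is required.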
- `train.py`, `models.py`: The core files of the project; these are the files most frequently modified for training and evaluation.
- `cp_*` directories: Contain trained weights and TensorBoard logs generated during training.
- `gen_wavs*` directories: Store audio samples generated with trained weights.
- `train_*.sh`: Shell scripts used to run `train.py` and its variants. Hyperparameters are defined within these scripts or in `config_v1.json`.
- `conv.sh`: A script used to generate audio samples from trained weights (using the `cp_*` directories).
- Usage:
conv.sh <checkpoint_directory> <output_directory> <source_speaker_directory> <target_speaker_directory>
- Example:
conv.sh cp_hifigan_FM gen_wavs_FM 1spkr_SF3 1spkr_TM1
- Training script variants:
  - `train_mod_loss.py`: A modified version of training in which the Fourier transform is replaced by the prenet in all relevant parts.
  - `train_mod_loss2.py`: A further modification in which only the first convolutional layer of the prenet (in log-magnitude spectrogram format) is used for the id_loss and cycle_loss; a rough sketch of this idea is given after this list.
- Final hyperparameter setting: 45_30_0.5
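The exact loss formulations live in `train_mod_loss.py` and `train_mod_loss2.py`; the sketch below only illustrates the idea behind the second variant. The helper name `prenet_spec_features`, the use of an L1 distance, and passing the prenet's first conv layer explicitly are illustrative assumptions, not taken from the code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def prenet_spec_features(prenet_conv1: nn.Conv1d, wav: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: apply only the prenet's first convolutional layer and
    convert the result to a log-magnitude, spectrogram-like representation."""
    mag = prenet_conv1(wav).abs()
    return torch.log(torch.clamp(mag, min=1e-5))

def id_loss(converter, prenet_conv1, wav):
    # Identity-mapping term: a waveform converted "to itself" should stay close
    # to the original in the log-magnitude feature space.
    return F.l1_loss(prenet_spec_features(prenet_conv1, converter(wav)),
                     prenet_spec_features(prenet_conv1, wav))

def cycle_loss(convert_ab, convert_ba, prenet_conv1, wav_a):
    # Cycle-consistency term: converting A -> B -> A should reconstruct wav_a.
    cycled = convert_ba(convert_ab(wav_a))
    return F.l1_loss(prenet_spec_features(prenet_conv1, cycled),
                     prenet_spec_features(prenet_conv1, wav_a))
```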
To run this project in a Docker container with GPU support:
docker run --gpus device=0 -it --memory=16G --memory-swap=32G --shm-size=8G --rm -v /your_storage_path/:/shared_dir/ pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
apt-get update && apt-get install libsndfile1-dev
pip install librosa==0.8.1 torchaudio==0.9.0 matplotlib tensorboard
- Clone this repository:
git clone https://github.com/hpjang/Wav2Wav.git
- Download the VCC2018 dataset: https://datashare.ed.ac.uk/handle/10283/3061
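After extracting the dataset, a quick check such as the following confirms that librosa and libsndfile can read the audio. The path below is an example only; point it at an utterance from your own copy of VCC2018.

```python
import librosa

# Example path only: adjust to wherever the VCC2018 archive was extracted.
wav_path = "/shared_dir/vcc2018/vcc2018_training/VCC2SF3/10001.wav"
wav, sr = librosa.load(wav_path, sr=None)  # sr=None keeps the file's native sample rate
print(f"{wav_path}: {len(wav)} samples at {sr} Hz")
```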
- Train the model:
bash train_*.sh
- Convert voices with a trained model:
./conv.sh "<checkpoint_directory>" "<output_directory>" "<source_speaker_directory>" "<target_speaker_directory>"
- Example:
./conv.sh cp_hifigan_FM_45_30_0.5 generated 1spkr_SF3 1spkr_TM1
- Convert voices with a specific vocoder type (the `conv??.sh` scripts, e.g. `convFM.sh`):
./conv??.sh "<checkpoint_directory>" "<output_directory>" "<source_speaker_directory>" "<target_speaker_directory>" "<vocoder_type>"
- Example:
./convFM.sh cp_hifigan_FM_45_30_0.5 generated 1spkr_SF3 1spkr_TM1 mel

Trained weights can be downloaded from the following URL:
- Google Drive (enter the URL for the trained weights here; download the weights and move them to the corresponding `cp_*` directory)
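To verify that downloaded weights ended up in the right place before running `conv.sh`, a quick check like the snippet below can be used. The checkpoint filename here is a hypothetical, HiFi-GAN-style example; use whatever file names the downloaded archive actually contains.

```python
import torch

# Hypothetical filename: replace with the checkpoint file you actually downloaded
# into the corresponding cp_* directory.
ckpt_path = "cp_hifigan_FM_45_30_0.5/g_00500000"
state = torch.load(ckpt_path, map_location="cpu")
keys = list(state.keys()) if isinstance(state, dict) else type(state).__name__
print(ckpt_path, "->", keys)
```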
This project is licensed under the [License Name (MIT or Apache License 2.0)]. For more details, see the LICENSE file.
If you use this project, please cite the paper as follows:
@article{Jeong2024Wav2wav,
title={Wav2wav: Wave-to-Wave Voice Conversion},
author={Changhyeon Jeong and Hyung-pil Chang and In-Chul Yoo and Dongsuk Yook},
journal={Applied Sciences},
volume={14},
number={10},
pages={4251},
year={2024},
url={https://www.mdpi.com/2076-3417/14/10/4251}
}
For questions or requests, please contact: [hpchang@korea.ac.kr]