# Fish Diffusion
<div style="display: flex; justify-content: center;">
<img alt="LOGO" src="https://cdn.jsdelivr.net/gh/fishaudio/fish-diffusion@main/images/logo_512x512.png" width="256" height="256" />
</div>

<style>
  a {
    margin-right: 16px;
  }
  div {
    margin-top: 16px;
  }
</style>
<div style="display: flex; justify-content: center; margin-bottom: 20px; ">
<a href="https://discord.gg/wbYSRBrW2E">
<img alt="Discord" src="https://img.shields.io/discord/1044927142900809739?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>

<a href="https://huggingface.co/spaces/fishaudio/fish-diffusion">
<img alt="Hugging Face" src="https://img.shields.io/badge/🤗%20Spaces-HiFiSinger-blue.svg?style=flat-square"/>
</a>

<a target="_blank" href="https://colab.research.google.com/github/fishaudio/fish-diffusion/blob/main/notebooks/train.ipynb">
<img alt="Open In Colab" src="https://img.shields.io/static/v1?label=Colab&message=Notebook&color=F9AB00&logo=googlecolab&style=flat-square"/>
</a>
</div>

## Terms of Use for Fish Diffusion

1. Obtaining Authorization and Intellectual Property Infringement: The user is solely accountable for acquiring the necessary authorization for any datasets utilized in their training process and assumes full responsibility for any infringement issues arising from the utilization of the input source. Fish Diffusion and its developers disclaim all responsibility for any complications that may emerge due to the utilization of unauthorized datasets.

2. BSD-3-Clause-Clear License: Fish Diffusion is distributed under the BSD-3-Clause-Clear License, which confers upon the user the privilege to employ it for any purpose, encompassing commercial applications. For more detail, see the LICENSE file.

3. Proper Attribution: Any derivative works based on Fish Diffusion must explicitly acknowledge the project and its license. In the event of distributing Fish Diffusion's code or disseminating results generated by this project, the user is obliged to cite the original author and source code (Fish Diffusion).

4. Audiovisual Content and AI-generated Disclosure: All derivative works created using Fish Diffusion, including audio or video materials, must explicitly acknowledge the utilization of the Fish Diffusion project and declare that the content is AI-generated. If incorporating videos or audio published by third parties, the original links must be furnished.

5. Agreement to Terms: By persisting in the use of Fish Diffusion, the user unequivocally consents to the terms and conditions delineated in this document. Neither Fish Diffusion nor its developers shall be held liable for any subsequent difficulties that may transpire.

In [None]:
#@title ## Agreement
i_agree_the_terms_above = False #@param {type:"boolean"}

if i_agree_the_terms_above is False:
  raise Exception("You need to agree with the terms to continue.")

## Environment Setup




### Check GPU

In [None]:
#@title
import torch
cuda_available = torch.cuda.is_available()
if cuda_available is False:
  raise Exception("CUDA is not available, please change instance type.")

for i in range(torch.cuda.device_count()):
  print(f"GPU {i}: {torch.cuda.get_device_name(i)} detected.")

print("-" * 20)

!nvidia-smi

### Configure Environment
You may skip this section if you already have the correct environment (`fish_diffusion`) installed.

#### Install Conda

In [None]:
#@title
%%bash
mkdir -p /content/env
MINICONDA_INSTALLER_SCRIPT=Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
MINICONDA_PREFIX=/content/env
wget -q --show-progress https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX

#### Create Conda Environment

In [None]:
#@title
%%bash
source /content/env/bin/activate
conda create -n fish_diffusion python=3.10 -y

#### Clone Repo

In [None]:
#@title
import os

if os.path.exists("/content/fish-diffusion"):
  print("The repo already exists, skipping")
else:
  !git clone https://github.com/fishaudio/fish-diffusion /content/fish-diffusion


#### Install Dependencies
Note: the following error message is harmless and can be ignored:  
`Authorization error accessing https://download.pytorch.org/whl/cu118/wheel/`

In [None]:
#@title
%cd /content/fish-diffusion
!source /content/env/bin/activate;\
conda activate fish_diffusion;\
curl -sSL https://raw.githubusercontent.com/pdm-project/pdm/main/install-pdm.py | python3 -;\
/root/.local/bin/pdm sync;

## Vocoder preparation
This section is used to prepare NSF-HiFiGAN for Diffusion (DiffSVC) training. It's optional in HiFiSinger.

In [None]:
#@title
%cd /content/fish-diffusion
!source /content/env/bin/activate;\
conda activate fish_diffusion;\
python tools/download_nsf_hifigan.py --agree-license

## Dataset Preparation
Currently, only single-speaker training is supported in this notebook.  

### Option 1
**You need to split your audio into 5-10 second segments before uploading!**
To work around the disk I/O limit, upload your dataset as a zip file with the following structure:

```shell
[ZIP ROOT]
├───train
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───valid
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
```

You only need to put 5-10 samples in the valid folder.  
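
If your recordings are longer than the 5-10 second range, a minimal sketch like the following can cut them into fixed-length pieces before you build the zip. It uses only the standard library `wave` module and cuts at fixed intervals rather than at silences, so it is not the project's own slicer. The function name `split_wav` and its parameters are hypothetical, and it assumes uncompressed PCM wav input.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, segment_seconds=8.0, min_seconds=5.0):
    """Cut a PCM wav into fixed-length pieces (hypothetical helper).

    A trailing piece shorter than min_seconds is dropped.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frame_size = params.sampwidth * params.nchannels
        frames_per_segment = int(segment_seconds * params.framerate)
        min_frames = int(min_seconds * params.framerate)

        index = 0
        while True:
            frames = reader.readframes(frames_per_segment)
            n_frames = len(frames) // frame_size
            # Stop at the end of the file or when the remainder is too short.
            if n_frames == 0 or n_frames < min_frames:
                break
            target = out_dir / f"{src.stem}-{index:04d}.wav"
            with wave.open(str(target), "wb") as writer:
                writer.setparams(params)  # frame count is corrected on close
                writer.writeframes(frames)
            written.append(target)
            index += 1

    return written
```

For example, `split_wav("song.wav", "train")` would write `train/song-0000.wav`, `train/song-0001.wav`, and so on.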

### Option 2
If you want the program to slice the audio and pick the validation samples automatically, upload a zip that contains only wav files, with no subfolders:

```shell
[ZIP ROOT]
├───xxx1-xxx1.wav
├───...
└───Lxx-0xx8.wav
```
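
Before uploading, you can check locally which of the two layouts a zip follows. The sketch below mirrors the rules described above (both `train` and `valid` present, or no folders at all). The names `classify_layout` and `classify_zip` are hypothetical, and matching on the `train/` and `valid/` prefixes is an assumption of this sketch.

```python
import zipfile


def classify_layout(names):
    """Return 'split', 'flat', or 'invalid' for a list of zip member names."""
    has_train = any(name.startswith("train/") for name in names)
    has_valid = any(name.startswith("valid/") for name in names)
    has_folder = any("/" in name for name in names)

    if has_train and has_valid:
        return "split"    # Option 1: train and valid folders are both present
    if has_train != has_valid or has_folder:
        return "invalid"  # only one of the two folders, or unexpected subfolders
    return "flat"         # Option 2: wav files only, no folders


def classify_zip(path):
    """Classify the layout of the zip file at the given path."""
    with zipfile.ZipFile(path) as archive:
        return classify_layout(archive.namelist())
```

Calling `classify_zip("dataset.zip")` on your own archive tells you whether the automatic slicing and picking options apply (`flat`) or whether the archive is used as-is (`split`).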

In [None]:
#@title Preprocess Dataset

import os
import zipfile

#@markdown Should the program load the zip from Google Drive automatically, instead of you uploading it yourself?
download_from_gdrive = True #@param {type:"boolean"}
#@markdown Path to the zip file, relative to your Google Drive root (`MyDrive`) or to `/content`
dataset_path = "Fish Diffusion/opencpop-wavs.zip" #@param {type:"string"}
#@markdown Do you want the program to automatically slice audio into segments?
auto_slice = False #@param {type:"boolean"}
#@markdown Do you want the program to automatically split train and valid set?
auto_pick = True #@param {type:"boolean"}
auto_pick_num = 5 #@param {type:"number"}

if download_from_gdrive:
  from google.colab import drive
  drive.mount('/content/drive/')
  zip_path = os.path.join("/content/drive/MyDrive", dataset_path)
else:
  zip_path = os.path.join("/content", dataset_path)

z = zipfile.ZipFile(zip_path)
files = list(z.namelist())
has_train_folder = any(i.startswith("train") for i in files)
has_valid_folder = any(i.startswith("valid") for i in files)
has_folder = any("/" in i for i in files)

print(f"Has train folder: {has_train_folder}")
print(f"Has valid folder: {has_valid_folder}")
print(f"Has folder: {has_folder}")
print("-" * 20)

!rm -rf /content/fish-diffusion/dataset

if has_train_folder != has_valid_folder:
  print("Your dataset structure is incorrect, you should have either both train and valid or none of them.")
elif not (has_train_folder and has_valid_folder) and has_folder:
  print("Your dataset structure is incorrect, you shouldn't have any folders in your zip if you don't have both train and valid.")
elif has_folder and not (auto_slice is False and auto_pick is False):
  print("Auto slice and auto pick are not available when subfolders exist")
elif has_train_folder and has_valid_folder:
  os.makedirs("/content/fish-diffusion/dataset", exist_ok=True)
  !unzip -q "{zip_path}" -d /content/fish-diffusion/dataset
  print("OK")
else:
  train_path = "/content/fish-diffusion/dataset/train"
  valid_path = "/content/fish-diffusion/dataset/valid"
  os.makedirs(train_path, exist_ok=True)
  os.makedirs(valid_path, exist_ok=True)

  if auto_slice:
    print("Unzipping")
    raw_path = "/content/fish-diffusion/dataset/raw"
    os.makedirs(raw_path, exist_ok=True)
    !unzip -q "{zip_path}" -d "{raw_path}" 
    print("Unzip completed")

    # Call slicer
    !/content/env/envs/fish_diffusion/bin/fap slice-audio \
      "{raw_path}" "{train_path}" --top-db 50 --num-workers 4
    print("Audio sliced")
  else:
    print("Unzipping")
    !unzip -q "{zip_path}" -d "{train_path}"
    print("Unzip completed")

  !/content/env/envs/fish_diffusion/bin/python \
    /content/fish-diffusion/tools/preprocessing/random_move.py \
    "{train_path}" "{valid_path}" "{auto_pick_num}"

  print(f"Moved {auto_pick_num} random files to the valid folder")
  print("OK")


## Choose Your Model

In [None]:
#@title Model Config
#@markdown It's strongly recommended to use a pretrained model in Colab
pretrained = True #@param {type:"boolean"}
pretrained_profile ='hifisinger-v2.1.0'#@param ['hifisinger-v2.1.0', 'diffusion-v2.0.0']

PROFILES = {
  "hifisinger-v2.1.0": {
    "arch": "hifisinger",
    "config": "https://github.com/fishaudio/fish-diffusion/releases/download/v2.1.0/svc_hifisinger_finetune.py",
    "model": "https://github.com/fishaudio/fish-diffusion/releases/download/v2.1.0/hifisinger-pretrained-20230329-540k.ckpt"
  },
  "diffusion-v2.0.0": {
    "arch": "diffusion",
    "config": "https://github.com/fishaudio/fish-diffusion/releases/download/v2.0.0/svc_content_vec_finetune.py",
    "model": "https://github.com/fishaudio/fish-diffusion/releases/download/v2.0.0/content-vec-pretrained-v1.ckpt"
  }
}

#@markdown ---
#@markdown Or bring your own config if you don't want to use a pretrained profile
arch = 'hifisinger'#@param ['diffusion', 'hifisinger']
config_path = ''#@param{type:"string"}
#@markdown Leaving the following field empty disables loading pretrained weights
pretrained_model_path = ''#@param{type:"string"}

if pretrained:
  profile = PROFILES[pretrained_profile]
  arch = profile["arch"]
  config_path = f"configs/svc_{profile['arch']}_finetune.py"
  pretrained_model_path = f"checkpoints/pretrained_{profile['arch']}.ckpt"

  !wget -q --show-progress "{profile['config']}" -O "/content/fish-diffusion/{config_path}"
  !wget -q --show-progress "{profile['model']}" -O "/content/fish-diffusion/{pretrained_model_path}"

  print(f"Your config is saved to {config_path}")


> The project is under active development; please back up your config file.  
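
One way to keep a backup is to copy the config to a timestamped file, for example into a Google Drive folder. This is only a sketch: the helper name `backup_config` and the backup directory are hypothetical and not part of the project.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_path, backup_dir):
    """Copy a config file to backup_dir with a timestamp in its name."""
    src = Path(config_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```

In this notebook it could be called as `backup_config(f"/content/fish-diffusion/{config_path}", "/content/drive/MyDrive/FishSVC/configs")`, assuming Google Drive is mounted.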

### Extract Features

In [None]:
#@title
!source /content/env/bin/activate;\
conda activate fish_diffusion;\
python tools/preprocessing/extract_features.py --config "{config_path}" --path dataset/valid --clean --no-augmentation;\
python tools/preprocessing/extract_features.py --config "{config_path}" --path dataset/train --clean --num-workers 4

## Final Steps

In [None]:
#@title Training
#@markdown You may want to resume from your previous checkpoint
resume = False #@param {type:"boolean"}
resume_model_path = ''#@param{type:"string"}

logger = 'tensorboard' #@param ['wandb', 'tensorboard']
#@markdown You may continue your wandb experiment by providing the experiment id
resume_id = ''#@param{type:"string"}
#@markdown Point to where you want to save models & logs in GDrive
dest_path = '/content/drive/MyDrive/FishSVC/'#@param{type:"string"}

args = ""
if pretrained_model_path:
  args += f"--pretrain {pretrained_model_path} "

if logger == "tensorboard":
  args += "--tensorboard "
  %load_ext tensorboard
  %tensorboard --logdir .

if resume:
  args += f"--resume {resume_model_path} "
  if logger == "wandb" and resume_id:
    args += f"--resume-id {resume_id} "
if dest_path:
  args += f"--dest-path {dest_path} "

!source /content/env/bin/activate;\
conda activate fish_diffusion;\
python "tools/{arch}/colab_train.py" --config "{config_path}" {args}
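
The training cell joins its arguments with plain spaces, so a path that contains a space (for example a Drive folder named `Fish SVC`) would be split into two arguments. A possible alternative, sketched here with the same flags as the cell, collects the arguments in a list and quotes them with `shlex.join`. The function name `build_train_args` is hypothetical.

```python
import shlex


def build_train_args(pretrained_model_path="", logger="tensorboard",
                     resume=False, resume_model_path="",
                     resume_id="", dest_path=""):
    """Build a shell-safe argument string using the flags from the training cell."""
    args = []
    if pretrained_model_path:
        args += ["--pretrain", pretrained_model_path]
    if logger == "tensorboard":
        args.append("--tensorboard")
    if resume:
        args += ["--resume", resume_model_path]
        if logger == "wandb" and resume_id:
            args += ["--resume-id", resume_id]
    if dest_path:
        args += ["--dest-path", dest_path]
    # shlex.join quotes every argument that needs it (spaces, quotes, etc.)
    return shlex.join(args)
```

The returned string can be interpolated into the shell command in place of `{args}`.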

After training, you can find your checkpoints in `/content/fish-diffusion/logs/HiFiSVC/[VERSION]/checkpoints`. Make sure to back them up.
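
To decide which file to back up, you can pick the newest checkpoint in that folder. The sketch below assumes that checkpoints use the `.ckpt` extension (as the pretrained files do) and that "latest" means the most recent modification time. The project's own selection logic may differ, and `latest_checkpoint` is a hypothetical name.

```python
from pathlib import Path


def latest_checkpoint(path):
    """Return path itself if it is a file, else the newest .ckpt inside it."""
    path = Path(path)
    if path.is_file():
        return path

    # Sort by modification time so the newest checkpoint comes last.
    candidates = sorted(path.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"No .ckpt files found in {path}")
    return candidates[-1]
```

The result can then be copied to Google Drive with `shutil.copy2`.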

### Inference

In [None]:
#@title Gradio UI
#@markdown You need to run the Model Config cell before running this one.   

#@markdown The checkpoint path you want to use. You can point to a folder if you want the code to automatically find the latest checkpoint
checkpoint_path = "/content/fish-diffusion/logs/HiFiSVC/version_0/checkpoints" #@param {type:"string"}

import yaml

if arch == "hifisinger":
  gradio_config = {
    "readme": "# Fish Diffusion - HiFiSinger Demo 🎤\nGitHub Repo: [fishaudio/fish-diffusion](https://github.com/fishaudio/fish-diffusion) \nTo share a new model, please check out the [Share Your Model](https://huggingface.co/spaces/fishaudio/fish-diffusion/discussions/2) discussion.\n",
    "max_mixing_speakers": 3,
    "models": [
      {
        "name": "demo",
        "config": config_path,
        "checkpoint": checkpoint_path,
        "readme": "This model is pretrained on the Opencpop and M4Singer datasets and fine-tuned on your dataset.",
      }
    ]
  }

  with open("gradio-config.yaml", "w") as f:
    yaml.dump(gradio_config, f)

  !source /content/env/bin/activate;\
    conda activate fish_diffusion;\
    python -c 'import gradio;gradio.close_all()';\
    python /content/fish-diffusion/tools/hifisinger/gradio_ui.py \
    --config gradio-config.yaml --share
else:
  !source /content/env/bin/activate;\
    conda activate fish_diffusion;\
    python tools/diffusion/inference.py \
    --config "{config_path}" --checkpoint "{checkpoint_path}" \
    --gradio --gradio_share

#### Command Line
This is not recommended for beginners, but it gives you more flexibility.

In [None]:
!source /content/env/bin/activate;\
    conda activate fish_diffusion;\
    python "tools/{arch}/inference.py" \
    --config "{config_path}" --checkpoint "{checkpoint_path}" \
    --input "input.wav" \
    --output "output.wav" \
    --speaker 0 --pitch_adjust 0