
v2.1.0: Fine-tuning and parameter freezing, pitch expressiveness control, DS files training, minor feature improvements and bug fixes

@yqzhishen released this 26 Aug 05:48
· 77 commits to main since this release

Fine-tuning and parameter freezing (#108, #120)

If you already have pre-trained checkpoints and need to adapt them to other datasets while keeping their functionality unchanged, fine-tuning can save training steps and time. Configuration example:

finetune_enabled: true  # the main switch to enable fine-tuning
finetune_ckpt_path: checkpoints/pretrained/model_ckpt_steps_320000.ckpt  # path to your pre-trained checkpoint
finetune_ignored_params:  # prefix rules to exclude specific parameters when loading the checkpoints
  - model.fs2.encoder.embed_tokens  # in case the phoneme set is changed
  - model.fs2.txt_embed  # same as above
  - model.fs2.spk_embed  # in case the speaker set is changed
finetune_strict_shapes: true  # whether to raise an error when parameter shapes mismatch
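
Conceptually, each entry in finetune_ignored_params is matched as a name prefix against the checkpoint's state dict, and matching parameters are skipped so they can be re-initialized for the new dataset. Below is a minimal, self-contained sketch of this prefix filtering (an illustration with made-up parameter names, not the project's actual implementation):

def filter_state_dict(state_dict, ignored_prefixes):
    # Drop parameters whose names start with any ignored prefix.
    return {
        name: tensor
        for name, tensor in state_dict.items()
        if not any(name.startswith(prefix) for prefix in ignored_prefixes)
    }

# Toy state dict with illustrative parameter names.
toy_state_dict = {
    "model.fs2.txt_embed.weight": ...,                    # excluded: phoneme set changed
    "model.fs2.spk_embed.weight": ...,                    # excluded: speaker set changed
    "model.fs2.encoder.layers.0.self_attn.weight": ...,   # kept
}
kept = filter_state_dict(toy_state_dict, ["model.fs2.txt_embed", "model.fs2.spk_embed"])
print(sorted(kept))  # ['model.fs2.encoder.layers.0.self_attn.weight']
# The remaining parameters would then be loaded with strict=False so that
# the excluded ones keep their fresh initialization.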

Freezing part of the model parameters during training or fine-tuning can save GPU memory, accelerate training, and help avoid catastrophic forgetting. Configuration example:

freezing_enabled: true  # main switch to enable parameter freezing
frozen_params:  # prefix rules to freeze specific parameters during training
  - model.fs2.encoder
  - model.fs2.pitch_embed
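
In the same spirit, parameter freezing by prefix can be pictured as disabling gradients for every parameter whose name matches a frozen prefix. A hedged sketch on a stand-in PyTorch module (not the actual model):

import torch.nn as nn

def freeze_by_prefix(module: nn.Module, frozen_prefixes):
    # Disable gradients for parameters whose names start with a frozen prefix.
    for name, param in module.named_parameters():
        if any(name.startswith(prefix) for prefix in frozen_prefixes):
            param.requires_grad = False

# Stand-in module; the real model exposes names like model.fs2.encoder.*
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
freeze_by_prefix(model, ["0."])  # freeze the first layer by name prefix
print([(name, p.requires_grad) for name, p in model.named_parameters()])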

Please see the documentation for detailed usage of these two features.

Pitch expressiveness control mechanism (#97)

Expressiveness controls how freely the variance model generates pitch curves. By default, the variance model predicts pitch at 100% expressiveness, which means fully following the style of the voice provider. Correspondingly, 0% expressiveness produces pitch that stays close to the smoothed music score. Expressiveness can be adjusted anywhere between 0% and 100%, either statically or dynamically at the frame level.

Pitch expressiveness control is compatible with all variance models that include a pitch predictor, without re-training anything.
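
One way to picture the effect (a conceptual illustration of the blending with made-up numbers, not necessarily how the model realizes it internally) is a per-frame interpolation between the smoothed score pitch and the fully expressive predicted pitch:

import numpy as np

smoothed_score_pitch = np.array([60.0, 60.0, 62.0, 62.0])  # 0% expressiveness target (semitones)
predicted_pitch = np.array([60.3, 59.8, 62.5, 61.7])       # 100% expressiveness prediction
expr = np.array([0.0, 0.5, 1.0, 0.8])                      # frame-level expressiveness

# expr = 0 follows the smoothed score; expr = 1 follows the model's prediction.
controlled_pitch = expr * predicted_pitch + (1.0 - expr) * smoothed_score_pitch
print(controlled_pitch)  # [60.   59.9  62.5  61.76]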

Control pitch expressiveness in CLI

python scripts/infer.py variance my_project.ds --exp my_pitch_exp --predict pitch --expr 0.8  # a value between 0 and 1

Control pitch expressiveness in DS files

{
  "expr": 0.8  // static control
}

or

{
  "expr": "0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0",  // dynamic control
  "expr_timestep": "0.005"
}

Expose pitch expressiveness control in ONNX models

python scripts/export.py variance --exp my_pitch_exp --expose_expr

This will add an additional input named expr to my_pitch_exp.pitch.onnx.
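
A hedged sketch of checking and feeding the new input with onnxruntime (the model's other inputs depend on the exported graph and are not spelled out here):

import onnxruntime as ort

session = ort.InferenceSession("my_pitch_exp.pitch.onnx")
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)  # expr should appear among the inputs

# At run time, expr is supplied together with the model's other inputs, e.g.
#   session.run(None, {**other_inputs, "expr": expr_array})
# where other_inputs and the expected shape of expr depend on the graph.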

DS files training (#132)

Training variance models from DS files is now supported, which means voicebank users can tune projects in their own styles without recording any real singing voice. All that is needed is to copy the DS files into the ds/ folder of the raw dataset directory, write a single-column transcriptions.csv to declare them (see the sketch after the configuration example), and turn on the main switch for DS file binarization in the configuration file:

binarization_args:
  prefer_ds: true  # prefer loading from DS files
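
For illustration, here is a hedged Python sketch that writes the single-column transcriptions.csv; the column header `name` and the exact layout are assumptions, so check the documentation before relying on them:

import csv

# Base names of the DS files placed under ds/ in the raw dataset directory,
# e.g. ds/song_001.ds and ds/song_002.ds (illustrative names).
items = ["song_001", "song_002"]

with open("transcriptions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name"])  # assumed column header
    writer.writerows([item] for item in items)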

Please see the documentation for more detailed usage and information about DS file binarization.

Other minor feature improvements

  • Support the state-of-the-art RMVPE pitch extractor (#118, #122)
  • Show objective evaluation metrics on TensorBoard (#123, #127)
  • Support composite LR schedulers (#125)
  • Perform graceful exit on keyboard interrupt during binarization and inference (#119)
  • Improve logging format of learning rate (#115)
  • Add more documentation for old and new features

Major bug fixes

  • Fixed wrong speaker ID assignment in fixed pitch shifting augmentation
  • Fixed illegal access to None when training dur predictor
  • Fixed slur mistakes in a sample DS file
  • Fixed wrong model loading logic when using --mel
  • Fixed noisy output of ONNX models on DirectML
  • Fixed missing spk_embed input of multi-speaker duration predictor ONNX models

Some changes may not be listed above. See full change log: v2.0.0...v2.1.0