ESPnet-SPk: major update #5408

Jungjee · 2023-08-12T16:19:23Z

What?

This PR consolidates several earlier in-progress PRs and adds new functionality.

Combined PRs:
- ESPnet-spk: add more model architectures #5385
- ESPnet-SPK: add inference #5398

This PR newly introduces:
- Speed perturbation as speaker augmentation
- A modified frontend design: models can now be trained with frozen or trainable SSL frontends through S3PRL
- Additional speaker models: ECAPA-TDNN and MFA-Conformer
- Ongoing novel research on representation learning
- A separate inference stage for evaluating models with EER and minDCF
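
For reviewers unfamiliar with speed perturbation as speaker augmentation, here is a minimal sketch of the idea as it is commonly described: changing playback speed also shifts pitch, so each perturbed copy is treated as a new speaker class. It is not taken from this PR's code. The function names, the ratio set `RATIOS`, the linear-interpolation resampler, and the speaker count of 1000 are all assumptions for illustration.

```python
import numpy as np

# Hypothetical ratio set; index 0 is the unmodified signal.
RATIOS = (1.0, 0.9, 1.1)


def speed_perturb(wav: np.ndarray, ratio: float) -> np.ndarray:
    """Resample `wav` so it plays `ratio` times faster (pitch shifts as well)."""
    n_out = int(round(len(wav) / ratio))
    # Fractional input positions that each output sample reads from.
    src_pos = np.arange(n_out) * ratio
    # Linear interpolation; positions past the end are clamped by np.interp.
    return np.interp(src_pos, np.arange(len(wav)), wav)


def augmented_speaker_id(spk_id: int, ratio_idx: int, num_speakers: int) -> int:
    """Map (speaker, ratio) to a label so each perturbed copy is a new class."""
    return spk_id + ratio_idx * num_speakers


rng = np.random.default_rng(0)
wav = rng.standard_normal(16000)  # 1 s of noise at 16 kHz as stand-in audio

ratio_idx = 2  # selects the 1.1x ratio
perturbed = speed_perturb(wav, RATIOS[ratio_idx])
new_label = augmented_speaker_id(spk_id=7, ratio_idx=ratio_idx, num_speakers=1000)
```

With three ratios the classifier head would need `3 * num_speakers` outputs under this labeling scheme. A production pipeline would likely use a band-limited resampler rather than linear interpolation.
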
Supported configurations (updated):
- RawNet3 - EER 0.73% on Vox1-O
- ECAPA-TDNN w/ mel-spectrogram - EER 0.96% on Vox1-O
- ECAPA-TDNN w/ frozen WavLM-Large frontend - EER 0.60% on Vox1-O
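
For context on the two metrics, here is a minimal sketch of how EER and minDCF can be computed from trial scores. It uses the standard definitions and is not the scoring code in this PR. The function name `eer_and_min_dcf` and the cost parameters (`p_target=0.05`, `c_miss=1`, `c_fa=1`) are assumptions; the description above does not state which minDCF operating point is used. Tied scores are not given special handling.

```python
import numpy as np


def eer_and_min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Return (EER, normalized minDCF) for trial scores and 0/1 target labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials from highest to lowest score; threshold k accepts the top k.
    labels = labels[np.argsort(-scores)]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # False-acceptance and false-rejection rates at every threshold.
    fa = np.concatenate([[0], np.cumsum(1 - labels)]) / n_nontarget
    fr = 1.0 - np.concatenate([[0], np.cumsum(labels)]) / n_target

    # EER: the operating point where the two error rates are closest.
    idx = np.argmin(np.abs(fa - fr))
    eer = (fa[idx] + fr[idx]) / 2.0

    # Detection cost at every threshold, normalized by the best trivial system.
    dcf = c_miss * fr * p_target + c_fa * fa * (1.0 - p_target)
    min_dcf = dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))
    return eer, min_dcf


# Toy trial list: one non-target trial outscores one target trial.
eer, min_dcf = eer_and_min_dcf([0.9, 0.6, 0.4, 0.1], [1, 0, 1, 0])
```

An EER of 0.73% in the list above corresponds to a return value of about 0.0073 from a function like this.
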

We have been adding several new features in parallel, and the ongoing PRs now conflict with one another in places.