Reproducing base_noise_pt_noise_ft_30h.pt #115

nobel861017 · 2024-06-02T18:04:39Z

Hi,
I am trying to reproduce the results of base_noise_pt_noise_ft_30h.pt by fine-tuning the pre-trained checkpoint base_noise_pt_lrs3_vox_iter5.pt.
By directly decoding base_noise_pt_noise_ft_30h.pt on the clean set, I get a WER of 4%. However, when I fine-tune base_noise_pt_lrs3_vox_iter5.pt with the config base_noise_pt_noise_ft_30h.yaml and decode on the clean set, I get a WER of around 4.6%. For decoding, I used the parameters generation.beam=20 generation.lenpen=1. For fine-tuning, I set update_freq: [8] since I only used one GPU, and I used musan/tsv/all as the noise wav. The rest of the parameters were not modified.
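For reference, this is the arithmetic behind choosing update_freq: [8]. It is a minimal sketch that assumes the released model was trained on 8 GPUs with update_freq: [1]; that GPU count is an assumption here, not something read from the config, and the function name is hypothetical.

```python
def batches_per_update(num_gpus: int, update_freq: int) -> int:
    """Number of per-GPU batches accumulated into one optimizer step."""
    if num_gpus < 1 or update_freq < 1:
        raise ValueError("num_gpus and update_freq must be positive")
    return num_gpus * update_freq


# ASSUMPTION: the reference run used 8 GPUs without gradient accumulation.
reference = batches_per_update(num_gpus=8, update_freq=1)

# Single-GPU run described above, with update_freq: [8].
single_gpu = batches_per_update(num_gpus=1, update_freq=8)

print("reference:", reference, "single GPU:", single_gpu)
print("same batches per update:", reference == single_gpu)
```

Matching this product keeps the number of batches per optimizer step the same, though it does not by itself guarantee identical results.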
I also noticed that the performance gap between the provided fine-tuned model and the one I fine-tuned myself is even larger on the noisy test set.
Do you know what is going on, or did I miss something?
Thanks
