This repository has been archived by the owner on Sep 1, 2024. It is now read-only.

Reproducing base_noise_pt_noise_ft_30h.pt #115

Open
nobel861017 opened this issue Jun 2, 2024 · 0 comments


nobel861017 commented Jun 2, 2024

Hi,
I am trying to reproduce the results of base_noise_pt_noise_ft_30h.pt by fine-tuning the pre-trained checkpoint base_noise_pt_lrs3_vox_iter5.pt.
Decoding the released base_noise_pt_noise_ft_30h.pt directly on the clean test set gives a WER of 4%. However, when I fine-tune base_noise_pt_lrs3_vox_iter5.pt myself with the config base_noise_pt_noise_ft_30h.yaml and decode on the same clean set, I get a WER of around 4.6%. For decoding, I used generation.beam=20 generation.lenpen=1. For fine-tuning, I set update_freq: [8] because I used only one GPU, and I used musan/tsv/all as the noise wav. All other parameters were left unmodified.
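The reasoning behind update_freq: [8] on one GPU is gradient accumulation: the effective batch per optimizer update is roughly the number of GPUs × update_freq × per-GPU max_tokens. Below is a minimal sketch of that arithmetic, assuming (not confirmed from the repo) that the reference run used 8 GPUs with update_freq: [1]; the max_tokens value is a hypothetical placeholder, and the helper name effective_batch is illustrative only.

```python
# Hypothetical sketch: compare the effective batch of an assumed 8-GPU reference
# setup with a single-GPU setup that accumulates gradients via update_freq.
# GPU counts and MAX_TOKENS are assumptions, not values taken from the repo.

def effective_batch(num_gpus: int, update_freq: int, max_tokens: int) -> int:
    """Upper bound on tokens (frames) summed into one optimizer update."""
    return num_gpus * update_freq * max_tokens


MAX_TOKENS = 1000  # hypothetical per-GPU batch size; substitute the yaml's value

reference = effective_batch(num_gpus=8, update_freq=1, max_tokens=MAX_TOKENS)
single_gpu = effective_batch(num_gpus=1, update_freq=8, max_tokens=MAX_TOKENS)

print(reference, single_gpu)  # 8000 8000
```

Matching the effective batch size this way keeps the number of samples per update comparable, but it does not make the two runs identical: data shuffling and the randomly sampled noise from the MUSAN list still differ between runs.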
I also noticed that the gap between the provided fine-tuned model and the one I fine-tuned myself is even larger on the noisy test set.
Do you know what might be causing this, or did I miss something?
Thanks
