Hi guys! First of all thank you so much for such an amazing repo :)
I'd like to know if you have any insights on which decoder architecture works best for end-to-end training on medium-hard audio. Assume that model size and availability of data are not a problem. Have you run any tests, or do you know of any paper comparing them?
Hi @OleguerCanal,
Did you figure out which encoder-decoder pair was the most successful in your experiments?
I am training ContextNet and Conformer encoders with both transducer decoders (not converging at all) and LSTM decoders. The LSTMs are converging, but the output predictions do not seem perfectly aligned: the words are correct, but some of them keep being repeated.