
Results for baseline systems of Task 1 available somewhere? #6

Open
simon-clematide opened this issue Apr 7, 2020 · 8 comments

@simon-clematide

simon-clematide commented Apr 7, 2020

Hi there,
since the sweep scripts for the neural baselines explore a large number of hyperparameter combinations, wouldn't it make sense to save some energy and make the baseline results public?

@kylebgorman
Collaborator

I agree, and we will shortly!

As we speak we are improving them on a few dimensions: we're tightening the grid for some hyperparameters and expanding it for a few others.

I think these should be ready in the next week or so, about the same time the surprise languages are ready.

@simon-clematide
Author

simon-clematide commented Apr 8, 2020

That would be great, but probably a bit late for most of us.

@kylebgorman
Collaborator

kylebgorman commented Apr 8, 2020 via email

@simon-clematide
Author

I can share some of what we computed. In our experience the transformer is pretty strong. For one specific setup we tested (see the original baseline scripts for the meaning of the hyperparameters encoded in the checkpoint names), we got the following results:

checkpoints/arm-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 17.78 LER: 3.62
checkpoints/bul-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 30.67 LER: 7.01
checkpoints/fre-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 8.44 LER: 2.08
checkpoints/geo-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 28.44 LER: 6.04
checkpoints/gre-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 18.89 LER: 3.36
checkpoints/hin-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 8.44 LER: 2.32
checkpoints/hun-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 3.78 LER: 0.66
checkpoints/ice-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 11.56 LER: 2.86
checkpoints/kor-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 44.22 LER: 18.49
checkpoints/lit-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 22.67 LER: 4.63

Maybe this gives a hint about the relative difficulty of the different data sets.
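For anyone who wants to compare against these numbers programmatically, here is a minimal sketch that parses result lines in the format above into per-language (WER, LER) tuples and ranks them. The `parse_results` helper and the assumption that the leading token of the checkpoint directory name is the language code are mine, not part of the baseline scripts; the remaining fields in the directory name would have to be interpreted against the original sweep scripts.

```python
import re

# Matches lines like:
#   checkpoints/arm-.../checkpoint_best.pt WER: 17.78 LER: 3.62
# Hypothetical convention: the directory name starts with the language
# code, followed by the hyperparameter values used for that run.
LINE_RE = re.compile(
    r"checkpoints/(?P<name>[\w.\-]+)/checkpoint_best\.pt\s+"
    r"WER:\s*(?P<wer>[\d.]+)\s+LER:\s*(?P<ler>[\d.]+)"
)

def parse_results(text):
    """Return (language, WER, LER) tuples sorted by ascending WER."""
    rows = []
    for m in LINE_RE.finditer(text):
        lang = m.group("name").split("-", 1)[0]
        rows.append((lang, float(m.group("wer")), float(m.group("ler"))))
    return sorted(rows, key=lambda r: r[1])

results = """
checkpoints/arm-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 17.78 LER: 3.62
checkpoints/hun-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 3.78 LER: 0.66
checkpoints/kor-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 44.22 LER: 18.49
"""

print(parse_results(results))
# Hungarian comes out easiest and Korean hardest in this subset.
```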

@kylebgorman
Collaborator

kylebgorman commented Apr 9, 2020 via email

@simon-clematide simon-clematide changed the title Results for baseline systems available somewhere? Results for baseline systems of Task 1 available somewhere? Apr 10, 2020
@besou

besou commented Apr 30, 2020

Thank you for the baseline results so far. The results are pretty strong, especially for the Enc-Dec baseline. Would you mind publishing the hyperparameter combinations of the most successful baseline models as well?

@kylebgorman
Collaborator

kylebgorman commented Apr 30, 2020 via email

@kylebgorman
Collaborator

Hi @besou, I may have spoken too soon. I'm still running the final sweep (including results on test) and it won't finish for a few days, so I won't have these in time for you to act on them. (I am locked out of my lab with all the GPUs due to the pandemic and associated social distancing.)

There is quite a bit of variation in what works for a given language: some prefer small batches, some large; whether you want a "small" or a "large" encoder and/or decoder varies; and nearly all prefer a moderate degree of dropout.
