
Results for baseline systems of Task 1 available somewhere? #6

Open
simon-clematide opened this issue Apr 7, 2020 · 8 comments

@simon-clematide

simon-clematide commented Apr 7, 2020

Hi there,
since the sweep scripts for the neural baselines explore a large number of hyperparameter combinations, wouldn't it make sense to save some energy and make the baseline results public?

@kylebgorman
Collaborator

I agree, and we will shortly!

As we speak we are improving them on a few dimensions: we're tightening the grid for some hyperparameters and expanding it for a few others.

I think these should be ready in the next week or so, about the same time the surprise languages are ready.

@simon-clematide
Author

simon-clematide commented Apr 8, 2020

That would be great, but probably a bit late for most of us.

@kylebgorman
Collaborator

kylebgorman commented Apr 8, 2020 via email

@simon-clematide
Author

I can share some of what we computed. In our experience the transformer is pretty strong. For one specific setup we tested (see the original baseline scripts for the meaning of the hyperparameters encoded in the checkpoint names), we got the following results:

checkpoints/arm-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 17.78 LER: 3.62
checkpoints/bul-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 30.67 LER: 7.01
checkpoints/fre-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 8.44 LER: 2.08
checkpoints/geo-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 28.44 LER: 6.04
checkpoints/gre-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 18.89 LER: 3.36
checkpoints/hin-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 8.44 LER: 2.32
checkpoints/hun-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 3.78 LER: 0.66
checkpoints/ice-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 11.56 LER: 2.86
checkpoints/kor-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 44.22 LER: 18.49
checkpoints/lit-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 22.67 LER: 4.63

Maybe this gives a hint about the relative difficulty of the different data sets.
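For anyone who wants to compare against these numbers programmatically, here is a minimal sketch that parses result lines in the format above into per-language (WER, LER) tuples and ranks them. The `parse_results` helper and the assumption that the leading token of the checkpoint directory name is the language code are mine, not part of the baseline scripts; the remaining fields in the directory name would have to be interpreted against the original sweep scripts.

```python
import re

# Matches lines like:
#   checkpoints/arm-.../checkpoint_best.pt WER: 17.78 LER: 3.62
# Hypothetical convention: the directory name starts with the language
# code, followed by the hyperparameter values used for that run.
LINE_RE = re.compile(
    r"checkpoints/(?P<name>[\w.\-]+)/checkpoint_best\.pt\s+"
    r"WER:\s*(?P<wer>[\d.]+)\s+LER:\s*(?P<ler>[\d.]+)"
)

def parse_results(text):
    """Return (language, WER, LER) tuples sorted by ascending WER."""
    rows = []
    for m in LINE_RE.finditer(text):
        lang = m.group("name").split("-", 1)[0]
        rows.append((lang, float(m.group("wer")), float(m.group("ler"))))
    return sorted(rows, key=lambda r: r[1])

results = """
checkpoints/arm-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 17.78 LER: 3.62
checkpoints/hun-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 3.78 LER: 0.66
checkpoints/kor-256-1024-4-4-256-1024-4-4-0.3/checkpoint_best.pt WER: 44.22 LER: 18.49
"""

print(parse_results(results))
# Hungarian comes out easiest and Korean hardest in this subset.
```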

@kylebgorman
Collaborator

kylebgorman commented Apr 9, 2020 via email

@simon-clematide simon-clematide changed the title Results for baseline systems available somewhere? Results for baseline systems of Task 1 available somewhere? Apr 10, 2020
@besou

besou commented Apr 30, 2020

Thank you for the baseline results so far. The results are pretty strong, especially for the Enc-Dec baseline. Would you mind publishing the hyperparameter combinations of the most successful baseline models as well?

@kylebgorman
Collaborator

kylebgorman commented Apr 30, 2020 via email

@kylebgorman
Collaborator

Hi @besou, I may have spoken too soon. I'm still running the final sweep (including results on test) and it won't finish for a few days, so I won't have these in time for you to act on them. (I am locked out of my lab with all the GPUs due to the pandemic and associated social distancing.)

There is quite a bit of variation in what works for a given language: some prefer small batches, some large; whether you want a "small" or a "large" encoder and/or decoder varies; and nearly all prefer a moderate degree of dropout.
