
paper: compare ga_bert against xml-r #68

Closed
jowagner opened this issue Apr 9, 2021 · 1 comment
Labels: idea (Future work idea), next step (This issue should be addressed in Summer 2022)

Comments

jowagner (Collaborator) commented Apr 9, 2021

https://peltarion.com/blog/data-science/a-deep-dive-into-multilingual-nlp-models suggests "that training monolingual models for small languages is unnecessary" as "XLM-R achieved ~80% accuracy whereas the Swedish BERT models reached ~79% accuracy".

Check whether off-the-shelf xlm-roberta performs better on our downstream tasks than the Irish-specific ga_bert. (XLM-R is more or less RoBERTa trained on the much larger XLM training data covering 100 languages, or possibly more, since the automatic language filter will have classified some data in other languages as belonging to one of the 100.)

There are two XLM-R models: base and large.
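A minimal sketch of how the candidate encoders could be loaded side by side with the Hugging Face transformers library. The ga_bert checkpoint ID below is an assumed placeholder, not a confirmed path, and the forward pass is only a sanity check; the real comparison would fine-tune each encoder on our downstream tasks and compare scores.

```python
# Sketch: load XLM-R and a gaBERT checkpoint and run one forward pass each.
# The "DCU-NLP/bert-base-irish-cased-v1" ID is an assumption; substitute the
# actual ga_bert checkpoint path or hub ID used in our experiments.
import torch
from transformers import AutoModel, AutoTokenizer

MODELS = {
    "xlm-roberta-base": "xlm-roberta-base",          # off-the-shelf multilingual baseline
    "xlm-roberta-large": "xlm-roberta-large",        # larger multilingual variant
    "ga_bert": "DCU-NLP/bert-base-irish-cased-v1",   # assumed gaBERT checkpoint ID
}

sentence = "Is teanga Cheilteach í an Ghaeilge."

for name, checkpoint in MODELS.items():
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape check only; downstream evaluation (e.g. parsing, NER) would
    # fine-tune each encoder with a task head and compare test scores.
    print(name, outputs.last_hidden_state.shape)
```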

jowagner added the idea (Future work idea) label on Apr 9, 2021
jowagner added the next step (This issue should be addressed in Summer 2022) label on Sep 16, 2021
jowagner (Collaborator, Author) commented
Figures ready for xlm-roberta-base.
