single language track setups #4895
Looks good to me. Please follow our discussion and fix the CI issues. Let's accelerate the process~ Thanks @DanBerrebbi
Codecov Report
@@            Coverage Diff             @@
##           master    #4895      +/-   ##
==========================================
+ Coverage   73.10%   76.58%   +3.48%
==========================================
  Files         603      603
  Lines       53709    53737      +28
==========================================
+ Hits        39264    41155    +1891
+ Misses      14445    12582    -1863
Many thanks! Looks great to me.
I changed some of the languages in the single-language track:
I removed pol because it is similar to rus, and nob because it is similar to swe.
I added French because there was no Romance language (French, Spanish, Italian, Portuguese, ...), and Swahili because there was no African language.
The dataset selection is summarized on the last page of https://docs.google.com/document/d/1sb8SyDjcMf7FDiZHH8wVcZ0EADtXdNBF3LpA9Cu0I1k/edit
For the test set format, I think it is best to keep a single test set per language that contains all the datasets for that language. This keeps decoding simple, and we can then split the decoded file to get scores per dataset and compute metrics for domain shift. That way we keep flexibility for scoring while the user has a simple process.
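To make the "split the decoded file" step concrete, here is a rough sketch of how it could work. It assumes the decoded hypotheses are in a Kaldi-style `utt_id transcript` text file and that each utterance ID starts with the dataset name followed by `_`. Both the ID convention and the names `split_by_dataset`, `dsA`, and `dsB` are hypothetical and only for illustration; the actual IDs in this recipe may look different.

```python
from collections import defaultdict


def split_by_dataset(lines, sep="_"):
    """Group 'utt_id transcript' lines by the dataset prefix of utt_id.

    Assumption (hypothetical): utt_id looks like '<dataset><sep><rest>'.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First token is the utterance ID, the rest is the transcript.
        utt_id, _, text = line.partition(" ")
        dataset = utt_id.split(sep, 1)[0]
        groups[dataset].append((utt_id, text))
    return dict(groups)


# Toy decoded output for one language, mixing two hypothetical datasets.
decoded = [
    "dsA_utt0001 hello world",
    "dsB_utt0001 good morning",
    "dsA_utt0002 see you later",
]

per_dataset = split_by_dataset(decoded)
for name, utts in sorted(per_dataset.items()):
    print(name, len(utts))
```

Each group can then be written to its own file and scored separately against the matching reference subset, so the user still runs a single decoding pass per language.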
Points to be discussed: