Hi, thanks for using GlotLID in your project. Based on the feedback in your paper and the way you used GlotLID, we have improved it and released version 3.
For the reproducibility of your results, I would like to ask you to change model.bin in your code to model_v2.bin. This ensures that the code downloads the version you used to obtain your results, so you won't need to reproduce them (model.bin always refers to the latest model). I think only detection_GlotLID.py#L113 needs to be changed for this.
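A minimal sketch of what pinning the model file could look like, assuming the model is fetched with `hf_hub_download` from `huggingface_hub` and loaded with `fasttext`. The repository id `cis-lmu/glotlid` and the helper name `load_pinned_glotlid` are assumptions for illustration and may not match what detection_GlotLID.py actually does; only the file names model.bin and model_v2.bin come from this issue.

```python
# File name taken from this issue: "model_v2.bin" is the version used for the
# published results, while "model.bin" always points to the latest model.
PINNED_MODEL_FILE = "model_v2.bin"

# Assumption: the Hugging Face repository id the project downloads from.
ASSUMED_REPO_ID = "cis-lmu/glotlid"


def load_pinned_glotlid(filename=PINNED_MODEL_FILE, repo_id=ASSUMED_REPO_ID):
    """Download (or reuse from the local cache) one specific model file and load it.

    Hypothetical helper, not part of GlotLID or of the project's code.
    """
    # Imported lazily so the constants above can be inspected without
    # having fasttext or huggingface_hub installed.
    import fasttext
    from huggingface_hub import hf_hub_download

    # hf_hub_download returns the local path of the cached file.
    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return fasttext.load_model(model_path)
```

Calling `load_pinned_glotlid()` would then always load the v2 file, regardless of later releases; passing `filename="model.bin"` would follow the latest model instead.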
Version 3, based on your feedback, adds both Meiteilon (Manipuri) and Dogri, and also makes sure to cover all other Indian languages, even in transliteration. You can see the list of languages for v3 here: https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md
Also, I noticed in your code that managing ISO codes seems to have been difficult. In the v3 design, we decided to make the labels more mutually exclusive. For this reason, we removed some "macro" languages for which we already cover a good variety of their "individual" languages. Additionally, where two labels were very close and caused predictions to fluctuate a lot, we merged them or deleted one of them.
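As a small illustration of handling the labels, here is a hedged sketch that splits a predicted label into its language code and script. It assumes labels have the form `__label__eng_Latn` (the fastText `__label__` prefix, then an ISO 639-3 code and an ISO 15924 script joined by an underscore); `split_label` is a hypothetical helper, not part of GlotLID, and the label set itself should be checked against the languages-v3.md list linked above.

```python
FASTTEXT_PREFIX = "__label__"  # prefix fastText puts in front of every label


def split_label(label):
    """Split a label such as '__label__eng_Latn' into ('eng', 'Latn').

    Assumes the '<language>_<script>' layout; raises ValueError otherwise.
    """
    if not label.startswith(FASTTEXT_PREFIX):
        raise ValueError(f"not a fastText label: {label!r}")
    code = label[len(FASTTEXT_PREFIX):]
    # partition splits on the first underscore only.
    language, separator, script = code.partition("_")
    if not separator or not language or not script:
        raise ValueError(f"expected '<language>_<script>', got: {code!r}")
    return language, script
```

Keeping language and script as separate fields may make it easier to map predictions onto a project's own ISO code tables without string manipulation scattered through the code.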
That is great! Thank you so much for bringing this to our attention. I will make the corresponding changes in the code :). We are also pleased that our feedback was helpful, and we are very much looking forward to trying out the new version of GlotLID!