GlotLID #1

Closed
kargaranamir opened this issue Apr 8, 2024 · 2 comments

@kargaranamir

Hi, thanks for using GlotLID in your project.

Based on your feedback in the paper and the way you used GlotLID, we have improved GlotLID to version 3.

For the reproducibility of your results, I'd like to ask you to change model.bin in your code to model_v2.bin. This ensures that your code downloads the version you used to obtain your results, so you won't need to reproduce them again (model.bin always refers to the latest model). I think only detection_GlotLID.py#L113 needs to be changed for the sake of reproducibility.

from huggingface_hub import hf_hub_download
model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model_v2.bin", cache_dir=None)
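For projects that want to keep the version choice in one place, here is a minimal sketch, assuming the `fasttext` and `huggingface_hub` packages are installed. `glotlid_filename` and `load_glotlid` are hypothetical helpers, not part of GlotLID; only `model.bin` (latest) and `model_v2.bin` are filenames confirmed in this thread, so any other versioned name should be checked against the Hugging Face repo first.

```python
from typing import Optional


def glotlid_filename(version: Optional[int] = None) -> str:
    # Hypothetical helper: "model.bin" always tracks the latest release,
    # so only a versioned filename such as "model_v2.bin" is reproducible.
    if version is None:
        return "model.bin"
    return f"model_v{version}.bin"


def load_glotlid(version: Optional[int] = None):
    # Imports are deferred so the filename logic works without network
    # access or the fasttext package installed.
    import fasttext
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(
        repo_id="cis-lmu/glotlid",
        filename=glotlid_filename(version),
        cache_dir=None,
    )
    return fasttext.load_model(model_path)
```

With this, `load_glotlid(2)` pins the v2 checkpoint used for the paper's results, while `load_glotlid()` follows whatever the latest release is.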

Version 3, based on your feedback, adds both Meiteilon (Manipuri) and Dogri, and also aims to cover all other Indian languages, including transliterated text. You can see the full list for v3 here: https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md

Also, I've seen in your code that managing ISO codes seems to be difficult. In the v3 design, we decided to make labels more mutually exclusive. For this reason, some "macro" language labels were deleted where we already cover a good variety of their "individual" languages. Additionally, if two labels were so close that predictions fluctuated a lot between them, we merged them or deleted one of them.
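One way to reduce the ISO-code bookkeeping on the user side is to split each predicted label into its language and script parts. This is a minimal sketch, assuming GlotLID labels take the form `__label__<iso639-3>_<Script>` (for example `__label__eng_Latn`); `split_label` and `iso_codes` are hypothetical helpers. It does not encode the v3 merges or deletions, which should be checked against the v3 language list.

```python
from typing import List, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def split_label(label: str) -> Tuple[str, str]:
    """Split a label like '__label__eng_Latn' into ('eng', 'Latn')."""
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    # ISO 639-3 codes contain no underscore, so the first "_" separates
    # the language code from the script code.
    iso, _, script = label.partition("_")
    return iso, script


def iso_codes(labels: List[str]) -> List[str]:
    """Return ISO 639-3 codes in first-seen order, ignoring script variants."""
    seen: List[str] = []
    for label in labels:
        iso, _ = split_label(label)
        if iso not in seen:
            seen.append(iso)
    return seen
```

Keeping the script separate means transliterated and native-script predictions for the same language can be collapsed to one ISO code when only the language matters.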

CaroHolt self-assigned this Apr 9, 2024
@CaroHolt
Collaborator

CaroHolt commented Apr 9, 2024

Hi,

That is great! Thank you so much for bringing this to our attention. I will make the respective changes in the code :). We are also pleased that our feedback was helpful, and we are very much looking forward to trying out the new version of GlotLID!

@kargaranamir
Author

Thanks!


2 participants