GlotLID #1

kargaranamir · 2024-04-08T16:44:46Z

Hi, thanks for using GlotLID in your project.

Based on the feedback in your paper and the way you used GlotLID, we have improved GlotLID and released version 3.

For the reproducibility of your results, I'd like to ask you to change model.bin in your code to model_v2.bin. This ensures that the code downloads the version you used to obtain your results, so you won't need to reproduce them (model.bin always refers to the latest model). I think only detection_GlotLID.py#L113 needs to be changed:

model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model_v2.bin", cache_dir=None)
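To make the pinning explicit, here is a minimal sketch of how the file name could be chosen in one place. The helper names `glotlid_filename` and `download_glotlid` are hypothetical and not part of GlotLID or your code. Only `model.bin` and `model_v2.bin` are confirmed in this thread; that other versions follow the same `model_v<N>.bin` pattern is an assumption.

```python
from typing import Optional

REPO_ID = "cis-lmu/glotlid"  # repository named in this thread


def glotlid_filename(version: Optional[int] = None) -> str:
    """Return the model file name for a pinned version (hypothetical helper).

    None maps to "model.bin", which always tracks the latest release.
    An integer maps to "model_v<N>.bin"; only N=2 is confirmed in this
    thread, other values are an assumed naming pattern.
    """
    return "model.bin" if version is None else f"model_v{version}.bin"


def download_glotlid(version: Optional[int] = 2) -> str:
    """Download the requested model file and return its local path."""
    # Imported lazily so the file name helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=glotlid_filename(version))
```

With this, `download_glotlid(2)` would fetch the pinned v2 file, and the returned path could then be passed to `fasttext.load_model` as before.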

Version 3, based on your feedback, adds both Meiteilon (Manipuri) and Dogri, and also makes sure to cover all other Indian languages, even in transliteration. You can see the list of v3 languages here: https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md

Also, I've seen in your code that managing ISO codes seems to have been difficult. In the v3 design, we decided to make labels more mutually exclusive. For this reason, some "macro" language labels were deleted where we already cover a good variety of their "individual" languages. Additionally, if two labels were very close and caused predictions to fluctuate between them, we merged them or deleted one of them.
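For the ISO code handling, something along these lines might help. It is only a sketch, and it assumes that predicted labels look like `__label__eng_Latn` (fastText label prefix, then an ISO 639-3 code, an underscore, and a script code). The function name `parse_label` is hypothetical.

```python
from typing import Tuple

LABEL_PREFIX = "__label__"  # assumed fastText-style label prefix


def parse_label(label: str) -> Tuple[str, str]:
    """Split a label such as '__label__eng_Latn' into (iso_code, script).

    If the label has no underscore after the prefix, the script part
    is returned as an empty string.
    """
    name = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
    iso_code, _, script = name.partition("_")
    return iso_code, script
```

Working with the `(iso_code, script)` pair rather than the raw label string could make it easier to compare predictions against your own language list.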


CaroHolt · 2024-04-09T14:58:19Z

Hi,

That is great! Thank you so much for bringing this to our attention. I will make the respective changes in the code :). We are also pleased that our feedback was helpful, and we are very much looking forward to trying out the new version of GlotLID!

kargaranamir · 2024-04-10T07:50:33Z

Thanks!

CaroHolt self-assigned this Apr 9, 2024

kargaranamir closed this as completed Apr 10, 2024
