
Incorporating Underrepresented Languages: A Focus on Low-Resource Languages #70

Open
farinamhz opened this issue Jan 20, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@farinamhz
Member

In this step, we address the challenge of incorporating underrepresented languages, with a focus on low-resource languages. This effort confronts a prevalent imbalance in NLP systems, which are predominantly oriented towards high-resource languages such as English, Chinese, and Spanish. These languages benefit from extensive digital resources, including large text corpora, which has helped them dominate NLP research. Conversely, low-resource languages such as Lao and Sanskrit suffer from a scarcity of digital resources. Our aim is to bring these underrepresented languages into focus, with Lao and Sanskrit as our candidates from this group, and to recognize and explore their unique linguistic features. By integrating these languages, we strive to develop a truly language-agnostic system that embraces the full spectrum of global linguistic diversity.

@farinamhz farinamhz added the enhancement New feature or request label Jan 20, 2024
@farinamhz farinamhz self-assigned this Jan 20, 2024
@farinamhz
Member Author

For the back-translation phase of our experiments with these languages, we use the nllb translator. The language parameters will be lao_Laoo for Lao and san_Deva for Sanskrit (the FLORES-200 codes that NLLB uses). The results of these experiments will be integrated into LADy version 0.2.0.0, which already contains results from the nllb translator.
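A minimal sketch of the back-translation round trip with these language codes, assuming the Hugging Face transformers implementation of NLLB. The checkpoint facebook/nllb-200-distilled-600M, the English source code eng_Latn, and the names backtranslate, make_nllb_translator and LANG_CODES are illustrative assumptions, not LADy's actual API. The translator is passed in as a plain function so the round-trip logic can be exercised without downloading a model.

```python
from typing import Any, Callable, Dict, List, Sequence

# A translator maps a batch of texts from one FLORES-200 code to another.
Translator = Callable[[List[str], str, str], List[str]]

# Codes named in this issue, plus English as the (assumed) source language.
LANG_CODES: Dict[str, str] = {
    "english": "eng_Latn",  # assumption: input reviews are in English
    "lao": "lao_Laoo",
    "sanskrit": "san_Deva",
}


def backtranslate(texts: Sequence[str], pivot: str, translate: Translator,
                  src: str = "eng_Latn") -> List[str]:
    """Round-trip texts src -> pivot -> src and return the back-translations."""
    forward = translate(list(texts), src, pivot)
    return translate(forward, pivot, src)


def make_nllb_translator(model_name: str = "facebook/nllb-200-distilled-600M",
                         max_length: int = 256) -> Translator:
    """Build a Translator backed by an NLLB checkpoint (hypothetical default)."""
    # Imported lazily so the rest of this sketch runs without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    tokenizers: Dict[str, Any] = {}

    def translate(texts: List[str], src: str, tgt: str) -> List[str]:
        # NLLB tokenizers prepend a source-language token, so cache one per src code.
        if src not in tokenizers:
            tokenizers[src] = AutoTokenizer.from_pretrained(model_name, src_lang=src)
        tok = tokenizers[src]
        inputs = tok(texts, return_tensors="pt", padding=True, truncation=True)
        # Force the first generated token to be the target-language code.
        outputs = model.generate(**inputs,
                                 forced_bos_token_id=tok.convert_tokens_to_ids(tgt),
                                 max_length=max_length)
        return tok.batch_decode(outputs, skip_special_tokens=True)

    return translate

# Usage (downloads the model on first call):
# translate = make_nllb_translator()
# backtranslate(["The battery life is great."], LANG_CODES["sanskrit"], translate)
```

Keeping one tokenizer per source code avoids mutating a shared tokenizer's source language between the forward (eng_Latn to lao_Laoo or san_Deva) and backward passes.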
