In this step, we address the challenge of incorporating underrepresented, low-resource languages. This effort confronts the prevalent imbalance in NLP systems, which are predominantly oriented towards high-resource languages such as English, Chinese, and Spanish. These languages benefit from extensive digital resources, including large text corpora, which facilitates their dominance in NLP research. Conversely, low-resource languages like Lao and Sanskrit are characterized by a scarcity of digital resources. Our aim is to highlight these underrepresented languages, with Lao and Sanskrit as the candidates from this group, and to recognize and explore their unique linguistic features. By integrating these languages, we strive to develop a truly language-agnostic system and embrace the full spectrum of global linguistic diversity.
For the backtranslation phase of our experiments with these languages, we employ the nllb translator. The language parameters are `lao_Laoo` for Lao and `san_Deva` for Sanskrit. The outcomes of these experiments will be integrated into LADy version 0.2.0.0, which already contains results from the nllb translator.
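As a rough illustration of the round trip described above, here is a minimal sketch using the Hugging Face `transformers` interface to NLLB. It is not LADy's actual pipeline: the checkpoint name `facebook/nllb-200-distilled-600M`, the source code `eng_Latn` (assuming English reviews), and the helper names `backtranslate` and `make_nllb_translator` are assumptions made for the example. Only `lao_Laoo` and `san_Deva` come from the comment above.

```python
from typing import Callable, Tuple

# Language codes named in this issue.
PIVOT_CODES = {"lao": "lao_Laoo", "sanskrit": "san_Deva"}
# Assumption: the original reviews are in English.
SOURCE_CODE = "eng_Latn"

# A translator takes (text, source_code, target_code) and returns the translation.
Translate = Callable[[str, str, str], str]


def backtranslate(text: str, pivot: str, translate: Translate,
                  source: str = SOURCE_CODE) -> Tuple[str, str]:
    """Translate source -> pivot -> source; return (pivot text, backtranslated text)."""
    forward = translate(text, source, pivot)
    back = translate(forward, pivot, source)
    return forward, back


def make_nllb_translator(model_name: str = "facebook/nllb-200-distilled-600M") -> Translate:
    """Build a translator backed by an NLLB checkpoint (hypothetical helper).

    transformers is imported lazily so backtranslate() can be used and tested
    with any other translator without downloading a model.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def translate(text: str, src: str, tgt: str) -> str:
        tokenizer.src_lang = src  # tells the tokenizer which language tag to prepend
        inputs = tokenizer(text, return_tensors="pt")
        output = model.generate(
            **inputs,
            # Force the decoder to start with the target-language tag.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt),
            max_length=256,
        )
        return tokenizer.batch_decode(output, skip_special_tokens=True)[0]

    return translate
```

With a translator from `make_nllb_translator()`, a call such as `backtranslate(review, PIVOT_CODES["lao"], translator)` would return both the Lao rendering and its English backtranslation. Passing the translator in as a callable keeps the round-trip logic independent of the model, so another translator could be swapped in for comparison.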