UzNLP.uz is an open-source Natural Language Processing (NLP) platform for the Uzbek language.
The project provides datasets, linguistic resources, pretrained models, and research tools designed to advance computational linguistics and AI research for low-resource Turkic languages.
Goals:
- Develop high-quality Uzbek NLP datasets
- Build and release pretrained transformer-based models
- Provide linguistic annotation tools (NER, POS tagging, lemmatization, SRL)
- Support academic and industrial NLP research
- Create infrastructure for corpus management and evaluation
Features:
- Named Entity Recognition (NER)
- Part-of-Speech (POS) Tagging
- Lemmatization & Morphological Analysis
- Sentence Segmentation
- Sentiment Analysis
- Text Classification
- Coreference Resolution
- Uzbek NLP Datasets
- API and Web-based tools
Technologies:
- Transformer-based models (mBERT, XLM-R)
- Custom fine-tuned Uzbek models
- Statistical and neural approaches
- Hybrid rule-based + deep learning systems
Datasets:
- Annotated NER corpora
- POS-tagged datasets
- Coreference datasets
- Slang detection datasets
- Academic and domain-specific corpora
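This README does not state the file format of the corpora. As a minimal sketch, assuming an annotated NER corpus is stored in a CoNLL-style layout (one `token<TAB>tag` pair per line, BIO tags, blank lines between sentences), the entities could be read back like this. The sample sentence, the tag names `PER` and `LOC`, and the helper names `read_conll` and `extract_entities` are illustrative assumptions, not part of the UzNLP.uz API.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]

# Hypothetical sample in an assumed CoNLL-style layout (not an official UzNLP.uz file).
SAMPLE = """\
Alisher\tB-PER
Navoiy\tI-PER
Hirotda\tB-LOC
tug'ilgan\tO
.\tO
"""


def read_conll(text: str) -> List[Sentence]:
    """Parse token<TAB>tag lines; blank lines separate sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from BIO tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        if tag.startswith("B-"):
            # A new entity starts; close any open one first.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == label:
            # Continuation of the currently open entity.
            tokens.append(token)
        else:
            # "O" tag or a stray "I-" tag: close any open entity.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


if __name__ == "__main__":
    for sent in read_conll(SAMPLE):
        print(extract_entities(sent))
```

Treating a stray `I-` tag as outside any entity is one possible design choice; a stricter loader could raise an error instead so that annotation mistakes surface early.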
UzNLP.uz is developed within academic research initiatives in Uzbekistan to promote AI and computational linguistics research.
We welcome researchers, developers, and linguists to contribute to the Uzbek NLP ecosystem.
License: to be specified (for example MIT, Apache 2.0, or GPL).