notebooks to finetune `bert-small-amharic`, `bert-mini-amharic`, and `xlm-roberta-base` models using an Amharic text classification dataset and the transformers library

rasyosef/amharic-news-category-classification

amharic-news-category-classification

This GitHub repo contains three notebooks that use the amharic-news-category-classification dataset to finetune the models listed below for a text classification task.

Each finetuned model classifies a given Amharic news article into one of the following 6 categories.

  • ሀገር አቀፍ ዜና (Local News)
  • መዝናኛ (Entertainment)
  • ስፖርት (Sports)
  • ቢዝነስ (Business)
  • ዓለም አቀፍ ዜና (International News)
  • ፖለቲካ (Politics)
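
A classifier over these categories needs a mapping between integer class ids and label strings. The sketch below builds one from the list above. The id order is an assumption taken from the order of that list and may not match the ids used in the dataset or in the notebooks.

```python
# Category names copied from the list above, paired with their English glosses.
# ASSUMPTION: ids follow the list order; the dataset's own ids may differ.
CATEGORIES = [
    ("ሀገር አቀፍ ዜና", "Local News"),
    ("መዝናኛ", "Entertainment"),
    ("ስፖርት", "Sports"),
    ("ቢዝነስ", "Business"),
    ("ዓለም አቀፍ ዜና", "International News"),
    ("ፖለቲካ", "Politics"),
]

# Mappings in the shape that transformers model configs accept
# (id2label / label2id).
id2label = {i: amharic for i, (amharic, _) in enumerate(CATEGORIES)}
label2id = {amharic: i for i, amharic in id2label.items()}

# English gloss lookup, useful when displaying predictions.
gloss = dict(CATEGORIES)

if __name__ == "__main__":
    for i, name in id2label.items():
        print(i, name, f"({gloss[name]})")
```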

Models

  • xlm-roberta-base : a multilingual transformer model with about 279M parameters
  • bert-small-amharic : an Amharic version of the bert-small transformer model with 25.7M parameters, pretrained from scratch on unlabelled Amharic text
  • bert-mini-amharic : an Amharic version of the bert-mini transformer model with 9.67M parameters, pretrained from scratch on unlabelled Amharic text
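
As a rough illustration of how any of these checkpoints could be loaded for 6-way classification with the transformers library, here is a minimal sketch. The Hub ids for the two Amharic models are assumptions inferred from the repository owner's name, and `load_for_classification` is a hypothetical helper, not a function from the notebooks.

```python
# ASSUMPTION: Hub ids for the Amharic models are guessed from the repo owner
# ("rasyosef"); check the notebooks for the ids actually used.
CHECKPOINTS = {
    "xlm-roberta-base": "xlm-roberta-base",
    "bert-small-amharic": "rasyosef/bert-small-amharic",
    "bert-mini-amharic": "rasyosef/bert-mini-amharic",
}

NUM_LABELS = 6  # one output per news category


def load_for_classification(name):
    """Hypothetical helper: return (tokenizer, model) for a checkpoint name.

    The import is done lazily so this file can be read and imported
    without transformers installed; calling the function requires it
    (plus network access to download the weights).
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # A fresh classification head with NUM_LABELS outputs is attached on top
    # of the pretrained encoder; it is randomly initialised until finetuned.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=NUM_LABELS
    )
    return tokenizer, model
```

Calling `load_for_classification("bert-mini-amharic")` would return the smallest of the three models, which is a reasonable starting point when compute is limited.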

Fine-tuned Model Performance

Since this is a multi-class classification task, the reported precision, recall, and F1 scores are macro averages.

| Model | Size (# params) | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| xlm-roberta-base | 279M | 0.9 | 0.88 | 0.88 | 0.88 |
| bert-small-amharic | 25.7M | 0.89 | 0.86 | 0.87 | 0.86 |
| bert-mini-amharic | 9.67M | 0.87 | 0.83 | 0.83 | 0.83 |
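
To make "macro average" concrete: each metric is computed per class and then averaged with equal weight per class, regardless of how many articles each class has. The sketch below is a plain-Python illustration of that definition, following the common convention that macro F1 is the mean of the per-class F1 scores. It is not the evaluation code from the notebooks, and `macro_scores` is a hypothetical name.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) for label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    # Unweighted mean over classes: every class counts the same.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy labels, not data from this repo.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(macro_scores(y_true, y_pred))
```

Because every class is weighted equally, macro scores can fall below accuracy when the model does worse on less frequent categories.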
