HASOC

This repo contains the code for our solutions at the Forum for Information Retrieval Evaluation (FIRE-2021). Our team, neuro-utmn-thales, participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A & B) and in one task on identifying problematic content in Marathi (Marathi Subtask A). For the English subtasks, we investigate how fine-tuning transformer models on additional hate speech detection corpora affects performance. We also apply a one-vs-rest approach based on Twitter-RoBERTa to discriminate between hate, profane, and offensive posts. Our models ranked third in English Subtask A with an F1-score of 81.99% and second in English Subtask B with an F1-score of 65.77%. For the Marathi task, we propose a system based on the Language-Agnostic BERT Sentence Embedding (LaBSE). This model ranked second in Marathi Subtask A with an F1-score of 88.08%.
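To illustrate the one-vs-rest idea mentioned above, here is a minimal sketch of the decision step only. It assumes that one binary classifier per class has already produced a confidence score for a tweet. The label names (`HATE`, `OFFN`, `PRFN`), the function `one_vs_rest_predict`, and the example scores are illustrative assumptions, not code or numbers taken from this repo.

```python
from typing import Dict

# Assumed label set for the fine-grained task (hypothetical names).
LABELS = ("HATE", "OFFN", "PRFN")


def one_vs_rest_predict(scores: Dict[str, float]) -> str:
    """Return the label whose binary "this class vs. the rest" score is highest."""
    missing = [label for label in LABELS if label not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return max(LABELS, key=lambda label: scores[label])


# Made-up scores standing in for the outputs of three binary classifiers.
example_scores = {"HATE": 0.21, "OFFN": 0.64, "PRFN": 0.48}
prediction = one_vs_rest_predict(example_scores)
print(prediction)
```

In a full pipeline, each score would presumably come from a separately fine-tuned binary model; the sketch leaves that part out and only shows how the per-class outputs can be combined into a single label.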

Working in Colab:

  1. Create a new notebook
  2. Do something
  3. Save
  4. File > Save a copy in GitHub
  5. Prepend "src/" to the file name, e.g. "src/test.ipynb"

Please cite this paper if you use this method or code:

@article{glazkova2021fine,
  title={Fine-tuning of Pre-trained Transformers for Hate, Offensive, and Profane Content Detection in English and Marathi},
  author={Glazkova, Anna and Kadantsev, Michael and Glazkov, Maksim},
  journal={arXiv preprint arXiv:2110.12687},
  year={2021}
}
