This repository accompanies the paper "Hate and Offensive Speech Detection on Arabic Social Media". It contains a hate speech dataset of 5,360 annotated Arabic tweets.
The dataset consists of two CSV files: a training set (train.csv) and a test set (test.csv). They contain the tweet IDs and the annotations described in our paper:
- Tweet ID (column: ID)
- Binary classification task (column: 2-Class): tweets are classified as Clean (C) vs. Offensive/Hate (OH)
- 3-way classification task (column: 3-Class): tweets are classified as Clean (C) vs. Offensive (O) vs. Hate (H)
- 6-way classification task (column: 6-Class): tweets are classified as Clean (C) vs. Offensive (O) vs. GenderHate (GH) vs. ReligiousHate (RH) vs. NationalityHate (NH) vs. EthnicityHate (EH)
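
As a quick way to inspect the annotations, the sketch below counts the labels in each task column using only the Python standard library. It assumes the CSV header uses exactly the column names listed above (ID, 2-Class, 3-Class, 6-Class) and that cells store the short label codes (C, OH, O, H, GH, RH, NH, EH). The helper name `label_counts` and the sample rows are made up for illustration and are not part of the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in this README (assumed to match the CSV header exactly).
LABEL_COLUMNS = ("2-Class", "3-Class", "6-Class")


def label_counts(csv_file):
    """Return a Counter of labels for each task column of an open CSV file."""
    counts = {col: Counter() for col in LABEL_COLUMNS}
    for row in csv.DictReader(csv_file):
        for col in LABEL_COLUMNS:
            counts[col][row[col].strip()] += 1
    return counts


# Made-up rows for illustration only; these are not real dataset entries.
SAMPLE = """ID,2-Class,3-Class,6-Class
1,C,C,C
2,OH,O,O
3,OH,H,RH
4,OH,H,GH
"""

sample_counts = label_counts(io.StringIO(SAMPLE))
for column in LABEL_COLUMNS:
    print(column, dict(sample_counts[column]))
```

To run the same count on the actual files, open them with `open("train.csv", newline="", encoding="utf-8")` and pass the file object to `label_counts`. If the header or label codes differ from the assumptions above, adjust `LABEL_COLUMNS` accordingly.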