hate_speech_adversarial_debiasing

Code for "Separating Hate Speech and Offensive Language Classes via Adversarial Debiasing".
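This README does not describe the training objective. The example here is only an illustration. One common way to implement adversarial debiasing is a gradient reversal layer. A shared encoder feeds two heads: a task head and an adversary head. The adversary learns to predict an attribute that the representation should not encode. The encoder receives the adversary's gradient with its sign flipped and scaled by λ.

The sketch uses a hypothetical one-dimensional toy model with hand-written logistic-loss gradients. `ToyParams`, `grl_step`, `lam` and the attribute label `z` are illustrative names. The code is not taken from adversarial_debiasing.py, and the paper's formulation may differ.

```python
import math
from dataclasses import dataclass


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


@dataclass
class ToyParams:
    w: float  # shared encoder weight (hypothetical scalar "encoder")
    a: float  # task head weight, e.g. hate speech vs. offensive language
    b: float  # adversary head weight, predicts the attribute to be removed


def grl_step(p: ToyParams, x: float, y: float, z: float,
             lam: float = 1.0, lr: float = 0.1) -> ToyParams:
    """One SGD step of gradient-reversal training on a single example.

    y is the task label and z the (hypothetical) attribute label the
    adversary tries to predict; both are 0.0 or 1.0.
    """
    h = p.w * x                      # shared representation
    p_task = sigmoid(p.a * h)        # task head probability
    p_adv = sigmoid(p.b * h)         # adversary probability

    # For binary cross-entropy, d(loss)/d(logit) = probability - label.
    g_task = p_task - y
    g_adv = p_adv - z

    # Each head minimises its own loss.
    grad_a = g_task * h
    grad_b = g_adv * h

    # Gradients flowing back into the shared representation.
    grad_h_task = g_task * p.a
    grad_h_adv = g_adv * p.b

    # Gradient reversal: the encoder follows the task gradient but the
    # *negated* adversary gradient, so it learns to hurt the adversary.
    grad_w = (grad_h_task - lam * grad_h_adv) * x

    return ToyParams(w=p.w - lr * grad_w,
                     a=p.a - lr * grad_a,
                     b=p.b - lr * grad_b)
```

Setting `lam=0` recovers ordinary training with an independent adversary. Larger values push the encoder harder to remove information the adversary could exploit.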

Dataset

How to cite

Please cite our work as follows:

@inproceedings{yuan-etal-2022-separating,
    title = "Separating Hate Speech and Offensive Language Classes via Adversarial Debiasing",
    author = {Yuan, Shuzhou  and
      Maronikolakis, Antonis  and
      Sch{\"u}tze, Hinrich},
    editor = "Narang, Kanika  and
      Mostafazadeh Davani, Aida  and
      Mathias, Lambert  and
      Vidgen, Bertie  and
      Talat, Zeerak",
    booktitle = "Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)",
    month = jul,
    year = "2022",
    address = "Seattle, Washington (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.woah-1.1",
    doi = "10.18653/v1/2022.woah-1.1",
    pages = "1--10",
    abstract = "Research to tackle hate speech plaguing online media has made strides in providing solutions, analyzing bias and curating data. A challenging problem is ambiguity between hate speech and offensive language, causing low performance both overall and specifically for the hate speech class. It can be argued that misclassifying actual hate speech content as merely offensive can lead to further harm against targeted groups. In our work, we mitigate this potentially harmful phenomenon by proposing an adversarial debiasing method to separate the two classes. We show that our method works for English, Arabic, German and Hindi, plus in a multilingual setting, improving performance over baselines.",
}
