GitHub - franciellevargas/SSA: SSA is a post-hoc explanation method by stereotypes and counter-stereotypes to assess social bias in hate speech classifiers

Social Stereotype Bias Analysis in Hate Speech Classifiers

SSA (Social Stereotype Bias Analysis) is a post-hoc explanation method that uses stereotypes and counter-stereotypes to assess social bias in hate speech classifiers. SSA evaluates whether hate speech classifiers reflect social stereotypes by contrasting stereotypical beliefs with counter-stereotypes. We empirically measure the distribution of stereotypical beliefs in hate speech classifiers by analyzing how differently they classify tuples containing stereotypes versus counter-stereotypes. Experimental results show that hate speech classifiers attribute unreal or negligent offensiveness to social group identifiers (e.g., women, gay) by reflecting and reinforcing stereotypical beliefs regarding minorities.
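The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `divergence_rate`, the stand-in `toy_classifier`, and the placeholder pairs are all hypothetical, and the sketch assumes only that a classifier maps a text to a label (1 = offensive, 0 = not offensive). It reports the share of stereotype/counter-stereotype pairs that receive different labels.

```python
from typing import Callable, Iterable, Tuple


def divergence_rate(
    classify: Callable[[str], int],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of (stereotype, counter-stereotype) pairs
    whose two members receive different labels from `classify`."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    differing = sum(1 for stereo, counter in pairs if classify(stereo) != classify(counter))
    return differing / len(pairs)


def toy_classifier(text: str) -> int:
    # Hypothetical stand-in for a real hate speech classifier:
    # labels a text as offensive (1) when it contains a marker token.
    return int("NEGATIVE_TRAIT" in text)


# Placeholder pairs (assumed for illustration; not taken from the repository's tuples).
EXAMPLE_PAIRS = [
    ("GROUP_A is NEGATIVE_TRAIT", "GROUP_A is POSITIVE_TRAIT"),
    ("GROUP_B is NEUTRAL_TRAIT", "GROUP_B is POSITIVE_TRAIT"),
]

if __name__ == "__main__":
    rate = divergence_rate(toy_classifier, EXAMPLE_PAIRS)
    print(f"Pairs classified differently: {rate:.2f}")
```

In practice, `toy_classifier` would be replaced by a trained model's prediction function, and the pairs would come from a curated set of stereotype and counter-stereotype tuples.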

CITING

Vargas, F., Carvalho, I., Hürriyetoğlu, A., Pardo, T.A.S., Benevenuto, F. (2023). Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes? In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023), pp. 1187-1196. Varna, Bulgaria. Association for Computational Linguistics (ACL).

BIBTEX

@inproceedings{vargas-etal-2023-socially,
    title = "Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?",
    author = {Vargas, Francielle and Carvalho, Isabelle and H{\"u}rriyeto{\u{g}}lu, Ali and Pardo, Thiago and Benevenuto, Fabr{\'\i}cio},
    editor = "Mitkov, Ruslan and Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.126",
    pages = "1187--1196",
}
