GitHub - franciellevargas/MOL: The Multilingual Offensive Lexicon (MOL) is the first contextual lexicon for abusive language detection. It is composed of 1,000 explicit and implicit terms and expressions with pejorative connotations, annotated with contextual information.

MOL - Multilingual Offensive Lexicon Annotated with Contextual Information

MOL is the first specialized lexicon for abusive language detection annotated with contextual information. It comprises 1,000 explicit and implicit terms and expressions with pejorative connotations, which were manually identified by a linguist and annotated with contextual information labels by three different annotators. For example, "stupid" is a context-independent offensive term because it is mostly found in pejorative contexts of use. In contrast, "useless" and "worm" are context-dependent offensive terms because they occur in both contexts of use: (i) with a non-pejorative connotation, as in "this smartphone is useless" or "the fisherman uses worms for bait", and (ii) with a pejorative connotation, as in "this last President was useless" or "this human being is such a worm".
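The distinction between the two kinds of terms can be illustrated with a minimal sketch. It assumes a toy in-memory lexicon built from the three example terms above, with labels following the MOL convention (1 for context-independent, 0 for context-dependent). The names `TOY_LEXICON` and `find_offensive_terms` are hypothetical and are not part of the repository.

```python
import re

# Toy entries taken from the examples in the text; labels follow the
# MOL convention (1 = context-independent, 0 = context-dependent).
# This dictionary is an illustration, not the real lexicon file.
TOY_LEXICON = {"stupid": 1, "useless": 0, "worm": 0}


def find_offensive_terms(text, lexicon=TOY_LEXICON):
    """Return (context_independent, context_dependent) lexicon hits in text."""
    # Lowercase and split into word tokens; no lemmatization is applied.
    tokens = re.findall(r"\w+", text.lower())
    independent = [t for t in tokens if lexicon.get(t) == 1]
    dependent = [t for t in tokens if lexicon.get(t) == 0]
    return independent, dependent


print(find_offensive_terms("This smartphone is useless"))
print(find_offensive_terms("That was a stupid idea"))
```

A hit on a context-dependent term such as "useless" only signals that the surrounding context needs to be checked, whereas a hit on a context-independent term is a stronger cue by itself. Because the sketch matches exact tokens, inflected forms such as "worms" are not matched.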

MOL was extracted manually by a linguist from the HateBR dataset, and each term or expression was annotated by three different annotators, obtaining a high inter-annotator agreement score (73% Kappa). The lexicon was originally written in Portuguese and manually translated by native speakers into English, Spanish, French, German, and Turkish, so it is available in six languages.
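For readers unfamiliar with agreement scores, the sketch below computes Fleiss' kappa, a common agreement measure for three or more annotators. This is an assumption made for illustration: the text above does not state which Kappa variant was used, and the ratings in the example are invented toy data, not MOL annotations.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that are each rated by the same number of annotators."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        observed += agreeing / (n_raters * (n_raters - 1))
    observed /= n_items
    # Agreement expected by chance, from the overall label proportions.
    expected = sum(
        (total / (n_items * n_raters)) ** 2 for total in category_totals.values()
    )
    return (observed - expected) / (1 - expected)


# Invented toy data: four terms, three annotators, binary labels.
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
print(round(fleiss_kappa(toy_ratings), 3))
```

The function is undefined when every annotator assigns the same single label to every item, because the expected agreement is then 1 and the denominator becomes zero.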

The tables below describe the MOL statistics.

Contextual Information

class	label	total
Context-independent offensive	1	612
Context-dependent offensive	0	387
Total		1,000

Hate Targets

class	total
no-hate	864
partyism	69
sexism	35
homophobia	16
fatphobia	9
religious intolerance	9
antisemitism	1
apology for the dictatorship	5
racism	4
antisemitism	3
total	1,000
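Per-label counts like those reported for the contextual information can be tallied with a short script. The sketch below assumes a hypothetical tab-separated layout with `term` and `label` columns; the actual file names and column names under `data/` are not described here, so adjust them to the real files.

```python
import csv
import io
from collections import Counter

# Hypothetical file layout: one term per row with its contextual label.
# The rows reuse the example terms from the text, not real lexicon data.
SAMPLE_TSV = "term\tlabel\nstupid\t1\nuseless\t0\nworm\t0\n"


def count_labels(handle):
    """Count how many lexicon entries carry each contextual label."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(int(row["label"]) for row in reader)


counts = count_labels(io.StringIO(SAMPLE_TSV))
print("context-independent:", counts[1])
print("context-dependent:", counts[0])
```

To run this against a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open(path, encoding="utf-8", newline="")`, where `path` points to the lexicon file.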

CITING

Vargas, F., Carvalho, I., Pardo, T.A.S., Benevenuto, F. (2024). Context-Aware and Expert Data Resources for Brazilian Portuguese Hate Speech Detection. Natural Language Processing Journal - Cambridge Core. pp.1-21.

Vargas, F., Góes, F., Carvalho, I., Benevenuto, F., Pardo, T.A.S. (2021). Contextual-Lexicon Approach for Abusive Language Detection. Proceedings of the 13th International Conference on Recent Advances in Natural Language Processing (RANLP 2021). pp.1438–1447. Held Online.

BIBTEX

@inproceedings{vargas-etal-2021-contextual,
    title = "Contextual-Lexicon Approach for Abusive Language Detection",
    author = "Vargas, Francielle and Rodrigues de G{\'o}es, Fabiana and Carvalho, Isabelle and Benevenuto, Fabr{\'\i}cio and Pardo, Thiago",
    editor = "Mitkov, Ruslan and Angelova, Galia",
    booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)",
    year = "2021",
    address = "Held Online",
    publisher = "INCOMA Ltd.",
    url = "https://aclanthology.org/2021.ranlp-1.161",
    pages = "1438--1447",
}

