This repository contains the supplementary data to the paper "Euphemistic Abuse -- A New Dataset and Classification Experiments for Implicitly Abusive Language".
The repository comprises the following data:
This file contains the set of 1797 sentences along with their manual annotation. Each row corresponds to one sentence and each column provides some information regarding that sentence. The first row of the file indicates the semantics of each column (the labels are self-explanatory). In addition to the information whether a sentence was judged abusive (i.e. "ABUSIVE") or not (i.e. "OTHER") -- this corresponds to the judgement of the majority of the 5 crowdworkers -- the file also includes the manual annotation of the different features of the feature-based classifier. The file also contains the fragment and the cue phrase of the respective sentence.
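For illustration, the following minimal Python sketch shows one way to load this file. The file name ("annotation.tsv"), the tab separator and the column names ("label", "fragment", "cue_phrase") are assumptions and have to be adjusted to the actual header given in the first row of the file.

```python
# Minimal sketch for loading the annotation file.
# File name, separator and column names are assumptions; consult the
# header row of the actual file for the correct column labels.
import pandas as pd

df = pd.read_csv("annotation.tsv", sep="\t")

# Each row is one sentence; the gold label is the majority judgement
# of the 5 crowdworkers ("ABUSIVE" vs. "OTHER").
print(df["label"].value_counts())

# Fragment and cue phrase of the first sentence.
print(df.loc[0, ["fragment", "cue_phrase"]])
```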
This file contains, for each sentence, the set of completions generated by GPT-3 that were used for the classifiers "GPT-3::inclusive" and "GPT-3::exclusive" in our paper. Each line represents one individual completion of the respective original sentence; the original sentence is given in the first column and the completion in the last column. For some sentences, there are fewer than 100 completions although we queried 100 prompts per sentence. This is because some of these prompts returned an empty string; completions corresponding to empty strings were omitted from the file.
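A minimal sketch for reading the completion file and grouping the completions by their original sentence is given below; the file name ("gpt3_completions.tsv") and the tab separator are assumptions.

```python
# Minimal sketch for grouping the GPT-3 completions by original sentence.
# File name and separator are assumptions; the original sentence is taken
# from the first column and the completion from the last column.
import csv
from collections import defaultdict

completions = defaultdict(list)
with open("gpt3_completions.tsv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        sentence, completion = row[0], row[-1]
        completions[sentence].append(completion)

# Fewer than 100 completions per sentence are possible since empty
# completions were omitted from the file.
for sentence, comps in list(completions.items())[:3]:
    print(len(comps), sentence)
```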
This file contains the negative instances which were automatically generated with the help of GPT-3. These data are not part of our proposed gold standard (in which the negative data were produced manually via crowdsourcing)! These instances were used in Section 5.2 as an alternative method to produce negative data. (They correspond to the method "automatic" in Table 4.) For fragments that also indicate how the sentence is to be concluded, the sentence generated by GPT-3 sometimes does not match the end of the given fragment. This is because GPT-3 may have generated more than one sentence (with the second sentence containing the concluding part of the fragment). In those cases, we only included the first sentence of the completion.
This directory contains the folds we used for 5-fold cross-validation. The folds are created in such a way that all instances of euphemistic abuse belonging to one particular cue phrase are in the same fold. Thus, in 5-fold cross-validation, the test fold always includes euphemistic abuse of unseen cue phrases, i.e. cue phrases whose euphemistic counterparts were not observed in the training data. This is the strictest and most realistic evaluation setting.
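The precomputed folds in this directory should be used to reproduce our splits. Purely to illustrate the underlying principle, the sketch below shows how cue-phrase-disjoint folds can be derived with scikit-learn's GroupKFold; the annotation file name and column names are assumptions (see above), and GroupKFold is not guaranteed to reproduce our exact folds.

```python
# Illustration of the fold construction principle only; use the folds
# shipped in this directory to reproduce the splits from the paper.
import pandas as pd
from sklearn.model_selection import GroupKFold

df = pd.read_csv("annotation.tsv", sep="\t")   # assumed file/column names
cue_phrases = df["cue_phrase"]

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(df, df["label"], groups=cue_phrases):
    # All instances sharing a cue phrase end up in the same fold, so the
    # cue phrases of the test fold are unseen during training.
    assert not set(cue_phrases.iloc[train_idx]) & set(cue_phrases.iloc[test_idx])
```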
This directory includes all annotation guidelines. It is split into 2 subdirectories. The subdirectory "Crowdsourcing" contains all guidelines related to the crowdsourcing experiments, while the subdirectory "Experts" contains all guidelines related to the expert annotation (i.e. the annotation guidelines referring to the features of the feature-based classifier). Please note that for the crowdsourcing experiments we used the term "polite insults" when referring to euphemistic abuse. We used that term since we felt it was more intuitive (though less linguistically adequate) for laypeople.
A data sheet summarizing important information about the data contained in this repository.
This data set is published under Creative Commons Attribution 4.0.
The research was partially supported by the Austrian Science Fund (FWF): P 35467-G.
Please direct any questions about these data to Michael Wiegand at Alpen-Adria-Universitaet Klagenfurt.
Michael Wiegand email: michael.wiegand@aau.at
M. Wiegand, J. Kampfmeier, E. Eder, J. Ruppenhofer: "Euphemistic Abuse -- A New Dataset and Classification Experiments for Implicitly Abusive Language", in EMNLP, 2023.