This code accompanies the paper "Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data"

jagol/xnli4xhsd

Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data

This code accompanies the paper: Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data.

Setup

Create a file named paths.json in the repository's root directory with the following content:

{
  "data_dir": "path/to/the/data/directory",
  "output_dir": "path/to/the/output/directory",
  "configs_dir": "path/to/the/configs/directory"
}

Checkpoints, logs and results will be written to the output directory.

As an example:

  • data_dir could be /home/user/projects/xnli-for-hate-speech-detection/data/ (referenced as <path-to-data-dir> in the following sections)
  • output_dir could be /home/user/projects/xnli-for-hate-speech-detection/output/ (referenced as <path-to-output-dir> in the following sections)
  • configs_dir could be configs/ (referenced as <path-to-configs-dir> in the following sections)
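
As an illustration, paths.json can also be written and checked from Python. This is a minimal sketch, not part of the repository: the helper names write_paths and load_paths are hypothetical, and only the three keys shown above are assumed to be required.

```python
import json
from pathlib import Path

# The three keys listed in this README; treating them as required is an assumption.
REQUIRED_KEYS = ("data_dir", "output_dir", "configs_dir")


def write_paths(repo_root, data_dir, output_dir, configs_dir):
    """Write paths.json into repo_root and return the path to the file."""
    paths = {
        "data_dir": str(data_dir),
        "output_dir": str(output_dir),
        "configs_dir": str(configs_dir),
    }
    target = Path(repo_root) / "paths.json"
    target.write_text(json.dumps(paths, indent=2) + "\n", encoding="utf-8")
    return target


def load_paths(repo_root):
    """Load paths.json and fail early if one of the expected keys is missing."""
    with open(Path(repo_root) / "paths.json", encoding="utf-8") as f:
        paths = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in paths]
    if missing:
        raise KeyError(f"paths.json is missing keys: {missing}")
    return paths
```

Calling load_paths(".") from the repository root before starting a long run is a cheap way to catch a misspelled key.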

Download all datasets:

bash scripts/download_data.sh

Create a Python virtual environment and install the required packages:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Preprocess the datasets:

bash scripts/preprocessing.sh

Run Experiments

To reproduce the results from Röttger et al. (2022), Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages:

bash scripts/training_runs_repro.sh
bash scripts/run_eval_repro.sh

The baselines M and X are evaluated separately:

bash scripts/run_eval_M.sh
bash scripts/run_eval_X.sh

Fine-tune monolingual and multilingual models that have already been trained on (X)NLI:

bash scripts/training_runs_NLI.sh

Evaluate monolingual models trained on NLI:

bash scripts/run_eval_M_NLI.sh

Evaluate XLM-T models trained on NLI:

bash scripts/run_eval_X_NLI_baseline.sh
bash scripts/run_eval_X_NLI_strategies.sh

Evaluate models based on XLM-T and trained on XNLI:

bash scripts/run_eval_X_XNLI_baseline.sh
bash scripts/run_eval_X_XNLI_strategies.sh
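
The (X)NLI training and evaluation steps above could be chained with a small driver. This is a sketch only: the script names come from this README, but run_pipeline is a hypothetical helper, and it assumes the scripts are invoked from the repository root and that running them in the listed order is acceptable.

```python
import subprocess

# Script names are taken from this README; the list and the helper are hypothetical.
NLI_PIPELINE = [
    "scripts/training_runs_NLI.sh",
    "scripts/run_eval_M_NLI.sh",
    "scripts/run_eval_X_NLI_baseline.sh",
    "scripts/run_eval_X_NLI_strategies.sh",
    "scripts/run_eval_X_XNLI_baseline.sh",
    "scripts/run_eval_X_XNLI_strategies.sh",
]


def run_pipeline(scripts, dry_run=False):
    """Run each script with bash, in order, stopping at the first failure.

    With dry_run=True, nothing is executed and the commands are only returned.
    """
    commands = [["bash", script] for script in scripts]
    if dry_run:
        return commands
    for command in commands:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, check=True)
    return commands
```

Calling run_pipeline(NLI_PIPELINE, dry_run=True) first shows the exact commands without starting any training.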

Collect the results and write them to a single CSV file:

python3 src/parse_results_into_csv.py -i <path-to-output-dir> -o <path-to-csv-file>

To generate the plots, execute the notebook notebooks/plot_results.ipynb.

Training runs are based on configs. To regenerate the configs, execute the notebook notebooks/generate_configs.ipynb.
