HateSpeechDetection

Dataset overview:

OffensEval dataset (OLID):

Subtasks:

  • a: Offensive, Not offensive
  • b: Targeted, Untargeted
  • c: Individual, Group, Other

HatEval dataset:

Subtasks:

  • a: Hate, Not Hate
  • b: Targeted, Untargeted
  • c: Aggressive, Not aggressive
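
For reference, a minimal sketch of the subtask label sets listed above as a Python mapping; the exact label strings used in the released data files may be abbreviated differently.

```python
# Subtask label sets as described in this README; the raw data files may use
# different string encodings for the same labels.
SUBTASKS = {
    "OLID": {
        "a": ["Offensive", "Not offensive"],
        "b": ["Targeted", "Untargeted"],
        "c": ["Individual", "Group", "Other"],
    },
    "HatEval": {
        "a": ["Hate", "Not Hate"],
        "b": ["Targeted", "Untargeted"],
        "c": ["Aggressive", "Not aggressive"],
    },
}
```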

Running the scripts:

Baseline models

BERT-based baseline models

  1. The notebooks in the following directory contain the code to run the BERT and RoBERTa baselines on subtasks a, b, and c: https://github.com/mrinaltak/HateSpeechDetection/blob/main/Baselines/
  2. The weights for the BERT baseline models for HatEval are available here:
     • HatEval subtask a: https://drive.google.com/drive/folders/1sUowk-TGXHEprnPHGphkZWTPVnU1x4Lv?usp=sharing
     • HatEval subtask b: https://drive.google.com/drive/folders/1KHpwLVzt7XWi7ieDyDG-ZPtnDaizVuBr?usp=sharing
     • HatEval subtask c: https://drive.google.com/drive/folders/1KtjWE1p__FxXRvDReeXEkrJ1kcSQgoKN?usp=sharing
  3. The weights for the BERT baseline model for OffensEval (subtask a) are available here: https://drive.google.com/drive/folders/1vDNGJRMTA7F2eykCgyPdNgXSinFfhYni?usp=sharing (a minimal inference sketch follows this list).
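
The sketch below shows one way to run a downloaded BERT baseline checkpoint for inference, assuming the weights were saved with Hugging Face's save_pretrained. The local directory name and the label index mapping are placeholders, not taken from the notebooks.

```python
# Hedged sketch: loading a downloaded BERT baseline checkpoint for subtask a.
# "bert_hateval_a" is a placeholder for the unpacked Drive folder, and the
# 0/1 label mapping is an assumption; check the corresponding notebook.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_dir = "bert_hateval_a"  # hypothetical local path to the downloaded weights
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForSequenceClassification.from_pretrained(model_dir)
model.eval()

text = "example tweet to classify"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # e.g. 0 = Not Hate, 1 = Hate (assumed mapping)
```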

T5-small with discrete prompts

  1. Notebook to run the T5 model on HatEval: https://github.com/mrinaltak/HateSpeechDetection/blob/main/T5-discrete/HatEval_t5-discrete.ipynb
  2. Notebook to run the T5 model on OffensEval: https://github.com/mrinaltak/HateSpeechDetection/blob/main/T5-discrete/OffensEval.ipynb
  3. The weights for T5-small with discrete prompts on HatEval are available here: https://drive.google.com/drive/folders/1NANT7Nwh31lIDx7pge0s2uMKn_udt6FM?usp=sharing
  4. The weights for T5-small with discrete prompts on OffensEval are available here: https://drive.google.com/drive/folders/16we4QB28e_gM11TsbhF6253VZ-11miLH?usp=sharing (a usage sketch follows this list).
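
The discrete-prompt setup frames classification as text-to-text generation: a hand-written prompt wraps the tweet and the model generates a short label word. The template and the "yes"/"no" verbalizer below are illustrative assumptions, not necessarily the exact ones used in the notebooks.

```python
# Hedged sketch of T5-small classification with a discrete (hand-written) prompt.
# The prompt template and the expected answer words are assumptions.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "t5-small"  # or the path to a fine-tuned checkpoint from the Drive links
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

tweet = "example tweet to classify"
prompt = f"Is the following tweet hateful? tweet: {tweet}"  # assumed template
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=3)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # e.g. "yes" / "no"
```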

BERT NLI with discrete hypotheses

  1. Notebook to run the BERT NLI model on the HatEval dataset: https://github.com/mrinaltak/HateSpeechDetection/blob/main/Sentence%20Entailment/Sentence_Entailment_Hateval.ipynb
  2. Notebook to run the BERT NLI model on the OffensEval dataset: https://github.com/mrinaltak/HateSpeechDetection/blob/main/Sentence%20Entailment/Sentence_Entailment_Offenseval.ipynb
  3. The weights for the best models on the HatEval and OffensEval datasets can be found in the WeightsHateval and WeightsOffenseval folders here: https://drive.google.com/drive/folders/1TTl7-7GWVQ9fy4YTDSdB5_NF14xBAi3u?usp=sharing. Note that the variable "base_path" must be changed in both notebooks to match the file system on which the notebook is run (an inference sketch follows this list).
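
In the entailment formulation, each tweet is paired with a hand-written hypothesis that verbalizes the label, and the NLI classifier's entailment score decides the prediction. The hypothesis wording, checkpoint path, and entailment label index below are assumptions; check the notebooks for the exact label mapping.

```python
# Hedged sketch of the sentence-entailment formulation: the tweet is the premise,
# a hand-written hypothesis verbalizes the label, and the NLI head scores entailment.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_dir = "WeightsHateval"  # hypothetical path to the unpacked Drive folder
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

premise = "example tweet to classify"
hypothesis = "This text contains hate speech."  # assumed discrete hypothesis
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs[0, 0].item())  # entailment index assumed; check the notebook's label map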

Continuous prompts

  • Install the required packages: pip install -r requirements.txt
  • Install transformers from source: pip install git+https://github.com/huggingface/transformers (decoding from input embeddings was not available in the latest transformers release and was only recently implemented in the official repository; it is needed for continuous prompts on T5, as in the sketch after this list).
  • For training:
    • For OffensEval, use continuous/continuous_prompt.py. Sample launch commands are provided in continuous/launch.sh.
    • For HatEval, use continuous/continuous_prompt_hateval.py. Sample launch commands are provided in continuous/launch.sh.
  • For evaluation, run the respective script with the --model_path argument set to the path of the model checkpoint and --epochs set to 0; this runs the test loop directly.
  • For cross-evaluation, run the HatEval script with the OffensEval checkpoint and --n_tasks set to 4.
  • Model checkpoints can be found here
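
A minimal sketch of what "continuous prompts on T5" means here: a small block of trainable embedding vectors is prepended to the token embeddings, and the model is run from inputs_embeds, which is the feature that requires installing transformers from source. The prompt length, initialization, and decoding call are illustrative assumptions; the actual training logic lives in the continuous/ scripts.

```python
# Hedged sketch of a continuous (soft) prompt on T5: trainable embeddings are
# prepended to the token embeddings and the model decodes from inputs_embeds.
# Prompt length and initialization are assumptions, not taken from the scripts.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

n_prompt_tokens = 20  # assumed soft-prompt length
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(n_prompt_tokens, model.config.d_model))

tweet = "example tweet to classify"
enc = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=128)
token_embeds = model.get_input_embeddings()(enc["input_ids"])               # (1, seq, d_model)
inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)  # prepend prompt
attention_mask = torch.cat(
    [torch.ones(1, n_prompt_tokens, dtype=torch.long), enc["attention_mask"]], dim=1
)

# During training only soft_prompt would receive gradients; here we just decode.
with torch.no_grad():
    out = model.generate(inputs_embeds=inputs_embeds, attention_mask=attention_mask, max_new_tokens=3)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```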
