Submission for the Moj Abusive Comment Detection Challenge, hosted on Kaggle.
- Download the dataset and extract it into the root directory.
- Create and activate a virtual environment, then install the dependencies:
python -m venv env # create the virtual environment
source env/bin/activate
pip install -r requirements.txt
- Change index in main.py to choose which model of the ensemble to train/test.
- Run
python main.py
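
As a rough illustration of the index step above, the following minimal sketch shows one way an integer index could pick a single ensemble member. The names ENSEMBLE and select_model, and the placeholder model names, are assumptions made for this example and are not taken from main.py.

```python
# Hypothetical sketch: the model names and the helper below are illustrative,
# not taken from the repository's main.py.
ENSEMBLE = ["model_a", "model_b", "model_c"]  # placeholder ensemble members

index = 0  # edit this value to pick which ensemble member to train/test


def select_model(models, idx):
    """Return the ensemble member at position idx, failing loudly if out of range."""
    if not 0 <= idx < len(models):
        raise IndexError(f"index must be in [0, {len(models) - 1}], got {idx}")
    return models[idx]


if __name__ == "__main__":
    print(f"Selected model: {select_model(ENSEMBLE, index)}")
```

Validating the index up front avoids Python's negative-index wraparound, where -1 would silently select the last model.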
utils.py contains helper functions for caching and ensembling.
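
To give a feel for what caching and ensembling helpers can look like, here is a minimal sketch using only the standard library. The functions cached and average_ensemble are hypothetical and do not reflect the actual contents of utils.py; simple probability averaging (soft voting) is assumed as the ensembling method.

```python
import os
import pickle


def cached(path, compute):
    """Load a pickled result from path if it exists; otherwise compute and store it.

    Hypothetical helper, not the repository's implementation.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


def average_ensemble(per_model_probs):
    """Average per-comment probabilities across models (assumed soft voting).

    per_model_probs holds one list of probabilities per model, all the same length.
    """
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]
```

With helpers like these, each model's predictions could be computed once, stored on disk, and later combined without rerunning inference.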
Note: BERT models are large, so a GPU with 16GB of VRAM is required with the default settings. The batch size can be reduced to train on an 8GB GPU.