The Hateful Memes Challenge by Meta (2020): docs
Image is a compilation of assets, including ©Getty Image.
- OS: Ubuntu 24.04.1 LTS
- GPU: RTX 4060 (8GB) x1
- RAM: 32GB
| Model | config | AUC | Accuracy | F1 |
|---|---|---|---|---|
| CLIP | mlp | 0.826 | 0.754 | 0.658 |
| CLIP + Cross Attention | ca | 0.825 | 0.758 | 0.659 |
| CLIP + TRM | trm | 0.819 | 0.727 | 0.676 |
Because the ground-truth labels of the original test set are not accessible, the original validation set was repurposed as the test set, and the original training set was split 8:2 into new training and validation sets (data).
Check the-results for more details.
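
The 8:2 split described above can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing code: the function name `stratified_split`, the seed, the per-label stratification, and the toy records with `id` and `label` fields are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(records, val_ratio=0.2, seed=42):
    """Split records into (train, val), keeping the label ratio similar in both.

    Stratifying by label is an assumption of this sketch; the source only
    states that the training set was split at an 8:2 ratio.
    """
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_val = round(len(group) * val_ratio)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy records standing in for the original training set (hypothetical data).
records = [{"id": i, "label": i % 2} for i in range(100)]
train, val = stratified_split(records)
print(len(train), len(val))  # 80 20
```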
| Model | AUC | Accuracy |
|---|---|---|
| ViLBERT CC | 0.708 | 0.704 |
| Visual BERT COCO | 0.737 | 0.708 |
| VL-BERT (#1) | - | - |
| VILIO (#2) | 0.816 | - |
| VisualBERT (#3) | 0.752 | 0.710 |
| UNITER (#4) | 0.791 | - |
Since the original validation set was used as the test set for evaluation, the performance of the comparison models is also reported using their validation-set scores.
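
For reference, the AUC, Accuracy, and F1 columns in the tables above can be computed from binary labels and predicted probabilities as in the sketch below. It is a plain-Python illustration under assumptions: the function names are hypothetical, the 0.5 decision threshold is assumed rather than taken from the repository, and the pairwise AUC is only practical for small inputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not results from this project).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(roc_auc(y_true, scores))     # 0.75
print(accuracy(y_true, y_pred))    # 0.75
print(f1_score(y_true, y_pred))
```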
All images should be placed under the data/img/ directory (e.g. data/img/01329.png).
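
Before preprocessing, it can help to confirm that the expected images are actually in place. The sketch below is a hypothetical helper, not part of the repository: `find_missing_images` and the file name `99999.png` are made up for illustration, and the demo runs in a temporary directory that mimics the `data/img/` layout.

```python
import tempfile
from pathlib import Path


def find_missing_images(img_dir, filenames):
    """Return the filenames that are not present as regular files in img_dir."""
    root = Path(img_dir)
    return [name for name in filenames if not (root / name).is_file()]


# Demo on a throwaway directory so the snippet runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    img_dir = Path(tmp) / "data" / "img"
    img_dir.mkdir(parents=True)
    (img_dir / "01329.png").touch()  # stands in for a real image
    missing = find_missing_images(img_dir, ["01329.png", "99999.png"])

print(missing)  # ['99999.png']
```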
Generate the .env file based on .env.example.
pip install -r requirements.txt
python utils/load_clip.py
python utils/preprocess.py
python train.py --config-name ca
python train_trm.py --config-name trm
Check trm_pseudo for more details.
- Korean: Tech Blog
- English: Academic Format


