This repository contains the models and code for UTDRM, an unsupervised method for training debunked-narrative retrieval models. The approach generates topical claims with T5 and ChatGPT claim generators and uses them to train a dense retrieval model with multiple negatives ranking (MNR) loss.
Generate Claims:
- Use the T5 and ChatGPT claim generators in the `generators/` directory to generate claims.
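The generators' output format is not described here. As a hedged sketch of how generated claims could feed the next step, assume each claim is kept together with the ID and text of the source it was generated from. The snippet treats every claim as a query whose single positive passage is that source, and lays the result out as a BEIR-style dataset (`corpus.jsonl`, `queries.jsonl`, `qrels/train.tsv`). The function names `build_beir_records` and `write_beir_dataset` and the `q{i}` query-ID scheme are hypothetical, not part of this repository.

```python
import csv
import json
from pathlib import Path


def build_beir_records(pairs):
    """Turn (claim, source_id, source_text) triples into BEIR-style records.

    Hypothetical pairing: each generated claim is a query and the text it was
    generated from is its only relevant (positive) passage.
    """
    corpus = {}   # source_id -> corpus entry, de-duplicated, insertion-ordered
    queries = []  # query entries in generation order
    qrels = []    # (query-id, corpus-id, score) rows
    for i, (claim, source_id, source_text) in enumerate(pairs):
        corpus.setdefault(source_id, {"_id": source_id, "title": "", "text": source_text})
        qid = f"q{i}"  # hypothetical query-ID scheme
        queries.append({"_id": qid, "text": claim})
        qrels.append((qid, source_id, 1))
    return list(corpus.values()), queries, qrels


def write_beir_dataset(pairs, out_dir):
    """Write corpus.jsonl, queries.jsonl and qrels/train.tsv under out_dir."""
    corpus, queries, qrels = build_beir_records(pairs)
    out = Path(out_dir)
    (out / "qrels").mkdir(parents=True, exist_ok=True)
    # One JSON object per line for the corpus and the queries.
    for name, rows in (("corpus.jsonl", corpus), ("queries.jsonl", queries)):
        with open(out / name, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
    # Tab-separated relevance judgements with a header row.
    with open(out / "qrels" / "train.tsv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["query-id", "corpus-id", "score"])
        writer.writerows(qrels)
```

Calling `write_beir_dataset(pairs, "data/generated")` (a hypothetical output folder) produces files in the layout that BEIR's `GenericDataLoader` is designed to read.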
Convert Dataset and Mine Hard Negatives:
- Convert the dataset into BEIR format and mine hard negatives using the scripts available in GPL. This produces the `hard-negatives.jsonl` file.
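Hard-negative mining means retrieving, for each query, the highest-scoring passages that are not labelled relevant, so that training sees difficult negatives rather than random ones. GPL does this with real retrievers over the BEIR files; the sketch below only illustrates the idea with toy vectors and cosine similarity. The helper names and the simplified `{"qid", "pos", "neg"}` record layout are assumptions; GPL's actual `hard-negatives.jsonl` may structure its negatives differently (for example, grouped by retriever).

```python
import json
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_hard_negatives(query_embs, doc_embs, positives, k=2):
    """Return one record per query with its k highest-scoring non-positive docs.

    query_embs: {qid: vector}; doc_embs: {doc_id: vector};
    positives: {qid: set of positive doc_ids}.
    """
    records = []
    for qid, q_vec in query_embs.items():
        pos = positives.get(qid, set())
        # Rank every non-positive document by similarity to the query.
        ranked = sorted(
            (doc_id for doc_id in doc_embs if doc_id not in pos),
            key=lambda doc_id: cosine(q_vec, doc_embs[doc_id]),
            reverse=True,
        )
        records.append({"qid": qid, "pos": sorted(pos), "neg": ranked[:k]})
    return records


def to_jsonl(records):
    """Serialise records as one JSON object per line, as in a .jsonl file."""
    return "".join(json.dumps(r) + "\n" for r in records)
```

Because the top-ranked non-positives are by construction close to the query, some of them may in practice be unlabelled relevant passages; filtering such false negatives is outside this sketch.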
Train the Model:
- The script for training the dense retrieval model with MNR loss is in the `training/` directory.
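MNR loss scores each claim against every positive passage in the batch (plus any mined hard negatives) and applies cross-entropy with the claim's own positive as the target, so the other in-batch passages act as negatives. The pure-Python sketch below computes that quantity for toy embeddings. Its `scale=20.0` mirrors the default of sentence-transformers' `MultipleNegativesRankingLoss`; whether the training script uses that library or that value is an assumption, and `mnr_loss` is a hypothetical helper, not the repository's code.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mnr_loss(anchors, positives, hard_negatives=(), scale=20.0):
    """Mean cross-entropy of each anchor against all candidates in the batch.

    Candidate i (for i < len(positives)) is the positive for anchor i; every
    other positive and every hard negative serves as a negative for it.
    """
    candidates = list(positives) + list(hard_negatives)
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, c) for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log softmax probability of the true positive
    return total / len(anchors)
```

With `anchors=[[1, 0], [0, 1]]` and matching `positives`, each claim is aligned with its own passage and the loss is close to zero; swapping the two positives makes it large, which is the signal the encoder is trained to reduce.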
The best performing models can be accessed via the following Hugging Face model repositories:
If you use the code or models from this repository, please consider citing the following paper:
@article{singh2023utdrm,
title={UTDRM: unsupervised method for training debunked-narrative retrieval models},
author={Singh, Iknoor and Scarton, Carolina and Bontcheva, Kalina},
journal={EPJ Data Science},
volume={12},
number={1},
pages={59},
year={2023},
publisher={Springer}
}
Feel free to reach out if you have any questions or need further assistance.