Following the contrastive-htc and HDLTex repositories, we preprocess the datasets into JSON files with the format {'token': List[str], 'label': List[str]}.
Please download the original datasets and preprocess them with the code in the corresponding folder:
- WoS:
cd data/WebOfScience/ && python preprocess_wos.py
- NYT:
cd data/nyt/ && python preprocess_nyt.py
- RCV1-V2:
cd data/rcv1/ && python preprocess_rcv1.py . && python data_rcv1.py
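
Before running the experiments, it can help to check that a preprocessed file matches the {'token': List[str], 'label': List[str]} format described above. The following is a minimal sketch, not part of this repository: the helper names `validate_record` and `load_records` are hypothetical, and it assumes the file stores one JSON object per line. Adjust the reader if your preprocessing output uses a different layout.

```python
import json
from typing import Dict, List


def validate_record(record: Dict[str, List[str]]) -> None:
    """Raise ValueError if a record lacks the documented 'token'/'label' shape."""
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    for key in ("token", "label"):
        if key not in record:
            raise ValueError(f"missing key: {key!r}")
        value = record[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key!r} must be a list of strings")


def load_records(path: str) -> List[Dict[str, List[str]]]:
    """Read and validate records, assuming one JSON object per line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            validate_record(record)
            records.append(record)
    return records
```

Calling `load_records` on a preprocessed file (the exact file name depends on what the preprocessing scripts write) returns the parsed records, or raises `ValueError` at the first record that does not match the expected shape.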
Run the script for the corresponding dataset:
bash run_rcv1.sh
bash run_wos.sh
bash run_nyt.sh
Our code is based on s2s-ft.