Official implementation of the paper "Revisiting Document Representations for Large-Scale Zero-Shot Learning" by Jihyung Kil and Wei-Lun Chao, NAACL 2021.
[Update 03/20/22]: Added the environment, visual features and labels of our split, and code for weighted average semantic representations and DeViSE*.
Create our conda environment:
conda env create -f ZSL_fv.yaml
conda activate ZSL_fv
The (non-)filtered Wikipedia sentences are available here. Please refer to the related README for more details.
Extract the semantic representations from the (non) filtered sentences:
CUDA_VISIBLE_DEVICES=0 python3 get_sem_rep.py --wiki_set data/21k_true_wiki_sents_vis_sec_clu --pool avg_pool --flt vis_sec_clu --max_seq_len 64 --max_sent all
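The `--pool avg_pool` flag above averages token embeddings into a single sentence representation. As a rough illustration of what average pooling does (a minimal sketch in NumPy, not the repo's actual `get_sem_rep.py` code; the tensor shapes are assumptions), padding tokens must be masked out before taking the mean:

```python
import numpy as np

def avg_pool(token_embs, attn_mask):
    """Masked average pooling over token embeddings.

    token_embs: (batch, seq_len, dim) per-token embeddings
    attn_mask:  (batch, seq_len) 1 for real tokens, 0 for padding
    Returns:    (batch, dim) one vector per sentence
    """
    mask = attn_mask[..., None].astype(float)       # (batch, seq_len, 1)
    summed = (token_embs * mask).sum(axis=1)        # sum of non-pad tokens
    counts = np.clip(mask.sum(axis=1), 1.0, None)   # avoid division by zero
    return summed / counts

# toy example: 2 sentences, 4 tokens each, 3-dim embeddings
embs = np.ones((2, 4, 3))
mask = np.array([[1, 1, 0, 0], [1, 1, 1, 1]])
pooled = avg_pool(embs, mask)                       # shape (2, 3)
```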
We use the ResNet visual features (He et al., 2016) provided by Xian et al. (2018a).
Visual features and labels of our 1K/2-Hop/3-Hop/ALL split.
For AwA2 and aPY, we use the visual attributes provided by Xian et al. (2018a).
Please refer to the README here for how to split ImageNet into our settings (i.e., 2-Hop, 3-Hop, ALL).
For AwA2 and aPY, we follow the proposed split provided by Xian et al. (2018a).
We leverage three Zero-Shot Learning models in our experiments:
- DeViSE (Frome et al., 2013): DeViSE and DeViSE* are based on the implementation from here.
- EXEM (Changpinyo et al., 2020): We use its official implementation.
- HVE (Liu et al., 2020): The official implementation can be found here.
Weighted Average Semantic Representations (ac):
Train bψ by minimizing the objectives (4) and (5) in the paper (e.g., ε: 0.95, τ: 0.96, BERTp-w, Vissec-clu):
- vis_sec_clu_sem_rep_pre_trained_fv.pt: pre-trained sentence representations filtered by Vissec-clu
- vis_sec_clu_avg_sem_rep_pre_trained_fv/bert_sem.pt: pre-trained average semantic representations filtered by Vissec-clu
python3 train_b_psi_comb_fv.py --output fine_tune_b_psi --tau 0.96 --type pre_trained --train True --discri discriminate --epochs 5 --split all_data --batch_size 512 --eps 0.95 --lr 1e-4 --sent_rep vis_sec_clu_sem_rep_pre_trained_fv.pt --avg_rep vis_sec_clu_avg_sem_rep_pre_trained_fv/bert_sem.pt
- Obtain the weighted average semantic representations (ac) after training:
python3 train_b_psi_comb_fv.py --output fine_tune_b_psi --tau 0.96 --type pre_trained --train False --discri discriminate --epochs 1 --split all_data --batch_size 768 --eps 0.95
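Conceptually, bψ scores each sentence's relevance, and the document representation is a weighted average of the sentence representations under those scores. The sketch below (illustration only; the actual objectives are (4) and (5) in the paper, and the scorer here is a hypothetical stand-in for a trained bψ) shows how scored sentences would be combined:

```python
import numpy as np

def weighted_avg_rep(sent_reps, scores):
    """Combine per-sentence representations into one document representation.

    sent_reps: (num_sents, dim) sentence representations
    scores:    (num_sents,) relevance scores (e.g., from a trained scorer)
    Returns:   (dim,) softmax-weighted average of the sentence representations
    """
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w = w / w.sum()
    return (w[:, None] * sent_reps).sum(axis=0)

# toy example: two sentences with equal scores contribute equally
reps = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_rep = weighted_avg_rep(reps, np.array([0.0, 0.0]))   # [0.5, 0.5]
```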
Train DeViSE*:
python3 DeVise_star.py --data_dir /local/scratch/jihyung --output devise_result --tau 0.96 --type pre_trained --eps 0.95 --bs 768 --split all_data --marg 0.2 --lr 0.0004 --num_epochs 50 --sem_type bert_p_w --sem_rep fine_tune_b_psi_eps_0.95_tau_0.96_pre_trained_fv/all_data_semantic_rep_after_train_epochs_1.pt
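DeViSE trains a visual-to-semantic mapping with a max-margin ranking loss (Frome et al., 2013): the true class's semantic representation should outscore every other class by a margin (cf. `--marg 0.2` above). A minimal NumPy sketch of that loss, not the repo's `DeVise_star.py` implementation:

```python
import numpy as np

def devise_hinge_loss(vis_emb, sem_reps, label, margin=0.2):
    """DeViSE-style ranking loss for one example.

    vis_emb:  (dim,) mapped visual embedding
    sem_reps: (num_classes, dim) semantic representations, one per class
    label:    index of the true class
    Penalizes every wrong class whose score comes within `margin`
    of the true class's score.
    """
    scores = sem_reps @ vis_emb                  # similarity to each class
    diffs = margin - scores[label] + scores      # hinge argument per class
    diffs[label] = 0.0                           # the true class is not penalized
    return np.maximum(0.0, diffs).sum()

# toy example: visual embedding aligned with class 0 incurs zero loss
sem = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = devise_hinge_loss(np.array([1.0, 0.0]), sem, label=0)   # 0.0
```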
If you find the code and data useful, please cite the following paper:
@inproceedings{kil2021revisiting,
title={Revisiting Document Representations for Large-Scale Zero-Shot Learning},
author={Kil, Jihyung and Chao, Wei-Lun},
booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages={3117--3128},
year={2021}
}