Code for "Self-Lifting: A Novel Framework for Unsupervised Voice-Face Association Learning" (ICMR 2022)
faiss==1.7.1
pytorch==1.8.1
pytorch-metric-learning==0.9.96
wandb==0.12.10
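Assuming a standard pip setup (note these package names are an assumption: faiss is published on PyPI as faiss-cpu/faiss-gpu, and PyTorch as torch), the dependencies can be installed with:
pip install faiss-cpu==1.7.1 torch==1.8.1 pytorch-metric-learning==0.9.96 wandb==0.12.10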
Download the dataset from Baidu Disk (code:9d0a) or Google Drive and unzip it to the project root. The dataset folder structure is shown below:
dataset/
└── voxceleb
    ├── cluster
    │   ├── movie2jpg_path.pkl
    │   ├── movie2wav_path.pkl
    │   └── train_movie_list.pkl
    ├── eval
    │   ├── test_matching_10.pkl
    │   ├── test_matching_g.pkl
    │   ├── test_matching.pkl
    │   ├── test_retrieval.pkl
    │   ├── test_verification.pkl
    │   ├── test_verification_g.pkl
    │   └── valid_verification.pkl
    ├── face_input.pkl
    └── voice_input.pkl
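The .pkl files are standard Python pickles. A minimal sketch for inspecting one (the file chosen is arbitrary, and the internal structure of these files is not documented in this README, so this only prints a top-level summary):

```python
import pickle

# Sketch: load one of the pickled dataset files and summarize it.
# The exact keys/types inside are undocumented here, so we only
# report the top-level type and, where applicable, its length.
with open("dataset/voxceleb/eval/test_matching.pkl", "rb") as f:
    data = pickle.load(f)
print(type(data), len(data) if hasattr(data, "__len__") else "")
```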
1. Train the Self-Lifting framework:
python sl.py
2. Train a baseline:
python baseline/1_ccae.py
python baseline/2_deepcluster.py
python baseline/3_barlow.py
Use wandb to view the training process:
- Create a wb_config.json file in the ./configs folder with the following content (a consumption sketch follows this list): { "WB_KEY": "Your wandb auth key" }
- Add --dryrun=False to the training command, for example: python sl.py --dryrun=False
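For reference, a minimal sketch of how such a config file could be consumed (this loading pattern is an assumption, not necessarily what sl.py actually does):

```python
import json

import wandb

# Sketch: read the auth key from ./configs/wb_config.json and log in
# to wandb. (Assumed pattern; the repo's own loading code may differ.)
with open("./configs/wb_config.json") as f:
    cfg = json.load(f)
wandb.login(key=cfg["WB_KEY"])
```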
You can get the final model checkpoints here (code:4ae6).
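A downloaded checkpoint can be inspected with plain PyTorch; a sketch (the file name sl.pth is hypothetical, and the stored keys depend on how the models were saved):

```python
import torch

# Sketch: load a downloaded checkpoint on CPU and list its keys.
# "sl.pth" is a hypothetical file name for illustration only.
state = torch.load("sl.pth", map_location="cpu")
print(list(state.keys()))
```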
The Inception-V1 model is based on facenet_pytorch.
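facenet_pytorch exposes this architecture as InceptionResnetV1; a sketch of using it as a face-embedding extractor (the pretrained weights and 160x160 input size follow facenet_pytorch's defaults, not necessarily this repo's setup):

```python
import torch
from facenet_pytorch import InceptionResnetV1

# Sketch: facenet_pytorch's InceptionResnetV1 as a face embedding
# extractor. Weights/input size follow the library's conventions;
# the repo's actual preprocessing may differ.
model = InceptionResnetV1(pretrained="vggface2").eval()
faces = torch.randn(2, 3, 160, 160)  # dummy batch of aligned face crops
with torch.no_grad():
    embeddings = model(faces)  # (2, 512) L2-normalized embeddings
print(embeddings.shape)
```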
The ECAPA-TDNN model is based on SpeechBrain. Since SpeechBrain's released model is trained on Vox1+Vox2, we retrained one using only Vox2; the checkpoint can be found here.
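For illustration, extracting a speaker embedding with SpeechBrain's public ECAPA-TDNN looks roughly like this (this sketch loads the stock Vox1+Vox2 model from the Hub, not the retrained Vox2-only checkpoint; the input file name is hypothetical):

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Sketch: speaker embedding via SpeechBrain's public ECAPA-TDNN
# (trained on Vox1+Vox2). Loading the retrained Vox2-only checkpoint
# would point `source` at that checkpoint instead.
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)
signal, sr = torchaudio.load("example.wav")  # hypothetical 16 kHz input
embedding = classifier.encode_batch(signal)  # shape: [1, 1, 192]
print(embedding.shape)
```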
We also offer demo scripts for extracting the embeddings in scripts/.