MSMD-AVT

MSMD-AVT: Multi-Stage Multimodal Distillation for Audio-Visual Speaker Tracking

Requirements

AV16.3: the original dataset, available at http://www.glat.info/ma/av16.3/
To train MSMD-AVT, first prepare the audio-visual samples using the following scripts:
- tools/prepareAudio.py, prepare_gccphat.py
- tools/prepareSample.py, prepareAusample
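
For readers unfamiliar with GCC-PHAT (the feature that prepare_gccphat.py is named after), the following is a minimal NumPy sketch of the standard technique for a single microphone pair. It is not taken from this repository: the function name `gcc_phat` and its parameters `fs`, `max_tau` and `interp` are illustrative assumptions, and the actual script may window, normalize or combine pairs differently.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the delay of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper, not the repository's implementation.
    Returns (delay in seconds, cross-correlation around zero lag).
    """
    # Zero-pad to the combined length so the correlation is linear, not circular.
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting (keep phase, discard magnitude).
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12

    # Back to the lag domain; `interp` > 1 upsamples the correlation.
    cc = np.fft.irfft(R, n=interp * n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(interp * fs), cc
```

A positive delay means `sig` lags `ref`. Setting `max_tau` to the microphone spacing divided by the speed of sound keeps the peak search within delays the array geometry allows.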

Train

To train MSMD-AVT, download seq01, 02 and 03 and the camera parameters from the AV16.3 dataset. Use the preprocessing scripts provided in tools to synchronize the audio and video, preprocess the audio, and prepare the audio-visual sample pairs for training.
After preparing the dataset and training samples, set the paths to the image samples and the GCF samples in train_auto.py and run it.
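
As a rough illustration of the audio and video synchronization step, the sketch below maps a video frame index to the range of audio samples centred on that frame. The helper `frame_to_audio_window` is hypothetical and not part of tools. The defaults (25 fps video, 16 kHz audio, a 40 ms window, zero offset) are assumptions that should be checked against the dataset documentation and the repository's own scripts.

```python
def frame_to_audio_window(frame_idx, fps=25.0, sample_rate=16000,
                          window_s=0.04, offset_s=0.0):
    """Return (start, end) audio sample indices centred on a video frame.

    Hypothetical helper. `offset_s` is an assumed per-sequence time offset
    between the audio and video streams, in seconds.
    """
    # Timestamp of the frame on the audio timeline.
    center_s = frame_idx / fps + offset_s
    center = int(round(center_s * sample_rate))

    # Half the window length, in samples.
    half = int(round(window_s * sample_rate / 2))

    # Clamp the start so the window never begins before the recording.
    start = max(center - half, 0)
    end = center + half
    return start, end
```

The audio segment for a frame can then be sliced as `audio[start:end]` before computing features for that frame. Near the beginning of a recording the window is shorter because the start index is clamped to zero.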

Tracking
