
SpeechCLIP

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, accepted to IEEE SLT 2022

Links: arXiv | Blog

Code Contributors

Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang

Prerequisites

Install packages

pip install -r requirements.txt

Data Preparation

See Details

Download Pretrained Checkpoints

bash download_ckpts.sh

You should see "Done downloading all checkpoints" after the script finishes

Note that training requires 2 GPUs for base models and 4 GPUs for large models
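Before launching a run, a quick sanity check (not part of this repo) that enough CUDA devices are visible:

```python
# Quick GPU-count check; the repo trains with PyTorch, so torch.cuda applies.
import torch

n_gpus = torch.cuda.device_count()
print(f"{n_gpus} CUDA device(s) visible")
# Base models need 2 GPUs, large models need 4 (see the note above).
assert n_gpus >= 2, "Parallel SpeechCLIP base training expects at least 2 GPUs"
```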

Usage

Remember to check that dataset_root in your config points to your dataset before running (a quick sanity-check sketch follows)
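A minimal sketch of that check, assuming the configs are YAML and that dataset_root sits under a data key; the config path and key layout here are hypothetical, so inspect the config referenced by your train.sh for the real structure:

```python
# Sanity-check that dataset_root in a config points at an existing directory.
from pathlib import Path

import yaml  # PyYAML

CONFIG = Path("config/model_base/parallel.yaml")  # hypothetical config path

with CONFIG.open() as f:
    cfg = yaml.safe_load(f)

root = Path(cfg["data"]["dataset_root"])  # assumed key layout
print(f"dataset_root = {root}")
assert root.is_dir(), f"dataset_root does not exist: {root}"
```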

Train

Example: train Parallel SpeechCLIP base:

bash egs/model_base/parallel/train.sh

Inference

Example: test Parallel SpeechCLIP base (using a pretrained checkpoint):

bash egs/model_base/parallel/test.sh

For more settings, please see the folders in ./egs/.

Getting embeddings from SpeechCLIP

See example.py
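Below is a minimal sketch of extracting speech embeddings from a trained checkpoint. The module path, class name, checkpoint filename, and encode_speech call are assumptions modeled on the released code; example.py is the authoritative reference:

```python
# Hedged sketch: load a SpeechCLIP checkpoint and embed one utterance.
import torch
import torchaudio

from avssl.model import KWClip_GeneralTransformer  # assumed module/class name

model = KWClip_GeneralTransformer.load_from_checkpoint(
    "checkpoints/parallel_base.ckpt"  # hypothetical checkpoint path
)
model.eval()

# The speech branch consumes raw 16 kHz mono waveforms.
wav, sr = torchaudio.load("sample.wav")
wav = torchaudio.functional.resample(wav, sr, 16000).squeeze(0)

with torch.no_grad():
    # encode_speech is assumed from example.py; in the released code it
    # returns pooled embeddings along with per-layer hidden states.
    embeddings = model.encode_speech(wav=[wav])
```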

Citation

@article{speechclip2022,
  title={SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model},
  author={Yi-Jen Shih and Hsuan-Fu Wang and Heng-Jui Chang and Layne Berry and Hung-yi Lee and David Harwath},
  journal={IEEE SLT},
  year={2022},
  publisher={IEEE}
}

Contribute

Please run the autoformatter in ./dev-support/ before opening a PR!
