2022 01 26


Dan Oneață edited this page Feb 1, 2022 · 4 revisions

Should we upload the JSTPS paper to arXiv?
The previous idea of using the CLIP model to train an audio encoder has already been explored in a couple of recent papers:
- Wu, Ho-Hsiang, et al. "Wav2CLIP: Learning Robust Audio Representations From CLIP." arXiv preprint arXiv:2110.11499 (2021).
- Zhao, Yanpeng, et al. "Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer." arXiv preprint arXiv:2112.08995 (2021).
Discuss a new direction for future work:
- Apply our framework to language documentation (for the Yorùbá language)