
2022 01 26

Dan Oneață edited this page Feb 1, 2022 · 4 revisions
  • Should we upload the JSTPS paper to arXiv?
  • The previous idea of using the CLIP model to train an audio encoder has already been explored in a couple of recent papers:
    • Wu, Ho-Hsiang, et al. "Wav2CLIP: Learning Robust Audio Representations From CLIP." arXiv preprint arXiv:2110.11499 (2021).
    • Zhao, Yanpeng, et al. "Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer." arXiv preprint arXiv:2112.08995 (2021).
  • Discuss a new direction for future work:
    • Apply our framework to language documentation (the Yorùbá language)
