Level: Introductory
Target group
First year Research Master’s students in Linguistics. Participants have a broad general knowledge of linguistics, and are familiar with basic math, as well as with computer programming at a basic level: ideally, they are able to understand and adapt simple Python scripts. They do not have previous experience or knowledge of deep learning or neural models.
Course description
This course briefly introduces students to current deep-learning-based (neural) approaches to modeling spoken language. Students learn the fundamental concepts underlying deep learning and study in some detail how it is applied to modeling and simulating the acquisition and processing of spoken language. The course covers the most important recent research and focuses on two families of approaches: (i) self-supervised representation learning, and (ii) visually grounded modeling. Students learn how to apply pre-trained models to new utterances, and to extract, evaluate, and analyze the representations these models produce.
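To give a flavor of the hands-on work, the following sketch shows one common way to apply a pre-trained self-supervised speech model to an utterance and extract its layer-wise representations, using the HuggingFace transformers library. The checkpoint name and the dummy waveform are illustrative placeholders rather than course materials.

    # Minimal sketch (assumptions: torch and transformers installed; the
    # checkpoint name and the dummy waveform are placeholders for real data).
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    model_name = "facebook/wav2vec2-base"  # example pre-trained checkpoint
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
    model = Wav2Vec2Model.from_pretrained(model_name)
    model.eval()

    waveform = torch.randn(16000)  # stand-in for one second of 16 kHz audio

    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # One tensor per layer, each of shape (1, frames, hidden_size): these are
    # the representations that students evaluate and analyze.
    for layer, states in enumerate(outputs.hidden_states):
        print(layer, states.shape)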
Preparatory reading material:
- Chrupała, G. (2023). Putting Natural in Natural Language Processing. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7820–7827, Toronto, Canada. Association for Computational Linguistics. http://dx.doi.org/10.18653/v1/2023.findings-acl.495
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Chapter 1: https://www.deeplearningbook.org/contents/intro.html
Monday: Fundamental concepts of neural modeling. Multi-layer perceptrons, convolutional and recurrent networks.
Slides: deep learning
Reading:
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Chapter 6: https://www.deeplearningbook.org/contents/mlp.html
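As a concrete illustration of the Monday concepts, here is a minimal multi-layer perceptron in PyTorch; all sizes are arbitrary stand-ins rather than values used in the course.

    import torch
    import torch.nn as nn

    # A two-layer perceptron: linear map, non-linearity, linear map.
    mlp = nn.Sequential(
        nn.Linear(40, 128),  # e.g. 40 acoustic features per frame
        nn.ReLU(),           # non-linearity between the layers
        nn.Linear(128, 10),  # scores for 10 hypothetical classes
    )

    x = torch.randn(8, 40)  # a batch of 8 input vectors
    scores = mlp(x)         # shape: (8, 10)
    print(scores.shape)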
Tuesday: Transformers.
Slides: transformers
Reading:
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (NIPS). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
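The core operation of the transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V. A minimal sketch, with toy dimensions:

    import math
    import torch

    def attention(q, k, v):
        # q, k, v: tensors of shape (batch, sequence_length, d)
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (batch, seq, seq)
        weights = torch.softmax(scores, dim=-1)          # each row sums to 1
        return weights @ v                               # weighted sum of values

    q = k = v = torch.randn(2, 5, 16)  # self-attention over a toy sequence
    print(attention(q, k, v).shape)    # (2, 5, 16)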
Wednesday: Self-supervised representation learning.
Slides: Self-supervised
Reading:
- Mohamed, A., Lee, H., Borgholt, L., Havtorn, J. D., Edin, J., Igel, C., Kirchhoff, K., Li, S., Livescu, K., Maaløe, L., Sainath, T. N., & Watanabe, S. (2022). Self-supervised speech representation learning: A review. IEEE Journal of Selected Topics in Signal Processing, 16, 1179–1210. https://arxiv.org/pdf/2205.10643.pdf
Assignment: Quiz
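A toy illustration of the masked-prediction idea behind models such as wav2vec 2.0: hide some frames, encode the sequence, and train the model to pick out the true frame at each masked position among distractors. The encoder and all sizes below are simplified stand-ins, not the actual architecture.

    import torch
    import torch.nn as nn

    frames = torch.randn(1, 50, 40)           # 50 frames of 40-dim features
    mask_idx = torch.tensor([10, 20, 30])     # positions to hide
    masked = frames.clone()
    masked[0, mask_idx] = 0.0                 # zero out the masked frames

    encoder = nn.GRU(40, 40, batch_first=True)  # stand-in for a real encoder
    context, _ = encoder(masked)                # contextual representations

    # Contrastive objective: the representation at a masked position should
    # be most similar to the true frame there, not to the other 49 frames.
    logits = context[0, mask_idx] @ frames[0].T  # (3, 50) similarity scores
    loss = nn.functional.cross_entropy(logits, mask_idx)
    print(loss.item())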
Thursday: Visually grounded modeling.
Slides: Visually_grounded
Reading:
- Chrupała, G. (2022). Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques. Journal of Artificial Intelligence Research, 73, 673–707. https://doi.org/10.1613/jair.1.12967
Assignment: Programming exercise
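The core objective in visually grounded modeling can be sketched as a contrastive loss over paired speech and image embeddings: matching pairs should score higher than mismatched ones. The random embeddings below stand in for the outputs of real speech and image encoders.

    import torch
    import torch.nn.functional as F

    speech_emb = F.normalize(torch.randn(8, 256), dim=-1)  # 8 utterances
    image_emb = F.normalize(torch.randn(8, 256), dim=-1)   # their paired images

    sims = speech_emb @ image_emb.T  # (8, 8) cosine similarities
    targets = torch.arange(8)        # utterance i is paired with image i
    # Symmetric loss: retrieve the image from the speech and vice versa.
    loss = (F.cross_entropy(sims, targets) + F.cross_entropy(sims.T, targets)) / 2
    print(loss.item())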
Friday: Learning from video.
Slides: Video
Reading:
- Nikolaus, M., Alishahi, A., & Chrupała, G. (2022). Learning English with Peppa Pig. Transactions of the Association for Computational Linguistics, 10, 922–936. https://doi.org/10.1162/tacl_a_00498
- Peng, P., Li, S., Räsänen, O., Mohamed, A., & Harwath, D. (2023). Syllable discovery and cross-lingual generalization in a visually grounded, self-supervised speech model. In Interspeech 2023. https://doi.org/10.21437/Interspeech.2023-2044
Final assignment: Project report
The preparatory and final assignments will be evaluated on a pass/fail basis. For the homework assignments during the course, we will check whether students hand them in, and we will discuss them in class. Students will receive the preparatory assignment four weeks before the start of the school (by e-mail, CC to LOT@uva.nl). The homework assignments during the course and the final assignment will be handed out in class.