gchrupala/neurospoken

Neural models of spoken language

Level: Introductory

Target group

First-year Research Master’s students in Linguistics. Participants have a broad general knowledge of linguistics and are familiar with basic math, as well as with computer programming at a basic level: ideally, they can understand and adapt simple Python scripts. They have no previous experience with or knowledge of deep learning or neural models.

Course description

This course briefly introduces students to current deep-learning-based (neural) approaches to modeling spoken language. Students learn the fundamental concepts underlying deep learning and study in some detail how it is applied to modeling and simulating the acquisition and processing of spoken language. The course covers the most important recent research and focuses on two families of approaches: (i) self-supervised representation learning and (ii) visually grounded modeling. Students learn how to apply pre-trained models to new utterances, and to extract, evaluate, and analyze the representations produced by the models.
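As a concrete illustration of that last step, the sketch below abstracts the pretrained model behind a hypothetical `extract_features` function (a real pipeline would instead run a forward pass through, e.g., a wav2vec2-style encoder); random projections of raw frames stand in for learned features, so only the generic pooling and comparison steps are real here.

```python
import numpy as np

def extract_features(waveform, dim=8, frame=160):
    """Hypothetical stand-in for a pretrained speech encoder.
    In practice this would be a model forward pass returning one
    vector per ~20 ms frame; here we just project raw frames
    through a fixed random matrix to get the same shape of output."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((frame, dim))
    n = len(waveform) // frame
    frames = waveform[: n * frame].reshape(n, frame)
    return frames @ proj  # (n_frames, dim)

def utterance_embedding(waveform):
    """Mean-pool frame-level representations into one utterance vector."""
    return extract_features(waveform).mean(axis=0)

def cosine(a, b):
    """Cosine similarity, a common way to compare extracted representations."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
```

With two utterance embeddings in hand, `cosine` gives a simple similarity score that can feed into nearest-neighbour or probing-style analyses of what the representations encode.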

Preparatory reading material:

Monday: Fundamental concepts of neural modeling. Multi-layer perceptrons, convolutional and recurrent networks.

Slides: deep learning

Reading:
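The two basic architectures of the day can be sketched in a few lines of NumPy; this is a minimal forward-pass illustration for intuition, not a training-ready implementation:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer perceptron: linear map, ReLU nonlinearity, linear map."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer with ReLU
    return h @ W2 + b2                 # output layer (no activation)

def rnn_step(h_prev, x_t, Wh, Wx, b):
    """One step of a simple (Elman) recurrent network: the new hidden
    state mixes the previous state with the current input."""
    return np.tanh(h_prev @ Wh + x_t @ Wx + b)
```

Applying `rnn_step` repeatedly over the frames of an utterance yields a hidden state that summarizes the sequence so far, which is the core idea behind recurrent models of speech.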

Tuesday: Transformers

Slides: transformers

Reading:
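The core operation behind this day's topic, scaled dot-product self-attention, can be sketched in plain NumPy (single head, no masking, and no learned query/key/value projections, so Q = K = V = X):

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention over the rows of X:
    each output vector is a similarity-weighted mixture of all inputs."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # (n, n) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ X, weights                     # contextualized vectors
```

A full transformer layer adds learned projections, multiple heads, residual connections, and a feed-forward sublayer around this operation.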

Wednesday: Self-supervised representation learning of spoken language

Slides: Self-supervised

Reading:

  • Mohamed, A., Lee, H., Borgholt, L., Havtorn, J.D., Edin, J., Igel, C., Kirchhoff, K., Li, S., Livescu, K., Maaløe, L., Sainath, T.N., & Watanabe, S. (2022). Self-Supervised Speech Representation Learning: A Review. IEEE Journal of Selected Topics in Signal Processing, 16, 1179-1210. https://arxiv.org/pdf/2205.10643.pdf

Assignment: Quiz
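Many self-supervised speech models of the wav2vec 2.0 family are trained with a contrastive (InfoNCE-style) objective: from a context vector, pick out the true target frame among distractors. The toy NumPy sketch below shows the shape of that loss for a single time step; the cosine scoring and temperature value follow common practice, but the details vary across models.

```python
import numpy as np

def info_nce(context, candidates, temperature=0.1):
    """Contrastive loss for one time step: candidates[0] is the true
    (positive) target, the remaining rows are distractors. Cosine
    similarities become a softmax cross-entropy over the candidates."""
    sims = candidates @ context / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(context) + 1e-9
    )
    logits = sims / temperature
    # log of the softmax normalizer, computed stably
    log_z = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
    return log_z - logits[0]  # -log probability assigned to the positive
```

The loss is near zero when the context vector is much more similar to the true target than to the distractors, and grows as distractors become competitive.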

Thursday: Introduction to visually grounded models of spoken language

Slides: Visually_grounded

Reading:

  • Chrupała, G. (2022). Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques. Journal of Artificial Intelligence Research, 73, 673-707. https://doi.org/10.1613/jair.1.12967

Assignment: Programming exercise
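Visually grounded models of spoken language are commonly trained with a margin-based ranking loss that pulls an utterance embedding toward the embedding of its matching image and away from mismatched ones. A toy NumPy sketch of that objective (the margin value is an illustrative choice):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def ranking_loss(speech, image_pos, image_neg, margin=0.2):
    """Margin-based ranking (triplet) loss: require the matching image
    to be at least `margin` more similar to the utterance than a
    mismatched image; otherwise incur a proportional penalty."""
    return max(0.0, margin - cosine(speech, image_pos) + cosine(speech, image_neg))
```

When both encoders are trained under this loss, speech and images end up in a joint embedding space, which is what makes cross-modal retrieval (finding the image described by an utterance) possible.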

Friday: Dynamic visual grounding via video

Slides: Video

Reading:

  • Nikolaus, M., Alishahi, A., & Chrupała, G. (2022). Learning English with Peppa Pig. Transactions of the Association for Computational Linguistics, 10, 922-936. https://doi.org/10.1162/tacl_a_00498

  • Peng, P., Li, S., Räsänen, O., Mohamed, A., & Harwath, D.F. (2023). Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. In Interspeech. https://doi.org/10.21437/Interspeech.2023-2044

Final assignment: Project report

Assessment

The preparatory and final assignments will be evaluated on a pass/fail basis. For the homework assignments during the course, we will check whether students hand in the assignments, and we will discuss them in class. Students will receive the preparatory assignment 4 weeks before the start of the school (by e-mail, CC to LOT@uva.nl). The homework assignments during the course and the final assignment will be handed out in class.

About

Neural models of spoken language - LOT Winter school 2024
