Implements attention-based speech-to-text transcription using Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and dense networks. The system works end to end: it transcribes a given speech utterance to its corresponding transcript. This project implements the paper Listen, Attend and Spell (LAS), using LAS Variant 1. The final model achieved a perplexity of less than 12 by incorporating teacher forcing and Gumbel noise.
If you are currently enrolled in this course, please refer to the Carnegie Mellon University Policy on Academic Integrity here before referring to any of the repository contents.
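
For readers unfamiliar with the terms used in the description, the following is a minimal, framework-free sketch of perplexity, Gumbel noise, and teacher forcing. It is not taken from this repository: the function names, the `teacher_forcing_rate` parameter, and the choice to add the noise to the decoder logits before picking the next input token are all assumptions made for illustration.

```python
import math
import random


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def add_gumbel_noise(logits, rng, scale=1.0):
    """Perturb logits with Gumbel(0, 1) noise: g = -log(-log(u)), u ~ U(0, 1)."""
    eps = 1e-10
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), eps), 1.0 - eps)
        g = -math.log(-math.log(u))
        noisy.append(logit + scale * g)
    return noisy


def next_decoder_input(ground_truth_token, logits, teacher_forcing_rate, rng):
    """Pick the token fed to the decoder at the next time step (hypothetical helper).

    With probability `teacher_forcing_rate` the ground-truth token is used
    (teacher forcing); otherwise the argmax of the Gumbel-perturbed logits
    is used, which amounts to sampling from the model's own prediction.
    """
    if rng.random() < teacher_forcing_rate:
        return ground_truth_token
    noisy = add_gumbel_noise(logits, rng)
    return max(range(len(noisy)), key=noisy.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # A model that assigns probability 1/12 to every target token
    # has a perplexity of about 12.
    print(perplexity([math.log(1.0 / 12.0)] * 5))
    print(next_decoder_input(3, [0.1, 2.0, -1.0, 0.5], 0.9, rng))
```

Under this reading, a perplexity below 12 means the model is, on average, less uncertain about each output token than a uniform choice among 12 options would be.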