Lab1: Feature extraction
• compute MFCC features step by step
• examine the features
• evaluate the correlation between features
• compare utterances with Dynamic Time Warping (DTW)
• illustrate the discriminative power of the features with respect to words
• perform hierarchical clustering of utterances
• train and analyze a Gaussian Mixture Model (GMM) of the feature vectors
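A minimal sketch of the DTW comparison listed above. It assumes each utterance is a list of per-frame feature vectors (for example MFCCs) and uses a Euclidean local distance. The names `dtw` and `euclidean` are illustrative, not the lab's own code.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors (e.g. two MFCC frames)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def dtw(x, y, dist=euclidean):
    """Return the accumulated DTW distance between frame sequences x and y.

    x, y: lists of feature vectors, one per frame.
    Allowed predecessor steps: (i-1, j), (i, j-1) and (i-1, j-1).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = cost of the best alignment of x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

For example, `dtw([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]])` aligns the repeated first frame at no cost. Some implementations also normalise the accumulated distance by the lengths of the two utterances; this sketch returns the raw accumulated distance.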
Lab2: Hidden Markov Models with Gaussian Emissions
• combine phonetic HMMs into word HMMs using a lexicon
• implement the forward-backward algorithm
• use it to compute the log likelihood of spoken utterances given a Gaussian HMM
• perform isolated word recognition
• implement the Viterbi algorithm and use it to compute the Viterbi path and likelihood
• compare and comment on the Viterbi and forward likelihoods
• implement the Baum-Welch algorithm to update the parameters of the emission probability distributions
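The forward and Viterbi computations from Lab 2 can be sketched in the log domain as below. This is an illustrative sketch, not the lab's reference code. It assumes the Gaussian log emission likelihoods have already been evaluated into a T×N matrix `log_b`, ignores any non-emitting exit state, and expects `-math.inf` wherever a probability is zero. The function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(log_b, log_pi, log_A):
    """log P(X | model), summing over all state paths (forward algorithm).

    log_b[t][j]: log emission likelihood of frame t in state j
    log_pi[j]:   log initial-state probability
    log_A[i][j]: log transition probability from state i to state j
    """
    T, N = len(log_b), len(log_pi)
    alpha = [log_pi[j] + log_b[0][j] for j in range(N)]
    for t in range(1, T):
        alpha = [logsumexp([alpha[i] + log_A[i][j] for i in range(N)]) + log_b[t][j]
                 for j in range(N)]
    return logsumexp(alpha)


def viterbi(log_b, log_pi, log_A):
    """Return (log likelihood of the best state path, best state path)."""
    T, N = len(log_b), len(log_pi)
    delta = [log_pi[j] + log_b[0][j] for j in range(N)]
    backptr = []  # backptr[t-1][j] = best predecessor of state j at time t
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + log_A[i][j])
            new_delta.append(delta[best_i] + log_A[best_i][j] + log_b[t][j])
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    last = max(range(N), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path
```

The forward likelihood sums over all state paths while Viterbi keeps only the best one, so the Viterbi log likelihood can never exceed the forward log likelihood. That relationship is the starting point for comparing the two.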
Lab3: Phoneme Recognition with Deep Neural Networks
Train and test a phone recogniser based on digit speech material from the TIDIGITS database:
• using predefined Gaussian-emission HMM phonetic models, create time-aligned phonetic transcriptions of the TIDIGITS database
• define appropriate DNN models for phoneme recognition using Keras
• train and evaluate the DNN models on a frame-by-frame recognition score
• repeat the training by varying model parameters and input features
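A frame-by-frame recognition score can be computed as the fraction of frames whose predicted label matches the target label. The sketch below assumes hypothetical state labels of the form `phone_index` (for example `ah_0`), so that the score can be reported both per HMM state and per phoneme. The label format and the function names are assumptions, not taken from the lab material.

```python
def frame_accuracy(predicted, target):
    """Fraction of frames whose predicted label equals the target label."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of frames")
    if not target:
        raise ValueError("cannot score an empty utterance")
    correct = sum(p == t for p, t in zip(predicted, target))
    return correct / len(target)


def to_phoneme(state_label):
    """Map a state label such as 'ah_0' to its phoneme 'ah' (assumed format)."""
    return state_label.rsplit("_", 1)[0]


def phoneme_frame_accuracy(predicted, target):
    """Frame accuracy after merging the states of each phoneme."""
    return frame_accuracy([to_phoneme(p) for p in predicted],
                          [to_phoneme(t) for t in target])
```

Merging states before scoring forgives confusions between states of the same phoneme, so the phoneme-level score is always at least as high as the state-level one.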
Project: Automatic music genre classification using deep learning technologies
Music genre classification based on CNN and LSTM networks