koudounasalkis/Audio-Speech-Tutorial

Audio & Speech Tutorial

This repository contains a short introduction to audio and speech processing, from the very basics to its many applications.

You can open the notebook directly in Colab here:

Open In Colab

This notebook briefly shows:

  • What sound is and how we represent it digitally
  • How we prepare audio data for a deep learning model
  • Time Domain vs Frequency Domain
  • Spectrum and Spectrograms
  • Data Augmentation Techniques
  • Deep Learning Models with some examples
    • Speaker Identification
    • Emotion Recognition
    • Intent Classification
    • Keyword Spotting
    • Speech-to-Text
    • Text-to-Speech
    • Speech Translation
    • Music Generation
    • LLMs for Speech: Gemini
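
To give a flavor of the first few topics in the list, here is a minimal NumPy sketch. It is not taken from the notebook. The function names (`make_tone`, `magnitude_spectrum`, `spectrogram`, `add_noise`), the 16 kHz sample rate, and the 25 ms / 10 ms framing are assumptions chosen for illustration. It shows a sound stored as an array of samples (time domain), its spectrum via the FFT (frequency domain), a magnitude spectrogram built from short overlapping frames, and one simple augmentation (additive noise at a target signal-to-noise ratio).

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples per second; an assumed value, common for speech


def make_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Time domain: a pure tone stored as an array of amplitude samples."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)


def magnitude_spectrum(signal, sample_rate=SAMPLE_RATE):
    """Frequency domain: magnitude of the real FFT plus its frequency axis in Hz."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    return freqs, mags


def spectrogram(signal, frame_len=400, hop_len=160):
    """Magnitude spectrogram: one spectrum per Hann-windowed, overlapping frame.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def add_noise(signal, snr_db, rng):
    """Augmentation: add white Gaussian noise at a target SNR (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Example: a one-second 440 Hz tone, viewed in each representation.
tone = make_tone(440.0, 1.0)
freqs, mags = magnitude_spectrum(tone)
peak_hz = freqs[np.argmax(mags)]          # frequency with the most energy
spec = spectrogram(tone)                  # time-frequency representation
noisy = add_noise(tone, snr_db=10.0, rng=np.random.default_rng(0))

print("peak frequency (Hz):", peak_hz)
print("spectrogram shape (frames, bins):", spec.shape)
```

In practice, libraries such as librosa or torchaudio provide these transforms, along with mel-scaled variants, so a sketch like this is mainly useful for seeing what happens under the hood.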

If you want to get in touch with me, you can send me an email at alkis.koudounas@polito.it.