A convolutional neural network (CNN) trained to classify spoken commands in real time, achieving 85% validation accuracy on the Google Speech Commands dataset.
This system listens to short spoken commands and classifies them into one of ten categories in real time. It demonstrates how audio signals can be transformed into visual representations (spectrograms) and fed into a CNN, the same technique used in voice assistants and smart devices.
Supported commands:
yes · no · up · down · left · right · on · off · stop · go
Raw audio is not fed to the CNN directly; it first passes through a transformation pipeline:
Raw Audio (.wav)
│
▼
Mel Spectrogram
(audio → image representation)
│
▼
CNN Classifier
(treats spectrogram like an image)
│
▼
Predicted Command + Confidence Score
Converting audio to a Mel spectrogram is the key insight: it turns a 1D audio signal into a 2D image-like representation that a CNN can process with the same power it uses for visual recognition.
| Metric | Score |
|---|---|
| Validation Accuracy | 85% |
| Architecture | Deep CNN (PyTorch) |
| Dataset | Google Speech Commands |
| Input Format | Mel Spectrogram |
| Commands | 10 classes |
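The repository's `cnn_model.py` is not reproduced here, but a minimal PyTorch sketch of a CNN in this spirit — one that treats a single-channel mel spectrogram as an image and outputs logits over the 10 command classes — could look like this (layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SpeechCNN(nn.Module):
    """Toy CNN over (batch, 1, n_mels, time) log-mel spectrograms."""

    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling: any time length works
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SpeechCNN()
logits = model(torch.randn(8, 1, 64, 101))   # batch of 8 spectrograms
probs = logits.softmax(dim=1)                # per-class confidence scores
```

Global average pooling before the linear head means the same network handles spectrograms of varying duration.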
speech-command-recognition/
├── data/ # Google Speech Commands dataset
├── notebooks/
│ └── training.ipynb # EDA, preprocessing, training
├── model/
│ ├── train.py # Training pipeline
│ ├── cnn_model.py # CNN architecture
│ └── model.pth # Saved model weights
├── utils/
│ └── audio_processing.py # Mel spectrogram transformation
├── app.py # Streamlit web app (real-time input)
├── requirements.txt
└── README.md
| Component | Technology |
|---|---|
| Deep Learning | PyTorch |
| Audio Processing | librosa, torchaudio |
| Model Architecture | CNN |
| Web App | Streamlit |
| Visualisation | Matplotlib |
| Language | Python 3.11+ |
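A `requirements.txt` matching this stack (unpinned here; the actual file may pin versions) would contain roughly:

```text
torch
torchaudio
librosa
streamlit
matplotlib
```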
git clone https://github.com/mustaphy666/speech-command-recognition.git
cd speech-command-recognition
pip install -r requirements.txt
streamlit run app.py
Note: Microphone access is required for real-time recognition mode.
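Outside the Streamlit app, a prediction with a confidence score could be produced along these lines. This is a hedged sketch, not the repo's actual API: the `predict` helper and the stand-in model are hypothetical (the real app would load the trained weights from `model/model.pth`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

COMMANDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]

def predict(model: nn.Module, log_mel: torch.Tensor) -> tuple[str, float]:
    """Map a (n_mels, time) log-mel spectrogram to (command, confidence)."""
    model.eval()
    with torch.no_grad():
        logits = model(log_mel.unsqueeze(0).unsqueeze(0))  # add batch + channel dims
        probs = F.softmax(logits, dim=1).squeeze(0)
    conf, idx = probs.max(dim=0)
    return COMMANDS[idx], conf.item()

# Stand-in for the trained CNN; assumes 64 mel bins x 101 frames input.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 101, len(COMMANDS)))
command, confidence = predict(model, torch.randn(64, 101))
```

The softmax over the logits is what yields the "Predicted Command + Confidence Score" pair shown at the end of the pipeline diagram.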
Uses the Google Speech Commands dataset — a collection of 65,000 one-second audio clips of 30 short words.
Saheed Mustapha Olatunji
- GitHub: @mustaphy666
- LinkedIn: mustapha-saheed