
Urban Sounds Classification

Date

Final project for the Machine Learning and AI iD Tech camp.

High Level Overview

There are 8732 .wav files covering 10 classes of urban sounds such as dog barks, car horns, and gunshots. The dataset is divided into 10 folds (folders) to make training and testing easier; I used folds 1-9 to train the model and fold 10 to test it. A custom CNN classifies the sounds (a minimal sketch appears after the Tech Stack list).
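As a rough sketch (not necessarily the exact code used here), the dataset's standard metadata CSV makes this fold split simple in Pandas; the file path and column names below follow the usual UrbanSound8K layout and are assumptions:

```python
import pandas as pd

# Standard UrbanSound8K layout (assumed): a metadata CSV listing each clip's
# file name, fold number (1-10), and class label.
meta = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")

train_meta = meta[meta["fold"] != 10]  # folds 1-9 for training
test_meta = meta[meta["fold"] == 10]   # fold 10 held out for testing
```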

The sound features used in the CNN are listed below (an extraction sketch follows the list):

  • MFCC: Mel-frequency cepstral coefficients, computed on a quasi-logarithmic frequency scale that approximates how the human auditory system processes sound.
  • Mel spectrogram: a Mel-scaled power spectrogram, likewise modeled on human hearing.
  • chroma_stft: a chromagram computed from a waveform or power spectrogram; captures pitch-class content.
  • chroma_cqt: a constant-Q chromagram; also pitch-based.
  • chroma_cens: Chroma Energy Normalized Statistics (CENS); an energy-normalized, smoothed chroma variant; also pitch-based.
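Each of these maps onto a librosa call. Below is a minimal extraction sketch, assuming each clip is collapsed to one fixed-length vector by averaging over time (the n_mfcc=40 setting and the averaging step are assumptions, not confirmed details of this project):

```python
import librosa
import numpy as np

def extract_features(path):
    # Load the clip at its native sample rate.
    y, sr = librosa.load(path, sr=None)
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),  # MFCC (n_mfcc assumed)
        librosa.feature.melspectrogram(y=y, sr=sr),   # Mel spectrogram
        librosa.feature.chroma_stft(y=y, sr=sr),      # chroma_stft
        librosa.feature.chroma_cqt(y=y, sr=sr),       # constant-Q chroma
        librosa.feature.chroma_cens(y=y, sr=sr),      # CENS chroma
    ]
    # Average each feature over the time axis and stack into one vector.
    return np.concatenate([f.mean(axis=1) for f in feats])
```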

Tech Stack

  • Python 3
  • Keras
  • Pandas
  • Librosa
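The README calls the model a custom Keras CNN but does not include the architecture, so here is a minimal sketch of one plausible shape: a small 1D CNN over the concatenated feature vector from the extraction sketch above. Layer sizes, dropout, and the optimizer are all assumptions:

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 10   # ten urban sound classes
FEATURE_LEN = 204  # 40 MFCC + 128 mel bands + 3 x 12 chroma bins (assumed layout)

model = models.Sequential([
    layers.Input(shape=(FEATURE_LEN, 1)),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```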

Results

Test accuracy: 70%

Validation accuracy: 90%
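For context, numbers like these typically come out of the Keras training loop along these lines; the epoch count, batch size, and validation split are assumptions, and x_train/x_test build on the sketches above:

```python
import numpy as np

# Feature vectors need a trailing channel axis for the Conv1D model above,
# e.g. x_train = features[..., np.newaxis].
history = model.fit(
    x_train, y_train,
    epochs=50,               # assumed
    batch_size=32,           # assumed
    validation_split=0.2,    # validation accuracy is measured on this slice
)
test_loss, test_acc = model.evaluate(x_test, y_test)  # fold 10
print(f"Test accuracy: {test_acc:.2%}")
```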

Reflection

As the results above show, the model is clearly overfitting: it scores 90% on validation data drawn from the training folds but only 70% on the held-out test fold. See more in my reflection on this project.

Useful Links

  • Dataset
  • Vlog I used as reference and inspiration