DESED Classification

Sound event classification for the DESED dataset and an Activities of Daily Living dataset using deep learning models (Conv1D, Conv2D, LSTM). This code is based on a YouTube video.

Dataset

This project uses two datasets.

DESED Dataset

The Domestic Environment Sound Event Detection (DESED) dataset is provided by DCASE for evaluating sound event detection systems that use weakly labeled data. DESED consists of 10 classes: alarm_bell_ringing, blender, cat, dishes, dog, electric_shaver_toothbrush, frying, running_water, speech, vacuum_cleaner. You can download DESED from this website.

Soundbank: Foreground and background sound clips that are mixed and augmented with Scaper to produce synthesized soundscapes
Synthesized soundscapes: Mixtures of the foreground and background soundbanks, which are strongly labeled
Recorded soundscapes: Real recordings from AudioSet, which are unlabeled, weakly labeled, or strongly labeled

This project uses only strongly labeled data (foreground soundbank, strongly labeled recorded soundscapes) for training.

Thingy:52 Recorded Dataset

Thingy:52 is a multi-sensor prototyping platform with a microphone and BLE support. This project used a Thingy:52 to collect sounds generated by activities of daily living in a real domestic environment with 3 residents.
Recorded sounds are annotated with Audacity into 10 classes: toilet, shower, wash, brush_teeth, dry_hair, cook, eat, wash_dish, watch_tv, vacuum_cleaner. Each class can contain several kinds of sound events; for example, the toilet class can contain sounds such as a toilet flush, a fart, and so on. The length of the labeled sound data varies from seconds to minutes.

Data Integration

The DESED dataset and the Thingy:52 dataset are integrated to classify four sound events that both datasets cover: dishes, frying, running_water, vacuum_cleaner. The data are divided into three groups for training, and the integrated Thingy:52 dataset is used for evaluation.

Soundbank Trained Model: The foreground soundbank is used for training and the Thingy:52 recorded dataset is used for evaluation
Recorded Trained Model: Recorded soundscapes (validation + test) are used for training and the Thingy:52 recorded dataset is used for evaluation
Thingy:52 Trained Model: 80% of the Thingy:52 recorded dataset is used for training and 20% is used for evaluation
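The 80/20 split for the Thingy:52 Trained Model could be sketched as follows. This is a minimal illustration, not the repository's code: the helper `stratified_split`, the per-class (stratified) splitting, the fixed seed, and the file names are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split (path, label) pairs into train and evaluation lists, class by class.

    Hypothetical helper: splitting per class keeps the 80/20 ratio
    inside every class instead of only across the whole dataset.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, evaluation = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        cut = int(round(train_fraction * len(paths)))
        train += [(p, label) for p in paths[:cut]]
        evaluation += [(p, label) for p in paths[cut:]]
    return train, evaluation


# Illustrative file names only: 10 one-second clips per class.
classes = ["dishes", "frying", "running_water", "vacuum_cleaner"]
samples = [(f"{c}/clip_{i}.wav", c) for c in classes for i in range(10)]

train, evaluation = stratified_split(samples)
print(len(train), len(evaluation))
```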

Data Process

All audio files are downsampled to 16 kHz and cleaned with a signal envelope, using a threshold magnitude of 0.003. Each file is then sliced into 1-second segments (the delta time), which are saved as samples in one directory per class.
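A rough sketch of the envelope and slicing steps, using NumPy only and assuming the audio is already a 16 kHz float array (resampling is left out). The function names, the 50 ms rolling window, and the choice to drop a trailing partial segment are assumptions for illustration; only the 0.003 threshold and the 1-second delta time come from the description above.

```python
import numpy as np


def envelope_mask(signal, rate, threshold=0.003, window_s=0.05):
    """Return a boolean mask that is True where the rolling peak
    magnitude of the signal exceeds the threshold.

    window_s (50 ms) is an assumed value, not taken from the project.
    """
    window = max(1, int(rate * window_s))
    magnitude = np.abs(signal)
    # Pad the edges so a centered window exists for every sample.
    left = window // 2
    right = window - 1 - left
    padded = np.pad(magnitude, (left, right), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return windows.max(axis=1) > threshold


def slice_clips(signal, rate, delta_s=1.0):
    """Cut the signal into consecutive clips of delta_s seconds.

    Assumption: a trailing remainder shorter than delta_s is dropped.
    """
    step = int(rate * delta_s)
    n_full = len(signal) // step
    return [signal[i * step:(i + 1) * step] for i in range(n_full)]


rate = 16_000
# Synthetic input: 1 s of silence followed by 2.5 s of a 440 Hz tone.
t = np.arange(int(2.5 * rate)) / rate
signal = np.concatenate([np.zeros(rate), 0.1 * np.sin(2 * np.pi * 440 * t)])

mask = envelope_mask(signal, rate)
clips = slice_clips(signal[mask], rate)
print(len(clips), [len(c) for c in clips])
```

In this synthetic case the silent second is masked out, so only the tone contributes clips.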

Models

Models include Conv1D, Conv2D, and LSTM. 128 log-mel bands are extracted with a 25 ms window and a 10 ms stride.
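For orientation, the window and stride translate into sample counts as sketched below. This assumes 16 kHz audio, 1-second samples, and framing without padding; libraries that pad the signal (centered framing) produce a few more frames, and the axis order fed to each model may differ. The helper `n_frames` is hypothetical.

```python
SAMPLE_RATE = 16_000  # Hz, after downsampling
WINDOW_S = 0.025      # 25 ms analysis window
HOP_S = 0.010         # 10 ms stride
N_MELS = 128          # number of log-mel bands

win_length = round(SAMPLE_RATE * WINDOW_S)  # samples per window
hop_length = round(SAMPLE_RATE * HOP_S)     # samples between window starts


def n_frames(n_samples, win_length, hop_length):
    """Number of full windows that fit when no padding is applied."""
    if n_samples < win_length:
        return 0
    return 1 + (n_samples - win_length) // hop_length


frames = n_frames(SAMPLE_RATE, win_length, hop_length)  # one 1-second sample
feature_shape = (frames, N_MELS)

print(win_length, hop_length)  # 400 160
print(feature_shape)           # (98, 128)
```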

Train

A model is selected and trained on the training samples. Trained models are saved in the models directory. Accuracy and loss histories are saved in the logs directory.

Predict

Predictions are made on the evaluation samples and saved in the logs directory as NumPy arrays. The prediction logs are then used to build the confusion matrices.
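Turning saved predictions into a confusion matrix could look roughly like this. The sketch assumes the saved array holds one row of class probabilities per evaluation sample and that integer ground-truth labels are available; the arrays below are made-up values for illustration, not results from this project.

```python
import numpy as np

CLASSES = ["dishes", "frying", "running_water", "vacuum_cleaner"]


def confusion_matrix(y_true, y_prob, n_classes):
    """Rows are true classes, columns are predicted classes."""
    y_pred = np.argmax(y_prob, axis=1)  # most probable class per sample
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return matrix


# Made-up example: 6 evaluation samples, 4 classes.
y_true = np.array([0, 0, 1, 2, 3, 3])
y_prob = np.array([
    [0.70, 0.10, 0.10, 0.10],  # predicted dishes
    [0.20, 0.50, 0.20, 0.10],  # predicted frying (a mistake)
    [0.10, 0.80, 0.05, 0.05],  # predicted frying
    [0.10, 0.10, 0.70, 0.10],  # predicted running_water
    [0.10, 0.10, 0.10, 0.70],  # predicted vacuum_cleaner
    [0.25, 0.25, 0.10, 0.40],  # predicted vacuum_cleaner
])

matrix = confusion_matrix(y_true, y_prob, len(CLASSES))
accuracy = np.trace(matrix) / matrix.sum()
print(matrix)
print(accuracy)
```

In practice `y_prob` would be loaded from the logs directory with `np.load`; the file name depends on how the prediction script saved it.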

Confusion Matrix

Soundbank Trained Model

Conv1D

Conv2D

LSTM

Recorded Trained Model

Conv1D

Conv2D

LSTM

Thingy:52 Trained Model

Conv1D

Conv2D

LSTM
