In this project, we present an end-to-end data-driven system for enhancing the quality of noisy speech signals using a convolutional-recurrent neural network. We evaluate the model quantitatively and qualitatively on a real-world noisy speech dataset, reporting metrics such as SNR and PESQ. We also replace the max-pooling layers in the convolutional encoder with a wavelet pooling mechanism and compare the performance of the two variants.
We use the CSR-WSJ01 dataset for clean signals and noise recordings from the ACE corpus.
The CSR-WSJ dataset ships in the wv1/wv2 file format. We use the conversion tools available here to convert it to the SPHERE (.sph) format, followed by conversion to .wav using this repo. Steps to set up the sphere conversion tool (wv1/wv2 → sph):
- Compile the tool: `gcc -o sph2pipe *.c -lm`
- Add the executable to your path: `PATH="$(pwd):$PATH"`
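Once `sph2pipe` is built and on `PATH`, the per-file conversion can be scripted. A minimal Python sketch of a batch sph → wav pass (the helper names and directory arguments are illustrative, not part of the repo; `-f wav` is sph2pipe's flag for RIFF/WAV output):

```python
import shutil
import subprocess
from pathlib import Path

def sph2pipe_cmd(src, dst):
    # "-f wav" asks sph2pipe to emit a RIFF/WAV file.
    return ["sph2pipe", "-f", "wav", str(src), str(dst)]

def convert_all(src_dir, dst_dir):
    """Convert every .sph file under src_dir to .wav files in dst_dir."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    if shutil.which("sph2pipe") is None:
        # Tool not on PATH yet; nothing to do.
        return []
    converted = []
    for sph in sorted(Path(src_dir).rglob("*.sph")):
        wav = dst_dir / (sph.stem + ".wav")
        subprocess.run(sph2pipe_cmd(sph, wav), check=True)
        converted.append(wav)
    return converted
```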
Folder structure for data (`clean_data` holds all files from the [CSR-WSJ01 dataset](https://catalog.ldc.upenn.edu/LDC93s6a) as .sph files):

```
data/
├── clean_data/
├── train_set.txt
├── val_set.txt
└── test_set.txt
```
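The layout above can be bootstrapped with a few lines of Python (a sketch only; the split `.txt` files start empty and are then filled with utterance lists):

```python
from pathlib import Path

def make_data_dirs(root="data"):
    # Create data/clean_data plus the three (initially empty) split lists.
    root = Path(root)
    (root / "clean_data").mkdir(parents=True, exist_ok=True)
    for split in ("train_set.txt", "val_set.txt", "test_set.txt"):
        (root / split).touch()
    return root
```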
- The two models can be trained via `train.py`, configured by a YAML file like those in `configs`.
- Evaluate the model on the test set using `generateMetrics.py`.
- Use `plot_curves.ipynb` to generate loss and evaluation plots.
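For reference, the SNR metric reported by the evaluation step is the ratio of clean-signal power to residual-noise power in decibels. A minimal sketch of the textbook definition (the exact formulation inside `generateMetrics.py` may differ, e.g. segmental SNR):

```python
import math

def snr_db(clean, enhanced):
    """SNR in dB: clean power over residual (clean - enhanced) power."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    if noise_power == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, an enhanced signal that is uniformly 10% below the clean one has an SNR of about 20 dB, since the power ratio is 1 / 0.1² = 100.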