1D-Triplet-CNN

PyTorch implementation of the 1D-Triplet-CNN neural network model described in "Fusing MFCC and LPC Features using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals" by A. Chowdhury and A. Ross.

Research Article

Anurag Chowdhury, and Arun Ross, Fusing MFCC and LPC Features using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals, IEEE Transactions on Information Forensics and Security (2019).

IEEE Xplore: https://ieeexplore.ieee.org/document/8839817

Implementation details and requirements

The model was implemented in PyTorch 1.2.1 using Python 3.6. It may work with other versions of PyTorch and Python, but those combinations have not been tested.

Additional requirements are listed in the ./requirements.txt file.

Usage

Source code and model parameters

The source code of the 1D-Triplet-CNN model is in the models subdirectory, and a pre-trained model is available in the trained_models subdirectory.

Dataset

The pre-trained model available in the trained_models subdirectory was trained on a subset of the Fisher speech corpus, obtained from https://catalog.ldc.upenn.edu/LDC2004S13. The training data was also degraded with varying levels of babble noise taken from the NOISEX-92 dataset.
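Degrading speech with noise at "varying levels" is commonly done by scaling the noise to reach a target signal-to-noise ratio (SNR) before adding it to the clean signal. The sketch below illustrates that general technique only; the function name `mix_at_snr` is hypothetical, and the SNR levels and mixing procedure actually used to train the pre-trained model are not stated in this README.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(speech_power / (gain**2 * noise_power)); solve for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

A lower `snr_db` value gives a noisier mixture; at 0 dB the scaled noise has the same average power as the speech.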

Training the 1D-Triplet-CNN model

To train a 1D-Triplet-CNN model as described in the research paper, use the 1D-Triplet-CNN implementation in the models subdirectory. The network performs best when trained in a triplet learning framework, in which each training example consists of an anchor sample, a positive sample from the same speaker, and a negative sample from a different speaker. See the research paper for details on training the model.
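As a rough illustration of what a triplet objective computes, the sketch below implements a generic margin-based triplet loss on cosine similarity, in plain Python. The function names and the margin value of 0.4 are assumptions made for this example; the exact loss, margin, and triplet sampling strategy should be taken from the research paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_triplet_loss(anchor, positive, negative, margin=0.4):
    """Hinge loss that is zero once the anchor is more similar to the
    positive than to the negative by at least `margin`.

    `margin=0.4` is an arbitrary illustrative value, not the paper's setting.
    """
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(0.0, s_an - s_ap + margin)
```

In actual training the three embeddings would come from the same network applied to the anchor, positive, and negative inputs, and the loss would be averaged over a batch of triplets.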

Testing with the pretrained model

Recommended audio specifications

Usually, 2 seconds of speech audio sampled at 8 kHz (8000 Hz) is enough to produce reliable speaker recognition results. Longer audio samples make recognition considerably slower without a meaningful gain in performance. Audio samples shorter than 1 second suffer a considerable loss in performance.
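A quick pre-check of a clip against these recommendations can be done with Python's standard `wave` module. This is a minimal sketch that assumes uncompressed PCM WAV input; the names `clip_report` and `frames_to_keep` are hypothetical and do not exist in this repository.

```python
import wave

TARGET_RATE_HZ = 8000   # sampling rate recommended above
TARGET_SECONDS = 2.0    # clip length recommended above
MIN_SECONDS = 1.0       # shorter clips are expected to lose accuracy


def clip_report(path):
    """Return (sample_rate, duration_seconds, ok) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    ok = rate == TARGET_RATE_HZ and duration >= MIN_SECONDS
    return rate, duration, ok


def frames_to_keep(rate=TARGET_RATE_HZ, seconds=TARGET_SECONDS):
    """Number of samples in a clip of the recommended length."""
    return int(rate * seconds)
```

At 8000 Hz, a 2-second clip corresponds to 16000 samples, which is what `frames_to_keep()` returns with its defaults.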

Usage

Install the requirements listed in the ./requirements.txt file.
Run src/extractFeatures.m in MATLAB R2019a (or newer) to extract MFCC-LPC features from the audio files placed in the sample_audio subdirectory. The features for each audio file are saved as an individual .mat file in the sample_feature subdirectory.
Run src/test.py in Python 3.6 to evaluate some sample audio pairs and generate speaker verification scores.
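Speaker verification scores are computed on pairs of recordings, so it can help to see which pairs the extracted feature files could form. The sketch below simply enumerates every unordered pair of .mat files in a directory; `feature_pairs` is a hypothetical helper, and src/test.py may select or load its pairs differently.

```python
from itertools import combinations
from pathlib import Path


def feature_pairs(feature_dir="sample_feature"):
    """Return every unordered pair of .mat files in feature_dir, sorted by name.

    Hypothetical helper for illustration; not part of this repository.
    """
    files = sorted(Path(feature_dir).glob("*.mat"))
    return list(combinations(files, 2))
```

For n feature files this yields n(n-1)/2 pairs, so three files give three candidate comparisons.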

Examples

Usage examples may be added in the future.
