guyeshet/keras-accent-trainer

Keras Accent Trainer CometML

About

Every speaker has a distinctive dialect and set of speech mannerisms. This project uses recorded speech to infer a speaker's demographic and linguistic background: the goal is to classify accents, specifically foreign accents, by the speaker's native language. Recordings are compared against the Speech Accent Archive dataset to determine which variables are key predictors of each accent. The archive demonstrates that accents are systematic rather than merely mistaken speech. Given a recording of a speaker reading a known script of English words, the project predicts the speaker's native language.

Dataset

All of the speech files used for this project come from the Speech Accent Archive, a repository of spoken English hosted by George Mason University. Over 2000 speakers representing over 100 native languages read a common elicitation paragraph in English:

'Please call Stella. Ask her to bring these things with her from the store: Six spoons of fresh
snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need 
a small plastic snake and a big toy frog for the kids. She can scoop these things into three red 
bags, and we will go meet her Wednesday at the train station.'

The dataset contains .mp3 audio files, which were converted to .wav files so that MFCC (Mel-Frequency Cepstral Coefficient) features could be extracted easily.

The MFCCs are fed into a 2-dimensional convolutional neural network (CNN) to predict the native-language class.
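To make the feature step concrete, here is a minimal NumPy-only sketch of a standard MFCC pipeline (framing, power spectrum, mel filterbank, log, DCT). It is not the project's actual extraction code; the function names, frame size, hop length, and filter counts are all assumptions chosen for illustration, and the input is a synthetic tone rather than a real recording.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate, n_mfcc=13, n_filters=26, n_fft=512, hop=160):
    """Return an (n_mfcc, n_frames) matrix of cepstral coefficients."""
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # Power spectrum, then log mel-band energies.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # DCT-II over the filter axis decorrelates the band energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return (energies @ basis.T).T

# Synthetic one-second 440 Hz tone standing in for a speech recording.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
features = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate)

# A 2-D CNN typically expects a trailing channel axis.
cnn_input = features[..., np.newaxis]
print(features.shape, cnn_input.shape)
```

The resulting matrix has one row per coefficient and one column per frame, which is why it can be treated like a single-channel image by a 2-D convolutional network.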

Running The Demo Project

Comet.ml Integration

We added a Comet.ml integration, which lets you view your hyperparameters, metrics, graphs, dependencies, and more, including real-time metrics.

Add your API key in the configuration file:

For example: {"api": {"comet": {"api_key": "your key here"}}}
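As an illustration of how such a nested key could be read, the sketch below parses a JSON string with the same `api` / `comet` / `api_key` layout. The helper name `read_comet_api_key` is hypothetical and is not part of this repository or of the Comet.ml client.

```python
import json

def read_comet_api_key(config_text):
    """Return the Comet API key from a JSON config string, or None if absent."""
    node = json.loads(config_text)
    # Walk api -> comet -> api_key, tolerating missing levels.
    for key in ("api", "comet", "api_key"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

example = '{"api": {"comet": {"api_key": "your key here"}}}'
print(read_comet_api_key(example))
```

Returning `None` for a missing key lets the caller decide whether to skip experiment logging or stop with an error.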

Project Architecture

Training configurations

  • all_english_speakers.json
    • The training set contains all speakers: USA natives form one class, and all other speakers are grouped into the other class as foreign speakers.
  • usa_english_speakers.json
    • The training set contains only USA natives as one class, excluding all other English speakers.
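One possible reading of the two configurations, sketched as a labeling function: USA natives get label 1, foreign speakers get label 0, and a flag decides whether non-USA native English speakers are kept (as foreign) or dropped. The record fields (`id`, `native_language`, `country`) and the function name are assumptions for illustration, not the repository's actual schema.

```python
def build_binary_dataset(speakers, include_other_english):
    """Return (speaker_id, label) pairs: 1 = USA native, 0 = foreign speaker."""
    dataset = []
    for s in speakers:
        is_english_native = s["native_language"] == "english"
        is_usa_native = is_english_native and s["country"] == "usa"
        if is_english_native and not is_usa_native and not include_other_english:
            continue  # drop non-USA native English speakers
        dataset.append((s["id"], 1 if is_usa_native else 0))
    return dataset

# Hypothetical speaker records.
speakers = [
    {"id": "english1", "native_language": "english", "country": "usa"},
    {"id": "english2", "native_language": "english", "country": "uk"},
    {"id": "spanish1", "native_language": "spanish", "country": "mexico"},
]

all_cfg = build_binary_dataset(speakers, include_other_english=True)
usa_cfg = build_binary_dataset(speakers, include_other_english=False)
print(all_cfg)
print(usa_cfg)
```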

Dockerized environment

Use keras_accent_deployment for the fully dockerized environment

Local Installation

  1. Install the Python dependencies:
    pip install -r requirements.txt
    

Project Execution

  1. First, download the accent audio files locally. This is a long operation that downloads over 1000 audio files:
    python accent_dataset/create.py
    
  2. Train the model with the chosen configuration. The first run is slow because the wav files must be converted into MFCCs. The MFCCs are cached, so later training runs are faster:
    python train_from_config.py -c configs/usa_english_speakers.json
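The caching idea in step 2 can be sketched as a small on-disk cache keyed by the audio file path. This is only an illustration of the pattern, not the repository's cache format; `cached_features`, the pickle layout, and the stand-in feature function are all assumptions.

```python
import hashlib
import os
import pickle
import tempfile

def cached_features(wav_path, compute, cache_dir):
    """Return compute(wav_path), reusing a pickled result when one exists."""
    key = hashlib.sha1(wav_path.encode("utf-8")).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as fh:
            return pickle.load(fh)
    result = compute(wav_path)  # the slow step, run only on a cache miss
    with open(cache_file, "wb") as fh:
        pickle.dump(result, fh)
    return result

# Stand-in for the expensive wav-to-MFCC conversion; it records each call.
calls = []
def fake_mfcc(path):
    calls.append(path)
    return [[0.0, 1.0], [2.0, 3.0]]

cache_dir = tempfile.mkdtemp()
first = cached_features("audio/english1.wav", fake_mfcc, cache_dir)
second = cached_features("audio/english1.wav", fake_mfcc, cache_dir)
print(len(calls))
```

The second call reads the pickled result instead of recomputing it, which mirrors why only the first training run pays the conversion cost.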
    

Acknowledgements

This project template is based on Ahmkel's Keras Project Template.

The model and implementation are inspired by yatharthgarg's Speech-Accent-Recognition.

