
paboldin/SpeechRecognition


SpeechRecognition project

Introduction

NOTE: Yes, there is CMU-Sphinx.

This project provides a basic "speech" recognition system (digits only). It covers the whole pipeline: from preparing a training sample for the neural network out of a given database of sound recordings, to software that records audio from the system input and tries to classify it using the trained neural network.

The project was written as diploma software for my customer.

The neural network library used is Encog.

The code contains parts of the SpectroEdit project and the funf-open-sensing-framework.

Workflow

Building the training set

First, the user must prepare a database of sound samples with the following directory structure, where <Dictor> is the speaker and <Number> is the spoken digit:

<Dictor>/<Number>/<Samplefile#1>.wav
<Dictor>/<Number>/<Samplefile#2>.wav

It is important not to mix these files up (for example, by putting a recording into the wrong <Number> folder), as mislabeled samples greatly degrade recognition performance.
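A minimal sketch of how such a tree could be indexed into labeled samples, assuming Java (the project is built with NetBeans and Encog) and assuming the <Number> folders are named 0 through 9. SampleIndex, scan and Sample are hypothetical names for illustration, not the project's actual classes. Folders that are not a single digit are skipped rather than guessed at, which is one cheap guard against the mix-ups described above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: indexes <root>/<Dictor>/<Number>/*.wav into labeled samples.
final class SampleIndex {

    /** One labeled recording: who spoke it, which digit, and where the file is. */
    static final class Sample {
        final String dictor;
        final int digit;
        final Path file;

        Sample(String dictor, int digit, Path file) {
            this.dictor = dictor;
            this.digit = digit;
            this.file = file;
        }
    }

    static List<Sample> scan(Path root) throws IOException {
        List<Sample> samples = new ArrayList<>();
        // Level 1: one directory per speaker (<Dictor>).
        try (DirectoryStream<Path> dictors = Files.newDirectoryStream(root, p -> Files.isDirectory(p))) {
            for (Path dictorDir : dictors) {
                // Level 2: one directory per digit (<Number>).
                try (DirectoryStream<Path> numbers = Files.newDirectoryStream(dictorDir, p -> Files.isDirectory(p))) {
                    for (Path numberDir : numbers) {
                        String name = numberDir.getFileName().toString();
                        if (!name.matches("[0-9]")) {
                            continue; // assumption: only folders named 0..9 hold samples
                        }
                        int digit = Integer.parseInt(name);
                        // Level 3: the .wav recordings themselves.
                        try (DirectoryStream<Path> wavs = Files.newDirectoryStream(numberDir, "*.wav")) {
                            for (Path wav : wavs) {
                                samples.add(new Sample(dictorDir.getFileName().toString(), digit, wav));
                            }
                        }
                    }
                }
            }
        }
        return samples;
    }
}
```

Each Sample carries its label from the folder layout alone, so the file names themselves (<Samplefile#1>, <Samplefile#2>, ...) can be arbitrary.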

The main application then uses this sound database to build a database of features with the selected FeatureExtractor. This produces two output files: one with the FeatureExtractor parameters, and a .csv file suitable for training a Multi-Layer Perceptron in Encog.
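The README does not spell out the column layout of the .csv file. A common layout for supervised training, assumed here, is the feature vector (the network inputs) followed by one "ideal" output column per digit, with a 1 in the column of the spoken digit and 0 elsewhere. As far as we know, Encog 3's CSV loading helpers (e.g. TrainingSetUtil.loadCSVTOMemory) split each row by the given input and ideal column counts, which fits this layout. TrainingCsv and row are hypothetical names, and 0/1 targets are an assumption (a tanh output layer would more likely use -1/1).

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical helper: formats one training row as features + one-hot digit label.
final class TrainingCsv {

    static final int DIGITS = 10; // one output neuron per digit 0-9

    static String row(double[] features, int digit) {
        if (digit < 0 || digit >= DIGITS) {
            throw new IllegalArgumentException("digit out of range: " + digit);
        }
        StringJoiner line = new StringJoiner(",");
        // Input columns; Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        for (double f : features) {
            line.add(String.format(Locale.ROOT, "%.6f", f));
        }
        // Ideal columns: one-hot encoding of the spoken digit.
        for (int d = 0; d < DIGITS; d++) {
            line.add(d == digit ? "1" : "0");
        }
        return line.toString();
    }
}
```

For example, row(new double[] {0.5, -1.25}, 2) yields "0.500000,-1.250000,0,0,1,0,0,0,0,0,0,0". Fixing the locale matters in practice: on a system with a comma decimal separator, locale-default formatting would silently break the CSV.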

The most successful FeatureExtractor is MFCC (Mel-Frequency Cepstral Coefficients).
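How the project's own MFCC extractor is implemented is not described here. The sketch below shows the textbook core of MFCC for a single frame, assuming the power spectrum of an already windowed frame is given: triangular filters spaced evenly on the mel scale (mel = 2595 * log10(1 + f / 700)), the log of each filter's energy, then a DCT-II to decorrelate the log energies. Mfcc, compute and its parameters are hypothetical names; steps such as pre-emphasis, windowing, the FFT itself, or liftering are left out.

```java
// Hypothetical MFCC sketch for one frame; not the project's FeatureExtractor.
final class Mfcc {

    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * powerSpectrum: |FFT|^2 for bins 0..N/2 (length N/2 + 1) of one windowed frame.
     * sampleRate in Hz; numFilters mel bands; numCoeffs cepstral coefficients kept.
     */
    static double[] compute(double[] powerSpectrum, double sampleRate, int numFilters, int numCoeffs) {
        int bins = powerSpectrum.length;
        int fftSize = 2 * (bins - 1);

        // Filter edges equally spaced in mel from 0 Hz to Nyquist, as fractional FFT bins.
        double maxMel = hzToMel(sampleRate / 2.0);
        double[] edgeBins = new double[numFilters + 2];
        for (int i = 0; i < edgeBins.length; i++) {
            double hz = melToHz(maxMel * i / (numFilters + 1));
            edgeBins[i] = hz * fftSize / sampleRate;
        }

        // Triangular filterbank: log of the weighted energy in each band.
        double[] logEnergy = new double[numFilters];
        for (int m = 0; m < numFilters; m++) {
            double left = edgeBins[m];
            double center = edgeBins[m + 1];
            double right = edgeBins[m + 2];
            double sum = 0.0;
            for (int k = 0; k < bins; k++) {
                double w = 0.0;
                if (k > left && k <= center) {
                    w = (k - left) / (center - left);   // rising edge
                } else if (k > center && k < right) {
                    w = (right - k) / (right - center); // falling edge
                }
                sum += w * powerSpectrum[k];
            }
            logEnergy[m] = Math.log(sum + 1e-12); // small floor avoids log(0)
        }

        // DCT-II of the log energies gives the cepstral coefficients.
        double[] coeffs = new double[numCoeffs];
        for (int n = 0; n < numCoeffs; n++) {
            double acc = 0.0;
            for (int m = 0; m < numFilters; m++) {
                acc += logEnergy[m] * Math.cos(Math.PI * n * (m + 0.5) / numFilters);
            }
            coeffs[n] = acc;
        }
        return coeffs;
    }
}
```

One property that helps explain why MFCCs work well for this task: scaling the whole spectrum by a constant (a louder or quieter speaker) only shifts the first coefficient, because the DCT basis vectors for n >= 1 sum to zero across the filters. The remaining coefficients describe the spectral shape, which is what distinguishes one digit from another.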

Using the neural network

The second step is to run the recognizer (either from NetBeans or with the appropriate command). The recognizer requires the FeatureExtractor parameters file as well as the file with the trained neural network. The GUI is fairly self-explanatory.
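Once the trained network has produced its outputs for a recording, the recognizer has to turn them into a digit. The README does not say how this decision is made; a minimal sketch, assuming one output neuron per digit, is to pick the strongest output. In Encog 3, BasicNetwork.compute(MLData) returns an MLData whose getData() gives such an array, if we read the API correctly. The minConfidence rejection threshold and the DigitDecision name are hypothetical additions for illustration.

```java
// Hypothetical decision step: maps network outputs (one per digit) to a digit.
final class DigitDecision {

    /**
     * Returns the index of the strongest output, or -1 when even the best output
     * stays below minConfidence (a hypothetical "not sure" rejection rule).
     */
    static int decide(double[] networkOutput, double minConfidence) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < networkOutput.length; i++) {
            if (networkOutput[i] > bestValue) {
                bestValue = networkOutput[i];
                best = i;
            }
        }
        return bestValue >= minConfidence ? best : -1;
    }
}
```

For example, outputs {0.1, 0.05, 0.9, 0.2} with a threshold of 0.5 give digit 2, while {0.1, 0.3, 0.2} give -1. Rejecting low-confidence outputs lets the GUI report "not recognized" instead of forcing a guess on silence or noise.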

About

Speech (digits only) recognition system
