GitHub - rampa069/PhnRec: Phoneme recognizer based on long temporal context (with ALIZE VAD command added)

The phoneme recognizer was developed at Brno University of Technology, Faculty of Information Technology, and has been successfully applied to tasks including language identification [4], indexing and search of audio records, and keyword spotting [5]. This distribution is intended mainly for research. The output of the recognizer can serve as a baseline for subsequent processing, such as phonotactic language modeling.


Authors:
Petr Schwarz, Pavel Matejka, Lukas Burget, Ondrej Glembek

Description
Split temporal context (STC) [1, 2, 3] based feature extraction
Neural network classifiers
Viterbi algorithm is used for phoneme string decoding
The English system was trained on the TIMIT database
Czech, Hungarian and Russian systems were trained on the SpeechDat-E databases



Compilation:
The source code has been successfully compiled under Linux (GCC) and under Windows (MinGW32). The program can be compiled with or without BLAS (Basic Linear Algebra Subprograms) acceleration. When BLAS is enabled, ATLAS (Automatically Tuned Linear Algebra Software) is used as the implementation.
Compilation under Linux with BLAS support:
make -f makefile.lin
Compilation under Linux without BLAS support:
make -f makefile_noblas.lin
Compilation under Windows with BLAS support:
make -f makefile.win
Compilation under Windows without BLAS support:
make -f makefile_noblas.win

How to:
set the recognition system:
phnrec -c PHN_CZ_SPDAT_LCRC_N1500|PHN_HU_SPDAT_LCRC_N1500|PHN_RU_SPDAT_LCRC_N1500|PHN_EN_TIMIT_LCRC_N500
set the input format:
phnrec -c PHN_EN_TIMIT_LCRC_N500 -w alaw|lin16
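For readers unfamiliar with the two input formats, the following sketch shows how 8-bit A-law codes relate to 16-bit linear samples, using the standard G.711 expansion. It is only an illustration: the function names are hypothetical, and whether phnrec expects headerless data or a particular byte order is an assumption not confirmed by this README.

```python
import struct


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55                     # undo the even-bit inversion applied by the encoder
    mantissa = (code & 0x0F) << 4    # 4-bit mantissa, shifted into place
    segment = (code & 0x70) >> 4     # 3-bit segment (exponent)
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def alaw_bytes_to_lin16(data: bytes) -> bytes:
    """Convert headerless A-law bytes to 16-bit PCM.

    Little-endian output is an assumption made for this sketch.
    """
    samples = [alaw_to_linear(b) for b in data]
    return struct.pack("<%dh" % len(samples), *samples)
```

Each A-law byte expands to one 16-bit sample, so the converted data is twice the size of the input.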
set input and output files:
The output is an HTK label file or a Master Label File (MLF). The input is either a single speech file or a list of files. The recognizer can also save intermediate results such as Mel-banks or posteriors. Saving intermediate results can, for example, significantly speed up tuning of the word insertion penalty.
phnrec -c PHN_EN_TIMIT_LCRC_N500 -l list -m out.mlf
#!MLF!#
"*/faem0.rec"
000000 1300000 pau
1300000 2000000 ah
2000000 3500000 s
3500000 4500000 ih
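Downstream tools usually need this output as timed segments. The sketch below reads MLF text of the shape shown above and converts the times to seconds, relying on the HTK convention that label times are expressed in units of 100 ns. The function name `parse_mlf` is hypothetical, and the sketch assumes each label line holds a start time, an end time and a label.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def parse_mlf(text):
    """Return {file_pattern: [(start_s, end_s, label), ...]} from MLF text."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue                      # skip blank lines and the MLF header
        if line.startswith('"') and line.endswith('"'):
            current = line.strip('"')     # start of a new per-file entry
            records[current] = []
        elif line == ".":
            current = None                # a lone period closes an entry
        elif current is not None:
            parts = line.split()
            if len(parts) >= 3:
                start, end, label = parts[:3]
                records[current].append(
                    (int(start) / HTK_UNITS_PER_SECOND,
                     int(end) / HTK_UNITS_PER_SECOND,
                     label)
                )
    return records
```

With the sample above, the second segment of `*/faem0.rec` is the label `ah` from 0.13 s to 0.2 s.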

phnrec -c PHN_EN_TIMIT_LCRC_N500 -i input.raw -o output.rec
change the word (phoneme) insertion penalty:
phnrec -c PHN_EN_TIMIT_LCRC_N500 -i input.raw -o output.rec -p -3.0
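To give an intuition for what the insertion penalty does, here is a deliberately simplified Viterbi sketch. It is not the decoder shipped in this distribution: it assumes one state per phoneme, per-frame log scores as input, and a penalty that is added in the log domain every time the path enters a different phoneme. The names `decode` and `segments` are hypothetical.

```python
def decode(log_post, penalty):
    """Best phoneme index per frame for a one-state-per-phoneme model.

    log_post: list of frames, each a list of per-phoneme log scores.
    penalty:  value added whenever the path switches to another phoneme.
    """
    n = len(log_post[0])
    score = list(log_post[0])
    back = []
    for frame in log_post[1:]:
        new_score, ptr = [], []
        for k in range(n):
            # Staying in k is free; arriving from any other phoneme costs `penalty`.
            best, j = max(
                (score[j] + (0.0 if j == k else penalty), j) for j in range(n)
            )
            new_score.append(best + frame[k])
            ptr.append(j)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    k = max(range(n), key=lambda j: score[j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path


def segments(path):
    """Collapse a frame-level path into (phoneme, start_frame, end_frame) runs."""
    out = []
    for t, k in enumerate(path):
        if out and out[-1][0] == k:
            out[-1] = (k, out[-1][1], t + 1)
        else:
            out.append((k, t, t + 1))
    return out
```

In this toy model a more negative penalty suppresses short, weakly supported phonemes (fewer insertions, more deletions), while a value closer to zero lets the path follow the frame-level scores more closely.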
Systems:
PHN_CZ_SPDAT_LCRC_N1500 - 8 kHz, 2-block STC, trained on Czech SpeechDat-E, 15 banks, 31 points, a DCT is applied to each temporal vector to reduce its size to 11 values, 1500 neurons in all nets
PHN_HU_SPDAT_LCRC_N1500 - 8 kHz, 2-block STC, trained on Hungarian SpeechDat-E, 15 banks, 31 points, a DCT is applied to each temporal vector to reduce its size to 11 values, 1500 neurons in all nets
PHN_RU_SPDAT_LCRC_N1500 - 8 kHz, 2-block STC, trained on Russian SpeechDat-E, 15 banks, 31 points, a DCT is applied to each temporal vector to reduce its size to 11 values, 1500 neurons in all nets
PHN_EN_TIMIT_LCRC_N500 - 16 kHz, 2-block STC, trained on TIMIT, 15 banks, 31 points, a DCT is applied to each temporal vector to reduce its size to 11 values, 500 neurons in all nets
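The dimensionality reduction mentioned in these descriptions can be illustrated with a small sketch: for every Mel band, the energies over time form a temporal vector, and a DCT keeps only the first 11 coefficients. The code uses a plain, unnormalized DCT-II; the exact DCT scaling, any windowing, and how the 31 points are divided between the two blocks are not specified in this README, so the vector length is left as a parameter and the function names are hypothetical.

```python
import math


def dct_reduce(vector, n_coeffs=11):
    """First n_coeffs coefficients of an unnormalized DCT-II of `vector`."""
    n = len(vector)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(vector))
        for k in range(n_coeffs)
    ]


def reduce_bands(band_trajectories, n_coeffs=11):
    """Apply the DCT reduction to the temporal vector of every band.

    band_trajectories: one list per Mel band, holding that band's
    energies over the temporal context.
    """
    return [dct_reduce(traj, n_coeffs) for traj in band_trajectories]
```

With 15 bands and 11 coefficients per band, one set of temporal vectors is reduced to 15 x 11 = 165 values.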

Note: The Czech, Hungarian and Russian SpeechDat systems were used in NIST LRE2005.
Results obtained with these systems may differ slightly from the published ones because of implementation differences.
Licence:
Source code and binaries can be redistributed and/or modified under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Model files (directories PHN_CZ_SPDAT_LCRC_N1500, PHN_HU_SPDAT_LCRC_N1500, PHN_RU_SPDAT_LCRC_N1500, PHN_EN_TIMIT_LCRC_N500) can be used for research and educational purposes only. For any other use, please contact Jan Cernocky.