Tools to create acoustic models #26

Open
pdtwonotes opened this issue Apr 14, 2016 · 4 comments

Comments

@pdtwonotes

I need to create an acoustic model for Julius in English. The README says that Julius will accept AMs either in HTK format or in ARPA format.

The HTK toolkit looks very old; the last website update was in 2009. I cannot get it to build properly on my x86-64 Linux system.

What tools are there for creating models in ARPA format?

While VoxForge claims to have English models for Julius, that website cannot actually deliver the files: the download speed slows to zero.

@colbec

colbec commented Apr 14, 2016

Make sure you have the right site for HTK (http://htk.eng.cam.ac.uk/). There is a note there about the 3.5 beta version, released in Dec 2015, so your source seems to be out of date.

I run openSUSE Leap 42.1 64-bit and have no problem compiling HTK. You might have to specify what error you see.

There are several tools for building models in ARPA format and for converting between formats. They tend to have minor differences in output, so the choice depends on what you are looking for. Try a Google search for "language model generator."

VoxForge is occasionally slow, but right now it is OK for me. I selected a 4 MB file from the downloads section and it completed in 16 seconds on my slow connection.

@palles77

I agree with colbec. HTK is old in some places; however, there is a beta version available. I have been using HTK for years now for both language and acoustic modeling. The best way is to follow the HTK tutorials provided by VoxForge for acoustic modeling and the HTK tutorials for language modeling.

You need to be aware that creating a decent acoustic model is a non-trivial process, and you need to consider how much effort you are prepared to put into it. I myself have a few UK English models from my own experiments in the past, but their quality is not the best (around 25% WER).
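For readers unfamiliar with the metric, WER (word error rate) is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration of that definition; the function name and the example sentences are made up, and it is not the scoring tool used for the models mentioned above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of four.
print(word_error_rate("open the front door", "open the back door"))  # 0.25
```

A WER of 25% therefore means roughly one word-level error for every four reference words.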

@pdtwonotes
Author

Since colbec reported that VoxForge downloads worked, on a hunch I created a VPN tunnel out of my local area and tried again. I was able to download the English model in just a few seconds. So something is wrong with my local ISP.

At first glance the VoxForge model appeared to work, and a quite large pronunciation dictionary was included. Unfortunately, the hmmdef file is missing many of the triphones that the dictionary uses.
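One way to see which models are missing is to expand each dictionary pronunciation into triphone names and compare them with the models declared in the HMM definitions file. The sketch below is a minimal illustration under several assumptions, not the VoxForge or Julius procedure: dictionary lines are assumed to look like `WORD [OUTPUT] ph1 ph2 ...`, models are assumed to be declared with HTK-style `~h "name"` macros, triphones are written `L-C+R`, and only word-internal context is considered. The sample dictionary and model names are invented. If the model ships with a tied list that maps unseen triphones onto trained ones, that mapping would also need to be consulted before treating a name as truly missing.

```python
import re


def word_internal_triphones(phones):
    """Expand a phone sequence into word-internal triphone names (L-C+R)."""
    names = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        names.append(f"{left}{phone}{right}")
    return names


def dictionary_triphones(dict_text):
    """Collect every triphone name needed by the dictionary entries."""
    needed = set()
    for line in dict_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        phones = fields[1:]
        # Assumed format: an optional [OUTPUT] field follows the word.
        if phones[0].startswith("[") and phones[0].endswith("]"):
            phones = phones[1:]
        needed.update(word_internal_triphones(phones))
    return needed


def defined_models(hmmdefs_text):
    """Collect model names declared with ~h "name" macros."""
    return set(re.findall(r'^~h\s+"([^"]+)"', hmmdefs_text, flags=re.MULTILINE))


# Invented sample data, for illustration only.
sample_dict = "\n".join([
    "HELLO [HELLO] hh ax l ow",
    "CAT [CAT] k ae t",
])
sample_hmmdefs = "\n".join([
    '~h "hh+ax"',
    '~h "hh-ax+l"',
    '~h "ax-l+ow"',
    '~h "l-ow"',
    '~h "k+ae"',
])

missing = sorted(dictionary_triphones(sample_dict) - defined_models(sample_hmmdefs))
print(missing)
```

With the real files, the two sample strings would be replaced by the contents of the dictionary and the definitions file read from disk.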

@colbec

colbec commented Apr 21, 2016

One of the downsides of a phone-based model is that the number of possible triphones is of the order of N^3; in English this might mean 40^3, or 64,000 triphone candidates. It is really hard to exercise them all, or even just the most commonly used ones, unless you are working with a very large audio database. This is made harder by trying to get a wide variety of voices. Sometimes you can bend your requirements for additional words by substituting phones that the model is aware of.
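The arithmetic can be checked with a short sketch. The phone count of 40 comes from the comment above; the list of observed triphones is an invented stand-in for whatever a real training corpus would yield, and the function names are hypothetical.

```python
def triphone_candidates(n_phones: int) -> int:
    """Upper bound on distinct triphones: any left, center, and right phone."""
    return n_phones ** 3


def coverage(observed_triphones, n_phones: int) -> float:
    """Fraction of candidate triphones that appear at least once."""
    return len(set(observed_triphones)) / triphone_candidates(n_phones)


print(triphone_candidates(40))  # 64000

# Invented observations: three tokens, but only two distinct triphones.
observed = ["hh-ax+l", "hh-ax+l", "ax-l+ow"]
print(coverage(observed, 40))
```

Many of the N^3 combinations never occur in real speech, so this count is an upper bound rather than the number of models a usable system needs.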

3 participants