SPINDLE speech-to-text automatic transcription

We used CMU Sphinx4 during the SPINDLE project to generate automatic transcriptions of our podcasts. The instructions below describe how to set up CMU Sphinx4 in Large Vocabulary Continuous Speech Recognition (LVCSR) mode.

  • Download and install the source version of CMU Sphinx4 to $SPHINX_INSTALL_DIRECTORY. Installation instructions are available on the CMU Sphinx website, and the CMU Sphinx forums are the place to ask for help.

  • Download the HUB4 acoustic and language models.

  • Copy the HUB4 acoustic models to the $SPHINX_INSTALL_DIRECTORY/models/acoustic/ directory and the HUB4 language models to the $SPHINX_INSTALL_DIRECTORY/models/language/ directory.

  • Download the CMUdict dictionary, version cmudict.0.7a_SPHINX_40, and copy the file to $SPHINX_INSTALL_DIRECTORY/models/dictionary/.

  • Modify src/apps/edu/cmu/sphinx/demo/transcriber/Transcriber.java so that it shows the time stamps for each word in the automatic transcription:


          String resultText = result.getBestResultNoFiller();

          // Print each recognized word together with its time stamps.
          if (result != null) {
              System.out.println(result.getTimedBestResult(true, true));
          }

  • Download config.xml and copy it to src/apps/edu/cmu/sphinx/demo/transcriber/config.xml

  • Compile from the installation directory:

  • Run the Transcriber.jar program from $SPHINX_INSTALL_DIRECTORY:

      java -mx800m -jar bin/Transcriber.jar file.wav

  • The audio .wav file should be 16 kHz, 16-bit, 1 channel (mono), little-endian signed integer (LPCM).
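
Before running the transcriber it can be useful to confirm that a file really has the format listed above. The following is a minimal sketch, not part of Sphinx4 or of the SPINDLE tools, that uses the standard javax.sound.sampled API. The class name WavFormatCheck and the method isExpectedFormat are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: checks a .wav file against the format expected above.
public class WavFormatCheck {

    /** True if the format is 16 kHz, 16-bit, mono, little-endian signed PCM. */
    public static boolean isExpectedFormat(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck file.wav");
            return;
        }
        // Read only the header; the audio data itself is not decoded.
        AudioFormat format =
                AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        System.out.println("Detected format: " + format);
        System.out.println(isExpectedFormat(format)
                ? "File matches the expected format."
                : "File needs to be converted before transcription.");
    }
}
```

A file that fails the check would have to be resampled or converted with an audio tool of your choice before being passed to Transcriber.jar.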


  • The configuration may not be optimal for your data. Depending on your task, you may want to adjust some of the parameters (beam widths, language model weight, word insertion penalty, etc.).

  • We also used other models to generate our automatic transcriptions, such as a British English dictionary. If you are interested in them, please get in touch with us.
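
The time-stamped output printed by the modified Transcriber.java can be post-processed into structured records, for example to build captions or a keyword index. The sketch below assumes that each token in the output has the form word(start,end) with times in seconds. That format is an assumption you should verify against the output of your own Sphinx4 version. The class TimedResultParser, its TimedWord record type and the sample string are hypothetical and used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses tokens of the ASSUMED form word(start,end).
public class TimedResultParser {

    /** One recognized word with its start and end time in seconds. */
    public static final class TimedWord {
        public final String word;
        public final double start;
        public final double end;

        TimedWord(String word, double start, double end) {
            this.word = word;
            this.start = start;
            this.end = end;
        }
    }

    // A number such as 0.5, -1.0 or 1.0E-4.
    private static final String NUM = "-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?";

    // A run of non-space characters followed by "(start,end)".
    private static final Pattern TOKEN =
            Pattern.compile("(\\S+?)\\((" + NUM + "),(" + NUM + ")\\)");

    /** Returns the timed words found in one line of transcriber output. */
    public static List<TimedWord> parse(String timedResult) {
        List<TimedWord> words = new ArrayList<>();
        Matcher m = TOKEN.matcher(timedResult);
        while (m.find()) {
            words.add(new TimedWord(m.group(1),
                    Double.parseDouble(m.group(2)),
                    Double.parseDouble(m.group(3))));
        }
        return words;
    }

    public static void main(String[] args) {
        // Made-up sample line, not real recognizer output.
        String sample = "hello(0.5,0.9) world(1.2,1.75)";
        for (TimedWord w : parse(sample)) {
            System.out.printf("%-10s %.2f %.2f%n", w.word, w.start, w.end);
        }
    }
}
```

Tokens that do not match the assumed pattern are skipped silently, so it is worth comparing the number of parsed words with the raw output when first trying this on real transcripts.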



