Markus Toman edited this page Jan 4, 2016 · 7 revisions

SALB

Usage

API

Code sample:

#include <memory>
#include <string>

#include "TTSManager.h"

using namespace htstts;

int main() {
   std::string input = "Hello world.";
   TTSManager tts;

   // properties select the synthesizer, text analyzer, voice and language
   FragmentPropertiesPtr properties = std::make_shared<FragmentProperties>();
   (*properties)[PROPERTY_KEY_SYNTHESIZER] = PROPERTY_VALUE_HTSENGINE;
   (*properties)[PROPERTY_KEY_TEXTANALYZER] = PROPERTY_VALUE_AUTOMATIC;
   (*properties)[PROPERTY_KEY_TEXTANALYZER_RULES] = "leo.rules";
   (*properties)[PROPERTY_KEY_VOICE_PATH]  = "leo.htsvoice";
   (*properties)[PROPERTY_KEY_LANGUAGE]    = "de-at";
   (*properties)[PROPERTY_KEY_VOICE_NAME]  = "Leo";

   // create a text fragment with given input text and properties
   TextFragmentPtr tf = std::make_shared<TextFragment>(input, properties);
   // synthesize the fragment and write the result to a RIFF (WAV) file
   TTSResultPtr result = tts.SynthesizeTextFragment(tf);
   save_result_riff(result, "out.wav");

   return 0;
}

Possible property key-value string pairs can be found in engine/manager/include/common.h:

#define PROPERTY_KEY_SYNTHESIZER "synthesizer"      ///< property defining which synthesizer to use
#define PROPERTY_KEY_TEXTANALYZER "tanalyzer"       ///< property defining which text analyzer to use
#define PROPERTY_KEY_TEXTANALYZER_RULES "trules"    ///< property defining path to a file with text analysis rules
#define PROPERTY_KEY_LANGUAGE "lang"                ///< property defining the language of the text fragment
#define PROPERTY_KEY_VOICE_NAME "vName"             ///< property defining the name of the voice to use
#define PROPERTY_KEY_VOICE_PATH "vPath"             ///< property defining the path to the voice model
#define PROPERTY_KEY_VOLUME "vol"                   ///< property defining the synthesis volume (0-100)
#define PROPERTY_KEY_RATE   "rate"                  ///< property defining the synthesis speaking rate (~0.5-2.0)
#define PROPERTY_KEY_PITCH  "pitch"                 ///< property defining the synthesis pitch (0.0 - no change)

#define PROPERTY_VALUE_AUTOMATIC "automatic"        ///< property value for automatic choices
#define PROPERTY_VALUE_FLITE "flite"                ///< property value for flite as text analyzer
#define PROPERTY_VALUE_INTERNAL "internal"          ///< property value for using the internal text analyzer
#define PROPERTY_VALUE_HTSENGINE "htsengine"        ///< property value for hts_engine as synthesis engine

Notes:

  • PROPERTY_KEY_SYNTHESIZER selects the synthesizer to use. Currently the only supported value is PROPERTY_VALUE_HTSENGINE for hts_engine.
  • PROPERTY_KEY_TEXTANALYZER selects the text analyzer to use. Currently this can be either PROPERTY_VALUE_FLITE to use flite or PROPERTY_VALUE_INTERNAL to use the internal text analyzer. PROPERTY_VALUE_AUTOMATIC selects either flite or the internal text analyzer based on the language used.
  • PROPERTY_KEY_TEXTANALYZER_RULES is only needed for the internal text analyzer, because flite contains everything needed for English in code.
  • PROPERTY_KEY_LANGUAGE is a language shortcut. Examples: "en-us", "en-gb", "de-at".
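
The automatic text analyzer choice described in the notes can be pictured with a small sketch. This is not the SALB implementation: ChooseTextAnalyzer is a hypothetical helper, and it assumes that every language shortcut starting with "en-" (compared case-sensitively) is handled by flite while everything else goes to the internal text analyzer. The returned strings are the values of PROPERTY_VALUE_FLITE and PROPERTY_VALUE_INTERNAL from common.h.

```cpp
#include <string>

// Hypothetical helper, not part of the SALB API. It mirrors the documented
// behaviour of PROPERTY_VALUE_AUTOMATIC under the assumption that "en-*"
// language shortcuts map to flite and all others to the internal analyzer.
std::string ChooseTextAnalyzer(const std::string& language) {
    const std::string englishPrefix = "en-";
    // compare() returns 0 only if the shortcut starts with exactly "en-"
    if (language.compare(0, englishPrefix.size(), englishPrefix) == 0) {
        return "flite";     // value of PROPERTY_VALUE_FLITE
    }
    return "internal";      // value of PROPERTY_VALUE_INTERNAL
}
```

With this sketch, "en-us" and "en-gb" would select flite, while "de-at" would select the internal text analyzer and therefore also require PROPERTY_KEY_TEXTANALYZER_RULES to be set.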

Creating and using a custom voice

A straightforward approach is to use the "Speaker dependent training demo/English/Normal demo" package from HTS 2.3 alpha (direct download: http://hts.sp.nitech.ac.jp/archives/2.3alpha/HTS-demo_CMU-ARCTIC-SLT.tar.bz2 ). Training with it produces an htsvoice voice model file that can be used with the SALB system. Please note that the phone set for English should be the same as the one used in the demo package, so that the voice works with the flite text analysis.

Adding a new language

Voice model

A voice model for a new language can be trained as described previously. It is important to use the label format as specified in the HTS demo packages (you can also find it here).

Text analysis rules

The SALB system uses lexica and LTS rules in Festival format. Take a look at the provided data/leo.rules for reference.

The first line is always "MNCL", then the lexicon follows. The last lexicon entry is followed by a line containing only "ENDLEX".

The following LTS rules have to be in the form:

;; LTS rules 
(set! XYZ '
... RULES ...
)

as produced by several helper tools for Festival.

Text normalization

Once you have the model and the text rules file, you can already try to synthesize a sentence using the command line interface. Make sure to use a language shortcut that does not match "en-*", so that the internal text analyzer is used instead of flite. Given an unknown language shortcut, the text normalization just passes each word through without modification.

A good starting point for adding a normalizer for your language is to copy text/internal/src/AustrianGermanNormalizer.cpp and text/internal/include/AustrianGermanNormalizer.h, adapt them for your language and add the call to text/internal/src/Normalizer.cpp. In principle you can call any function or method from text/internal/src/Normalizer.cpp; you only have to return the normalized string from Normalizer::Normalize if the language shortcut matches your new language.
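
The dispatch idea can be sketched as follows. All names here are hypothetical and chosen for illustration only: the real signature of Normalizer::Normalize and the interface of AustrianGermanNormalizer may differ, "xx-yy" is a placeholder language shortcut, and the single percent-sign rule merely stands in for real normalization logic.

```cpp
#include <string>

// Hypothetical language-specific normalizer. As a placeholder rule it
// expands every '%' character to the word " percent".
std::string NormalizeMyLanguage(const std::string& text) {
    std::string result;
    for (char c : text) {
        if (c == '%') {
            result += " percent";
        } else {
            result += c;
        }
    }
    return result;
}

// Sketch of the dispatch step: return the normalized string when the
// language shortcut matches the new language, otherwise pass the text
// through without modification, as described for unknown shortcuts.
std::string NormalizeSketch(const std::string& language,
                            const std::string& text) {
    if (language == "xx-yy") {      // placeholder language shortcut
        return NormalizeMyLanguage(text);
    }
    return text;
}
```

Keeping the language-specific rules in their own function, as the Austrian German normalizer does with its own source and header file, means only the one matching branch has to be added to the shared dispatch code.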
