SSML consists of XML-like tags, for example:

    Did you mean the <emphasis level="strong"><prosody pitch="75">green</prosody></emphasis> beans?

The following markup tags and attributes are recognised:
- speak
  - xml:base (the value is just passed back as a parameter with the UriCallback() function)
  - xml:lang
- voice
  - xml:lang
  - name
  - age
  - variant
  - gender
- prosody
  - rate (x-slow, slow, medium, fast, x-fast or a percentage such as 125%)
  - volume (silent, x-soft, soft, medium, loud, x-loud, +1dB or -1dB)
  - pitch (a number, for example "75")
  - range (default, x-low, low, medium, high, x-high)
- say-as
  - interpret-as="characters"
  - interpret-as="characters" format="glyphs"
  - interpret-as="tts:key"
  - interpret-as="tts:char"
  - interpret-as="tts:digits"
- mark
  - name
- s
  - xml:lang
- p
  - xml:lang
- sub
  - alias
- tts:style
  - field="punctuation" mode=none,all,some
  - field="capital_letters" mode=no,spelling,icon,pitch
- audio
  - src
- emphasis
  - level (none, reduced, moderate, strong or x-strong)
- break
  - strength
  - time
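
As an illustration of how such markup might be assembled, the following is a minimal Python sketch that builds an SSML string from some of the attributes listed above. It assumes the standard SSML element names speak, emphasis, prosody and break. The helper names `prosody` and `build_ssml` are hypothetical and are not part of eSpeak; the sketch only produces text and does not call the synthesizer.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def prosody(text, **attrs):
    """Wrap text in a <prosody> element (hypothetical helper, not an eSpeak API)."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    attr_str = "".join(f" {name}={quoteattr(str(value))}" for name, value in attrs.items())
    # escape() protects characters such as < and & in the spoken text.
    return f"<prosody{attr_str}>{escape(text)}</prosody>"


def build_ssml(parts, lang="en"):
    """Join already-escaped fragments inside a <speak> root element."""
    return f"<speak xml:lang={quoteattr(lang)}>{''.join(parts)}</speak>"


ssml = build_ssml([
    escape("Did you mean the "),
    '<emphasis level="strong">' + prosody("green", pitch=75) + "</emphasis>",
    escape(" beans?"),
    '<break time="500ms"/>',
])

# Sanity check: the result should parse as well-formed XML.
ET.fromstring(ssml)
print(ssml)
```

The resulting string could then be given to eSpeak with markup interpretation enabled, for example through the `-m` option of the command-line program.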
eSpeak can speak HTML text directly, or text containing both SSML and HTML markup.
Any unrecognised tags are ignored.
The following tags cause a sentence break:
br, dd, li, img, td
The following tags cause a paragraph break:
h1, h2, h3, h4, hr
Text between the following tags is ignored:
script, style
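
The HTML rules above can be sketched in Python with the standard `html.parser` module. This is only an illustration of the described behaviour, not eSpeak's own implementation: the class name `BreakMarker` and the event labels are hypothetical, and recording a break at the position of the opening tag is a simplifying assumption.

```python
from html.parser import HTMLParser

SENTENCE_BREAK = {"br", "dd", "li", "img", "td"}
PARAGRAPH_BREAK = {"h1", "h2", "h3", "h4", "hr"}
IGNORED_CONTENT = {"script", "style"}


class BreakMarker(HTMLParser):
    """Collect text and break events according to the rules listed above."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._skip_depth = 0  # greater than zero while inside script or style

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_CONTENT:
            self._skip_depth += 1
        elif tag in PARAGRAPH_BREAK:
            self.events.append(("paragraph_break", tag))
        elif tag in SENTENCE_BREAK:
            self.events.append(("sentence_break", tag))
        # Any other tag is treated as unrecognised and ignored.

    def handle_endtag(self, tag):
        if tag in IGNORED_CONTENT and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside script or style is dropped; other text is kept.
        if self._skip_depth == 0 and data.strip():
            self.events.append(("text", data.strip()))


def mark_breaks(html):
    """Return the list of text and break events for an HTML string."""
    parser = BreakMarker()
    parser.feed(html)
    parser.close()
    return parser.events
```

For example, `mark_breaks("<p>One<br>Two</p>")` yields a text event, a sentence break for `br`, and a second text event, while the unrecognised `p` tag produces no event of its own.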
- Speech Synthesis Markup Language (SSML) Version 1.0. W3C Recommendation, 3 March 2009. W3C.
- Speech Synthesis Markup Language (SSML) Version 1.1. W3C Recommendation, 7 September 2010. W3C.
- SSML 1.0 say-as attribute values. W3C NOTE, 26 May 2005. W3C.
- HTML 5.2. W3C Recommendation, 14 December 2017. W3C.
- HTML Living Standard. Continually updated. WHATWG.