Phonetic representation to speech #1865

RonBOakes · 2024-02-10T19:28:57Z

I am working on a text-to-speech related application that produces input in a phonetic representation. In my current implementation, I am using Amazon Polly. But I would like eSpeak-ng as the default (no-cost) option.

For now, I have (temporarily) given up on direct integration because I could not get my Windows .NET (Visual Studio 2022) project to reference the eSpeak-ng libraries properly after successfully building them. So I will call the espeak-ng executable directly, passing in command-line arguments.

I am looking for documentation, or at least informal help, on how to submit a string of IPA, X-SAMPA, Conlang X-SAMPA, or, if needed, another format.

FWIW, I did look back into the ancient (well several months old) history on my Linux box and found that I had conducted the following experiment:

cat sample.txt | lexconvert --phones2phones unicode-ipa espeak | espeak-ng -g 10 -w sample.wav

Looking at sample.txt, I see it contains: "[[ˈoʊˌsətɛi̯χ]] [[ˈoʊˌŋəkʊ]] [[ˈlikʏχkə]] [[ˈpœy̯ɣi]] [[ˈaːpɑmyɦ]] [[ˈpʊkəŋœy̯]]" -- IPA text enclosed in double square brackets. After running it through lexconvert with the output set to "espeak" I get: "[['oU,s@tEi:]] [['oU,N@kU]] [['li:kk@]] [['pi:]] [['A:pA:m]] [['pUk@N]]". If I change the output option to x-sampa, I get "oU"s@%tEi oU"N@%kU li"kk@ pi" a:"pAm pU"k@N".

(FWIW: I cannot use lexconvert because it does not accurately or completely translate IPA to X-SAMPA due to its primary writer's use of British English (I'm guessing Received Pronunciation) which lacks a full set of phonemes, notably vowels, and merges several. I will handle the conversion out of the IPA I use internally into whatever phonetic representation I need to feed espeak-ng in my code.)
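For illustration, the kind of conversion described here can be sketched as a small substitution table. This is only a rough sketch: `ipa_to_espeak` is a hypothetical helper name, and the table holds just the six symbol pairs that can be read off the first two words of the lexconvert sample above (ˈ, ˌ, ʊ, ə, ɛ, ŋ). Any further symbols would have to be checked against the espeak-ng phoneme documentation rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper (not part of espeak-ng or lexconvert): replace a few
# IPA symbols with the ASCII mnemonics seen in the sample output above.
# The table is deliberately incomplete; unknown symbols pass through unchanged.
ipa_to_espeak() {
    printf '%s\n' "$1" | sed \
        -e "s/ˈ/'/g" \
        -e 's/ˌ/,/g' \
        -e 's/ʊ/U/g' \
        -e 's/ə/@/g' \
        -e 's/ɛ/E/g' \
        -e 's/ŋ/N/g'
}

# Second word of the sample, without the [[ ]] wrapper.
ipa_to_espeak "ˈoʊˌŋəkʊ"    # prints: 'oU,N@kU
```

Because unmapped symbols are passed through untouched, gaps in the table show up in the output instead of being silently dropped.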

I would appreciate any assistance with:

Formatting the input to indicate that the words are in a phonetic representation
The best phonetic representation to use
Any command-line arguments
Any other advice

FYI: When I work with Amazon Polly, I utilize the SSML "phoneme" tag, but I note that this is not currently implemented in espeak-ng. Even though this project may end up including either a fork of espeak-ng, or work that gets submitted back to the project, I am expecting that work to be towards a limited-use "universal" voice that can support a broad set of phonemes beyond what any currently supported voice can support. I am unlikely to also have the time to implement support for the "phoneme" tag.

Ron Oakes (RonBOakes)
ron@ron-oakes.us
ro421@mynova.nsu.edu

jaacoppi · 2024-02-17T12:40:52Z

jaacoppi
Feb 17, 2024
Maintainer

Looking at sample.txt, I see it contains: "[[ˈoʊˌsətɛi̯χ]] [[ˈoʊˌŋəkʊ]] [[ˈlikʏχkə]] [[ˈpœy̯ɣi]] [[ˈaːpɑmyɦ]] [[ˈpʊkəŋœy̯]]" -- IPA text enclosed in double square brackets. After running it through lexconvert with the output set to "espeak" I get: "[['oU,s@tEi:]] [['oU,N@kU]] [['li:kk@]] [['pi:]] [['A:pA:m]] [['pUk@N]]". If I change the output option to x-sampa, I get "oU"s@%tEi oU"N@%kU li"kk@ pi" a:"pAm pU"k@N".

espeak-ng uses Kirshenbaum. That's what you've produced here. We also have an SSML implementation available with espeak-ng -m, but it's buggy and unreliable.
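As a rough sketch of how this could look from the command line: the helper below wraps each whitespace-separated word in double square brackets, matching the form of the lexconvert "espeak" output quoted above, and a second function hands the result to espeak-ng. Both function names (`wrap_phonemes`, `speak_phonemes`) are made up for this example, and the `en` default voice is an assumption. `-v` (voice) and `-w` (write a WAV file) are standard espeak-ng options.

```shell
#!/bin/sh
# Hypothetical helper: wrap every whitespace-separated word in [[ ]] so that
# espeak-ng reads it as phoneme mnemonics instead of ordinary text.
wrap_phonemes() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            printf "%s[[%s]]", (i > 1 ? " " : ""), $i
        print ""
    }'
}

# Hypothetical wrapper: $1 = mnemonics, $2 = output WAV path, $3 = voice.
# The "en" default is an assumption; pick whichever voice has the phonemes you need.
speak_phonemes() {
    espeak-ng -v "${3:-en}" -w "$2" "$(wrap_phonemes "$1")"
}

wrap_phonemes "'oU,N@kU 'pUk@N"    # prints: [['oU,N@kU]] [['pUk@N]]
```

Calling `speak_phonemes "'oU,N@kU 'pUk@N" sample.wav` would then write the audio to sample.wav, provided espeak-ng is on the PATH. Which mnemonics are accepted depends on the voice selected, so symbols outside that voice's phoneme set may be rejected or mispronounced.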
