Sunsetting of the “US English Female Text-to-speech” extension #355

ruffsl · 2023-11-04T18:26:08Z

After a few recent Chrome updates, I noticed that the offline TTS engine provided by Google in the “US English Female Text-to-speech” extension suddenly stopped working. At first, I figured it was one of the extension's regular audio glitches, where I have to either restart the browser or reinstall the extension to resume regular audio playback. However, after navigating to the Chrome extensions panel, I found this review notice pinned:

US English Female Text-to-speech (by Google)
On • This extension was unpublished by its developer

Subsequently, a 404 error is now returned when following the "View in Chrome Web Store" redirect:

Unpublished extension: https://chrome.google.com/webstore/detail/pkidpnnapnfgjhfhkpmjpbckkbaodldb

A quick search turns up this Google Support question posted by @ken107, now marked as a duplicate, with a sunset notice from a Chrome Support Manager announcing the extension's deprecation along with that of the Native Client (NaCl) support it relied upon.

While I don't recall the extension receiving any updates past 2017, it was fast, efficient, intelligible and pretty much feature complete, if a little flaky on the occasional startup. A big part of that was the TTS engine being entirely offline, using an LSTM (Long Short-Term Memory) variant of an RNN (Recurrent Neural Network). Although it still sounded slightly robotic, the monotone speech and hard consonant accent enabled the TTS engine to remain consistently intelligible even at high WPM (Words per Minute) counts, making it easier to quickly and reliably read long bodies of text when using the browser.

The old TTS engine was also helpful for copy editing: when listening to one's own typing, any grammatical issues were made audibly clear, as the robotic voice would phonetically butcher any misspelled words, or produce irregular intonation and pacing given any punctuation errors. I feel like this may be why others who still rely on TTS stick with older voice options, such as Microsoft Sam from 1998 (or others using classic Hidden Markov Models), regardless of how inanimate they may sound. I felt like “US English Female Text-to-speech” was a happy middle ground, while also conveniently remaining OS platform agnostic.

Meanwhile, the latest (and never stable) cloud-based TTS options tend to softly slur words or change phonetics over time, no longer announcing each syllable as clearly, while also pausing irregularly, which renders grammatical punctuation indistinguishable. For example, the pauses for commas and for periods are both awkwardly and absurdly long, and don't seem to scale in proportion with the user's WPM settings. Compounding this, it sounds like paragraphs or even sentences are being indiscriminately tokenized by the server, so that pauses in playback caused by punctuation and pauses caused by API latency become frustratingly indistinguishable as well.
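To make the "scale in proportion with the user's WPM settings" point concrete, here is a minimal sketch of what proportional pause scaling could look like. The reference rate, the baseline pause lengths and the `pauseMs` helper are all assumptions made up for illustration; they are not taken from any real TTS engine.

```typescript
// Hypothetical reference speaking rate (words per minute).
const REFERENCE_WPM = 180;

// Hypothetical baseline pause lengths in milliseconds at the reference rate.
const BASE_PAUSE_MS: { [mark: string]: number } = {
  ",": 200,
  ";": 300,
  ".": 450,
};

// Scale a punctuation pause inversely with the speaking rate, so that
// doubling the WPM halves the pause while commas stay shorter than periods.
function pauseMs(mark: string, wpm: number): number {
  if (wpm <= 0) {
    throw new RangeError("wpm must be positive");
  }
  const base = BASE_PAUSE_MS[mark];
  if (base === undefined) {
    return 0; // unknown marks get no extra pause
  }
  return Math.round(base * (REFERENCE_WPM / wpm));
}

console.log(pauseMs(",", 180)); // 200
console.log(pauseMs(".", 360)); // 225
```

With a scheme like this, the ratio between a comma pause and a period pause stays constant at every speed, which is what keeps punctuation audible at high WPM.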

In summary, many newer cloud-based TTS engines are:

- too slow for regular repeated use
  - due to inherent lag and latency with remote APIs
- unreliable for spontaneous use
  - due to the inherent always-online demand
- phonetically soft-spoken with inconsistent intonation
  - making it difficult to hear grammatical punctuation

So, would there be any interest in developing an extension with an alternate offline TTS engine? Perhaps we could reverse engineer the LSTM model binary (voice_lstm_en-US.zvoice) trained in the deprecated “US English Female Text-to-speech” extension, or bundle a newer neural network model, as long as the inference framework for speech synthesis is offline.

https://chromium-review.googlesource.com/c/chromiumos/platform/assets/+/443133
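As a rough illustration of the "offline" requirement, the sketch below filters a list of voices down to those synthesized on-device. The `VoiceInfo` interface mirrors the `name`, `lang` and `localService` fields of the Web Speech API's `SpeechSynthesisVoice`; the `offlineVoices` helper and the sample voice names are hypothetical, and this is not how Read Aloud or the retired extension selects voices.

```typescript
// Minimal voice shape, matching three fields of SpeechSynthesisVoice.
interface VoiceInfo {
  name: string;
  lang: string; // BCP 47 tag such as "en-US"
  localService: boolean; // true when synthesis runs on the device
}

// Keep only on-device voices whose language tag starts with the given prefix.
function offlineVoices(voices: VoiceInfo[], langPrefix: string): VoiceInfo[] {
  const prefix = langPrefix.toLowerCase();
  return voices.filter(function (v) {
    return v.localService && v.lang.toLowerCase().indexOf(prefix) === 0;
  });
}

// Hypothetical sample data; in a browser this list would come from
// speechSynthesis.getVoices().
const sampleVoices: VoiceInfo[] = [
  { name: "Local Voice A", lang: "en-US", localService: true },
  { name: "Cloud Voice B", lang: "en-US", localService: false },
  { name: "Local Voice C", lang: "de-DE", localService: true },
];

const picked = offlineVoices(sampleVoices, "en").map(function (v) {
  return v.name;
});
console.log(picked.join(", ")); // Local Voice A
```

Filtering on `localService` only tells us where synthesis happens; it says nothing about how intelligible a voice is at high WPM, so it would be a first step rather than a replacement for the retired engine.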

ken107 · 2023-11-05T00:23:08Z

Maintainer

Thank you for the fantastic writeup.

I have been digging into this as well. The "US English Female TTS" and the other voices from the ChromeVox code are so useful it's such a waste to just sunset them without providing any alternatives. I would love to distribute these voices with Read Aloud if it were possible.

So I did some digging. Unfortunately the LSTM/RNN stuff is out of my technical reach, and I don't know if there's enough context to make use of the data inside the .zvoice archive. The files are not in any standardized format, and I couldn't find anything about them on Google or ChatGPT. Reverse engineering this would need somebody of your caliber.

I was considering the alternative of somehow distributing the Native Client runtime as a Windows program that users would need to install. This might be possible by using sel_ldr.exe to load the NEXE into memory, then sending messages to it via sockets. Ultimately, though, its utility would be limited because (1) it's only for Windows, and (2) it's too much hassle to have to install a separate program.
https://www.chromium.org/nativeclient/life-of-sel_ldr/

If we can find the original developer of the ChromeVox code, that might help as well.
