Implement text-to-speech for reading articles #166

mossroy · 2016-01-21T20:57:16Z

Like on the Android version of Kiwix.

Using the Speech Synthesis API: https://caniuse.com/#feat=speech-synthesis

kasemmarifet · 2018-06-29T08:51:48Z

I'll try to work on this as part of GoogleServe.

Jaifroid · 2018-06-29T12:49:49Z

Hi @kasemmarifet, thanks for your welcome contribution. I'm just testing your PR. It works offline on Chromium and Firefox Quantum on Windows 10 for English-language ZIMs, so congratulations!

I can't get it working on Microsoft Edge or Internet Explorer. On Edge, the speech button appears, and pressing it appears to initiate something, but no sound comes out. When I turn it off, I briefly get a loudspeaker icon in the browser tab for the page, indicating that some audio was activated at some point, but, paradoxically, only on turning off the read button. And there is still no sound. The API is described here, with some examples of how to use it for Edge, so maybe you can adapt your code: https://blogs.windows.com/msedgedev/2016/06/01/introducing-speech-synthesis-api/

On Internet Explorer there is no button, but that is expected, as it is not fully HTML5 compliant. It seems to degrade gracefully (by not showing the button).

However, when I load a Spanish-language ZIM in Firefox, or a French ZIM in Chromium, the read button attempts to use the English-language text-to-speech engine to read the non-English text. I'm sure that can be remedied fairly easily.

In any case, this is a great base on which to build. Thank you.

kasemmarifet · 2018-06-29T14:08:59Z

Thanks Jaifroid,

I just pushed an update to the code. We discussed this with Kelson a little bit (cced):

If the browser doesn't have the API (like IE), the code should handle it and not show the button.
By default, the speech voice is the system one. We looked into reading the content language so that we could match the voice to it, but this seems to be missing from the JS library. So further work is needed to get the content language and load the correct voice based on it. I added a TODO for that.
For Edge: I'm not sure what the issue is there. From your description, it looks like there is an issue with the voice. Can you please try selecting a different voice from the dropdown?
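For readers following along, the first two points above (hiding the button when the API is absent, and matching a voice to the content language) could be sketched roughly as follows. This is not the code from the PR: the function names `isSpeechSynthesisSupported` and `pickVoiceForLanguage` are hypothetical, and the sketch assumes the content language is already available as a BCP 47 tag, which is exactly the part described as missing above.

```javascript
// Returns true when the given global object exposes the Web Speech synthesis API.
// The global is passed in (normally `window`) so the check can run outside a browser.
function isSpeechSynthesisSupported(globalObj) {
    return globalObj !== null && typeof globalObj === 'object' &&
        'speechSynthesis' in globalObj &&
        'SpeechSynthesisUtterance' in globalObj;
}

// Picks a voice for a BCP 47 language tag such as "fr" or "es-ES".
// Preference order: exact tag match, then same primary language subtag,
// then the voice flagged as the system default, then null.
// `voices` is an array of objects shaped like SpeechSynthesisVoice ({ lang, default, ... }).
function pickVoiceForLanguage(voices, langTag) {
    // Assumption: some platforms report tags with an underscore ("es_ES"), so normalise.
    var wanted = String(langTag || '').toLowerCase().replace(/_/g, '-');
    var primary = wanted.split('-')[0];
    var exact = null;
    var sameLanguage = null;
    var fallback = null;
    voices.forEach(function (voice) {
        var tag = String(voice.lang || '').toLowerCase().replace(/_/g, '-');
        if (!exact && wanted && tag === wanted) exact = voice;
        if (!sameLanguage && primary && tag.split('-')[0] === primary) sameLanguage = voice;
        if (!fallback && voice.default) fallback = voice;
    });
    return exact || sameLanguage || fallback || null;
}
```

In a browser, the list would come from `window.speechSynthesis.getVoices()`. That list can be empty until the `voiceschanged` event has fired, so the selection may need to be re-run from a handler for that event.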

Kasem

Jaifroid · 2018-06-29T17:07:32Z

@kasemmarifet @kelson42
Thank you for adding the language drop-down. Below is a screenshot of the languages that it shows for my system in Edge. However, none of these voices actually work in Edge (42.17134): there is no sound at all on pressing "play". As I say, it works fine in Firefox Quantum and Chromium. I'll try to do some debugging at the weekend, but let me know if there's anything specific I should be looking for. The following demo page works fine in the same install of Edge:

https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/speechsynthesis/

Maybe it has some clues about what's necessary.

Jaifroid · 2018-06-30T11:06:28Z

@kasemmarifet - update, it does work on Edge! What happens is that it takes 8 or 9 seconds (I didn't time it precisely) for the audio to start, at least on the size of article I was testing. I had assumed it wasn't working after about six seconds, since it starts much faster on FF and Chromium, and was turning it off. I wonder if we could add an intermediate, "Please wait..." type of message, maybe in a small overlay message box or on the play button itself which would get cancelled once the audio starts. If I fell into this trap, others might. People aren't very patient nowadays, ahem.
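One possible shape for the "Please wait..." idea, sketched under assumptions: the utterance's `start`, `end` and `error` events are part of the Speech Synthesis API, but the helper name `trackUtteranceState`, the button object and the label strings are hypothetical and not taken from the PR.

```javascript
// Ties the label of a read button to the lifecycle of an utterance.
// `button` is any object with `textContent` and `disabled` properties (for example a
// <button> element); `utterance` is a SpeechSynthesisUtterance or a compatible object.
function trackUtteranceState(button, utterance) {
    // Until the engine reports that audio has started, show a waiting state.
    button.textContent = 'Please wait...';
    button.disabled = true;
    utterance.onstart = function () {
        // Audio has actually begun: replace the waiting message with a Stop control.
        button.textContent = 'Stop';
        button.disabled = false;
    };
    utterance.onend = utterance.onerror = function () {
        // Reading finished or failed: return to the initial state.
        button.textContent = 'Read aloud';
        button.disabled = false;
    };
    return utterance;
}

// Browser usage (hypothetical `readButton` and `articleText`):
// var utterance = trackUtteranceState(readButton, new SpeechSynthesisUtterance(articleText));
// window.speechSynthesis.speak(utterance);
```

Because the waiting state is only cleared by the `start` event, a slow engine such as the one described above would keep showing the message for the whole delay rather than appearing to do nothing.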

Jaifroid · 2018-06-30T11:14:11Z

A minor point, but would you be able to put the different voices into a bootstrap split-button dropdown, like in the image below? It would save space rather than having a separate dropdown box (I realize the box was added to debug the voices, but it's useful to have the choices).

Jaifroid · 2018-06-30T11:30:30Z

One other thing: how difficult would it be to exclude certain types of text? I don't think it's desirable for it to read the infoboxes, for example. Or this might be a configuration option. It would also be useful to exclude footnote reference numbers, which are also currently read (and slow down the reading as a result). Any ideas on how best to achieve that?
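One way this might be approached is to clone the article, remove the unwanted nodes from the clone, and read what is left. The sketch below is only an illustration: the function name `getReadableText` is hypothetical, and the selectors `.infobox`, `.navbox` and `sup.reference` are assumptions based on common Wikipedia markup that would need checking against real ZIM content.

```javascript
// Selectors for content that should not be read aloud (assumed, not verified against ZIMs).
var DEFAULT_EXCLUDED_SELECTORS = ['.infobox', '.navbox', 'sup.reference'];

// Returns the text of `articleRoot` without the excluded elements.
// `articleRoot` is a DOM element (or anything offering cloneNode, querySelectorAll
// and textContent); `excludedSelectors` is an optional array of CSS selectors.
function getReadableText(articleRoot, excludedSelectors) {
    var selectors = excludedSelectors || DEFAULT_EXCLUDED_SELECTORS;
    // Work on a deep clone so the displayed article is left untouched.
    var clone = articleRoot.cloneNode(true);
    var unwanted = clone.querySelectorAll(selectors.join(','));
    Array.prototype.forEach.call(unwanted, function (node) {
        if (node.parentNode) node.parentNode.removeChild(node);
    });
    // Collapse the whitespace runs left behind by the removed nodes.
    return clone.textContent.replace(/\s+/g, ' ').trim();
}
```

Passing the selector list as a parameter would let a Configuration option decide, for instance, whether infoboxes are read.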

Jaifroid · 2018-07-04T07:17:01Z

Just to summarize a few things that I believe are required to make a complete and genuinely useful solution for this issue, building on @kasemmarifet's work. I'd be very happy to work on some of these:

Choose the correct voice for the natural language of the ZIM (see discussion in Add support of ZIM meta information reading #395), with the system default being the first choice if it matches the language of the ZIM; however, the user should be able to override the auto-selected voice (e.g., if they want a female voice rather than a male one, or vice versa);
Remember the chosen voice on a per-ZIM basis (cookie value);
Display the availability of the speech synthesis API in the API panel on the Configuration page; maybe also display the auto- or user-selected voice for the currently loaded ZIM; maybe we should also move the voice-selection options to the Configuration page;
Read only the main text of the article, not text in info boxes or nav boxes, or footnote reference numbers (reading info boxes could be an option in Configuration);
While preparing for text-to-speech synthesis, make the Read button unselectable and have it show "please wait"; change to Stop button once reading has started;
Allow reading from the cursor or selected text position: most Wikipedia articles are quite long, so if we always start at the beginning of the article, the usefulness of a reading feature becomes extremely limited;
Provide a way to vary the speed of reading - this is essential in the long run, in my opinion, if the feature is to be of real use, but it could be added as a todo;
Ideally, highlight the words being read using utterance.onboundary - see https://stackoverflow.com/questions/38120478/speech-synthesis-api-highlight-words-as-they-are-spoken . I think users will expect this and expect to know where they are in the article. It also makes the feature genuinely useful for things like language learning.
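The last two points in the list (reading speed and word highlighting) could be sketched along these lines. The `rate` property and the `boundary` event with its `name` and `charIndex` fields belong to the Speech Synthesis API; the helper names `wordRangeAt` and `createTrackedUtterance` and the `onWord` callback are hypothetical. Not every engine or voice fires boundary events, so highlighting would have to be treated as optional.

```javascript
// Given the full text and the charIndex reported by a 'boundary' event,
// returns the { start, end } range (end exclusive) of the word beginning at that index.
function wordRangeAt(text, charIndex) {
    var match = /^\S+/.exec(text.slice(charIndex));
    var length = match ? match[0].length : 0;
    return { start: charIndex, end: charIndex + length };
}

// Builds an utterance that reports each spoken word to `onWord(word, range)`.
// `UtteranceCtor` is normally window.SpeechSynthesisUtterance; it is a parameter so the
// sketch can be exercised without a browser. `rate` is the speaking rate (1 is normal).
function createTrackedUtterance(UtteranceCtor, text, rate, onWord) {
    var utterance = new UtteranceCtor(text);
    utterance.rate = rate;
    utterance.onboundary = function (event) {
        // Engines may also report sentence boundaries; only words matter here.
        if (event.name && event.name !== 'word') return;
        var range = wordRangeAt(text, event.charIndex);
        onWord(text.slice(range.start, range.end), range);
    };
    return utterance;
}
```

The reported range could then be mapped back to a position in the article in order to wrap the current word in a highlight element. Reading from the cursor or selection would amount to passing only the text from that position onwards.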

mossroy · 2018-07-06T11:13:36Z

I like this TODO-list, but it does not mean it's all-or-nothing.
Based on what @kasemmarifet would be willing to do (and maybe what he discussed with @kelson42), we might decide together what is required to merge the PR #394, and what could be split into other improvement issues (that could be implemented later and/or by someone else).

Jaifroid · 2018-07-08T12:57:56Z

I completely agree -- we can certainly break this down into more than one PR. Before we could merge the current PR #394, it will need the first point above (choose the correct voice for the language of the ZIM) and/or the voice-selection option moved to the Configuration page (as it is currently displayed in an obtrusive manner).

@kasemmarifet, could you let us know if you are able to do any further work on your PR? If your allotted GoogleServe time is up, please let us know so that we can work out how to take this forward.

kasemmarifet · 2018-07-10T14:37:32Z

Sorry for the late reply, I was on vacation.

Thanks for the list of further improvements. Can we put this change behind a flag (turned off by default), and then do the work to get the locale for the document? Once we have the locale, we can use it to select the voice correctly.

I can't spend too much time this week on this but I can do the change to put this behind a flag. Can someone work on getting the locale in a separate PR?

Jaifroid · 2018-07-10T14:44:16Z

@kasemmarifet We have a PR #397 for the language code, but it's currently returning ISO-639-3 instead of BCP 47 format, which is needed for voice synthesis -- see discussion in #395. However, I can do a PR for returning the BCP code, as it is contained in the ZIM's meta Name attribute.
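To illustrate the difference between the two formats: speech synthesis voices are tagged with BCP 47 codes such as `fr` or `fr-FR`, whereas ISO 639-3 uses three-letter codes such as `fra`. The sketch below maps a few of them; the function name `toBcp47` and the partial lookup table are hypothetical, and this is not the approach of reading the code from the ZIM's metadata described above.

```javascript
// Partial ISO 639-3 to ISO 639-1 lookup. Only a handful of entries are shown;
// a real table would need to cover every language that has a two-letter code.
var ISO_639_3_TO_639_1 = {
    eng: 'en', fra: 'fr', spa: 'es', deu: 'de',
    ita: 'it', por: 'pt', rus: 'ru', ara: 'ar'
};

// Converts a language code to a BCP 47 primary language subtag.
// Codes missing from the partial table are passed through unchanged, which is only
// correct for languages that have no two-letter code (BCP 47 prefers the shortest code).
function toBcp47(code) {
    var normalised = String(code || '').trim().toLowerCase();
    if (Object.prototype.hasOwnProperty.call(ISO_639_3_TO_639_1, normalised)) {
        return ISO_639_3_TO_639_1[normalised];
    }
    return normalised || null;
}
```

The resulting tag is what would be compared with each voice's `lang` property when choosing a voice.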

Bam92 · 2020-02-04T12:50:49Z

@Jaifroid Any update about this issue?

ykabusalah · 2020-03-24T22:57:24Z

Can I tackle this issue?

Jaifroid · 2020-03-25T06:27:35Z

@ykabusalah Please look carefully at the discussion above. A PR was already made, but it was never completed. If you think you can complete the existing PR according to the requirements in the discussion above, please do try.

Rbcoder1 · 2023-07-19T15:35:55Z

If this issue is not yet solved, I am interested in solving it.
Please assign it to me.

Jaifroid · 2023-07-22T07:12:51Z

Again, please work on one at a time. I think this issue depends on the UI changes, because room will need to be made for a "read aloud" button.

Rbcoder1 · 2023-07-23T05:40:16Z

Ok


Paulie-Aditya · 2023-11-18T00:15:49Z

Could I be assigned to this issue?

Jaifroid · 2023-11-18T13:52:28Z

@Paulie-Aditya Please take a look at https://github.com/kiwix/kiwix-js/blob/main/CONTRIBUTING.md, set up your development environment, and make sure you're happy with the process here. If all is well, please come back here outlining your suggestion of how to complete this issue so that I can assign you. A particular problem in the past has been how to organize the UI to invoke this function, so I'd be interested in what you propose.

In some browsers, reading aloud just works already. For example, in Edge, pressing Ctrl-Shift-U will start to read the loaded article and provide its own UI.

Hamza1821 · 2024-02-19T20:13:22Z

Is it open? If it is, please assign it to me.

Jaifroid · 2024-02-19T20:16:34Z

@Hamza1821, please read the instructions above, and come back here with your proposed solution before you start coding. Be sure to read all the discussion above as well.

mossroy added the enhancement label Jan 21, 2016

mossroy added this to the v2.5 milestone Jan 21, 2016

mossroy added the good first issue label Jan 10, 2018

mossroy mentioned this issue Jan 16, 2018

Provide a way for the device to read (out loud) the content of the article kiwix/kiwix-js-pwa#45

Closed

kasemmarifet mentioned this issue Jun 29, 2018

Added support for reading the article aloud using the text to speech API #394

Closed

kelson42 changed the title ~~Implement text-to-speech for reading articles~~ [WIP] Implement text-to-speech for reading articles Jun 29, 2018

kelson42 changed the title ~~[WIP] Implement text-to-speech for reading articles~~ Implement text-to-speech for reading articles Jun 29, 2018

Jaifroid mentioned this issue Jul 10, 2018

Read bcp47 language of zim #400

Closed

Jaifroid mentioned this issue Jun 17, 2019

Use a custom context menu for extra features #521

Closed

Jaifroid modified the milestones: v4.0, v4.1 Apr 22, 2023

Jaifroid added the user interface and i18n (Internationalization) labels and removed the good first issue label Feb 19, 2024
