Implement text-to-speech for reading articles #166

Open
mossroy opened this issue Jan 21, 2016 · 22 comments

@mossroy
Contributor

mossroy commented Jan 21, 2016

Like on the Android version of Kiwix.

Using the Speech Synthesis API : https://caniuse.com/#feat=speech-synthesis
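
For reference, a minimal sketch of what a call to the Speech Synthesis API can look like. The helper name `readAloud` and its `env` parameter are hypothetical (they are not from the Kiwix code); `env` defaults to `window` and exists only so the function can be exercised with a stand-in object outside a browser.

```javascript
// Hypothetical helper: speak `text` if the Speech Synthesis API is available.
// Returns the utterance that was queued, or null when the API is missing.
function readAloud(text, env) {
    env = env || window;
    if (!env.speechSynthesis || typeof env.SpeechSynthesisUtterance !== 'function') {
        return null; // e.g. Internet Explorer: the caller can hide the button
    }
    var utterance = new env.SpeechSynthesisUtterance(text);
    env.speechSynthesis.cancel(); // drop anything still queued from a previous article
    env.speechSynthesis.speak(utterance);
    return utterance;
}
```

In a browser this would be called as `readAloud(articleText)`; a `null` return value is the signal that the feature should stay hidden.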

@kasemmarifet

I'll try to work on this as part of GoogleServe.

@Jaifroid
Member

Hi @kasemmarifet , thanks for your welcome contribution. I'm just testing your PR. It works offline on Chromium and Firefox Quantum on Windows 10 for English-language ZIMs, so congratulations!

I can't get it working on Microsoft Edge or Internet Explorer. On Edge, the speech button appears, and pressing it appears to initiate something, but no sound comes out. When I turn it off, briefly I get a loud-speaker icon in the browser tab for the page, indicating that some audio was activated at some point, but only, paradoxically, on turning off the read button. And there still is no sound. The API is described here: https://blogs.windows.com/msedgedev/2016/06/01/introducing-speech-synthesis-api/ , and there are some examples of how to use it for Edge, so maybe you can adapt your code.

On Internet Explorer there is no button, but that is expected, as it is not fully HTML5 compliant. It seems to degrade gracefully (by not showing the button).

When I load a Spanish-language ZIM, however, in Firefox, or a French ZIM in Chromium, in both cases, the read button attempts to use the English-language text-to-speech engine to read the non-English language. I'm sure that can be remedied fairly easily.

In any case, this is a great base on which to build. Thank you.
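
The language mismatch described above could be addressed by choosing a voice whose `lang` matches the ZIM's language. The following is only a sketch under assumptions: `pickVoice` is a hypothetical helper, `voices` is the array returned by `speechSynthesis.getVoices()`, and `lang` is a BCP 47 tag that the app would have to obtain from the ZIM by some other means.

```javascript
// Hypothetical helper: choose a voice for a BCP 47 language tag such as 'fr' or 'es-ES'.
// Prefers an exact tag match, then any voice with the same primary language.
// Within each group, a voice flagged as the system default wins.
function pickVoice(voices, lang) {
    var wanted = String(lang || '').toLowerCase().replace('_', '-');
    var primary = wanted.split('-')[0];
    if (!primary) return null;
    var exact = null;
    var sameLanguage = null;
    for (var i = 0; i < voices.length; i++) {
        // Some platforms report tags with an underscore (e.g. 'en_US'), so normalize
        var voiceLang = String(voices[i].lang || '').toLowerCase().replace('_', '-');
        if (voiceLang === wanted) {
            if (!exact || voices[i].default) exact = voices[i];
        } else if (voiceLang.split('-')[0] === primary) {
            if (!sameLanguage || voices[i].default) sameLanguage = voices[i];
        }
    }
    return exact || sameLanguage; // null when no voice speaks this language
}
```

The result can be assigned to `utterance.voice`; when it is `null`, the caller could fall back to the system default or tell the user that no matching voice is installed.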

@kelson42 kelson42 changed the title Implement text-to-speech for reading articles [WIP] Implement text-to-speech for reading articles Jun 29, 2018
@kelson42 kelson42 changed the title [WIP] Implement text-to-speech for reading articles Implement text-to-speech for reading articles Jun 29, 2018
@kasemmarifet

Thanks Jaifroid,

I just pushed an update to the code. We discussed this with Kelson a little bit (cced):

  • If the browser doesn't have the API (like IE), the code should handle it and not show the button
  • By default, the speech voice is the system one. We looked to see if we can get the content language to match the voice to it, but it seems like this is missing in the JS library. So further work has to be done here to get the content language and load the correct voice based on this. I added a TODO for that.
  • For Edge: I'm not sure what the issue is there. From your description it looks like there is an issue with the voice. Can you please try to select a different voice from the dropdown?

Kasem
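
One thing that may matter for the voice dropdown: in some browsers `speechSynthesis.getVoices()` returns an empty list until the `voiceschanged` event has fired. A sketch of a loader that copes with both cases follows; `loadVoices` is a hypothetical name, and it assumes the `synth` object supports `addEventListener`, which may not hold in every browser.

```javascript
// Hypothetical helper: call `callback` with the voice list once it is available.
function loadVoices(synth, callback) {
    var voices = synth.getVoices();
    if (voices.length > 0) {
        callback(voices); // list already populated
        return;
    }
    // Otherwise wait for the browser to announce that voices have been loaded
    synth.addEventListener('voiceschanged', function onVoicesChanged() {
        synth.removeEventListener('voiceschanged', onVoicesChanged);
        callback(synth.getVoices());
    });
}
```

In the app this might be used as `loadVoices(window.speechSynthesis, populateDropdown)`, where `populateDropdown` stands for whatever function fills the select box.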

@Jaifroid
Member

@kasemmarifet @kelson42
Thank you for adding the language drop-down. Below is a screenshot of the languages that it shows for my system in Edge. However, none of these voices actually work in Edge (42.17134): no sound at all on pressing "play". As I say, it works fine in Firefox Quantum and Chromium. I'll try to see if I can do some debugging at the weekend, but let me know if there's anything specifically I should be looking for. The following demo page works fine in the same install of Edge:

https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/speechsynthesis/

Maybe it has some clues about what's necessary.

image

@Jaifroid
Member

@kasemmarifet - update, it does work on Edge! What happens is that it takes 8 or 9 seconds (I didn't time it precisely) for the audio to start, at least on the size of article I was testing. I had assumed it wasn't working after about six seconds, since it starts much faster on FF and Chromium, and was turning it off. I wonder if we could add an intermediate, "Please wait..." type of message, maybe in a small overlay message box or on the play button itself which would get cancelled once the audio starts. If I fell into this trap, others might. People aren't very patient nowadays, ahem.
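
A "please wait" state like the one suggested above could be driven by the utterance's own `onstart`, `onend` and `onerror` events. This is a sketch only: `speakWithStatus` and the status names are hypothetical, and `onStatus` stands for whatever function updates the button or overlay.

```javascript
// Hypothetical helper: report 'waiting' until audio actually starts,
// then 'speaking', then 'idle' when reading finishes or fails.
function speakWithStatus(synth, utterance, onStatus) {
    onStatus('waiting'); // shown during the delay before audio begins
    utterance.onstart = function () { onStatus('speaking'); };
    utterance.onend = function () { onStatus('idle'); };
    utterance.onerror = function () { onStatus('idle'); };
    synth.speak(utterance);
}
```

With this, the button could show "Please wait..." while the status is `'waiting'` and switch to a stop control once `'speaking'` is reported.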

@Jaifroid
Member

A minor point, but would you be able to put the different voices into a bootstrap split-button dropdown, like in the image below? It would save space rather than having a separate dropdown box (I realize the box was added to debug the voices, but it's useful to have the choices).

image

@Jaifroid
Member

One other thing: how difficult would it be to exclude certain types of text? I don't think it's desirable for it to read the infoboxes, for example. Or this might be a configuration option. It would also be useful to exclude footnote reference numbers, which are also currently read (and slow down the reading as a result). Any ideas on how best to achieve that?
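
One possible approach to the exclusion question, sketched under assumptions: clone the article element, remove unwanted nodes from the clone, and read the remaining text. The selectors below are guesses based on common Wikipedia markup and would need checking against real ZIM content; `getReadableText` and `stripReferenceMarkers` are hypothetical names.

```javascript
// Assumed selectors for content that should not be read aloud
var EXCLUDED_SELECTORS = ['table.infobox', '.navbox', 'sup.reference'];

// Remove leftover footnote markers such as "[12]" and collapse whitespace
function stripReferenceMarkers(text) {
    return text.replace(/\[\d+\]/g, '').replace(/\s+/g, ' ').trim();
}

// Return the text of `articleRoot` without the excluded elements.
// Works on a clone so the displayed article is left untouched.
function getReadableText(articleRoot, excludedSelectors) {
    var copy = articleRoot.cloneNode(true);
    var selectors = (excludedSelectors || EXCLUDED_SELECTORS).join(',');
    var unwanted = copy.querySelectorAll(selectors);
    for (var i = 0; i < unwanted.length; i++) {
        unwanted[i].parentNode.removeChild(unwanted[i]);
    }
    return stripReferenceMarkers(copy.textContent);
}
```

Passing the selector list as a parameter would make it straightforward to turn infobox reading into a configuration option.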

@Jaifroid
Member

Jaifroid commented Jul 4, 2018

Just to summarize a few things that I believe are required to make a complete and genuinely useful solution for this issue, building on @kasemmarifet's work. I'd be very happy to work on some of these:

  • Choose the correct voice for the natural language of the ZIM (see discussion in Add support of ZIM meta information reading #395), with the system default being the first choice if it matches the language of the ZIM; however, the user should be able to override the auto-selected voice (e.g., if they want a female voice rather than male or vice-versa);
  • Remember the chosen voice on a per-ZIM basis (cookie value);
  • Display the availability of the speech synthesis API in the API panel on the Configuration page; maybe also display the auto- or user-selected voice for the currently loaded ZIM; maybe we should also move the voice-selection options to the Configuration page;
  • Read only the main text of the article, not text in info boxes or nav boxes, or footnote reference numbers (reading info boxes could be an option in Configuration);
  • While preparing for text-to-speech synthesis, make the Read button unselectable and have it show "please wait"; change to Stop button once reading has started;
  • Allow reading from the cursor or selected text position - most Wikipedia articles are quite long, and if we always start at the beginning of the article, the usefulness of a reading feature becomes extremely limited;
  • Provide a way to vary the speed of reading - this is essential in the long run, in my opinion, if the feature is to be of real use, but it could be added as a todo;
  • Ideally, highlight the words being read using utterance.onboundary - see https://stackoverflow.com/questions/38120478/speech-synthesis-api-highlight-words-as-they-are-spoken . I think users will expect this and expect to know where they are in the article. It also makes the feature genuinely useful for things like language learning.
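
The last two points in the list (reading speed and word highlighting) might be sketched as follows. `configureUtterance` and `wordAt` are hypothetical helpers; `onWord` stands for the code that would move the highlight. Not every browser or voice fires `boundary` events, so highlighting would have to degrade gracefully.

```javascript
// Return the run of non-whitespace characters starting at charIndex
// (trailing punctuation is included, which a real highlighter may want to trim)
function wordAt(text, charIndex) {
    var match = /^\S+/.exec(text.slice(charIndex));
    return match ? match[0] : '';
}

// Hypothetical helper: set the reading rate and report each word as it is spoken
function configureUtterance(utterance, rate, onWord) {
    // The API accepts rates from 0.1 to 10, with 1 as the normal speed
    utterance.rate = Math.min(Math.max(rate, 0.1), 10);
    utterance.onboundary = function (event) {
        if (event.name === 'word') {
            onWord(event.charIndex, wordAt(utterance.text, event.charIndex));
        }
    };
    return utterance;
}
```

The `charIndex` passed to `onWord` is an offset into the utterance text, so mapping it back to a position in the article DOM is a separate problem that this sketch does not address.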

@mossroy
Contributor Author

mossroy commented Jul 6, 2018

I like this TODO-list, but it does not mean it's all-or-nothing.
Based on what @kasemmarifet would be willing to do (and maybe what he discussed with @kelson42 ), we might decide together what is required to merge the PR #394, and what could be split into other improvement issues (that could be implemented later and/or by someone else)

@Jaifroid
Member

Jaifroid commented Jul 8, 2018

I completely agree -- we can certainly break this down into more than one PR. Before we could merge the current PR #394, it will need the first point above (choose the correct voice for the language of the ZIM), or the voice-selection option moved to the Configuration page (as it is currently displayed in an obtrusive manner), or both.

@kasemmarifet, could you let us know if you are able to do any further work on your PR? If your allotted GoogleServe time is up, please let us know so that we can work out how to take this forward.

@kasemmarifet

Sorry for the late reply, I was on vacation.

Thanks for the list of further improvements. Can we put this change behind a flag (turned off by default) and then do the work to get the locale for the document? Once we have the locale, we can use it to select the voice correctly.

I can't spend too much time this week on this but I can do the change to put this behind a flag. Can someone work on getting the locale in a separate PR?

@Jaifroid
Member

@kasemmarifet We have a PR #397 for the language code, but it's currently returning ISO-639-3 instead of BCP 47 format, which is needed for voice synthesis -- see discussion in #395. However, I can do a PR for returning the BCP code, as it is contained in the ZIM's meta Name attribute.
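
To illustrate the format difference: the Speech Synthesis API expects tags such as `fr` or `fr-FR`, whereas ISO 639-3 uses three-letter codes such as `fra`. The sketch below uses a tiny hand-written lookup purely as an example. The table is deliberately incomplete, `toSpeechLang` is a hypothetical name, and this is not how PR #397 works.

```javascript
// Illustrative subset only: ISO 639-3 codes mapped to two-letter BCP 47 primary subtags
var ISO_639_3_TO_BCP47 = { eng: 'en', fra: 'fr', spa: 'es', deu: 'de' };

// Hypothetical helper: convert an ISO 639-3 code to a tag usable as utterance.lang.
// Returns null for codes missing from the table so the caller can fall back
// to the system default voice.
function toSpeechLang(iso6393Code) {
    var key = String(iso6393Code || '').toLowerCase();
    if (Object.prototype.hasOwnProperty.call(ISO_639_3_TO_BCP47, key)) {
        return ISO_639_3_TO_BCP47[key];
    }
    return null;
}
```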

@Bam92

Bam92 commented Feb 4, 2020

@Jaifroid Any update about this issue?

@ykabusalah
Contributor

Can I tackle this issue?

@Jaifroid
Member

@ykabusalah Please look carefully at the discussion above. A PR was already made, but it was never completed. If you think you can complete the existing PR according to the requirements in the discussion above, please do try.

@Jaifroid Jaifroid modified the milestones: v4.0, v4.1 Apr 22, 2023
@Rbcoder1

If this issue is not yet solved, I am interested in solving it.
Please assign it to me.

@Jaifroid
Member

Again, please work on one at a time. I think this issue depends on the UI changes, because room will need to be made for a "read aloud" button.

@Rbcoder1

Rbcoder1 commented Jul 23, 2023 via email

@Paulie-Aditya

Could I be assigned to this issue?

@Jaifroid
Member

@Paulie-Aditya Please take a look at https://github.com/kiwix/kiwix-js/blob/main/CONTRIBUTING.md, set up your development environment, and make sure you're happy with the process here. If all is well, please come back here outlining your suggestion of how to complete this issue so that I can assign you. A particular problem in the past has been how to organize the UI to invoke this function, so I'd be interested in what you propose.

In some browsers, reading aloud just works already. For example, in Edge, pressing Ctrl-Shift-U will start to read the loaded article and provide its own UI.

@Hamza1821

Is this still open? If it is, please assign it to me.

@Jaifroid
Member

@Hamza1821, please read the instructions above, and come back here with your proposed solution before you start coding. Be sure to read all the discussion above as well.

@Jaifroid Jaifroid added user interface i18n Internationalization and removed good first issue labels Feb 19, 2024