Implement text-to-speech for reading articles #166
Comments
I'll try to work on this as part of GoogleServe. |
Hi @kasemmarifet, thanks for your welcome contribution. I'm just testing your PR. It works offline on Chromium and Firefox Quantum on Windows 10 for English-language ZIMs, so congratulations!

I can't get it working on Microsoft Edge or Internet Explorer. On Edge, the speech button appears, and pressing it appears to initiate something, but no sound comes out. When I turn it off, briefly I get a loudspeaker icon in the browser tab for the page, indicating that some audio was activated at some point, but only, paradoxically, on turning off the read button. And there still is no sound. The API is described here: https://blogs.windows.com/msedgedev/2016/06/01/introducing-speech-synthesis-api/ , and there are some examples of how to use it for Edge, so maybe you can adapt your code. On Internet Explorer there is no button, but that is expected, as it is not fully HTML5 compliant. It seems to degrade gracefully (by not showing the button).

When I load a Spanish-language ZIM, however, in Firefox, or a French ZIM in Chromium, in both cases, the read button attempts to use the English-language text-to-speech engine to read the non-English language. I'm sure that can be remedied fairly easily. In any case, this is a great base on which to build. Thank you. |
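For anyone picking this up, the wrong-language problem described above might be approached by choosing a voice whose `lang` matches the language of the ZIM. The following is only a minimal sketch, not the code in the PR: `pickVoiceForLanguage` is a hypothetical helper, and it assumes the language of the ZIM is already available as a BCP 47 tag such as `es` or `fr-CA`. The `voices` argument is expected to look like the array returned by `speechSynthesis.getVoices()`.

```javascript
// Hypothetical helper (not part of the existing PR): choose a voice for a BCP 47 tag.
// `voices` is an array of objects with a `lang` property, such as the
// SpeechSynthesisVoice objects returned by speechSynthesis.getVoices().
function pickVoiceForLanguage(voices, langTag) {
    // Some platforms report "es_ES" instead of "es-ES", so normalise both sides
    function normalise(tag) {
        return String(tag || '').replace(/_/g, '-').toLowerCase();
    }
    var wanted = normalise(langTag);
    if (!wanted) return null;
    var primary = wanted.split('-')[0];
    var exact = null;
    var partial = null;
    for (var i = 0; i < voices.length; i++) {
        var voiceLang = normalise(voices[i].lang);
        // Prefer a full match (e.g. "fr-ca"), but remember the first voice
        // that at least shares the primary language subtag (e.g. "fr")
        if (!exact && voiceLang === wanted) exact = voices[i];
        if (!partial && voiceLang.split('-')[0] === primary) partial = voices[i];
    }
    return exact || partial || null;
}
```

In a browser, the result could be assigned to `utterance.voice` (and the tag to `utterance.lang`) before calling `speechSynthesis.speak(utterance)`. If the helper returns `null`, it may be better to hide or disable the read button than to read the article with an English voice.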
Thanks Jaifroid, I just pushed an update to the code. We discussed this with Kelson a little bit (cced):
Kasem |
@kasemmarifet @kelson42 https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/speechsynthesis/ Maybe it has some clues about what's necessary. |
@kasemmarifet - update, it does work on Edge! What happens is that it takes 8 or 9 seconds (I didn't time it precisely) for the audio to start, at least on the size of article I was testing. I had assumed it wasn't working after about six seconds, since it starts much faster on FF and Chromium, and was turning it off. I wonder if we could add an intermediate, "Please wait..." type of message, maybe in a small overlay message box or on the play button itself which would get cancelled once the audio starts. If I fell into this trap, others might. People aren't very patient nowadays, ahem. |
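One possible way to implement the "Please wait..." idea is to show the message just before calling `speak()` and clear it from the utterance's `start` event. This is a sketch under assumptions: `speakWithStatus` and the `showStatus` callback are hypothetical names, and how the message is actually displayed (overlay, button label) is left to the caller.

```javascript
// Hypothetical helper: show a waiting message until the audio actually starts.
// `synth` is expected to behave like window.speechSynthesis, `utterance` like a
// SpeechSynthesisUtterance, and `showStatus` is any function that displays a string
// (an empty string means "hide the message").
function speakWithStatus(synth, utterance, showStatus) {
    showStatus('Please wait...');
    // Fired by the browser when the utterance begins to be spoken
    utterance.onstart = function () {
        showStatus('');
    };
    // Clear the message on completion too, in case "start" was never fired
    utterance.onend = function () {
        showStatus('');
    };
    utterance.onerror = function () {
        showStatus('Unable to read this article aloud');
    };
    synth.speak(utterance);
}
```

In the app this might be called as `speakWithStatus(window.speechSynthesis, new SpeechSynthesisUtterance(text), updateButtonLabel)`, where `updateButtonLabel` is whatever function updates the UI.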
One other thing: how difficult would it be to exclude certain types of text? I don't think it's desirable for it to read the infoboxes, for example. Or this might be a configuration option. It would also be useful to exclude footnote reference numbers, which are also currently read (and slow down the reading as a result). Any ideas on how best to achieve that? |
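As a rough illustration of the footnote part of this question, reference markers could be stripped from the text before it is passed to the speech engine. The sketch below assumes the markers appear in the extracted text as bracketed numbers such as `[12]`, which is how Wikipedia-style articles usually render them; `stripFootnoteMarkers` is a hypothetical name. Infoboxes cannot be handled this way and would probably need to be removed at the DOM level before the text is extracted, for example by dropping elements that match a selector on a cloned copy of the article.

```javascript
// Hypothetical helper: remove bracketed numeric footnote markers such as "[1]" or "[23]"
// from text that is about to be read aloud. Only purely numeric markers are removed,
// so bracketed words like "[sic]" are left alone.
function stripFootnoteMarkers(text) {
    // Also swallow whitespace immediately before the marker so no double spaces remain
    return String(text).replace(/\s*\[\d+\]/g, '');
}
```

For example, `stripFootnoteMarkers('Paris[1] is the capital[23] of France.')` gives `'Paris is the capital of France.'`.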
Just to summarize a few things that I believe are required to make a complete and genuinely useful solution for this issue, building on @kasemmarifet's work. I'd be very happy to work on some of these:
|
I like this TODO-list, but it does not mean it's all-or-nothing. |
I completely agree -- we can certainly break this down into more than one PR. The current PR #394 will need either the first point above (choose correct voice for the language of the ZIM) and/or move the voice-selection option to the configuration page (as it is currently displayed in an obtrusive manner) before we could merge it. @kasemmarifet, could you let us know if you are able to do any further work on your PR? If your allotted GoogleServe time is up, please let us know so that we can work out how to take this forward. |
Sorry for the late reply, I was on vacation. Thanks for the list of further improvements. Can we put this change behind a flag (turned off by default), and then do the work to get the locale for the document? Once we have the locale, we can use it to select the voice correctly. I can't spend too much time on this this week, but I can make the change to put it behind a flag. Can someone work on getting the locale in a separate PR? |
@kasemmarifet We have a PR #397 for the language code, but it's currently returning ISO-639-3 instead of BCP 47 format, which is needed for voice synthesis -- see discussion in #395. However, I can do a PR for returning the BCP code, as it is contained in the ZIM's meta Name attribute. |
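As an alternative illustration (this is not the approach proposed above, which reads the code from the ZIM metadata), an ISO 639-3 code could be converted to a BCP 47 primary language subtag with a lookup table. BCP 47 uses the two-letter ISO 639-1 code where one exists and the three-letter code otherwise. The function name and the table below are hypothetical, and the table is deliberately incomplete; a real implementation would need a full mapping.

```javascript
// Hypothetical, deliberately incomplete table of ISO 639-3 codes that have
// a two-letter ISO 639-1 equivalent (the form BCP 47 prefers)
var ISO6393_TO_ISO6391 = {
    eng: 'en',
    fra: 'fr',
    spa: 'es',
    deu: 'de',
    ita: 'it',
    por: 'pt',
    rus: 'ru',
    ara: 'ar',
    zho: 'zh'
};

// Hypothetical helper: return a BCP 47 primary language subtag for an ISO 639-3 code.
// Codes missing from the table are returned unchanged, which is only correct
// for languages that have no two-letter code (for example "ceb").
function iso6393ToBcp47(code) {
    var key = String(code || '').toLowerCase();
    return ISO6393_TO_ISO6391.hasOwnProperty(key) ? ISO6393_TO_ISO6391[key] : key;
}
```

The returned tag could then be used to set `utterance.lang` or to look up a matching voice.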
@Jaifroid Any update about this issue? |
Can I tackle this issue? |
@ykabusalah Please look carefully at the discussion above. A PR was already made, but it was never completed. If you think you can complete the existing PR according to the requirements in the discussion above, please do try. |
If this issue has not been solved yet, I am interested in solving it. |
Again, please work on one at a time. I think this issue depends on the UI changes, because room will need to be made for a "read aloud" button. |
Ok
|
Could I be assigned to this issue? |
@Paulie-Aditya Please take a look at https://github.com/kiwix/kiwix-js/blob/main/CONTRIBUTING.md, set up your development environment, and make sure you're happy with the process here. If all is well, please come back here outlining your suggestion of how to complete this issue so that I can assign you. A particular problem in the past has been how to organize the UI to invoke this function, so I'd be interested in what you propose. In some browsers, reading aloud just works already. For example, in Edge, pressing Ctrl-Shift-U will start to read the loaded article and provide its own UI. |
Is this issue still open? If it is, please assign it to me. |
@Hamza1821, please read the instructions above, and come back here with your proposed solution before you start coding. Be sure to read all the discussion above as well. |
Like on the Android version of Kiwix.
Using the Speech Synthesis API: https://caniuse.com/#feat=speech-synthesis