Support for EPUB 3 Audio-eBooks #2061
Cool, nice to see. What's the issue with implementing it in the content server?
In terms of platform compatibility, I have run it on Windows and Ubuntu, and I don't think it will be an issue, even though QtWebEngine seems able to play only mp3. My issue with implementing it on the content server is mainly figuring out how to create the link to the audio file for embedding in the overlay. For the local viewer, it's as simple as adding the relative link to the src attribute of the audio tag. But I figure that for the content server I will have to render some blob link to the uploaded file instead. The relevant code seems to be somewhere around the create_link_replacer function in render_book.py or db.pyj. First, though, I will have to understand how the srv part communicates with the pyj part, as well as how to debug the code in srv more efficiently.
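For the local viewer the relative link works because it is resolved against the location of the file that contains it. A minimal sketch of that resolution step in Python, assuming hrefs are URL-encoded, forward-slash paths relative to the referencing file inside the EPUB container; `resolve_href` is a hypothetical helper, not part of calibre:

```python
from posixpath import dirname, join, normpath
from urllib.parse import unquote, urldefrag


def resolve_href(referencing_file: str, href: str) -> tuple[str, str]:
    """Resolve an href found in referencing_file to (container path, fragment).

    Hypothetical helper: assumes both paths are relative to the EPUB root and
    use forward slashes, as EPUB container paths do.
    """
    path, fragment = urldefrag(href)
    if not path:
        # A bare "#id" points into the referencing file itself
        return referencing_file, fragment
    resolved = normpath(join(dirname(referencing_file), unquote(path)))
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"href escapes the book root: {href!r}")
    return resolved, fragment
```

For example, `resolve_href("OEBPS/smil/ch1.smil", "../audio/ch1.mp3")` gives `("OEBPS/audio/ch1.mp3", "")`. On the content server the resolved container path would still have to be mapped to a URL the server actually serves; that part is not sketched here.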
Hi. Could you merge it? Though I can use the feature fine with a shell script, I would like it to be included in the official application for my mother and sister too.
I will review when I have the time.
Note that EPUB 3 defines mp3, mp4 and ogg as core media types for audio: https://www.w3.org/TR/epub/#sec-core-media-types According to https://doc.qt.io/qt-6/qtwebengine-features.html#audio-and-video-codecs … So how is this supposed to work? I will try it with a sample file and see. Maybe the Qt docs are wrong and mp3 is not a proprietary codec; I know its patents expired in 2017.
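The EPUB side of that comparison can be checked before playback is attempted. A minimal sketch, assuming the manifest `media-type` string is available and taking the three audio types named above (`audio/mpeg`, `audio/mp4`, `audio/ogg`) as the core set; parameters such as `codecs` are split off first. Whether QtWebEngine can actually decode a given type is a separate question this does not answer, and the function names are hypothetical:

```python
# Assumed core audio set, following the mp3/mp4/ogg list above
CORE_AUDIO_TYPES = frozenset({"audio/mpeg", "audio/mp4", "audio/ogg"})


def parse_media_type(value: str) -> tuple[str, dict[str, str]]:
    """Split e.g. 'audio/ogg; codecs=opus' into ('audio/ogg', {'codecs': 'opus'})."""
    base, *params = value.split(";")
    parsed: dict[str, str] = {}
    for param in params:
        key, sep, val = param.partition("=")
        if sep:  # ignore malformed parameters without '='
            parsed[key.strip().lower()] = val.strip().strip('"')
    return base.strip().lower(), parsed


def is_core_audio_type(value: str) -> bool:
    """True if the base media type is one of the assumed EPUB 3 core audio types."""
    base, _params = parse_media_type(value)
    return base in CORE_AUDIO_TYPES
```

Reading the type from the manifest rather than assuming `audio/mpeg` keeps the player from silently breaking on mp4 or ogg audio.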
In my testing the audio worked on Windows and macOS but not Linux. I am guessing Chromium falls back to OS-provided facilities? No idea.
Thanks for your contribution!
Hi. Thank you for checking up. I will address the problems soon.
I also had no idea. It worked for me on Ubuntu. I expected the browser to support more codecs, but that did not seem to be the case.
On Tue, Oct 17, 2023 at 07:29:50AM -0700, duydl wrote:
> Hi. Thank you for checking up. I will address the problems soon.
> But regarding 8 and 2.5: the toggle overlay is not meant to quit the player. Its purpose is just to pull down the height of the read-aloud overlay so users can scroll/highlight/take notes while the audio still plays and the text is still marked. I should probably figure out a better name, then.
The only effect it had for me was hiding the toggle button itself; the
overlay was unchanged. You can use the name Collapse/Expand instead,
changing the tooltip based on the current state.
Did you try to highlight or click on the text? It will have a different effect. Instead of jumping to the clicked sentence, it will highlight, as in normal mode.
No, I just started it with Ctrl+S and clicked the toggle button.
Ah OK, yes, that works. Then maybe make the tooltip something like "Allow …
Hi. I have made the changes you asked for.
Thanks, looks much better. Some more comments:
Relatedly, why is change_audio_src() calling show_next_spine_item()?
On Wed, Oct 18, 2023 at 09:32:03PM -0700, duydl wrote:
> 1. That was probably just for testing. Apologies; I should have checked more carefully when cleaning up for the PR.
> 2, 3: This is for when users start the player in a spine item without audio. The book will automatically go to the next spine item until it reaches one with audio. I tried putting the change_audio_src call in the callback and in other places, but the global current_spine_item still would not update and the book just got stuck in the next spine item. I was very confused, because set_current_spine_item() is called much earlier, so there shouldn't even be a need for a callback. I could have modified the global function to return a promise, but the effect felt much better with some delay anyway, so I settled on setTimeout.
I don't think I am comfortable with this; opening read aloud should not
cause the book to jump ahead/behind to an arbitrary point. Instead, if
the current file has no SMIL audio, just pop up a modal dialog saying so
and maybe ask the user if they want to skip to the next location with audio.
See question_dialog() for an easy implementation.
> 3. It is true that the SMIL file and audio need to be specified in content.opf. I will attempt that, but it seems to be more work than I can finish immediately.
Don't worry about it, I will take care of it; you can use the resulting
metadata in your PR once I do.
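Whether the viewer skips automatically or first asks via a dialog as proposed above, it needs to find the next spine item that has audio. A minimal sketch, assuming a hypothetical mapping from content-document paths to SMIL paths (`overlays`) and the spine as a list of paths; showing the dialog and navigating are left out:

```python
from typing import Mapping, Optional, Sequence


def overlay_flags(spine: Sequence[str], overlays: Mapping[str, str]) -> list[bool]:
    """One flag per spine item: does it have a media overlay (SMIL audio)?"""
    return [href in overlays for href in spine]


def next_spine_index_with_audio(flags: Sequence[bool], start: int) -> Optional[int]:
    """First index >= start whose spine item has audio, or None if there is none."""
    for index in range(max(start, 0), len(flags)):
        if flags[index]:
            return index
    return None
```

Returning None lets the caller tell the user there is no further audio instead of jumping anywhere. Because the function only reports a location, non-audio items in between (a heading-only or image-only file, for example) stay readable unless the user chooses to skip them.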
Just so you know, I didn't just invent the behavior; I mimicked it from Thorium. It would not skip arbitrarily to anywhere; it just continues ahead to where there is audio. Users would feel like there is empty audio at the current location, just like in an audiobook, and IMO the user experience would be much smoother, with no additional interaction needed. Also, users would not know which spine item specifically has audio, so they may not be able to skip to the right section.
On Wed, Oct 18, 2023 at 11:35:25PM -0700, duydl wrote:
>> I don't think I am comfortable with this; opening read aloud should not cause the book to jump ahead/behind to an arbitrary point. Instead, if the current file has no SMIL audio, just pop up a modal dialog saying so and maybe ask the user if they want to skip to the next location with audio.
> Just so you know, I didn't just invent the behavior; I mimicked it from Thorium. It would not skip arbitrarily to anywhere; it just continues ahead to where there is audio. Users would feel like there is empty audio at the current location, just like in an audiobook, and IMO the user experience would be much smoother, with no additional interaction needed. Also, users would not know which spine item specifically has audio, so they may not be able to skip to the right section.
The modal will ask the user to skip to where the audio starts/continues.
So the user doesn't have to do anything more than click a button.
But what if there is a spine item without audio in the middle of the spine items with audio? Many books have an HTML file containing just an h1, or an image-only file, among the text sections. Edit: they would also be unable to see the non-audio sections.
On Wed, Oct 18, 2023 at 11:44:14PM -0700, duydl wrote:
> But what if there is a spine item without audio in the middle of the spine items with audio? Many books have an h1 HTML file or images between different sections.
Exactly; it is not correct to just skip these sections. The user should
have the option to read them or skip to the next audio section.
They could press pause, and it would not skip to the next spine item. Then they could toggle scrolling and read that section.
I have implemented parsing of SMIL data in srv/render_book.py; please use …
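As a rough illustration of what such parsing involves (this is not the code in srv/render_book.py, which is not shown here), a minimal sketch using the standard library's ElementTree, assuming EPUB SMIL 3.0 documents in the `http://www.w3.org/ns/SMIL` namespace. It groups `<par>` entries by the content document their `<text src>` points into, so one SMIL file can serve several spine items, and it treats the fragment as the id of any element rather than assuming a `<span>`. `pars_by_text_file` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from posixpath import dirname, join, normpath
from urllib.parse import unquote, urldefrag

SMIL_NS = "{http://www.w3.org/ns/SMIL}"


def _resolve(base_file: str, href: str) -> tuple[str, str]:
    """Resolve href relative to base_file; return (container path, fragment)."""
    path, fragment = urldefrag(href)
    return normpath(join(dirname(base_file), unquote(path))), fragment


def pars_by_text_file(smil_path: str, smil_bytes: bytes) -> dict[str, list[dict]]:
    """Group <par> entries by the content document their <text> points into."""
    root = ET.fromstring(smil_bytes)
    grouped: dict[str, list[dict]] = defaultdict(list)
    for par in root.iter(SMIL_NS + "par"):  # also finds <par> nested in <seq>
        text = par.find(SMIL_NS + "text")
        audio = par.find(SMIL_NS + "audio")
        if text is None or audio is None or not text.get("src") or not audio.get("src"):
            continue
        text_file, anchor = _resolve(smil_path, text.get("src"))
        audio_file, _ = _resolve(smil_path, audio.get("src"))
        grouped[text_file].append({
            "id": par.get("id"),
            "anchor": anchor,  # id of any element, not necessarily a <span>
            "audio": audio_file,
            # Raw SMIL clock values; converting them to seconds is a separate step
            "clip_begin": audio.get("clipBegin"),
            "clip_end": audio.get("clipEnd"),
        })
    return dict(grouped)
```

All paths are resolved relative to the SMIL file itself, so nothing depends on the SMIL, text and audio files sharing a folder or a naming scheme.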
I added a function is_anchor_on_screen() that you can use to find the first visible SMIL id. It caches the position computation, so it should be fairly performant.
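The caching idea can be illustrated outside the viewer. A minimal Python sketch, assuming a hypothetical `measure(anchor_id)` callback that returns an element's (top, bottom) in document coordinates and is expensive enough to be worth caching; it is not the RapydScript is_anchor_on_screen() itself, and `AnchorPositionCache` and `first_visible_anchor` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple

Box = Tuple[float, float]  # (top, bottom) in document coordinates


@dataclass
class AnchorPositionCache:
    """Caches the (assumed expensive) layout lookup per anchor id."""
    measure: Callable[[str], Box]
    _cache: Dict[str, Box] = field(default_factory=dict, init=False)

    def box(self, anchor_id: str) -> Box:
        if anchor_id not in self._cache:
            self._cache[anchor_id] = self.measure(anchor_id)
        return self._cache[anchor_id]

    def invalidate(self) -> None:
        # Needed whenever the text is re-laid out (resize, font change, ...)
        self._cache.clear()


def first_visible_anchor(anchor_ids: Iterable[str], cache: AnchorPositionCache,
                         viewport_top: float, viewport_bottom: float) -> Optional[str]:
    """Return the first anchor, in document order, whose box overlaps the viewport."""
    for anchor_id in anchor_ids:
        top, bottom = cache.box(anchor_id)
        if bottom > viewport_top and top < viewport_bottom:
            return anchor_id
    return None
```

Repeated lookups for the same viewport then cost only dictionary hits, at the price of clearing the cache whenever the layout changes.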
This has been merged into the notes branch, which will eventually become calibre 7. I have rewritten all the SMIL sync code and made it behave like TTS-based read aloud. Feel free to test/comment.
I haven't looked into what you have changed, but the viewer now cannot play either the ReadBeyond samples or my custom-made audio ebooks; different errors appear. Edit: clearing the cache solved it. There are some other problems, though, from brief testing.
On Tue, Oct 24, 2023 at 01:46:43PM -0700, duydl wrote:
> I haven't looked into what you have changed, but the viewer now cannot play either the ReadBeyond samples or my custom-made audio ebooks; different errors appear.
ReadBeyond audio books play fine for me. Remember to reload the books in
the viewer.
> I am also really confused about why you decided to remove the toggle that allows text selection while the audio is playing, especially after all my explanations of why it is a necessary feature for me.
The explanation is in the commit message.
And regarding being able to highlight while listening to audio, that strikes me as a pretty broken design, since the text scrolls while the audio is playing, which means things move around. In flow mode the movement is potentially continuous; paged mode is of course better in that page transitions are less frequent, but they can still happen while you are in the middle of highlighting something.
Changing the spine item in the middle of TTS playback does not update the audio, but the mark keeps highlighting irrelevant text in the new spine item. I don't understand why you don't see that as buggy as hell. I haven't seen any TTS that locks the user in while running, either. Interacting with the text is what people want from these features, or else they would just listen to the audio.
Yeah, it was in some of the other buggy commits. I didn't sync the newest ones. But seriously, missing that problem shows you would barely ever use the feature yourself.
No, they want to read while listening. It is the better auto-scroll mode, and users can scroll and select text in that mode. Another comparison is with interactive transcripts like YouTube's; they do not lock the user into the current scene. And no, the purpose of an interactive transcript is to let users search and navigate the media while listening/watching, as well as study the text content when the media is not fully comprehensible, e.g. for second-language users and language learners.
Again, this shows you barely tried the feature. In both flow mode and paged mode, the viewer only scrolls when the marked element is out of the viewport, so technically paged mode and flow mode function the same. The other audio-ebook player, Thorium, allows that, and its users do not complain. And it is not even the default behavior in Calibre.
JavaScript runs async, so the SMIL file is parsed while the HTML is also being parsed, and the parsed content is cached anyway. In the end, any performance issue in the viewer would stem far more from the audio loading. Parsing SMIL with Python or in the web engine is just a matter of preference, though I think you just invented work for yourself.
On Tue, Oct 24, 2023 at 11:37:10PM -0700, duydl wrote:
>> 1. Yes, this behavior matches TTS read aloud.
> Changing the spine item in the middle of TTS playback does not update the audio, but the mark keeps highlighting irrelevant text in the new spine item. I don't understand why you don't see that as buggy as hell. I haven't seen any TTS that locks the user in while running, either. Interacting with the text is what people want from these features, or else they would just listen to the audio.
That makes zero sense; here we are highlighting text as it is being
read, i.e. the user intends to watch as the text is being read. If you
want a mode that means "play some random audio in the background while
I work on some unrelated part of the text", then that would be a whole
separate mode of interaction, one that would need to be implemented for
both TTS and SMIL and called something like "read aloud in background".
Indeed, I wonder what level of cognitive dissonance it must cause to hear
text A while reading text B.
>> 2. No, it was still a problem in your code.
> Yeah, it was in some of the other buggy commits. I didn't sync the newest ones. But seriously, missing that problem shows you would barely ever use the feature yourself.
By that metric, given the entire forest of bugs that was in your code,
you don't use this feature at all.
>> 3. Yes, after some consideration I decided you were right about that, since if the user decides to listen to audio, their primary mode of interaction is the audio.
> No, they want to read while listening. It is the better auto-scroll mode.
Read aloud is not auto-scroll; the two have no connection whatsoever.
> Another comparison is with interactive transcripts like YouTube's; they do not lock the user into the current scene. And no, the purpose of an interactive transcript is to let users search the media while listening/watching.
>> that strikes me as a pretty broken design
> Again, this shows you barely tried the feature. In both flow mode and paged mode, the viewer only scrolls when the marked element is out of the viewport, so technically paged mode and flow mode function the same.
And once you reach the last line in flow mode, it will go out of the
viewport on every line. Maybe try using it yourself.
> The other audio-ebook player, Thorium, allows that, and its users do not complain. And it is not even the default behavior in Calibre.
Then use Thorium.
> Tbh, I am your only source of user feedback; you would probably not use the feature yourself. It is a pretty isolated module in itself. Why do you need to be so controlling of its behavior?
I develop calibre to work in ways that make sense to me, because I am
the person responsible for all of it and maintaining all of it for the
rest of my life. If you cannot find a way to work with me, please fork
the code and do whatever the fuck you like. Stop wasting my time.
>> SMIL files are parsed in native code by lxml
> JavaScript runs async anyway.
The viewer is blocked while a section is loading; whether JS is async or
not is irrelevant.
> So the SMIL file is parsed while the HTML is also being parsed, and the parsed content is cached anyway. In the end, any performance issue in the viewer would stem far more from the audio loading. Parsing SMIL with Python or in the web engine is just a matter of preference, though I think you just invented work for yourself.
No, I improved the performance of parsing SMIL files by having it done
once only on first load. And I fixed a whole passel of bugs in your SMIL
parsing code. To name just a few:
1) You assume a single SMIL file maps to only a single spine item. The spec does not require this.
2) You think SMIL anchors always point to spans. The spec does not require this.
3) Your code breaks if the audio file has any MIME type other than audio/mpeg.
4) Your parsing of SMIL timestamps was completely non-compliant with the spec.
5) Your code assumed there is some fixed relationship between spine file names, SMIL file names and audio file names, such as their being in the same folder or related folders. This is not required by the spec.
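Point 4 refers to SMIL clock values. A minimal sketch of a parser for the clock-value forms of SMIL 3.0, which EPUB Media Overlays use for `clipBegin`/`clipEnd`: full clock (`h:mm:ss.fff`), partial clock (`mm:ss.fff`) and timecounts with an optional `h`, `min`, `s` or `ms` metric (seconds when omitted). It is an illustration, not calibre's code, and `parse_clock_value` is a hypothetical name:

```python
import re

# SMIL 3.0 clock-value forms: full clock, partial clock, timecount
_FULL_CLOCK = re.compile(r"(\d+):([0-5]\d):([0-5]\d)(\.\d+)?")
_PARTIAL_CLOCK = re.compile(r"([0-5]\d):([0-5]\d)(\.\d+)?")
_TIMECOUNT = re.compile(r"(\d+(?:\.\d+)?)(h|min|s|ms)?")
_SECONDS_PER_UNIT = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}


def parse_clock_value(value: str) -> float:
    """Convert a SMIL clock value such as '0:00:02.680', '02:33', '3.5s' or '500ms' to seconds."""
    text = value.strip()
    if m := _FULL_CLOCK.fullmatch(text):
        hours, minutes, seconds, fraction = m.groups()
        return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float(fraction or 0)
    if m := _PARTIAL_CLOCK.fullmatch(text):
        minutes, seconds, fraction = m.groups()
        return int(minutes) * 60 + int(seconds) + float(fraction or 0)
    if m := _TIMECOUNT.fullmatch(text):
        number, metric = m.groups()
        # A timecount without a metric is in seconds
        return float(number) * _SECONDS_PER_UNIT[metric or "s"]
    raise ValueError(f"not a SMIL clock value: {value!r}")
```

Trying the full clock first matters: `02:33` must be read as minutes and seconds, while `1:02:33` has an hours field, and malformed values such as `1:2:3` are rejected instead of being guessed at.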
Allow me to apologize, sir, if I was impolite. I understand you are the principal developer of Calibre and your views matter most for the project.
No worries. And yes, I did it myself, as it would take me less time than … A "background" mode for read aloud (for both TTS and SMIL). Basically, a … If this is acceptable to you, feel free to work on a PR for it. Doing it …
Feature: Support for EPUB 3 Audio-eBooks in Calibre
Overview
This pull request adds support for EPUB 3 Audio-eBooks in the Calibre E-book viewer. These are EPUBs with SMIL audio synchronization (EPUB 3 with Media Overlays), which add SMIL files and audio content to what a conventional EPUB contains.
Additional Resources
Public domain audio eBooks can be found on ReadBeyond. Thorium Reader can also play them, but it lacks many features of the Calibre E-book viewer, not to mention library management.
This PR enhances Calibre's capabilities and makes it compatible with the format.
Further plans
I created a new overlay based on the read-aloud overlay for TTS. The program checks for SMIL files and, if any are detected, the Read Aloud toggle opens that overlay instead of TTS.
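In EPUB 3, a content document announces its overlay through the `media-overlay` attribute of its manifest `<item>`, whose value is the id of a manifest item with media type `application/smil+xml`. A minimal sketch of detecting overlays that way with ElementTree, assuming the raw OPF bytes and the OPF's path in the container are available; `overlay_map` is a hypothetical helper, not how calibre does it:

```python
import xml.etree.ElementTree as ET
from posixpath import dirname, join, normpath
from urllib.parse import unquote

OPF_NS = "{http://www.idpf.org/2007/opf}"


def overlay_map(opf_path: str, opf_bytes: bytes) -> dict[str, str]:
    """Map each content document that declares a media overlay to its SMIL file.

    Paths are resolved relative to the OPF file, giving container paths.
    """
    root = ET.fromstring(opf_bytes)
    manifest = root.find(OPF_NS + "manifest")
    if manifest is None:
        return {}
    items = {item.get("id"): item for item in manifest.iter(OPF_NS + "item")}
    base = dirname(opf_path)
    result: dict[str, str] = {}
    for item in items.values():
        smil_id = item.get("media-overlay")
        smil_item = items.get(smil_id) if smil_id else None
        # Only accept overlays that really point at a SMIL manifest item
        if smil_item is None or smil_item.get("media-type") != "application/smil+xml":
            continue
        content = normpath(join(base, unquote(item.get("href", ""))))
        result[content] = normpath(join(base, unquote(smil_item.get("href", ""))))
    return result
```

A non-empty result, or the current spine item being a key in it, would be the signal to open the SMIL overlay rather than TTS. Since the mapping comes from the manifest, it makes no assumptions about folder layout or file naming.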
The audio control is implemented directly in the RapydScript front end; there is no communication with the Python back end.
I will continue to maintain and improve this feature if needed. In particular, the audio files have not yet been linked successfully in the Calibre content server viewer.
Thank you very much for the amazing program.