Text-to-speech and content extraction #118
```kotlin
 * Returns the [Locator] to the first content element that begins on the current screen.
 */
@ExperimentalReadiumApi
suspend fun firstVisibleElementLocator(): Locator? =
```
This was added to start the TTS from the currently visible page instead of the start of the publication. Using the progression in the currentLocator was not precise enough, and firstVisibleElementLocator() makes sure to target the first HTML element fully visible or starting on the page.
```kotlin
)
@Retention(value = AnnotationRetention.BINARY)
@Target(AnnotationTarget.CLASS, AnnotationTarget.FUNCTION, AnnotationTarget.TYPEALIAS, AnnotationTarget.PROPERTY)
annotation class DelicateReadiumApi
```
Inspired by @DelicateCoroutinesApi. We can use it to tag a dangerous Readium API that should only be used after carefully reading its documentation.
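To illustrate how such an opt-in marker behaves for callers, here is a minimal sketch. `DelicateSketchApi`, `eraseAllBookmarks()` and `resetLibrary()` are hypothetical names, and the warning level and message are assumptions rather than the values used in this PR. It assumes Kotlin 1.7 or later, where `@RequiresOptIn` is stable.

```kotlin
// Hypothetical marker modeled on the annotation shown above.
// The level and message are assumptions made for this sketch.
@RequiresOptIn(
    level = RequiresOptIn.Level.WARNING,
    message = "This is a delicate API. Read its documentation carefully before using it."
)
@Retention(AnnotationRetention.BINARY)
@Target(AnnotationTarget.CLASS, AnnotationTarget.FUNCTION, AnnotationTarget.TYPEALIAS, AnnotationTarget.PROPERTY)
annotation class DelicateSketchApi

// A hypothetical "dangerous" function tagged with the marker.
@DelicateSketchApi
fun eraseAllBookmarks(): String = "erased"

// Callers acknowledge the risk explicitly; without @OptIn the compiler
// reports a warning at the call site (or an error with Level.ERROR).
@OptIn(DelicateSketchApi::class)
fun resetLibrary(): String = eraseAllBookmarks()
```

The marker does not change runtime behavior. It only makes the compiler flag call sites that have not opted in.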
```kotlin
/**
 * Reads the full content as a [Bitmap].
 */
suspend fun readAsBitmap(): ResourceTry<Bitmap> =
```
Not necessary for this PR, but this seems to be a useful helper. I used it in the example of the content.md user guide to build an index of images.
```kotlin
    val startIndex: Int,
)

private class ContentParser(
```
This algorithm is largely inspired by jsoup's text(), which extracts the normalized text of an element.
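For readers unfamiliar with jsoup, the whitespace normalization part of the idea can be sketched as follows. This is a simplified stand-in, not the `ContentParser` algorithm from this PR: `normalizeText` is a hypothetical helper that only collapses runs of whitespace and trims the result, without walking a DOM.

```kotlin
// Hypothetical helper: collapses every run of whitespace (spaces, tabs,
// line breaks) into a single space, then trims both ends.
fun normalizeText(raw: String): String =
    raw.replace(Regex("\\s+"), " ").trim()

// Example: normalizeText("  Hello\n\t world  ") returns "Hello world"
```

A real HTML extractor also has to decide where block elements such as paragraphs introduce breaks, which this sketch leaves out.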
```kotlin
import java.util.*

class Language private constructor(val code: String, val locale: Locale) {
```
I introduced this Language to have a higher-level language construct that doesn't depend on the JVM, and to add some useful helpers such as removeRegion().
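As a rough idea of what a helper like `removeRegion()` might do, here is a minimal sketch built on `java.util.Locale`. `LanguageSketch` is a hypothetical class and is not the `Language` implementation from this PR. It assumes the language is identified by a BCP 47 tag such as `en-US`.

```kotlin
import java.util.Locale

// Hypothetical wrapper around a BCP 47 language tag.
class LanguageSketch(tag: String) {
    val locale: Locale = Locale.forLanguageTag(tag)

    // Canonical BCP 47 representation of the wrapped locale.
    val code: String get() = locale.toLanguageTag()

    // Returns a copy without the region subtag, keeping language and script.
    fun removeRegion(): LanguageSketch {
        val withoutRegion = Locale.Builder()
            .setLocale(locale)
            .setRegion("") // an empty string removes the region
            .build()
        return LanguageSketch(withoutRegion.toLanguageTag())
    }
}

// Example: LanguageSketch("en-US").removeRegion().code is "en"
```

Dropping the region is useful when matching a publication language against the voices offered by a TTS engine, where an exact regional match may not exist.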
```kotlin
    }
})

// Speech speed
```
I removed the legacy TTS implementation that was entirely in the test app.
```kotlin
@OptIn(ExperimentalMedia2::class, ExperimentalTime::class, ExperimentalCoroutinesApi::class)
class AudioReaderFragment : BaseReaderFragment(), SeekBar.OnSeekBarChangeListener {

    override val model: ReaderViewModel by activityViewModels()
```
This could be moved to BaseReaderFragment to simplify the subclasses.
```kotlin
override fun onCreateOptionsMenu(menu: Menu, menuInflater: MenuInflater) {
    super.onCreateOptionsMenu(menu, menuInflater)
    menuInflater.inflate(R.menu.menu_epub, menu)
```
All the options menu items are now in a single layout inflated by BaseReaderFragment.
```kotlin
private lateinit var model: ReaderViewModel
private val model: ReaderViewModel by viewModels()

override fun getDefaultViewModelProviderFactory(): ViewModelProvider.Factory {
```
This change seems pointless but I refactored it twice. At some point TtsViewModel was a proper Android ViewModel and so I had a composite view model factory.
```kotlin
    insets
}

binding.overlay.setContent {
```
I added a Jetpack Compose overlay view on top of the reader in VisualReaderFragment. This is used to display the TTS controls above the publication, but also to prevent user interaction on the pages while the TTS is running.
```kotlin
setupHighlights(this)
setupSearch(this)
setupTts(this)
```
I split the setup to make it clearer to people wanting to integrate only one of these features.
```kotlin
 * Sets the emphasis (alpha) of a group of [Composable] views.
 */
@Composable
fun Group(lowEmphasis: Boolean = false, enabled: Boolean = true, content: @Composable () -> Unit) {
```
This is not really necessary, but I find the recommended approach for de-emphasizing elements really heavy and unclear.
```kotlin
/**
 * Utterances to be synthesized, in order of [speak] calls.
 */
private val tasks = Channel<UtteranceTask>(Channel.BUFFERED)
```
Using a Channel ensures that the utterances are played in the same order as the speak() calls.
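The ordering guarantee can be sketched with a single consumer draining a buffered channel. This is an illustration under assumptions, not the engine code from this PR: `UtteranceTaskSketch`, `QueueingEngineSketch` and the `speakNow` callback are hypothetical, error handling and cancellation of queued tasks are left out, and it requires the `kotlinx.coroutines` library.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch

// Hypothetical task: the text to speak, plus a signal completed once spoken.
class UtteranceTaskSketch(val text: String) {
    val done = CompletableDeferred<Unit>()
}

// Hypothetical engine: `speakNow` stands in for the actual synthesis call.
class QueueingEngineSketch(
    scope: CoroutineScope,
    private val speakNow: suspend (String) -> Unit,
) {
    // A channel is FIFO, so tasks are received in the order they were sent.
    private val tasks = Channel<UtteranceTaskSketch>(Channel.BUFFERED)

    init {
        // Single consumer: utterances are processed one at a time, in order.
        scope.launch {
            for (task in tasks) {
                speakNow(task.text)
                task.done.complete(Unit)
            }
        }
    }

    // Suspends the caller until its own utterance has been played.
    suspend fun speak(text: String) {
        val task = UtteranceTaskSketch(text)
        tasks.send(task)
        task.done.await()
    }
}
```

Because only one coroutine reads from the channel, two overlapping `speak()` calls cannot interleave their output.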
```kotlin
 *
 * This is used to interrupt on-going commands.
 */
private fun replacePlaybackJob(block: suspend CoroutineScope.() -> Unit) {
```
That's the solution I came up with to be able to skip utterances or interrupt a running utterance. As only one playback job is allowed at any time, when you start a new command the previous one is cancelled.
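The "only one playback job at a time" idea can be sketched as below. This is a minimal illustration, not the body of the function from this PR: `PlaybackSketch` is a hypothetical class, it assumes all calls come from a single thread (for example the main thread), and it requires the `kotlinx.coroutines` library.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch

// Hypothetical holder for the single active playback job.
class PlaybackSketch(private val scope: CoroutineScope) {
    private var playbackJob: Job? = null

    // Cancels the previous command, if any, then starts the new one.
    // Not thread-safe: assumes a single calling thread.
    fun replacePlaybackJob(block: suspend CoroutineScope.() -> Unit) {
        playbackJob?.cancel()
        playbackJob = scope.launch(block = block)
    }
}
```

Since cancellation is cooperative, the interrupted command stops at its next suspension point, which is why a suspending `speak()` fits this design.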
```kotlin
val showControls by model.state.asStateWhenStarted { it.showControls }
val isPlaying by model.state.asStateWhenStarted { it.isPlaying }
val settings by model.state.asStateWhenStarted { it.settings }
```
I started with the usual:
```kotlin
val state by model.state.flowWithLocalLifecycle().collectAsState()
```
But as I used a single State object that is updated every time a word is spoken, that triggered a lot of recompositions. I came up with this asStateWhenStarted extension that selects a property of the global state instead, and is basically (it's a bit more complicated, see Flow.kt) equivalent to:
```kotlin
val showControls by remember {
    model.state
        .map { it.showControls }
        .flowWithLocalLifecycle()
}.collectAsState()
```
# Conflicts: # readium/navigator/src/main/java/org/readium/r2/navigator/epub/EpubNavigatorFragment.kt
Commits (author of every commit: Mickaël Menu <mickael.menu@gmail.com>):
- f1e27aa (Mon Jul 11 15:54:11 2022 +0200): Fix loading pending locators
- 9d95f83 (Merge: 6d68be2 188b636, Mon Jul 11 12:37:59 2022 +0200): Merge branch 'develop' into feature/tts. Conflicts: readium/navigator/src/main/java/org/readium/r2/navigator/epub/EpubNavigatorFragment.kt
- 6d68be2 (Fri Jul 8 11:22:44 2022 +0200): Add missing doc comments
- 92e97b4 (Fri Jul 8 09:31:46 2022 +0200): Rename `TtsDirector` into `PublicationSpeechSynthesizer`
- d8fc300 (Merge: 2489a6c 8b78f27, Fri Jul 8 09:02:39 2022 +0200): Merge branch 'develop' into feature/tts
- 2489a6c (Fri Jul 8 09:02:14 2022 +0200): Fix Groovy snippets highlighting
- 2a4c518 (Thu Jul 7 19:39:04 2022 +0200): Fix user guides
- efbf37b (Thu Jul 7 19:19:15 2022 +0200): Remove `Content.ImageElement.description`
- 0474f97 (Thu Jul 7 18:58:36 2022 +0200): Update the changelog
- a6b7001 (Thu Jul 7 18:51:57 2022 +0200): Add TTS guide
- 89c574d (Thu Jul 7 17:40:36 2022 +0200): Refactor the configuration
- fe3262d (Thu Jul 7 17:30:44 2022 +0200): Remove `Failure` state to simplify
- a6767e4 (Thu Jul 7 15:32:08 2022 +0200): Add documentation
- cd8c157 (Thu Jul 7 14:08:32 2022 +0200): Reorganize `Content` attributes
- da664ce (Thu Jul 7 12:02:43 2022 +0200): Simplify `Content` models
- bfa97ed (Thu Jul 7 10:26:15 2022 +0200): Refactor `Content` to use a more idiomatic iterator
- 627456b (Wed Jul 6 14:32:29 2022 +0200): Fix strings
- 30c3ff2 (Wed Jul 6 12:49:15 2022 +0200): Fix `PublicationContentIterator`
- ef39264 (Wed Jul 6 11:24:06 2022 +0200): Doc comments and renames
- 4e65684 (Tue Jul 5 17:06:32 2022 +0200): Refactoring and documentation
- 41da132 (Tue Jul 5 09:32:14 2022 +0200): Clean up
- 6b10acc (Mon Jul 4 19:39:01 2022 +0200): Toggle TTS
- 03edb96 (Mon Jul 4 19:32:11 2022 +0200): Fix tests
- 4484783 (Mon Jul 4 18:33:28 2022 +0200): Fix lints
- 9aaa664 (Mon Jul 4 17:48:28 2022 +0200): Disable touches during playback
- 5c15d4d (Merge: afa6378 7e0ce8a, Mon Jul 4 14:35:37 2022 +0200): Merge branch 'develop' into feature/tts. Conflicts: readium/navigator/src/main/java/org/readium/r2/navigator/epub/EpubNavigatorFragment.kt, readium/shared/src/main/java/org/readium/r2/shared/OptIn.kt, readium/shared/src/main/java/org/readium/r2/shared/publication/Publication.kt, test-app/src/main/java/org/readium/r2/testapp/reader/PdfReaderFragment.kt
- afa6378 (Mon Jul 4 14:24:37 2022 +0200): More refactoring
- 40ae740 (Thu Jun 30 16:25:21 2022 +0200): More refactoring
- b98cf6b (Thu Jun 30 16:24:50 2022 +0200): Simplify `TtsViewModel`
- a6807f1 (Thu Jun 30 14:41:00 2022 +0200): Simplify view model and controls
- 9783238 (Thu Jun 30 14:04:38 2022 +0200): Fix performance issue with compositions
- 998e18f (Tue Jun 28 15:36:50 2022 +0200): Improve voice selection
- c68d25f (Tue Jun 28 14:21:21 2022 +0200): Refactor language and config constraints
- d4b3dd6 (Mon Jun 27 19:15:18 2022 +0200): Extract the `TtsViewModel`
- ae631ee (Mon Jun 27 16:17:32 2022 +0200): Add `Voice`
- b6a1c6c (Mon Jun 27 13:11:34 2022 +0200): Handle installing missing TTS voice data
- a06ec26 (Mon Jun 27 11:54:08 2022 +0200): Improve error reporting
- 1d4bd86 (Mon Jun 27 10:23:04 2022 +0200): Add TTS config
- 89467d0 (Thu Jun 23 12:59:57 2022 +0200): Add TTS controls
- af1f4c1 (Mon Jun 20 19:36:34 2022 +0200): Basic integration of the new TTS
- 1f9504d (Mon Jun 20 18:25:34 2022 +0200): Backport changes from Swift
- 2c79cdd (Thu Jun 16 17:31:54 2022 +0200): Refactor the reader options menu and remove the old TTS implementation
- e18a8dd (Merge: ba8b491 96e07b2, Thu Jun 9 11:17:11 2022 +0200): Merge branch 'develop' into feature/tts
- ba8b491 (Merge: f803660 ce1c73e, Fri Jun 3 11:46:18 2022 +0200): Merge branch 'develop' into feature/tts
- f803660 (Merge: e092bb1 cc73996, Tue May 3 13:24:16 2022 +0200): Merge branch 'develop' of github.com:readium/kotlin-toolkit into feature/tts
- e092bb1 (Wed Apr 13 11:14:12 2022 +0200): Fix regression when loading a locator of an already loaded resource.
- 4cbbad7 (Wed Apr 6 17:13:54 2022 +0200): Fix current locator not being propagated
- c62fa13 (Wed Apr 6 14:57:21 2022 +0200): Fix crash
- 6679272 (Tue Apr 5 18:45:10 2022 +0200): Optimize first visible element selection
- 41e2bdc (Tue Apr 5 15:56:55 2022 +0200): Start the iteration from a CSS selector
- 9c34487 (Mon Apr 4 22:17:25 2022 +0200): Optimize first element detection
- 04a00d8 (Fri Apr 1 18:21:38 2022 +0200): Add isSpeaking, playPause()
- 10e2704 (Tue Mar 29 09:17:16 2022 +0200): Make the default locale optional
- f85efde (Fri Mar 25 20:39:34 2022 +0100): Remove the dependency to the navigator in `TextToSpeechController`
- f465c13 (Wed Mar 23 19:50:18 2022 +0100): Optimize the resolution of Locators when a CSS selector is provided
- d88a960 (Wed Mar 23 19:49:49 2022 +0100): Improve the `TextToSpeechController`
- ef11ff1 (Tue Mar 22 18:23:05 2022 +0100): Fix DOM iterator
- 636288e (Tue Mar 22 08:54:34 2022 +0100): Add the ContentIterator
- 3cf8797 (Wed Mar 16 11:42:44 2022 +0100): Add JS Dom utils
- 1f430dd (Thu Mar 10 17:01:55 2022 +0100): Add the `TextToSpeechController` and more refactorings
- 3dfb35b (Mon Mar 7 13:51:30 2022 +0100): Don't refresh an already loaded resource when jumping to a locator
- c1af172 (Mon Mar 7 12:48:30 2022 +0100): Refactoring and implement the default `TextIteratorService`
- 0100ea0 (Fri Mar 4 11:01:36 2022 +0100): Add the `NaiveUnitTextContentTokenizer`
- 81e3371 (Thu Mar 3 18:57:13 2022 +0100): Add `IcuUnitTextContentTokenizer`
Changelog
Added
Shared
Navigator
Review notes
A good first step is to read the two user guides linked in the changelog above.
Then I suggest going in this order:
- `Content.kt` contains the `Content` models as well as the `Content.Iterator` interface.
- `PublicationContentIterator.kt` is a general implementation to iterate over a full publication; it delegates the actual extraction to `ResourceContentIterator` instances.
- `HtmlResourceContentIterator` is a specialized `Content.Iterator` which can extract the content from an HTML `Resource`.
- `shared/util/tokenizer` is a new package providing utilities to tokenize pieces of data.
  - `TextTokenizer.kt` is used to split a `String` into multiple `IntRange` components. You can split a paragraph into sentences with it.
  - `ContentTokenizer.kt` splits a single `Content.Element` into a list of `Content.Element`.
  - The `TextContentTokenizer` implementation delegates the actual text tokenizing to a provided `TextTokenizer` to split a `Content.TextElement` into multiple `Content.TextElement`. Then, it will split the `Locator.Text` model to match the generated tokens.
- `Language.kt` is basically `Locale` but (ultimately) independent from the JVM.
- `EpubNavigator` was updated to not refresh the current resource when jumping to a locator in it. The TTS jumps to a locator every second, so it was causing a lot of visual refreshes.
- […] `navigator/tts` package.
  - […] `shared` or even a new `readium-tts` module. However, I put it in `navigator` as it's usually used in combination with a Navigator. We could raise this discussion in a call.
  - `TtsEngine` is an interface to implement to support a third-party TTS engine. It is pretty simple, with a `speak()` API to speak a single utterance. It's suspending, so cancelling the job cancels the speech. Calling `speak()` multiple times should queue the utterances (the caller is suspended until its utterance is played).
  - `AndroidTtsEngine` is an implementation using the native Android TTS.
  - `PublicationSpeechSynthesizer` is the high-level orchestrator that an app can use to read a whole publication aloud.
- […] `BaseReaderFragment` options menu to have a single one. Before, EPUB had a specific options menu layout which contained the TTS button.
- […] `TtsViewModel` which will be composed inside `ReaderViewModel` to make it easier to look at TTS stuff.
- I added a Jetpack Compose `overlay` view on top of the reader in `VisualReaderFragment`. This is used to display the TTS controls above the publication, but also to prevent user interaction on the pages while the TTS is running.
Unfinished stuff
The following can be done in follow-up PRs as it's not critical right now:
- […] `HtmlResourceContentIterator` to skip specific HTML nodes.
- […] `ContentService` implementation for PDF documents.
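To make the `TextTokenizer` idea from the review notes more concrete (splitting a `String` into `IntRange` components, for example a paragraph into sentences), here is a minimal sketch based on `java.text.BreakIterator`. `sentenceRanges` is a hypothetical function and not the tokenizer API added in this PR. Defaulting to the English locale and trimming whitespace from each range are assumptions of the sketch.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Hypothetical tokenizer: returns the range of each sentence in `text`,
// excluding leading and trailing whitespace.
fun sentenceRanges(text: String, locale: Locale = Locale.ENGLISH): List<IntRange> {
    val iterator = BreakIterator.getSentenceInstance(locale)
    iterator.setText(text)

    val ranges = mutableListOf<IntRange>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        // Shrink the boundaries so that the range covers only visible text.
        var from = start
        var until = end
        while (from < until && text[from].isWhitespace()) from++
        while (until > from && text[until - 1].isWhitespace()) until--
        if (from < until) ranges += from until until

        start = end
        end = iterator.next()
    }
    return ranges
}
```

Returning ranges instead of substrings keeps the link to the original text, so that a matching text locator can be derived for each token, as described for `TextContentTokenizer` above.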