@mickael-menu mickael-menu commented May 2, 2022

Changelog

Added

Shared

Navigator

Review notes

A good first step is to read the two user guides linked in the changelog above.

Then I suggest going in this order:

  • Implementation of the Content Iterator proposal (with some modifications).
    • Content.kt contains the Content models as well as the Content.Iterator interface.
    • PublicationContentIterator.kt is a general implementation that iterates over a full publication. It delegates the actual extraction to ResourceContentIterator instances.
    • HtmlResourceContentIterator is a specialized Content.Iterator which can extract the content from an HTML Resource.
  • shared/util/tokenizer is a new package providing utilities to tokenize pieces of data.
    • TextTokenizer.kt is used to split a String into multiple IntRange components. You can split a paragraph into sentences with it.
    • ContentTokenizer.kt splits a single Content.Element into a list of Content.Element.
      • The default TextContentTokenizer implementation delegates the actual text tokenizing to a provided TextTokenizer to split Content.TextElement into multiple Content.TextElement. Then, it will split the Locator.Text model to match the generated tokens.
  • Language.kt is basically Locale but (ultimately) independent from the JVM.
  • EpubNavigator was updated to not refresh the current resource when jumping to a locator in it. The TTS jumps to a locator every second, so it was causing a lot of visual refreshes.
  • navigator/tts package.
    • Note that the TTS is completely independent from the Navigator, so we could put it in shared or even in a new readium-tts module. However, I put it in navigator as it's usually used in combination with a Navigator. We could raise this discussion in a call.
    • TtsEngine is an interface to implement to support a third-party TTS engine. It is pretty simple, with a speak() API to speak a single utterance. It's suspending, so cancelling the job cancels the speech. Calling speak() multiple times should queue the utterances (each caller is suspended until its utterance is played).
    • AndroidTtsEngine is an implementation using the native Android TTS.
    • PublicationSpeechSynthesizer is the high-level orchestrator that an app can use to read a whole publication aloud.
  • In the Test App:
    • Removed the legacy TTS implementation, which was entirely in the test app.
    • Refactored the BaseReaderFragment options menu to have a single one. Before, EPUB had a specific options menu layout which contained the TTS button.
    • Added a TtsViewModel which will be composed inside ReaderViewModel to make it easier to look at TTS stuff.
    • Added a Jetpack Compose overlay view on top of the reader in VisualReaderFragment. This is used to display the TTS controls above the publication, but also to prevent user interaction on the pages while the TTS is running.
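
To make the TextTokenizer idea from the notes above more concrete (splitting a String into IntRange components, for example a paragraph into sentences), here is a minimal sketch. It assumes java.text.BreakIterator as the segmentation backend, and sentenceRangesSketch is a hypothetical name used for illustration only, not the API added in this PR.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Hypothetical helper: returns one IntRange per sentence found in `text`.
// Trailing whitespace is excluded from each range.
fun sentenceRangesSketch(text: String, locale: Locale = Locale.ENGLISH): List<IntRange> {
    val iterator = BreakIterator.getSentenceInstance(locale)
    iterator.setText(text)

    val ranges = mutableListOf<IntRange>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        // Move the end backwards over any trailing whitespace.
        var trimmedEnd = end
        while (trimmedEnd > start && text[trimmedEnd - 1].isWhitespace()) {
            trimmedEnd--
        }
        if (trimmedEnd > start) {
            ranges += start until trimmedEnd
        }
        start = end
        end = iterator.next()
    }
    return ranges
}

fun main() {
    val paragraph = "Hello world. How are you?"
    // Each range can be mapped back to its text with String.substring(IntRange).
    sentenceRangesSketch(paragraph).forEach { println(paragraph.substring(it)) }
}
```

Returning ranges instead of substrings keeps the offsets into the original text, which is what a caller would need to split a text locator to match the generated tokens.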

Unfinished stuff

The following can be done in follow-up PRs as it's not critical right now:

  • Supporting custom filters in the HtmlResourceContentIterator to skip specific HTML nodes.
  • Better support for various HTML tags and ARIA attributes.
  • Adding a ContentService implementation for PDF documents.

@mickael-menu mickael-menu changed the title from "Text to speech" to "Text-to-speech and content extraction" Jul 7, 2022
* Returns the [Locator] to the first content element that begins on the current screen.
*/
@ExperimentalReadiumApi
suspend fun firstVisibleElementLocator(): Locator? =

This was added to start the TTS from the currently visible page instead of the start of the publication. Using the progression in the currentLocator was not precise enough, and firstVisibleElementLocator() makes sure to target the first HTML element fully visible or starting on the page.

)
@Retention(value = AnnotationRetention.BINARY)
@Target(AnnotationTarget.CLASS, AnnotationTarget.FUNCTION, AnnotationTarget.TYPEALIAS, AnnotationTarget.PROPERTY)
annotation class DelicateReadiumApi

Inspired by @DelicateCoroutinesApi. We can use it to tag a dangerous Readium API that should only be used after carefully reading its documentation.
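
For readers unfamiliar with this kind of marker, here is a minimal sketch of how an opt-in annotation behaves. It assumes the annotation is backed by Kotlin's @RequiresOptIn, as @DelicateCoroutinesApi is; DelicateSketchApi, dangerousOperationSketch() and callerSketch() are hypothetical names, not part of Readium.

```kotlin
// Hypothetical marker: using anything tagged with it requires an explicit opt-in.
@RequiresOptIn(
    level = RequiresOptIn.Level.WARNING,
    message = "This is a delicate API. Read its documentation before using it."
)
@Retention(AnnotationRetention.BINARY)
@Target(AnnotationTarget.CLASS, AnnotationTarget.FUNCTION, AnnotationTarget.TYPEALIAS, AnnotationTarget.PROPERTY)
annotation class DelicateSketchApi

// A function tagged as delicate.
@DelicateSketchApi
fun dangerousOperationSketch(): String = "done"

// Callers acknowledge the risk with @OptIn, otherwise the compiler reports a warning.
@OptIn(DelicateSketchApi::class)
fun callerSketch(): String = dangerousOperationSketch()

fun main() {
    println(callerSketch())
}
```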

/**
 * Reads the full content as a [Bitmap].
 */
suspend fun readAsBitmap(): ResourceTry<Bitmap> =

Not necessary for this PR, but this seems to be a useful helper. I used it in the example of the content.md user guide to build an index of images.

val startIndex: Int,
)

private class ContentParser(

This algorithm is largely inspired by jsoup's text(), which extracts the normalized text of an element.
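
As a rough idea of what "normalized text" means here, the sketch below collapses runs of whitespace into a single space and trims both ends. It is a simplified illustration written for this note, not the ContentParser from the PR nor jsoup's implementation; normalizeWhitespaceSketch is a hypothetical name.

```kotlin
// Hypothetical helper: collapses whitespace runs into one space and trims the result.
fun normalizeWhitespaceSketch(raw: String): String {
    val out = StringBuilder()
    var pendingSpace = false
    for (c in raw) {
        if (c.isWhitespace()) {
            // Remember the gap, but ignore whitespace before the first visible character.
            pendingSpace = out.isNotEmpty()
        } else {
            // Emit at most one space for the whole whitespace run.
            if (pendingSpace) out.append(' ')
            pendingSpace = false
            out.append(c)
        }
    }
    // A trailing whitespace run is never appended, so the result is already trimmed.
    return out.toString()
}

fun main() {
    println(normalizeWhitespaceSketch("  Hello \n   world  "))
}
```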


import java.util.*

class Language private constructor(val code: String, val locale: Locale) {

I introduced this Language to have a higher language construct that doesn't depend on the JVM, and to add some useful helpers such as removeRegion().
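
To illustrate the idea, here is a minimal sketch of such a wrapper with a removeRegion() helper. It assumes BCP 47 language tags and relies on java.util.Locale internally; LanguageSketch is a hypothetical class and its constructor differs from the Language class in this PR.

```kotlin
import java.util.Locale

// Hypothetical wrapper around a BCP 47 language tag.
class LanguageSketch(code: String) {
    // Accepts both "pt-BR" and the legacy "pt_BR" form.
    val locale: Locale = Locale.forLanguageTag(code.replace('_', '-'))

    // Normalized BCP 47 tag, e.g. "pt-BR".
    val code: String = locale.toLanguageTag()

    // Keeps only the primary language subtag. Note that this simplified version
    // also drops the script subtag, if any.
    fun removeRegion(): LanguageSketch = LanguageSketch(locale.language)
}

fun main() {
    val language = LanguageSketch("pt_BR")
    println(language.code)
    println(language.removeRegion().code)
}
```

A helper like this is handy when matching a publication language against the voices offered by a TTS engine, where the region may or may not be specified.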

}
})

// Speech speed

I removed the legacy TTS implementation that was entirely in the test app.

@OptIn(ExperimentalMedia2::class, ExperimentalTime::class, ExperimentalCoroutinesApi::class)
class AudioReaderFragment : BaseReaderFragment(), SeekBar.OnSeekBarChangeListener {

    override val model: ReaderViewModel by activityViewModels()

This could be moved to BaseReaderFragment to simplify the subclasses.


override fun onCreateOptionsMenu(menu: Menu, menuInflater: MenuInflater) {
    super.onCreateOptionsMenu(menu, menuInflater)
    menuInflater.inflate(R.menu.menu_epub, menu)

All the options menu items are now in a single layout inflated by BaseReaderFragment.

- private lateinit var model: ReaderViewModel
+ private val model: ReaderViewModel by viewModels()

override fun getDefaultViewModelProviderFactory(): ViewModelProvider.Factory {

This change seems pointless but I refactored it twice. At some point TtsViewModel was a proper Android ViewModel and so I had a composite view model factory.

insets
}

binding.overlay.setContent {

I added a Jetpack Compose overlay view on top of the reader in VisualReaderFragment. This is used to display the TTS controls above the publication, but also to prevent user interaction on the pages while the TTS is running.

Comment on lines +140 to +142
setupHighlights(this)
setupSearch(this)
setupTts(this)

I split the setup to make it clearer to people wanting to integrate only one of these features.

* Sets the emphasis (alpha) of a group of [Composable] views.
*/
@Composable
fun Group(lowEmphasis: Boolean = false, enabled: Boolean = true, content: @Composable () -> Unit) {

@mickael-menu mickael-menu Jul 8, 2022


This is not really necessary, but I find the recommended approach for de-emphasizing elements really heavy and unclear.

/**
 * Utterances to be synthesized, in order of [speak] calls.
 */
private val tasks = Channel<UtteranceTask>(Channel.BUFFERED)

Using a Channel ensures that the utterances are played in the same order as the speak() calls.
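
The ordering guarantee can be sketched as follows, assuming kotlinx.coroutines is on the classpath. This is a simplified illustration, not the AndroidTtsEngine code: UtteranceTaskSketch, QueueingEngineSketch and playAllSketch() are hypothetical names, the play callback stands in for the real synthesis, and cancellation handling is omitted.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical task: the text to speak, plus a signal completed once it has been played.
class UtteranceTaskSketch(
    val text: String,
    val done: CompletableDeferred<Unit> = CompletableDeferred()
)

class QueueingEngineSketch(private val play: suspend (String) -> Unit) {
    // A channel is a FIFO queue: tasks are received in the order they were sent.
    private val tasks = Channel<UtteranceTaskSketch>(Channel.BUFFERED)

    // Queues the utterance and suspends the caller until it has been played.
    suspend fun speak(text: String) {
        val task = UtteranceTaskSketch(text)
        tasks.send(task)
        task.done.await()
    }

    // Plays the queued utterances one at a time until the channel is closed.
    suspend fun run() {
        for (task in tasks) {
            play(task.text)
            task.done.complete(Unit)
        }
    }

    fun close() {
        tasks.close()
    }
}

// Speaks every text from its own coroutine and returns the order in which they were played.
fun playAllSketch(texts: List<String>): List<String> = runBlocking {
    val played = mutableListOf<String>()
    val engine = QueueingEngineSketch { played += it }
    val consumer = launch { engine.run() }

    texts.map { text -> launch { engine.speak(text) } }.joinAll()

    engine.close()
    consumer.join()
    played
}

fun main() {
    println(playAllSketch(listOf("First sentence.", "Second sentence.", "Third sentence.")))
}
```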

*
* This is used to interrupt on-going commands.
*/
private fun replacePlaybackJob(block: suspend CoroutineScope.() -> Unit) {

That's the solution I came up with to be able to skip utterances or interrupt a running utterance. As only one playback job is allowed at any time, when you start a new command the previous one is cancelled.
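
A minimal sketch of this "single playback job" pattern, assuming kotlinx.coroutines. The body of replacePlaybackJob() below is a guess at the general shape and not the code from this PR; PlaybackSketch and replaceDemoSketch() are hypothetical names.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

class PlaybackSketch(private val scope: CoroutineScope) {
    // At most one playback job is active at any time.
    private var playbackJob: Job? = null

    // Cancels the current command, if any, then starts the new one.
    fun replacePlaybackJob(block: suspend CoroutineScope.() -> Unit) {
        playbackJob?.cancel()
        playbackJob = scope.launch(block = block)
    }

    // Waits for the most recent command to finish.
    suspend fun join() {
        playbackJob?.join()
    }
}

fun replaceDemoSketch(): List<String> = runBlocking {
    val log = mutableListOf<String>()
    val player = PlaybackSketch(this)

    // A long-running command standing in for an utterance being spoken.
    player.replacePlaybackJob {
        try {
            delay(60_000)
            log += "first finished"
        } catch (e: CancellationException) {
            log += "first cancelled"
            throw e
        }
    }
    yield() // Gives the first command a chance to start.

    // Starting a new command interrupts the previous one.
    player.replacePlaybackJob { log += "second played" }
    player.join()
    log
}

fun main() {
    println(replaceDemoSketch())
}
```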

@mickael-menu mickael-menu marked this pull request as ready for review July 8, 2022 09:41
@mickael-menu mickael-menu requested a review from qnga July 8, 2022 09:42
Comment on lines +34 to +36
val showControls by model.state.asStateWhenStarted { it.showControls }
val isPlaying by model.state.asStateWhenStarted { it.isPlaying }
val settings by model.state.asStateWhenStarted { it.settings }

I started with the usual:

val state by model.state.flowWithLocalLifecycle().collectAsState()

But as I used a single State object that is updated every time a word is spoken, that triggered a lot of recompositions. I came up with this asStateWhenStarted extension that selects a property of the global state instead, and is basically (it's a bit more complicated, see Flow.kt) equivalent to:

val showControls by remember {
    model.state
        .map { it.showControls }
        .flowWithLocalLifecycle()
}.collectAsState()
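
The effect of selecting a single property can be shown outside of Compose with plain flows, assuming kotlinx.coroutines. In this sketch, distinctUntilChanged() plays the role that Compose state plays when it ignores writes of an equal value; TtsStateSketch, selectSketch() and selectDemoSketch() are hypothetical names, not the asStateWhenStarted extension itself.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.distinctUntilChanged
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical global state, updated every time a word is spoken.
data class TtsStateSketch(
    val showControls: Boolean,
    val isPlaying: Boolean,
    val spokenWord: String
)

// Observes a single property and drops the updates that do not change it.
fun <T, R> Flow<T>.selectSketch(transform: (T) -> R): Flow<R> =
    map { transform(it) }.distinctUntilChanged()

fun selectDemoSketch(): List<Boolean> = runBlocking {
    val states = flowOf(
        TtsStateSketch(showControls = true, isPlaying = true, spokenWord = "Hello"),
        TtsStateSketch(showControls = true, isPlaying = true, spokenWord = "world"),
        TtsStateSketch(showControls = false, isPlaying = true, spokenWord = "again")
    )
    // Three state updates, but only two distinct values of showControls.
    states.selectSketch { it.showControls }.toList()
}

fun main() {
    println(selectDemoSketch())
}
```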

@mickael-menu mickael-menu mentioned this pull request Jul 8, 2022
# Conflicts:
#	readium/navigator/src/main/java/org/readium/r2/navigator/epub/EpubNavigatorFragment.kt
mickael-menu added a commit to demarque/readium-kotlin-toolkit that referenced this pull request Jul 11, 2022
commit f1e27aa
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 11 15:54:11 2022 +0200

    Fix loading pending locators

commit 9d95f83
Merge: 6d68be2 188b636
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 11 12:37:59 2022 +0200

    Merge branch 'develop' into feature/tts

    # Conflicts:
    #	readium/navigator/src/main/java/org/readium/r2/navigator/epub/EpubNavigatorFragment.kt

commit 6d68be2
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Jul 8 11:22:44 2022 +0200

    Add missing doc comments

commit 92e97b4
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Jul 8 09:31:46 2022 +0200

    Rename `TtsDirector` into `PublicationSpeechSynthesizer`

commit d8fc300
Merge: 2489a6c 8b78f27
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Jul 8 09:02:39 2022 +0200

    Merge branch 'develop' into feature/tts

commit 2489a6c
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Jul 8 09:02:14 2022 +0200

    Fix Groovy snippets highlighting

commit 2a4c518
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 19:39:04 2022 +0200

    Fix user guides

commit efbf37b
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 19:19:15 2022 +0200

    Remove `Content.ImageElement.description`

commit 0474f97
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 18:58:36 2022 +0200

    Update the changelog

commit a6b7001
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 18:51:57 2022 +0200

    Add TTS guide

commit 89c574d
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 17:40:36 2022 +0200

    Refactor the configuration

commit fe3262d
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 17:30:44 2022 +0200

    Remove `Failure` state to simplify

commit a6767e4
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 15:32:08 2022 +0200

    Add documentation

commit cd8c157
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 14:08:32 2022 +0200

    Reorganize `Content` attributes

commit da664ce
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 12:02:43 2022 +0200

    Simplify `Content` models

commit bfa97ed
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jul 7 10:26:15 2022 +0200

    Refactor `Content` to use a more idiomatic iterator

commit 627456b
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Jul 6 14:32:29 2022 +0200

    Fix strings

commit 30c3ff2
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Jul 6 12:49:15 2022 +0200

    Fix `PublicationContentIterator`

commit ef39264
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Jul 6 11:24:06 2022 +0200

    Doc comments and renames

commit 4e65684
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Jul 5 17:06:32 2022 +0200

    Refactoring and documentation

commit 41da132
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Jul 5 09:32:14 2022 +0200

    Clean up

commit 6b10acc
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 4 19:39:01 2022 +0200

    Toggle TTS

commit 03edb96
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 4 19:32:11 2022 +0200

    Fix tests

commit 4484783
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 4 18:33:28 2022 +0200

    Fix lints

commit 9aaa664
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 4 17:48:28 2022 +0200

    Disable touches during playback

commit 5c15d4d
Merge: afa6378 7e0ce8a
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 4 14:35:37 2022 +0200

    Merge branch 'develop' into feature/tts

    # Conflicts:
    #	readium/navigator/src/main/java/org/readium/r2/navigator/epub/EpubNavigatorFragment.kt
    #	readium/shared/src/main/java/org/readium/r2/shared/OptIn.kt
    #	readium/shared/src/main/java/org/readium/r2/shared/publication/Publication.kt
    #	test-app/src/main/java/org/readium/r2/testapp/reader/PdfReaderFragment.kt

commit afa6378
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jul 4 14:24:37 2022 +0200

    More refactoring

commit 40ae740
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jun 30 16:25:21 2022 +0200

    More refactoring

commit b98cf6b
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jun 30 16:24:50 2022 +0200

    Simplify `TtsViewModel`

commit a6807f1
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jun 30 14:41:00 2022 +0200

    Simplify view model and controls

commit 9783238
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jun 30 14:04:38 2022 +0200

    Fix performance issue with compositions

commit 998e18f
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Jun 28 15:36:50 2022 +0200

    Improve voice selection

commit c68d25f
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Jun 28 14:21:21 2022 +0200

    Refactor language and config constraints

commit d4b3dd6
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jun 27 19:15:18 2022 +0200

    Extract the `TtsViewModel`

commit ae631ee
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jun 27 16:17:32 2022 +0200

    Add `Voice`

commit b6a1c6c
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jun 27 13:11:34 2022 +0200

    Handle installing missing TTS voice data

commit a06ec26
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jun 27 11:54:08 2022 +0200

    Improve error reporting

commit 1d4bd86
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jun 27 10:23:04 2022 +0200

    Add TTS config

commit 89467d0
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jun 23 12:59:57 2022 +0200

    Add TTS controls

commit af1f4c1
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jun 20 19:36:34 2022 +0200

    Basic integration of the new TTS

commit 1f9504d
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Jun 20 18:25:34 2022 +0200

    Backport changes from Swift

commit 2c79cdd
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jun 16 17:31:54 2022 +0200

    Refactor the reader options menu and remove the old TTS implementation

commit e18a8dd
Merge: ba8b491 96e07b2
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Jun 9 11:17:11 2022 +0200

    Merge branch 'develop' into feature/tts

commit ba8b491
Merge: f803660 ce1c73e
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Jun 3 11:46:18 2022 +0200

    Merge branch 'develop' into feature/tts

commit f803660
Merge: e092bb1 cc73996
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue May 3 13:24:16 2022 +0200

    Merge branch 'develop' of github.com:readium/kotlin-toolkit into feature/tts

commit e092bb1
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Apr 13 11:14:12 2022 +0200

    Fix regression when loading a locator of an already loaded resource.

commit 4cbbad7
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Apr 6 17:13:54 2022 +0200

    Fix current locator not being propagated

commit c62fa13
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Apr 6 14:57:21 2022 +0200

    Fix crash

commit 6679272
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Apr 5 18:45:10 2022 +0200

    Optimize first visible element selection

commit 41e2bdc
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Apr 5 15:56:55 2022 +0200

    Start the iteration from a CSS selector

commit 9c34487
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Apr 4 22:17:25 2022 +0200

    Optimize first element detection

commit 04a00d8
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Apr 1 18:21:38 2022 +0200

    Add isSpeaking, playPause()

commit 10e2704
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Mar 29 09:17:16 2022 +0200

    Make the default locale optional

commit f85efde
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Mar 25 20:39:34 2022 +0100

    Remove the dependency to the navigator in `TextToSpeechController`

commit f465c13
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Mar 23 19:50:18 2022 +0100

    Optimize the resolution of Locators when a CSS selector is provided

commit d88a960
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Mar 23 19:49:49 2022 +0100

    Improve the `TextToSpeechController`

commit ef11ff1
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Mar 22 18:23:05 2022 +0100

    Fix DOM iterator

commit 636288e
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Tue Mar 22 08:54:34 2022 +0100

    Add the ContentIterator

commit 3cf8797
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Wed Mar 16 11:42:44 2022 +0100

    Add JS Dom utils

commit 1f430dd
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Mar 10 17:01:55 2022 +0100

    Add the `TextToSpeechController` and more refactorings

commit 3dfb35b
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Mar 7 13:51:30 2022 +0100

    Don't refresh an already loaded resource when jumping to a locator

commit c1af172
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Mon Mar 7 12:48:30 2022 +0100

    Refactoring and implement the default `TextIteratorService`

commit 0100ea0
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Fri Mar 4 11:01:36 2022 +0100

    Add the `NaiveUnitTextContentTokenizer`

commit 81e3371
Author: Mickaël Menu <mickael.menu@gmail.com>
Date:   Thu Mar 3 18:57:13 2022 +0100

    Add `IcuUnitTextContentTokenizer`
@mickael-menu mickael-menu merged commit a93d0a2 into develop Jul 20, 2022
@mickael-menu mickael-menu deleted the feature/tts branch July 20, 2022 17:53