Merged
65 commits
81e3371
Add `IcuUnitTextContentTokenizer`
mickael-menu Mar 3, 2022
0100ea0
Add the `NaiveUnitTextContentTokenizer`
mickael-menu Mar 4, 2022
c1af172
Refactoring and implement the default `TextIteratorService`
mickael-menu Mar 7, 2022
3dfb35b
Don't refresh an already loaded resource when jumping to a locator
mickael-menu Mar 7, 2022
1f430dd
Add the `TextToSpeechController` and more refactorings
mickael-menu Mar 10, 2022
3cf8797
Add JS Dom utils
mickael-menu Mar 16, 2022
636288e
Add the ContentIterator
mickael-menu Mar 22, 2022
ef11ff1
Fix DOM iterator
mickael-menu Mar 22, 2022
d88a960
Improve the `TextToSpeechController`
mickael-menu Mar 23, 2022
f465c13
Optimize the resolution of Locators when a CSS selector is provided
mickael-menu Mar 23, 2022
f85efde
Remove the dependency to the navigator in `TextToSpeechController`
mickael-menu Mar 25, 2022
10e2704
Make the default locale optional
mickael-menu Mar 29, 2022
04a00d8
Add isSpeaking, playPause()
mickael-menu Apr 1, 2022
9c34487
Optimize first element detection
mickael-menu Apr 4, 2022
41e2bdc
Start the iteration from a CSS selector
mickael-menu Apr 5, 2022
6679272
Optimize first visible element selection
mickael-menu Apr 5, 2022
c62fa13
Fix crash
mickael-menu Apr 6, 2022
4cbbad7
Fix current locator not being propagated
mickael-menu Apr 6, 2022
e092bb1
Fix regression when loading a locator of an already loaded resource.
mickael-menu Apr 13, 2022
f803660
Merge branch 'develop' of github.com:readium/kotlin-toolkit into feat…
mickael-menu May 3, 2022
ba8b491
Merge branch 'develop' into feature/tts
mickael-menu Jun 3, 2022
e18a8dd
Merge branch 'develop' into feature/tts
mickael-menu Jun 9, 2022
2c79cdd
Refactor the reader options menu and remove the old TTS implementation
mickael-menu Jun 16, 2022
1f9504d
Backport changes from Swift
mickael-menu Jun 20, 2022
af1f4c1
Basic integration of the new TTS
mickael-menu Jun 20, 2022
89467d0
Add TTS controls
mickael-menu Jun 23, 2022
1d4bd86
Add TTS config
mickael-menu Jun 27, 2022
a06ec26
Improve error reporting
mickael-menu Jun 27, 2022
b6a1c6c
Handle installing missing TTS voice data
mickael-menu Jun 27, 2022
ae631ee
Add `Voice`
mickael-menu Jun 27, 2022
d4b3dd6
Extract the `TtsViewModel`
mickael-menu Jun 27, 2022
c68d25f
Refactor language and config constraints
mickael-menu Jun 28, 2022
998e18f
Improve voice selection
mickael-menu Jun 28, 2022
9783238
Fix performance issue with compositions
mickael-menu Jun 30, 2022
a6807f1
Simplify view model and controls
mickael-menu Jun 30, 2022
b98cf6b
Simplify `TtsViewModel`
mickael-menu Jun 30, 2022
40ae740
More refactoring
mickael-menu Jun 30, 2022
afa6378
More refactoring
mickael-menu Jul 4, 2022
5c15d4d
Merge branch 'develop' into feature/tts
mickael-menu Jul 4, 2022
9aaa664
Disable touches during playback
mickael-menu Jul 4, 2022
4484783
Fix lints
mickael-menu Jul 4, 2022
03edb96
Fix tests
mickael-menu Jul 4, 2022
6b10acc
Toggle TTS
mickael-menu Jul 4, 2022
41da132
Clean up
mickael-menu Jul 5, 2022
4e65684
Refactoring and documentation
mickael-menu Jul 5, 2022
ef39264
Doc comments and renames
mickael-menu Jul 6, 2022
30c3ff2
Fix `PublicationContentIterator`
mickael-menu Jul 6, 2022
627456b
Fix strings
mickael-menu Jul 6, 2022
bfa97ed
Refactor `Content` to use a more idiomatic iterator
mickael-menu Jul 7, 2022
da664ce
Simplify `Content` models
mickael-menu Jul 7, 2022
cd8c157
Reorganize `Content` attributes
mickael-menu Jul 7, 2022
a6767e4
Add documentation
mickael-menu Jul 7, 2022
fe3262d
Remove `Failure` state to simplify
mickael-menu Jul 7, 2022
89c574d
Refactor the configuration
mickael-menu Jul 7, 2022
a6b7001
Add TTS guide
mickael-menu Jul 7, 2022
0474f97
Update the changelog
mickael-menu Jul 7, 2022
efbf37b
Remove `Content.ImageElement.description`
mickael-menu Jul 7, 2022
2a4c518
Fix user guides
mickael-menu Jul 7, 2022
2489a6c
Fix Groovy snippets highlighting
mickael-menu Jul 8, 2022
d8fc300
Merge branch 'develop' into feature/tts
mickael-menu Jul 8, 2022
92e97b4
Rename `TtsDirector` into `PublicationSpeechSynthesizer`
mickael-menu Jul 8, 2022
6d68be2
Add missing doc comments
mickael-menu Jul 8, 2022
9d95f83
Merge branch 'develop' into feature/tts
mickael-menu Jul 11, 2022
f1e27aa
Fix loading pending locators
mickael-menu Jul 11, 2022
127a65a
Fix crash on some devices
mickael-menu Jul 18, 2022
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -8,6 +8,10 @@ All notable changes to this project will be documented in this file. Take a look

### Added

#### Shared

* Extract the raw content (text, images, etc.) of a publication. [Take a look at the user guide](docs/guides/content.md).

#### Navigator

* Improved Javascript support in the EPUB navigator:
@@ -34,7 +38,8 @@ All notable changes to this project will be documented in this file. Take a look
```kotlin
val result = navigator.evaluateJavascript("customInterface.api('argument')")
```
* New [PSPDFKit](readium/adapters/pspdfkit) adapter for rendering PDF documents.
* New [PSPDFKit](readium/adapters/pspdfkit) adapter for rendering PDF documents. [Take a look at the user guide](docs/guides/pdf.md).
* A brand new text-to-speech implementation. [Take a look at the user guide](docs/guides/tts.md).

### Changed

6 changes: 3 additions & 3 deletions README.md
@@ -25,7 +25,7 @@ Readium modules are distributed through [JitPack](https://jitpack.io/#readium/ko

Make sure that you have the `$readium_version` property set in your root `build.gradle` and add the JitPack repository.

```gradle
```groovy
buildscript {
    ext.readium_version = '2.2.0'
}
@@ -39,7 +39,7 @@ allprojects {

Then, add the dependencies to the Readium modules you need in your app's `build.gradle`.

```gradle
```groovy
dependencies {
    implementation "com.github.readium.kotlin-toolkit:readium-shared:$readium_version"
    implementation "com.github.readium.kotlin-toolkit:readium-streamer:$readium_version"
@@ -61,7 +61,7 @@ git submodule add https://github.com/readium/kotlin-toolkit.git

Then, add the following to your project's `settings.gradle` file, altering the paths if needed. Keep only the modules you want to use.

```gradle
```groovy
include ':readium:shared'
project(':readium:shared').projectDir = file('kotlin-toolkit/readium/shared')

188 changes: 188 additions & 0 deletions docs/guides/content.md
@@ -0,0 +1,188 @@
# Extracting the content of a publication

:warning: The described feature is still experimental and the implementation incomplete.

Many high-level features require access to the raw content (text, media, etc.) of a publication, such as:

* Text-to-speech
* Accessibility reader
* Basic search
* Full-text search indexing
* Image or audio indexes

The `ContentService` provides a way to iterate through a publication's content, extracted as semantic elements.

First, request the publication's `Content`, starting from a given `Locator`. If the locator is missing, the `Content` will be extracted from the beginning of the publication.

```kotlin
val content = publication.content(startLocator)
if (content == null) {
    // Abort as the content cannot be extracted
}
```

## Extracting the raw text content

Getting the whole raw text of a publication is such a common use case that a helper is available on `Content`:

```kotlin
val wholeText = content.text()
```

This is an expensive operation. Proceed with caution, and cache the result if you need to reuse it.
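
One possible way to cache the result is a lazily initialized property. The following is a minimal, self-contained sketch: `CachedText` and the `extractText` lambda are hypothetical names standing in for an expensive call such as `content.text()`, and the sketch assumes a synchronous call for simplicity.

```kotlin
// Hypothetical helper, not part of the toolkit: `extractText` stands in for
// an expensive extraction such as `content.text()`.
class CachedText(extractText: () -> String) {
    // `lazy` runs the extraction on first access only, then reuses the result.
    val value: String by lazy(extractText)
}

fun main() {
    var extractions = 0
    val cached = CachedText {
        extractions++
        "Chapter 1\nIt was a dark and stormy night."
    }

    // Both reads share a single extraction.
    println(cached.value.lines().first())
    println(cached.value.length)
    println("Number of extractions: $extractions")
}
```

The extraction runs at most once per `CachedText` instance, so the cache should be recreated whenever the publication or the starting locator changes.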

## Iterating through the content

The individual `Content` elements can be iterated through with a regular `for` loop:

```kotlin
for (element in content) {
    // Process element
}
```

Alternatively, you can get the whole list of elements with `content.elements()`, or use the lower-level APIs to iterate through the content manually:

```kotlin
val iterator = content.iterator()
while (iterator.hasNext()) {
    val element = iterator.next()
}
```

Some `Content` implementations support bidirectional iteration. To iterate backwards, use:

```kotlin
while (iterator.hasPrevious()) {
    val element = iterator.previous()
}
```
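
To illustrate the forward and backward traversal pattern in isolation, here is a minimal sketch that uses Kotlin's standard `ListIterator` over plain strings as a stand-in for a bidirectional content iterator. The `walkBothWays` function and the sample values are made up for this example, and the toolkit's own iterator may behave differently.

```kotlin
// Stand-in only: a standard Kotlin `ListIterator` plays the role of a
// bidirectional content iterator. `walkBothWays` is a hypothetical helper.
fun walkBothWays(elements: List<String>): Pair<List<String>, List<String>> {
    val iterator = elements.listIterator()

    // Move forward until the end is reached.
    val forward = mutableListOf<String>()
    while (iterator.hasNext()) {
        forward += iterator.next()
    }

    // Walk back: `hasPrevious()` only checks, while `previous()` returns the
    // element and moves the cursor.
    val backward = mutableListOf<String>()
    while (iterator.hasPrevious()) {
        backward += iterator.previous()
    }

    return forward to backward
}

fun main() {
    val (forward, backward) = walkBothWays(
        listOf("Heading", "First paragraph", "Second paragraph")
    )
    println(forward)
    println(backward)
}
```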

## Processing the elements

The `Content` iterator yields `Content.Element` objects representing a single semantic portion of the publication, such as a heading, a paragraph or an embedded image.

Every element has a `locator` property targeting it in the publication. You can use the locator, for example, to navigate to the element or to draw a `Decoration` on top of it.

```kotlin
navigator.go(element.locator)
```

### Types of elements

Depending on the actual implementation of `Content.Element`, more properties are available to access the actual data. The toolkit ships with a number of default implementations for common types of elements.

#### Embedded media

The `Content.EmbeddedElement` interface is implemented by any element referencing an external resource. It contains an `embeddedLink` property you can use to get the actual content of the resource.

```kotlin
if (element is Content.EmbeddedElement) {
    val bytes = publication
        .get(element.embeddedLink)
        .read().getOrThrow()
}
```

Here are the default available implementations:

* `Content.AudioElement` - audio clips
* `Content.VideoElement` - video clips
* `Content.ImageElement` - bitmap images, with the additional property:
    * `caption: String?` - figure caption, when available

#### Text

##### Textual elements

The `Content.TextualElement` interface is implemented by any element that can be represented as human-readable text. This is useful when you want to extract the text content of a publication without handling each type of element individually.

```kotlin
val wholeText = publication.content()
    .elements()
    .filterIsInstance<Content.TextualElement>()
    .mapNotNull { it.text }
    .joinToString(separator = "\n")
```

##### Text elements

Actual text elements are instances of `Content.TextElement`, which represents a single block of text such as a heading, a paragraph or a list item. It is composed of a `role` and a list of `segments`.

The `role` is the nature of the text element in the document, for example a heading, body text, a footnote or a quote. It can be used to reconstruct part of the structure of the original document.

A text element is composed of individual segments with their own `locator` and `attributes`. They are useful to associate attributes with a portion of a text element. For example, given the HTML paragraph:

```html
<p>It is pronounced <span lang="fr">croissant</span>.</p>
```

The following `TextElement` will be produced:

```kotlin
TextElement(
    segments = listOf(
        Segment(text = "It is pronounced "),
        Segment(text = "croissant", attributes = mapOf(LANGUAGE to "fr")),
        Segment(text = ".")
    )
)
```

If you are not interested in the segment attributes, you can also use `element.text` to get the concatenated raw text.
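
As an illustration of why segments can be useful, the sketch below groups consecutive segments by language, for instance to hand each run to a speech engine with a matching voice. The `Segment` and `TextElement` classes here are simplified, hypothetical models written for this example (with a plain `language` property instead of an attribute map), and `runsByLanguage` is not a toolkit function.

```kotlin
// Simplified, hypothetical models for illustration only. The toolkit's real
// types carry more data, such as a locator and a role.
data class Segment(val text: String, val language: String? = null)

data class TextElement(val segments: List<Segment>) {
    // Concatenated raw text of all segments.
    val text: String get() = segments.joinToString(separator = "") { it.text }
}

// Merges consecutive segments sharing the same language into
// (language, text) runs. Segments without a language use `defaultLanguage`.
fun TextElement.runsByLanguage(defaultLanguage: String): List<Pair<String, String>> {
    val runs = mutableListOf<Pair<String, String>>()
    for (segment in segments) {
        val language = segment.language ?: defaultLanguage
        val last = runs.lastOrNull()
        if (last != null && last.first == language) {
            runs[runs.lastIndex] = language to (last.second + segment.text)
        } else {
            runs += language to segment.text
        }
    }
    return runs
}

fun main() {
    val element = TextElement(
        listOf(
            Segment("It is pronounced "),
            Segment("croissant", language = "fr"),
            Segment(".")
        )
    )
    println(element.text)
    element.runsByLanguage(defaultLanguage = "en").forEach { (language, text) ->
        println("$language: $text")
    }
}
```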

### Element attributes

All types of `Content.Element` can have associated attributes. Custom `ContentService` implementations can use this as an extensibility point.
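
To give an idea of how such an extensibility point might be used, here is a self-contained sketch of a typed attribute container with a custom key. Every name in it (`AttributeKey`, `Attributes`, `withValue`, `READING_SPEED`) is an assumption made for this example and does not describe the toolkit's actual attribute API.

```kotlin
// Hypothetical attribute model for illustration; names and types are
// assumptions, not the toolkit's API.
data class AttributeKey<T : Any>(val id: String)

class Attributes(private val values: Map<AttributeKey<*>, Any> = emptyMap()) {
    // Returns the value stored under `key`, or null when it is absent.
    @Suppress("UNCHECKED_CAST")
    operator fun <T : Any> get(key: AttributeKey<T>): T? = values[key] as? T

    // Returns a copy of these attributes with `key` set to `value`.
    fun <T : Any> withValue(key: AttributeKey<T>, value: T): Attributes =
        Attributes(values + (key to value))
}

// A common key and a custom key that a custom service could define.
val LANGUAGE = AttributeKey<String>("language")
val READING_SPEED = AttributeKey<Double>("readingSpeed")

fun main() {
    val attributes = Attributes()
        .withValue(LANGUAGE, "fr")
        .withValue(READING_SPEED, 0.8)

    println(attributes[LANGUAGE])
    println(attributes[READING_SPEED])
}
```

Typed keys keep the lookup site free of casts, at the cost of one unchecked cast inside the container.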

## Use cases

### An index of all images embedded in the publication

This example extracts all the embedded images in the publication and displays them in a Jetpack Compose list. Clicking on an image jumps to its location in the publication.

```kotlin
data class Item(
    val locator: Locator,
    val text: String?,
    val bitmap: ImageBitmap?
)

var images by remember {
    mutableStateOf<List<Item>>(emptyList())
}

// Extract the images when the publication changes.
LaunchedEffect(publication) {
    publication.content()?.let { content ->
        images = content.elements()
            .filterIsInstance<Content.ImageElement>()
            .map { element ->
                Item(
                    locator = element.locator,
                    text = element.caption,
                    bitmap = publication.get(element.embeddedLink)
                        .readAsBitmap().getOrNull()?.asImageBitmap()
                )
            }
    }
}

LazyColumn {
    items(images) { item ->
        // Skip the images that could not be decoded.
        if (item.bitmap != null) {
            Column(
                modifier = Modifier.clickable {
                    navigator.go(item.locator)
                }
            ) {
                Image(bitmap = item.bitmap, contentDescription = item.text)
                Text(item.text ?: "No caption")
            }
        }
    }
}
```

## References

* [Content Iterator proposal](https://github.com/readium/architecture/pull/177)
2 changes: 1 addition & 1 deletion docs/guides/pdf.md
@@ -1,4 +1,4 @@
# PDF support in Readium
# Supporting PDF documents

The Readium toolkit relies on third-party PDF engines to parse and render PDF documents.
