Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expressing Locator ranges for highlights #95

Closed
mickael-menu-mantano opened this issue May 8, 2019 · 46 comments
Closed

Expressing Locator ranges for highlights #95

mickael-menu-mantano opened this issue May 8, 2019 · 46 comments

Comments

@mickael-menu-mantano
Copy link
Contributor

In the context of building the Highlight Navigator API, we need a way to express a selection range with Locators.

We can't use a single Locator (as the model is now) because:

  • not all the fragment types can express a range,
  • a range can span two documents (eg. FXL spreads) so we need two hrefs

Using two Locators is not great because the selection text (LocatorText) is duplicated and it's not obvious which one should be used.

A potential solution would be to have an additional LocatorRange object that contains two Locators with empty .text and an additional "standalone" LocatorText that represents the content of the range.

LocatorRange {
   start: Locator
   end: Locator
   text: LocatorText?
}

The .text is optional because not necessary when passing the Highlight to the navigator, and for protected book it would be empty (to avoid bypassing the copy rights).

Any thoughts, alternative solutions?

@danielweck
Copy link
Member

The electron/desktop navigator currently "augments" the Locator data structure, not by extending it (strictly-speaking), but via composition:
LocatorExtended composes Locator and SelectionInfo (as well as a few other useful properties). SelectionInfo is a JSON-friendly serialization format for a DOM Range, i.e. a "start" and an "end" pair of positions within a document (possibly at the character / TextNode level).

The Locator refers to the "start" position of the selection, in document order. In the current implementation, the resolution of the Locator is limited by the usage of CSS Selectors (i.e. DOM Element), even though the selection itself affords character-level precision. I will harmonize this in a future revision, when the APIs stabilize.

A word about the "visual mapping" technique: right now, in order to "turn to the correct page / scroll to the corresponding offset" when a Locator link is followed (e.g. bookmark, selection, TOC item, etc.), the navigator relies on "client rectangles" coordinates (which is more reliable than the bounding box approximation, or scroll offset, due to CSS box model intricacies). The "beginning" of a targeted DOM element can be located on one page (i.e. column), but its end might be on the next page, which may be out of view (e.g. clipped paragraph, spills beyond the visible viewport). With character-level positioning extracted from the SelectionInfo, it is possible to use the same "client rectangles" coordinate system, in order to resolve the most accurate visual reading position. For example: the selected text might be a short fragment at the end of a large paragraph, so with CSS Selectors the resolved position and the user's expectation would likely be out-of-sync (bad, confusing UX).

@danielweck
Copy link
Member

Follow-up: because Locator and SelectionInfo do not correspond to the exact same document location, strictly-speaking there is no redundancy in the text property. But there is obviously some degree of redundancy in the expression of the actual "reading location", due to the different resolutions.

@mickael-menu-mantano
Copy link
Contributor Author

I've seen the SelectionInfo and RangeInfo models, but they are really tied to HTML. We need to come up with a solution that would work for all publication formats in the API.
Ideally, we would have a way to express the RangeInfo directly inside the Locator, maybe through a custom fragment (eg. [cssSelector]/index/offset).

Here is the relevant part in the doc where I mentioned this: https://docs.google.com/document/d/1W_kbSRve7c1ZtyYTZ-fbEOsFZPTqxN0IV7UdvH_h20U/edit#bookmark=id.jc195922qtyf

If we decide to implement a per-format custom model like SelectionInfo, we need to specify it so that it's implemented the same way on each platform, otherwise the annotations won't be cross-platform.

In my opinion, Locator already has everything needed, if we figure out a fragment format that contains the data from SelectionInfo. But maybe I'm missing something, Daniel?

@danielweck
Copy link
Member

I agree with you :)
I was just doing a recap / status report, in order to highlight the limitations of the current ad-hoc approach.

@danielweck
Copy link
Member

Regarding RangeInfo: although I am personally satisfied with the ad-hoc JSON-friendly serialisation format used in the TypeScript code of the R2 desktop SDK (navigator), I would like to hear more from EvidentPoint / Juan who have explored alternative DOMRange marshalling solutions in their GlueJS experiments. Perhaps there are certain pros/cons we should carefully examinate before committing to a particular solution.

@HadrienGardeur
Copy link

Using two Locators is not great because the selection text (LocatorText) is duplicated and it's not obvious which one should be used.

Some additional thoughts on that:

  • text is not required in a Locator, which means that in use cases where we need two of them, we could simply avoid including text at all
  • for large ranges, there are potentially issues that are more legal than technical (a large range could be seen as a way to extract text out of a publication)

The .text is optional because not necessary when passing the Highlight to the navigator, and for protected book it would be empty (to avoid bypassing the copy rights).

This raises a concern. We want Locators to be useful on their own, if they ever become orphans (publication is updated or no longer available). We need to find the right balance between having an empty text highlight and respecting the restrictions expressed by rights holders.

@mickael-menu-mantano
Copy link
Contributor Author

@HadrienGardeur

text is not required in a Locator, which means that in use cases where we need two of them, we could simply avoid including text at all

This sounds like the solution with the LocatorRange object. Unless you mean that we don't pass the text selection at all, and just have a tuple (start: Locator, end: Locator)? In which case, how does the host app get the selection text? We need it to present the highlights outside of the book, or in an annotation panel inside the book.

for large ranges, there are potentially issues that are more legal than technical (a large range could be seen as a way to extract text out of a publication)

Maybe a hard limit for the length of text.highlight could be added. If we have the range and enough text context to be displayed in a list of highlights, then that's probably enough. Of course the highlight requires then the book to get its full content, and exporting the highlights in another form wouldn't work (eg. PDF export of the user notes).

And/or that could be part of the LCP app compliance tests that the user can't export or copy the text of a highlight extracted from an LCP protected book.

@danielweck
Copy link
Member

danielweck commented May 15, 2019

Fragments in the currently-proposed Locator model:
https://github.com/readium/architecture/blob/master/locators/README.md#fragments

W3C web annotations selectors:
https://www.w3.org/TR/annotation-model/#selectors

We know from implementation experience (currently in R2 SDK for desktop/Electron) that DOMRange can be reliably serialized into a text format / JSON-friendly data structure. This is a direct translation of DOMRange without the pitfalls of EPUB CFI (which can be seen as a useful interchange syntax, but not convenient / performant for the internals of R2 features). Basically, we cannot reference a DOMRange TextNode directly, so we encode the parent Element reference using a quasi-canonical / unique CSS Selector (as already used for Locators in R2 desktop/Electron), and we record the zero-based index of the child TextNode. Other than that, the data structure is exactly the same as the DOMRange object model (start, end, offset, etc.).

So, one option is to "flatten" this tuple notation into a single string so that it fits within the current Locator Fragment proposal (array of strings, reflecting different resolutions applicable to different media types).
Personally, I would very much prefer to preserve the "exploded" representation of the range's start and end locations. Instead of having to guess the type of Locator Fragment using some kind of parser (e.g. regular expressions), we would ; like W3C annotations selectors ; identify the fragment types using a predefined vocabulary ... but then we might as well encode JSON property names directly, for example:

  • myLocator.locations.fragments.cfi = "..."
  • myLocator.locations.fragments.cssSelector = "..."
  • myLocator.locations.fragments.domRangePos = { cssSelector: "...", textNodeIndex: "...", offset: "..." }

... instead of:

  • myLocator.locations.fragments = [
    • { type: "cfi", value: "..." }
    • { type: "cssSelector", value: "..." }
    • { type: "domRangePos", value: { cssSelector: "...", textNodeIndex: "...", offset: "..." } }
  • ]

Thoughts?

@danielweck
Copy link
Member

Then, we indeed have the problem of myLocator.text[before|after|highlight], more specifically myLocator.text.highlight which applies to a range, i.e. a tuple of two Locators, one of start, the other for end (if we chose to represent a DOMRange using two such Locators).

https://github.com/readium/architecture/blob/master/locators/README.md#the-locator-text-object

@llemeurfr
Copy link
Contributor

It seems that we got a consensus yesterday on the highlights model, but before we close this issue, I feel I must challenge one of its hypothesis: "a range can span two documents (eg. FXL spreads) so we need two hrefs".

The reasons why I'm challenging it is that if it was not a requirement, we could extend the expressivity of some fragment expressions and get ranges in a simpler way. Already, cfis, media fragments (especially time based, but also rectangles in a sense), html ids to a certain extent can represent a text range or an audiovisual segment. It would then suffice to extend css selection expressions and we would get ranges in a single Locator.

It is true that Acrobat allows selecting text across PDF pages (but for Acrobat these may not be different "resources". But I tried iBooks and it does not allow selecting text across resources. I tried Aldiko iOS and could not select text across CSS columns.

So, are we sure we are not creating a complex model with no real requirement?

@mickael-menu-mantano
Copy link
Contributor Author

I think for EPUB it might really not be needed, since content will rarely be split between two resources (maybe for some FXL layed out like a PDF with split paragraphs). But more importantly, I don't think we can make the native selection work properly (or at all) between two webviews or even two iframes. The only way to make this work would have to either build the selection from scratch with overlays or have a convoluted UX to merge two highlights.

For PDF we don't need it either since a single PDF document is one href.

Personally I'd be more comfortable having a single Locator object representing the full range, including the text. But if we go down this road we still need a way to express the range in one single Locator. Having a custom range fragment format would work, but we then lose the expressiveness of the other discrete location formats. Also, the Locations object contains additional properties that are by nature for a discrete location: position, progression. In which case what does it represent? The beginning of the range, the center?

In this context, I suggest again my initial proposal which is to have Locator.locations be an array of Locations objects instead of a single one.

  • If the array is empty, it's a Locator to the global resource (like currently having a null locations)
  • With one object, then it's a Locator to one discrete location
  • With two objects, then it's a range for highlights
  • With more objects, it represents a path that could be used for example to represent a drawing on a page

@llemeurfr
Copy link
Contributor

llemeurfr commented May 16, 2019

@mmenu-mantano I like this last proposal but in such a case, what to do with fragments which can already represent a range, like "t=1,10"? Do we state that the must not be used and that we consider only fragments which represent a single "point" (in space of time)?

And how do we justify that we don't take the same approach as Web Annotations selectors, in which a single fragment can represent a range?

Note that even a css id is mapped to a "segment", not a single point in the text. Same for xpointer, xywh, even page for pdf.

@mickael-menu-mantano
Copy link
Contributor Author

mickael-menu-mantano commented May 17, 2019

Yes usually range fragments can also express a discrete location. I think it makes sense to ignore ranges in a single Locations object since the other properties (or other fragment formats) can't express a range.

Also, this problem is there too if we use a containing structure with two Locators like the aforementioned LocatorRange.

Maybe it doesn't answer all formats but if a fragment is mapping to a rectangle (eg. a PDF page) then the top-left (depending on reading progression I guess) is the discrete location in this case.

I'm not familiar with the Web Annotations selectors, but if they offer a better solution I'm all for it, all the current solutions feel awkward in the Locator model.

@HadrienGardeur
Copy link

If we ignore the case of a range expressed over two different resources, then there's no need to extend the Locator model at all:

  • in an audiobook, a range can be expressed using t=10,45
  • in a visual publication, a range can be expressed using a rectangle with xywh=160,120,320,240
  • in an EPUB, a range can be expressed using a CFI, an XPath or any other fragment capable of expressing a range on a tree structure

Also, the Locations object contains additional properties that are by nature for a discrete location: position, progression. In which case what does it represent? The beginning of the range, the center?

IMO this should be the beginning of the range, since that's where you jump to when you want to present a range.

@HadrienGardeur
Copy link

I'm not familiar with the Web Annotations selectors, but if they offer a better solution I'm all for it, all the current solutions feel awkward in the Locator model.

The Web Annotation model is very flexible, IMO a little too much for its own good.

As I said in my previous comment, it defaults to a single "selector" (equivalent of our locations) using either: fragment, XPath, CSS or text.

If the range can't be easily defined that way, it uses a specific "range" selector. If we ported that to our model, it would look like that:

{
  "href": "http://example.com/track6",
  "type": "audio/ogg",
  "title": "Chapter 5",
  "locations": {
    "progression": 0.607379,
    "totalProgression": 0.50678,
    "range": {
      "start": {
        "fragments": ["t=389.84"]
      },
      "end": {
        "fragments": ["t=529.52"]
      }
    }
  }
}

I think this is not necessary in our case and I'd much rather rely on a single location to express range.

@mickael-menu-mantano
Copy link
Contributor Author

That's fine with me, so we just need a range fragment for EPUB that we can use in R2 (CFI is out by consensus).
It looks like Daniel's RangeInfo model fits the need, we just need a way to encode it as a fragment.

Do you have any opinion on Daniel's comment Hadrien? #95 (comment)

If we want to keep the semantics without changing too much the JSON model, we could also use the PDF fragment approach with a query string:
locations.fragment = "selector=div[3]&index=3&offset=25"

@danielweck
Copy link
Member

EPUB CFI is indeed fit for purpose when it comes to expressing DOM position and range, as a standardized interchange format across systems.

However, in my opinion (and based on implementation experience) generating and parsing CFI references is error-prone on the border-cases, and it is also woefully inneficient.

Right now in the R2 desktop/Electron implementation an ad-hoc serialisation format is used to reliably and cheaply marshal DOMRange objects. Basically, a CSS Selector is combined with a child index, and a character offset (this 3-tuple is used for both the start and end positions). This is as close as it can possibly get to a native DOMRange, resulting in extremely efficient processing.

It would be a shame to flatten this simple JSON-compatible data structure into a string "fragment" inside the Locator payload. I would much prefer transporting this information across the reading system layers (typically: app <--> navigator) in its raw form, rather than unnecessarily massaging it just to fit into a prescribed Locator format (which would require awkward escaping rules in order to cater for the fragment's own syntax, delimiters etc.).

That is not to say that there is no value in the flattened "fragments", which are indeed naturally amenable to URI encoding (e.g. like the elegant Media Fragment t=0,99 for time ranges), and therefore desirable for interchange purposes.

We can certainly produce interoperable position/range references a-la W3C selectors, whenever necessary (for example when such Locator payloads are exported into non-R2 systems, like an external annotation service). But for the subsystems involved in R2 reading systems (navigator, etc.) I believe that we should stay as close as possible to native web browser engine constructs like DOMRange and CSS Selector, which yield the best performance and readability (have you tried debugging CFI?). As for XPath, I guess you mean XPointer scheme which would offer similar capabilities to CFI, but also similar headaches.

Thoughts?

@HadrienGardeur
Copy link

Do you have any opinion on Daniel's comment Hadrien? #95 (comment)

I think it's slightly better than the Web Annotation approach, but I'm not a big fan of having a controlled vocabulary for fragments. IMO fragments should be registered alongside a media type (which is usually the case) and we should be able to express potentially any fragment, not just a list of well-known fragments.

Keep in mind that I'm mostly discussing the core model and its JSON serialization, what Daniel suggested IMO makes perfect sense in the context of a specific implementation for example.

@mickael-menu-mantano
Copy link
Contributor Author

@danielweck Are you mentioning this for your internal Locator model, or for the JSON serialization that will be transferred between platforms and stored in the host app database?

I think we should be designing the format of the fragment that will be stored in the Locator JSON here (https://github.com/readium/architecture/tree/master/locators). Each platform can then implement helpers around the Locator model to have the locator expressed in a more convenient way (eg. as a DOMRange).

@danielweck
Copy link
Member

danielweck commented May 17, 2019

JSON serialization that will be transferred between platforms and stored in the host app database

I do not understand the nuance of this statement. But let's take platform "desktop / Electron" as an example (bearing in mind that the same reasoning applies to any coherent reading system platform, like an iOS or Android app, or even a web app involving its own client/server code). The key subsystems that exchange Locator data back-and-forth
(as per the specification we are discussing) are the navigator and the host app. For example: the app receives information about the reading locations (then stores bookmarks either as native Locator + additional info, or as totally bespoke database format), and the app can subsequently instruct the navigator component to restore such reading locations by passing-in the standardized Locator payload. Same with selections/highlights, search module, etc. That is la "raison d'être" of such normalization effort.

The Locator object (herein discussed) transports this information across the navigator / app boundary in a standardized fashion, not only syntax-wise (typically in a serialised JSON form, due to technical constraints), but also in terms of the processing model (e.g. fragments that express different resolutions, precedence order). This works great for things like progression, position, CSS Selectors, time Media Fragments, which are quite consensual.

But here we are now discussing the more complex use-case of precise character-level document range, for which CFI was a semi-successful attempt to offer a reliable consistent model, and XPointer scheme never took off either.

In a platform-specific implementation context (e.g. the iOS, Android, Electron/desktop apps), why would a navigator ever bother creating and consuming CFI or other supplemental fragment types, when they have no concrete use for them? This is why in desktop/Electron we currently "extend" the Locator definition by means of composition (additional proprietary / ad-hoc data structure) in order to fill the gap where standard Locator is not satisfactory.

Now, for "interchange" purposes across different system implementations ; e.g. a streamer server and multiple platform-specific clients ; the Locator payload obviously need to be standardized, just like any other part of the architecture. What I am suggesting is that in the real world, implementations will not bear the burden (performance impact, debuggability, readibility, etc.) of creating processors and executing their code (for example, producing CFI fragments when the receiving end is known to ignore them completely) unless there is a need for generating and consuming the proposed multi-resolution/granularity, "flattened" Locator fragment syntax ... thereby favouring instead the extended / proprietary data structure.

Surely, we want to mitigate this? I can see the cons of incorporating a totally bespoke DOMRange structured serialisation format in the Locator model (which would be at odds with the dominant "linearized string" fragment approach), but I also want to make sure we examine the pros too. For me, the benefits in terms of performance, testability, readability etc. are not insignificant. I would hate to continue to use a totally different system alongside standard Locators (in the Electron/desktop implementation), when there is in fact an opportunity for us all to leverage the same structured syntax for selection/annotation ranges.

Do you understand my viewpoint? (sorry for the long-winded explanation)

@HadrienGardeur
Copy link

HadrienGardeur commented May 17, 2019

@danielweck I think that we're having two separate discussions here:

  • changing the fragment into an object (instead of a string currently)
  • adding support for DOMRanges

IMO changing how fragments are represented has a lot of downsides and so far, only DOMRanges would benefit from this.

DOMRanges can already be added to a Locator right now using a string and flattening the representation (selector=div[3]&index=3&offset=25 ) or simply by using three strings (["selector=div[3]", "index=3", "offset=25"]).

As an alternative, we can also define an extensibility model for locations which would allow something like:

{
  "href": "http://example.com/chapter1",
  "type": "text/html",
  "title": "Chapter 1",
  "locations": {
    "position": 4,
    "progression": 0.03401,
    "totalProgression": 0.01349,
    "domrange": {
      "selector": "div[3]",
      "index": 3,
      "offset": 25
    }
  }
}

[...] why would a navigator ever bother creating and consuming CFI or other supplemental fragment types, when they have no concrete use for them?

I'm not buying the argument that CFI is completely going away.
It's used in EPUB.js and Vivliostyle Viewer, which both support RWPM. It's also the core fragment being used in Colibrio and remains an important focus for Evident Point.

I don't think it's going to be that uncommon for organizations to run a mix of Web Apps that are not primarily developed by Readium (yet compatible with RWPM) with mobile/desktop apps that are built by Readium.

@mickael-menu-mantano
Copy link
Contributor Author

Keeping the fragments as strings is really important in my opinion:

  • A fragment identifier is supposed to be a string of characters
  • We can use them with URI using #
  • Having a JSON array with multiple possible types makes it much more complicated to parse on a more rigid language such as Swift

So adding a (standardized across R2 platforms) Locator extension for the DOM range could really be a solution. Especially since there's no standard fragment format to represent a DOM range and so this is likely a private Readium implementation.

But keeping everything in fragments would make it easier in the JS layer to fallback on various fragment formats when available, to find the target location. We might not be interested in using CFI as our default implementation, but someone might want to add support to read annotations exported from other SDKs.

Note that on mobile we will most likely not forward the full JSON locator to the JS layer, since it will only be interested in the fragments and/or the DOM range.

I would hate to continue to use a totally different system alongside standard Locators (in the Electron/desktop implementation), when there is in fact an opportunity for us all to leverage the same structured syntax for selection/annotation ranges.

Definitely, especially to add additional data that should be encoded into the Locator. However, wrapping a JSON locator into an object that adds helper to read the various data, and maybe build an actual DOMRange in a provided document, might be helpful thing to answer the per-platform requirements. This is akin to the native Locator model that we have on Swift/Kotlin. The line might be blurrier on a JS platform I guess.

@HadrienGardeur
Copy link

We really want to close this issue tomorrow, do you have any additional comment @danielweck ?

@danielweck
Copy link
Member

I do actually :)

In your example:

{
  "href": "http://example.com/chapter1",
  "type": "text/html",
  "title": "Chapter 1",
  "locations": {
    "position": 4,
    "progression": 0.03401,
    "totalProgression": 0.01349,
    "domrange": {
      "selector": "div[3]",
      "index": 3,
      "offset": 25
    }
  }
}

...I think we need to distinguish start and end:

{
  "href": "http://example.com/chapter1",
  "type": "text/html",
  "title": "Chapter 1",
  "locations": {
    "position": 4,
    "progression": 0.03401,
    "totalProgression": 0.01349,
    "domrange": {
        "start": {
            "selector": "div[3]",
            "index": 3,
            "offset": 25
        },
        "end": {
            "selector": "div[4]",
            "index": 2,
            "offset": 11
        }
    }
  }
}

@danielweck
Copy link
Member

Also, as the proposed locations.domrange JSON property is a "first class citizen" on par with locations.totalProgression / locations.progression / locations.position, does this mean that DOMRange information does not need to be serialized (and flattened) into the locations.fragments array of string values? (see https://github.com/readium/architecture/blob/master/locators/README.md#fragments )

@danielweck
Copy link
Member

I filed a separate issue to express a more general concern about the concept of Locator "fragments": #98

@HadrienGardeur
Copy link

Also, as the proposed locations.domrange JSON property is a "first class citizen" on par with locations.totalProgression / locations.progression / locations.position, does this mean that DOMRange information does not need to be serialized (and flattened) into the locations.fragments array of string values? (see https://github.com/readium/architecture/blob/master/locators/README.md#fragments )

Actually, this is not my personal preference nor the only solution that I've listed in my previous comment.

Overall, I think that the current approach of an array of strings is perfectly fine and works for what you've requested.

That said, we could use an extensibility model for both the top level of a Locator and for locations. The example with domrange was only meant to illustrate how this could work if we had such extensibility available.

Overall, I'm not in favor of extending our current model with anything too specialized, I think the mix of position, progression, overallProgression and fragments is good enough:

  • it's as media-generic as we can be
  • it provides both context and the ability to express very specific locations
  • there's built-in extensibility through fragments (that can be defined per media type)

We had something more specialized elements before (with dedicated elements for CFI and CSS Selectors among other things) and I think it would be a step back to go back in that direction.

@danielweck
Copy link
Member

danielweck commented Jun 5, 2019

Progress update based on our latest conference call:
#99

Here let's discuss the specifics of domRange.

Full example:

{
    "href": "http://example.com/chapter1",
    "type": "text/html",
    "title": "Chapter 1",
    "locations": {
        "progression": 0.8,
        "fragments": [
            "#paragraph-id",
            "t=15"
        ],
        "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
        "partialCfi": "/4/2/8/6[paragraph-id]",
        "domRange": {
            "start": {
                "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
                "textNodeIndex": 0,
                "offset": 11
            },
            "end": {
                "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
                "textNodeIndex": 0,
                "offset": 22
            }
        }
    }
}

Partial example (so we can focus on property naming, and meaning):

        "domRange": {
            "start": {
                "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
                "textNodeIndex": 0,
                "offset": 11
            },
            "end": {
                "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
                "textNodeIndex": 0,
                "offset": 22
            }
        }

@danielweck
Copy link
Member

danielweck commented Jun 5, 2019

locator.locations.domRange.start and locator.locations.domRange.end have the same object structure, which is made of 3 mandatory fields: cssSelector, textNodeIndex, textOffset.

locator.locations.domRange.start is mandatory and can be used on its own (i.e. collapsed range with same begin/end pointer). locator.locations.domRange.end is optional.

@danielweck
Copy link
Member

danielweck commented Jun 5, 2019

  • cssSelector (string): always references a DOM element, but if the original DOMRange start/end references a text node instead, the textNodeIndex field (described below) is used to complement the CSS Selector.
  • textNodeIndex (-1, or integer [0, n]): when -1, the above cssSelector corresponds to the DOM element of the original DOMRange start/end. Otherwise, [0, n] represents the zero-based index of the text node child (of the DOM element referenced by the above cssSelector).
  • offset (integer [0, n]): if the original DOMRange start/end references a DOM element, then this zero-based index represents the position of a child text node of that element. If however the original DOMRange start/end references a text node, this represents the position of a character within that text node.

Note that in [0, n], n is indeed included even though it represents the total number of children. That is because index overflow is allowed, and it signifies a pointer after the last item, but before the next sibling element starts (just like XPointer and CFI edge-cases ... literally "edge" cases).

Also note that sometimes, mobile iOS/Android selection UX always normalize the original DOMRange start/end so that it never references DOM elements (always text nodes). This normalization step can also be explicitly added in desktop implementations (I am working on that, actually, based on the work of Apache annotator).

See TypeScript prototype implementation which inspired this proposal:
https://github.com/readium/r2-navigator-js/blob/7a2ddc8287659d6d985eb930b776e108891a3510/src/electron/common/selection.ts#L14-L33

@mickael-menu-mantano
Copy link
Contributor Author

Thank you, Daniel. The naming is very clear.

I have two questions/suggestions:

  1. Would it make sense to use this {cssSelector, textNodeIndex, offset} object too instead of the single locations.cssSelector? I guess it could be useful to point to a specific position in a long paragraph spanning more than one page.

  2. Rather than -1 for textNodeIndex, I would rather use undefined (not setting the key in the object) to avoid ambiguity (-1 is a bit reminiscent of good ol' C but nowadays might be seen as "the first text node starting from the end"). Unless this value is directly used when creating a DOMRange?

Also note that sometimes, mobile iOS/Android selection UX always normalize the original DOMRange start/end so that it never references DOM elements (always text nodes). This normalization step can also be explicitly added in desktop implementations (I am working on that, actually, based on the work of Apache annotator).

We really need to make sure that those locators are compatible when shared between R2 platforms. I think it's not such a problem if the output is different (not canonical) as long as when used from a different platform we end up at the exact same place in the DOM.

@danielweck
Copy link
Member

danielweck commented Jun 6, 2019

  1. I have a personal preference for keeping locations.cssSelector separate (simple string instead of object), which saves the consumer checking the presence of extra fields just to determine whether the locator references a "standard" CSS Selector (de-facto element) vs. our ad-hoc encoding method for DOMRange start/end = [element reference (cssSelector) + non--1 textNodeIndex + character offset] or [element reference + -1 textNodeIndex + text node index (offset)].
  2. I agree that the argument that -1 may actually be perceived as an index based on the end of the sequence of items, or even as the edge position before the first character but after the preceding sibling element. In my draft proposal, all 3 fields are mandatory, in the sense that they cannot be null (actual value) or undefined (no value). The undefined possibility for textNodeIndex would actually be useful to differentiate the special case of locations.cssSelector in your proposal (1). But if we keep the standalone locations.cssSelector, perhaps instead of -1 (or undefined) we can use null for textNodeIndex?

@mickael-menu-mantano
Copy link
Contributor Author

Regarding highlights, we agreed on using a single "DOM-ranged" Locator using the new notation (#99). This lifts the ambiguity of using LocatorText in the range. 👍

I suggest that we use it this way:

  • before and after are not used
  • highlight contains the cleaned (whitespaces coallesced and trimmed) text selection

I don't see any use for keeping the raw text once the Locator is created, but we can revisit this if needed.

Additional processing could be done for highlight depending on whether the book is copy-protected:

  • Not filling the highlight at all
  • Consuming the highlight content as a selection copy in the rights (and if exceeded, then fallback on empty highlight)
  • Truncating the highlight up to a certain length. This doesn't really protect the book against copy but prevent copying whole sections of the book.

For non-protected books, I think truncating the highlight could be detrimental. The host app can choose to truncate the content before storing in the database as seen fit.

@mickael-menu-mantano
Copy link
Contributor Author

@danielweck

  1. We have to determine if the use case is useful (pointing to a character in a given text node). I think so, otherwise how would we be able to go to the middle of a long paragraph using only a CSS selector (ignoring percentage)? We could use a collapsed domRange.start but it also adds additional checking and is easy to miss out in my opinion. (This additional property could be named domLocation or domPosition)

  2. null would work if the fields are mandatory.

@danielweck
Copy link
Member

danielweck commented Jun 6, 2019

Thinking about an alternative to avoid using offset as a TextNode index (which seems counter-intuitive, given there is a dedicated field named textNodeIndex), instead using offset ONLY as a character index (which would then be optional). This updated proposal diverges slightly from the meaning of "offset" in a vanilla DOMRange object, but I think it is worth considering to make our model easier to understand.

  • cssSelector (string): always references a DOM Element. If the original DOMRange start/end references a Text Node, the textNodeIndex field (described below) is used to complement the CSS Selector, and offset is used to tell a character position within that Text Node. If the original DOMRange start/end references a DOM Element, then textNodeIndex is used to designate the child Text Node, and the charOffset is not used (no explicit character position within the Text Node).
  • textNodeIndex (integer [0, n]): [0, n] represents the zero-based index of the text node child (of the DOM element referenced by the above cssSelector).
  • charOffset (null, or integer [0, n]): null if the original DOMRange start/end references a DOM Element (in which case textNodeIndex points to a Text Node without any specific character offset within it), otherwise this zero-based index represents a character position within the Text Node designated by textNodeIndex.

Make more logical sense, I think, and can still roundtrip from/to DOMRange without ambiguity.

@danielweck
Copy link
Member

danielweck commented Jun 6, 2019

pointing to a character in a given text node

Yes, this would typically be a collapsed range. locator.locations.cssSelector would continue to represent the "nearest" ancestor Element, useful for API consumers / processors that only support this coarse resolution. More advanced API consumers / processors can check locator.locations.domRange.start to see if a more accurate reference is available.

@danielweck
Copy link
Member

I don't see any use for keeping the raw text

I have a preference for preserving "raw" text alongside "clean" text. This can be used to match actual text in the DOM (including line breaks, insignificant whitespaces, etc.), for example when reconstructing/correlating DOM text with spoken TTS utterances.

@danielweck
Copy link
Member

before and after are not used

Why? Could that not be useful for both the "collapse range" case (discrete position) and the "spanning start/end range" case?

@mickael-menu-mantano
Copy link
Contributor Author

I'm talking specifically about the highlights for this case (eg. a bookmark would need to fill before/after). A collapsed selection should not be allowed in the SelectableNavigator API, it would be a nil selection.

That being said, before/after could still be useful for highlights to add more context, especially if the highlighted part is only a few words.

I have a preference for preserving "raw" text alongside "clean" text. This can be used to match actual text in the DOM (including line breaks, insignificant whitespaces, etc.), for example when reconstructing/correlating DOM text with spoken TTS utterances.

There's nothing in the model right now that can hold raw text, but for specific use case we could expose the raw text directly in the navigator API related to TTS. Also, different API can produce different Locator. So the locators created by the TTS could have the raw text in highlight instead of clean. Since they are not made for display.

However, I think consistency might be more important. Maybe highlight should always contain the raw text, and the host app would be responsible to process it for display as desired. We are not using the clean text in the R2 deps anyway.


Thinking about an alternative to avoid using offset as a TextNode index (which seems counter-intuitive, given there is a dedicated field named textNodeIndex), instead using offset ONLY as a character index (which would then be optional).

Agreed, I think your new proposal is much clearer.

One reservation: is there a practical difference between null and 0 for charOffset?

@danielweck
Copy link
Member

is there a practical difference between null and 0 for charOffset?

For all intents and purposes, no. However I remember seeing DOMRange pointing to TextNode without character offset ... I cannot remember how I came across this edge case, but I allowed 'null' / -1 to ensure precise round-tripping. But to be honest I think this should be normalized, and I am pretty sure that normalization utilities from Apache annotator etc. do this too.

@danielweck
Copy link
Member

Maybe highlight should always contain the raw text, and the host app would be responsible to process it for display as desired.

I think this makes sense. The "clean/normalize text" utility function should also be standardized (in the context of the R2 Locator model).

@danielweck
Copy link
Member

A collapsed selection should not be allowed

What about a collapsed DOMRange though? I would like to be able to use locator.locations.domRange.start with undefined locator.locations.domRange.end (to reference a mouse click / text cursor inside HTML documents, for example).

@mickael-menu-mantano
Copy link
Contributor Author

I think this makes sense. The "clean/normalize text" utility function should also be standardized (in the context of the R2 Locator model).

Yes, especially if we consider that the Locator is really used to locate a precise position and not as a substitute for a Highlight model, therefore the text excerpts should be used for that and not be stripped of meaningful whitespaces.

On the Swift native model we can easily add a dynamic property to get the cleanHighlight, but it might not be practical on platforms sending only the JSON serialization back to the host app.

What about a collapsed DOMRange though? I would like to be able to use locator.locations.domRange.start with undefined locator.locations.domRange.end (to reference a mouse click / text cursor inside HTML documents, for example).

I was talking about the text selection for highlights specifically. If a DOM range is collapsed, then it is actually not a selection so this API would return null: https://github.com/readium/r2-navigator-swift/pull/57/files#diff-71cffd3595869fdd8bacb309df9b5e6cR21

But for other single-location cases (eg. mouse click) then a collapsed DOMRange makes sense, and having text.before/after might be useful.

On a side note, I'm not sure we need a Locator for taps. It might be too costly to build it on-the-fly for each tap when the locator will most likely be discarded right away. Right now in the Swift test app we only use a point coordinate on the screen, to determine if we want to turn the page or show the toolbars. Maybe we could provide a closure to lazily build the Locator if needed.

@mickael-menu-mantano
Copy link
Contributor Author

To summarize the current consensus on this discussion:

"locations": {
  "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
  "partialCfi": "/4/2/8/6[paragraph-id]",
  "domRange": {
    "start": {
      "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
      "textNodeIndex": 0,
      "charOffset": 11
    },
    "end": {
      "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
      "textNodeIndex": 0,
      "charOffset": 22
    }
  }
}

DOM Range

  • domRange.start is mandatory and can be used on its own (i.e. collapsed range with same begin/end pointer). domRange.end is optional.
  • cssSelector (string): always references a DOM Element. If the original DOMRange start/end references a Text Node, the textNodeIndex field (described below) is used to complement the CSS Selector, and offset is used to tell a character position within that Text Node. If the original DOMRange start/end references a DOM Element, then textNodeIndex is used to designate the child Text Node, and the charOffset is not used (no explicit character position within the Text Node).
  • textNodeIndex (integer [0, n]): [0, n] represents the zero-based index of the text node child (of the DOM element referenced by the above cssSelector).
  • charOffset (integer [0, n]): Zero-based index represents a character position within the Text Node designated by textNodeIndex.

(@danielweck I removed the null from charOffset following your comment on normalization)

LocatorText

  • Locator.text.highlight/before/after need to be filled with the raw text content to be able to reconstruct/correlate DOM text.
  • A to-be-determined standardized API to clean raw text will be provided. (eg. dynamic properties LocatorText.cleanHighlight/cleanBefore/cleanAfter)

@danielweck
Copy link
Member

@llemeurfr
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants