Read/write/import/export highlights from calibre #849
Didn’t know you could actually switch that off in Calibre’s reader. I always disliked it when a reader modifies the original EPUBs, so thanks for finding that one! On the other hand, I always liked that both Calibre’s reader and yours seem to use CFI, at least. That might be a step forward, and I always hoped these two might somehow be able to interchange highlights/notes/bookmarks. Somehow. It is a pity that reading status, bookmarks, highlights and notes (and their exchange) aren’t really standardized. Sadly, now everybody cooks their own.
Well, for better or worse, calibre sort of is the standard (or a standard, in the free/open-source world, at least).
Well, there are really only so many ways to reference document fragments, whether it's a point reference for bookmarks or a range for highlights. Most reading systems probably use one or more of the following:
DOM-based references aren't too hard to implement for any browser-based reading system, and they should be more or less interoperable (barring implementation bugs). The main advantage of CFI, apart from it being the standard, is that it's designed to be more robust against modifications to the document source (i.e. updated versions of the book), and CFIs can be sorted without having to look at the source document.
The offset-based method is used by Mobipocket (and possibly Kindle) in its proprietary format. It's brittle and hard to implement, though probably more performant on a low-powered machine, depending on how the book is rendered. It is possible to convert between offset-based and DOM-based references, but it's not at all straightforward.
The text-based method is the slowest and least reliable. It should probably only be used as a last resort, as an additional aid to other kinds of selectors, or simply as part of the annotation when exported.
A slightly unfortunate thing here is that the CFI used by calibre is not standard.
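To illustrate what "sorted without having to look at the source document" means, here is a minimal sketch (not a full CFI parser) that turns a simple point CFI into a numeric sort key: each /N step is compared as an integer, and the character offset after : breaks ties. It assumes standard point CFIs only; bracketed assertions are dropped, and ranges, spatial/temporal offsets, side bias and escaping are not handled, nor is calibre's non-standard variant. The function name cfi_sort_key is hypothetical.

```python
import re
from typing import Tuple

# Bracketed assertions such as [chap01ref]; simplification: "^" escapes not handled.
_ASSERTION = re.compile(r"\[[^\]]*\]")
# A step is "/" followed by an integer index (even = element, odd = text/gap).
_STEP = re.compile(r"/(\d+)")


def cfi_sort_key(cfi: str) -> Tuple[Tuple[int, ...], int]:
    """Return a key that orders simple point CFIs in document order.

    Simplified sketch, not the full EPUB CFI grammar: range CFIs,
    spatial/temporal offsets (@, ~), side bias and escaped characters
    in assertions are not supported.
    """
    body = cfi.strip()
    if body.startswith("epubcfi(") and body.endswith(")"):
        body = body[len("epubcfi("):-1]
    body = _ASSERTION.sub("", body)  # assertions don't affect ordering
    path, _, offset = body.partition(":")  # character offset follows ":"
    # "!" (indirection into a content document) simply continues the path.
    steps = tuple(int(n) for n in _STEP.findall(path))
    return steps, int(offset) if offset else 0


if __name__ == "__main__":
    cfis = [
        "epubcfi(/6/4!/4/10/3:10)",
        "epubcfi(/6/4!/4/2/1:0)",
        "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:2)",
        "epubcfi(/6/2!/4/1:0)",
    ]
    # Plain string sorting would put "/10" before "/2"; numeric keys do not.
    for cfi in sorted(cfis, key=cfi_sort_key):
        print(cfi)
```

The point of the design is that ordering depends only on the CFI strings themselves, so a reading system can sort annotations for display without loading the book.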
So it appears that the first step The next step (The The equivalent standard CFI would be
I wonder how much sense it’d make if you talked directly to Kovid Goyal about that. Maybe a more standard overall solution could be found, to the benefit of users? Personally, I don’t see Calibre and Foliate as competitors but as complementing each other: the more involved user will use Calibre anyway, so an exchange might be nice, but there are many people who just want to read an e-book on the desktop once in a while, without the wish or need for a fully-fledged solution like Calibre. Lacking a common reading status/bookmark/highlight/annotation standard, the biggest hurdles for users (and e-book editors) nowadays are:
IMO the biggest problem isn't the lack of standards. It's the lack of documentation. Most apps don't even document how annotations are stored internally. Simply figuring out how they are stored would go a long way towards interoperability and compatibility by allowing other apps or third-party tools to convert between different formats. But allowing exchange without an explicit import/export step is going to be difficult. For example, there isn't even a standardized way to determine whether two files are the same book. While there is a standard way to give unique identifiers to books, the spec actually recommends against relying completely on this identifier, and many reading systems (unlike Foliate) don't use this identifier at all (see #435). So there isn't even a reliable way to associate annotations with a book.
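For reference, a minimal sketch of how a reading system can look up that identifier in an EPUB: META-INF/container.xml points to the package document, and the package element's unique-identifier attribute names the id of the dc:identifier to use. This only reads metadata and does nothing about the reliability caveat above. The function names and the sample metadata are hypothetical, for illustration only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def epub_unique_identifier(epub: zipfile.ZipFile) -> Optional[str]:
    """Return the package's unique identifier, or None if it can't be found."""
    # META-INF/container.xml points to the package document (the .opf file).
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
    opf_path = rootfile.get("full-path") if rootfile is not None else None
    if not opf_path:
        return None
    package = ET.fromstring(epub.read(opf_path))
    # <package unique-identifier="..."> names the id of one dc:identifier.
    uid_ref = package.get("unique-identifier")
    if not uid_ref:
        return None
    for ident in package.iter(f"{{{DC_NS}}}identifier"):
        if ident.get("id") == uid_ref:
            return (ident.text or "").strip()
    return None


def build_sample_epub() -> zipfile.ZipFile:
    """Build a tiny in-memory EPUB skeleton with made-up metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr(
            "META-INF/container.xml",
            f'<container version="1.0" xmlns="{CONTAINER_NS}"><rootfiles>'
            '<rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/>'
            "</rootfiles></container>",
        )
        zf.writestr(
            "OEBPS/content.opf",
            '<package xmlns="http://www.idpf.org/2007/opf" version="3.0" '
            'unique-identifier="uid">'
            f'<metadata xmlns:dc="{DC_NS}">'
            '<dc:identifier id="other">urn:example:other-id</dc:identifier>'
            '<dc:identifier id="uid">urn:example:book-123</dc:identifier>'
            "</metadata></package>",
        )
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    print(epub_unique_identifier(build_sample_epub()))  # urn:example:book-123
```

Note that the sample deliberately has two dc:identifier elements; only the one referenced by unique-identifier is returned.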
Syncing anything between devices is itself often a hard enough problem. There are many different kinds of technologies and designs, each with their advantages and limitations. Embedding data in the file would nicely sidestep the problem of both uniquely identifying books, as well as any issues with transporting the annotations. But it requires modifying the file. Some people like it, but for others it's a dealbreaker. (Though I suspect that people might be less resistant to the idea if calibre had made things clearer.) So here we can observe that it would be hard to find a design that suits different, often conflicting needs well. There's always a trade-off, but there are never any clear answers.
Academic citing is really a different problem. First there's the problem of compatibility with printed works. This is more or less a solved problem. It's just that Foliate doesn't support Then there's the problem of how to reference pages if no This is, however, mostly an issue when presenting the book or annotations for human consumption. It has very little to do with how the annotations are stored for the internal consumption of reading systems.
Looks like there's still some way to go. For everyone.
It seems Calibre also produces incorrect CFIs in some situations. To reproduce,
Actual result: Calibre creates the following highlight data.
Here Expected result: The end of the highlighted text,
Just a quick note about how highlighting works in calibre. It doesn't seem to be documented anywhere.
If "Keep a copy of annotations/bookmarks in the e-book file, for easy sharing" is checked in "Preferences" > "Miscellaneous" in the e-book viewer, calibre will store a copy of the annotation data in META-INF/calibre_bookmarks.txt, which is a base64-encoded JSON file. Here is a sample:
Which decodes to,
Remarks:
Also, though it's not included in the sample above, notes are stored under the notes key in each highlight.
Also, if you choose "Export", it will produce JSON in the following format:
It does not, however, support importing highlights (see https://www.mobileread.com/forums/showpost.php?s=e73c5ef33e1b606d66d59b001fb5a7dd&p=4171284&postcount=6).
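For anyone writing an importer, here is a minimal sketch of reading the embedded copy with Python's standard library. It follows the description above (a base64-encoded JSON file at META-INF/calibre_bookmarks.txt, with notes under the notes key). Everything else is an assumption to check against a real file: that the decoded JSON is a list of annotation objects, the type and highlighted_text key names, and an optional encoding=json+base64: header line before the payload. The function names and sample data are hypothetical.

```python
import base64
import io
import json
import zipfile
from typing import Any, List, Tuple

BOOKMARKS_PATH = "META-INF/calibre_bookmarks.txt"
# Assumption: an optional header line that may precede the base64 payload.
HEADER = b"encoding=json+base64:"


def read_calibre_annotations(epub: zipfile.ZipFile) -> Any:
    """Decode calibre's embedded annotation copy; [] if the file is absent."""
    try:
        raw = epub.read(BOOKMARKS_PATH)
    except KeyError:  # calibre didn't store annotations in this EPUB
        return []
    raw = raw.strip()
    if raw.startswith(HEADER):
        raw = raw[len(HEADER):]
    # b64decode discards non-alphabet characters such as newlines by default.
    return json.loads(base64.b64decode(raw))


def highlight_notes(annotations: Any) -> List[Tuple[str, str]]:
    """Pair each highlight's text with its note (empty string if none).

    The "type" and "highlighted_text" key names are assumptions;
    only the "notes" key is described above.
    """
    if not isinstance(annotations, list):  # assumed: a JSON list of annotations
        return []
    pairs = []
    for annot in annotations:
        if isinstance(annot, dict) and annot.get("type") == "highlight":
            pairs.append((annot.get("highlighted_text", ""), annot.get("notes", "")))
    return pairs


def build_sample_epub(annotations: List[Any], header: bool = True) -> zipfile.ZipFile:
    """Build an in-memory EPUB stub holding made-up annotation data."""
    payload = base64.b64encode(json.dumps(annotations).encode("utf-8"))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(BOOKMARKS_PATH, (HEADER + b"\n" if header else b"") + payload)
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    sample = [
        {"type": "highlight", "highlighted_text": "Sample text.", "notes": "a note"},
        {"type": "bookmark", "title": "Chapter 1"},
    ]
    print(highlight_notes(read_calibre_annotations(build_sample_epub(sample))))
```

The reader accepts the payload with or without the assumed header, so it should still work if that guess turns out to be wrong.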