Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging highlights #4670

Closed
noembryo opened this issue Feb 27, 2019 · 21 comments
Closed

Merging highlights #4670

noembryo opened this issue Feb 27, 2019 · 21 comments
Labels

Comments

@noembryo
Copy link
Contributor

noembryo commented Feb 27, 2019

There was a question in a KoHighlights' issue from a user, about the possibility to merge different highlights from the same book in different devices.
My answer was negative at first, because I thought that with different devices, the position info in the highlight would create problems when read from other devices.

But then I thought that I don't know that for sure, so here I am, asking.
The highlight's data consists from two keys.
One in the ["bookmarks"] key and another at the ["highlight"] key.
E.g. this inside the ["bookmarks"]

        [3] =     {
            ["text"] = "fair!",
            ["notes"] = "wasn’t fairl",
            ["datetime"] = "2019-02-20 22:54:10",
            ["highlighted"] = true,
            ["page"] = "/body/DocFragment[20]/body/div[2]/p[30]/text().392",
            ["pos0"] = "/body/DocFragment[20]/body/div[2]/p[30]/text().392",
            ["pos1"] = "/body/DocFragment[20]/body/div[2]/p[30]/text().404"
        },

and this inside the ["highlight"]

        [209] =     {
            [1] =         {
                ["drawer"] = "lighten",
                ["datetime"] = "2019-02-20 22:54:10",
                ["text"] = "wasn’t fairl",
                ["pos0"] = "/body/DocFragment[20]/body/div[2]/p[30]/text().392",
                ["pos1"] = "/body/DocFragment[20]/body/div[2]/p[30]/text().404"
            }
        },

And the question is this:
If I open the same book with another device and highlight the exact same words, are the ["page"] ["pos0"] ["pos1"] going to be the same?

A second question, about the [209] key is:
If this number is not right (since a different screen format and/or font and/or size would produce different amount of pages) what would happen to the highlight?

Is there anything else that I missed and needs to be addressed?

@poire-z
Copy link
Contributor

poire-z commented Feb 27, 2019

From what I understand from having read the code:
Both the highlight and bookmarks tables are updated when a highlight is made (or adjusted with the recently added feature).
Why there was at one time the need to have 2 tables, I don't know. But currently, the highlight is used by the drawing code (to draw the little grey boxes or underlines), while the bookmarks is used for bookmark listing, editing text...

I don't think the [209] key matters. It's the original page number that this highlight was on when it was added, but indeed, when you change font size, the page numbers changes, so this key is just a slot. The code loop thru each subtree (to meet all the bookmarks/highlights), ignoring this value.

The ["page"] you mention is actually a XPointer (xpath), and looks like it is the same as pos0 value.
Later, when displaying, I guess the code gets the current page that this xpointer is in, so it's queried dynamically from the engine.

The xpointers should be stable across devices, but there are cases when changes/fixes/new features in crengine may make them invalid.
When I added such stuff, i took care of updating another thing that you can find in the metadata.lua:
["cre_dom_version"] = 20180528,
And I took care that a past'ly opened book would request its old dom version so the engine would still work the same way it did previously (so, voluntarily disabling fixes and features in that case), so the highlights are not broken. (It's quite possible that I missed some changes that could break highlights and forgot to upgrade the engine dom version - but when I knew that it could, I did ensure that).
Some discussion about that in #3940 and #3976 and koreader/crengine#172 (comment)

So, highlights you would took from different metadata.lua from 2 devices (for the same book) with the same cre_dom_version should be similar and mergeable.
Those from books opened with different koreader versions that happened to be different enough to have different cre_dom_version may be unmergable, and some highlights from one not be shown on the other.

You can see the historical cre_dom_version and the whys at https://github.com/koreader/crengine/blob/fca482f9c57bc8fd9a7e047b6eb6f7f6ab4211a7/crengine/src/lvtinydom.cpp#L15-L57
(There was a bunch in may 2018, none since - but I may bump it again in a few weeks if I ever manage to add support for float:s)

@poire-z
Copy link
Contributor

poire-z commented Feb 27, 2019

The side effect of this is that if you had opened a book in 2016, even on the laster koreader nightly version, you would request the engine to behave like it did in 2016 (even if you have never highlighted anything), so the highlight if any would still show - and you would not get some fixes and new features.

The only solution to get these fixes and new features is to explicitely "Purge .sdr" (so losing your highlights), so, as there's no past, the engine will use the lastest and best dom version.

@noembryo
Copy link
Contributor Author

This is very promising. It does give me something to check before creating a non working file.
I can ensure that the cre_dom_version is the same before merging anything.
I have to read these links you provided, but since the page key is irrelevant, I think that is doable..
Thanks for the answer.

@poire-z
Copy link
Contributor

poire-z commented Feb 27, 2019

Just mentionning that for completness (I think that's no concern of yours with kohighlights):

When one get this popup (from #3968, after playing with some styletweaks - mostly those with display: changes):

image

The document is fully reloaded, the DOM is rebuilt (with no cre_dom_version change), with styles that may change the xpaths, and so break some highlights.
Reverting the style tweak that provoked that message should bring back the past highlights.

I don't know if it can be a feature or not for kohighlight to backup/save the list of styletweaks, to eventually allow restoring its state if the user feels like "damned, I lose them, fortunately I backuped them in kohighlights". (But you told me you don't do restore I think.)

@noembryo
Copy link
Contributor Author

I try to avoid messing around with KoReader apart from editing the metadata files (for the comment only).
Looking at an example, I only see a ["css"] key but nothing about styletweaks.
Maybe its not there because I never used them.
If there was something in the metadata file, I could try to warn at least the user about a potential problem..

@poire-z
Copy link
Contributor

poire-z commented Feb 27, 2019

One can have:

    ["style_tweaks"] = {
        ["lineheight_all_inherit"] = true,
        ["titles_page-break-before_avoid "] = true,
        ["widows_orphans_all_5"] = true
    },

But yeah, nothing for you to worry about, you can't know from kohighlight what has happened and if some mess happened. So don't worry :)
I was just updating this issue with all the info I could think of on the topic of losing highlights, for reference, so I can link to it in the future.

@noembryo
Copy link
Contributor Author

noembryo commented Feb 28, 2019

... and another thing ...
What is the difference between the ["partial_md5_checksum"] and the ["stats"]["md5"] keys?
It seems that ["partial_md5_checksum"] is a newer addition, but other than that, is any of them going to change in the future?
It would be very helpful if I could use one of them to ID the books and merge only the same book's highlights.

@Frenzie
Copy link
Member

Frenzie commented Feb 28, 2019

It's used on the sync server, so in principle it shouldn't change.

See:
https://github.com/koreader/koreader-sync-server#privacy-and-security
#3530 (comment)

-- calculate partial digest of the document and store in its docsettings to avoid document saving
-- feature to change its checksum.
--
-- To the calculating mechanism itself.
-- since only PDF documents could be modified by KOReader by appending data
-- at the end of the files when highlighting, we use a non-even sampling
-- algorithm which samples with larger weight at file head and much smaller
-- weight at file tail, thus reduces the probability that appended data may change
-- the digest value.
-- Note that if PDF file size is around 1024, 4096, 16384, 65536, 262144
-- 1048576, 4194304, 16777216, 67108864, 268435456 or 1073741824, appending data
-- by highlighting in KOReader may change the digest value.
function Document:fastDigest(docsettings)
if not self.file then return end
local file = io.open(self.file, 'rb')
if file then
local tmp_docsettings = false
if not docsettings then -- if not provided, open/create it
docsettings = require("docsettings"):open(self.file)
tmp_docsettings = true
end
local result = docsettings:readSetting("partial_md5_checksum")
if not result then
logger.dbg("computing and storing partial_md5_checksum")
local md5 = require("ffi/MD5")
local lshift = bit.lshift
local step, size = 1024, 1024
local m = md5.new()
for i = -1, 10 do
file:seek("set", lshift(step, 2*i))
local sample = file:read(size)
if sample then
m:update(sample)
else
break
end
end
result = m:sum()
docsettings:saveSetting("partial_md5_checksum", result)
end
if tmp_docsettings then
docsettings:close()
end
file:close()
return result
end
end

The stats thing I don't know about. It looks suspiciously similar:

function ReaderStatistics:partialMd5(file)
if file == nil then
return nil
end
local md5 = require("ffi/MD5")
local lshift = bit.lshift
local step, size = 1024, 1024
local m = md5.new()
local file_handle = io.open(file, 'rb')
for i = -1, 10 do
file_handle:seek("set", lshift(step, 2*i))
local sample = file_handle:read(size)
if sample then
m:update(sample)
else
break
end
end
return m:sum()
end

@houqp
Copy link
Member

houqp commented Mar 1, 2019

I would use partial_md5_checksum over ["stats"]["md5"].

@houqp
Copy link
Member

houqp commented Mar 1, 2019

And highlight config should point to the same text regardless how the book is rendered for all document types, if not, it's a bug we need to fix.

@noembryo
Copy link
Contributor Author

noembryo commented Mar 2, 2019

Added the (experimental) option to merge highlights with v0.6.1.0
As a bonus, it can also sync positions now..

@poire-z
Copy link
Contributor

poire-z commented Mar 2, 2019

Btw, positions too (as they are xpointers) may change when cre_dom_version changes.
When a last_xpointer is not found, you're just brought to the 1st page, so no real harm done. So I dunno if that should be a worry to you.

@noembryo
Copy link
Contributor Author

noembryo commented Mar 2, 2019

Well, to sync 2 books, they have to be on the same cre_dom_version.
And I think that the last_xpointer key was added to metadata earlier than cre_dom_version key (or wasn't?)

@poire-z
Copy link
Contributor

poire-z commented Mar 2, 2019

And I think that the last_xpointer key was added to metadata earlier

Yes it was. But its value is updated from the engine, with the dom version in use. So no problem if you make all that dependant of needing the same cre_dom_version.

Just giving more details about this cre_dom_version:
When opening an old book that has not been opened for years, but that has a metadata.lua (technically, a last_xpointer set), KOReader requests cre_dom_version to be the older it knows: 20171225.
If no metadata.lua (technically no last_xpointer set), it requests the newest/lateset cre_dom_version (currently: 20180528).

function ReaderRolling:onReadSettings(config)
-- 20180503: some fix in crengine has changed the way the DOM is built
-- for HTML documents and may make XPATHs obtained from previous version
-- invalid.
-- We may request the previous (buggy) behaviour though, which we do
-- if we use a DocSetting previously made that may contain bookmarks
-- and highlights with old XPATHs.
-- (EPUB will use the same correct DOM code no matter what DOM version
-- we request here.)
if not config:readSetting("cre_dom_version") then
-- Not previously set, guess which DOM version to use
if config:readSetting("last_xpointer") then
-- We have a last_xpointer: this book was previously opened
-- with possibly a very old version: request the oldest
config:saveSetting("cre_dom_version", self.ui.document:getOldestDomVersion())
else
-- No previous xpointer: book never opened (or sidecar file
-- purged): we can use the latest DOM version
config:saveSetting("cre_dom_version", self.ui.document:getLatestDomVersion())
end
end
self.ui.document:requestDomVersion(config:readSetting("cre_dom_version"))

(The comment (EPUB will use the same... is wrong now - it may have been right on the first increment - I'll remove it).

@noembryo
Copy link
Contributor Author

noembryo commented Mar 2, 2019

So, if all the older metadata are getting a 20171225 cre_dom_version key, do they also get a last_xpointer key?
Because if they don't, I have to check for its existence..

@poire-z
Copy link
Contributor

poire-z commented Mar 2, 2019

Yes, they do. If they have been opened with a > 201805 koreader, they will get a cre_dom_version and a last_xpointer.

last_xpointer should be there if the document has been opened since 2013-2014 (before I jumped in): e7e6a2b and #980.

There is code to use a previous setting when last_xpointer is not present, and to convert it to a last_xpointer:

local last_xp = config:readSetting("last_xpointer")
local last_per = config:readSetting("last_percent")
if last_xp then
self.xpointer = last_xp
self.setupXpointer = function()
self:_gotoXPointer(self.xpointer)
-- we have to do a real jump in self.ui.document._document to
-- update status information in CREngine.
self.ui.document:gotoXPointer(self.xpointer)
end
-- we read last_percent just for backward compatibility
-- FIXME: remove this branch with migration script
elseif last_per then
self.setupXpointer = function()
self:_gotoPercent(last_per)

@noembryo
Copy link
Contributor Author

noembryo commented Mar 2, 2019

OK then, we're all set!
Lets hope now that the merging actually works ;o)

@noembryo noembryo closed this as completed Mar 2, 2019
@arooni
Copy link

arooni commented Mar 6, 2019

Is there something I need to do to enable the syncing of highlights between devices? Just installed the latest nightly (3/5/2019) and I don't see highlights syncing from my PW4 to my Kindle Fire Tablet... am I doing something wrong?

[position syncs just fine]

@Frenzie
Copy link
Member

Frenzie commented Mar 6, 2019

This is a third-party app. The built-in sync stuff hasn't changed in years. I think I had to restart it one or two years ago and that's about it. :-P

@arooni
Copy link

arooni commented Mar 7, 2019

So sounds like I shouldn't hold my breath for highlighting sync to be baked into Koreader ;)

@Frenzie
Copy link
Member

Frenzie commented Mar 7, 2019

I don't think anybody's actively doing anything with it, no. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants