Exporting highlights runs out of memory #10745
See https://github.com/koreader/koreader/blob/master/plugins/exporter.koplugin/base.lua#L87 The table of booknotes for all books is kept in memory while the exporter formats it to match its output. In the case of the text exporter it just concatenates each note onto one big string: https://github.com/koreader/koreader/blob/master/plugins/exporter.koplugin/target/text.lua#L12-L39 The json exporter works slightly differently: it builds its own inner representation of the booknotes in a table and then uses that table to encode the json file with rapidjson. Rapidjson/lua is also known to cause OOM on its own (see #7790 (comment)), as it doesn't support partial encoding from a stream. tl;dr: if rapidjson doesn't OOM, it looks like the rest of the exporters, which do OOM, are simply not optimized. Given that nobody reported this issue until now, it seems the people who use this plugin are not big highlighters. Patches are very welcome. |
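For reference, a streaming text exporter could write each note as soon as it is formatted instead of accumulating one big string in memory. A minimal hypothetical sketch (export_streamed and the entry.text field are illustrative stand-ins, not the plugin's actual API):

```lua
-- Hypothetical sketch: write each note's text as soon as it is
-- formatted, instead of accumulating one big string in memory.
-- Memory use stays proportional to one note, not the whole export.
local function export_streamed(path, books)
    local file = io.open(path, "w")
    if not file then return false end
    for _, booknotes in ipairs(books) do
        for _, entry in ipairs(booknotes) do
            -- entry.text is a stand-in for the per-note formatting step
            file:write(entry.text or "", "\n")
        end
    end
    file:close()
    return true
end
```

Since the OS buffers the writes, this should not be noticeably slower than building the string up front.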
@pazos If a user has this problem, currently their options for getting their highlights are:
Is this correct? |
@offset-torque: not sure it's worth mentioning in the guide, other than a link to this ticket. But you tell me: go to your most highlighted document and try to export to text. Does it OOM?
Then please attach the book and the metadata sidecar here, so we can reproduce the ticket and eventually fix it :) I wouldn't mention workarounds directly in the user guide. Not because I hate users but because I hate the idea of affected users (1? 10? 100?) getting some sort of workable workaround that lets this issue stay unsolved. |
How do I determine the most highlighted document? I guess I can parse the JSON with some code and look for it, though I've since deleted some books in an attempt to fix the problem. Do you mean "most highlighted" by highlight count, character count, …? |
@rixx: any document that gets an OOM is enough. If you have one and you're ok with sharing it please send me an email. I can try to reproduce on a VM. |
Good news, my exporter still runs OOM. Bad news, I don't think any individual book runs OOM:
(In my defense, these are very long, dense books and I use highlights to write up summaries.) |
Could you log |
Also, @rixx, try with this patch:

diff --git a/plugins/exporter.koplugin/target/text.lua b/plugins/exporter.koplugin/target/text.lua
index dadc22849..6611fad46 100644
--- a/plugins/exporter.koplugin/target/text.lua
+++ b/plugins/exporter.koplugin/target/text.lua
@@ -9,6 +9,39 @@ local TextExporter = require("base"):new {
mimetype = "text/plain",
}
+local function fformat(booknotes)
+ local wide_space = "\227\128\128"
+ local tbl = {}
+
+ if booknotes.title then
+ table.insert(tbl, wide_space .. booknotes.title)
+ table.insert(tbl, wide_space)
+ end
+
+ for ___, entry in ipairs(booknotes) do
+ for ____, clipping in ipairs(entry) do
+ if clipping.chapter then
+ table.insert(tbl, wide_space .. clipping.chapter)
+ table.insert(tbl, wide_space)
+ end
+ local text = T(_("-- Page: %1, added on %2\n"), clipping.page, os.date("%c", clipping.time))
+ table.insert(tbl, wide_space .. wide_space .. text)
+ if clipping.text then
+ table.insert(tbl, clipping.text)
+ end
+ if clipping.note then
+ table.insert(tbl, "\n---\n" .. clipping.note)
+ end
+ if clipping.image then
+ table.insert(tbl, _("<An image>"))
+ end
+ table.insert(tbl, "\n-=-=-=-=-=-\n")
+ end
+ end
+ return tbl
+end
+
+
local function format(booknotes)
local wide_space = "\227\128\128"
local content = ""
@@ -45,8 +78,8 @@ function TextExporter:export(t)
local file = io.open(path, "a")
if not file then return false end
for __, booknotes in ipairs(t) do
- local content = format(booknotes)
- file:write(content)
+ local tbl = fformat(booknotes)
+ file:write(table.concat(tbl, "\n"))
end
file:close()
return true

That should be much more performant in both memory allocations and time spent doing garbage collection. |
@pazos Nicely done, it works with your patch! Sorry don't have time to compare the different outputs though atm. |
Great, thanks for the feedback.
Not in a hurry here :) When/if you find some time please check with clips that contain chapters, notes and images (I don't really know if the latter is possible :p) |
Will do! While I'm here, is there a way to increase log levels? My Kobo has been experiencing random KOReader crashes since I got it, and |
On the File Manager: tools -> more tools -> developer options -> verbose logs. It is a known issue on the Kobo Libra 2. See open tickets like #8414 and #10713. You can try some of the workarounds in there. Sadly, IIRC, there's nothing to fix the underlying issue, since we're talking about kernel deadlocks here :/ |
I see it is already solved pazos, that was fast. So no need to add anything to the guide. |
Sadly it isn't fixed yet. Markdown needs a similar change, and so does MyClippings. Pinging @uroybd and @Mochitto as the creators (and hopefully users) of those exporters :)

To avoid OOM, at least to a certain extent, we need to remove all kinds of string concatenation in inner loops. String concatenation makes a bunch of allocations, so we need to replace

local content = ""
for index, notes in ipairs(t) do
    content = content .. "new content\n"
end

with

local tbl = {}
for index, notes in ipairs(t) do
    table.insert(tbl, "new content")
end
local content = table.concat(tbl, "\n")

I think Memos is ok as is, but let's ping @fierceX as well :) |
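To illustrate the difference (a standalone sketch, not the plugin code): both approaches build the same string, but naive concatenation copies the growing result on every iteration, while the table buffer allocates the final string once.

```lua
-- Standalone sketch (not plugin code): build the same output two ways.

-- Naive concatenation: each ".." allocates a new, ever-growing string,
-- so building n lines copies O(n^2) bytes in total.
local function concat_naive(t)
    local content = ""
    for _, line in ipairs(t) do
        content = content .. line .. "\n"
    end
    return content
end

-- Table buffer: collect the pieces, then allocate the result once.
local function concat_buffered(t)
    local tbl = {}
    for _, line in ipairs(t) do
        table.insert(tbl, line)
    end
    return table.concat(tbl, "\n") .. "\n"
end

local lines = { "first", "second", "third" }
assert(concat_naive(lines) == concat_buffered(lines))
```

With thousands of highlights, the quadratic copying (and the garbage it leaves for the collector) is what pushes the process into OOM.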
I can open a PR containing the changes @pazos has recommended. Is there any other scope for optimizations? |
Will take care of the patch for my clippings as well later on today :) Thanks for solving it @pazos |
Thank you both. We're not in a hurry. Just please check, while you're at it, that the new code produces the same output as the old one, or that you're ok with the differences. I can't test myself as I don't highlight on KO.
I bet they are. Not relevant to this ticket, though |
I can confirm this is an issue with the current v2023.08 version. For reference, I have 11k highlights (I collect phrases).
I have not tested the patch, but I can confirm that using the json exporter works (while txt exporter crashes). |
Alright, I finally got around to comparing the outputs of the two code paths. There is a difference in output: in the new, patched version, there are added newlines surrounding the separator lines (-=-=-=-=-=-):

--- /home/rixx/highlights-unpatched.txt 2024-01-17 20:28:45.396911516 +0100
+++ /home/rixx/highlights-patched.txt 2024-01-17 20:28:35.416853066 +0100
@@ -3,11 +3,15 @@
Special Collections
-- Page: 3, added on Wed Jan 17 12:57:13 2024
+
Jude had been teaching Engineering Communication and Design online for so long that the temporary virtual classroom seemed permanent. Every year or two someone in the department brought up in-person classes, but he wasn’t surprised when the state of emergency extended another year, and they continued to wait for a return to normal that never came.
+
-=-=-=-=-=-
+
Special Collections
-- Page: 4, added on Wed Jan 17 12:58:19 2024
+
Twenty years before, he had lectured them about communication standards in complex infrastructure projects, and told them that they would spend their entire careers writing, and that they should learn to do it well. But now, Jude had no answer that meant anything to them.
--=-=-=-=-=-
+-=-=-=-=-=- |
Thank you, @rixx
These are easy to change. Just see my patch above and remove both "\n" before and after the separator lines. Feel free to submit a PR if you're happy with the changes :) |
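The reason for the extra blank lines: table.concat already inserts a "\n" between pieces, so any piece that carries its own leading or trailing "\n" ends up double-spaced. A small sketch of both variants:

```lua
-- Sketch: with table.concat(tbl, "\n"), pieces that carry their own
-- "\n" gain extra blank lines; plain pieces reproduce the old output.
local with_newlines = { "text", "\n-=-=-=-=-=-\n", "more" }
local without       = { "text", "-=-=-=-=-=-",     "more" }
assert(table.concat(with_newlines, "\n") == "text\n\n-=-=-=-=-=-\n\nmore")
assert(table.concat(without, "\n")       == "text\n-=-=-=-=-=-\nmore")
```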
Issue
When I try to export all highlights on my device, the process crashes after running out of memory. This happens with all three plain text export formats (Text, My Clippings, Markdown), whereas the JSON export works perfectly fine (which is maybe a place to start debugging?).
The text exports still produce a partial result that just stops partway through the export. Here are all the file sizes, plus the successful JSON export:
Steps to reproduce
[Step 0: have a large amount of highlights. I realise that I'm part of the problem.]
"./luajit: not enough memory"
I realise that having a ton of highlights is my fault. However, I do think KOReader can handle highlight sets larger than the available memory, at least when exporting to one of the plain text formats (Text, Markdown, My Clippings, etc.), as those permit sequential reads and writes, and there is no real need to keep the entire highlights db in memory to do so.