KOReader crashes #11354

Closed
gerhaher opened this issue Jan 14, 2024 · 19 comments · Fixed by #11362

@gerhaher

  • KOReader version: latest (I suppose)
  • Device: Kobo Sage

Issue

The crashing started when I used 'Page Browser'. Then I updated KOReader (or tried to). Crash during update process. And now I can't even start KOReader...because it crashes when I try.

crash.log

@mergen3107
Contributor

Latest what, stable or nightly?

@gerhaher
Author

nightly

@pazos
Member

pazos commented Jan 14, 2024

@gerhaher
Author

gerhaher commented Jan 14, 2024

I installed 2024.01 and the crash loop that Uncle Robin describes disappeared.

But I still have the Page browser crash:
crash.log

@poire-z
Contributor

poire-z commented Jan 14, 2024

May not really be related to Page browser (except that it may trigger it by fetching page read from statistics):

01/14/24-16:19:42 INFO  Loading plugins from directory: plugins
01/14/24-16:19:42 INFO  ReaderStatistics: Migrating DB from schema 0 to schema 20221111 ...
01/14/24-16:19:42 WARN  ReaderStatistics: A DB backup from schema 0 to schema 20221111 already exists!
01/14/24-16:19:42 ERROR Failed to initialize statistics plugin: common/lua-ljsqlite3/init.lua:60: ljsqlite3[error] no such table: page_stat
stack traceback:
    common/lua-ljsqlite3/init.lua:60: in function 'err'
    common/lua-ljsqlite3/init.lua:236: in function 'E_conn'
    common/lua-ljsqlite3/init.lua:242: in function 'T_okcode'
    common/lua-ljsqlite3/init.lua:375: in function 'prepare'
    common/lua-ljsqlite3/init.lua:387: in function 'exec'
    plugins/statistics.koplugin/main.lua:531: in function 'upgradeDBto20201010'

So, as you noticed a crash from an empty bookinfo db, I guess you may have reset book stats too?
You need to tell us more about how you came to that state - and try to reproduce without a statistics.sqlite3 file, and from a backup if you have one. Maybe it's some transition between versions - or from no db at all, which shouldn't be migrated - that we don't handle well?

Or it is the various other crashes you had that left the DB migration incomplete/broken/in an ill state.

@gerhaher
Author

gerhaher commented Jan 14, 2024

> Maybe it's some transition between versions - or from no db at all, which shouldn't be migrated - that we don't handle well?
>
> Or it is the various other crashes you had that left the DB migration incomplete/broken/in an ill state.

Thanks.

But, maybe it is something else?

Could it be the 'Autostandby timeout' feature? It seems that something happens after 4 s - crash or screen flash.

I use 4x4 (columns/rows) in my Page browser. Usually the crash happens during the last 2-3 thumbnails.
I've also tested 2x2, 3x3, 5x5, 6x6. But it crashes only when using 4x4 or 5x4 (but not 4x5).

Anyway, with Autostandby timeout disabled, crashes and screen flashes are gone.

Here is another crash log
crash.log

@poire-z
Contributor

poire-z commented Jan 14, 2024

Ok, there was indeed in your previous crash.log what we see in your last crash.log:

01/14/24-18:26:47 WARN  Kobo standby: the kernel refused to enter standby!
01/14/24-18:26:47 WARN  PageBrowserWidget thumbnail deserialize() failed: malformed serialized data (unexpected end of buffer)
01/14/24-18:26:51 INFO  WakeupMgr: scheduling wakeup in 9 -> 1705253220
Cannot write `standby` to file `/sys/power/state`:  Operation not permitted
01/14/24-18:26:51 WARN  Kobo standby: the kernel refused to enter standby!
01/14/24-18:26:55 INFO  WakeupMgr: scheduling wakeup in 1 -> 1705253216
Cannot write `standby` to file `/sys/power/state`:  Operation not permitted
01/14/24-18:26:56 WARN  Kobo standby: the kernel refused to enter standby!
01/14/24-18:26:56 WARN  PageBrowserWidget thumbnail deserialize() failed: malformed serialized data (unexpected end of buffer)
01/14/24-18:27:00 INFO  WakeupMgr: scheduling wakeup in 2 -> 1705253222
Cannot write `standby` to file `/sys/power/state`:  Device or resource busy
01/14/24-18:27:00 WARN  Kobo standby: the kernel refused to enter standby!
01/14/24-18:27:08 INFO  WakeupMgr: scheduling wakeup in 1 -> 1705253229
Cannot write `standby` to file `/sys/power/state`:  Operation not permitted
01/14/24-18:27:08 WARN  Kobo standby: the kernel refused to enter standby!
01/14/24-18:27:08 WARN  PageBrowserWidget thumbnail deserialize() failed: malformed serialized data (unexpected end of buffer)
01/14/24-18:27:15 INFO  WakeupMgr: scheduling wakeup in 1 -> 1705253236
Cannot write `standby` to file `/sys/power/state`:  Operation not permitted
01/14/24-18:27:16 WARN  Kobo standby: the kernel refused to enter standby!
./luajit: unexpected end of buffer
!!!!
Uh oh, something went awry... (Crash n°1: 01/14/24 @ 18:27:16)
Running FW 4.38.21908 on Linux 4.9.56 (#76 SMP PREEMPT Mon Feb 13 17:22:54 CST 2023)

No idea about the PageBrowser deserialization failure (memory issue? subprocess killed?...) - and a few of them are just logged, without causing a crash, before the actual crash happens.
Not competent with the autostandby, its scheduling and the operation not permitted.

@NiLuJe
Member

NiLuJe commented Jan 14, 2024

Standby failing makes sense if the device is busy. It may also be randomly and permanently broken, because NTX (in which case, you need a reboot to jog things back into working order).

The PageBrowser thing looks like a short read (during deser of a LuaJIT string.buffer), so, anybody's guess as to why that happens ;p. OOM is certainly possible.

@NiLuJe
Member

NiLuJe commented Jan 14, 2024

> Standby failing makes sense if the device is busy.

And, IIRC, the runInSubprocess does NOT stall the standby requests, because it's not actually running in the main loop, so standby failing as a result of that sounds sensible.

@NiLuJe
Member

NiLuJe commented Jan 14, 2024

There's quite a few different crashes:

On MR (and for OP):

./luajit: ./ffi/blitbuffer.lua:1097: bad argument #1 to 'ceil' (number expected, got nil)
stack traceback:
	[C]: in function 'ceil'
	./ffi/blitbuffer.lua:1097: in function 'getBoundedRect'
	./ffi/framebuffer_sunxi.lua:130: in function 'mech_refresh'
	./ffi/framebuffer_sunxi.lua:261: in function <./ffi/framebuffer_sunxi.lua:259>
	frontend/ui/uimanager.lua:1305: in function '_repaint'
	frontend/ui/uimanager.lua:1478: in function 'handleInput'
	frontend/ui/uimanager.lua:1578: in function 'run'
	./reader.lua:289: in main chunk
	[C]: at 0x000140cd

(On nightlies, going to look into that one).

OP:

./luajit: frontend/ui/widget/textboxwidget.lua:332: bad argument #2 to 'makeLine' (width must be strictly positive)
stack traceback:
	[C]: in function 'makeLine'
	frontend/ui/widget/textboxwidget.lua:332: in function '_splitToLines'
	frontend/ui/widget/textboxwidget.lua:180: in function '_computeTextDimensions'
	frontend/ui/widget/textboxwidget.lua:157: in function 'init'
	frontend/ui/widget/widget.lua:46: in function 'new'
	frontend/ui/widget/buttondialog.lua:136: in function 'init'
	frontend/ui/widget/widget.lua:46: in function 'new'
	frontend/dispatcher.lua:1110: in function 'execute'
	plugins/gestures.koplugin/main.lua:1173: in function 'handler'
	frontend/ui/widget/container/inputcontainer.lua:254: in function 'handleEvent'
	frontend/ui/uimanager.lua:907: in function 'sendEvent'
	frontend/ui/uimanager.lua:53: in function '__default__'
	frontend/ui/uimanager.lua:1436: in function 'handleInputEvent'
	frontend/ui/uimanager.lua:1534: in function 'handleInput'
	frontend/ui/uimanager.lua:1578: in function 'run'
	./reader.lua:289: in main chunk
	[C]: at 0x000140cd

(On stable!)

And a few of the string.buffer crashes (probably because a crash screwed with a cache somewhere).

@poire-z
Contributor

poire-z commented Jan 14, 2024

Some other runInSubProcess/Trapper uses are already wrapped in UIManager:preventStandby()/allowStandby() (i.e., coverbrowser extraction, which may run for a few minutes, or partial rerendering).
I didn't feel PageBrowser needed that, as it's launched after user interaction and generating a few page thumbnails only takes a few seconds, which didn't seem long enough for autostandby to kick in (no idea what the common delays are).

@poire-z
Contributor

poire-z commented Jan 14, 2024

./luajit: frontend/ui/widget/textboxwidget.lua:332: bad argument #2 to 'makeLine' (width must be strictly positive)
stack traceback:
    [C]: in function 'makeLine'
    frontend/ui/widget/textboxwidget.lua:332: in function '_splitToLines'
    frontend/ui/widget/textboxwidget.lua:180: in function '_computeTextDimensions'
    frontend/ui/widget/textboxwidget.lua:157: in function 'init'
    frontend/ui/widget/widget.lua:46: in function 'new'
    frontend/ui/widget/buttondialog.lua:136: in function 'init'
    frontend/ui/widget/widget.lua:46: in function 'new'
    frontend/dispatcher.lua:1110: in function 'execute'
    plugins/gestures.koplugin/main.lua:1173: in function 'handler'

That seems to be happening when showing a QuickMenu.
May depend on what's in it :) Maybe some old action that we have removed/renamed?
It was also happening in the OP crash.log some weeks earlier with v2023.10 - so nothing new.

@NiLuJe
Member

NiLuJe commented Jan 14, 2024

> (no idea what the common delays are).

It's 4s. Some crazy people (;p) set it as low as 2.

I'd wager the fences are warranted here (and we can't really do it from within runInSubprocess, because it's in base and the fences are in UIManager :s).

@NiLuJe
Member

NiLuJe commented Jan 14, 2024

> And a few of the string.buffer crashes (probably because a crash screwed with a cache somewhere).

Hard to say where without verbose debug logs, FWIW (wink, wink, nudge, nudge ;)).

NiLuJe added a commit to NiLuJe/koreader-base that referenced this issue Jan 14, 2024
Compute the proper dimensions instead, as getBoundedRect no longer
accepts nils ;).

Re: koreader/koreader#11354
NiLuJe added a commit to NiLuJe/koreader-base that referenced this issue Jan 14, 2024
Compute the proper dimensions instead, as getBoundedRect no longer
accepts nil ;).

Re: koreader/koreader#11354
@NiLuJe
Member

NiLuJe commented Jan 14, 2024

> And a few of the string.buffer crashes (probably because a crash screwed with a cache somewhere).

Hard to say where without verbose debug logs, FWIW (wink, wink, nudge, nudge ;)).

Stranger still is that it's from an unprotected call, and one that doesn't appear to come from persist. We have very few string.buffer users outside of persist...

Hard to pinpoint without verbose debug logs ;).

NiLuJe added a commit to koreader/koreader-base that referenced this issue Jan 14, 2024
Compute the proper dimensions instead, as getBoundedRect no longer
accepts nil ;).

Re: koreader/koreader#11354
@poire-z
Contributor

poire-z commented Jan 14, 2024

@gerhaher : with the current version and before you apply any other fix, could you try replacing frontend/apps/reader/modules/readerthumbnail.lua with this one (removing the .txt extension): readerthumbnail.lua.txt
which contains this modification (quite hard to be sure these points are the right places to put them, and I couldn't manage to get anything not messed up without adding a flag):

--- a/frontend/apps/reader/modules/readerthumbnail.lua
+++ b/frontend/apps/reader/modules/readerthumbnail.lua
@@ -279,4 +279,8 @@ end

 function ReaderThumbnail:ensureTileGeneration()
+    if not self._standby_prevented then
+        UIManager:preventStandby()
+        self._standby_prevented = true
+    end
     local has_pids_still_to_collect = self:collectPids()

@@ -319,4 +323,9 @@ function ReaderThumbnail:ensureTileGeneration()
     if self.req_in_progress or has_pids_still_to_collect or next(self.thumbnails_requests) then
         self._ensureTileGeneration_action()
+    else
+        if self._standby_prevented then
+            self._standby_prevented = false
+            UIManager:allowStandby()
+        end
     end
 end

Just curious if this is enough to prevent your strange PageBrowser issues

@NiLuJe
Member

NiLuJe commented Jan 14, 2024

> (quite hard to be sure these points are the right places to put them, and I couldn't manage to get anything not messed up without adding a flag)

Yeah, it's essentially a refcount, so they need to be balanced (i.e., an allow for every prevent), which is probably easier said than done depending on how the code flows ;p.
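
To illustrate the refcount point, here is a minimal, self-contained Lua sketch. It is not KOReader's actual UIManager code: `Gate`, `Worker`, `poll`, `pending` and `standbyAllowed` are hypothetical names made up for the example, and only the idea of pairing each `preventStandby()` with exactly one `allowStandby()` (guarded by a flag, as in the patch above) is taken from this thread.

```lua
-- Hypothetical stand-in for the standby fences (NOT KOReader's UIManager).
-- Standby is only permitted while the counter is back at zero.
local Gate = { count = 0 }

function Gate:preventStandby()
    self.count = self.count + 1
end

function Gate:allowStandby()
    -- An unbalanced allow would drive the counter negative, so fail loudly.
    assert(self.count > 0, "allowStandby() without a matching preventStandby()")
    self.count = self.count - 1
end

function Gate:standbyAllowed()
    return self.count == 0
end

-- Hypothetical worker whose poll() may run many times for a single job,
-- much like a function that keeps rescheduling itself until work is done.
local Worker = { _standby_prevented = false, pending = 0 }

function Worker:poll()
    -- The flag makes sure we take the fence only once per job,
    -- no matter how many times poll() is called.
    if not self._standby_prevented then
        Gate:preventStandby()
        self._standby_prevented = true
    end
    if self.pending > 0 then
        self.pending = self.pending - 1 -- pretend one unit of work finished
    end
    -- Release the fence exactly once, when nothing is left to do.
    if self.pending == 0 and self._standby_prevented then
        self._standby_prevented = false
        Gate:allowStandby()
    end
end

Worker.pending = 3
Worker:poll()
Worker:poll()
print(Gate:standbyAllowed()) -- false: one unit of work is still pending
Worker:poll()
print(Gate:standbyAllowed()) -- true: the single prevent was matched by one allow
```

Without the flag, each `poll()` would bump the counter while only the final call releases it, so the counter would never return to zero and standby would stay blocked.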

@gerhaher
Author

> Hard to pinpoint without verbose debug logs ;).

crash.log

> Just curious if this is enough to prevent your strange PageBrowser issues

Thanks!
It seems to work. I could not trigger a single crash with that.

@NiLuJe
Member

NiLuJe commented Jan 15, 2024

Welp, that really seems to be coming from inside the pcall (from persist's deserialize), huh.

@poire-z poire-z added this to the 2024.02 milestone Jan 18, 2024