Semi freezes after a period of time #5

Closed
rien333 opened this issue Apr 19, 2020 · 23 comments

@rien333

rien333 commented Apr 19, 2020

I run inkvt over ssh. (For some reason, I can't get KFMon to make inkvt show up in Nickel, though the KFMon logs do show that inkvt is being registered, without errors.) Though inkvt runs fine for a while, after a seemingly random (but short) period of time, inkvt freezes the screen. In addition to freezing the screen, I can't access the underlying application anymore (e.g. if I swipe while inkvt is running, the Plato reader would normally update the screen and show the next page). There are no errors shown in my ssh terminal, and I can still start new ssh sessions on the Kobo after the freeze occurs. Oddly, while inkvt is in its frozen state, I can still send keypresses over http and ssh, but it takes a while for them to arrive.

To my knowledge, I followed your instructions carefully. Do you have any idea of what might be going wrong, and where/how to look for errors?

I run inkvt as follows:

$ ssh $KOBO_ADDRESS  # log in to the Kobo
$ cd /mnt/onboard/.adds/inkvt
$ ./inkvt.armhf  # the same happens if I run ./inkvt.sh
[FBInk] Detected a Kobo Clara HD (376 => Nova @ Mark 7)
[FBInk] Enabled Kobo Mark 7 quirks
[FBInk] Clock tick frequency appears to be 100 Hz
[FBInk] Screen density set to 300 dpi
[FBInk] Variable fb info: 1072x1448, 32bpp @ rotation: 3 (Counter Clockwise, 270°)
[FBInk] Fontsize set to 24x48 (Terminus base glyph size: 8x16)
[FBInk] Line length: 44 cols, Page size: 30 rows
[FBInk] Vertical fit isn't perfect, shifting rows down by 4 pixels
[FBInk] Fixed fb info: ID is "mxc_epdc_fb", length of fb mem: 6782976 bytes & line length: 4352 bytes
[FBInk] Pen colors set to #000000 for the foreground and #FFFFFF for the background
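
As a sanity check on the startup log above, the reported grid can be reproduced with integer arithmetic. This is only a sketch: `grid` is a hypothetical helper, and the assumption that the leftover pixels are split evenly (which would give the 4-pixel shift) is inferred from the numbers in the log, not taken from FBInk's source.

```shell
# Arithmetic sketch only: reproduces the numbers in the FBInk startup log.
# grid is a hypothetical helper, not part of FBInk or inkvt.
grid() { # usage: grid FB_WIDTH FB_HEIGHT GLYPH_WIDTH GLYPH_HEIGHT
    cols=$(( $1 / $3 ))             # whole glyph cells per line
    rows=$(( $2 / $4 ))             # whole glyph cells per page
    leftover=$(( $2 - rows * $4 ))  # pixels that do not fill a full row
    # Assumption: the leftover is split evenly above and below the text.
    echo "cols=$cols rows=$rows voffset=$(( leftover / 2 ))"
}

grid 1072 1448 24 48   # prints: cols=44 rows=30 voffset=4
```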

Generally, there are no errors at the moment it freezes. One time, though, inkvt did tell me the following (this could be completely unrelated):

[FBInk] MXCFB_SEND_UPDATE_V2: Invalid argument!
[FBInk] update_region={top=340, left=0, width=1072, height=1104}!
[FBInk] Failed to refresh the screen!

System info

I used the same compiler you suggested to compile inkvt. I have a Kobo Clara HD with the latest 2019 firmware (I haven't checked for new firmware since the start of 2020).

@llandsmeer
Owner

Hi rien333,

Thanks for the detailed issue!

I don't have time to dive into this right now, but maybe the problem is interference between Plato and inkvt? When you start inkvt.sh in Nickel, inkvt kills Nickel and takes control over the framebuffer. Plato, by contrast, will still be alive in the background. It might be that Plato updates the framebuffer state while inkvt is running. However, that has never been a problem for me with KOReader. I'll try to reproduce the bug when I'm back home.

Maybe you have found out that keyboard over HTTP is a bit unstable too in general. I'm still trying to figure out a better way to fix this (maybe more in the direction of using the Kobo as a VNC screen).

For this:

MXCFB_SEND_UPDATE_V2: Invalid argument!

Maybe @NiLuJe knows what that means? The update region seems to be within the screen resolution. Sorry to rope you in if you'd rather not be fixing problems here :)

Kind regards,
Lennart

@rien333
Author

rien333 commented Apr 19, 2020

maybe the problem is interference between Plato and inkvt

Unfortunately, the same thing happens with Nickel.

When you start inkvt.sh in Nickel, inkvt kills Nickel and takes control over the framebuffer ... It might be that plato updates the framebuffer state while inkvt is running.

Strangely, Nickel seems to freeze faster (though this could be purely subjective). I'll experiment with ensuring that inkvt runs with full control over the framebuffer. So far, this has failed, probably because I can't get inkvt to work properly with KFMon (it doesn't show up in my books, for some reason). I'll try to sort that out first.

Maybe you have found out that keyboard over HTTP is a bit unstable too in general, I'm still trying to figure out a better way to fix this

ssh seems pretty stable and fairly straightforward, actually. (though it has some obvious drawbacks) It even forwards key combos correctly.

@NiLuJe
Contributor

NiLuJe commented Apr 19, 2020

Plato (EDIT: and Nickel, duh.) can definitely do hardware rotations, something which will definitely break inkvt right now (there'd need to be a checked fbink_reinit() + state refresh + whatever else might be needed to resync the new state w/ libvterm's state at key places to deal with it; but dealing with it could arguably be construed as out of scope ^^).

That would explain that one specific error log (and the subsequent lack of inkvt refreshes ;)).
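
A rough way to see why a rotation could produce that error, assuming (this is not confirmed in the thread) that the driver rejects update regions extending past the current framebuffer dimensions: the region from the log fits the 1072x1448 layout FBInk detected at startup, but not a 1448x1072 layout left behind by another program rotating the screen. `region_fits` is a hypothetical helper, not part of FBInk.

```shell
# Sketch: does an update region fit inside a framebuffer of XRES x YRES?
# region_fits is a hypothetical helper; the bounds rule is an assumption.
region_fits() { # usage: region_fits LEFT TOP WIDTH HEIGHT XRES YRES
    [ $(( $1 + $3 )) -le "$5" ] && [ $(( $2 + $4 )) -le "$6" ]
}

# Region from the error log: top=340, left=0, width=1072, height=1104.
if region_fits 0 340 1072 1104 1072 1448; then
    echo "1072x1448: region fits"
fi
if ! region_fits 0 340 1072 1104 1448 1072; then
    echo "1448x1072: region is out of bounds"
fi
```

Both messages are printed: the same region is valid before the rotation and too tall after it.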

(FWIW, for legacy reasons, KOReader doesn't do hardware rotation, on the other hand. It'll setup the fb at startup and exit, but that's it).


I haven't actually tried the KFMon script, but it looked sane, I'll double-check ;).


Sidebar on compilers: https://github.com/koreader/koxtoolchain (TL;DR: kobo if you want a non-sucky GCC version, at the expense of needing to link against the STL statically; nickel if you want a sucky GCC version with which you'll be able to link against the STL dynamically).

Ubuntu TCs might work in theory, but target a far too recent glibc, which will likely lead to stuff randomly breaking in fun and interesting ways at load or runtime.

The official TC binaries provided by Kobo are essentially an old binary build of what koxtoolchain's nickel target will do, but will obviously do the job, if you can get them working on your system (never tried).

@NiLuJe
Contributor

NiLuJe commented Apr 19, 2020

As usual, I'd check what dmesg & htop have to say about whatever's happening next time you can replicate this.

(It's conceivable that a broken MXCFB request could softlock the device. I should be dropping all the known offenders inside FBInk, but, who knows. dmesg should be helpful if that's actually the case).

@rien333
Author

rien333 commented Apr 19, 2020

As usual, I'd check what dmesg & htop have to say about whatever's happening next time you can replicate this.

Good one! Totally forgot that dmesg is available on the Kobo. As far as htop goes, ps does show inkvt[.sh] as still running after the freeze. I could also kill it, but the screen's contents remained the same.

@NiLuJe
Contributor

NiLuJe commented Apr 19, 2020

I was mainly wondering about something stuck in a busy-loop, since you mentioned some stuff behaving slower ;).

@llandsmeer
Owner

@rien333 How fast is fast? :)

I've been typing things in inkvt and frantically rotating my Libra H2O for quite a while now, over SSH in KOReader, but everything is still working fine. (It did hang shortly after issuing a dmesg, but it started working again after a few seconds.)

Could you still maybe send a dmesg | tail -n 20 output? A photo would be fine too. And attach your inkvt.armhf binary, so I can test it on my device. Maybe the build environment is doing something strange (?)

Firmware Mark 7

Looks good, that's what I use too during development.

ssh seems pretty stable and fairly straightforward, actually. (though it has some obvious drawbacks) It even forwards key combos correctly.

Oh I thought you were using my Keyboard over HTTP hack :) Yeah ssh works quite straightforward :)

[..] In addition to freezing the screen, I can't access the underlying application anymore (e.g. if I would swipe while inkvt is running, the Plato reader would normally update the screen and show the next page).

That doesn't sound like only inkvt is hanging. (It's not hooking into evdev, unless you edited the main.cpp file to do that.)

Oddly, if inkvt is in its frozen state, I can still send keypresses over http and ssh, but it takes a while for them to arrive.

So the display still updates? Very strange..

@NiLuJe
Contributor

NiLuJe commented Apr 21, 2020

@llandsmeer: KOReader doesn't do hardware rotation, as such, it won't trash the state behind inkvt's back like Nickel or Plato can ;).

I alluded a bit to it in my original answer, but FBInk has a mechanism to deal with that:

fbink_reinit, which basically does an ioctl to see if the fb state changed. If it did in a significant way (depth/rotate), it updates its internal state, and (since fairly recently) returns a specific value to the caller to mention that fact. (Otherwise, it does nothing more than that ioctl & an early successful return).

In inkvt's case, in such instances, you'd also have to reissue an fbink_state_dump to update inkvt's own copy of that, and then make libvterm aware of the new layout.

That said, whether you want to deal with that or not to begin with is debatable: in "normal" conditions, killing Nickel ensures something like that won't happen ;).
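
The "did depth or rotation change?" check described above can be illustrated from a shell. This sketch swaps in a different mechanism: instead of the ioctl that fbink_reinit issues, it reads the standard Linux fbdev sysfs attributes `bits_per_pixel` and `rotate`. The helper names and the polling idea are assumptions for illustration only, and whether those attributes are reliable on a given Kobo kernel is untested here.

```shell
# Sketch: compare two framebuffer snapshots to spot a depth/rotation change.
# fb_snapshot and fb_changed are hypothetical helpers, not FBInk functions.
fb_snapshot() { # usage: fb_snapshot [SYSFS_DIR]
    dir="${1:-/sys/class/graphics/fb0}"
    echo "bpp=$(cat "$dir/bits_per_pixel") rota=$(cat "$dir/rotate")"
}

fb_changed() { # usage: fb_changed OLD_SNAPSHOT NEW_SNAPSHOT
    [ "$1" != "$2" ]
}

# Typical use on a device:
#   before=$(fb_snapshot)
#   ... later ...
#   if fb_changed "$before" "$(fb_snapshot)"; then echo "fb state changed"; fi
```

As in the description above, the comparison itself is trivial; the cost is in fetching the current state each time.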

@rien333
Author

rien333 commented Apr 21, 2020

@rien333 How fast is fast? :)

Think in units of 10 seconds — no freezes are immediate, and I have used inkvt without problems for what felt like more than a minute, almost 2 minutes.

Could you still maybe send a dmesg | tail -n 20 output? A photo would be fine too. And attach your inkvt.armhf binary, so I can test it on my device. Maybe the build environment is doing something strange (?)

I was planning on reinstalling and recompiling everything in accordance with the recently updated instructions. Do you think you'll still find logs and binaries from my failed installation useful?

rien333: Oddly, if inkvt is in its frozen state, I can still send keypresses over http and ssh, but it takes a while for them to arrive.
llandsmeer: So the display still updates? Very strange..

No, sorry for the confusion. The Kobo's display itself remains frozen, but if I host a tmux session on my PC and then connect the Kobo to that tmux session through ssh, the ssh session will still receive keypresses sent through inkvt, albeit with a significant delay and some other glitchiness. (I found this out because the tmux session hosted on my PC suddenly started to move a bit after I tried to send keypresses during a freeze. It doesn't matter though, just a random observation.)

I'll post some logs/new results tomorrow!

@NiLuJe
Contributor

NiLuJe commented Apr 21, 2020

I was planning on reinstalling and recompiling everything in accordance with the recently updated instructions. Do you think you'll still find logs and binaries from my failed installation useful?

If it's no bother, it certainly can't hurt ;).

@rien333
Author

rien333 commented Apr 21, 2020

I ran htop and dmesg on my failed installation (basically, the one I described in my opening post).

This is what dmesg outputs after running inkvt.sh (while plato is running, and with messages from RTL871X stripped out):

PMU:STATUS= 6: IBAT= -2: VSYS= 4337500: VBAT= 4105250: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
tps6518x_get_temperature():temperature = 25
# this is probably where I started inkvt
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4331500: VBAT= 4101000: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
# the message above is repeated a ton of times
...
imx_epdc_v2_fb 20f4000.epdc: Ignoring collision withnewer update.
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D [∞x]
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4333250: VBAT= 4100700: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
...
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: TCE underrun! Will continue to update panel
imx_epdc_v2_fb 20f4000.epdc: TCE underrun! Will continue to update panel
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: Ignoring collision withnewer update.
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: Ignoring collision withnewer update.
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
...
# maybe this is where I rebooted?
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4335500: VBAT= 4100400: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4328500: VBAT= 4094600: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion

This seems to be largely what you guys predicted: inkvt has trouble drawing while Plato is also using the screen (btw: I used this hacky Kobo terminal before, and I was able to run it alongside Plato). Practically all of the output consisted of the "collision detected" message, though I also included the other messages I was able to find.
htop wasn't too interesting either. CPU and memory usage were low both before and after inkvt froze (~1% iirc).
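
To put numbers on "practically all", the EPDC messages in a dmesg capture can be tallied by kind. A minimal sketch, assuming awk is available on the device; `epdc_tally` is a hypothetical helper and the patterns are copied from the messages quoted above.

```shell
# Sketch: tally EPDC driver messages by kind from a dmesg capture on stdin.
# epdc_tally is a hypothetical helper name.
epdc_tally() {
    awk '
        /collision detected/                      { collisions++ }
        /Timed out waiting for update completion/ { timeouts++ }
        /TCE underrun/                            { underruns++ }
        END {
            printf "collisions=%d timeouts=%d underruns=%d\n",
                   collisions, timeouts, underruns
        }
    '
}

# On the device: dmesg | epdc_tally
```

The "Ignoring collision" lines are deliberately not counted as collisions, since they do not contain the phrase "collision detected".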

Gonna try the new and improved installation instructions now!

@NiLuJe
Contributor

NiLuJe commented Apr 21, 2020

The collisions are somewhat to be expected given the patterns of ioctl generated by InkVT, so those don't worry me too much.

The timeouts, on the other hand, are where things start to get interesting. What's interesting is that InkVT itself never waits on update completion, which tells me that those are actually Plato's refreshes going wonky.

I'm not quite sure how that could come to be, unless something really upset the EPDC, leading to one of the softlocks I mentioned earlier. (In which case, reboot to recover. Or an mxcfb uninit/init might work, but I've never tried).
In which case, I'd be very interested in more details, ideally with an exact trace of the offending ioctl (i.e., wrapping inkvt in a strace -fitv -e trace=ioctl).
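
One way the suggested trace could be captured, as a sketch: the strace flags are the ones given above, while the `trace_inkvt` wrapper, the `-o` log file and its default path are assumptions added for convenience. It also assumes strace is installed on the device and that the command is run from inkvt's directory.

```shell
# Sketch: run inkvt under strace, logging only ioctl calls to a file.
# trace_inkvt and the default log path are hypothetical.
trace_inkvt() { # usage: trace_inkvt [LOGFILE]
    log="${1:-/tmp/inkvt-ioctl.log}"
    strace -fitv -e trace=ioctl -o "$log" ./inkvt.armhf
}

# On the device, after cd'ing into inkvt's directory:
#   trace_inkvt
#   tail -n 20 /tmp/inkvt-ioctl.log
```

Writing to a file keeps the trace out of the terminal that inkvt itself is using.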

@llandsmeer
Owner

@llandsmeer: KOReader doesn't do hardware rotation, as such, it won't trash the state behind inkvt's back like Nickel or Plato can ;).

Yeah, I tried getting inkvt to launch with Nickel or Plato running yesterday, as I thought that would make the difference, but I couldn't even get SSH access working in a day. (I just always use KOReader's dropbear version.)

I alluded a bit to it in my original answer, but FBInk has a mechanism to deal with that:

fbink_reinit, which basically does an ioctl to see if the fb state changed. If it did in a significant way (depth/rotate), it updates its internal state, and (since fairly recently) returns a specific value to the caller to mention that fact. (Otherwise, it does nothing more than that ioctl & an early successful return).

In inkvt's case, in such instances, you'd also have to reissue an fbink_state_dump to update inkvt's own copy of that, and then make libvterm aware of the new layout.

That said, whether you want to deal with that or not to begin with is debatable: in "normal" conditions, killing Nickel ensures something like that won't happen ;).

Thank you very much (again, and also for the pull requests! 😄). A single extra ioctl() per draw request doesn't sound that bad. I think I'll hide it behind an env variable/argv for inkvt.sh (for people launching it the right™ way).
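
A launcher script could gate the extra check roughly like this. Both names here are made up for illustration: `INKVT_CHECK_ROTATION` and `--check-rotation` are hypothetical and are not known inkvt options.

```shell
# Sketch: turn an opt-in environment variable into a command-line argument.
# INKVT_CHECK_ROTATION and --check-rotation are hypothetical names.
inkvt_args() {
    if [ "${INKVT_CHECK_ROTATION:-0}" = "1" ]; then
        echo "--check-rotation"
    fi
}

# In a launcher script:
#   ./inkvt.armhf $(inkvt_args)
```

With the variable unset, the function prints nothing, so the default launch path stays unchanged.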

@llandsmeer
Owner

The timeouts, on the other hand, are where things start to get interesting. What's interesting is that InkVT itself never waits on update completion, which tells me that those are actually Plato's refreshes going wonky.

Maybe this is the problem for Plato (e.g. it doesn't expect to be running in 8-bit mode)?

echo "Restoring original fb bitdepth @ ${ORIG_FB_BPP}bpp & rotation @ ${ORIG_FB_ROTA}" >>crash.log 2>&1
./fbdepth -d "${ORIG_FB_BPP}" -r "${ORIG_FB_ROTA}" >>crash.log 2>&1

@NiLuJe
Contributor

NiLuJe commented Apr 21, 2020

Oh, if @rien333 launched inkvt via the script, definitely ^^.

(I'd naively assumed he was running the binary directly by hand... :D).

@llandsmeer
Owner

This is what dmesg outputs after running inkvt.sh (while plato is running, and with messages from RTL871X stripped out):

I guess so... maybe inkvt.sh should include an error message if it finds Plato running.
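
Such a guard might look like the following sketch. It assumes `pidof` is available on the device and that the process names are `plato` and `nickel`; `fb_owner_running` is a hypothetical helper, not something inkvt.sh is known to contain.

```shell
# Sketch: report the first named process that is still running, if any.
# fb_owner_running is a hypothetical helper; process names are assumptions.
fb_owner_running() { # usage: fb_owner_running NAME...
    for name in "$@"; do
        if pidof "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# In a launcher script:
#   if owner=$(fb_owner_running plato nickel); then
#       echo "refusing to start: $owner is still running" >&2
#       exit 1
#   fi
```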

@rien333
Author

rien333 commented Apr 21, 2020

Oh, if @rien333 launched inkvt via the script, definitely ^^.
(I'd naively assumed he was running the binary directly by hand... :D).

Basically, I did something like this: from ssh, I ran /mnt/onboard/.adds/inkvt/inkvt.sh with Plato active (not 100% sure about the path, but I think you get what I mean).

I guess so... maybe inkvt.sh should include an error message if it finds Plato running

Maybe. Though hopefully most users will just launch inkvt normally.

@NiLuJe
Contributor

NiLuJe commented Apr 21, 2020

Okay, then, yeah, that'll definitely break Plato ;).

llandsmeer added a commit that referenced this issue Apr 21, 2020
@llandsmeer
Owner

I could reproduce a freeze on my Kobo by issuing fbdepth -r 1 inside inkvt:

[...]
[FBInk] MXCFB_SEND_UPDATE_V2: Invalid argument!
[FBInk] update_region={top=8, left=0, width=1264, height=1664}!
[FBInk] Failed to refresh the screen!
[...]

But that's after the fbink_reinit() patch... still working on it :)

@llandsmeer
Owner

llandsmeer commented Apr 21, 2020

Ah, that was a deployment problem (the new inkvt.armhf binary wasn't being overwritten). With 1cd991c, inkvt handles rotations kind of OK (only inversions, which do not change the WxH screen size).

Now I'll have to figure out how to fix the deployment.

@rien333
Author

rien333 commented Apr 21, 2020

Okay, I got everything to work, including KFMon integration. Maybe my mistake last time was not doing the import from Nickel. CPU usage seems to be a lot higher compared to my last attempt (htop shows inkvt.armhf at 23%, as opposed to ~1%).

I've seen one freeze very similar to the one described in this thread, but now everything has been running smoothly for at least a few minutes.

Thanks again for this cool project. I hope it will see other interesting improvements!

I guess all my problems outlined in this thread have been resolved by the new and improved instructions (I like the auto-generated zip, very easy to install!). If I have any new, concrete problems I'll open a new issue.

@rien333 rien333 closed this as completed Apr 21, 2020
llandsmeer added a commit that referenced this issue Apr 22, 2020
One really nice benefit of this is that we get screen rotations
working for free (using fbdepth -r <n>)! (#5).
llandsmeer added a commit that referenced this issue Apr 22, 2020
Do not slow down inkvt when it's being started in a
sane environment. I'm not sure if it's actually slower
though, with all those fbink_reinit() calls.

We also lose the ability to handle manual screen rotations,
so maybe revert in the future?

(#5)
@llandsmeer
Owner

Those last two commits should make inkvt handle arbitrary screen rotations when started from SSH, but keep it functioning as before when launched from Nickel with KFMon (maybe that will fix the increased CPU usage, but I'm not sure).
I also think that a constantly refreshing htop-type payload will cause higher CPU usage regardless.

I guess all my problems outlined in this thread have been resolved by the new and improved instructions (I like the auto-generated zip, very easy to install!). If I have any new, concrete problems I'll open a new issue.

Yes, I'm very happy with that too (thanks @NiLuJe 😄). I think this thread brought some nice improvements to inkvt. Thanks for showing your interest in this project :)

@NiLuJe
Contributor

NiLuJe commented Apr 22, 2020

The actual userland logic in fbink_reinit is pretty minimal (basically two integer comparisons), so the main bottleneck will be the ioctl itself.

Not quite sure there's any better 'spot' to put it in inkvt, though ;).
