bmcweb service crashes when using KVM over GUI #80

Closed
gkeishin opened this issue Apr 26, 2019 · 9 comments
Comments

@gkeishin
Member

root@witherspoon:~# cat /etc/os-release 
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="2.7.0-dev"
VERSION_ID="2.7.0-dev-498-gec97c3bc7"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) 2.7.0-dev"
BUILD_ID="2.7.0-dev"
OPENBMC_TARGET_MACHINE="witherspoon"
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39 Statistics             events    Transmit/ RawEquiv ( saved)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  FramebufferUpdate   :      4 |         0/        0 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  NewFBSize           :      2 |        24/       24 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  RichCursor          :      1 |      1684/     1684 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  LastRect            :   6521 |     78252/    78252 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  tight               :   6530 | 168933027/-941362664 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  TOTALS              :  13058 | 169012987/-941282704 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39 Statistics             events    Received/ RawEquiv ( saved)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  KeyEvent            :     32 |       256/      256 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  PointerEvent        :    344 |      2064/     2064 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  FramebufferUpdate   :   1718 |     17180/    17180 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  SetEncodings        :      1 |        68/       68 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  SetPixelFormat      :      1 |        20/       20 (  0.0%)
Apr 26 14:11:39 witherspoon obmc-ikvm[1281]: 26/04/2019 14:11:39  TOTALS              :   2096 |     19588/    19588 (  0.0%)
Apr 26 14:11:39 witherspoon systemd[1]: bmcweb.service: Main process exited, code=killed, status=11/SEGV
Apr 26 14:11:39 witherspoon systemd[1]: bmcweb.service: Failed with result 'signal'.
Apr 26 14:11:47 witherspoon systemd[1]: Started Start bmcweb server.
Apr 26 14:12:04 witherspoon obmc-ikvm[1281]: 26/04/2019 14:12:04 Got connection from client 127.0.0.1
Apr 26 14:12:04 witherspoon obmc-ikvm[1281]: 26/04/2019 14:12:04   other clients:
Apr 26 14:12:04 witherspoon systemd-journald[819]: Forwarding to syslog missed 21 messages.
Apr 26 14:12:04 witherspoon obmc-ikvm[1281]: 26/04/2019 14:12:04 Normal socket connection
Apr 26 14:12:04 witherspoon obmc-ikvm[1281]: Failed to write pointer report

All I did was the following:

  • Log in to the KVM GUI console and log in to the guest host OS
  • Switch to a different GUI page and come back to the KVM console view

and I see the bmcweb service crash.

@gkeishin
Member Author

@eddiejames @gtmills ^^^

@gtmills
Member

gtmills commented May 7, 2019

@edtanous You seen this before? Any thoughts?

@edtanous
Contributor

edtanous commented May 7, 2019

Yep, we've seen it. It's a flow control issue with how KVM was architected. My understanding of the problem here is that libvncserver has no flow control, and pushes frames as fast as the hardware can produce them. This is fine if bmcweb and the TLS connection can keep up with the rate at which frames arrive. The problem shows up if the TLS and network connections are slower than the frame rate (think cellular connection speeds). In that case, bmcweb has no way to adjust the flow control or block frames, as they would otherwise just stack up in the unix socket, which is just as fatal.
Eventually, the buffer exceeds the BMC memory limits, and bmcweb gets killed to reclaim the memory.
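
To make the failure mode above concrete, here is a minimal sketch of a producer that outruns a slow link. It is not bmcweb or libvncserver code: the `simulate` function, the `QueueStats` struct, and the tick-based model are all hypothetical, and dropping frames at a cap is just one possible flow-control policy (pausing capture would be another).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model: every tick the hardware produces `produced_per_tick`
// frames and the slow link sends `sent_per_tick` of them.
// `max_queued == 0` means "no limit", i.e. no flow control at all.
struct QueueStats {
    std::size_t peak_queued = 0;  // largest backlog seen (proxy for memory use)
    std::size_t dropped = 0;      // frames refused because the cap was reached
};

QueueStats simulate(std::size_t ticks, std::size_t produced_per_tick,
                    std::size_t sent_per_tick, std::size_t max_queued) {
    // Both rates must be positive for the comparison to mean anything.
    assert(produced_per_tick > 0 && sent_per_tick > 0);

    std::deque<int> queue;  // stand-in for buffered frames
    QueueStats stats;
    for (std::size_t t = 0; t < ticks; ++t) {
        // Producer side: frames arrive regardless of how the link is doing.
        for (std::size_t i = 0; i < produced_per_tick; ++i) {
            if (max_queued != 0 && queue.size() >= max_queued) {
                ++stats.dropped;  // flow control: refuse the frame
                continue;
            }
            queue.push_back(0);
        }
        if (queue.size() > stats.peak_queued) {
            stats.peak_queued = queue.size();
        }
        // Consumer side: the link drains only what it has bandwidth for.
        for (std::size_t i = 0; i < sent_per_tick && !queue.empty(); ++i) {
            queue.pop_front();
        }
    }
    return stats;
}
```

With three frames produced and one sent per tick, `simulate(100, 3, 1, 0)` ends with a backlog that keeps growing with the number of ticks, while `simulate(100, 3, 1, 4)` never holds more than four frames and sheds the rest.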

In my original design, I had put the RFB server inside bmcweb, so it would have information about the frame rate and the buffer sizes, and would only pull frames out of the hardware as fast as the link had bandwidth for. This idea was dropped in favor of the libvncserver -> unix socket -> bmcweb -> websocket approach. Jae is working on some fixes in this area, using the existing design, and I'm really interested to see what he comes up with. If he's not successful, we might have to resurrect some of the code below to implement flow control.

https://gerrit.openbmc-project.xyz/c/openbmc/bmcweb/+/16976/8/include/web_kvm.hpp#b16
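
For readers who do not want to dig through the Gerrit change, the pull-based idea can be sketched roughly as follows. This is an illustration only, not the code from that change: `PullPump`, `capture`, and `startSend` are hypothetical names, and the sketch assumes the send completes asynchronously and reports back through `onSendComplete()`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Frame = std::vector<unsigned char>;

// Hypothetical pull-based pump: a new frame is captured only after the
// previous one has finished going out, so at most one frame is buffered.
class PullPump {
  public:
    PullPump(std::function<Frame()> capture,
             std::function<void(const Frame&)> startSend)
        : capture_(std::move(capture)), startSend_(std::move(startSend)) {}

    // Kick off the first transfer. A no-op while a frame is in flight.
    void start() { sendNext(); }

    // Called by the transport when the previous send has completed.
    // Assumed to be invoked asynchronously, never from inside startSend.
    void onSendComplete() {
        assert(inFlight_);
        inFlight_ = false;
        sendNext();
    }

    bool inFlight() const { return inFlight_; }

  private:
    void sendNext() {
        if (inFlight_) {
            return;  // link is still busy: do not pull another frame
        }
        inFlight_ = true;
        startSend_(capture_());  // pull exactly one frame and hand it off
    }

    std::function<Frame()> capture_;
    std::function<void(const Frame&)> startSend_;
    bool inFlight_ = false;
};
```

The point of the design is that the capture rate follows the link: a slow connection simply sees fewer frames instead of a growing backlog.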

@geissonator
Contributor

Please keep us in the loop on any updates @yoojae

@jaehyoo
Contributor

jaehyoo commented May 13, 2019

@gkeishin
Member Author

@yoojae the changes look good when I tested them out. Thanks

@gkeishin
Member Author

We did another downstream build with the latest master, and it seems bmcweb still crashes, but only when we refresh the BMC web GUI a few times and switch between GUI options back and forth a bit.

Jun 13 05:33:21 openbmc obmc-ikvm[1264]: 13/06/2019 05:33:21  SetPixelFormat      :      1 |        20/       20 (  0.0%)
Jun 13 05:33:21 openbmc obmc-ikvm[1264]: 13/06/2019 05:33:21  TOTALS              :     42 |       488/      488 (  0.0%)
Jun 13 05:33:22 openbmc systemd[1]: bmcweb.service: Main process exited, code=killed, status=11/SEGV
Jun 13 05:33:22 openbmc systemd[1]: bmcweb.service: Failed with result 'signal'.
Jun 13 05:33:30 openbmc systemd[1]: Started Start bmcweb server.

Any recent changes we should be looking at? @yoojae

@jaehyoo
Contributor

jaehyoo commented Jun 13, 2019 via email

@gkeishin
Member Author

Yup, we picked up those commits when I re-ran the test.
