Complete periodic freezes with intel graphics #4641

Open
rcorre opened this issue Mar 12, 2019 · 32 comments

@rcorre
Collaborator

commented Mar 12, 2019

Version info (see :version):

qutebrowser v1.6.0
Git commit: 
Backend: QtWebEngine (Chromium 69.0.3497.128)

CPython: 3.7.2
Qt: 5.12.1
PyQt: 5.12

sip: 4.19.14
colorama: 0.4.1
pypeg2: 2.15
jinja2: 2.10
pygments: 2.3.1
yaml: 3.13
cssutils: no
attr: 18.2.0
PyQt5.QtWebEngineWidgets: yes
PyQt5.QtWebKitWidgets: no
pdf.js: no
sqlite: 3.27.2
QtNetwork SSL: OpenSSL 1.1.1b  26 Feb 2019

Style: QFusionStyle
Platform: Linux-5.0.0-arch1-1-ARCH-x86_64-with-arch, 64bit
Linux distribution: Arch Linux (arch)
Frozen: False
Imported from /usr/lib/python3.7/site-packages/qutebrowser
Using Python from /usr/bin/python3
Qt library executable path: /usr/lib/qt/libexec, data path: /usr/share/qt

Paths:
cache: /home/rodencor/.cache/qutebrowser
config: /home/rodencor/.config/qutebrowser
data: /home/rodencor/.local/share/qutebrowser
runtime: /run/user/1000/qutebrowser
system data: /usr/share/qutebrowser

Other info:

[rodencor@karminac ~]$ pacman -Qs xf86-video-intel
local/xf86-video-intel 1:2.99.917+859+g33ee0c3b-1 (xorg-drivers)
    X.org Intel i810/i830/i915/945G/G965+ video drivers

[rodencor@karminac ~]$ xrandr | head -3
Screen 0: minimum 8 x 8, current 3840 x 2160, maximum 32767 x 32767
eDP1 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 350mm x 190mm
   3840x2160     60.00*+  59.97

[rodencor@karminac ~]$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)

[rodencor@karminac ~]$ echo $QT_SCALE_FACTOR
2

Window manager: DWM

Does the bug happen if you start with --temp-basedir?:

Yes. Probably. When there is one regular instance and one temp-basedir instance, I've seen both freeze.
I've yet to repro with nothing but a temp-basedir open.

Description

Within a few minutes of starting, qutebrowser will completely hang.

  • It will not draw to the display at all
  • It will not respond to any inputs
  • It cannot be closed via the WM
  • It can be moved between workspaces, but will not continue drawing
  • It will not respond to SIGINT
  • It can be stopped with kill -9

Nothing in particular seems to trigger this. I can leave it sitting in another workspace for a while, and it will be frozen when I come back to it.
The logs show nothing suspicious (see attached). I tried rolling back to the version of xf86-video-intel I had before I noticed this, but it still reproduces.

It happens pretty frequently on my intel/4k laptop, but I have yet to notice this on my NVIDIA/1920x1080 desktop (which is an otherwise similarly configured and updated Arch machine).

I haven't tried to roll back Qt yet. The last Arch update seemed to change how some of the qt/pyqt/webengine packages are organized, so it may be tricky.

In the attached screenshot, the small thing that looks like a terminal is actually qutebrowser, but the terminal underneath is drawing into it (or something?).

How to reproduce

1552398468

log.txt

@The-Compiler

Collaborator

commented Mar 12, 2019

Oh, so it's not just me... I've been seeing the same (Intel graphics too), but haven't been able to find out more yet...

@The-Compiler

Collaborator

commented Mar 12, 2019

Do you have any idea around when this started? I've been seeing it for a week or two, and I've been using Qt 5.12 since the Beta in November (and 5.12.1 since mid-February), so I don't think it's that. My last xf86-video-intel upgrade is also further back (February 25th).

Maybe it's the linux kernel, either 4.20.12 -> 4.20.13, or 4.20.13 -> 5.0?

@rcorre

Collaborator Author

commented Mar 12, 2019

I think it started after my last update. Here are some packages that might be related:

[2019-03-11 10:28] [ALPM] upgraded pyqt5-common (5.11.3-4 -> 5.12-2)
[2019-03-11 10:28] [ALPM] upgraded python-sip-pyqt5 (4.19.13-1 -> 4.19.14-1)
[2019-03-11 10:28] [ALPM] upgraded python-pyqt5 (5.11.3-4 -> 5.12-2)
[2019-03-11 10:28] [ALPM] upgraded python-sip (4.19.13-1 -> 4.19.14-1)
[2019-03-11 10:28] [ALPM] upgraded qt5-webengine (5.12.1-1 -> 5.12.1-3)
[2019-03-11 10:28] [ALPM] installed pyqtwebengine-common (5.12-2)
[2019-03-11 10:28] [ALPM] installed python-pyqtwebengine (5.12-2)
[2019-03-11 10:28] [ALPM] upgraded qutebrowser (1.5.2-3 -> 1.6.0-1)
[2019-03-11 10:28] [ALPM] upgraded linux (4.20.7.arch1-1 -> 5.0.arch1-1)
[2019-03-11 10:28] [ALPM] upgraded linux-firmware (20190118.a8b75ca-1 -> 20190212.28f5f7d-1)

[2019-03-11 10:28] [ALPM] upgraded xf86-video-intel (1:2.99.917+859+g33ee0c3b-1 -> 1:2.99.917+860+g3a2dec17-1)
[2019-03-11 10:28] [ALPM] upgraded xorg-mkfontscale (1.1.3-1 -> 1.2.0-2)
[2019-03-11 10:28] [ALPM] upgraded xorg-server-common (1.20.3-1 -> 1.20.4-1)
[2019-03-11 10:28] [ALPM] upgraded xorg-server (1.20.3-1 -> 1.20.4-1)
[2019-03-11 10:28] [ALPM] upgraded xorg-xev (1.2.2-2 -> 1.2.3-1)
[2019-03-11 10:28] [ALPM] upgraded xorg-xrdb (1.1.1-1 -> 1.2.0-1)
[2019-03-11 10:28] [ALPM] upgraded xorg-xinit (1.4.0-3 -> 1.4.1-1)
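
A long upgrade list like this can be narrowed to the suspect packages with a small filter. The following is only a sketch: it assumes the log line format shown above (on Arch these lines normally live in /var/log/pacman.log), and the names `ALPM_CHANGE` and `version_changes` are made up for illustration.

```python
import re

# Matches pacman log lines of the form shown above, e.g.
# [2019-03-11 10:28] [ALPM] upgraded linux (4.20.7.arch1-1 -> 5.0.arch1-1)
ALPM_CHANGE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] (?P<action>upgraded|downgraded) "
    r"(?P<pkg>\S+) \((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def version_changes(lines, packages):
    """Return (when, action, pkg, old, new) tuples for the given packages."""
    changes = []
    for line in lines:
        match = ALPM_CHANGE.match(line.strip())
        if match and match["pkg"] in packages:
            changes.append(
                (match["when"], match["action"], match["pkg"],
                 match["old"], match["new"])
            )
    return changes


# A few lines copied from the list above; "installed" lines are ignored
# because they carry no old version.
sample = [
    "[2019-03-11 10:28] [ALPM] upgraded qt5-webengine (5.12.1-1 -> 5.12.1-3)",
    "[2019-03-11 10:28] [ALPM] installed python-pyqtwebengine (5.12-2)",
    "[2019-03-11 10:28] [ALPM] upgraded linux (4.20.7.arch1-1 -> 5.0.arch1-1)",
    "[2019-03-11 10:28] [ALPM] upgraded xf86-video-intel "
    "(1:2.99.917+859+g33ee0c3b-1 -> 1:2.99.917+860+g3a2dec17-1)",
]

for change in version_changes(sample, {"linux", "xf86-video-intel"}):
    print(*change)
```

Passing the lines of the real log file instead of `sample` would give the old and new versions needed for a targeted downgrade.
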
@The-Compiler

Collaborator

commented Mar 13, 2019

Just had it happen again while viewing a Travis CI log. Sending SIGUSR1 to the main process shows a Python stack trace which just points to the Qt mainloop; attaching gdb shows an unusable stack trace. Attaching strace also shows that the main process still seems to be running; at least it reacts to keyboard and mouse inputs (but I can't get qutebrowser to quit via :wq, FWIW).

No idea how to debug this...
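
For readers unfamiliar with the SIGUSR1 stack dump mentioned above: Python's standard `faulthandler` module can provide this kind of dump. The sketch below shows the general mechanism only. Whether qutebrowser registers its handler in exactly this way is an assumption, and `busy_in_mainloop` is a hypothetical stand-in for the Qt mainloop.

```python
import faulthandler
import os
import signal
import tempfile

# Write the dump to a temporary file so it can be inspected afterwards;
# a long-running application would more likely use sys.stderr.
dump_file = tempfile.TemporaryFile(mode="w+")

# After this call, SIGUSR1 makes Python write the stack of every thread.
faulthandler.register(signal.SIGUSR1, file=dump_file, all_threads=True)


def busy_in_mainloop():
    """Stand-in for an event loop; signals itself to trigger the dump."""
    os.kill(os.getpid(), signal.SIGUSR1)


busy_in_mainloop()
dump_file.seek(0)
trace = dump_file.read()
print(trace)
```

Because the dump only covers Python frames, a process blocked inside Qt's C++ event loop shows nothing deeper than the Python call that entered the loop, which matches the observation above.
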

@rcorre

Collaborator Author

commented Mar 13, 2019

Downgraded xf86-video-intel and linux, and so far haven't had any hangs. If it keeps running smoothly, I'll try to isolate between those two.

[2019-03-12 09:12] [ALPM] downgraded xf86-video-intel (1:2.99.917+860+g3a2dec17-1 -> 1:2.99.917+859+g33ee0c3b-1)
[2019-03-13 09:22] [ALPM] downgraded linux (5.0.arch1-1 -> 4.20.7.arch1-1)
@rcorre

Collaborator Author

commented Mar 14, 2019

Ran all day yesterday with downgraded linux and xf86-video-intel without a single hang. I'm going to upgrade just linux today and see what happens.

@rcorre

Collaborator Author

commented Mar 14, 2019

Ran a few minutes with downgraded xf86 and updated linux and almost immediately hit a hang. Now I've updated xf86 and downgraded linux and am running smoothly again. I think I've isolated the issue to the 4.20.7.arch1-1 -> 5.0.arch1-1 kernel update.

@The-Compiler

Collaborator

commented Mar 19, 2019

Still seems to be an issue with Linux 5.0.2.

Interestingly, Chromium seemed to crash here as well (not sure if related or not), and when trying to start it while qutebrowser was hung, I got [27671:27671:0319/102748.267886:ERROR:sandbox_linux.cc(364)] InitializeSandbox() called with multiple threads in process gpu-process.

Looking at the qutebrowser kernel stack:

[root@hooch florian]# cat /proc/25249/stack
[<0>] poll_schedule_timeout.constprop.6+0x42/0x70
[<0>] do_sys_poll+0x498/0x530
[<0>] __se_sys_poll+0x2c/0x130
[<0>] do_syscall_64+0x5b/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

strace usually looks like this:

futex(0x5610d6561264, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x5610d0ec45b8, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x5610d6561260, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x5610d0ec45b8, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x5610d0ec45b8, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x5610d6561260, FUTEX_WAIT_PRIVATE, 0, NULL) = 0

or this:

futex(0x5610d0ec4608, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5610d0ec45b8, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="U\2#\31\2163\213\4\3\4\4\0\0\0\0\0\0\0\0\4\4\4\4\4\0\0\3\0372\3\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
futex(0x5610d0ec460c, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5610d0ec45b8, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="U\2#\31\2343\213\4\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\37%\3\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
futex(0x5610d0ec4608, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="U\2#\31P8\213\4\3\4\4\0\0\0\0\0\0\0\0\4\4\4\4\4\0\0\3\37%\2\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
futex(0x5610d0ec460c, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5610d0ec45b8, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=3, events=POLLIN}], 1, -1

fd 3, which it seems to be polling, is a Unix stream socket:

lrwx------ 1 florian florian 64 Mar 19 10:40 /proc/25249/fd/3 -> 'socket:[1847936]'
$ lsof -p 25249 | grep 1847936
python3 25249 florian    3u     unix 0x00000000fa673305       0t0  1847936 type=STREAM
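
The same lookup can be scripted when several PIDs or descriptors need checking. This is a rough sketch for Linux only; `fd_target` and `socket_inode` are illustrative names, not part of any existing tool.

```python
import os
import re


def fd_target(pid, fd):
    """Return what /proc/<pid>/fd/<fd> points to (Linux only)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def socket_inode(target):
    """Extract the inode from a 'socket:[N]' link target, else None."""
    match = re.fullmatch(r"socket:\[(\d+)\]", target)
    return int(match.group(1)) if match else None


# The link target shown above for fd 3 of PID 25249:
print(socket_inode("socket:[1847936]"))  # 1847936
```

On a live system, `socket_inode(fd_target(25249, 3))` would yield the inode to search for in the lsof output, assuming the process is still running and readable by the current user.
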

gdb stacktrace (not the same hang):

#0  0x00007f8b6ca42c21 in ?? ()
#1  0x00000000ffffffff in ?? ()
#2  0x00007fff6932eed8 in ?? ()
#3  0x000055d5b27077a0 in ?? ()
#4  0x00000000ffffffff in ?? ()
#5  0x00007fff6932eed8 in ?? ()
#6  0x00007f8b6b4f5630 in ?? ()
#7  0x00007fff6932ef68 in ?? ()
#8  0x0000000100000028 in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x0000000100000003 in ?? ()
#11 0x00007fff6932ef68 in ?? ()
#12 0x00007f8b322bb3e8 in ?? ()
#13 0x00007f8b6c94fe80 in ?? ()
#14 0x000055d5b2002c48 in ?? ()
#15 0x0000000000000001 in ?? ()
#16 0x00007f8b6c7548da in ?? ()
#17 0x0000000000000000 in ?? ()

No idea which of this is relevant, and which is just a red herring.

@The-Compiler

Collaborator

commented Mar 19, 2019

Most of those things (/proc/PID/stack, weird gdb output, strace output) look the same with a normally running process as well - so I think Qt's mainloop is running normally. It's weird that there's no response to SIGINT and inputs (I tried :wq) though...

@The-Compiler

Collaborator

commented Mar 26, 2019

Hmm, I also see some hangs and other weirdness (like not resizing window contents) when maximizing windows lately.

I can reproduce a hang by doing the following:

@edio


commented Apr 4, 2019

FWIW, I see freezes too and I don't have 4k display. The rest matches: 5.0.5.arch1-1, xf86-video-intel

@The-Compiler changed the title from "Complete periodic freezes with 4k and intel graphics" to "Complete periodic freezes with intel graphics" on Apr 5, 2019

@The-Compiler

Collaborator

commented Apr 5, 2019

Yeah, I agree the 4K display probably isn't relevant, I also see it with my internal screen only (2560x1440). Though it seems it happens more often when I also have two external (1920x1080) screens connected.

Last time this happened, mpv also froze for me when fullscreened (but not completely: it just didn't render until I un-fullscreened the window). Pretty sure this is some issue in the Intel drivers, but I haven't been able to find out more yet.

@jgkamat

Collaborator

commented Apr 10, 2019

@The-Compiler

Collaborator

commented Apr 11, 2019

Just saw the exact same kind of freeze happen with Chromium. Definitely some issue with Chromium and the Intel Graphics driver from what it looks like...

@jgkamat Can you show glxinfo -B please?

@jgkamat

Collaborator

commented Apr 11, 2019

@rcorre

Collaborator Author

commented Apr 17, 2019

Updated to 5.0.7.arch1-1 and started encountering hangs again; it doesn't seem fixed for me.

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2)  (0x591b)
    Version: 19.0.2
    Accelerated: yes
    Video memory: 3072MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.2
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.0 Mesa 19.0.2
OpenGL shading language version string: 1.30
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.0.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
@karasuma-chitose


commented Apr 22, 2019

I'm on Linux 5.0.8, using qtwebengine 5.12.2, intel graphics and no issues.

Maybe try uninstalling xf86-video-intel.
DDX drivers are a mess and are pretty much deprecated in favor of KMS.
Most distros have moved away from them.

https://packages.debian.org/sid/x11/xserver-xorg-video-intel

The use of this driver is discouraged if your hw is new enough (ca. 2007 and newer). You can try uninstalling this driver and let the server use it's builtin modesetting driver instead.

@rcorre

Collaborator Author

commented Apr 23, 2019

Interesting, there's a similar note in the ArchWiki, though it suggests that the modesetting driver can also cause issues, such as artifacts from the previous desktop remaining in Chromium after switching virtual desktops. I'll give it a try though.

@The-Compiler

Collaborator

commented Apr 23, 2019

In IRC, @craftyguy also said he didn't notice any problems on Arch with the modesetting driver. I'm trying it out as well now. Had some little issues (monitor names changing, crazy slowdown when displaying a volume meter while watching a video), but nothing that'd be a show-stopper so far.

@rcorre

Collaborator Author

commented Apr 23, 2019

No freezes for me either (removed xf86-video-intel, updated to 5.0.9.arch1-1), but there is a significant (1-2s) delay switching between desktops sometimes. Switching between two desktops with fullscreen st always seems fine, but switching to a desktop with qutebrowser will sometimes lag.

@craftyguy

Contributor

commented Apr 23, 2019

@rcorre What DE? I see no lag at all switching between workspaces (with fullscreen qutebrowser, fullscreen Firefox, etc) with i3wm using modesetting. Perhaps your DE is trying to do fancy transitions or other things that depend heavily on 2D acceleration?

@karasuma-chitose


commented Apr 23, 2019

Check your Xorg.0.log if it's using glamor and DRI3.
It should have something like

[    26.149] (II) Initializing extension DRI3

and

[    25.377] (**) modeset(0): Option "AccelMethod" "glamor"
[    25.377] (==) modeset(0): RGB weight 888
[    25.378] (==) modeset(0): Default visual is TrueColor
[    25.378] (II) Loading sub module "glamoregl"
[    25.378] (II) LoadModule: "glamoregl"
[    25.378] (II) Loading /usr/lib64/xorg/modules/libglamoregl.so
[    25.462] (II) Module glamoregl: vendor="X.Org Foundation"
[    25.462]    compiled for 1.20.3, module version = 1.0.1
[    25.462]    ABI class: X.Org ANSI C Emulation, version 0.4
[    25.781] (II) modeset(0): glamor X acceleration enabled on Mesa DRI Intel(R) Sandybridge Mobile 
[    25.781] (II) modeset(0): glamor initialized

If not, create a new file in /etc/X11/xorg.conf.d/ called 20-modesetting.conf or whatever and add this:

Section "Device"
	Identifier 	"Intel Graphics"
	Driver		"modesetting"
	Option		"AccelMethod" "glamor"
	Option		"DRI"	"3"
EndSection
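
The log check described above can also be done with a few lines of Python rather than by eye. This is a minimal sketch that only looks for the marker strings quoted in this comment; the function name `acceleration_summary` is made up, and the log location (commonly /var/log/Xorg.0.log, but it varies between setups) is an assumption.

```python
def acceleration_summary(log_text):
    """Report which of the markers quoted above appear in an Xorg log."""
    return {
        "dri3": "Initializing extension DRI3" in log_text,
        "glamor_initialized": "glamor initialized" in log_text,
        "accel_option_set": 'Option "AccelMethod" "glamor"' in log_text,
    }


# A few lines copied from the example log output above.
sample = """\
[    26.149] (II) Initializing extension DRI3
[    25.377] (**) modeset(0): Option "AccelMethod" "glamor"
[    25.781] (II) modeset(0): glamor initialized
"""

print(acceleration_summary(sample))
```

To check a real system, read the log file into a string and pass it in place of `sample`. A result with `glamor_initialized` true but `accel_option_set` false would mean glamor was picked by default rather than through the config file.
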
@rcorre

Collaborator Author

commented Apr 23, 2019

dwm. It is about as un-fancy as it gets 😁.
The lag seems to be isolated to specific sites. Switching back and forth to most sites is fine, but the Webex Teams webapp (something I have to use for work) takes a while. Other apps, like the Spotify desktop app, have a similar slowness, so this isn't a specific qutebrowser problem.

@rcorre

Collaborator Author

commented Apr 23, 2019

[   141.038] (==) modeset(0): Depth 24, (==) framebuffer bpp 32
[   141.038] (==) modeset(0): RGB weight 888
[   141.038] (==) modeset(0): Default visual is TrueColor
[   141.038] (II) Loading sub module "glamoregl"
[   141.038] (II) LoadModule: "glamoregl"
[   141.038] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[   141.045] (II) Module glamoregl: vendor="X.Org Foundation"
[   141.045]    compiled for 1.20.4, module version = 1.0.1
[   141.045]    ABI class: X.Org ANSI C Emulation, version 0.4
[   141.110] (II) modeset(0): glamor X acceleration enabled on Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2)
[   141.110] (II) modeset(0): glamor initialized

but no modeset(0): Option "AccelMethod" "glamor".

I can try the xorg conf later.

@The-Compiler

Collaborator

commented Apr 23, 2019

With glamor and DRI3 here, using herbstluftwm. Switching to a workspace with qutebrowser fullscreen on my 4K monitor takes around 1s here...

I can see qutebrowser in the resolution of my laptop screen first, then some weird tearing with the console window behind it, then it resizes, then the webpage adjusts to the new size.

Definitely not ideal, but preferable to the freezes I guess.

@craftyguy

Contributor

commented Apr 23, 2019

Using a compositor (e.g. compton) should help with the tearing issues.

@rcorre

Collaborator Author

commented Apr 24, 2019

Using @karasuma-chitose's conf suggestion:

[rodencor@karminac ~]$ grep AccelMethod /var/log/Xorg.0.log
[    11.833] (**) modeset(0): Option "AccelMethod" "glamor"

But the same lag when switching desktops.

@rcorre

Collaborator Author

commented Apr 26, 2019

I've also noticed that qutebrowser becomes extremely laggy when I have more than one qutebrowser window open. The CPU usage is low, but the UI becomes sluggish as soon as I open a second window and returns to normal after closing the second window.

@bitraid

Contributor

commented Apr 26, 2019

qutebrowser becomes extremely laggy when I have more than one qutebrowser window open

I wonder if it has something to do with #4692 on Wayland (which also uses KMS). It should be fixed in Qt 5.12.4 though.

FWIW, I haven't experienced any freezes on (more than one PC with) Intel graphics under Wayland, for all this time.

@The-Compiler

Collaborator

commented May 9, 2019

Indeed there are no hangs with the modesetting driver (i.e. when removing xf86-video-intel). However, due to various problems (lag, external monitor not working properly, etc.) I'm back on xf86-video-intel and trying to figure out what caused it exactly.

Like @rcorre said, it seems to be connected to some linux upgrade:

  • 4.20.7.arch1-1: no hangs
  • 4.20.13.arch1-1: no hangs
  • 5.0.10.arch1-1: hangs
  • linux-git (v5.1-10240-g63863ee8e2f6): currently testing
@The-Compiler

Collaborator

commented May 15, 2019

Hmm, running linux-git (5.1.r10240.g63863ee8e2f6-1) for two days now, and no freeze yet. Maybe this got fixed, or maybe some config change doesn't trigger it?

@The-Compiler

Collaborator

commented May 16, 2019

Argh, I did get a freeze today. I also got a filesystem corruption in ext4 (at least fsck seemed to fix it)...

Going back to the linux package, this is no fun. I planned to bisect it between 4.20.3 and 5.0, but seeing that it only happened once in three days this time, I doubt I can bisect it correctly...
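
To put a rough number on why bisecting is impractical here: the sketch below estimates how long each bisect step would need to run hang-free before a kernel could be called good. It assumes hangs arrive independently at a constant rate (a Poisson process), which is a modelling assumption and not something measured in this thread. The rate of one hang in three days comes from the linux-git observation above, and `days_without_hang` is a made-up helper name.

```python
import math


def days_without_hang(hangs_per_day, confidence):
    """Days a kernel must run hang-free to be called good at this confidence.

    Assumes hangs arrive as a Poisson process, so a bad kernel survives
    t days without a hang with probability exp(-hangs_per_day * t).
    """
    return -math.log(1.0 - confidence) / hangs_per_day


# One hang in three days, as observed on linux-git above.
print(round(days_without_hang(1 / 3, 0.95), 1))  # 9.0
```

Under these assumptions every "good" verdict would take about nine days of uptime, and that cost is paid at each bisect step, so even a short bisect would stretch over months.
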
