kitty freezes under wayland #1722

Closed
joshuarubin opened this issue Jun 17, 2019 · 32 comments

Comments

@joshuarubin

I started mentioning this issue in #1681 but it seems to be a separate issue. kitty's been freezing up, requiring a kill -9, very frequently since version v0.14.0. I've tested using 721beb9 and it is still present.

I've had to pin my package manager to v0.13.3 as this is a very significant issue that greatly impedes my work.

I'm using sway and running kitty using wayland. It seems to happen when kitty loses focus (or immediately when it regains it, not sure which).

Happy to help troubleshoot. Thanks for such a great tool!

@kovidgoyal
Owner

I'm going to need some way to reproduce it, since I cannot make it happen in sway regardless how much switching I do between kitty and other applications.

@joshuarubin
Author

I understand. Is there any debug log or trace I can get you that would help start narrowing it down?

@joshuarubin
Author

I have KITTY_ENABLE_WAYLAND=1 and here is my config (it's templated, but only for one font_size)

@kovidgoyal
Owner

KITTY_ENABLE_WAYLAND is not used anymore. There is an option in
kitty.conf to control it and it will default to wayland under sway.
Set sync_to_monitor no in kitty.conf.

Post the output of --debug-config

and build kitty with

make debug-event-loop

and run it. It will print out the event loop stages, so when it freezes we
will know what stage it froze at.

@kovidgoyal
Owner

Also try running with XWayland (set linux_display_server to X11 in kitty.conf). Does the freeze happen then as well?

@joshuarubin
Author

$ python3 . --debug-config
kitty 0.14.2 (603533e632) created by Kovid Goyal
Linux balerion 5.1.9-arch1-1-ARCH #1 SMP PREEMPT Tue Jun 11 16:18:09 UTC 2019 x86_64
Arch Linux \r (\l)
Loaded config files: /home/jrubin/.config/kitty/kitty.conf
Running under: Wayland

Config options different from defaults:
active_border_color                Color(red=108, green=122, blue=128)
active_tab_background              Color(red=197, green=200, blue=198)
active_tab_foreground              Color(red=45, green=60, blue=70)
allow_remote_control               True
background                         Color(red=35, green=44, blue=49)
bell_border_color                  Color(red=204, green=102, blue=102)
color0                             Color(red=45, green=60, blue=70)
color1                             Color(red=165, green=66, blue=66)
color10                            Color(red=181, green=189, blue=103)
color11                            Color(red=240, green=198, blue=116)
color12                            Color(red=129, green=162, blue=190)
color13                            Color(red=178, green=148, blue=186)
color14                            Color(red=138, green=190, blue=183)
color15                            Color(red=197, green=200, blue=198)
color2                             Color(red=140, green=148, blue=64)
color3                             Color(red=222, green=147, blue=95)
color4                             Color(red=95, green=129, blue=157)
color5                             Color(red=133, green=103, blue=143)
color6                             Color(red=94, green=141, blue=135)
color7                             Color(red=108, green=122, blue=128)
color8                             Color(red=66, green=80, blue=89)
color9                             Color(red=204, green=102, blue=102)
copy_on_select                     clipboard
cursor                             Color(red=2, green=253, blue=255)
cursor_blink_interval              0.5
enabled_layouts                    ['horizontal', 'tall', 'fat', 'grid', 'vertical', 'stack']
font_family                        SFMono
font_size                          10.0
foreground                         Color(red=197, green=200, blue=198)
inactive_border_color              Color(red=66, green=80, blue=89)
inactive_tab_background            Color(red=108, green=122, blue=128)
inactive_tab_foreground            Color(red=66, green=80, blue=89)
kitty_mod                          8
macos_option_as_alt                3
macos_quit_when_last_window_closed True
scrollback_pager_history_size      16777216
selection_background               Color(red=197, green=200, blue=198)
selection_foreground               Color(red=35, green=44, blue=49)
symbol_map                         {(57344, 65535): 'SFMono Nerd Font'}
sync_to_monitor                    False
tab_bar_edge                       1
tab_separator                       │
url_color                          Color(red=2, green=253, blue=255)
Added shortcuts:
	 shift+control+h KeyAction(func='send_text', args=['all', b'\x01H'])
	 shift+control+j KeyAction(func='send_text', args=['all', b'\x01J'])
	 shift+control+k KeyAction(func='send_text', args=['all', b'\x01K'])
	 shift+control+l KeyAction(func='send_text', args=['all', b'\x01L'])
	 super+d KeyAction(func='scroll_page_down', args=())
	 super+i KeyAction(func='kitten', args=['unicode_input'])
	 super+z KeyAction(func='show_scrollback', args=())
	 super+tab KeyAction(func='next_tab', args=())
	 shift+super+comma KeyAction(func='move_tab_backward', args=())
	 shift+super+period KeyAction(func='move_tab_forward', args=())
	 shift+super+w KeyAction(func='close_window', args=())
	 shift+super+tab KeyAction(func='previous_tab', args=())
	 control+super+f KeyAction(func='toggle_fullscreen', args=())
Changed shortcuts:
	 super+minus KeyAction(func='change_font_size', args=[True, '-', 1.0])
	 super+0 KeyAction(func='change_font_size', args=[True, None, 0.0])
	 super+1 KeyAction(func='goto_tab', args=(1,))
	 super+2 KeyAction(func='goto_tab', args=(2,))
	 super+3 KeyAction(func='goto_tab', args=(3,))
	 super+4 KeyAction(func='goto_tab', args=(4,))
	 super+5 KeyAction(func='goto_tab', args=(5,))
	 super+6 KeyAction(func='goto_tab', args=(6,))
	 super+7 KeyAction(func='goto_tab', args=(7,))
	 super+8 KeyAction(func='goto_tab', args=(8,))
	 super+9 KeyAction(func='goto_tab', args=(9,))
	 super+equal KeyAction(func='change_font_size', args=[True, '+', 1.0])
	 super+s KeyAction(func='kitty_shell', args=['window'])
	 super+t KeyAction(func='new_tab_with_cwd', args=())
	 super+u KeyAction(func='scroll_page_up', args=())
	 super+w KeyAction(func='close_tab', args=())
	 super+enter KeyAction(func='new_window_with_cwd', args=())

When running after make debug-event-loop (with python3 .), all I see is lots of the following lines (it seems that the event loop debugging is only useful for x11):

[26.1760] loop tick
[26.1861] loop tick
[26.1962] loop tick
[26.2062] loop tick

Until it just stops.
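
For anyone collecting a similar log, here is a rough sketch of how the last line printed before a freeze could be pulled out automatically. It assumes only the `[seconds] message` line shape visible in the excerpt above; the helper names (`parse_entries`, `last_event`, `largest_gap`) are made up for illustration and are not part of kitty.

```python
import re

# Matches lines shaped like "[26.1760] loop tick" (format assumed from the excerpt above).
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+(.*)$")


def parse_entries(log_text):
    """Return a list of (timestamp, message) for every timestamped line."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def last_event(log_text):
    """Return the last (timestamp, message) seen, or None if there is none."""
    entries = parse_entries(log_text)
    return entries[-1] if entries else None


def largest_gap(log_text):
    """Return the largest delay in seconds between consecutive timestamped lines."""
    stamps = [stamp for stamp, _ in parse_entries(log_text)]
    return max((later - earlier for earlier, later in zip(stamps, stamps[1:])), default=0.0)


sample = """[26.1760] loop tick
[26.1861] loop tick
[26.1962] loop tick
[26.2062] loop tick"""

print(last_event(sample))
print(largest_gap(sample))
```

The last event is the interesting one after a hang: whatever stage it names is the last thing the loop reported before it stopped.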

When running with KITTY_DISABLE_WAYLAND=1 I haven't seen it hang yet. I'll update if I do.

@joshuarubin
Author

I put the log output here if you want to see.

I've been able to pretty reliably reproduce it by running xeyes, then moving the cursor to the xeyes window and back to the kitty window a few times. I see a bunch of empty lines printed to the console and then it just stops.

When running under x11 mode this does not cause a hang.

@joshuarubin
Author

joshuarubin commented Jun 18, 2019

Also, it seems that setting sync_to_monitor yes fixes the issue, or perhaps just makes the issue much less frequent.

@kovidgoyal
Owner

So it only happens with sync_to_monitor no?

@joshuarubin
Author

joshuarubin commented Jun 18, 2019

I can’t say definitively that it doesn’t happen, but I haven’t been able to reproduce it yet with sync_to_monitor yes

@kovidgoyal
Owner

Run with it for a couple of days, and if you can confirm it, it will
help me debug.

@joshuarubin
Author

You got it. Thanks!

@kovidgoyal
Owner

And just as a general note, sync_to_monitor yes is the default and recommended setting, only use no if you have some overriding reason.

@kovidgoyal
Owner

And did it work?

@kovidgoyal
Owner

I tried running kitty under sway as

kitty -o sync_to_monitor=n

and spent 5 mins moving the mouse back and forth between xeyes and kitty, no crashes/freezes.

Since I cannot reproduce, and I think your issue is solved by using the default sync_to_monitor setting, I am closing.

@joshuarubin
Author

I have not had issues since setting sync_to_monitor yes.

Having a terminal that could freeze up depending on a configuration change sure seems like a bug to me. If you can instrument the wayland pipeline better I'd be happy to get more log information.

It's definitely a regression since v0.13.x and would be nice to address.

@untoreh

untoreh commented Jul 17, 2019

On my end sync_to_monitor yes makes it happen much less, but does not solve it

@kovidgoyal
Owner

Then I need some way to reproduce it. Minimal sway/kitty configs, set of
steps, etc.

@kovidgoyal
Owner

Also note that there have been lots of improvements/fixes to the event loop in master so try to reproduce there if possible.

@untoreh

untoreh commented Jul 20, 2019

I tried running from master and it froze once. Not sure if it helps, but it seems to happen under high CPU load, while switching workspaces. (I am also using 120Hz.)

@kovidgoyal
Owner

Build kitty with make debug-event-loop and post the output from running kitty when it freezes; that should tell us where in the loop it is freezing. You can also use strace -f to see where it is freezing.

@untoreh

untoreh commented Jul 21, 2019

mmh hope this helps
kitty.log.zip

@kovidgoyal
Owner

That tells me the hang is happening in process_global_state(). Stick some
prints in there to see which call exactly is hanging. It is probably the
call to render(), in which case stick some prints in there and see where
it is hanging.

@untoreh

untoreh commented Jul 22, 2019

followed it up until swap_window_buffers

@kovidgoyal
Owner

Then the bug is lower down in the stack than kitty. Probably in
sway/mesa/GPU drivers. swap_window_buffers() ends up calling
eglSwapBuffers() in glfw/egl_context.c

eglSwapBuffers() is what is blocking for you. kitty earlier sets the swap
interval to 0 which is supposed to make eglSwapBuffers() non-blocking.
Apparently, on your machine that is not happening.

You can confirm my diagnosis by putting in prints in egl_context.c: one in
swapIntervalEGL() to confirm that kitty is setting it to 0 and a couple
in swapBuffersEGL() to confirm that it is buffer swapping that is
hanging.
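
The "put prints around the call" approach can be sketched generically. This is Python rather than the C of egl_context.c, and the `bracket` helper is a hypothetical name, so treat it as an illustration of the technique only: an "enter" line with no matching "leave" line identifies the call that never returned.

```python
import time
from contextlib import contextmanager


def _log(message):
    # Flush immediately so the line is not lost in a buffer if the process hangs.
    print(message, flush=True)


@contextmanager
def bracket(label, log=_log, clock=time.monotonic):
    """Log before and after the wrapped block, with the elapsed time."""
    log(f"enter {label}")
    start = clock()
    try:
        yield
    finally:
        log(f"leave {label} after {clock() - start:.3f}s")


# Usage: wrap the suspect call; time.sleep stands in for the call under suspicion.
messages = []
with bracket("swap", log=messages.append):
    time.sleep(0.01)
print(messages)
```

Flushing on every message matters here: when output is redirected to a file, a buffered "enter" line could otherwise stay in memory when the process freezes.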

If you are interested, you can read more about buffer swapping and render
frames in Wayland at https://emersion.fr/blog/2018/wayland-rendering-loop/

@ghost

ghost commented Dec 29, 2019

I'm able to reproduce this bug:

Open two kitty single-instance windows in a single workspace, then send the second window to the sway scratchpad; then they all freeze. Works every time. The bug doesn't occur with non-single-instance terminals. Sway v1.2, kitty v0.14.6.

edit: the bug always occurs with 'sync_to_monitor no'; without that it happens unreliably, but still happens.

@kovidgoyal
Owner

kovidgoyal commented Dec 29, 2019

I cannot replicate, steps I tried:

  1. kitty -1
  2. kitty -1
  3. mod+shift+minus to move window from (2) into scratchpad
  4. focus the window from (1) and type in it
  5. typing worked, without any freezes

@lnikkila

I believe I'm also experiencing this, but I haven't been able to narrow down a sequence of events that would reproduce this reliably; the steps above don't work for me. Using --single-instance this happens every few hours; with individual instances I can't get this to happen at all.

What's also making things more difficult to diagnose is that I'm running a VMware VM, although I do remember this being a problem before on real hardware, with an Intel GPU.

Nevertheless I think I've managed to confirm that this is indeed happening on a lower level than Kitty. After a freeze, I attached gdb to the running process using gdb -p <pid> and pulled the backtrace with bt. The trace shows that Kitty's waiting on eglSwapBuffers as suspected above, which in turn is waiting on a futex in wl_display_dispatch_queue:

Backtrace
#0  0x00007f4a2d61eefc in futex_wait_cancelable (private=, expected=0, 
    futex_word=0x1586a78) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x1586a20, cond=0x1586a50)
    at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x1586a50, mutex=0x1586a20) at pthread_cond_wait.c:655
#3  0x00007f4a2b672823 in wl_display_read_events ()
   from /gnu/store/q3piqbkk5f55w46rh494h00p3a6w7wks-wayland-1.17.0/lib/libwayland-client.so.0
#4  0x00007f4a2b672fb9 in wl_display_dispatch_queue ()
   from /gnu/store/q3piqbkk5f55w46rh494h00p3a6w7wks-wayland-1.17.0/lib/libwayland-client.so.0
#5  0x00007f4a2b48bd23 in dri2_wl_swap_buffers ()
   from /gnu/store/9rrx3scjqcm8zn2d4c74c3v9gkl4q47n-mesa-19.2.7/lib/libEGL.so.1
#6  0x00007f4a2b47f2aa in dri2_swap_buffers ()
   from /gnu/store/9rrx3scjqcm8zn2d4c74c3v9gkl4q47n-mesa-19.2.7/lib/libEGL.so.1
#7  0x00007f4a2b472639 in eglSwapBuffers ()
   from /gnu/store/9rrx3scjqcm8zn2d4c74c3v9gkl4q47n-mesa-19.2.7/lib/libEGL.so.1
#8  0x00007f4a2c33c2aa in render ()
   from /gnu/store/ck26pip2c9z0wimijh30qkmlg5q8mx9d-kitty-0.14.6/bin/../lib/kitty/kitty/fast_data_types.so
#9  0x00007f4a2c33e55f in process_global_state ()
   from /gnu/store/ck26pip2c9z0wimijh30qkmlg5q8mx9d-kitty-0.14.6/bin/../lib/kitty/kitty/fast_data_types.so
#10 0x00007f4a2b72d6c7 in dispatchTimers.part.4.constprop.47 ()
   from /gnu/store/ck26pip2c9z0wimijh30qkmlg5q8mx9d-kitty-0.14.6/lib/kitty/kitty/glfw-wayland.so
#11 0x00007f4a2b723617 in glfwRunMainLoop ()
   from /gnu/store/ck26pip2c9z0wimijh30qkmlg5q8mx9d-kitty-0.14.6/lib/kitty/kitty/glfw-wayland.so
#12 0x00007f4a2c33803a in main_loop.lto_priv ()
   from /gnu/store/ck26pip2c9z0wimijh30qkmlg5q8mx9d-kitty-0.14.6/bin/../lib/kitty/kitty/fast_data_types.so
#13 0x00007f4a2d1f38aa in _PyMethodDef_RawFastCallKeywords ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#14 0x00007f4a2d1fc0fa in _PyMethodDescr_FastCallKeywords ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#15 0x00007f4a2d1cbc07 in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#16 0x00007f4a2d2ddc1e in _PyEval_EvalCodeWithName ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#17 0x00007f4a2d1f348f in _PyFunction_FastCallKeywords ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#18 0x00007f4a2d1c9b17 in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#19 0x00007f4a2d2ddc1e in _PyEval_EvalCodeWithName ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#20 0x00007f4a2d1f348f in _PyFunction_FastCallKeywords ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#21 0x00007f4a2d1c9b17 in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#22 0x00007f4a2d1c215f in function_code_fastcall ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#23 0x00007f4a2d1c9b17 in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#24 0x00007f4a2d1c215f in function_code_fastcall ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#25 0x00007f4a2d1c9b17 in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#26 0x00007f4a2d1c215f in function_code_fastcall ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#27 0x00007f4a2d1c9b17 in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#28 0x00007f4a2d2ddc1e in _PyEval_EvalCodeWithName ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#29 0x00007f4a2d2ddcfe in PyEval_EvalCodeEx ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#30 0x00007f4a2d2ddd2b in PyEval_EvalCode ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#31 0x00007f4a2d2db01d in builtin_exec ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#32 0x00007f4a2d1f38dd in _PyMethodDef_RawFastCallKeywords ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#33 0x00007f4a2d1f39c5 in _PyCFunction_FastCallKeywords ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#34 0x00007f4a2d1cb87b in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#35 0x00007f4a2d2ddc1e in _PyEval_EvalCodeWithName ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#36 0x00007f4a2d1f348f in _PyFunction_FastCallKeywords ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#37 0x00007f4a2d1c9b17 in _PyEval_EvalFrameDefault ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#38 0x00007f4a2d2ddc1e in _PyEval_EvalCodeWithName ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#39 0x00007f4a2d1f332f in _PyFunction_FastCallDict ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#40 0x00007f4a2d3307cd in pymain_run_module ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#41 0x00007f4a2d3367fa in pymain_main ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#42 0x00007f4a2d336947 in Py_Main ()
   from /gnu/store/608bvypsh90c58apvd2cgg3m9l2pwjqn-python-3.7.4/lib/libpython3.7m.so.1.0
#43 0x00000000004012cc in main ()
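
A trace like the one above can also be captured non-interactively, which helps when the hang is rare. The following is a minimal sketch that assumes gdb is installed and that the user is allowed to attach to the process; the function names are hypothetical. It uses gdb's `-p`, `-batch` and `-ex` options, and `thread apply all bt` to include every thread rather than only the current one.

```python
import subprocess


def gdb_backtrace_command(pid, all_threads=True):
    """Build a gdb command line that attaches, prints a backtrace, and exits."""
    backtrace = "thread apply all bt" if all_threads else "bt"
    return ["gdb", "-p", str(pid), "-batch", "-ex", backtrace]


def capture_backtrace(pid, timeout=60):
    """Run gdb against a live process and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


# Example: show the command that would be run for a (made-up) pid.
print(" ".join(gdb_backtrace_command(12345)))
```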

Now this is way outside my area of expertise, but I looked up wl_display_dispatch_queue and I found this section interesting:

A user created queue is dispatched with wl_display_dispatch_queue(). If there are no events to dispatch this function will block. If this is called by the main thread, this will attempt to read data from the display fd and queue any events on the appropriate queues. If calling from any other thread, the function will block until the main thread queues an event on the queue being dispatched.

A real world example of event queue usage is Mesa's implementation of eglSwapBuffers() for the Wayland platform. This function might need to block until a frame callback is received, but dispatching the main queue could cause an event handler on the client to start drawing again. This problem is solved using another event queue, so that only the events handled by the EGL code are dispatched during the block.

This creates a problem where the main thread dispatches a non-main queue, reading all the data from the display fd. If the application would call poll(2) after that it would block, even though there might be events queued on the main queue. Those events should be dispatched with wl_display_dispatch_pending() before flushing and blocking.

I don't think I'm capable enough to figure out on my own which of these scenarios is happening and whether this is due to Mesa's internals or some incorrect sequence of calls from Kitty, but maybe this is at least a little helpful.

Here's some other info from my environment:

kitty --debug-config
kitty 0.14.6 created by Kovid Goyal
Linux <snip> 5.4.8-gnu #1 SMP 1 x86_64

Loaded config files: /home/<snip>/.config/kitty/kitty.conf
Running under: Wayland

Config options different from defaults:
active_tab_background Color(red=0, green=0, blue=0)
active_tab_foreground Color(red=238, green=238, blue=238)
adjust_line_height 2
color1 Color(red=204, green=0, blue=0)
color10 Color(red=138, green=226, blue=52)
color11 Color(red=252, green=233, blue=79)
color12 Color(red=115, green=159, blue=207)
color13 Color(red=173, green=127, blue=168)
color14 Color(red=52, green=226, blue=226)
color15 Color(red=238, green=238, blue=236)
color2 Color(red=78, green=154, blue=6)
color3 Color(red=196, green=160, blue=0)
color4 Color(red=52, green=101, blue=164)
color5 Color(red=117, green=80, blue=123)
color6 Color(red=6, green=152, blue=154)
color7 Color(red=211, green=215, blue=207)
color8 Color(red=85, green=87, blue=83)
color9 Color(red=239, green=41, blue=41)
cursor Color(red=255, green=255, blue=255)
cursor_blink_interval 0.0
cursor_text_color None
enable_audio_bell False
focus_follows_mouse True
font_size 10.0
foreground Color(red=238, green=238, blue=238)
inactive_tab_background Color(red=0, green=0, blue=0)
inactive_tab_foreground Color(red=102, green=102, blue=102)
inactive_text_alpha 0.7
remember_window_size False
resize_draw_strategy 3
scrollback_lines 2048
scrollback_pager_history_size 16777216
strip_trailing_spaces smart
tab_bar_edge 1
tab_bar_margin_width 4.0
tab_bar_style separator
tab_separator   
tab_title_template {index}:{title}
touch_scroll_multiplier 5.0
update_check_interval 0.0
url_style 1
window_padding_width 4.0

glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: VMware, Inc. (0x15ad)
    Device: SVGA3D; build: RELEASE;  LLVM; (0x405)
    Version: 19.2.7
    Accelerated: no
    Video memory: 1MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 2.0
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: SVGA3D; build: RELEASE;  LLVM;
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.2.7
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 19.2.7
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 2.0 Mesa 19.2.7
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 1.0.16

Version info
kitty 0.14.6
sway 1.2
mesa 19.2.7
wlroots 0.7.0
wayland 1.17.0
wayland-protocols 1.18

Please let me know if I can provide anything else!

@kovidgoyal
Owner

As I said in my comment above, as per the docs of eglSwapBuffers() it is not supposed to block when the swap interval is zero; that it does under Wayland seems like a bug in it. There is no sequence of calls involved. As far as I know, you can call it at any time and it should swap the buffers without blocking when the swap interval is set to zero, which it is. kitty actually waits for render frames from the compositor and only then calls swapbuffers, which is why it sets the swap interval to zero.

@kennylevinsen
Contributor

kennylevinsen commented Jan 14, 2020

This issue was linked to in #sway. I took a brief look at @lnikkila's backtrace, and this is my two cents on the issue:

In mesa, the event queue is being dispatched, as the buffer swap is waiting for a "throttle callback" to become NULL. This callback is either a frame callback (if swap interval > 0) or a sync callback (if swap interval == 0). It just clears itself to indicate that it has run. As it appears that swap interval is set correctly in kitty, it is most likely just a sync callback (i.e. just an ACK from the compositor that it reached this point in its own event queue, and not a wait for when frames should be scheduled).

What is interesting is the hang in the queue dispatch. Due to there being multiple active readers (caused by multiple calls to wl_display_prepare_read), this read has ended up as a "slave", waiting for a condition for the other readers to finish. N calls to wl_display_prepare_read will require N calls to wl_display_read_events or wl_display_cancel_read to wake up your reader.

If you are stuck here indefinitely, it suggests that another active reader never executed nor cancelled its read.
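
To make the reader bookkeeping concrete, here is a toy model of it. It is a simplified sketch of the behaviour described above, not libwayland's actual code: the class name `ReaderModel` is invented, and no real socket is read. Every `prepare_read` must be matched by a `read_events` or a `cancel_read`, and a reader that is not the last one to finish waits until the others are done.

```python
import threading


class ReaderModel:
    """Toy model: N prepare_read calls need N read_events/cancel_read calls."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._serial = 0  # bumped each time a read round completes

    def prepare_read(self):
        with self._cond:
            self._readers += 1

    def _finish_locked(self):
        # Caller holds self._cond. The last reader to finish wakes everyone else.
        self._readers -= 1
        if self._readers == 0:
            self._serial += 1
            self._cond.notify_all()
            return True
        return False

    def cancel_read(self):
        with self._cond:
            self._finish_locked()

    def read_events(self, timeout=None):
        """Return True once the round completes, False if the wait timed out."""
        with self._cond:
            serial = self._serial
            if self._finish_locked():
                return True
            return self._cond.wait_for(lambda: self._serial != serial, timeout)


# Case 1: reader A prepares but never reads or cancels, so reader B is stuck.
display = ReaderModel()
display.prepare_read()  # reader A
display.prepare_read()  # reader B
hung = not display.read_events(timeout=0.2)  # B gives up only because of the timeout

# Case 2: reader A cancels, which lets reader B complete.
display2 = ReaderModel()
display2.prepare_read()  # reader A
display2.prepare_read()  # reader B
results = []
worker = threading.Thread(target=lambda: results.append(display2.read_events(timeout=5)))
worker.start()
display2.cancel_read()  # reader A backs out
worker.join()

print(hung, results)
```

Without the timeout, the first case would block forever, which is the shape of the hang in the backtrace: a wait on a condition variable inside the read.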

One other reader I could find that might be the culprit is in the glfw code. Looking at https://github.com/kovidgoyal/kitty/blob/master/glfw/wl_window.c#L758, one thing to note is that abortOnFatalError is called before wl_display_cancel_read. This means that in the event of an error, a parallel eglSwapBuffers will hang at least until abortOnFatalError finishes. It appears that abortOnFatalError can end up doing things like writing to pipes, so it could end up blocking. wl_display_cancel_read should be the first thing called after an error has occurred to wake all readers and make them aware of the error.
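
The ordering point can be shown in a few lines. This is only an illustration of the suggested order, with stand-in callables instead of the real glfw and Wayland functions; `handle_poll_error` is a hypothetical name.

```python
def handle_poll_error(cancel_read, report_fatal_error):
    """Release the pending read first, then do the potentially slow reporting."""
    cancel_read()          # wakes any other reader waiting on this one
    report_fatal_error()   # may block (for example on a pipe) without stalling others


# Stand-ins that just record the order in which they were called.
calls = []
handle_poll_error(
    cancel_read=lambda: calls.append("cancel_read"),
    report_fatal_error=lambda: calls.append("report_fatal_error"),
)
print(calls)
```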

This may be the cause of the hang. For those that can reproduce this hang (@lnikkila?), I'd suggest swapping the cancel and abort calls around and see if it fixes the issue. For further debugging, I suggest running kitty with WAYLAND_DEBUG=1 set while either redirecting stderr to a file, or while having started kitty from another terminal so that the output can be monitored. Also, if you attach gdb, you should be able to get the backtrace from other threads, in case the hang is caused by a deadlock.
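
For the WAYLAND_DEBUG suggestion, a small launcher along these lines would keep the protocol trace in a file. It is a sketch under the assumption that the program to run is on PATH; `run_with_wayland_debug` and the log file name are made up for the example.

```python
import os
import subprocess


def run_with_wayland_debug(command, log_path):
    """Run command with WAYLAND_DEBUG=1 and write its stderr to log_path."""
    env = dict(os.environ, WAYLAND_DEBUG="1")
    with open(log_path, "wb") as log:
        completed = subprocess.run(command, env=env, stderr=log)
    return completed.returncode


# Example (not executed here): run_with_wayland_debug(["kitty"], "kitty-wayland.log")
```

After a freeze, the tail of the log file shows the last protocol messages exchanged with the compositor.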

That's all I got from a quick look at the backtrace. Good luck with the issue. :)

@kovidgoyal
Owner

I can see no issues with swapping those calls, so I have done that.

@kennylevinsen
Contributor

I opened PR #2282 to cover a missed wl_display_cancel_read. It also covers another correctness issue with wl_display_cancel_read being called when it shouldn't be.
