This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Wayland / Xorg compositor segfault #192

Closed
TRPB opened this issue Jan 5, 2020 · 25 comments

@TRPB

TRPB commented Jan 5, 2020

I've found the cause of #191 and implemented a workaround:

TRPB@f69bdfc

For some reason, running this.movingEnded.emit(); more than once per frame causes the segfault.

Do you want me to send a PR for these wayland workarounds to master?

My complete wayland changelog:

  1. krunner and screen locker window classes to ignored.js (This should probably be in master)

  2. Changed the geometry timer in main.qml to 200ms. Any lower and some Wayland windows get a strange infinite loop where they keep being resized between their tile size and original size. I had thought that this would be a similar once-per-frame issue, but anything below about 80ms causes it to happen often, and around 100ms causes it to happen infrequently.

  3. The workaround above

  4. Removed the compositor check in tilelist.js clientAdded (TRPB@1fd493b). Without this, Wayland windows are not tracked by the script.
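
A minimal sketch of the idea behind workaround 3, assuming the fix amounts to collapsing repeated emits into at most one per frame. The names `makeCoalescedEmitter`, `emit` and `schedule` are hypothetical and are not taken from the linked commit. In the script, `emit` would wrap `this.movingEnded.emit()` and `schedule` would be whatever runs a callback on the next frame or timer tick.

```javascript
// Hypothetical helper: collapses any number of requests made before the
// next scheduled tick into a single call to `emit`.
function makeCoalescedEmitter(emit, schedule) {
    var pending = false;
    return function requestEmit() {
        if (pending) {
            return; // an emit is already queued for this tick
        }
        pending = true;
        schedule(function () {
            pending = false;
            emit();
        });
    };
}

// Usage with a fake scheduler that stores callbacks until "the next frame".
var queue = [];
var emitCount = 0;
var requestEmit = makeCoalescedEmitter(
    function () { emitCount += 1; },              // stands in for movingEnded.emit()
    function (callback) { queue.push(callback); } // stands in for a frame timer
);

// Three requests arrive within the same frame...
requestEmit();
requestEmit();
requestEmit();

// ...and the frame ends: run everything that was queued.
queue.splice(0).forEach(function (callback) { callback(); });
```

Note that this delays the emit until the tick fires rather than emitting immediately and dropping the repeats. Which of the two behaviours the script needs is an open question here.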

@faho
Collaborator

faho commented Jan 5, 2020

  1. krunner and screen locker window classes to ignored.js (This should probably be in master)

Definitely

  2. Changed the geometry timer in main.qml to 200ms. Any lower and some Wayland windows get a strange infinite loop where they keep being resized between their tile size and original size.

200ms seems quite high. Does the script still work okay with that delay?

  3. The workaround above

Running it twice per frame is probably too much anyway.

  4. Removed the compositor check in tilelist.js clientAdded (TRPB/kwin-tiling@1fd493b). Without this, Wayland windows are not tracked by the script.

That was a workaround for uncomposited KWin. It's probably outdated now and can be removed.

So 1, 3 and 4 can go in; number 2 seems dubious.


Personally I feel KWin Wayland is still not ready, so I'm not using it. That means I won't really notice if anything's broken on Wayland.

I have noticed KWin being more crashy in general after a recent upgrade tho.

@TRPB
Author

TRPB commented Jan 5, 2020

200ms seems quite high. Does the script still work okay with that delay?

It seems fine. There is a 0.2s delay in places. Even this isn't really high enough. Some windows still flicker 3 or 4 times before they appear. Konsole seems to be the worst culprit for some reason. I also only noticed this after a recent update; it was fine for months. I'll try lowering the value to see where the cutoff point is. I had hoped that 1 frame would be enough, but it's not. I wonder if an animation or something takes 0.2s, which affects the way the script sees the geometry.

That was a workaround for uncomposited KWin. It's probably outdated now and can be removed.

The upstream fix to kwin meant that clientAdded now gets triggered correctly. However, either if (options.useCompositing) { evaluates to false or the client.windowShown.connect(function(client) { callback is never triggered, so this change is still required.

@TRPB
Author

TRPB commented Jan 14, 2020

I think I have some additional information but need to do further tests:

I was getting random segfaults when opening new windows. I changed clientAdded back to clientActivated (which I was using until clientAdded was fixed upstream) and haven't had the issue since. Though it was fairly intermittent, I'll need to run it with clientActivated for a few days and see if I get a segfault.

After making the change, I also haven't had the weird flicker effect, so when using clientActivated I can put the timer back to 1 without problems.

Is there a way the scripting engine can detect whether we're on Wayland or X? Because the activated/added change will need an if (wayland) {} else {}.

@faho
Collaborator

faho commented Jan 14, 2020

Is there a way the scripting engine can detect whether we're on Wayland or X?

The only one I know of is that client.basicUnit doesn't exist on Wayland (so the if (client.basicUnit) branch is only taken on X).
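
A rough sketch of how that check could drive the added/activated split asked about above. It assumes, as stated in this comment, that `basicUnit` is only present on X clients. The helper names `isX11Client` and `trackingSignalName` are hypothetical, and the mapping (clientAdded on X, clientActivated on Wayland) only mirrors the experiment described earlier in the thread. It is not a confirmed fix.

```javascript
// Assumption from the comment above: `basicUnit` exists only on X clients.
function isX11Client(client) {
    return Boolean(client) && Boolean(client.basicUnit);
}

// Hypothetical helper: returns the name of the workspace signal that the
// script would use to start tracking windows on the current platform.
function trackingSignalName(client) {
    return isX11Client(client) ? "clientAdded" : "clientActivated";
}

// Mock clients for illustration only; real clients come from KWin.
var x11Client = { basicUnit: { width: 1, height: 1 } };
var waylandClient = {};

var x11Signal = trackingSignalName(x11Client);
var waylandSignal = trackingSignalName(waylandClient);
```

Because the check needs a client object, the decision can only be made once at least one window exists.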

@theangryangel

theangryangel commented Feb 19, 2020

I don't think that this is an issue specific to Wayland, unfortunately. I'm reasonably confident I'm having the same issue under X. The patch does help mitigate it somewhat, but it's not perfect.

I figure this is probably an upstream issue, but I wanted to add some additional information here as well in case it helps anyone more familiar with the innards of kwin and kwin-tiling.

I've been able to reliably reproduce this since about Christmas (currently on Plasma 5.18, KDE 5.67, Qt 5.14.1, and I still see it) by constantly resizing a window. This usually triggers it within a few seconds, though obviously I wouldn't normally be doing this.

However, I also see it in general use:

  • Occasionally when moving or resizing windows with the mouse - in general use I'm looking at a few times a day. Thankfully under X it recovers.
  • When the Microsoft Teams app (Electron) is tiled and a resize is attempted beyond its normal restricted size - with gaps enabled I get this weird "creeping" effect, which almost always triggers it immediately

The creeping effect is a separate issue that I need to tie down further. The rest, I realise, is probably just apps interacting with each other in unfun ways and triggering the kwin crash.

@TRPB
Author

TRPB commented Feb 19, 2020

I was also trying to track this down and have had the problem in X since I posted this.

My patch fixes dragging but sometimes when addClient is called it causes the same crash. I wasn't able to work out the cause.

@laloch
Contributor

laloch commented Feb 19, 2020

I also have these more or less random crashes since v5.17.90. I do not, however, think it's something we should try to fix in the script. KWin should never crash due to the script engine activity.

@TRPB
Author

TRPB commented Feb 20, 2020

While I agree, in the interest of having a stable desktop I've been trying to work around it.

I just had the crash twice in 10 minutes on X. It's really frustrating to the point where I've just turned off the compositor entirely.

There's a bug report for KDE here: https://bugs.kde.org/show_bug.cgi?id=415872

If anyone can get a backtrace of the issue it'd be really useful if you can post it there. Unfortunately running Arch makes this difficult and I've been unable to replicate the problem using Neon in virtualbox.

@laloch
Contributor

laloch commented Feb 20, 2020

My backtrace usually looks like:

#7  0x00007f00b288297e in QV4::MemoryManager::collectRoots(QV4::MarkStack*) () at /usr/lib/libQt5Qml.so.5
#8  0x00007f00b2882b7e in QV4::MemoryManager::mark() () at /usr/lib/libQt5Qml.so.5
#9  0x00007f00b2884672 in  () at /usr/lib/libQt5Qml.so.5
#10 0x00007f00b2886aaa in QV4::MemoryManager::allocData(unsigned long) () at /usr/lib/libQt5Qml.so.5
#11 0x00007f00b298101a in QV4::QObjectMethod::create(QV4::ExecutionContext*, QObject*, int) () at /usr/lib/libQt5Qml.so.5
#12 0x00007f00b2983b87 in QV4::QObjectWrapper::getQmlProperty(QQmlContextData*, QV4::String*, QV4::QObjectWrapper::RevisionMode, bool*, bool) const () at /usr/lib/libQt5Qml.so.5
#13 0x00007f00b2983d4f in QV4::QObjectWrapper::virtualGet(QV4::Managed const*, QV4::PropertyKey, QV4::Value const*, bool*) () at /usr/lib/libQt5Qml.so.5
#14 0x00007f00b29b9a1d in QV4::Runtime::CallProperty::call(QV4::ExecutionEngine*, QV4::Value const&, int, QV4::Value*, int) () at /usr/lib/libQt5Qml.so.5
#15 0x00007f007ebe90d2 in  ()
#16 0x0000000000000000 in  ()

So the bug in fact seems to be in Qt5's QV4::MemoryManager implementation.

@laloch
Contributor

laloch commented Feb 20, 2020

May be related(?):
https://code.qt.io/cgit/qt/qtdeclarative.git/commit/src/qml/memory?id=e72b032cc1c5a8a07a99fc6522a692c36f369abc

Edit: The above patch was already merged into 5.14 branch, so there's a good chance the issue will get solved with the upcoming Qt 5.14.2 release (March 2020).
Edit 2: And no, I'm not going to build qt5-declarative from source in order to test it.

Edit 3: Never mind. The patch has already been released with v5.14.1 :(

@davidedmundson

#12 0x00007f00b2983b87 in QV4::QObjectWrapper::getQmlProperty(QQmlContextData*, QV4::String*, QV4::QObjectWrapper::RevisionMode, bool*, bool) const () at /usr/lib/libQt5Qml.so.5

Where we've seen this before it has not been a QML bug.

It means QML is referencing an object that's been unexpectedly destroyed before processing the destruction.

This could mean some JS code called some code that deleted an object, then returned back to JS space, all within the same "event".

It's frustrating that the backtrace is seemingly cut off.
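
To make the failure mode described above concrete in plain JavaScript terms (an analogy only, not a diagnosis of the KWin crash): one way to avoid an object being destroyed while a handler is still using it is to queue destructive work and run it only after the current handler has returned. `DeferredDestroyer` and its methods are hypothetical names and are not part of KWin or the tiling script.

```javascript
// Hypothetical guard: holds back destroy requests while a handler is running.
function DeferredDestroyer() {
    this.depth = 0;    // number of handlers currently on the stack
    this.pending = []; // destroy callbacks waiting for handlers to finish
}

// Runs `handler`; destroy requests made while it runs are deferred.
DeferredDestroyer.prototype.run = function (handler) {
    this.depth += 1;
    try {
        handler();
    } finally {
        this.depth -= 1;
        if (this.depth === 0) {
            this.flush();
        }
    }
};

// Destroys immediately when idle, otherwise queues the request.
DeferredDestroyer.prototype.destroyLater = function (destroy) {
    if (this.depth === 0) {
        destroy();
    } else {
        this.pending.push(destroy);
    }
};

// Runs and clears all queued destroy callbacks.
DeferredDestroyer.prototype.flush = function () {
    var jobs = this.pending.splice(0);
    jobs.forEach(function (job) { job(); });
};

// Usage: the tile stays alive until the handler that requested its
// destruction has finished.
var guard = new DeferredDestroyer();
var tile = { alive: true };
var aliveInsideHandler = null;

guard.run(function () {
    guard.destroyLater(function () { tile.alive = false; });
    aliveInsideHandler = tile.alive; // still usable at this point
});
```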

@laloch
Contributor

laloch commented Apr 22, 2020

Hi @davidedmundson, it's nice to have you here. I had a much more useful backtrace from debug versions of KWin and qt5-declarative, but could not make any sense out of it either. I suspect KWin of destroying objects or triggering events concurrently on a "wrong" thread. I don't have proof, though.

@davidedmundson

It should all be on one thread, but there are still multiple concepts of "space" when it comes to QML.

Ultimately I'm somewhat stuck on fixing this with the current information. I can try to install the extension and hope to randomly reproduce it (any tips for reproducing would help).

If you can find a way to recreate in a more minimal example that would also help.

@laloch
Contributor

laloch commented Apr 22, 2020

Once you have the script installed and enabled, it's very easy to trigger - just open two windows on the same screen, grab one of the adjacent window frames and keep resizing. This usually crashes KWin in just a few seconds. Electron windows are more likely to trigger the bug than others for some reason.

@TRPB
Author

TRPB commented May 4, 2020

@davidedmundson did you manage to replicate the issue? If not, I'll try to provide a VM that lets you reproduce the issue easily.

edit: I cannot get this to happen in KDE Neon in virtualbox however hard I try. Does Neon (ubuntu) run an older version of QML that isn't affected? At first I thought it was because virtualbox defaulted to 1 core, but giving it 12 cores still didn't make the issue appear.

I will note that I cannot get Neon to run higher than 800x600 in virtualbox. I've no idea why but it refuses to let me set a higher resolution.

Unfortunately that leaves a lot of variables:

  1. QML version
  2. Distro (Is anyone on Ubuntu affected by this?)
  3. Screen resolution (unlikely to make any difference I'd have thought)
  4. GPU driver

@TRPB TRPB changed the title Do you want wayland workarounds in master? Wayland / Xorg compositor segfault May 4, 2020
@TRPB
Author

TRPB commented May 26, 2020

It looks like this may be fixed in Qt 5.15 ( https://bugreports.qt.io/browse/QTBUG-84363 ). Once it's available in the Arch stable repository I'll let you know, but I'm hopeful!

@davidedmundson

Would explain why I couldn't reproduce

@laloch
Contributor

laloch commented Jun 4, 2020

it looks like this may be fixed in Qt 5.15 ( https://bugreports.qt.io/browse/QTBUG-84363 ).

Nope. Crashes all the same with Qt 5.15.0.
I can confirm, however, that disabling JIT compilation (QV4_FORCE_INTERPRETER=1) works around the issue.

@TRPB
Author

TRPB commented Jun 5, 2020

QV4_FORCE_INTERPRETER=1 seems to have worked for me too. Are there any downsides to setting this variable?

@davidedmundson

Yes, it's much slower.

@matejdro

Where can I apply the QV4_FORCE_INTERPRETER workaround? I assume I need to set this variable before starting kwin?

@TRPB
Author

TRPB commented Jul 20, 2020

See the bottom of this page: https://community.kde.org/KWin/Environment_Variables

I'll note that I've been running this for a month now and "much slower" is probably like 1ms vs 10ms because I haven't noticed any differences in speed on the desktop at all.

@TRPB
Author

TRPB commented Jul 20, 2020

Not sure if this is helpful in fixing the underlying issue, but with QV4_FORCE_INTERPRETER set, I sometimes get a weird taskbar flicker. Although the windows themselves don't move, in the Pager (desktop switcher) the tiled windows look like they are swapping positions several times a second, and the whole taskbar flashes. Minimising/maximising a window fixes it. It seems to be some kind of infinite window position swapping loop (though it's not visible in the actual windows, only the taskbar/pager).

I mention it because I never noticed this without the fix, and I wonder if this is the same issue as the crash, only with different symptoms depending on whether QV4_FORCE_INTERPRETER is set.

faho added a commit that referenced this issue Aug 7, 2020
This seems to work now, and it breaks Wayland.

See #192.
@faho
Collaborator

faho commented Aug 7, 2020

krunner and screen locker window classes to ignored.js (This should probably be in master)

The annoying thing here is:

  1. These don't have the proper window types ("normal", "dock", "dialog") under Wayland. I don't know if they aren't a thing in general (in which case that's just entirely unusable); I hope they'll be added.

  2. The classes changed to the awful "org.kde.*" reverse domain name scheme.

So I added them in 9f7cd08.

@faho
Collaborator

faho commented Aug 7, 2020

Okay, if I'm seeing this correctly there's nothing left to do here that I want to add, it's all upstream work.

@faho faho closed this as completed Aug 7, 2020

6 participants