Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI: Use an alternate signal stack on Linux/macOS #7973

Closed
wants to merge 1 commit into from

Conversation

fzwoch
Copy link
Contributor

@fzwoch fzwoch commented Dec 28, 2022

Description

Create an alternate signal stack and set up our signals to run on this stack instead of the default one.

Motivation and Context

This works around an issue that seems to manifests itself on macOS arm64 where the use of sigalstack() will return with an EPERM error after a certain point of OBS Studio UI initialization.

One example where this is an issue is the use of plugins written in Go. The Go runtime will try to create an alternate signal stack if none has been created yet and move existing signal handlers to the new stack. In case of an error the Go runtime will panic causing the OBS process to deadlock.

Currently, the Teleport plugin does not work because of that puzzling behavior.
Seems like other applications seem to show the similar behavior regarding sigaltstack() - for which no one seems to have a good explanation yet.

As far as I know having an alternate signal stack should be save and not interfere with anything else(?)

How Has This Been Tested?

Build OBS with this applied and loaded, tested with Teleport plugin.
Directly on M1 Macboon Air. Teleport via loopback.

Types of changes

Tweak (non-breaking change to improve existing functionality)

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@kkartaltepe
Copy link
Collaborator

In case of an error the Go runtime will panic causing the OBS process to deadlock.

If your plugin is trying to crash, it seems like we should let it crash/hang. This is an attempt to keep going despite it entering an invalid state which seems more dangerous than just crashing.

@fzwoch
Copy link
Contributor Author

fzwoch commented Dec 28, 2022

Having the stack should be a legal state. The Go runtime will work with the stack instead of trying to create one with sigalstack() itself.

Both cases are perfectly legal. The actual issue is that the sigaltstack() syscall on macOS arm64 fails for some reason. I could not find anyone yet who could explain why, and why also on this specific platform. (it begins to happen at some specific code locations, that does not give a specific reason why it should do so)

I'm not trying to keep going, I'm just selecting a code path that works around a potential OS bug(?). If there is an explanation why sigaltstack() suddenly fails and there are means to mitigate that I'm open to that too.

#7795 (comment) is my latest gibberish for investigating the issue.

@WizardCM WizardCM added Enhancement Improvement to existing functionality macOS Affects macOS Linux Affects Linux labels Dec 28, 2022
@RytoEX
Copy link
Member

RytoEX commented Jan 15, 2023

As an aside, sigalstack is written in the commit message, but is seemingly a typo.

@fzwoch
Copy link
Contributor Author

fzwoch commented Jan 16, 2023

Thanks, fixed the typo.

Some additional remarks:

  • The original reason of failure is still unclear to me. It works on other platforms and I have no grasp why it is different here. I suspected a memory corruption somewhere, but with Apple's address sanitizer I do not get any reports - on the other hand the issue also does not occur when running with the sanitizer enabled.
  • Go expects the signal handler to be installed with the SA_ONSTACK flag on all platforms, so it can move them to an alternate stack (which Go creates internally by itself, and moves the signals to that one). It is no a real issue until SIGINT is triggered (where Go will probably terminate with a complaint that a signal is triggered that is not on the current signal stack).
  • I'm not sure whether signal(SIGPIPE, SIG_IGN); needed to be moved to a call to sigaction(). I guess it does not need the SA_ONSTACK flag as it will be ignored and never hits the stack? But I'm not sure on that one. I read elsewhere that is in general recommend to use sigaction() instead of signal() so I went for it.

I can imagine it is discuss-able whether to merge this is or not. For proper Go support in plugins it would be nice to have the SA_ONSTACK flag for the installed signal handler though. But also it would be nice to understand and fix the original problem which I, frankly, do not.

@PatTheMav
Copy link
Member

@fzwoch Have you tried to reproduce the crash using an OBS built without the browser source enabled? Given that CEF more or less "takes over" the program once loaded and Chromium is interfering with signal stacks for crash handling iirc.

@fzwoch
Copy link
Contributor Author

fzwoch commented Jan 18, 2023

I tried removing as many plugins as possible.

On the other hand, in the above link to the discussion you can see that issue can be traced to a particular line in the code (may be different for others, not sure):

In ui_OBSBasic.h:

statusbar = new OBSBasicStatusBar(OBSBasic);

This line is run before any plugin is loaded. So I don't think there can be much interference from plugins.

I also send something in Apple's direction, but my experience from the last decade is that this bug tracker is a void-hole. I still have open tickets from 2015 there. They eventually fixed the issues I reported, but if it was due to the ticket I have no idea.. but lets see if have things have changed in the meantime.

@PatTheMav
Copy link
Member

PatTheMav commented Jan 18, 2023

I tried removing as many plugins as possible.

On the other hand, in the above link to the discussion you can see that issue can be traced to a particular line in the code (may be different for others, not sure):

In ui_OBSBasic.h:

statusbar = new OBSBasicStatusBar(OBSBasic);

This line is run before any plugin is loaded. So I don't think there can be much interference from plugins.

I also send something in Apple's direction, but my experience from the last decade is that this bug tracker is a void-hole. I still have open tickets from 2015 there. They eventually fixed the issues I reported, but if it was due to the ticket I have no idea.. but lets see if have things have changed in the meantime.

FWIW the code you use seems to rely on the old (and deprecated) sigaltstack struct - the current stack_t struct does not take a char pointer, but instead a void pointer.

Using this code (instead of yours) works fine:

    stack_t ss;
    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = malloc(SIGSTKSZ);
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) < 0) {
        perror("sigaltstack");
        abort();
    }

Dunno if this is the actual fix, but I think the code does the same in essence, and I haven't gotten OBS to crash yet. So maybe that's what Go needs to do instead?

@fzwoch
Copy link
Contributor Author

fzwoch commented Jan 19, 2023

Small mistake here. You need to check for < 0.

It does not seem to make a differnce for me whether I malloc some data or point to char stack data.

@PatTheMav
Copy link
Member

Small mistake here. You need to check for < 0.

It does not seem to make a differnce for me whether I malloc some data or point to char stack data.

That's right, but even with the comparison fixed I cannot get it to abort on my arm64 Macs (I moved it after the calls you identified and even further back in the program execution). The moment I used the implementation you provided (using a char array instead) in place of mine the abort-branch is taken.

@fzwoch
Copy link
Contributor Author

fzwoch commented Jan 19, 2023

That let's me believe there is some kind of memory corruption somewhere.
I guess having a data array on the stack changes the stack memory layout compared to allocating it on the heap.

For me, there is no difference in aborting if I use the heap instead of the stack.
But it does work with the address sanitizer, which probably changes memory layout by a lot.

So the change of the signal stack may not the actual solution to the issue, but changes the memory layout in a way that prevents the failure.

Can you get away with using char signalStack[SIGSTKSZ]; when you move the code block as a first action ot the beginning of main()?

I have a Macbook Air M1 2020 with Monterey 12.6.1. It is the only machine I have and I need it to stay on Monterey at the moment.

I got reports from someone using a Mac Mini M1 and someone using Ventura Beta 2 with the same issue.

@PatTheMav
Copy link
Member

I'm calling this a Heisenbug now, because the closer you try to investigate it, the less likely it seems to occur - your variant obviously worked at the beginning of the main() function, when I moved it further back in OBSInit() however, it also never ran afoul of the check.

@fzwoch
Copy link
Contributor Author

fzwoch commented Jan 20, 2023

Thanks for checking it out @PatTheMav. So there is something behaving weirdly. A bit of sanity has returned to myself..

@fzwoch
Copy link
Contributor Author

fzwoch commented Jan 29, 2023

Another question @PatTheMav. Did you change something on your machines while testing when you got different behaviors?

I noticed something particular on my side. I can run the code successfully when I attach a monitor via USB-C and close the laptop's lid. When I run the code with the laptop lid open it will crash. Lid open and monitor attached also results in a crash.

@PatTheMav
Copy link
Member

Another question @PatTheMav. Did you change something on your machines while testing when you got different behaviors?

I noticed something particular on my side. I can run the code successfully when I attach a monitor via USB-C and close the laptop's lid. When I run the code with the laptop lid open it will crash. Lid open and monitor attached also results in a crash.

Can't really tell. Given that OBS is a) multithreaded and b) has Qt, CEF, macOS' native application framework, and depending on your setup Python's runtime all thrown into the mix, I'm not surprised that futzing around with the signal stack can lead to errors.

Create an alternate signal stack and set up our signals to run on
this stack instead of the default one.

This works around an issue that seems to manifests itself on
macOS arm64 where the use of sigaltstack() will return with an
EPERM error after a certain point of OBS Studio UI initialisation.

One example where this is an issue is the use of plugins written in
Go. The Go runtime will try to create an alternate signal stack if
none has been created yet and move existing signal handlers to the
new stack. In case of an error the Go runtime will panic causing
the OBS process to deadlock.
@fzwoch
Copy link
Contributor Author

fzwoch commented May 3, 2023

I will actually close this. Removing the stack also does not work as advertised on macOS. The man page contradicts the code in their kernel. I just give up on this platform.

@fzwoch fzwoch closed this May 3, 2023
@RytoEX
Copy link
Member

RytoEX commented May 3, 2023

I will actually close this. Removing the stack also does not work as advertised on macOS. The man page contradicts the code in their kernel. I just give up on this platform.

If the docs are incorrect, please log a bug to Apple and explain what you feel is broken so it can be properly investigated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improvement to existing functionality Linux Affects Linux macOS Affects macOS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants