Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mono_jit_init on macOS 10.14 has graphics corruption due to mprotect invocation #10802

Closed
chamons opened this issue Sep 25, 2018 · 46 comments

Comments

@chamons
Copy link
Member

@chamons chamons commented Sep 25, 2018

This issue comes from the investigation of: xamarin/xamarin-macios#4509

There is a regression in macOS 10.14 where if all the following are true for me:

  • Building with Xcode10GM
  • Running on macOS 10.14
  • Launching from Finder or via "open" on the command line
  • In a project with a xib not a storyboard as the main entry point

AppKit starts drawing corrupted graphics.

If any of those preconditions are false, say launching from Xcode instead of Finder\open, everything appears to draw fine but sometimes later strange drawing behavior creeps in anyway.

I've tracked it down to a single mprotect call in startup after @lemonmojo did some amazing work and tracked it down to just mono_jit_init ().

I've filed radar://44763485 on this, but mono may need to implement some sort of work around.

Steps to Reproduce

  1. Unzip MProtectMadness_WithMono.zip
  2. cd MProtectMadness
    2 (a). You may need to open MProtectMadness.xcodeproj and change signing keys
  3. rm -fr build/ && xcodebuild -project MProtectMadness.xcodeproj/
  4. open ./build/Release/MProtectMadness.app

The built in screenshot tool doesn't correctly capture what's on the screen, but you should see a ghost outline of a modal dialog that is not painting correctly.

Hitting escape will break you out of it and let you close the main window \ quit.

A single mono_jit_init call or mprotect with some math I stole from the mono startup sequence both trigger it exactly the same. See USE_MONO define in sample.

MProtectMadness_WithMono.zip

Current Behavior

Trigger strange AppKit drawing bug.

Expected Behavior

Work around said brokeness

On which platforms did you notice this

[x] macOS
[ ] Linux
[ ] Windows

Version Used:

I've reproduced this on mono 5.12, 5.16, and 5.19 (master 3 commits ago today), and a local mono build at a90249f.

@marek-safar

This comment has been minimized.

Copy link
Member

@marek-safar marek-safar commented Sep 25, 2018

@luhenry could you please take over this issue

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Sep 25, 2018

http://man7.org/linux/man-pages/man2/mprotect.2.html

mprotect(): POSIX.1-2001, POSIX.1-2008, SVr4. POSIX says that the behavior of mprotect() is unspecified if it is applied to a region of memory that was not obtained via mmap(2).

Isn't that the case here?

Also see https://stackoverflow.com/a/50691513

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Sep 25, 2018

So, as suspected if I mmap the memory with r/w before calling mprotect, none of the weirdness happens. At least in @chamons repro case. Here's an updated version of that with mmap added in:
MProtectMadnessLM.zip

See this rust code for reference: https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs#L308

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Sep 26, 2018

Quick update: Just installed 10.14.1 beta and can still reproduce the issue.

@lilleydnSub

This comment has been minimized.

Copy link

@lilleydnSub lilleydnSub commented Sep 26, 2018

Are we likely to get an update pretty soon for a workaround ?

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Sep 28, 2018

@chamons @marek-safar @luhenry Any news from you guys or from Apple?

@vargaz

This comment has been minimized.

Copy link
Member

@vargaz vargaz commented Sep 28, 2018

We can disable that mprotect call if needed on osx, the side effect is that stack overflows will not be caught on the main thread.

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Sep 28, 2018

@vargaz Well, that doesn't sound very desirable. Did you see my comment on mmap'ing the memory before calling mprotect?

@jaykrell

This comment has been minimized.

Copy link
Collaborator

@jaykrell jaykrell commented Sep 28, 2018

@lemonmojo Can you mmap the already allocated stack?

@vargaz

This comment has been minimized.

Copy link
Member

@vargaz vargaz commented Sep 28, 2018

I'm not sure what the effect of that would be. Stack overflow handling is niche functionality which doesn't even work very well, so its less important than mono apps working on mojave.

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Sep 28, 2018

@jaykrell I only tried what's present in the project I linked to: https://github.com/mono/mono/files/2416840/MProtectMadnessLM.zip.
As previously mentioned, the rust guys do something similar: https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs#L308

@BrzVlad

This comment has been minimized.

Copy link
Member

@BrzVlad BrzVlad commented Sep 28, 2018

I can't reproduce this issue. Given, from what I understood, that it might be related to calling mprotect over memory that was not explicitly mmaped, does this call reversal happen to fix the issue ? @chamons

https://gist.github.com/BrzVlad/dcee172c24cc532a02d887b96cd4704c

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Sep 28, 2018

I'm working with @BrzVlad now to test the mmap "fix" since he's having trouble reproducing locally (maybe hardware dependent?)

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Sep 28, 2018

@chamons At some level it's definitely hardware dependent as I've never been able to reproduce it in a VM.

I can however reproduce it now consistently on 3 different machines: iMac Pro, 2016 MBP and 2018 MBP.

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Sep 28, 2018

Hmm, I can still reproduce it (obviously) but your mmp fix doens't fix things for me (either your sample or adding:

mmap (addr, length, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);

to mine. 🤔

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Sep 28, 2018

This is still under active investigation (by the awesome @BrzVlad ) but so far I've only been able to "fix" this by effectively commenting out the mprotect. That makes us lose stack overflow detection, which is obviously :(

BrzVlad added a commit to BrzVlad/mono that referenced this issue Oct 1, 2018
On macOS Mojave, it seems that changing the mapping of stack pages can lead to corruption bugs.

mono#10802
@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 1, 2018

@chamons Any news?

I just tested this using Xcode 9.4.1 and can still reproduce some weirdness when the mprotect call is not commented out (even the mmap workaround doesn't work for me in this case).

Here's what this looks like for me with Xcode 9 (note the title bar of the NSAlert):
monomojavexcode9

When this happens, Console.app shows the same old Context ID mismatch warning.

So for me this all but confirms that this is not a bug in the SDK but rather the OS itself. This also confirms that it's not related to an alternative code path Apple's taking for apps that are linked to the 10.14 SDK.

So to sum up: Downgrading to Xcode 9 is NOT a workaround unfortunately.

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 1, 2018

I have heard zero from Apple so far, which is disappointing as you can imagine.

The runtime team is actively working on it, and I've been told there should be a PR today related to it.

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 1, 2018

@chamons Yes, that is indeed disappointing. At this point I'd suggest opening a TSI. Even if the Apple support engineer can't provide a workaround, it might help promote the radar. I would've requested a TSI already but fear that they would just point to mono instead so I think it makes sense that the TSI is opened by MS.

BrzVlad added a commit to BrzVlad/mono that referenced this issue Oct 1, 2018
On macOS Mojave, it seems that changing the mapping of stack pages for main thread can lead to corruption bugs.

mono#10802
@vargaz

This comment has been minimized.

Copy link
Member

@vargaz vargaz commented Oct 1, 2018

Note that what the runtime was doing, i.e. mprotect some memory it didn't allocate might not be a supported use case, so it could get broken at any time.

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 1, 2018

So building mono with the changes from #10899 should fix the problem? Or is there anything else required?

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 1, 2018

@lemonmojo You beat me to the punch by one minute - Yes, please test with that and report back. We believe it should fix things.

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 1, 2018

Also, I'm sure a TSI would get back a "what you are doing in unsupported, we don't care, stop doing that".

monojenkins added a commit to monojenkins/mono that referenced this issue Oct 1, 2018
On macOS Mojave, it seems that changing the mapping of stack pages for main thread can lead to corruption bugs.

mono#10802
@marek-safar

This comment has been minimized.

Copy link
Member

@marek-safar marek-safar commented Oct 3, 2018

The fix was merged 2 days ago so it should show up there already

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 3, 2018

If the PR is really included in this release it doesn't fix the issue for me as I can again reproduce with all repro cases. @chamons Did you test this already?

@BrzVlad

This comment has been minimized.

Copy link
Member

@BrzVlad BrzVlad commented Oct 3, 2018

@lemonmojo That package doesn't contain the fix. The next nightly will.

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 3, 2018

Apple has not replied to mine yet at all

I didn't test it as you gave it a 👍 and you've had more luck reproducing this than I.

@DanWBR

This comment has been minimized.

Copy link

@DanWBR DanWBR commented Oct 3, 2018

@chamons when could we expect an updated Xamarin.Mac package to be downloaded through VS alpha update channel?

@Rogister

This comment has been minimized.

Copy link

@Rogister Rogister commented Oct 3, 2018

@chamons

yes, I see that there are fix but I also see that it does not seem to solve the problems .... so if I understood correctly, it is not a problem Xamarin.MAC but a problem Mono ?

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 3, 2018

@Rogister As I noted here xamarin/xamarin-macios#4509 (comment) and in other places, if you are having issues not described by an existing issue, please file another issue with a detailed description of your issue and steps to reproduce.

And please stop spamming mono/mono issues. Posting that you have an issue, with zero actionable details, multiple times on multiple threads is not going to get your a resolution more quickly.

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 3, 2018

@DanWBR This fix was just landed in master less than 2 days ago, and the cherry picks to the release branches are still hot. Once those land, I will spin up a build with that mono bump and post it on the original issue.

I can not give a timetable currently on when that will hit Alpha channel, but trust me I realize how pressing this issue is.

akoeplinger added a commit that referenced this issue Oct 3, 2018
* [runtime] Disable stack guard for main thread on osx

On macOS Mojave, it seems that changing the mapping of stack pages for main thread can lead to corruption bugs.

#10802

* [runtime] Fix undefined pthread_main_np (#10930)

It is exported only when _DARWIN_C_SOURCE is defined.
monojenkins added a commit that referenced this issue Oct 3, 2018
[2018-06] [runtime] Disable stack guard for main thread on osx

Backport of #10899.

/cc @luhenry @BrzVlad

Description of #10899:
On macOS Mojave, it seems that changing the mapping of stack pages for main thread can lead to corruption bugs.

#10802



<!--
Thank you for your Pull Request!

If you are new to contributing to Mono, please try to do your best at conforming to our coding guidelines http://www.mono-project.com/community/contributing/coding-guidelines/ but don't worry if you get something wrong. One of the project members will help you to get things landed.

Does your pull request fix any of the existing issues? Please use the following format: Fixes #issue-number
-->
monojenkins added a commit that referenced this issue Oct 3, 2018
[2018-08] [runtime] Disable stack guard for main thread on osx

Backport of #10899.

/cc @luhenry @BrzVlad

Description of #10899:
On macOS Mojave, it seems that changing the mapping of stack pages for main thread can lead to corruption bugs.

#10802



<!--
Thank you for your Pull Request!

If you are new to contributing to Mono, please try to do your best at conforming to our coding guidelines http://www.mono-project.com/community/contributing/coding-guidelines/ but don't worry if you get something wrong. One of the project members will help you to get things landed.

Does your pull request fix any of the existing issues? Please use the following format: Fixes #issue-number
-->
monojenkins added a commit that referenced this issue Oct 3, 2018
[2018-10] [runtime] Disable stack guard for main thread on osx

Backport of #10899.

/cc @luhenry @BrzVlad

Description of #10899:
On macOS Mojave, it seems that changing the mapping of stack pages for main thread can lead to corruption bugs.

#10802



<!--
Thank you for your Pull Request!

If you are new to contributing to Mono, please try to do your best at conforming to our coding guidelines http://www.mono-project.com/community/contributing/coding-guidelines/ but don't worry if you get something wrong. One of the project members will help you to get things landed.

Does your pull request fix any of the existing issues? Please use the following format: Fixes #issue-number
-->
@Rogister

This comment has been minimized.

Copy link

@Rogister Rogister commented Oct 4, 2018

@chamons

Can you tell when the fix will be available in ALPHA Channel or right here ?

Thank you.

Alain

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 5, 2018

I believe this issue is fixed with #10899 and the associated cherry-picks.

Please see xamarin/xamarin-macios#4848 for Xamarin.Mac specific information on how to test this fix and report back. Please only comment here in mono/mono if you specifically have issues with mono outside of Xamarin.Mac.

@chamons chamons closed this Oct 5, 2018
@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 8, 2018

@chamons @vargaz: I installed the mono nightly from 2018-10-07 today and it appears to indeed fix the issue discussed in this thread.

Unfortunately, I'm running into all kinds of other problems with this build. I've seen mono apps (including my own and Visual Studio!) just randomly hang and crash out of nowhere.
One crash I can reproduce consistently and that never happened before the update (I ran mono 5.14 before) is happening when calling the completion handler of a WKWebView delegate method (webView:didReceiveAuthenticationChallenge:completionHandler:).

Here are two different error messages I've seen when calling the completion handler:
mono_coop_mutex_lock Cannot transition thread 0x1106995c0 from STATE_BLOCKING with DO_BLOCKING
Native Crash Log

mono_threads_enter_gc_safe_region_unbalanced Cannot transition thread 0x11341c5c0 from STATE_BLOCKING with DO_BLOCKING
Native Crash Log

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 8, 2018

Could you by any chance backport the changes to 5.14 so that I can verify if the change itself causes these issues or if it's been some other changes since 5.14?

@vargaz

This comment has been minimized.

Copy link
Member

@vargaz vargaz commented Oct 8, 2018

There are backports to older mono versions above, the packages are available by clicking on the green checkmarks.

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 8, 2018

@lemonmojo Also, if swapping to an older mono "fixes" it please file bug reports if you can. :)

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 8, 2018

@vargaz: Not sure where to find those, could you please point me in the right direction? Thx

@chamons

This comment has been minimized.

Copy link
Member Author

@chamons chamons commented Oct 8, 2018

So 5.14 is "2018-04" in VS release speak so you want this branch:

https://github.com/mono/mono/tree/2018-04

Click commits, find the green checkmark at the last commit, and use pkg-mono.

Or just https://xamjenkinsartifact.azureedge.net/build-package-osx-mono/2018-04/165/969357ac02b2c08a43ef89d98aca550d3648bf00/MonoFramework-MDK-5.14.0.205.macos10.xamarin.universal.pkg

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 8, 2018

@chamons Perfect, thx a lot, I'll give this a try later.

@lemonmojo

This comment has been minimized.

Copy link

@lemonmojo lemonmojo commented Oct 8, 2018

@chamons 5.14 and 5.16 look good. Have none of the mentioned issues and the issue from this thread appears to be fixed as well.

Something's seriously broken since 5.18 though. Unfortunately I spent so much time debugging the original issue from this thread that I currently can't file separate bug reports for this new issue. Guess I'll be continuing to use 5.14 with the fix for the time being.

jonpryor added a commit to xamarin/xamarin-android that referenced this issue Oct 8, 2018
jonpryor added a commit to xamarin/xamarin-android that referenced this issue Oct 8, 2018
@marek-safar marek-safar added this to the 2018-06 (5.16.xx) milestone Oct 13, 2018
alexanderkyte added a commit to alexanderkyte/mono that referenced this issue Oct 15, 2018
On macOS Mojave, it seems that changing the mapping of stack pages for main thread can lead to corruption bugs.

mono#10802
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.