Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Libco defect in many libretro-cores causes crashes in Windows x64 #1643

Open
snaphat opened this issue Apr 3, 2022 · 4 comments
Open

Comments

@snaphat
Copy link

snaphat commented Apr 3, 2022

Many of the libretro cores contain a defect in libco that causes floating point register corruption under x86_64 Windows due to incorrect handwritten 'machine code' found here for example.

Specifically, the issue is that the restore code uses the wrong opcode for MOVAPS when attempting to restore callee-saved registers. The bug appears to originate in the earliest incarnations of libco. It seems to have been fixed some years ago in some of the libretro repositories when an update was pulled from the mainline libco repository (e.g. here). Many of the repositories hold a broken copy though, and it does result in real-world crashes and/or broken behavior.

More specifically, lines 62 through 71 use opcode 0x29 instead of 0x28. Semantically, the difference is as follows:

  • 0x28 : (reg <-- [mem])
  • 0x29 : ([mem] <-- reg)

The former is for storing a memory value into a register, and the latter is for storing register contents into a memory location.

The earliest incarnations of the code were uncommented and written in an era when 64-bit processors were rare, so it isn't surprising it wasn't fixed in mainline for almost a decade. Interestingly, eventually the incorrect code was commented with the actual assembly it represents and even it shows the operations are incorrect (in the libretro copies). The original author of those comments probably didn't realize the implications at the time.

This issue was found while investigating a crash bug in scummvm.

I've determined that the following scummvm games are affected by this issue and either crash or have severe display issues as result of the fault:

  • Beneathe a Steel Sky
  • The Bizarre Adventures of Woodruff and the Schnibble
  • Bud Tucker in Double Trouble
  • Gobliins 2
  • Goblins 3 / Goblins Quest 3
  • Legend of Kyrandia
  • Legend of Kyrandia 3
  • Might and Magic IV: Clouds of Xeen
  • Might and Magic V: Darkside of Xeen
  • Might and Magic IV & V: World of Xeen
  • Might and Magic: Swords of Xeen
  • Toonstruck

The following scummvm games are most likely affected by this issue but I am unable to confirm:

  • Drascula - The Vampire Strikes Back
  • Leisure Suit Larry 7 - Love for Sail
  • The Neverhood Chronicles
  • Phantasmagoria 2
  • Starship Titanic
  • Touche
  • Tony Tough and the Night of the Roasted Moths

Here is a list of related open issue reports that have a high probability of having been caused by this bug:

The following is a list of ALL cores that currently contain the faulty code in the main branch. I am unable to determine if this causes fault behavior (in practice) on any of the cores outside of scummvm:
- 3dengine

The following is a list of additional subrepositories within the libretro organization that currently contain the faulty code in the main branch:

The following is a list of 3rd party libretro repos that currently contain the faulty code in the main branch:

@hizzlekizzle
Copy link
Contributor

well damn, thanks for the excellent bug report!

@snaphat
Copy link
Author

snaphat commented Apr 3, 2022

Couple of things I neglected to mention are as follows:

  1. Fault behavior (in practice) will be dependent on the exact compiler used, version, compilation options, and core code itself. So some games may just work in scummvm on one version of retroarch or core and then break in another, etc.
  2. -O2 compilation does not result in fault behavior because the xmm registers are not used to store pointers. This is why the debug build of retroarch does not exhibit the bug (as some of the reports indicate).
  3. Declaring local variables as volatile works around the issue by ensuring that they are not saved into xmm registers.
  4. Binaries compiled with -03 and -g exhibit the issue. Noting this due to a comment I saw in one of the linked threads regarding debugging this issue on release builds of Retroarch. Issues like these that only show up in 'release builds' can be debugged by removing the stripping, adding -g, and providing an executable to users.

@i30817
Copy link

i30817 commented Apr 10, 2022

Does this only affect windows and not linux?

@snaphat
Copy link
Author

snaphat commented Apr 11, 2022

Does this only affect windows and not linux?

Linux uses the System V ABI which doesn't require callee-saving the XMM6-XMM15 registers so it shouldn't be affected by this issue specifically.

I reviewed the libco call-site code for both Windows 64-bit and System V 64-bit and didn't see any issues outside of this particular XMM issue. All other callee-save registers appear to be properly saved and restored in both the inline assembly and inline machine code for both ABIs.

For reference to the calling conventions, see Table 4 here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants