
core: Rewrite thread local storage implementation #118

Merged (1 commit) on May 1, 2024


raphaelthegreat (Collaborator) commented Apr 30, 2024

It's not uncommon for PS4 guest applications to launch and use many threads, so thread-local storage has to be handled properly. On x86-64, thread-local accesses are performed by loading the thread pointer through the FS segment register. This is a problem because Windows doesn't let you change the value of this register to what the guest expects. (Edit: not quite true, see the first reply.)
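For reference, a common guest sequence for reading the thread pointer is `mov rax, qword ptr fs:[0]`, encoded as `64 48 8B 04 25 00 00 00 00`. A minimal sketch of recognizing this form, including the same load into other registers, might look like the following. `DecodeFsLoad` and `FsLoad` are hypothetical names. Other encodings of FS-relative loads are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical result type: which register receives the FS-relative load.
struct FsLoad {
    int dest_reg;       // 0 = rax, 1 = rcx, 2 = rdx, ..., 8 = r8, ..., 15 = r15
    std::int32_t disp;  // offset added to the FS base (0 reads the thread pointer itself)
};

constexpr std::size_t kFsLoadLength = 9;

// Recognizes only the absolute-disp32 form
//   64 | REX.W[+R] | 8B | ModRM(mod=00, rm=100) | SIB 25 | disp32
// i.e. `mov r64, qword ptr fs:[disp32]`. Anything else returns nullopt.
std::optional<FsLoad> DecodeFsLoad(const std::uint8_t* code, std::size_t len) {
    assert(code != nullptr || len == 0);
    if (len < kFsLoadLength) return std::nullopt;
    if (code[0] != 0x64) return std::nullopt;         // FS segment-override prefix
    const std::uint8_t rex = code[1];
    if ((rex & 0xFB) != 0x48) return std::nullopt;    // REX.W set; REX.R allowed; X and B clear
    if (code[2] != 0x8B) return std::nullopt;         // MOV r64, r/m64
    const std::uint8_t modrm = code[3];
    if ((modrm & 0xC7) != 0x04) return std::nullopt;  // mod = 00, rm = 100 (SIB byte follows)
    if (code[4] != 0x25) return std::nullopt;         // SIB: no index, no base, disp32 only

    // Destination register = REX.R : ModRM.reg
    const int reg = (((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7);

    // disp32 is little-endian: assemble it byte by byte, then reinterpret as signed.
    const std::uint32_t raw = std::uint32_t{code[5]} | (std::uint32_t{code[6]} << 8) |
                              (std::uint32_t{code[7]} << 16) | (std::uint32_t{code[8]} << 24);
    std::int32_t disp;
    std::memcpy(&disp, &raw, sizeof disp);
    return FsLoad{reg, disp};
}
```

Because the instruction is 9 bytes long, it has enough room to be overwritten in place by a 5-byte jump.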

On master, this is handled by a simple exception handler that sets the destination register to the address of a thread_local buffer. This works for now but will become a problem later on. Every access traps into the handler, so the performance cost is large. In addition, the new texture cache does fault tracking and therefore also needs a custom exception handler, so the two handlers end up conflicting. Guest apps can also use negative offsets when accessing the buffer, and in those cases the current implementation reads outside the buffer, which is undefined behavior.
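To see why negative offsets matter, here is a minimal model. It assumes the guest follows the standard x86-64 ELF TLS layout (variant II), where the static TLS block sits directly below the thread pointer and `fs:[0]` holds a self-pointer to the TCB. `GuestTlsImage` and `ContainsFromBufferStart` are hypothetical helpers:

- The first allocates the TLS data and the TCB together, so `tp - k` stays in bounds.
- The second shows that when the thread pointer is simply the start of a buffer, every negative offset falls outside it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a variant II TLS image: [ TLS data | TCB ], thread pointer at the TCB.
class GuestTlsImage {
public:
    GuestTlsImage(std::size_t tls_size, std::size_t tcb_size)
        : storage_(tls_size + tcb_size), tls_size_(tls_size) {
        assert(tcb_size >= sizeof(std::uintptr_t));
        // The first word of the TCB points back at itself (what fs:[0] returns).
        const auto self = reinterpret_cast<std::uintptr_t>(ThreadPointer());
        std::memcpy(ThreadPointer(), &self, sizeof self);
    }
    GuestTlsImage(const GuestTlsImage&) = delete;             // a copy would break the self-pointer
    GuestTlsImage& operator=(const GuestTlsImage&) = delete;

    // Value the guest expects to read from fs:[0].
    std::uint8_t* ThreadPointer() { return storage_.data() + tls_size_; }

    // True if an access of `width` bytes at tp + offset stays inside the allocation.
    bool Contains(std::ptrdiff_t offset, std::size_t width) const {
        const auto lo = -static_cast<std::ptrdiff_t>(tls_size_);
        const auto hi = static_cast<std::ptrdiff_t>(storage_.size() - tls_size_);
        return offset >= lo && offset + static_cast<std::ptrdiff_t>(width) <= hi;
    }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t tls_size_;
};

// Same check when the "thread pointer" is just the start of a plain buffer.
bool ContainsFromBufferStart(std::size_t buffer_size, std::ptrdiff_t offset, std::size_t width) {
    return offset >= 0 &&
           offset + static_cast<std::ptrdiff_t>(width) <= static_cast<std::ptrdiff_t>(buffer_size);
}
```

With a 0x100-byte TLS block, a guest read at `fs:[-0x10]` is valid in the first layout. The same offset is out of bounds when the pointer handed to the guest is the start of a buffer.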

This PR attempts to fix all of the above by using assembly trampolines instead of the exception handler. To store the TLS image pointer, a new TLS slot is allocated from the parent process, and the logic from Wine's TlsGetValue is used to retrieve the value. This also means we don't have to rely on undocumented or unused space in the TEB structure to store our data. Each mov instruction that reads from the FS segment is patched with a jump to a trampoline that loads the actual pointer.
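Two pieces of this can be sketched in C++, under stated assumptions:

- **The jump patch.** The patched instruction is assumed to be the 9-byte `mov r64, qword ptr fs:[disp32]` form, which leaves room for a 5-byte `jmp rel32`. That jump only reaches ±2 GiB, so trampolines must be allocated near the code.
- **The slot lookup.** This follows the shape of Wine's TlsGetValue: the first 64 indices live in `TlsSlots` and the rest in the lazily allocated `TlsExpansionSlots`.

`MakeJumpPatch`, `FakeTeb` and `TlsGetValueLike` are hypothetical names. `FakeTeb` is not the real TEB layout, and the PR's actual trampoline code may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Length of `mov r64, qword ptr fs:[disp32]` (64 REX 8B ModRM SIB disp32).
constexpr std::size_t kPatchLength = 9;

// Builds the bytes that replace an FS load at `site`: a 5-byte `jmp rel32` to
// `trampoline` plus NOP padding. The trampoline is assumed to jump back to site + 9.
std::optional<std::array<std::uint8_t, kPatchLength>>
MakeJumpPatch(std::uint64_t site, std::uint64_t trampoline) {
    // rel32 is measured from the end of the 5-byte jmp instruction.
    const auto rel = static_cast<std::int64_t>(trampoline - (site + 5));
    if (rel < std::numeric_limits<std::int32_t>::min() ||
        rel > std::numeric_limits<std::int32_t>::max()) {
        return std::nullopt;  // trampoline is not within +/-2 GiB of the patch site
    }
    std::array<std::uint8_t, kPatchLength> patch;
    patch.fill(0x90);  // NOP padding; never executed because the trampoline returns to site + 9
    patch[0] = 0xE9;   // JMP rel32
    const auto rel32 = static_cast<std::uint32_t>(rel);
    for (std::size_t i = 0; i < 4; ++i) {
        patch[1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));  // little-endian
    }
    return patch;
}

// Hypothetical stand-in for the two TEB fields the lookup touches (not the real layout).
struct FakeTeb {
    std::array<void*, 64> tls_slots{};     // TEB::TlsSlots
    void** tls_expansion_slots = nullptr;  // TEB::TlsExpansionSlots, allocated on demand
};

constexpr std::uint32_t kTlsMinimumAvailable = 64;
constexpr std::uint32_t kTlsExpansionCount = 1024;

// Mirrors the lookup order of Wine's TlsGetValue, without the SetLastError calls.
void* TlsGetValueLike(const FakeTeb& teb, std::uint32_t index) {
    if (index < kTlsMinimumAvailable) return teb.tls_slots[index];
    index -= kTlsMinimumAvailable;
    if (index >= kTlsExpansionCount || teb.tls_expansion_slots == nullptr) return nullptr;
    return teb.tls_expansion_slots[index];
}
```

Since the slot index is fixed once it is allocated, a trampoline can presumably pick the `TlsSlots` or `TlsExpansionSlots` path when it is generated, rather than branching on every access.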

While at it, I also fixed a fault-tracking problem that caused crashes in the pngdec demo. Tracking was performed at the texture cache's page granularity, when it should operate on 4 KB boundaries, matching the host and guest page size. I also increased the cache page size to vastly reduce the number of page table accesses.
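The two granularities can be sketched as follows, assuming a hypothetical 1 MiB cache page (`page_bits = 20`); the PR does not state the actual value. Both helpers are hypothetical:

- `ProtectRange` rounds a range out to 4 KiB boundaries for protection and fault tracking.
- `CachePagesTouched` counts how many cache-page entries a lookup over the same range walks.

```cpp
#include <cassert>
#include <cstdint>

// 4 KiB: the page size shared by host and guest, used for protection and fault tracking.
constexpr std::uint64_t kProtectPageBits = 12;
constexpr std::uint64_t kProtectPageSize = 1ULL << kProtectPageBits;

struct PageRange {
    std::uint64_t begin;  // inclusive, 4 KiB aligned
    std::uint64_t end;    // exclusive, 4 KiB aligned
};

// Rounds [addr, addr + size) out to 4 KiB boundaries, independent of the cache page size.
PageRange ProtectRange(std::uint64_t addr, std::uint64_t size) {
    const std::uint64_t mask = kProtectPageSize - 1;
    return {addr & ~mask, (addr + size + mask) & ~mask};
}

// Number of cache-page entries (page table lookups) covering [addr, addr + size).
std::uint64_t CachePagesTouched(std::uint64_t addr, std::uint64_t size, std::uint64_t page_bits) {
    assert(page_bits < 64);
    if (size == 0) return 0;
    const std::uint64_t first = addr >> page_bits;
    const std::uint64_t last = (addr + size - 1) >> page_bits;
    return last - first + 1;
}
```

For a 16 MiB range starting at 0, `CachePagesTouched(0, 16ULL << 20, 12)` returns 4096 while `CachePagesTouched(0, 16ULL << 20, 20)` returns 16. The range passed to memory protection is still computed on 4 KiB pages.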

red-prig commented:

A small clarification: you can change the fs_base value with a special processor instruction (WRFSBASE), but this turned out to be pointless, since any Windows context switch resets this value to its initial state.

raphaelthegreat (Collaborator, Author) commented:

Thanks, I edited the post to include the clarification.

raphaelthegreat merged commit 1b9bf92 into shadps4-emu:main on May 1, 2024.
6 checks passed