Fix sceKernelUnlockMutex timing #15930

Merged: 1 commit into hrydgard:master from sum2012:kernel-minor on Aug 30, 2022

Conversation

@sum2012 (Collaborator) commented Aug 30, 2022

Fix #6557

This matches the real PSP log:

20:29:15.097838 user_main - -> sceKernelUnlockMutex 0x3EFE17F, 0x1 = 0x0
20:29:15.097979 user_main - <- sceKernelUnlockMutex 0x3EFE17F, 0x1 = 0x0

@sum2012 added the HLE/Kernel label on Aug 30, 2022
@hrydgard added this to the v1.14.0 milestone on Aug 30, 2022
@hrydgard (Owner)

That seems fine.

@hrydgard merged commit fd863be into hrydgard:master on Aug 30, 2022
@sum2012 deleted the kernel-minor branch on August 30, 2022 13:44
@unknownbrackets (Collaborator)

Hm, when a mutex is unlocked it should run another thread. I'm worried this might not be delaying the right thread, even if it appears to work.

Does the game also work if you eat cycles here instead of doing a thread delay?

The log only shows when user_main returns, but presumably the mutex unlock woke up some other thread, which did stuff. This could actually mean that the other thread called some other function that should've taken longer.

-[Unknown]
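
To make the two options concrete, here is a simplified sketch of what each looks like in PPSSPP's HLE layer. hleDelayResult(), hleEatCycles(), and usToCycles() are real PPSSPP helpers, but the function bodies below are illustrative, with the mutex bookkeeping elided:

// Variant names are illustrative; both stand in for sceKernelUnlockMutex.

// This PR's approach: sleep the calling thread for ~150us and deliver the
// return value afterwards, which forces a reschedule away from the caller.
static int UnlockMutex_DelayResult(SceUID uid, int count) {
	// ... validate arguments, release ownership, wake any waiters ...
	return hleDelayResult(0, "mutex unlocked", 150);
}

// The alternative suggested above: charge the cost as CPU cycles to the
// current thread without sleeping it, leaving the scheduling order alone.
static int UnlockMutex_EatCycles(SceUID uid, int count) {
	// ... validate arguments, release ownership, wake any waiters ...
	hleEatCycles(usToCycles(150));
	return 0;
}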

@hrydgard (Owner)

Hm, that's a good point. EatCycles might make more sense.

I do think that mutex unlocks are gonna take a bunch of cycles either way, definitely not zero-cost, and in 150 cycles, I think there's not really enough time to switch to another thread and do something that blocks, and surely mutex unlocks are not completely free. Just the cost of switching threads twice has to be up there around 150, what with all the register saving and loading... (though not sure how well we account for that).

@sum2012 (Collaborator, Author) commented Aug 31, 2022

I think the invalid address is due to the "CRI FS Data Decompression 0" thread.
Attached is the full real PSP log:
Jpcsptracelog.zip
20:27:36.533747 CRI FS Data Decompression 0 - -> sceKernelUnlockMutex 0x3EEC95F, 0x1 = 0x0
20:27:36.533816 CRI FS Data Decompression 0 - <- sceKernelUnlockMutex 0x3EEC95F, 0x1 = 0x0

@sum2012 (Collaborator, Author) commented Aug 31, 2022

Neither hleEatCycles(usToCycles(150)); nor hleEatCycles(500); works.

@unknownbrackets (Collaborator)

> Hm, that's a good point. EatCycles might make more sense.
>
> I do think that mutex unlocks are gonna take a bunch of cycles either way, definitely not zero-cost, and in 150 cycles, I think there's not really enough time to switch to another thread and do something that blocks, and surely mutex unlocks are not completely free. Just the cost of switching threads twice has to be up there around 150, what with all the register saving and loading... (though not sure how well we account for that).

We do account for that in thread switching, based on some tests. The delay here is actually 33,300 cycles, depending on the emulated CPU speed (it's delaying the thread for 150 microseconds, which is 33,300 cycles at the PSP's default 222 MHz clock). It's true this function might otherwise take insufficient time.

My bigger concern is that this doesn't seem like it could possibly be handling the correct behavior of a better-priority thread unlocking a mutex that a worse-priority thread is waiting on. This should not delay the thread that unlocked the mutex, and it should not reschedule to the worse-priority thread. Lots of tests confirmed this. Unless I'm missing something, this change makes that happen.

I'm actually surprised this passes our pspautotests, so maybe hleDelayResult() is buggy in some way that happens to work out here mostly when combined with hleReSchedule(). Worried this might break Radiant Mythology or something.

-[Unknown]
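
The rule at issue, sketched out (lower value = better priority; the types and helpers here are hypothetical names, not PPSSPP's internals):

// On unlock, ownership passes to the best waiter, but the unlocking
// thread keeps the CPU unless that waiter outranks it.
void UnlockMutex(Mutex *mutex, Thread *current) {
	Thread *waiter = mutex->PopBestPriorityWaiter();  // hypothetical helper
	mutex->owner = waiter;  // may be null if nobody was waiting
	if (waiter) {
		MakeReady(waiter);  // hypothetical: move out of the wait state
		// Preempt only for a better-priority waiter; same- or worse-
		// priority waiters run later, when the caller blocks or its
		// timeslice ends, never immediately on unlock.
		if (waiter->priority < current->priority)
			SwitchToThread(waiter);  // hypothetical
	}
}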

@unknownbrackets (Collaborator) commented Sep 1, 2022

I created a quick test (will send to pspautotests), which does show this behavior is incorrect. The threads simply set flag bits: 1 for the worse-priority thread, 2 for the same-priority thread, and 4 for the better-priority thread.

--- PSP output
+++ PPSSPP output (after #15930)
@@ -1,3 +1,3 @@
-[x/] Unlock test
-[x/157]   Unlocked, ran: 4
-[r/322]   Delayed, ran: 7
+[x/] Unlock test
+[r/180]   Unlocked, ran: 7
+[r/250]   Delayed, ran: 7

On a real PSP, the entire unlock process (including running the same-priority thread) does take 157us total. It does not reschedule for anything but the better-priority thread (4) and returns to the current thread before executing any other same-priority thread (which is why it's not 6, i.e. 4 + 2, and also why it shows 'x' rather than 'r' before the time).

However, this change makes it run all the threads (same and worse priority) immediately on unlock, which is incorrect and not how mutexes are supposed to work (just generally, even outside PSP's threadman library).
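
Concretely, each thread ORs its bit into a shared variable once it acquires the lock (see the full test diff further down), so the printed value identifies exactly which threads got to run:

threadsRun |= arg->bits;  // 1 = worse, 2 = same, 4 = better priority
// "ran: 4" -> only the better-priority waiter ran (correct PSP behavior)
// "ran: 7" -> 1|2|4, i.e. all three waiters ran (the incorrect behavior)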

Before this change:

--- PSP output
+++ PPSSPP output (before #15930)
@@ -1,3 +1,3 @@
-[x/] Unlock test
-[x/157]   Unlocked, ran: 4
-[r/322]   Delayed, ran: 7
+[x/] Unlock test
+[x/49]   Unlocked, ran: 4
+[r/250]   Delayed, ran: 7

Indeed, the unlock is too fast and should take longer. Timing aside, though, the results were correct before in terms of which threads were scheduled.

-[Unknown]

@unknownbrackets (Collaborator)

Also, here's another run recording more detail for timing, minimizing overhead noise that could make it wrong:

Times:
  better: 48
  unlock done: 83
  same: 182
  worse: 250
  done: 384

This means the better-priority thread had already run, reaching its own unlock, 48 microseconds in. The caller's unlock was complete after 83 microseconds (not including the overhead of checkpoint()), and control was back in the caller by then. After the main thread delayed, it took another ~100 microseconds for the same-priority thread to reach its unlock, and an additional 68 microseconds before the worse-priority one ran. "done" simply marks when the main thread's delay completed.

Without this PR, the results are:

--- PSP results
+++ PPSSPP results (before #15930)
@@ -2,8 +2,8 @@
 [x]   Unlocked, ran: 4
 [r]   Delayed, ran: 7
 Times:
-  better: 48
-  unlock done: 83
-  same: 182
-  worse: 250
-  done: 384
+  better: 13
+  unlock done: 27
+  same: 72
+  worse: 98
+  done: 275

Using hleEatCycles(7500); goes too far, putting "unlock done" at 94 (past the target of 83), while "same" is only 182, probably because something else isn't burning sufficient time. Using hleEatCycles(6000); (or something between 5000 and 6000) seems closer to right, although "done" still clocks in at 329 with that.
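
In PPSSPP terms, that tuning would be a one-line change in the unlock path; a sketch, assuming the rest of the function stays as it was before this PR:

// Sketch of the tuning described above, not a committed change: charge a
// fixed cycle cost to the unlocking thread instead of delaying its result.
// 6000 cycles is roughly 27 microseconds at the PSP's default 222 MHz.
hleEatCycles(6000);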

These were the test changes (not committing them because we'll never match exact times, so it makes for a useless test):

diff --git a/tests/threads/mutex/unlock2.cpp b/tests/threads/mutex/unlock2.cpp
index 45b3513..37b7492 100644
--- a/tests/threads/mutex/unlock2.cpp
+++ b/tests/threads/mutex/unlock2.cpp
@@ -5,6 +5,7 @@ extern "C" {
 struct ThreadArg {
        SceUID id;
        int bits;
+       u64 unlockTime;
 };

 struct KernelThread {
@@ -18,7 +19,7 @@ struct KernelThread {
        }

        template <typename T>
-       void Start(T &arg) {
+       void Start(T *arg) {
                sceKernelStartThread(id_, sizeof(arg), &arg);
        }

@@ -28,10 +29,11 @@ struct KernelThread {
 static int threadsRun = 0;

 static int waitLock(SceSize argSize, void *argPointer) {
-       ThreadArg *arg = (ThreadArg *)argPointer;
+       ThreadArg *arg = *(ThreadArg **)argPointer;
        SceUInt timeout = 1000;
        schedulingResult = sceKernelLockMutex(arg->id, 1, &timeout);
        threadsRun |= arg->bits;
+       arg->unlockTime = sceKernelGetSystemTimeWide();
        sceKernelUnlockMutex(arg->id, 1);
        return 0;
 }
@@ -53,11 +55,11 @@ extern "C" int main(int argc, char **argv) {
        }

        ThreadArg worseArg = { id, 1 };
-       worseThread.Start(worseArg);
+       worseThread.Start(&worseArg);
        ThreadArg sameArg = { id, 2 };
-       sameThread.Start(sameArg);
+       sameThread.Start(&sameArg);
        ThreadArg betterArg = { id, 4 };
-       betterThread.Start(betterArg);
+       betterThread.Start(&betterArg);

        if (threadsRun != 0) {
                printf("Test failure: threads not blocked by lock\n");
@@ -69,12 +71,22 @@ extern "C" int main(int argc, char **argv) {
        //CHECKPOINT_ENABLE_TIME = 1;

        checkpointNext("Unlock test");
+       u64 beforeUnlock = sceKernelGetSystemTimeWide();
        sceKernelUnlockMutex(id, 1);
        int immediately = threadsRun;
+       u64 unlocked = sceKernelGetSystemTimeWide();
        checkpoint("  Unlocked, ran: %x", immediately);
        sceKernelDelayThread(200);
+       u64 done = sceKernelGetSystemTimeWide();
        checkpoint("  Delayed, ran: %x", threadsRun);

+       schedf("Times:\n");
+       schedf("  better: %lld\n", betterArg.unlockTime - beforeUnlock);
+       schedf("  unlock done: %lld\n", unlocked - beforeUnlock);
+       schedf("  same: %lld\n", sameArg.unlockTime - beforeUnlock);
+       schedf("  worse: %lld\n", worseArg.unlockTime - beforeUnlock);
+       schedf("  done: %lld\n", done - beforeUnlock);
+
        sceKernelDeleteMutex(id);

        return 0;

-[Unknown]

@sum2012 (Collaborator, Author) commented Sep 1, 2022

hleEatCycles(6000); and hleEatCycles(33300); don't work either.

hrydgard added a commit that referenced this pull request on Sep 1, 2022:
This reverts commit fd863be, reversing changes made to 59813ff.
@unknownbrackets (Collaborator)

Most likely something else should be causing the other thread to run before the crash happens, but sceKernelUnlockMutex isn't the right function to change. It must be some other thing before the crash.

The PSP OS just runs the best-priority thread at all times, as long as it's not sleeping/waiting. So it would be good to compare the priority of the thread which called sceKernelUnlockMutex and the thread that needs to run to prevent the crash. Lower values are better.

If the thread that needs to run has a better (lower value) priority than the current thread, it should've run even without this change.
If the thread that needs to run has the same or worse (higher value) priority, it most likely means some function should be delaying, or delaying for longer, but that function is not sceKernelUnlockMutex. It could also mean something should rotate the thread queue, but not many things do that.

-[Unknown]
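
For reference, that comparison can be read straight off the threads involved; a sketch using the PSP SDK's standard threadman query (the helper name is illustrative):

#include <pspthreadman.h>
#include <pspdebug.h>

// Print a thread's priority so the thread calling sceKernelUnlockMutex
// can be compared with the thread that must run to prevent the crash
// (lower value = better priority).
static void LogThreadPriority(const char *label, SceUID thid) {
	SceKernelThreadInfo info;
	info.size = sizeof(info);
	if (sceKernelReferThreadStatus(thid, &info) == 0)
		pspDebugScreenPrintf("%s: priority %d\n", label, info.currentPriority);
}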

Linked issue: Mahou Shoujo Madoka Magica Portable in-game problem (#6557)