RSP Recompiler has issues with Rogue Squadron and Battle For Naboo #139

Closed
LegendOfDragoon opened this issue Feb 25, 2015 · 24 comments · Fixed by #1033
Comments

@LegendOfDragoon
Contributor

Basically, those two games won't run with LLE gfx enabled. It has something to do with CPU<->RSP sync. Raising the count to a high number like 0x4000 allows the game to run. I will spend some time investigating it today. I highly doubt it's an opcode issue, unless it's a branching problem.

It would help a lot if I knew exactly what PJ64 does for CPU<->RSP sync.

@project64
Owner

On N64 the RSP and CPU run at the same time and use different ops to sync between them. In PJ64 the RSP usually just runs till the end of the task before the CPU continues. I have always wanted to get it to where the RSP ran just a block of code, then the CPU ran a block, then back to the RSP, etc. One of ziggy's tries ran the RSP in a thread so things would work at the same time, but you could easily get non-constant results.
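A minimal sketch of the block-by-block alternation described above, assuming a toy machine where "work" is just a cycle count. `ToyMachine`, `run_cpu_block`, `run_rsp_block` and `run_interleaved` are hypothetical names for illustration, not anything from the PJ64 source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state; none of these names come from Project64. */
typedef struct {
    uint32_t cpu_cycles;   /* work done so far by the emulated CPU */
    uint32_t rsp_cycles;   /* work done so far by the emulated RSP */
    uint32_t rsp_task_len; /* cycles the current RSP task needs in total */
} ToyMachine;

/* Run one fixed-size block on the CPU. */
static void run_cpu_block(ToyMachine *m, uint32_t block)
{
    m->cpu_cycles += block;
}

/* Run at most one block on the RSP; returns true once the task is finished. */
static bool run_rsp_block(ToyMachine *m, uint32_t block)
{
    uint32_t left = m->rsp_task_len - m->rsp_cycles;
    m->rsp_cycles += (left < block) ? left : block;
    return m->rsp_cycles == m->rsp_task_len;
}

/* Alternate RSP and CPU blocks until the RSP task completes.
 * block must be nonzero. Returns how many times control went back to the CPU. */
static uint32_t run_interleaved(ToyMachine *m, uint32_t block)
{
    uint32_t switches = 0;
    while (!run_rsp_block(m, block)) {
        run_cpu_block(m, block);
        switches++;
    }
    return switches;
}
```

With a 1000-cycle task and 256-cycle blocks, the RSP hands control back to the CPU three times before the task ends. Because everything stays on one thread, the result is the same on every run, which is the property the threaded attempt lost.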

@cxd4
Contributor

cxd4 commented Feb 25, 2015

It actually turned out that angrylion's mod of ziggy's RSP fixed the CPU-RSP sync issue by more or less "HLE"-ing it. Within each fetch-decode-execute iteration of the RSP interpreter loop, it tests for the SP_PC_REG values where the infinite loop happens in some of those games, then makes sure some qualifying conditions, such as the game code and instructions, match. If so, it quits the RSP by force. So according to angrylion it's more of a bypass hack than a timing fix to run both at the same time.

@dsx-
Contributor

dsx- commented Feb 25, 2015

this is a regression, works fine with 1.7.0.3 rsp recompiler

@cxd4
Contributor

cxd4 commented Feb 25, 2015

Back then, it didn't emulate SP_SEMAPHORE_REG correctly, where it did an early exit and resumed SP.

So something about how it is handled in the recompiler causes some conflict, I guess.

@LegendOfDragoon
Contributor Author

I think I understand well enough how the sync works from the RSP's side. The part I don't understand too well is what the CPU does after a yield, and how PJ64 handles this.

Spent a few hours testing various things. Now I'm wondering whether all games even use this functionality. Even though Rogue Squadron works in RSP interpreter, it's very slow! I'm thinking this stuff might be game specific.

I tried debugging WDC to get a better idea of how it works. Tweaking the mf status read timeout alters the execution flow! Thanks to Cheat Engine, I was able to freely change the value during runtime :) .

0x0F8  0x40022000   MFC0  V0, SP status
0x0FC  0x13A0FFF7   BEQZ  SP, 0x0DC
0x100  0x8FBC0A70   LW    GP, 0x0A70(SP)
0x104  0x8FBB0A74   LW    K1, 0x0A74(SP)
0x108  0x30420080   ANDI  V0, V0, 0x0080
0x10C  0x001C0E02   SRL   AT, GP, 0x18
0x110  0x94210BB0   LHU   AT, 0x0BB0(AT)
0x114  0x14400003   BNE   V0, R0, 0x124
0x118  0x00000000   NOP
0x11C  0x00200008   JR    AT
0x120  0x23BD0008   ADDI  SP, SP, 0x0008

In this piece of code (WDC), the lower the mf status read timeout, the more often V0 = 0x80. If I set the number to like 0x80 or greater, I never see V0 = 0x80. I added debug code to rsp recompiler just to check this particular code. I figure I might actually be better off just HLEing this stuff. At the very least I kinda understand it better.
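For reference, the MFC0 / ANDI / BNE sequence in the listing boils down to a single bit test on the value read from SP status. A small sketch, assuming the usual read layout where bit 7 (0x0080) is signal 0; `sig0_is_set` is a made-up helper name.

```c
#include <stdbool.h>
#include <stdint.h>

#define SP_STATUS_SIG0 0x0080u /* bit 7 of SP status when read */

/* Mirrors MFC0 V0, SP status / ANDI V0, V0, 0x0080 / BNE V0, R0, 0x124:
 * true means V0 ends up nonzero and the branch to 0x124 is taken. */
static bool sig0_is_set(uint32_t sp_status)
{
    return (sp_status & SP_STATUS_SIG0) != 0;
}
```

So "V0 = 0x80" after the ANDI just means the signal bit was already set at the moment the RSP read the register, which depends on how much CPU work has happened by then.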

As for solving the Rogue Squadron issue, I'm probably just better off carefully examining the RSP recompiler, to look for mistakes in the source code, because stepping through thousands of instructions doesn't seem to give me any clues ;/ . Rofl, maybe it would have been easier to debug the issue with SP_SEMAPHORE_REG in RSP recompiler.. I'll devote some more time today to debugging.

@cxd4
Contributor

cxd4 commented Feb 25, 2015

Might be repeating someone here but RSP/CPU execute in parallel.
So in a single-threaded environment you have to find some way to work out when the CPU executes and when the RSP executes to match the results of them as if they were running at the same time.

So when SP_SEMAPHORE_REG is read or the MF status read timeout hits, it quits the RSP early only to wait for the main CPU to catch up on its part, then resumes work on the RSP. Also, it's 128, not 0x80. :D
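A rough sketch of that early-exit idea, assuming a single running counter of SP status reads. `RspSync`, `on_mf_status_read` and `on_semaphore_read` are hypothetical names, and this is not how any particular plugin actually implements it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t mf_status_reads; /* MFC0 SP status reads since the RSP was last (re)entered */
    uint32_t limit;           /* the "timeout"; assumed to be >= 1 in this sketch */
} RspSync;

/* Call on every MFC0 that reads SP status. Returns true when the RSP should
 * quit early so the CPU can catch up; the counter restarts for the next slice. */
static bool on_mf_status_read(RspSync *s)
{
    if (++s->mf_status_reads < s->limit)
        return false;
    s->mf_status_reads = 0;
    return true;
}

/* In this sketch a semaphore read always yields to the CPU. */
static bool on_semaphore_read(RspSync *s)
{
    s->mf_status_reads = 0;
    return true;
}
```

With `limit` set to 10, the tenth status read in a slice makes the RSP yield; with 4096 it would almost never yield unless the RSP really is spinning on the status register.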

@cxd4
Contributor

cxd4 commented Feb 25, 2015

So maybe there is a part to the recompiler instructions that makes a false assumption on when a global state changes, and that assumption breaks once it has analyzed something that is no longer true after resuming the RSP and waiting for the CPU to update the data and instruction caches. In other words yeah, you might have to analyze the recompiler source itself.

@LegendOfDragoon
Contributor Author

I know they run in parallel on the real hardware. That's why I feel that it's probably best to tweak the mfc0 timeout, based on game specific knowledge. I'll keep debugging certain games.

> Also, it's 128, not 0x80. :D

I developed a bad habit of using Hex too much xD.

I forgot to mention that changing the mfc0 timeout also affects the interpreter's execution flow.

How would you know what number to use for the mfc0 timeout?

So far, one difference I've noticed between the RSP recompiler and the interpreter is that when if ( ( RSP_GPR[RSPOpC.rt].W & SP_SET_INTR ) != 0) is true, the recompiler doesn't exit the RSP after the end of the instruction. But when I set a breakpoint there, it never seemed to be triggered in Rogue Squadron. I know the breakpoint gets triggered in games like WDC.

@Frank-74
Contributor

Would adding dual core support to the cpu/rsp help here? So CPU and RSP run on dedicated cores. Every PC sold since 2006/7, has been at least dual core.

@cxd4
Contributor

cxd4 commented Feb 25, 2015

> Would adding dual core support to the cpu/rsp help here? So CPU and RSP run on dedicated cores.

Some of us tend to view that as a last resort, since assuming more than 1 CPU core can be a bit of a gateway to new problems. But I think you present a good concept...both major components to the RCP are major powerhouses to the performance on hardware, so it would make sense to have RCP in one thread, main CPU in another. It's just challenging to implement without side effects, and even harder with the plugin system in place.

> I know they run in parallel on the real hardware.

So changing mfc0 timeout affects the interval between how much work is done on the RSP in-between resuming to continue the primary, master CPU host loop. So naturally it affects interpreter execution flow because the data is updated by the CPU before resuming the RSP too early.

> How would you know what number to use for the mfc0 timeout?

I was fine with just 10. Only reason I was inclined to change it to something unreachably high like 4096 was so I wouldn't have to make it a configurable option for different emulators that might or might not support it.

Project64 should probably be counting the number of MF status reads per an array of 32 possible GPRs, rather than counting the total number of MF status reads as it currently does. The number of times an MFC0 instruction was encountered to read from SP_STATUS_REG is only really significant to indicate a permanent loop if it's the same exact MFC0 instruction, so I think MF status reads should be counted in an array of all 32 possible MFC0 instructions reading from SP_STATUS rather than total. This way you could set a more accurate count.
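A sketch of the per-instruction counting suggested here, assuming the 32 variants are told apart by the destination GPR (the rt field of the MFC0). `MfStatusCounters` and `count_mf_status_read` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define GPR_COUNT 32

typedef struct {
    uint32_t reads[GPR_COUNT]; /* one counter per destination GPR (rt field) */
    uint32_t limit;            /* reads of the same variant before yielding */
} MfStatusCounters;

/* Count one MFC0 rt, SP status. Returns true only when the same variant has
 * been seen `limit` times; all counters are then cleared for the next slice. */
static bool count_mf_status_read(MfStatusCounters *c, unsigned rt)
{
    rt &= GPR_COUNT - 1; /* keep the index inside the array */
    if (++c->reads[rt] < c->limit)
        return false;
    for (unsigned i = 0; i < GPR_COUNT; i++)
        c->reads[i] = 0;
    return true;
}
```

With a limit of 4, six reads spread evenly over two different registers do not trigger an exit, while a fourth read into the same register does. A single total counter would have fired on the fourth read regardless of which instruction issued it.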

> I developed a bad habit of using Hex too much xD.

I don't mind saying 0x0080 like pj64 did, just 0x80 that bothers me. It's an ANDI instruction so using hex for that makes perfect sense as hexadecimal is the best numerical base we have in C for modeling bit-wise or binary logic.

@LegendOfDragoon
Contributor Author

I guess I'll need to do more experimenting to figure out if the number is supposed to be low, or if having it low is just a side effect and V0 should never = 0x80.

So the whole point is to just exit out of permanent loops? If so, then 10 is too low of a number. Basically, what I'd like to know now is what should happen for instructions like 0x0F8 0x40022000 MFC0 V0, SP status, because it's not inside a permanent loop. V0 will only = 0x0080 after the ANDI V0, V0, 0x0080, if the mfc0 timeout is low enough. I know that it's different since we're not running it in parallel. Just wondering if the count is supposed to be low enough for this to happen, or if the count should be higher so that stuff like that never happens.

I agree with your suggestion of using an array for the counters though.

Well, time to experiment more with Rogue Squadron.

@cxd4
Contributor

cxd4 commented Feb 25, 2015

> V0 will only = 0x0080 after the ANDI V0, V0, 0x0080, if the mfc0 timeout is low enough.

If mfc0 timeout is low enough, the RSP restarts, early exits and resumes more and more frequently. So a timeout of 10 instead of 4096 like I am doing in my RSP more closely guarantees that the host CPU has done as much of its part as possible before resuming to the RSP again.

If you do too much work in the RSP and not enough in the main CPU, the RSP will be in a permanent loop waiting on the main CPU.

If you do too much work in the main CPU and not enough in the RSP, the main CPU could be in a permanent loop waiting on the RSP.

How often should the RSP early exit and wait for the CPU to update state machine properties co-shared with the RSP? Should it be an MFC0 timeout of 10 for extremely often or 4096 for rarely if ever at all? Well it depends. I'm sure the perfect "timing" would be a constant variable falling in between both of those values. Since I don't care to guesstimate it, I make it 4096 to make early exiting the RSP almost never happen unless a game really is absolutely stuck in a permanent loop.

Bear in mind that this is all just a hack. You could be detecting infinite loops based on counting high frequencies of not just MFC0 SP STATUS, but also ANDI or AND by SP_STATUS_SIG0, SP_STATUS_SIG1, or SP_STATUS_SIG2, also other instructions that have way too high a count to make sense that it is still continuing normally in the RSP. The only reason zilmar and I chose MFC0 is because that's the only instruction that, should it be repeated way too often, proves that the RSP is asking what the N64 has stored in the status register so far. Any other instruction does not necessarily mean either the CPU or the RSP is waiting on the other.

@LegendOfDragoon
Contributor Author

Lol, I guess I was overthinking things. Had fun experimenting though :) . I'll have to take it slow and carefully review the code.

I'm intrigued at how hacking the timeout can make WDC work for PJ64 1.4. I'm wondering why it works on older versions of PJ64, but not 1964 or Mupen. So far I know of 3 games that require this CPU<->RSP sync: WDC, Stunt Racer, and Gauntlet. Are there any other games that require it? For the semaphore, I know that Mario no Photopi requires it for LLE. Are there any other games that need it as well?

@cxd4
Contributor

cxd4 commented Feb 27, 2015

Also have to have it mid-game for other games probably, like Animal Forest.

@LegendOfDragoon
Contributor Author

Idk how I didn't realize this sooner, but Rogue Squadron is actually able to run with RSP recompiler if you do HLE audio. It's thanks to that RSP jump table setting. So for Rogue Squadron I think the problem is just the semaphore lock.

CPU<->RSP sync is still a problem in other Factor 5 games like Battle For Naboo.

@LegendOfDragoon
Contributor Author

OK, I think I understand the problem better now. For Star Wars Rogue Squadron, the issue is simply that the number of CRC maps exceeds the max. I'm not sure what's a good number to change it to. As for Indiana Jones, the issue is that the current CRC algorithm for the jump table isn't reliable enough when using the RSP_MfStatusCount implementation. I'm afraid that tweaking the CRC algorithm could have side effects in other games, as well as possibly further slowing down the performance.
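To illustrate why a fixed cap on CRC maps can hurt, here is a toy sketch of a capped CRC-to-slot table. Everything in it is an assumption for illustration: `MapTable`, `find_or_add_map`, the cap of 4, and especially the overflow policy (throw everything away and start over), which the thread does not describe.

```c
#include <stdint.h>

#define MAX_MAPS 4 /* hypothetical small cap to keep the example short */

typedef struct {
    uint32_t crc[MAX_MAPS]; /* CRC of each known microcode image */
    uint32_t count;         /* slots currently in use */
    uint32_t flushes;       /* how often the table overflowed and was reset */
} MapTable;

/* Return the slot index for a CRC, adding it if it has not been seen.
 * Assumption: on overflow the whole table is discarded and rebuilt. */
static uint32_t find_or_add_map(MapTable *t, uint32_t crc)
{
    for (uint32_t i = 0; i < t->count; i++)
        if (t->crc[i] == crc)
            return i;
    if (t->count == MAX_MAPS) {
        t->count = 0;
        t->flushes++;
    }
    t->crc[t->count] = crc;
    return t->count++;
}
```

Under that assumed policy, a game that cycles through more distinct code images than the cap allows keeps flushing the table, so previously compiled code is lost over and over. A weak CRC has the opposite failure mode: two different images map to the same slot.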

I'm still experimenting with the jump table crc code, but I'll try and see what I can come up with.

@project64 I think a good idea is to make an RDB option for the RSP_MfStatusCount limit. Would you like me to add the setting? Also, I think it would be nice to be able to disable it by choice as well (perhaps if the user sets the count limit to 0). It seems that it's best to tweak this setting per game, because a game like World Driver Championship may do well with 10, but for Rogue Squadron it causes graphical glitches with Jabo's D3D8 (when using both LLE audio & gfx).
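A sketch of what such a per-game limit could look like, assuming a plain in-memory table standing in for RDB entries. `GameSetting`, `lookup_limit` and `should_exit_rsp` are hypothetical names, and the values are examples only (10 for WDC comes from the comment above; 0 for Rogue Squadron is just to show the disabled case).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *game;
    uint32_t mf_status_limit; /* 0 means the early exit is disabled */
} GameSetting;

/* Hypothetical entries, for illustration only. */
static const GameSetting kSettings[] = {
    { "World Driver Championship", 10 },
    { "Star Wars: Rogue Squadron", 0 },
};

/* Look up the limit for a game; unknown games get 0 (disabled). */
static uint32_t lookup_limit(const char *game)
{
    for (size_t i = 0; i < sizeof kSettings / sizeof kSettings[0]; i++)
        if (strcmp(kSettings[i].game, game) == 0)
            return kSettings[i].mf_status_limit;
    return 0;
}

/* Decide whether the RSP should quit early after `reads` status reads. */
static bool should_exit_rsp(uint32_t limit, uint32_t reads)
{
    return limit != 0 && reads >= limit;
}
```

Treating 0 as "off" keeps it to one setting instead of a separate enable flag, and defaulting unknown games to 0 matches the idea of only turning the hack on where it is known to help.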

I personally like to disable it for all games actually. For WDC, Stunt Racer 64, and Gauntlet, I just wrote game specific code to handle it.

@project64
Owner

@LegendOfDragoon I am happy for them to be RDB settings. I would like it if extending the number of CRC tables could be an RDB setting as well.

@LegendOfDragoon
Contributor Author

I think the first thing I should do is make that option for the mfc0 count. What should the default value be? I think most games probably don't need it, so I'd personally disable it for most games.

Should I also make an option for enabling the semaphore exit code too? I noticed that save states are an issue with these 2 implementations. Probably because the RSP registers are not being saved to the save states.

@purplemarshmallow
Contributor

Is it possible to fix this?
I removed the exit code from the RSP recompiler here
https://github.com/project64/project64/blob/master/Source/RSP/Recompiler%20Ops.c#L1686
and here
https://github.com/project64/project64/blob/master/Source/RSP/Recompiler%20Ops.c#L1708
It fixed both Rogue Squadron and Battle For Naboo. It also seriously improved the performance of the RSP recompiler and fixed many other issues.

If there is no good alternative I think there should be per-game settings

@AmbientMalice
Contributor

@purplemarshmallow If I'm not mistaken, it was originally added to make World Driver Championship not freeze. And it does so quite nicely. (Lovely game. Should be better known.)

But outside of WDC, it just seems to cause random and unpredictable problems.

edit: Edited to clarify.

@purplemarshmallow
Contributor

Yes, but this code should not be enabled globally. It causes many issues in other games and it costs performance.

@LegendOfDragoon
Contributor Author

> If there is no good alternative I think there should be per-game settings

That was the plan I had. Per game settings for the mfc0 count, with 0 disabling the exiting. Problem is, some games may only need it for a certain part, like GB Tower in Pokemon Stadium 2 (could never get Stadium 1 GB Tower working).

There are still other problems with the recompiler.

@purplemarshmallow
Contributor

I made a PR which introduces per-game settings. I think these hacks cause so many problems they can't be on by default. Maybe there will be a better and more accurate solution in the future

@AmbientMalice
Contributor

Great work.

7 participants