RSP Recompiler has issues with Rogue Squadron and Battle For Naboo #139
On the N64, the RSP and CPU run at the same time and use different ops to sync between them. In PJ64 the RSP usually just runs until the end of the task before the CPU continues. I have always wanted to get it to where the RSP ran just a block of code, then the CPU ran a block, then back to the RSP, etc. One of ziggy's tries ran the RSP in a thread so things would work at the same time, but you could easily get non-constant results.
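To make the "RSP block, then CPU block, then back to the RSP" idea concrete, here is a minimal sketch of that alternation. It is not how PJ64 schedules anything; `run_interleaved`, `make_demo` and the toy handshake are all made up for illustration. The point is only that strict alternation on one thread gives the same order every run, unlike the threaded attempt.

```python
from typing import Callable, List, Tuple


def run_interleaved(rsp_block: Callable[[], bool],
                    cpu_block: Callable[[], None],
                    max_slices: int = 10_000) -> List[str]:
    """Alternate one RSP block and one CPU block until the RSP task ends.

    rsp_block() returns True once the RSP task has finished.
    The returned list is the order in which blocks ran.
    """
    order: List[str] = []
    for _ in range(max_slices):
        order.append("rsp")
        if rsp_block():
            return order
        order.append("cpu")
        cpu_block()
    raise RuntimeError("RSP task did not finish within max_slices")


def make_demo() -> Tuple[Callable[[], bool], Callable[[], None]]:
    """Hypothetical workload: the RSP cannot progress until the CPU raises a signal."""
    state = {"signal": False, "rsp_blocks_left": 2}

    def rsp_block() -> bool:
        if not state["signal"]:
            return False  # still spinning, hand control back to the CPU
        state["rsp_blocks_left"] -= 1
        return state["rsp_blocks_left"] == 0

    def cpu_block() -> None:
        state["signal"] = True  # CPU side of the handshake

    return rsp_block, cpu_block


rsp, cpu = make_demo()
print(run_interleaved(rsp, cpu))
```

Because both sides run on one thread in a fixed order, a given input always produces the same interleaving.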
It actually turned out that angrylion's mod of ziggy's RSP fixed the CPU-RSP sync issue by more or less "HLE"-ing it. The SP_PC_REG values where the infinite loop happens in some of those games are tested on each fetch-decode-execute iteration of the RSP interpreter loop, and it then makes sure some qualifying conditions, such as the game code and the instructions, match. If so, it quits the RSP by force. So according to angrylion it's more of a bypass hack than a timing fix to run both at the same time.
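A rough sketch of what such a bypass could look like, going only by the description above. The table contents, the game code string, the instruction word and every function name are placeholders, not values taken from angrylion's source. The fetch loop only advances the PC, and decode/execute is left out.

```python
# Hypothetical table: (game code, SP_PC value) -> instruction word expected there.
# These entries are placeholders for illustration only.
KNOWN_WAIT_LOOPS = {
    ("GAME_A", 0x0A4): 0x40022000,
}

IMEM_PC_MASK = 0xFFC  # IMEM is 4 KiB and instructions are word aligned


def should_force_quit(game_code: str, sp_pc: int, imem: bytes) -> bool:
    """Return True when the RSP sits on a known wait loop for this game."""
    pc = sp_pc & IMEM_PC_MASK
    expected = KNOWN_WAIT_LOOPS.get((game_code, pc))
    if expected is None:
        return False
    word = int.from_bytes(imem[pc:pc + 4], "big")
    return word == expected  # qualifying condition: the instruction must match too


def run_interpreter(game_code: str, imem: bytes, sp_pc: int = 0, max_steps: int = 1000):
    """Tiny fetch loop; the check runs once per iteration, before 'executing'."""
    for step in range(max_steps):
        if should_force_quit(game_code, sp_pc, imem):
            return ("forced_quit", sp_pc, step)
        sp_pc = (sp_pc + 4) & IMEM_PC_MASK  # stand-in for executing one instruction
    return ("ran_out", sp_pc, max_steps)


imem = bytearray(4096)
imem[0x0A4:0x0A8] = (0x40022000).to_bytes(4, "big")
print(run_interpreter("GAME_A", bytes(imem)))
print(run_interpreter("GAME_B", bytes(imem)))
```

The second call never quits early because the game code does not match, which is what makes this a per-game bypass rather than a timing fix.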
This is a regression; it works fine with the 1.7.0.3 RSP recompiler.
Back then, it didn't emulate SP_SEMAPHORE_REG correctly, where it did an early exit and resumed SP. So something about how it is handled in the recompiler causes some conflict, I guess.
I think I understand how the sync works from the RSP's side well enough. The part I don't understand too well is what the CPU does after a yield, and how PJ64 handles this. Spent a few hours testing various things. Now I'm wondering whether all games even use this functionality. Even though Rogue Squadron works in the RSP interpreter, it's very slow! I'm thinking this stuff might be game specific. I tried debugging WDC to get a better idea of how it works. Tweaking the mf status read timeout alters the execution flow! Thanks to Cheat Engine, I was able to freely change the value during runtime :) .
In this piece of code (WDC), the lower the mf status read timeout, the more often V0 = 0x80. If I set the number to like 0x80 or greater, I never see V0 = 0x80. I added debug code to rsp recompiler just to check this particular code. I figure I might actually be better off just HLEing this stuff. At the very least I kinda understand it better. As for solving the Rogue Squadron issue, I'm probably just better off carefully examining the RSP recompiler, to look for mistakes in the source code, because stepping through thousands of instructions doesn't seem to give me any clues ;/ . Rofl, maybe it would have been easier to debug the issue with SP_SEMAPHORE_REG in RSP recompiler.. I'll devote some more time today to debugging. |
Might be repeating someone here but RSP/CPU execute in parallel. So when SP_SEMAPHORE_REG is read or the MF status read timeout hits, it quits the RSP early only to wait for the main CPU to catch up on its part, then resumes work on the RSP. Also, it's 128, not 0x80. :D |
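Here is a small model of the early exit and resume described above. It is a sketch under assumptions: the op names, `RspState`, and the toy CPU that raises the signal after a timeout exit are all invented, and the real plugin works on MIPS instructions rather than strings. The one hardware detail it leans on is that reading the semaphore returns the old value and leaves it set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RspState:
    semaphore: int = 0        # SP_SEMAPHORE_REG: 0 = free, 1 = taken
    status: int = 0           # SP_STATUS_REG as the RSP reads it
    mf_status_count: int = 0  # MFC0 reads of SP_STATUS since the last exit
    pc: int = 0


def run_rsp(s: RspState, program: List[str], mf_status_limit: int = 10) -> str:
    """Run until the task ends or an early-exit condition is hit.

    Returns 'done', 'semaphore' or 'mf_status_timeout'; the caller lets the
    CPU catch up and then calls run_rsp again to resume at s.pc.
    """
    while s.pc < len(program):
        op = program[s.pc]
        if op == "read_semaphore":
            s.semaphore = 1          # the read itself takes the semaphore
            s.pc += 1
            return "semaphore"       # early exit so the CPU can catch up
        if op == "wait_sig0":
            s.mf_status_count += 1
            if s.status & 0x0080:
                s.pc += 1            # signal seen, leave the spin loop
            elif s.mf_status_count > mf_status_limit:
                s.mf_status_count = 0
                return "mf_status_timeout"
            continue
        s.pc += 1                    # any other op counts as plain work
    return "done"


def run_task(program: List[str], mf_status_limit: int = 10) -> List[str]:
    """Host loop: resume the RSP until done, letting a toy CPU run in between."""
    s = RspState()
    exits = []
    while True:
        reason = run_rsp(s, program, mf_status_limit)
        exits.append(reason)
        if reason == "done":
            return exits
        if reason == "mf_status_timeout":
            s.status |= 0x0080       # toy CPU raises the signal the RSP waits on


print(run_task(["work", "read_semaphore", "work", "wait_sig0", "work"]))
```

Each early exit is a point where cached assumptions about shared state can go stale, which is the kind of thing a recompiler has to be careful about.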
So maybe there is a part to the recompiler instructions that makes a false assumption on when a global state changes, and that assumption breaks once it has analyzed something that is no longer true after resuming the RSP and waiting for the CPU to update the data and instruction caches. In other words yeah, you might have to analyze the recompiler source itself. |
I know they run in parallel on the real hardware. That's why I feel that it's probably best to tweak the mfc0 timeout, based on game specific knowledge. I'll keep debugging certain games.
I developed a bad habit of using hex too much xD. I forgot to mention that changing the mfc0 timeout also affects the interpreter's execution flow. How would you know what number to use for the mfc0 timeout? So far one difference I've noticed between RSP recompiler and interpreter is that when
Would adding dual core support to the cpu/rsp help here? So CPU and RSP run on dedicated cores. Every PC sold since 2006/7, has been at least dual core. |
Some of us tend to view that as a last resort, since assuming more than 1 CPU core can be a bit of a gateway to new problems. But I think you present a good concept...both major components to the RCP are major powerhouses to the performance on hardware, so it would make sense to have RCP in one thread, main CPU in another. It's just challenging to implement without side effects, and even harder with the plugin system in place.
So changing the mfc0 timeout changes how much work is done on the RSP before it exits and lets the primary, master CPU host loop continue. So naturally it affects interpreter execution flow, because it decides how much data the CPU has updated before the RSP resumes.
I was fine with just 10. Only reason I was inclined to change it to something unreachably high like 4096 was so I wouldn't have to make it a configurable option for different emulators that might or might not support it. Project64 should probably be counting the number of MF status reads per an array of 32 possible GPRs, rather than counting the total number of MF status reads as it currently does. The number of times an MFC0 instruction was encountered to read from SP_STATUS_REG is only really significant to indicate a permanent loop if it's the same exact MFC0 instruction, so I think MF status reads should be counted in an array of all 32 possible MFC0 instructions reading from SP_STATUS rather than total. This way you could set a more accurate count.
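A sketch of the counting change suggested above: one counter per destination GPR instead of a single total. The class and helper names are made up, and the reset policy is an assumption since the comment does not specify one.

```python
from typing import Iterable


class MfStatusCounters:
    """Counts `MFC0 rt, SP_STATUS` reads separately for each of the 32 GPRs."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts = [0] * 32

    def on_mfc0_status(self, rt: int) -> bool:
        """Record one read into register rt; True means this rt passed the limit."""
        rt &= 31
        self.counts[rt] += 1
        return self.counts[rt] > self.limit

    def reset(self) -> None:
        # Assumed policy: clear everything whenever the RSP exits or a task starts.
        self.counts = [0] * 32


def total_counter_trips(reads: Iterable[int], limit: int) -> bool:
    """Single total counter, as the comment above describes the current behaviour."""
    return sum(1 for _ in reads) > limit


def per_register_trips(reads: Iterable[int], limit: int) -> bool:
    counters = MfStatusCounters(limit)
    return any(counters.on_mfc0_status(rt) for rt in reads)


mixed = [2, 3] * 10   # 20 reads spread over two different MFC0 instructions
spin = [2] * 11       # 11 reads by the same MFC0 instruction
print(total_counter_trips(mixed, 10), per_register_trips(mixed, 10))
print(total_counter_trips(spin, 10), per_register_trips(spin, 10))
```

With a limit of 10, the mixed sequence trips the total counter but not the per-register one, while the genuine spin trips both.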
I don't mind saying 0x0080 like pj64 did, just 0x80 that bothers me. It's an ANDI instruction so using hex for that makes perfect sense as hexadecimal is the best numerical base we have in C for modeling bit-wise or binary logic. |
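For anyone following along, the three spellings are the same number, and hex just makes the bit position visible. A small sketch, assuming the mask being discussed is the SIG0 bit of SP_STATUS (the example status value is made up):

```python
SP_STATUS_SIG0 = 0x0080  # bit 7 of SP_STATUS as read by the RSP


def andi(value: int, imm16: int) -> int:
    """MIPS ANDI: bitwise AND with a zero-extended 16-bit immediate."""
    return value & (imm16 & 0xFFFF)


assert 0x80 == 0x0080 == 128 == 0b1000_0000

status = 0x0081  # hypothetical read: halt bit and SIG0 both set
v0 = andi(status, SP_STATUS_SIG0)
print(hex(v0), v0)
```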
I guess I'll need to do more experimenting to figure out if the number is supposed to be low, or if having it low is just a side effect and V0 should never = 0x80. So the whole point is to just exit out of permanent loops? If so, then 10 is too low of a number. Basically what I'd like to know now, is what should happen for instructions like I agree with your suggestion of using an array for the counters though. Well, time to experiment more with Rogue Squadron. |
If the mfc0 timeout is low enough, the RSP restarts, early exits and resumes more and more frequently. So a timeout of 10, instead of 4096 like I am doing in my RSP, more closely guarantees that the host CPU has done as much of its part as possible before resuming the RSP again. If you do too much work in the RSP and not enough in the main CPU, the RSP will be in a permanent loop waiting on the main CPU. If you do too much work in the main CPU and not enough in the RSP, the main CPU could be in a permanent loop waiting on the RSP. How often should the RSP early exit and wait for the CPU to update the state machine properties it shares with the RSP? Should it be an MFC0 timeout of 10 for extremely often, or 4096 for rarely if ever at all? Well, it depends. I'm sure the perfect "timing" would be a constant falling in between both of those values. Since I don't care to guesstimate it, I make it 4096 so that early exiting the RSP almost never happens unless a game really is absolutely stuck in a permanent loop. Bear in mind that this is all just a hack. You could be detecting infinite loops based on counting high frequencies of not just MFC0 SP STATUS, but also ANDI or AND by SP_STATUS_SIG0, SP_STATUS_SIG1, or SP_STATUS_SIG2, and also other instructions that have way too high a count for it to make sense that the RSP is still continuing normally. The only reason zilmar and I chose MFC0 is because that's the only instruction that, should it be repeated way too often, proves that the RSP is asking what the N64 has stored in the status register so far. Any other instruction does not necessarily mean either the CPU or the RSP is waiting on the other.
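To put rough numbers on that tradeoff, here is a toy model, not a measurement of any game. It assumes the counter resets on every early exit, that the CPU gets exactly one slice per exit, and that a timeout of 0 means "never exit early". All function names and the workload figures are invented.

```python
from typing import Optional, Tuple


def wait_cost(timeout: int, cpu_slices_needed: int,
              max_polls: int = 100_000) -> Optional[Tuple[int, int]]:
    """RSP spins on SP_STATUS until the CPU has had enough slices to answer.

    Returns (early_exits, status_polls), or None if the RSP never gets out.
    """
    early_exits = polls = cpu_slices = count = 0
    while cpu_slices < cpu_slices_needed:
        polls += 1
        count += 1
        if polls >= max_polls:
            return None               # permanent loop: the CPU never got to run
        if timeout and count > timeout:
            count = 0
            early_exits += 1
            cpu_slices += 1           # CPU runs one slice while the RSP is parked
    return early_exits, polls


def spurious_exits(status_reads_in_task: int, timeout: int) -> int:
    """Early exits caused by ordinary status reads in a task that is not stuck."""
    if timeout == 0:
        return 0
    return status_reads_in_task // (timeout + 1)


for t in (10, 4096, 0):
    print(t, wait_cost(t, cpu_slices_needed=3), spurious_exits(25, t))
```

In this model a low timeout wastes few polls on a real wait but interrupts healthy tasks, a high timeout does the opposite, and 0 hangs on a real wait.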
Lol, I guess I was overthinking things. Had fun experimenting though :) . I'll have to take it slow and review the code carefully. I'm intrigued at how hacking the timeout can make WDC work for PJ64 1.4. I'm wondering why it works on older versions of PJ64, but not 1964 or Mupen. So far I know of 3 games that require this CPU<->RSP sync: WDC, Stunt Racer, and Gauntlet. Are there any other games that require it? For the semaphore, I know that Mario No Photopie requires it for LLE. Are there any other games that need it as well?
Also have to have it mid-game for other games probably, like Animal Forest. |
Idk how I didn't realize this sooner, but Rogue Squadron is actually able to run with the RSP recompiler if you do HLE audio. It's thanks to that RSP jump table setting. So for Rogue Squadron I think the problem is just the semaphore lock. CPU<->RSP sync is still a problem in other Factor 5 games like Battle For Naboo.
Ok, I think I understand the problem better now. For Star Wars Rogue Squadron, the issue is simply that the number of CRC maps exceeds the max. I'm not sure what's a good number to change it to. As for Indiana Jones, the issue is that the current CRC algorithm for the jump table isn't reliable enough when using the RSP_MfStatusCount implementation. I'm afraid that tweaking the CRC algorithm could have side effects in other games, as well as possibly slowing down performance further. I'm still experimenting with the jump table CRC code, but I'll try and see what I can come up with. @project64 I think a good idea is to make an RDB option for the RSP_MfStatusCount limit. Would you like me to add the setting? Also I think it would be nice to be able to disable it by choice as well (perhaps if the user sets the count limit to 0). It seems that it's best to tweak this setting per game, because a game like World Driver Championship may do well with 10, but for Rogue Squadron it causes graphical glitches with Jabo's D3D8 (when using both LLE audio & gfx). I personally like to disable it for all games actually. For WDC, Stunt Racer 64, and Gauntlet, I just wrote game specific code to handle it.
@LegendOfDragoon I am happy for them to be RDB settings. Extending the number of CRC tables could be an RDB setting as well.
I think the first thing I should do is make that option for the mfc0 count. What should the default value be? I think most games probably don't need it, so I'd personally disable it for most games. Should I also make an option for enabling the semaphore exit code too? I noticed that save states are an issue with these 2 implementations. Probably because the RSP registers are not being saved to the save states. |
Is it possible to fix this? If there is no good alternative I think there should be per-game settings |
@purplemarshmallow If I'm not mistaken, it was originally added to make World Driver Championship not freeze. And it does so quite nicely. (Lovely game. Should be better known.) But outside of WDC, it just seems to cause random and unpredictable problems. edit: Edited to clarify. |
Yes but this code should not be enabled globally. It causes many issues in other games and it costs performance |
That was the plan I had. Per game settings for the mfc0 count, with 0 disabling the exiting. Problem is, some games may only need it for a certain part, like GB Tower in Pokemon Stadium 2 (could never get Stadium 1 GB Tower working). There are still other problems with the recompiler. |
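The plan above, sketched as a lookup with 0 meaning "disabled". This does not mirror the real RDB format or any actual setting name; the table layout, the key `mf_status_limit` and the function names are assumptions. The value 10 for World Driver Championship is the one mentioned earlier in the thread.

```python
DEFAULT_MF_STATUS_LIMIT = 0  # 0 = never exit the RSP early

# Hypothetical per-game table standing in for RDB entries.
GAME_SETTINGS = {
    "World Driver Championship": {"mf_status_limit": 10},
}


def mf_status_limit_for(game: str) -> int:
    """Per-game limit, falling back to the disabled default."""
    return GAME_SETTINGS.get(game, {}).get("mf_status_limit", DEFAULT_MF_STATUS_LIMIT)


def should_early_exit(mf_status_count: int, limit: int) -> bool:
    """A limit of 0 turns the hack off regardless of the count."""
    return limit > 0 and mf_status_count > limit


for game in ("World Driver Championship", "Some Other Game"):
    limit = mf_status_limit_for(game)
    print(game, limit, should_early_exit(11, limit))
```

A single static limit per game would still not cover titles that only need the exit in one section, which is the Pokemon Stadium 2 GB Tower problem mentioned above.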
I made a PR which introduces per-game settings. I think these hacks cause so many problems they can't be on by default. Maybe there will be a better and more accurate solution in the future |
Great work. |
Basically those 2 games won't run with LLE gfx enabled. It has something to do with CPU<->RSP sync. Raising the count to a high number like 0x4000, allows the game to run. I will spend some time investigating it today. I highly doubt it's an opcode issue, unless it's a branching problem.
It would help a lot, if I knew exactly what PJ64 does for CPU<->RSP sync.