Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NES: Rework CNROM, UNROM and UNROM-512 to properly handle NMIs. #198

Merged
merged 4 commits into from
Sep 14, 2023

Conversation

asiekierka
Copy link
Contributor

@asiekierka asiekierka commented Sep 13, 2023

Fixes #195 for these three mappers. I think this leaves only MMC3 as the remaining mapper without NMI handling (MMC1 and Action53 already had their own; NROM does not require it).

I've implemented my own approach to handling NMIs:

  • _BANK_SHADOW is always written to before the mapper's MMIO register itself;
  • _BANK_SHADOW is written to MMIO at an early stage of NMI, allowing developer-side consistency for NMI code written in C once the AXY calling convention is added.

For CNROM and UNROM-512, I've extended this with additional functionality:

  • The graphical part of the register (CHR bank, mirroring) is now deliberately separated between "shadow" (written to the MMIO register on NMI or on user request) and the MMIO register.
  • Writing to the "shadow" without writing to the MMIO register is called a "swap" - this will only apply the result on the next frame. This allows preparing a CHR bank change without affecting the currently drawn frame.
  • Writing to the MMIO register without writing to the "shadow" is called a "split" - this will only apply the result until the next frame. This allows doing consistent mid-frame effects.

As this may not be immediately intuitive to someone tinkering with the code in the future, I have added a test for these two mappers which automatically validates it.

This required removing get_chr_bank on CNROM - well, it didn't strictly require it, but then it would require two RAM bytes instead of one; I expect the functionality to be sufficiently unnecessary as to not be worth the added cost. Likewise, the public API of UNROM-512 has changed; as neither API has yet shipped in a production build of the SDK, this is currently not a breaking change.

@mysterymath
Copy link
Member

mysterymath commented Sep 13, 2023

EDIT: I withdraw most of this; I noticed this was based off of jroweboy's work w/ Action53. I need to read that more carefully and fold this into my opinion.

@asiekierka
Copy link
Contributor Author

asiekierka commented Sep 13, 2023

Perhaps it'd help to get more eyeballs on the PR? There are a few NES LLVM-MOS-SDK users by now; they might have their own opinions to contribute.

@mysterymath
Copy link
Member

Yeah, paging @jroweboy , if you get the chance.

@@ -8,10 +8,37 @@

__attribute__((section(".zp.bss"))) char _BANK_SHADOW;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should always be a real shadow of the register; CNROM shouldn't be the only mapper without this. The other mappers have another byte to deal with this, and one should really be spent here too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be volatile, similarly for the other shadows. They're written in regular contexts and read in interrupt contexts; that's classic volatile.

Copy link
Contributor Author

@asiekierka asiekierka Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should always be a real shadow of the register

UNROM also uses only one byte; it just so happens that it doesn't have any CHR/video-related functionality. UNROM-512 uses two bytes, but that's because it has both CHR/video-related and PRG-related mapper configuration functionality, so it'd require two bytes regardless.

Is there a developer-side reason to allow reading back the CNROM shadow for CHR? It's not the kind of thing where you'd typically have a trampoline/"push-pop" pattern for development, like with doing things with PRG data... In addition, having a separate real shadow requires more cycles spent on upkeeping them (as now NMI can get fired between writing to shadow A and shadow B, as well as shadow B and the actual MMIO register).

Overall, I'm not quite convinced.

This should be volatile, similarly for the other shadows. They're written in regular contexts and read in interrupt contexts; that's classic volatile.

Volatile will force the bank shadow to be present even if the code never reads from it; it is possible to construct a hypothetical NES game's code in such a way that the bank shadow would be unnecessary (NMI using fixed bank only + careful PRG bank arrangement), thus making MMIO register writes atomic and speeding up the whole ordeal a bit. The reason I opted for memory barriers instead is to allow the compiler to optimize _BANK_SHADOW out entirely, but that is currently not possible as the NMI handler cannot be expressed with C code alone; even then, I don't think the compiler would allow modeling what I'm trying to achieve here, so might be best to just move back to volatile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another EDIT: My original reply was substantially quite incorrect; I had misremembered why I added these.

This should always be a real shadow of the register
Is there a developer-side reason to allow reading back the CNROM shadow for CHR? It's not the kind of thing where you'd typically have a trampoline/"push-pop" pattern for development, like with doing things with PRG data... In addition, having a separate real shadow requires more cycles spent on upkeeping them (as now NMI can get fired between writing to shadow A and shadow B, as well as shadow B and the actual MMIO register).

I had originally thought that I had copied these over from nesdoug, but I went back to the original sources, and it looks like they were inventions of mine. The reason for their inclusion was then certainly because they seemed simple to include and for orthogonality with PRG-related shadows, but the closer examination done here reveals that they're not actually very simple to maintain.

There is still an orthogonality argument; if we pull out CHR-ROM getters, we should do it across mappers and take the ABI hit. Increasingly, I think that's what we should do; there's nearly always a good reason to maintain careful PRG state shadows (banked_call, but the case is harder to make for CHR, and it's something that a user can take care of themselves to their desired degree of fidelity. Library code is usually worse than user code, since we have to defensively deal with situations that the user can know never happen, so I'd be in favor of ripping all this out.

Volatile will force the bank shadow to be present even if the code never reads from it; it is possible to construct a hypothetical NES game's code in such a way that the bank shadow would be unnecessary (NMI using fixed bank only + careful PRG bank arrangement), thus making MMIO register writes atomic and speeding up the whole ordeal a bit. The reason I opted for memory barriers instead is to allow the compiler to optimize _BANK_SHADOW out entirely, but that is currently not possible as the NMI handler cannot be expressed with C code alone; even then, I don't think the compiler would allow modeling what I'm trying to achieve here, so might be best to just move back to volatile.

Lost the original comment on this one, but the gist was this: the memory barriers are quite a bit harder to read than volatile would be, and it doesn't appear we get any gain from them today. When we can know for sure that there's a design that can use them to provide tangible benefit, that's a good time to put that in, but the best way to know something like that is to have an implementation of it in-hand. Taking the opposite policy tends to cause strange code to accumulate in projects like this, and it's difficult to root it out once it's forgotten whether or not its quirks are load-bearing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pulling out the CHR getters is fairly separate from this review, so we can table that discussion for this one and introduce a disparity between the mappers. The set/swap/split API seems worth including regardless. I'd also like to see it backported to MMC1/MMC3; we need to figure out the right way forward there, trading off nesdoug tutorial compatibility and internal SDK consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pulling out the CHR getters for now is a good idea; it's less breaking to add an API some time later than to remove an API some time later.

@@ -6,12 +6,32 @@
#include <rompoke.h>
#include "bank.h"

__attribute__((section(".zp.bss"))) char _BANK_SHADOW;
Copy link
Member

@mysterymath mysterymath Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called something like _BANK_NEXT, since it entirely controls the bank as of next NMI, and it has no relation to the current bank state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mysterymath mysterymath merged commit fcc6088 into llvm-mos:main Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Audit NES shadow registers for non-reentrancy
2 participants