Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prestacked annotation proposal #53

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

jnk0le
Copy link

@jnk0le jnk0le commented Oct 29, 2023

@cmuellner
Copy link
Collaborator

I found the a similar command line flag in GCC for ARC: -mirq-ctrl-saved=.

As I don't see why we would have different ISRs with different prestacked register lists, I would prefer a command line switch over a function attribute.

Signed-off-by: jnk0le <jnk0le@hotmail.com>
@jnk0le
Copy link
Author

jnk0le commented Nov 8, 2023

Command line flag does prevent the auxiliary uses as well as e.g. extra shadow registers registers at highest nesting level.
And the custom irq controllers reusing standard return mechanism (return pattern or lsb in ra as in TEIC) will require yet another attribute to work with this flag.

@cmuellner
Copy link
Collaborator

extra shadow registers registers at highest nesting level

Do you have a link to an MCU datasheet that describes such an ISR behavior (different register banking behavior on ISR entry based on nesting level)?

To avoid copy-paste mistakes and to improve portability you would probably not use the prestacked attribute directly, but hide it in an macro. E.g.:

#if defined IRQ_FOO
#define MY_ISR_ATTRIBUTES `__attribute__((interrupt, prestacked("x1,x5-x7,x10-x15")))`
#elif defined IRQ_BAR
#define MY_ISR_ATTRIBUTES `__attribute__((interrupt, prestacked("x7,x10-x15")))`
#endif
...
void USBHS_IRQHandler (void) MY_ISR_ATTRIBUTES;
void USBHS_IRQHandler (void)
{
  ...
}

...and then we already have a global configuration that can be controlled via command line flags (here -DIRQ_FOO or -DIRQ_BAR). That would be equivalent to use __attribute((interrupt)) and -mirq-hw-stacked=x1,x5-x7,x10-x15.

Also, I'm not convinced that full flexibility for all possible HW stacking schemes is really needed, because I believe that features should be derived from available HW.
So I would prefer a list of HW stacking schemes. E.g. -mirq-hw-stacked=[none,all,caller-saved,integer-caller-saved].

@jnk0le
Copy link
Author

jnk0le commented Nov 9, 2023

Do you have a link to an MCU datasheet that describes such an ISR behavior (different register banking behavior on ISR entry based on nesting level)?

ch32v307 has 8 nesting levels with shadow registers only on the 3 lowest levels (should be on highest IMO)
https://github.com/openwch/ch32v307/blob/main/Datasheet/README.md

Also c2000 arch has shadow regfile of floating point registers designated for HPIs, but being a typical CISC, it switches it by separate instructions, emmited by compiler interrupt (HPI) attribute. Even though those could be switched automatically.

Also many other archs (like ARM32 FIQ/IRQ and clones) that are stabilised by compiler attributes and central governance.

To avoid copy-paste mistakes and to improve portability you would probably not use the prestacked attribute directly, but hide it in an macro. E.g.:

I think that the vendors could provide those definitions in device specific headers e.g.

#define MYVENDOR_IRQ_FAST __attribute__((interrupt, prestacked("x1,x5-x7,x10-x15")))
#define MYVENDOR_IRQ_REGULAR __attribute__((interrupt, prestacked("x10-x15")))

E.g. -mirq-hw-stacked=[none,all,caller-saved,integer-caller-saved]

Those will bind any given core to the ABI it was designed for. (i.e. there could appear an IPRA optimized ABI, or just getting rid of tp - not possible to use because HW stacker was designed for ilp32[e])

Also, note that the shadow registers (and decoding them) are quite expensive and typical DSP code doesn't need that many registers (3p3z irq is like ~6-7 regs total (gcc compiled)). So vendors are likely to seek for balance.

@kito-cheng
Copy link
Collaborator

+1 for unified attribute for modeling that, I am tired to maintain __attribute__((interrupt("SiFive-CLIC-preemptible"))) on downsream :P also not interested to maintain many different attribute for those interrupt stuffs, this proposal seem a solution for that :)

Also I would prefer using attribute over command line since some low level run time stuffs may support different HW platform, typically that will support different platform in different files*, and it would be easier to control that within source file rather than manage those stuffs on the build system/build script level.

*: One of example is SiFive's FESDK: https://github.com/sifive/freedom-e-sdk, I also vaguely remember many different project using similar scheme.

@cmuellner
Copy link
Collaborator

Do you have a link to an MCU datasheet that describes such an ISR behavior (different register banking behavior on ISR entry based on nesting level)?

ch32v307 has 8 nesting levels with shadow registers only on the 3 lowest levels (should be on highest IMO) https://github.com/openwch/ch32v307/blob/main/Datasheet/README.md

So they have three sets of shadow registers for caller-saved registers.
The fourth level of interrupt nesting will not save the registers anymore.
However, this has to be handled in SW anyway, because it is the same ISR that is called.
And the behavior is identical for all ISRs, so the attribute does not help to solve this aspect.

Also c2000 arch has shadow regfile of floating point registers designated for HPIs, but being a typical CISC, it switches it by separate instructions, emmited by compiler interrupt (HPI) attribute. Even though those could be switched automatically.

Also many other archs (like ARM32 FIQ/IRQ and clones) that are stabilised by compiler attributes and central governance.

Afaik ARM7 has shadow registers, but only one set of them (per exception type).
So there is not change in the behavior in case of nesting (registers will simply be overwritten).

To avoid copy-paste mistakes and to improve portability you would probably not use the prestacked attribute directly, but hide it in an macro. E.g.:

I think that the vendors could provide those definitions in device specific headers e.g.

#define MYVENDOR_IRQ_FAST __attribute__((interrupt, prestacked("x1,x5-x7,x10-x15")))
#define MYVENDOR_IRQ_REGULAR __attribute__((interrupt, prestacked("x10-x15")))

E.g. -mirq-hw-stacked=[none,all,caller-saved,integer-caller-saved]

Those will bind any given core to the ABI it was designed for. (i.e. there could appear an IPRA optimized ABI, or just getting rid of tp - not possible to use because HW stacker was designed for ilp32[e])

"there could appear" -> does not exist
Hard to discuss a feature, which does not exist in real HW.
But I agree that the list would have to be extended if such hardware comes to the market.
And using registers instead of named register sets addresses this.

@cmuellner
Copy link
Collaborator

+1 for unified attribute for modeling that, I am tired to maintain __attribute__((interrupt("SiFive-CLIC-preemptible"))) on downsream :P also not interested to maintain many different attribute for those interrupt stuffs, this proposal seem a solution for that :)

Also I would prefer using attribute over command line since some low level run time stuffs may support different HW platform, typically that will support different platform in different files*, and it would be easier to control that within source file rather than manage those stuffs on the build system/build script level.

Is there a way to avoid this?
Even the mentioned freedom-e-sdk needs this configuration (https://github.com/sifive/freedom-e-sdk/blob/master/bsp/sifive-hifive-unmatched/settings.mk).

In there we have:

  • ARCH=rv64imac
  • ABI=lp64
  • CMODEL=medany

I would claim that the ISR HW-ABI would fit there as well.

The reason, why I don't like function attributes is, that they have be set for each ISR, although there is no evidence that we need two ISRs with different attributes. And this could lead again to subtle issues (resulting in forum questions with responses like "check if all your ISRs have the correct function attributes set").

@jnk0le
Copy link
Author

jnk0le commented Nov 10, 2023

Main issue with CLIC is that in addition to __attribute__((interrupt("SiFive-CLIC-preemptible"))), the highest nesting level of CLIC can be just a simple __attribute__((interrupt)). And there should be also yet another attribute (actually two: preemtible and highest nesting) for horizontal nesting (additional use of xscratchcsw).

Nothing can be done about it.

Do you have a link to an MCU datasheet that describes such an ISR behavior (different register banking behavior on ISR entry based on nesting level)?

ch32v307 has 8 nesting levels with shadow registers only on the 3 lowest levels (should be on highest IMO) https://github.com/openwch/ch32v307/blob/main/Datasheet/README.md

So they have three sets of shadow registers for caller-saved registers. The fourth level of interrupt nesting will not save the registers anymore. However, this has to be handled in SW anyway, because it is the same ISR that is called. And the behavior is identical for all ISRs, so the attribute does not help to solve this aspect.

I don't quite understand, WCH PFIC similarly to NVIC (which appears more frequently in their headers than "PFIC" BTW) allows to configure any numbers of (vectored) handlers to any available nesting level.

So if SPI and I2C handlers are configured to 3rd nesting level, TIM3 and TIM4 are configured at 4th nesting level, the attributes look like this:

__attribute__((interrupt, prestacked("x1, x5-x7,x10-x17, x28-x31"))) void SPI1_IRQHandler();
__attribute__((interrupt, prestacked("x1, x5-x7,x10-x17, x28-x31"))) void I2C1_EV_IRQHandler();
__attribute__((interrupt)) void TIM3_IRQHandler();
__attribute__((interrupt)) void TIM4_IRQHandler();

Afaik ARM7 has shadow registers, but only one set of them (per exception type).
So there is not change in the behavior in case of nesting (registers will simply be overwritten).

PIC32 should have more sets (cover all GPRs, amount is implementation specific)
https://microchipdeveloper.com/32bit:mz-arch-exceptions-usage#AssignShadowRegisterSets

"there could appear" -> does not exist
But I agree that the list would have to be extended if such hardware comes to the market.
And using registers instead of named register sets addresses this.

I think that the vendors don't want to find themselves in WCH situation with a new stacker or shadow regs.

Also adding to compiler alias lists means:

  • no support by "older" compiler builds
  • requires blessing from compiler maintainers to get upstreamed, otherwise we are back to "WCH-Interrupt-Fast" situation.

@jnk0le
Copy link
Author

jnk0le commented Nov 10, 2023

The preserving (like in ARM FIQ) shadow registers can be also shared by software
e.g. assembly handler reserves some of the shadow regs, and the remaining can be used normally by C handlers (even by different nesting levels if shadow regs are common across them).

@cmuellner
Copy link
Collaborator

Main issue with CLIC is that in addition to __attribute__((interrupt("SiFive-CLIC-preemptible"))), the highest nesting level of CLIC can be just a simple __attribute__((interrupt)). And there should be also yet another attribute (actually two: preemtible and highest nesting) for horizontal nesting (additional use of xscratchcsw).

Nothing can be done about it.

Do you have a link to an MCU datasheet that describes such an ISR behavior (different register banking behavior on ISR entry based on nesting level)?

ch32v307 has 8 nesting levels with shadow registers only on the 3 lowest levels (should be on highest IMO) https://github.com/openwch/ch32v307/blob/main/Datasheet/README.md

So they have three sets of shadow registers for caller-saved registers. The fourth level of interrupt nesting will not save the registers anymore. However, this has to be handled in SW anyway, because it is the same ISR that is called. And the behavior is identical for all ISRs, so the attribute does not help to solve this aspect.

I don't quite understand, WCH PFIC similarly to NVIC (which appears more frequently in their headers than "PFIC" BTW) allows to configure any numbers of (vectored) handlers to any available nesting level.

So if SPI and I2C handlers are configured to 3rd nesting level, TIM3 and TIM4 are configured at 4th nesting level, the attributes look like this:

__attribute__((interrupt, prestacked("x1, x5-x7,x10-x17, x28-x31"))) void SPI1_IRQHandler();
__attribute__((interrupt, prestacked("x1, x5-x7,x10-x17, x28-x31"))) void I2C1_EV_IRQHandler();
__attribute__((interrupt)) void TIM3_IRQHandler();
__attribute__((interrupt)) void TIM4_IRQHandler();

I have a different understanding of that mechanism.
We have:

  • priorities (IRQ with higher priority can interrupts with lower priority): statically known
  • nesting depth (how many interrupts are interrupted): dynamic aspect

The HPE mechanism supports to interrupt up to three interrupts until it runs out of shadow registers.
The priority matters in this context just to define which interrupts can be interrupted.

So in your example TIM3 and TIM4 can use the same prestacked/HPE attribute.
This works fine unless it happens that they are executed as a nested interrupt of depth 4 or higher
(e.g. USB -> SPI -> I2C -> TIM3). But this is a dynamic aspect, that you cannot predict in general.

So in your example you would use preventive SW-stacking for IRQs with high priorities.
This has the implication, that you will observe the worst IRQ latencies for high priority interrupts
(in most cases HPE will save the registers and the ISR itself will save them again), which contradicts
the purpose of HPE.

Better solutions would be:

  • Define a priority scheme such, that you don't use more than 3 different IRQ priorities. In this case HPE will not run out of registers.
  • Enable INTSYSCR[HWSTKOVEN], which disables IRQs if HPE is out of registers.

Afaik ARM7 has shadow registers, but only one set of them (per exception type).
So there is not change in the behavior in case of nesting (registers will simply be overwritten).

PIC32 should have more sets (cover all GPRs, amount is implementation specific) https://microchipdeveloper.com/32bit:mz-arch-exceptions-usage#AssignShadowRegisterSets

PIC32 defines IPLx[SRSy|SOFT|AUTO] per interrupt (needs to be configured in the IRQ controller and the compiler needs to know what to do in the ISR proloue/epilogue).
So this is a valid precedent where each ISR can be configured to have a different ABI.

After seeing this, I'm fine with the function attribute approach.
Thanks for providing the reference to that doc!

"there could appear" -> does not exist
But I agree that the list would have to be extended if such hardware comes to the market.
And using registers instead of named register sets addresses this.

I think that the vendors don't want to find themselves in WCH situation with a new stacker or shadow regs.

Also adding to compiler alias lists means:

  • no support by "older" compiler builds
  • requires blessing from compiler maintainers to get upstreamed, otherwise we are back to "WCH-Interrupt-Fast" situation.

I don't know the backstory of the WCH situation.
But yes, as said before, using old toolchains with new devices is a clear benefit.

That said, my concerns are properly addressed and I'm fine with this PR.
Thanks for providing the relevant context!

this attribute has been implemented as `interrupt("SiFive-CLIC-preemptible")`

Should be renamed into "CLIC-preemptible" after freeze complets
@jnk0le
Copy link
Author

jnk0le commented May 29, 2024

Also, the CLIC has already been extended with hardware stacking.
[1] uses background lazy stacking, configurable to psABI and "EABI". Requires new attribute due to new instructions and changed way of handling prologues.
[2] is paywalled but according to summary in [1] it just adds a classic HW stacker to CLIC. (no new attribute with prestacked annotation)

[1] https://arxiv.org/pdf/2311.08320
[2] DOI 10.1109/ICICM54364.2021.9660345

@jnk0le
Copy link
Author

jnk0le commented May 31, 2024

Corner case for "IPRA" alt usage: annotation doesn't cover stack passed arguments, which are caller saved (documented on arm and defacto on risc-v)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants