stub: implement `flush_instruction_cache` on i686 and AArch64 #151

alois31 · 2023-04-16T14:09:14Z

People reportedly want to compile the stub on i686 and AArch64 platforms for testing. Make compilation possible by providing proper `flush_instruction_cache` implementations on these platforms. For x86 (just as for x86_64), this is a no-op, because Intel made the instruction cache coherent for compatibility with code that was written before caches existed.
For AArch64, adapt the procedure from the ARM manual so that it covers multiple instructions.
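
A minimal sketch of how such a per-architecture split could look. The function name `flush_icache_sketch` and the conservative 4-byte step are assumptions for illustration; this is not the code from the PR, and the AArch64 branch is untested here.

```rust
/// Hypothetical sketch: make freshly written code visible to instruction fetch.
/// On x86 and x86_64 this is a no-op, following the reasoning in the description.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn flush_icache_sketch(_memory: &[u8]) {
    // Nothing to do: the description above relies on x86 keeping the
    // instruction cache coherent with data writes.
}

/// Hypothetical sketch of the AArch64 variant: clean the data cache,
/// wait for completion, then invalidate the instruction cache.
#[cfg(target_arch = "aarch64")]
pub fn flush_icache_sketch(memory: &[u8]) {
    use core::arch::asm;

    // Assumption: a 4-byte step is used so that no cache line is skipped,
    // whatever the real line size is. The start is aligned down to that step.
    let start = memory.as_ptr() as usize & !3;
    let end = memory.as_ptr() as usize + memory.len();

    // Push the written bytes out to the point of unification.
    for address in (start..end).step_by(4) {
        unsafe { asm!("dc cvau, {a}", a = in(reg) address) };
    }
    // Wait until the cleans have completed before invalidating.
    unsafe { asm!("dsb ish") };
    // Drop any stale instructions for the same range.
    for address in (start..end).step_by(4) {
        unsafe { asm!("ic ivau, {a}", a = in(reg) address) };
    }
    // Wait for the invalidations, then resynchronize the instruction stream.
    unsafe { asm!("dsb ish", "isb") };
}
```

A caller would pass the slice holding the freshly written code, for example `flush_icache_sketch(&image[..])`, before jumping into it.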

alois31 · 2023-04-16T14:10:27Z

Not tested since I don't have devices with UEFI on these architectures.

blitz

Thanks for the contribution! 👍

I'll read up on the ARM specifics to make up my mind about this. Give me a couple of days.

Btw, I wouldn't add 32-bit support, because this will pretty much never be used. Or is there a use case?

RaitoBezarius · 2023-04-17T10:01:12Z

> Thanks for the contribution! +1
>
> I'll read up on the ARM specifics to make up my mind about this. Give me a couple of days.
>
> Btw, I wouldn't add 32-bit support, because this will pretty much never be used. Or is there a use case?

We shared some details about ARM in the dev channel (notably the ARMARM section explaining cache maintenance for these use cases). Otherwise, ARMARM B.2.4.(ii) is the right reference, IIRC.

blitz · 2023-04-17T13:18:52Z

> > Thanks for the contribution! +1
> >
> > I'll read up on the ARM specifics to make up my mind about this. Give me a couple of days.
> >
> > Btw, I wouldn't add 32-bit support, because this will pretty much never be used. Or is there a use case?
>
> We shared some details about ARM in the dev channel (notably the ARMARM section explaining cache maintenance for these use cases). Otherwise, ARMARM B.2.4.(ii) is the right reference, IIRC.

UEFI has an `InvalidateInstructionCache` method. That would remove the platform-specific code entirely.

Otherwise, the flushing code could be simplified by only looping once and doing the flushes every 16 bytes (minimal cache line size on ARMv8?).
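
For reference, the smallest line sizes can also be queried at run time instead of assumed. The sketch below decodes the `IminLine` (bits 3:0) and `DminLine` (bits 19:16) fields of `CTR_EL0`, each of which holds the log2 of the line size in 4-byte words. The function names are hypothetical, and the register read is an untested assumption about the environment the stub runs in.

```rust
/// Decode the smallest (instruction, data) cache line sizes in bytes
/// from a raw CTR_EL0 value. Both fields store log2(words per line),
/// where one word is 4 bytes.
pub fn min_line_sizes(ctr_el0: u64) -> (usize, usize) {
    let imin_line = (ctr_el0 & 0xf) as usize; // CTR_EL0.IminLine, bits [3:0]
    let dmin_line = ((ctr_el0 >> 16) & 0xf) as usize; // CTR_EL0.DminLine, bits [19:16]
    (4usize << imin_line, 4usize << dmin_line)
}

/// Hypothetical helper: read CTR_EL0 on AArch64.
/// Assumption: the current exception level is allowed to read the register.
#[cfg(target_arch = "aarch64")]
pub fn read_ctr_el0() -> u64 {
    let value: u64;
    unsafe {
        core::arch::asm!("mrs {v}, ctr_el0", v = out(reg) value, options(nomem, nostack));
    }
    value
}
```

With both fields set to 4, `min_line_sizes` yields 64-byte lines. The encoding itself can express lines as small as 4 bytes, which is why a fixed step larger than that is a judgment call rather than something the encoding guarantees.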

RaitoBezarius · 2023-04-17T14:31:11Z

For address ranges, it does seem to do something: https://github.com/tianocore/edk2/blob/master/ArmPkg/Library/ArmCacheMaintenanceLib/ArmCacheMaintenanceLib.c#L41-L82 (but not for the global one).

alois31 · 2023-04-17T16:37:13Z

The documentation of the uefi crate states that "OVMF only implements this protocol interface for the virtual EBC processor". This led me to assume that the debug protocol is not reliably available. If this assumption is mistaken, we could also just switch to `InvalidateInstructionCache`.

blitz · 2023-04-18T14:23:36Z

rust/stub/src/pe_loader.rs

+    for address in (start_address..end_address).step_by(4) {
+        unsafe { asm!("dc cvau, {address}", address = in(reg) address) };
+    }
+    unsafe { asm!("dsb ish") };
+    // Force reloading the written instructions.
+    for address in (start_address..end_address).step_by(4) {
+        unsafe { asm!("ic ivau, {address}", address = in(reg) address) };
+    }
+    unsafe { asm!("dsb ish", "isb") };


What about this?

Suggested change

-    for address in (start_address..end_address).step_by(4) {
-        unsafe { asm!("dc cvau, {address}", address = in(reg) address) };
-    }
-    unsafe { asm!("dsb ish") };
-    // Force reloading the written instructions.
-    for address in (start_address..end_address).step_by(4) {
-        unsafe { asm!("ic ivau, {address}", address = in(reg) address) };
-    }
-    unsafe { asm!("dsb ish", "isb") };
+    // ARM mandates at least 16-byte cache line size.
+    for address in (start_address..end_address).step_by(16) {
+        unsafe { asm!("dc cvau, {address}; ic ivau, {address}", address = in(reg) address) };
+    }
+    unsafe { asm!("dsb ish", "isb") };

alois31

I think that doesn't work. Without a `dsb ish` between `dc cvau` and `ic ivau`, the CPU may reorder their effects and prefetch stale code in between.


blitz · 2023-04-18T14:31:27Z

> The documentation of the uefi crate states that "OVMF only implements this protocol interface for the virtual EBC processor". This led me to assume that the debug protocol is not reliably available. If this assumption is mistaken, we could also just switch to `InvalidateInstructionCache`.

Yes, I would say the debug protocol is out then and we have to do it ourselves.

nikstur · 2023-04-18T21:34:21Z

> Btw, I wouldn't add 32-bit support, because this will pretty much never be used.

I would also encourage not adding 32-bit support. Many other projects are just dropping it ;)

People reportedly want to compile the stub on i686 and AArch64 platforms for testing. Make compilation possible by providing proper `make_instruction_cache_coherent` implementations on these platforms. For x86 (just as x86_64), this is a no-op, because Intel made the instruction cache coherent for compatibility with code that was written before caches existed. For AArch64, adapt the procedure from their manual to multiple instructions.

alois31 · 2023-04-21T16:06:07Z

> I would also encourage not adding 32-bit support. Many other projects are just dropping it ;)

This PR was actually prompted by someone complaining on Matrix that the stub does not compile on i686, so I guess there is interest. 32-bit x86, while indeed no longer a relevant platform, also has the unique advantage of having a word size other than 64 bits while being readily available, so it can be useful for testing portability.

blitz · 2023-04-21T16:23:19Z

> 32-bit x86, while indeed no longer a relevant platform, also has the unique advantage of having a word size other than 64 bits while being readily available, so it can be useful for testing portability.

32-bit UEFI platforms are incredibly rare these days. I can only remember some ancient Atom tablets. But okay, as long as there is no measurable maintenance cost, we can add it.

alois31 · 2023-04-21T16:26:16Z

> > 32-bit x86, while indeed no longer a relevant platform, also has the unique advantage of having a word size other than 64 bits while being readily available, so it can be useful for testing portability.
>
> 32-bit UEFI platforms are incredibly rare these days. I can only remember some ancient Atom tablets. But okay, as long as there is no measurable maintenance cost, we can add it.

I mean, the part implemented by this PR is literally one empty function; the maintenance cost for this seems really negligible to me.

blitz · 2023-04-21T16:29:21Z

rust/stub/src/pe_loader.rs

+    // The start address gets rounded down, while the end address gets rounded up.
+    // This guarantees we flush precisely every cache line touching the passed slice.
+    let start_address = memory.as_ptr() as usize & CACHE_LINE_SIZE.wrapping_neg();
+    let end_address = ((memory.as_ptr() as usize + memory.len() - 1) | (CACHE_LINE_SIZE - 1)) + 1;


nit:

It would be more idiomatic (in the C programmer sense) to say:

Suggested change

-    let end_address = ((memory.as_ptr() as usize + memory.len() - 1) | (CACHE_LINE_SIZE - 1)) + 1;
+    let end_address = (memory.as_ptr() as usize + memory.len() + CACHE_LINE_SIZE - 1) & CACHE_LINE_SIZE.wrapping_neg();
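
To see that the two forms agree, here is a small sketch that computes both on plain integers. The value chosen for `CACHE_LINE_SIZE` and the function names are assumptions for illustration; the only requirements are that the line size is a power of two and the slice is non-empty.

```rust
/// Hypothetical line size for illustration; must be a power of two.
const CACHE_LINE_SIZE: usize = 16;

/// Round the slice start down to a cache line boundary.
/// `CACHE_LINE_SIZE.wrapping_neg()` equals `!(CACHE_LINE_SIZE - 1)`.
pub fn start_address(ptr: usize) -> usize {
    ptr & CACHE_LINE_SIZE.wrapping_neg()
}

/// End address as written in the PR: take the last byte of the slice,
/// set all offset bits within its line, then add one.
pub fn end_address_or(ptr: usize, len: usize) -> usize {
    ((ptr + len - 1) | (CACHE_LINE_SIZE - 1)) + 1
}

/// End address as suggested in review: add "line size minus one"
/// to the one-past-the-end address, then clear the offset bits.
pub fn end_address_mask(ptr: usize, len: usize) -> usize {
    (ptr + len + CACHE_LINE_SIZE - 1) & CACHE_LINE_SIZE.wrapping_neg()
}
```

For a slice at `0x1003` with length `0x20`, the start rounds down to `0x1000` and both end forms give `0x1030`, so the flushed range covers exactly the three 16-byte lines the slice touches.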

RaitoBezarius · 2023-04-21T16:33:16Z

i686 is something that can easily exercise our multi-platform support. As the cost is zero at the moment, I am in favor of keeping it.

blitz · 2023-04-21T16:39:09Z

Tests came back green. 👍

stub: clarify instruction cache coherence

81e25ee

alois31 force-pushed the icache branch from 1f088c1 to 0e0c756 Compare April 16, 2023 14:19

blitz reviewed Apr 16, 2023


blitz requested changes Apr 18, 2023


alois31 force-pushed the icache branch from 0e0c756 to ae401e4 Compare April 21, 2023 16:00

blitz reviewed Apr 21, 2023


Merge branch 'master' into icache

ddd22a8

blitz approved these changes Apr 21, 2023


blitz merged commit ce0e72a into nix-community:master Apr 21, 2023
