-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stub: implement flush_instruction_cache
on i686 and AArch64
#151
Conversation
Not tested since I don't have devices with UEFI on these architectures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! 👍
I'll read up on the ARM specifics to make up my mind about this. Give me a couple of days.
Btw, i wouldn't add 32-bit support, because this will pretty much never be used. Or is there a usecase?
We shared some details about ARM in the dev channel (notably the reference of ARMARM explaining cache maintenance for those usecases). Otherwise, ARMARM B.2.4.(ii) is the good reference IIRC. |
UEFI had as a InvalidateInstructionCache method. That would remove the platform specific code entirely. Otherwise, the flushing code could be simplified by only looping once and doing the flushes every 16-bytes (minimal cache line size on ARMv8?). |
For the ranges, it seems to do something yes: https://github.com/tianocore/edk2/blob/master/ArmPkg/Library/ArmCacheMaintenanceLib/ArmCacheMaintenanceLib.c#L41-L82 (but the global one, not). |
The documentation of the uefi crate states that "OVMF only implements this protocol interface for the virtual EBC processor". This lead me to the assumption that the debug protocol is not reliably available. If this assumption is mistaken, we could also just switch to |
rust/stub/src/pe_loader.rs
Outdated
for address in (start_address..end_address).step_by(4) { | ||
unsafe { asm!("dc cvau, {address}", address = in(reg) address) }; | ||
} | ||
unsafe { asm!("dsb ish") }; | ||
// Force reloading the written instructions. | ||
for address in (start_address..end_address).step_by(4) { | ||
unsafe { asm!("ic ivau, {address}", address = in(reg) address) }; | ||
} | ||
unsafe { asm!("dsb ish", "isb") }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this?
for address in (start_address..end_address).step_by(4) { | |
unsafe { asm!("dc cvau, {address}", address = in(reg) address) }; | |
} | |
unsafe { asm!("dsb ish") }; | |
// Force reloading the written instructions. | |
for address in (start_address..end_address).step_by(4) { | |
unsafe { asm!("ic ivau, {address}", address = in(reg) address) }; | |
} | |
unsafe { asm!("dsb ish", "isb") }; | |
// ARM mandates at least 16-byte cache line size. | |
for address in (start_address..end_address).step_by(16) { | |
unsafe { asm!("dc cvau, {address}; ic ivau, {address}", address = in(reg) address) }; | |
} | |
unsafe { asm!("dsb ish", "isb") }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that doesn't work. By missing the dsb ish
between dc cvau
and ic ivau
, the CPU may reorder their effects and prefetch stale code in between.
Yes, I would say the debug protocol is out then and we have to do it ourselves. |
I would also encourage not adding 32-bit support. Many other projects are just dropping it ;) |
People reportedly want to compile the stub on i686 and AArch64 platforms for testing. Make compilation possible by providing proper `make_instruction_cache_coherent` implementations on these platforms. For x86 (just as x86_64), this is a no-op, because Intel made the instruction cache coherent for compatibility with code that was written before caches existed. For AArch64, adapt the procedure from their manual to multiple instructions.
This PR was actually prompted by someone complaining on Matrix that the stub does not compile on i686, so I guess thre is interest. 32-bit x86, while indeed not really being a relevant platform any more, also has the unique advantage of having a word size other than 64 bits while being readily available, so it can be useful for testing portability. |
32-bit UEFI platforms are incredibly rare these days. I can only remember some ancient Atom tablets. But okay, as long as there is no measurable maintenance cost, we can add it. |
I mean, the part implemented by this PR is literally one empty function, maintenance cost for this seems really negligible to me. |
// The start address gets rounded down, while the end address gets rounded up. | ||
// This guarantees we flush precisely every cache line touching the passed slice. | ||
let start_address = memory.as_ptr() as usize & CACHE_LINE_SIZE.wrapping_neg(); | ||
let end_address = ((memory.as_ptr() as usize + memory.len() - 1) | (CACHE_LINE_SIZE - 1)) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
It would be more idiomatic (in the C programmer sense) to say:
let end_address = ((memory.as_ptr() as usize + memory.len() - 1) | (CACHE_LINE_SIZE - 1)) + 1; | |
let end_address = ((memory.as_ptr() as usize + memory.len() + CACHE_LINE_SIZE - 1) & CACHE_LINE_SIZE.wrapping_neg(); |
i686 is something that can easily exercise our multi-platform support, as the cost is zero at the moment, I am in favor of keeping it. |
Tests came back green. 👍 |
People reportedly want to compile the stub on i686 and AArch64 platforms for testing. Make compilation possible by providing proper
flush_instruction_cache
implementations on these platforms. For x86 (just as x86_64), this is a no-op, because Intel made the instruction cache coherent for compatibility with code that was written before caches existed.For AArch64, adapt the procedure from their manual to multiple instructions.