Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stub: implement flush_instruction_cache on i686 and AArch64 #151

Merged
merged 3 commits into from
Apr 21, 2023

Conversation

alois31
Copy link
Contributor

@alois31 alois31 commented Apr 16, 2023

People reportedly want to compile the stub on i686 and AArch64 platforms for testing. Make compilation possible by providing proper flush_instruction_cache implementations on these platforms. For x86 (just as x86_64), this is a no-op, because Intel made the instruction cache coherent for compatibility with code that was written before caches existed.
For AArch64, adapt the procedure from their manual to multiple instructions.

@alois31
Copy link
Contributor Author

alois31 commented Apr 16, 2023

Not tested since I don't have devices with UEFI on these architectures.

Copy link
Member

@blitz blitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! 👍

I'll read up on the ARM specifics to make up my mind about this. Give me a couple of days.

Btw, i wouldn't add 32-bit support, because this will pretty much never be used. Or is there a usecase?

@RaitoBezarius
Copy link
Member

Thanks for the contribution! +1

I'll read up on the ARM specifics to make up my mind about this. Give me a couple of days.

Btw, i wouldn't add 32-bit support, because this will pretty much never be used. Or is there a usecase?

We shared some details about ARM in the dev channel (notably the reference of ARMARM explaining cache maintenance for those usecases). Otherwise, ARMARM B.2.4.(ii) is the good reference IIRC.

@blitz
Copy link
Member

blitz commented Apr 17, 2023

Thanks for the contribution! +1

I'll read up on the ARM specifics to make up my mind about this. Give me a couple of days.

Btw, i wouldn't add 32-bit support, because this will pretty much never be used. Or is there a usecase?

We shared some details about ARM in the dev channel (notably the reference of ARMARM explaining cache maintenance for those usecases). Otherwise, ARMARM B.2.4.(ii) is the good reference IIRC.

UEFI had as a InvalidateInstructionCache method. That would remove the platform specific code entirely.

Otherwise, the flushing code could be simplified by only looping once and doing the flushes every 16-bytes (minimal cache line size on ARMv8?).

@RaitoBezarius
Copy link
Member

For the ranges, it seems to do something yes: https://github.com/tianocore/edk2/blob/master/ArmPkg/Library/ArmCacheMaintenanceLib/ArmCacheMaintenanceLib.c#L41-L82 (but the global one, not).

@alois31
Copy link
Contributor Author

alois31 commented Apr 17, 2023

The documentation of the uefi crate states that "OVMF only implements this protocol interface for the virtual EBC processor". This lead me to the assumption that the debug protocol is not reliably available. If this assumption is mistaken, we could also just switch to InvalidateInstructionCache.

Comment on lines 26 to 34
for address in (start_address..end_address).step_by(4) {
unsafe { asm!("dc cvau, {address}", address = in(reg) address) };
}
unsafe { asm!("dsb ish") };
// Force reloading the written instructions.
for address in (start_address..end_address).step_by(4) {
unsafe { asm!("ic ivau, {address}", address = in(reg) address) };
}
unsafe { asm!("dsb ish", "isb") };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

Suggested change
for address in (start_address..end_address).step_by(4) {
unsafe { asm!("dc cvau, {address}", address = in(reg) address) };
}
unsafe { asm!("dsb ish") };
// Force reloading the written instructions.
for address in (start_address..end_address).step_by(4) {
unsafe { asm!("ic ivau, {address}", address = in(reg) address) };
}
unsafe { asm!("dsb ish", "isb") };
// ARM mandates at least 16-byte cache line size.
for address in (start_address..end_address).step_by(16) {
unsafe { asm!("dc cvau, {address}; ic ivau, {address}", address = in(reg) address) };
}
unsafe { asm!("dsb ish", "isb") };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that doesn't work. By missing the dsb ish between dc cvau and ic ivau, the CPU may reorder their effects and prefetch stale code in between.

rust/stub/src/pe_loader.rs Outdated Show resolved Hide resolved
@blitz
Copy link
Member

blitz commented Apr 18, 2023

The documentation of the uefi crate states that "OVMF only implements this protocol interface for the virtual EBC processor". This lead me to the assumption that the debug protocol is not reliably available. If this assumption is mistaken, we could also just switch to InvalidateInstructionCache.

Yes, I would say the debug protocol is out then and we have to do it ourselves.

@nikstur
Copy link
Collaborator

nikstur commented Apr 18, 2023

Btw, i wouldn't add 32-bit support, because this will pretty much never be used.

I would also encourage not adding 32-bit support. Many other projects are just dropping it ;)

People reportedly want to compile the stub on i686 and AArch64
platforms for testing. Make compilation possible by providing proper
`make_instruction_cache_coherent` implementations on these platforms.
For x86 (just as x86_64), this is a no-op, because Intel made the
instruction cache coherent for compatibility with code that was written
before caches existed.
For AArch64, adapt the procedure from their manual to multiple
instructions.
@alois31
Copy link
Contributor Author

alois31 commented Apr 21, 2023

I would also encourage not adding 32-bit support. Many other projects are just dropping it ;)

This PR was actually prompted by someone complaining on Matrix that the stub does not compile on i686, so I guess thre is interest. 32-bit x86, while indeed not really being a relevant platform any more, also has the unique advantage of having a word size other than 64 bits while being readily available, so it can be useful for testing portability.

@blitz
Copy link
Member

blitz commented Apr 21, 2023

32-bit x86, while indeed not really being a relevant platform any more, also has the unique advantage of having a word size other than 64 bits while being readily available, so it can be useful for testing portability.

32-bit UEFI platforms are incredibly rare these days. I can only remember some ancient Atom tablets. But okay, as long as there is no measurable maintenance cost, we can add it.

@alois31
Copy link
Contributor Author

alois31 commented Apr 21, 2023

32-bit x86, while indeed not really being a relevant platform any more, also has the unique advantage of having a word size other than 64 bits while being readily available, so it can be useful for testing portability.

32-bit UEFI platforms are incredibly rare these days. I can only remember some ancient Atom tablets. But okay, as long as there is no measurable maintenance cost, we can add it.

I mean, the part implemented by this PR is literally one empty function, maintenance cost for this seems really negligible to me.

// The start address gets rounded down, while the end address gets rounded up.
// This guarantees we flush precisely every cache line touching the passed slice.
let start_address = memory.as_ptr() as usize & CACHE_LINE_SIZE.wrapping_neg();
let end_address = ((memory.as_ptr() as usize + memory.len() - 1) | (CACHE_LINE_SIZE - 1)) + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

It would be more idiomatic (in the C programmer sense) to say:

Suggested change
let end_address = ((memory.as_ptr() as usize + memory.len() - 1) | (CACHE_LINE_SIZE - 1)) + 1;
let end_address = ((memory.as_ptr() as usize + memory.len() + CACHE_LINE_SIZE - 1) & CACHE_LINE_SIZE.wrapping_neg();

@RaitoBezarius
Copy link
Member

i686 is something that can easily exercise our multi-platform support, as the cost is zero at the moment, I am in favor of keeping it.

@blitz
Copy link
Member

blitz commented Apr 21, 2023

Tests came back green. 👍

@blitz blitz merged commit ce0e72a into nix-community:master Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants