Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plat/rpi: implement PSCI CPU_OFF. #2

Open
wants to merge 1 commit into
base: pi4
from
Open

plat/rpi: implement PSCI CPU_OFF. #2

wants to merge 1 commit into from

Conversation

@andreiw
Copy link

andreiw commented Mar 12, 2020

See https://developer.trustedfirmware.org/T686

We simulate CPU off by reseting core via RMR. For secondaries,
that already puts them in the holding pen waiting for a
"warm boot" request as part of PSCI CPU_ON. For the BSP,
we have to add logic to distinguish a regular boot from
a CPU_OFF state, where like the secondaries, the BSP
needs to wait foor a "warm boot" request as part of CPU_ON.

I didn't bother refactoring out the common code out of
platform_defs.h and palt_helpers.S to keep this clear.
These Pi 3/4 files were mostly identical even before this
change.

Testing:

  • ACS suite now passes more tests (since it repeatedly
    calls code on secondaries via CPU_ON).

  • Linux testing including offlining/onlineing CPU0, e.g.
    "echo 0 > /sys/devices/system/cpu/cpu0/online".

Signed-off-by: Andrei Warkentin andrey.warkentin@gmail.com

See https://developer.trustedfirmware.org/T686

We simulate CPU off by reseting core via RMR. For secondaries,
that already puts them in the holding pen waiting for a
"warm boot" request as part of PSCI CPU_ON. For the BSP,
we have to add logic to distinguish a regular boot from
a CPU_OFF state, where like the secondaries, the BSP
needs to wait foor a "warm boot" request as part of CPU_ON.

I didn't bother refactoring out the common code out of
platform_defs.h and palt_helpers.S to keep this clear.
These Pi 3/4 files were mostly identical even before this
change.

Testing:

- ACS suite now passes more tests (since it repeatedly
calls code on secondaries via CPU_ON).

- Linux testing including offlining/onlineing CPU0, e.g.
"echo 0 > /sys/devices/system/cpu/cpu0/online".

Signed-off-by: Andrei Warkentin <andrey.warkentin@gmail.com>
@Andre-ARM

This comment has been minimized.

Copy link

Andre-ARM commented Mar 19, 2020

@andreiw: many thanks for your work on this. If you don't mind I will pick this up and push it into gerrit. As part of a WIP series to unify RPi3 and RPi4 I have a patch to unify plat_helpers.S, I think I will push that first to make your patch smaller.
I am a bit wary about the lack of any barriers here, but will double check this.

@andreiw

This comment has been minimized.

Copy link
Author

andreiw commented Mar 19, 2020

Thanks André, sure thing.

Copy link

Andre-ARM left a comment

As mentioned before, my patch to unify plat_helpers.S makes the patch smaller.
I will try to get some input on the points below from the TF-A team.

static void __dead2 rpi3_pwr_down_wfi(
const psci_power_state_t *target_state)
{
uint64_t *hold_base = (uint64_t *)PLAT_RPI3_TM_HOLD_BASE;

This comment has been minimized.

Copy link
@Andre-ARM

Andre-ARM Mar 20, 2020

I see this being used elsewhere (in this file and QEMU), but am wondering whether this should be either volatile or maybe even better using MMIO accessors, to guarantee that the compiler doesn't optimise away the seemingly pointless write below.

This comment has been minimized.

Copy link
@andreiw

andreiw Mar 20, 2020

Author

Yes, moreover need to check how that first page (where PLAT_RPI3_TM_HOLD_BASE) gets mapped. If it is mapped cacheable, then we need to clean to PoC after the write (as the code polling this runs with MMU off IIRC). Okay if it is mapped NC or WT.

This comment has been minimized.

Copy link
@Andre-ARM

Andre-ARM Mar 21, 2020

At least for the RPi4 it's mapped as device memory, but this is not really relevant, since this function is run with caches off already. So no cache maintenance here, but we need that above in the rpi3_pwr_domain_on() function, for the reasons you mentioned.

This comment has been minimized.

Copy link
@Andre-ARM

Andre-ARM Mar 21, 2020

Scratch the part about maintenance in rpi3_pwr_domain_on(), on the RPi3 it's this shared RAM/"secure" SRAM region, which is also mapped as device memory. So we don't need any cache maintenance in any case when accessing PLAT_RPI3_TM_HOLD_BASE.

cmp x0, PLAT_RPI3_TM_HOLD_STATE_BSP_OFF
bne 1f
adr x0, plat_wait_for_warm_boot
mov x30, x1

This comment has been minimized.

Copy link
@Andre-ARM

Andre-ARM Mar 20, 2020

This looks redundant, don't we restore LR above already?

This comment has been minimized.

Copy link
@andreiw

andreiw Mar 20, 2020

Author

yep, shouldn't be here... leftover from a prior approach

* Wait until we have a go - the sevl deals with the race
* of someone setting a GO state before our wfe.
*/
sevl

This comment has been minimized.

Copy link
@Andre-ARM

Andre-ARM Mar 20, 2020

So the idea is to trick the first wfe into always returning immediately?
Wouldn't it be cleaner to always check the variable first, before entering wfe (in the loop below)?

This comment has been minimized.

Copy link
@andreiw

andreiw Mar 20, 2020

Author

That would mean just more branches, so I don't think it would be that much cleaner.

This comment has been minimized.

Copy link
@andreiw

andreiw Mar 20, 2020

Author

e.g. there is no wfne is AArch64

This comment has been minimized.

Copy link
@andreiw

andreiw Mar 20, 2020

Author

May need a dsb ld before the ldr x1, [x0]

This comment has been minimized.

Copy link
@andreiw

andreiw Mar 20, 2020

Author

(to match barriers in pwr_domain_on and pwr_down_wfi)

and of course ldr and not ldxr... we're running with no MMU and thus no caches (and thus no expectation that global monitor would work)..so definitely continue using SEV in C code to unblock wfe in poll_mailbox

This comment has been minimized.

Copy link
@Andre-ARM

Andre-ARM Mar 21, 2020

What I was after is that triggering the local event register just to be able to use a pre-test loop sounds a bit like a hack - if the only reason we do so is to avoid an extra branch.
Regarding the dsb: I don't see why? I think the sev/wfe pair provides a barrier pair already. For the str/ldr sequence in here: this is a proper data dependency, the CPU would do the right thing already.

bne 1f
adr x0, plat_wait_for_warm_boot
Comment on lines +141 to +142

This comment has been minimized.

Copy link
@Andre-ARM

Andre-ARM Mar 20, 2020

Probably premature optimisation overkill, but always loading x0, then using csel with xzr would avoid the branch.

This comment has been minimized.

Copy link
@andreiw

andreiw Mar 20, 2020

Author

less branching definitely makes for clearer and thus more maintainable code :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.