From 696d378d7b7295366e115e89a785640bf72a5043 Mon Sep 17 00:00:00 2001 From: Stewart Smith Date: Mon, 2 Oct 2017 12:08:25 +1100 Subject: [PATCH] fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM We had a race condition between FSP Reset/Reload and powering down the system from the host: Roughly: FSP Host --- ---- Power on Power on (inject EPOW) (trigger FSP R/R) Processes EPOW event, starts shutting down calls OPAL_CEC_POWER_DOWN (is still in R/R) gets OPAL_INTERNAL_ERROR, spins in opal_poll_events (FSP comes back) spinning in opal_poll_events (thinks host is running) The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload error path for fsp_sync_msg() is to return -1, which means we give the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API docs give us the opportunity to return OPAL_BUSY when trying again later may be successful, and we're ambiguous as to if you should retry on OPAL_INTERNAL_ERROR. For reference, the linux code looks like this: >static void __noreturn pnv_power_off(void) >{ > long rc = OPAL_BUSY; > > pnv_prepare_going_down(); > > while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) { > rc = opal_cec_power_down(0); > if (rc == OPAL_BUSY_EVENT) > opal_poll_events(NULL); > else > mdelay(10); > } > for (;;) > opal_poll_events(NULL); >} Which means that *practically* our only option is to return OPAL_BUSY or OPAL_BUSY_EVENT. We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're running pollers to communicate with the FSP and do the final bits of Reset/Reload handling before we power off the system. Additionally, we really should update our documentation to point all of these return codes and what action an OS should take. CC: stable Reported-by: Pridhiviraj Paidipeddi Signed-off-by: Stewart Smith Reviewed-by: Vasant Hegde Acked-by: Ananth N Mavinakayanahalli Signed-off-by: Stewart Smith --- doc/opal-api/opal-cec-power-down-5.rst | 18 +++++++++++++++--- doc/opal-api/return-codes.rst | 6 +++++- platforms/ibm-fsp/common.c | 2 +- 3 files changed, 21 insertions(+), 5 deletions(-) diff --git a/doc/opal-api/opal-cec-power-down-5.rst b/doc/opal-api/opal-cec-power-down-5.rst index 6daea3de65ff..bdcb84e3b958 100644 --- a/doc/opal-api/opal-cec-power-down-5.rst +++ b/doc/opal-api/opal-cec-power-down-5.rst @@ -24,16 +24,28 @@ Return Values ------------- ``OPAL_SUCCESS`` - the power down was updated successful + the power down request was successful. + This may/may not result in immediate power down. An OS should + spin in a loop after getting `OPAL_SUCCESS` as it is likely that there + will be a delay before instructions stop being executed. ``OPAL_BUSY`` - unable to power down, try again later + unable to power down, try again later. + +``OPAL_BUSY_EVENT`` + Unable to power down, call `opal_run_pollers` and try again. ``OPAL_PARAMETER`` a parameter was incorrect ``OPAL_INTERNAL_ERROR`` - hal code sent incorrect data to hardware device + Something went wrong, and waiting and trying again is unlikely to be + successful. Although, considering that in a shutdown code path, there's + unlikely to be any other valid option to take, retrying is perfectly valid. + + In older OPAL versions (prior to skiboot v5.9), on IBM FSP systems, this + return code was returned erroneously instead of OPAL_BUSY_EVENT during an + FSP Reset/Reload. ``OPAL_UNSUPPORTED`` this platform does not support being powered off. diff --git a/doc/opal-api/return-codes.rst b/doc/opal-api/return-codes.rst index 03ea5c19d728..3ea4a3d85169 100644 --- a/doc/opal-api/return-codes.rst +++ b/doc/opal-api/return-codes.rst @@ -40,7 +40,8 @@ OPAL_BUSY #define OPAL_BUSY -2 -Try again later. +Try again later. Related to `OPAL_BUSY_EVENT`, but `OPAL_BUSY` indicates that the +caller need not call `OPAL_POLL_EVENTS` itself. **TODO** Clarify current situation. OPAL_PARTIAL ------------ @@ -126,6 +127,9 @@ OPAL_BUSY_EVENT #define OPAL_BUSY_EVENT -12 +The same as `OPAL_BUSY` but signals that the OS should call `OPAL_POLL_EVENTS` as +that may be required to get into a state where the call will succeed. + OPAL_HARDWARE_FROZEN -------------------- :: diff --git a/platforms/ibm-fsp/common.c b/platforms/ibm-fsp/common.c index 237b63fb4f05..0a9b06f77e33 100644 --- a/platforms/ibm-fsp/common.c +++ b/platforms/ibm-fsp/common.c @@ -223,7 +223,7 @@ int64_t ibm_fsp_cec_power_down(uint64_t request) printf("FSP: Sending shutdown command to FSP...\n"); if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true)) - return OPAL_INTERNAL_ERROR; + return OPAL_BUSY_EVENT; fsp_reset_links(); return OPAL_SUCCESS;