Skip to content

Commit

Permalink
fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM
Browse files Browse the repository at this point in the history
We had a race condition between FSP Reset/Reload and powering down
the system from the host:

Roughly:

  FSP                Host
  ---                ----
  Power on
                     Power on
  (inject EPOW)
  (trigger FSP R/R)
                     Processes EPOW event, starts shutting down
                     calls OPAL_CEC_POWER_DOWN
  (is still in R/R)
                     gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
  (FSP comes back)
                     spinning in opal_poll_events
  (thinks host is running)

The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload
error path for fsp_sync_msg() is to return -1, which means we give
the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API
docs give us the opportunity to return OPAL_BUSY when trying again
later may be successful, and we're ambiguous as to if you should retry
on OPAL_INTERNAL_ERROR.

For reference, the linux code looks like this:
>static void __noreturn pnv_power_off(void)
>{
>        long rc = OPAL_BUSY;
>
>        pnv_prepare_going_down();
>
>        while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>                rc = opal_cec_power_down(0);
>                if (rc == OPAL_BUSY_EVENT)
>                        opal_poll_events(NULL);
>                else
>                        mdelay(10);
>        }
>        for (;;)
>                opal_poll_events(NULL);
>}

Which means that *practically* our only option is to return OPAL_BUSY
or OPAL_BUSY_EVENT.

We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
running pollers to communicate with the FSP and do the final bits of
Reset/Reload handling before we power off the system.

Additionally, we really should update our documentation to point all
of these return codes and what action an OS should take.

CC: stable
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
  • Loading branch information
stewartsmith committed Oct 11, 2017
1 parent e363cd6 commit 696d378
Show file tree
Hide file tree
Showing 3 changed files with 21 additions and 5 deletions.
18 changes: 15 additions & 3 deletions doc/opal-api/opal-cec-power-down-5.rst
Expand Up @@ -24,16 +24,28 @@ Return Values
-------------

``OPAL_SUCCESS``
the power down was updated successful
the power down request was successful.
This may/may not result in immediate power down. An OS should
spin in a loop after getting `OPAL_SUCCESS` as it is likely that there
will be a delay before instructions stop being executed.

``OPAL_BUSY``
unable to power down, try again later
unable to power down, try again later.

``OPAL_BUSY_EVENT``
Unable to power down, call `opal_run_pollers` and try again.

``OPAL_PARAMETER``
a parameter was incorrect

``OPAL_INTERNAL_ERROR``
hal code sent incorrect data to hardware device
Something went wrong, and waiting and trying again is unlikely to be
successful. Although, considering that in a shutdown code path, there's
unlikely to be any other valid option to take, retrying is perfectly valid.

In older OPAL versions (prior to skiboot v5.9), on IBM FSP systems, this
return code was returned erroneously instead of OPAL_BUSY_EVENT during an
FSP Reset/Reload.

``OPAL_UNSUPPORTED``
this platform does not support being powered off.
6 changes: 5 additions & 1 deletion doc/opal-api/return-codes.rst
Expand Up @@ -40,7 +40,8 @@ OPAL_BUSY

#define OPAL_BUSY -2

Try again later.
Try again later. Related to `OPAL_BUSY_EVENT`, but `OPAL_BUSY` indicates that the
caller need not call `OPAL_POLL_EVENTS` itself. **TODO** Clarify current situation.

OPAL_PARTIAL
------------
Expand Down Expand Up @@ -126,6 +127,9 @@ OPAL_BUSY_EVENT

#define OPAL_BUSY_EVENT -12

The same as `OPAL_BUSY` but signals that the OS should call `OPAL_POLL_EVENTS` as
that may be required to get into a state where the call will succeed.

OPAL_HARDWARE_FROZEN
--------------------
::
Expand Down
2 changes: 1 addition & 1 deletion platforms/ibm-fsp/common.c
Expand Up @@ -223,7 +223,7 @@ int64_t ibm_fsp_cec_power_down(uint64_t request)
printf("FSP: Sending shutdown command to FSP...\n");

if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))
return OPAL_INTERNAL_ERROR;
return OPAL_BUSY_EVENT;

fsp_reset_links();
return OPAL_SUCCESS;
Expand Down

0 comments on commit 696d378

Please sign in to comment.