Confusion with se_mean and standard deviation #8699

Elisei-Kungurov · 2023-02-22T12:12:46Z

I would like to check that I have correctly understood the concepts of se_mean and standard deviation in statsmodels. Could you help me with this?

The documentation for statsmodels.tsa.base.prediction.PredictionResults.se_mean describes se_mean as the standard deviation of the predicted mean. At the same time, the Release 0.8.0 notes contain a passage saying that get_forecast provides standard errors. As far as I know, the standard deviation and the standard error of the mean are not the same thing.

As a rookie in statistics, I read on the wiki that the standard deviation describes the variation in measurements, while the standard error of the mean is a probabilistic statement about how the sample size will provide a better bound on estimates of the population mean, in light of the central limit theorem. However, the standard error can also be described as an estimate of that standard deviation. Does this mean that in statespace.sarimax we estimate possible future values of the standard deviation, and that the std the model outputs depends on the number of time series points (the more time points, the better the prediction of the std)?

I built a SARIMAX model and want to construct a confidence interval for the forecast that reflects the variation in measurements. Can I use mean_se for this, or do I need to convert these values to a std by multiplying the SE by sqrt(n)? And does n equal the number of data points in the time series before forecasting?

Thank you in advance for your reply!

ChadFulton · 2023-02-23T03:21:21Z

For a simple state space model (of which SARIMAX is a special case), we have:

$$y_t = Z \alpha_t + \varepsilon_t, \varepsilon_t \sim N(0, H)$$

$$\alpha_t = T \alpha_{t-1} + \zeta_t, \zeta_t \sim N(0, Q)$$

Here we will assume that the matrices $Z, H, T, Q$ are known. (Actually, the estimated parameters of the model are in those matrices, but the standard errors and confidence intervals that Statsmodels reports for state space prediction results never account for parameter uncertainty, so we can ignore that for now.)

By default, get_prediction (or get_forecast) gives one-step-ahead predictions of $y_t$, so that:

PredictionResults.predicted_mean = $E[y_t | y_{t-1}, y_{t-2}, \dots]$
PredictionResults.se_mean = $StdDev[y_t | y_{t-1}, y_{t-2}, \dots]$
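As a rough illustration of the conditional standard deviation above, here is a minimal sketch for the simplest special case, an AR(1) model ($Z = 1$, $H = 0$, $T = \phi$, $Q = \sigma^2$) with known parameters. The function name `ar1_forecast_std` and the numbers are made up for the example and are not part of Statsmodels. Note that the sample size never enters the calculation, so there is no $\sqrt{n}$ factor to apply.

```python
import math


def ar1_forecast_std(phi, sigma2, steps):
    """Std dev of y_{T+h} given data through T, for h = 1..steps.

    Hypothetical helper for an AR(1) with known phi and sigma2.
    Uses the variance recursion Var_h = phi^2 * Var_{h-1} + sigma2,
    which is P_h = T P_{h-1} T' + Q with a scalar state.
    """
    stds = []
    var = 0.0  # y_T is observed, so there is no uncertainty at h = 0
    for _ in range(steps):
        var = phi ** 2 * var + sigma2
        stds.append(math.sqrt(var))
    return stds


stds = ar1_forecast_std(phi=0.5, sigma2=4.0, steps=3)
print(stds)  # variances are 4, 5 and 5.25
```

The one-step-ahead value is just the standard deviation of the innovation, and it grows with the horizon toward the unconditional standard deviation of the process.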

(Aside: @josef-pkt pointed out that this actually doesn't match the intended/typical Statsmodels usage of the _mean suffix, which I believe would be intended to capture e.g. $E[Z \alpha_t | y_{t-1}, y_{t-2}, \dots]$. But things are a little bit different in state space models, because (a) many models do not have a $\varepsilon_t$ term anyway, e.g. the SARIMAX model, and (b) you can always rewrite any state-space model such that it doesn't have a $\varepsilon_t$ term, by placing that term into the state vector $\alpha$).

I'm not sure if that answers your question or not, but please feel free to follow up.

josef-pkt · 2023-02-23T03:52:28Z

PredictionResults.predicted_mean = $E[y_t | y_{t-1}, y_{t-2}, \dots]$

"mean" here sounds fine, it's a conditional expectation of y

PredictionResults.se_mean = $StdDev[y_t | y_{t-1}, y_{t-2}, \dots]$

In OLS I used se_obs for something similar (it includes parameter uncertainty plus the residual standard deviation), which corresponds to the prediction interval.

se_mean would be the uncertainty of the conditional expectations (coming from parameter uncertainty)
y_hat = $E[y_t | y_{t-1}, y_{t-2}, \dots]$
se_mean = std(y_hat | ...)

aside:
In the newer prediction results class I use only se because get_prediction for discrete models can predict other statistics than mean. Outside of tsa and linear models, we don't have prediction intervals and se_obs yet.

ChadFulton · 2023-02-23T03:58:41Z

Thanks @josef-pkt!

ChadFulton added comp-tsa-statespace question labels Feb 23, 2023
