Box-Cox transform (some code needed: lambda estimator) #1309

ccsv opened this Issue Jan 10, 2014 · 12 comments


ccsv commented Jan 10, 2014

Here is working code for the Box-Cox transform, with an optional shift parameter to keep the data positive. My multivariable calculus is a bit rusty and I am unfamiliar with the Python functions needed to implement the lambda-estimation algorithms. The code works for general use.

import numpy as np

def box_cox(x, lam, shift=0, method='guerrero'):
    """Box-Cox transform.

    Transforms data toward normality. Also known as the power transform.

    Parameters
    ----------
    x : array
        Time series data.
    lam : float or str {'auto'}, optional
        The power parameter lambda. The 'auto' input automatically
        calculates lambda based on the chosen method.
    shift : int
        Shift added to the data to make all values positive.
    method : str {'guerrero', 'loglike', 'newtonraphson'} NOT IMPLEMENTED
        Method for calculating lambda; default is the 'guerrero' method.
        'loglike' uses the log likelihood, 'newtonraphson' uses
        Newton-Raphson.

    Returns
    -------
    y : array
        The transformed data.

    References
    ----------
    Hyde, Scott (1999). Likelihood based inference on the Box-Cox family
    of transformations: SAS and Matlab programs.
    Guerrero, V.M. (1993).
    """
    x = np.asarray(x)
    # Shift operator for negative data (avoid mutating the caller's array)
    if shift > 0:
        x = x + shift

    # Automatic estimators
    if lam == 'auto':
        # Log likelihood
        if method == 'loglike':
            raise NotImplementedError("Feature is unavailable")
        # Newton-Raphson algorithm
        elif method == 'newtonraphson':
            raise NotImplementedError("Feature is unavailable")
        # Guerrero (1993)
        elif method == 'guerrero':
            raise NotImplementedError("Feature is unavailable")

    # Main function
    if lam == 0:
        y = np.log10(x)
    else:
        y = ((x ** lam) - 1.) / lam
    return y

The code works, but automatic calculation of lambda should be the default. Code for two of the calculation methods can be found here:
The last method can be found in:

Guerrero, V.M. (1993)
(but you will need access to Wiley)


Did you see that this is in scipy.stats already? (and the links in the See Also section of that docstring)
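For reference, a minimal sketch of the scipy.stats routine being referred to (assuming only that the data are strictly positive; with lmbda omitted, scipy estimates lambda by maximum likelihood):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.exponential(size=200) + 0.1  # strictly positive data

# With lmbda omitted, boxcox returns the transformed data together
# with the maximum likelihood estimate of lambda.
y, lam = stats.boxcox(x)

# A fixed lambda can also be supplied; lmbda=0 is the natural log.
y0 = stats.boxcox(x, lmbda=0.0)
```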

ccsv commented Jan 11, 2014

Nope, I did not see that; let me close this.

@ccsv ccsv closed this Jan 11, 2014

I've tried adding the Newton-Raphson code to scipy to estimate lambda, and it seems to work pretty well.

By the way, the main function

    #Main function
    if lam == 0:
        y = np.log10(x)
    else:
        y = ((x ** lam) - 1.) / lam

should use the natural log instead of base 10, for continuity as lam → 0:

>>> import numpy as np
>>> lam = 1e-8
>>> x = 3.0
>>> ((x**lam) - 1) / lam   # ≈ 1.09861229, converging to np.log(x)
>>> np.log(x)              # 1.0986122886681098
>>> np.log10(x)            # 0.47712125471966244

See scipy PR #5120 if we want to resurrect this.
There is still no Box-Cox or similar parameterized transform in statsmodels.
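As a side note on resurrecting this: scipy also exposes the lambda estimation on its own, so a statsmodels transform could delegate to it. A small sketch (`scipy.stats.boxcox_normmax` is scipy's API, not statsmodels'):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(42)
x = rng.lognormal(size=500)  # log-normal data, so lambda near 0 is expected

# 'mle' maximizes the Box-Cox log-likelihood; 'pearsonr' maximizes the
# correlation of the transformed data with normal order statistics.
lam_mle = stats.boxcox_normmax(x, method='mle')
lam_pr = stats.boxcox_normmax(x, method='pearsonr')
```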


Reopening. I don't see anything in statsmodels specific to Box-Cox or similar transformations. Stata has an estimator, IIRC.

The general issue is #1480, and there are several other issues on how to support a transformed endog.
Over the years I have looked at several articles that transform either the endog or the exogs, but I never tried to build this into a model.

Possible use case for simple transformation and lambda estimation: in statistics it is often used to remove skew or excess kurtosis for a better approximation to normality.
Box-Cox and generalizations could be used to estimate the required transformation parameters.

(Note: Following the "Use Poisson" literature, generalizing GLM would be more useful for estimating E(y|x) directly, without the heavier distributional assumption; see parameterized link functions and link tests. But simple transformation looks fine for getting better parameter estimates and hypothesis tests on them.)

@josef-pkt josef-pkt reopened this Feb 1, 2016
N-Wouda commented May 3, 2016

Do we want to continue work on this? @josef-pkt

Based on the method outlined in Guerrero (1993), I wrote some Scala code for lambda selection for the Box-Cox transform a couple of months back. I could easily refactor it and open a PR, but I would rather also have the two-parameter transform included (I'd have to read up on the 'original' profile-likelihood selection, but I foresee no difficulties).



I still think adding support for Box-Cox and similar transformations is of practical importance, and it should be added. We also have a new PR, #2892, that includes the Box-Cox transformation in a new group of time series models.
I had never looked at Box-Cox in the context of time series forecasting, so I read Guerrero today, along with
Proietti, Tommaso, and Helmut Lütkepohl. 2013. "Does the Box–Cox Transformation Help in Forecasting Macroeconomic Time Series?" International Journal of Forecasting 29 (1): 88–99. doi:10.1016/j.ijforecast.2012.06.001.
which cites Guerrero but doesn't say much about the article. They check how well the Box-Cox transformation works on a large set of seasonal (monthly) time series. Surprisingly, the naive predictor that ignores the retransformation problem does better than the (under normality) optimal retransformation.

My main worry about the Guerrero approach is that it is very sensitive to outliers (special events), so adding some robust options for the variance estimation (MAD- or IQR-based) might be useful.
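For concreteness, a sketch of this idea under my reading of the method: pick the lambda that minimizes the coefficient of variation of `s_i / m_i**(1 - lam)` across subgroups, where `m_i` and `s_i` are per-group location and scale, with an optional MAD-based robust variant as suggested above. `guerrero_lambda` is a hypothetical helper, and the grouping details are simplified relative to the paper.

```python
import numpy as np
from scipy.stats import median_abs_deviation


def guerrero_lambda(x, group_len=4, lams=None, robust=False):
    """Grid-search sketch of Guerrero-style lambda selection.

    Splits the series into consecutive groups and picks the lambda
    minimizing the coefficient of variation of s_i / m_i**(1 - lam).
    With robust=True, median and MAD replace mean and std.
    """
    x = np.asarray(x, dtype=float)
    if lams is None:
        lams = np.linspace(-1.0, 2.0, 61)
    n = (len(x) // group_len) * group_len
    groups = x[:n].reshape(-1, group_len)
    if robust:
        m = np.median(groups, axis=1)
        s = median_abs_deviation(groups, axis=1)
    else:
        m = groups.mean(axis=1)
        s = groups.std(axis=1, ddof=1)
    best_lam, best_cv = None, np.inf
    for lam in lams:
        ratio = s / m ** (1.0 - lam)
        cv = ratio.std(ddof=1) / ratio.mean()
        if cv < best_cv:
            best_lam, best_cv = lam, cv
    return best_lam


# Demo: level and spread grow together, so lambda near 0 is expected.
rng = np.random.RandomState(3)
x_demo = np.exp(0.01 * np.arange(400)) * rng.lognormal(sigma=0.2, size=400)
est = guerrero_lambda(x_demo, group_len=10)
est_robust = guerrero_lambda(x_demo, group_len=10, robust=True)
```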

The longer story (not necessary to read, but also for me to write down a summary once):

I look at the topic of transforming endog every once in a while, but I still don't have a clear picture of what to do. I have opened several issues related to supporting transformations.
It is useful for time series analysis, where we have a large number of models that only cover the linear case. It is also used in several hypothesis-testing settings to get a better approximation to the normal distribution of a test statistic.
However, for prediction and estimation outside of the time series context, I now think we should be able to do almost everything without transformation through GLM or GLM-like models, and single-index models in general. This handles the nonlinearity. The variance stabilization that Guerrero emphasizes is, I think, better or more directly handled by modelling the variance.
GLM works numerically very well in almost all cases, which removes the computational reason to use transformations, at least for cross-sectional data. This also avoids any retransformation problems.
However, the same (re)transformation problem shows up when we want to calculate other nonlinear functions of the parameters or of the prediction, so we still need to support "bias correction" in the retransformation.
Another advantage of GLM (with possible heteroscedasticity) is that the mean function remains consistently estimated even if the variance is misspecified. In the parametric retransformation as in Guerrero, by contrast, the mean prediction is directly affected by the variance estimate, which is either more restrictive in terms of additional parametric assumptions, or noisier.

Computational considerations might still go the other way. IIRC, I have seen references that apply a square-root transform to a Poisson count variable in order to use linear time series models.

N-Wouda commented May 4, 2016

@josef-pkt well, my personal background is (predominantly) in tsa, where I've come across some of these transforms now and then. While they do not always have sizeable effects, they are convenient to use or try on the data.

Feel free to tag me in anything like this once you decide what to do. I'd very much support adding transformations for time series data and am willing to contribute.


If you would like to get started with this, which would be great, then I'd suggest collecting the necessary pieces.
For example, last year I started to write "wrapper models" for some new models that build on existing models, either as mixins, or taking an existing model as an argument, or as specialized cases that just delegate the work to an existing model.

For example, an OLSBoxCox or ARIMABoxCox would mainly need to add the transformation of endog, delegate to some helper functions to estimate the Box-Cox parameter from the residuals and whatever else is needed, and add a predict method that returns the retransformed prediction.

It's possible to go the full MLE route, including estimating the transformation, but I guess it's more robust, and easier to get started, to estimate the transformation parameters in a separate step, as in Guerrero and in Proietti/Lütkepohl.

Guerrero has the approximately unbiased / bias-corrected retransformation, which would be good to have as a reusable standalone function (or as part of a transformation class), because it's not provided by scipy. Before I read that paper, I had only seen the usual lognormal retransformation, or nonparametric retransformations.
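The familiar log case shows the shape such a standalone helper could take; this sketch uses the standard lognormal result E[y] = exp(mu + sigma^2/2), not Guerrero's general bias-corrected formula:

```python
import numpy as np


def retransform_log(mu, sigma2):
    """Bias-corrected retransformation for a log-transformed endog.

    If log(y) ~ N(mu, sigma2), then E[y] = exp(mu + sigma2 / 2); the
    naive exp(mu) is the median and understates the mean.
    """
    return np.exp(np.asarray(mu) + sigma2 / 2.0)


# Check against a large simulated lognormal sample
rng = np.random.RandomState(0)
y = np.exp(rng.normal(loc=1.0, scale=0.5, size=200000))
naive = np.exp(1.0)                     # median of y, not its mean
corrected = retransform_log(1.0, 0.25)  # exp(1.125), close to y.mean()
```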

(I've downloaded but not looked at a paper about similar transformation for GLM, most likely for applications like GARMA, Generalized ARMA, that are linearized ARMA version of, for example, Poisson and other positive valued random variables. I haven't really looked at more than a few abstracts in this area, but time series analysis for count data is also on my personal wishlist.)

Predict helper functions are also on the statsmodels wishlist, to answer FAQs like "I log-transformed my endog, what do I do now?" (#2791 and related).

N-Wouda commented May 4, 2016 edited

@josef-pkt I'll write up something real quick, expect a PR within a few hours.

I came across Riani, Marco. 2009. "Robust Transformations in Univariate and Multivariate Time Series." Econometric Reviews 28 (1-3), which seemed promising. If you find the time, would you care to take a look?


I downloaded the Riani paper yesterday but didn't look at it (or I don't remember why I didn't look closer, except that I downloaded more than 100 articles afterwards).
I will look at it today.

Outliers and special events in time series are a more serious issue if they affect the mean parameter estimation and prediction. We will need to handle that in statsmodels at some point.

However, for Box-Cox transformation and retransformation I worry more about the extra effect on the mean that outliers exert through the variance and the transformation. In general, mean estimation is relatively robust to non-normality unless there is a very serious outlier problem. In contrast, second and higher moment estimates can be distorted very quickly by just a few outliers, skewness, or a few heavy-tailed observations.
This could also be a reason why lognormal or similar variance-based retransformations don't work very well in some of the prediction exercises.

(Two analogies:
the t-test is pretty robust to non-normality, while the Bartlett test for variances is very non-robust to non-normality;
GMM for mean estimation is OK in medium-sized samples, while for second-moment (variance and covariance) estimation the sample size has to be pretty large to get reliable estimates.)


About the Riani paper:
My main impression: too much work and too complicated for the benefits.

There are some interesting parts in there that I had never looked at before, like full MLE for the transformation in statespace models, and score tests for statespace models (or ARMA) via variable-addition tests. However, the forward search goes in the direction of robust, high-breakdown estimators, which would be a different project and quite a bit of work in itself. And given the relatively low number of citations of this strand of literature, it might not be the best approach for outlier/special-event handling.
Some discussion about outliers and special event removal in time series starts here:
#2873 (comment)
which sounded like a useful approach (and has more citations), even if it is not high-breakdown robust and won't be able to handle the case where 40% of the data are outliers. :)
(Another possibility if it's mainly heavy-tailed innovations: use t-distributed ARMA and a variance estimator that is robust to outliers or large observations.)

In any case, those are useful projects and models that we can add, but I wouldn't get distracted by them while implementing transformations. We can also switch to one of the robust underlying models for the transformation approach once they are available.

Proietti/Lütkepohl mention adding dummy variables for individual observations (and, in their case, frequencies) to lower the impact of outliers.
