# WLS log likelihood, aic and bic #1170

Closed
opened this Issue Nov 2, 2013 · 21 comments


### 2 participants

Contributor
commented Nov 2, 2013
 The log likelihood for WLS should be the same as for GLS when the weights are chosen as 1/sqrt(sigma), where sigma is an nobs by 1 vector for use in GLS. This is not the case: the log likelihood calculation for WLS differs from the formula given in the docstring by a term which would fix this. I implemented this fix, but now the test for WLS fails. Is there some reason for this omission? Compatibility with some other piece of software which seems to be broken?
Member
 I will look into this. From what I remember when this was originally written, Skipper had some problems, when he wrote the unit tests, with different definitions especially of aic and bic across packages, because they dropped different terms: terms that are constant when we compare aic and bic for the same model with different numbers of parameters. Do you have the code, or the missing term, somewhere public?
Member
 Based on comments in the test suite:

- WLS seems to be tested against Stata; GLS seems to be tested against R.
- Precision in the unit tests for llf, aic, bic is not very high, but looks like around 3 digits.
- We don't have any test that compares GLS and WLS (only GLS-OLS and WLS-OLS).

At the very beginning, when most of this code was written, we tested directly against R with rpy.
Member
 Just another piece of information: it was unclear at some point how weights are really defined. https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/tests/test_regression.py#L458 uses a different rescaling of the weights than Stata, and if I remember correctly, Skipper needed to adjust R_squared for weights a while ago. BTW: weights are the squared weights.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.datasets.ccard import load

dta = load()
dta.exog = sm.add_constant(dta.exog, prepend=False)
nobs = 72.
weights = 1 / dta.exog[:, 2]
#weights = ((weights * nobs) / weights.sum())

res_wls = sm.WLS(dta.endog, dta.exog, weights=weights).fit()
res_gls = sm.GLS(dta.endog, dta.exog, sigma=np.diag(1. / weights)).fit()

print res_wls.params
print res_gls.params
print res_wls.bse
print res_gls.bse
print res_wls.pvalues
print res_gls.pvalues
print res_wls.rsquared, res_gls.rsquared
print res_wls.llf, res_gls.llf
print res_wls.aic, res_gls.aic
print res_wls.bic, res_gls.bic

attributes = ['llf', 'aic', 'bic', 'rsquared', 'mse_resid']
print '\n'
for attr in attributes:
    print getattr(res_wls, attr), getattr(res_gls, attr)
```

params, bse, pvalues are identical; llf, aic, bic, rsquared differ.
Member
 In Stata 12.1:

```
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df       AIC        BIC
-------------+---------------------------------------------------------------
           . |     72   -487.5725   -476.9793      5   963.9586   975.3419
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
```

GLS llf matches ll(null) in Stata, but Stata has a different ll(model) that is used in the aic, bic calculations.

Correction to the script above: using the same scaling of the scaled weights as in the unit tests, `weights = ((weights * nobs) / weights.sum())`, WLS matches what Stata is reporting.

```
>>> import statsmodels.regression.tests.test_regression as tr
>>> tt = tr.TestWLSExogWeights()
>>> tt.res1.llf
-476.97929465628067
>>> tt.res2.llf
-476.9792946562806
>>> res_wls.llf
-476.97929465628067
>>> res_wls.aic
963.95858931256134
>>> res_wls.bic
975.34191990764157
```

Which is what the unit tests are testing.

Another related issue: loglike/llf in linear models (at least in one of the three that I checked in detail) is based on the concentrated loglikelihood, with scale/variance estimated by the unbiased least squares estimator, not the biased MLE (small difference in the denominator).
Member
 Looks like, compared to Stata, we are missing a normalization, which doesn't affect the parameter and bse estimates, but does affect other statistics like llf. I haven't yet found the Stata documentation for ll(model) versus ll(null), and we still don't have identical results for WLS and GLS even with the normalized weights.
Member
 @bashtage I saw your commit for the WLS GLS tests.

Interpretation of weights: I'm still not clear about all the different possible kinds of weights that are used in this and similar models. (I have had Stata only for one year and I'm still unclear about several details.) numpy has an enhancement issue for correlation with weights: numpy/numpy#3864

I tried a short time ago to see what statsmodels WLS is doing in this case. Essentially, the weights in WLS are currently not scale invariant for all results: the parameters and their covariance are scale invariant, but scale and other results that depend on the total number of observations are not. I think we are implicitly using a definition where effective nobs = weights.sum() and not effective nobs = len(endog).

The following gives the same results for all 3 estimators for llf, aic, bic and rsquared:

```python
weights = np.ones(nobs, float) / nobs
scaled_weights = nobs * weights / weights.sum()
print '\nnobs', nobs, scaled_weights.sum()  #assert

res_ols = sm.OLS(dta.endog, dta.exog).fit()
res_wls = sm.WLS(dta.endog, dta.exog, weights=scaled_weights).fit()
res_gls = sm.GLS(dta.endog, dta.exog, sigma=np.diag(1. / weights)).fit()
```
Member
 To spell it out: I'm not sure if we should be doing this, as long as we don't have full support for different kinds of weights, especially frequency/case weights. (Stata's behavior with weights is also not undisputed from what I have seen in several FAQs or mailing list threads.)
referenced this issue Nov 2, 2013
Open

#### GLS uncentered_tss and rsquared #1171

Contributor
commented Nov 3, 2013
 As you said, the deep issue is whether the weights in WLS are scale or frequency weights. If they are scale weights, then WLS should be identical to GLS using a simple transformation of the weights to get sigma. If they are frequency weights, then the log-likelihood is actually different.

I have implemented the log-likelihood as defined in the loglike method of WLS. The current version was not right and was missing the term which corresponds to the math:

\frac{1}{2}\log\left(\left|W\right|\right)

Including this term makes GLS the same as WLS using transformed weights. I added some tests for OLS/GLS/WLS equivalence, including some edge cases. I also fixed a bug in WLS/GLS which occurs when the weights are very large and which resulted in an inf likelihood. https://github.com/bashtage/statsmodels/tree/linear-model-robust-tests

I'm pretty bad at git still, so I probably have far too many changes in this branch.
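The equivalence this term restores can be checked with plain numpy, independent of statsmodels. This is a sketch under the thread's conventions (WLS weights w correspond to GLS sigma = 1/w, W = diag(w)); the function names are illustrative, not the statsmodels API:

```python
import numpy as np

def wls_llf(y, X, w):
    # concentrated Gaussian log likelihood for WLS with W = diag(w),
    # including the 0.5 * log|W| = 0.5 * sum(log w) term
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = (y - X @ beta) * sw
    nobs = len(y)
    ssr = resid @ resid
    base = -nobs / 2.0 * (np.log(2 * np.pi) + np.log(ssr / nobs) + 1)
    return base + 0.5 * np.sum(np.log(w))

def gls_llf(y, X, s):
    # the same likelihood written in GLS form with Sigma = diag(s);
    # the corresponding term is -0.5 * log|Sigma| = -0.5 * sum(log s)
    rs = np.sqrt(s)
    beta, *_ = np.linalg.lstsq(X / rs[:, None], y / rs, rcond=None)
    resid = (y - X @ beta) / rs
    nobs = len(y)
    ssr = resid @ resid
    base = -nobs / 2.0 * (np.log(2 * np.pi) + np.log(ssr / nobs) + 1)
    return base - 0.5 * np.sum(np.log(s))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=40)
w = rng.uniform(0.5, 2.0, size=40)

print(np.isclose(wls_llf(y, X, w), gls_llf(y, X, 1.0 / w)))  # True
print(np.isclose(wls_llf(y, X, w), wls_llf(y, X, 100 * w)))  # True: scale invariant
```

With the log-determinant term included, the WLS llf matches the GLS llf for sigma = 1/w, and rescaling the weights no longer changes the llf.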
Contributor
commented Nov 3, 2013
 There are further advantages to normalization; in particular, using a normalized weight vector is useful for avoiding numerical issues due to poorly specified weights. For example, using

```python
w = 100 * np.ones_like(endog)
```

produces identical parameter estimates but will break LL, AIC and BIC. I think these are all reasonable choices, and I think it is reasonable to leave GLS alone, use normalized weights in WLS, and correct the (mathematical) definition of the loglike in WLS so that the code does what the docs state. Most importantly, the docstrings should simply explain these choices so that users know what assumptions/choices are being made.
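A minimal sketch of the normalization being suggested, rescaling so the weights sum to nobs (the helper name is illustrative, not a statsmodels function):

```python
import numpy as np

def normalize_weights(w):
    # rescale weights so that sum(w) == nobs; the overall scale of the
    # input weights then drops out of llf, aic and bic
    w = np.asarray(w, dtype=float)
    return w * (len(w) / w.sum())

w = 100 * np.ones(72)
w_norm = normalize_weights(w)
print(w_norm.sum())  # 72.0, every entry is 1.0
print(np.allclose(normalize_weights(5 * w), w_norm))  # True: scale drops out
```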
Contributor
commented Nov 3, 2013
> loglike/llf in linear models (at least in one of the three that I checked in detail) is based on the concentrated loglikelihood, with scale/variance estimated by the unbiased least squares estimator not the biased MLE (small difference in denominator)

This is a little surprising, since the unbiased scale estimators are not ML estimators. I suppose this is the risk of trying to emulate the behavior of other software, which may be broken.
Member
 About "unbiased scale is not MLE": I think the point here is that OLS, WLS and GLS are not MLE estimates (if we count the scale as one of the parameters); they are standard least squares estimates. llf is the loglike evaluated at the least squares estimate, as far as I understand.

I started at some point an MLE normal-distributed model that directly estimates all parameters by MLE. That's when I discovered that the concentrated llf doesn't match the llf from the full normal MLE. I think in most cases the concentrated llf in the models is straight from textbooks like Greene.

The GLS subclasses with autoregressive GLSAR or GLSHet are also iterative least squares estimates, not directly MLE. (We still need to add full MLE estimators for those.) Although these estimates are asymptotically the same as MLE.

Statisticians very often use REML in similar cases, which penalizes second moment estimates to get an unbiased version instead of the biased MLE version. But even after a few years, I still don't understand their REML well enough to include it in code or documentation.
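The scale difference being discussed is small but real: the Gaussian log likelihood as a function of the scale is maximized at ssr/nobs, so plugging in the unbiased ssr/(nobs - k) instead gives a strictly smaller value. A minimal sketch (illustrative numbers, not from any test case):

```python
import numpy as np

def gaussian_llf(ssr, nobs, scale):
    # Gaussian log likelihood at a given residual sum of squares and scale
    return -nobs / 2.0 * (np.log(2 * np.pi) + np.log(scale)) - ssr / (2 * scale)

ssr, nobs, k = 50.0, 72, 5
llf_mle = gaussian_llf(ssr, nobs, ssr / nobs)             # concentrated llf, MLE scale
llf_unbiased = gaussian_llf(ssr, nobs, ssr / (nobs - k))  # unbiased LS scale plugged in
print(llf_mle > llf_unbiased)  # True: the MLE scale maximizes the likelihood
```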
Member
 About normalizing WLS:

> w = 100 * np.ones_like(endog)

Before we automatically normalize in these cases, I think we should check if the interpretation of weights as the product of frequency weights times GLS-like scale weights works. If it does, then we could add an option to the model and include both kinds of weights. I didn't expect that this might already (almost) work. See #743.

Including frequency weights would be very useful for large data sets like survey data that don't have continuous explanatory variables. For descriptive statistics and hypothesis testing with weights, I used nobs_effective = sum(weights): https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/weightstats.py#L42

The unit tests compare frequency weights with the repeated data as_repeats in a case when the weights are integers: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/tests/test_weightstats.py#L71
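The frequency-weights-versus-repeated-data check used in those tests can be sketched for descriptive statistics with plain numpy (illustrative data, not the actual test case):

```python
import numpy as np

# with integer frequency weights, weighted statistics computed with
# nobs_effective = weights.sum() match the statistics of the data with
# each value physically repeated weights[i] times
x = np.array([1.0, 2.0, 5.0])
w = np.array([2, 3, 1])    # frequency weights
x_rep = np.repeat(x, w)    # [1, 1, 2, 2, 2, 5]

nobs_eff = w.sum()
mean_w = (w * x).sum() / nobs_eff
var_w = (w * (x - mean_w) ** 2).sum() / (nobs_eff - 1)

print(np.isclose(mean_w, x_rep.mean()))      # True
print(np.isclose(var_w, x_rep.var(ddof=1)))  # True
```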
Contributor
commented Nov 3, 2013
 The loglike is actually correct for OLS and GLS, since it is based on SSR and nobs and doesn't make any reference to the degrees of freedom correction. The WLS loglike in master is a bit incongruous: the implementation is one which is correct for normalized weights, but no weight normalization is taking place. OTOH, it doesn't match the GLS likelihood, which is what the docstring says.
Contributor
commented Nov 3, 2013
> The GLS subclasses with autoregressive GLSAR or GLSHet are also iterative Least Squares estimates, not directly MLE. (We still need to add full MLE estimators for those.) Although these estimates are asymptotically the same as MLE.

In some cases these are also the MLE estimates, via the EM algorithm: maximize the log-likelihood for a fixed covariance, then use expectations to compute the covariance from the residuals. I think always computing the log-likelihood conditional on sigma (in the GLS case) is reasonable and defensible.
referenced this issue Nov 3, 2013
Open

#### Normal Linear Model, llf #323

Member
> The loglike is actually correct for OLS and GLS since it is based on SSR and nobs and doesn't make any reference to the degree of freedom correction.

My mistake: I mixed up where the difference is. llf is the loglike of the MLE estimate, not of the least squares estimate. #323
Member
 I'm giving up on trying to match Stata's llf for WLS. I also think Kevin's change that makes WLS equivalent to GLS makes more sense than matching Stata. In that case WLS has the straightforward heteroscedasticity interpretation, and llf is invariant to the scale of the weights.

From reverse engineering the results, it looks like the correction to Stata's llf is the following, which seems to correspond to the rescaling Stata is doing for aweights and pweights:

```
>>> tt.res1.llf - 0.5 * np.sum(np.log(weights / weights.sum() * nobs))
-476.97929465628067
>>> res_wls.llf - 0.5 * np.sum(np.log(weights / weights.sum() * nobs))
-476.97929465628062
>>> tt.res2.llf  # Stata from test case
-476.9792946562806
```

The only test failures that I can see are the direct comparisons with Stata in statsmodels.regression.tests.test_regression.TestWLSExogWeights, in loglike, aic and bic.

Aside: checking llf for GLSAR, which were not enabled. (GLSHet still doesn't have unit tests because I didn't find an exact equivalent for the FGLS estimator; Stata can estimate heteroscedasticity only through arch with MLE, as far as I could see.)

GLSAR against GRETL:

```
======================================================================
FAIL: statsmodels.regression.tests.test_glsar_gretl.TestGLSARGretl.test_all
----------------------------------------------------------------------
Arrays are not almost equal to 7 decimals
 ACTUAL: -763.97521816022379
 DESIRED: -763.9752
```

A GLSAR case against Stata, prais ..., corc rhotype(tscorr):

```
Arrays are not almost equal to 7 decimals
 ACTUAL: -759.50012510798592
 DESIRED: -759.5001027340874
```
Contributor
commented Nov 4, 2013
 I have thought a bit more about it, and really it all comes down to how weights should be treated.

Frequency weights, in Stata terms, should definitely affect the LL, since if y_i has a weight w_i then y_i should be repeated w_i times in the model; so, for example, if w_i were always 2, the log-likelihood would change by a factor of 2 relative to OLS. On the other hand, if they are average weights, which essentially turn WLS into GLS, then the log-likelihoods should match. I don't know a lot about probability weights; these will probably affect the log-likelihood since they are probabilities, but I would need to find a reference for this or derive it from first principles. Importance weights have no theoretical justification and are essentially prior information about which observations the researcher thinks are more important; in this case, I think it makes sense to treat them more or less the same as average weights, so that if they are all the same (up to scale) the log-likelihood is unaffected.

- Frequency weights: affect the LL even when all identical and != 1.0.
- Average weights: no change in LL when all identical and != 1.0; should exactly match GLS by reinterpreting the weights as sigma_i = 1/sqrt(w_i).
- Probability weights: probably affect the LL, although I'm not completely sure when all probability weights are the same.
- Importance weights: I would always normalize these so that sum(w) = nobs, and then treat them as average weights.

Eventually, general weighting of cross-sectional models may be useful, and I think it would make sense to reconsider this then. That is probably further off in the future. Currently I would always treat weights as average weights and state the relationship between GLS and WLS in the docstring for WLS.
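The frequency-weight case can be verified directly: with the concentrated llf (MLE scale ssr/nobs), duplicating every observation leaves the parameter estimates and the scale unchanged, so the llf exactly doubles. A self-contained sketch with plain numpy (the helper is illustrative, not the statsmodels implementation):

```python
import numpy as np

def ols_llf(y, X):
    # concentrated Gaussian log likelihood of OLS, with MLE scale ssr/nobs
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    nobs = len(y)
    ssr = resid @ resid
    return -nobs / 2.0 * (np.log(2 * np.pi) + np.log(ssr / nobs) + 1)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)

# frequency weight 2 on every observation == physically repeating each row
llf = ols_llf(y, X)
llf_dup = ols_llf(np.repeat(y, 2), np.repeat(X, 2, axis=0))
print(np.isclose(llf_dup, 2 * llf))  # True
```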
added a commit to josef-pkt/statsmodels that referenced this issue Nov 17, 2013
 josef-pkt TST: enable tests for llf after change to WLS.loglike see #1170 d955aac
referenced this issue Nov 17, 2013
Merged

#### TST: enable tests for llf after change to WLS.loglike see #1170 #1192

Member
 @bashtage Do you know what happened to the commit that made loglike/llf of WLS the same as the corresponding loglike/llf of GLS? I thought I merged it as part of the PRs, but master still seems to be missing the adjustment in WLS.loglike.
Contributor
 It may be gone forever: I reset my master after you rebased so I could work on new branches. It's a pretty simple fix, though.
Member
 Can you make a PR? I can adjust the unit tests that fail, given that we won't match Stata anymore. There is a unit test class TestOLS_GLS_WLS_equivalence that uses weights=ones; I don't remember if you had the same for WLS/GLS with unequal weights.
referenced this issue Dec 18, 2013
Merged

#### Wls llf fix #1253

closed this in #1253 Dec 19, 2013
Member
 We merged the change in PR #1253 and adjusted the tests. The only difference between WLS and GLS that I checked is in rsquared and centered_tss, see #1252. Thanks Kevin for looking into these things and improving them.
added a commit to josef-pkt/statsmodels that referenced this issue Dec 19, 2013
 josef-pkt TST: enable tests for llf after change to WLS.loglike see #1170 1ac21e4
pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
 josef-pkt TST: enable tests for llf after change to WLS.loglike see #1170 6e59f67
referenced this issue Feb 9, 2017
Open