WLS log likelihood, aic and bic #1170

Closed
bashtage opened this Issue Nov 2, 2013 · 21 comments


@bashtage
Contributor
bashtage commented Nov 2, 2013

The log likelihood for WLS should be the same as for GLS when the weights are chosen as 1/sqrt(sigma), with sigma an nobs by 1 vector passed to GLS.

This is not the case: the log likelihood calculation for WLS differs from the formula given in the docstring by a term which would fix this. I implemented this fix, but now the test for WLS fails.

Is there some reason for this omission? Compatibility with some other piece of software that is itself broken?

@josef-pkt
Member

I will look into this.
From what I remember when this was originally written, Skipper, when he wrote the unit tests, had some problems with different definitions especially of aic and bic across packages, because they dropped different terms.
Terms that are constant when we compare aic, bic for the same model with different number of parameters.

Do you have the code or the term that is missing somewhere public?

@josef-pkt
Member

based on comments in the test suite:
WLS seems to be tested against Stata
GLS seems to be tested against R
precision in the unit tests for llf, aic, bic is not very high, but looks like around 3 digits.

We don't have any test that compares GLS and WLS (only GLS-OLS and WLS-OLS)

At the very beginning when most of this code was written, we tested directly against R with rpy.

@josef-pkt
Member

just another piece of information: it was unclear at some point how weights are really defined
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/tests/test_regression.py#L458
uses a different rescaling of the weights than Stata, and if I remember correctly, Skipper needed to adjust R-squared for weights a while ago

BTW: weights are the squared weights

import numpy as np
import statsmodels.api as sm
from statsmodels.datasets.ccard import load
dta = load()

dta.exog = sm.add_constant(dta.exog, prepend=False)
nobs = 72.

weights = 1/dta.exog[:,2]
#weights = ((weights * nobs)/weights.sum())

res_wls = sm.WLS(dta.endog, dta.exog, weights=weights).fit()
res_gls = sm.GLS(dta.endog, dta.exog, sigma=np.diag(1./weights)).fit()

print(res_wls.params)
print(res_gls.params)
print(res_wls.bse)
print(res_gls.bse)
print(res_wls.pvalues)
print(res_gls.pvalues)

print(res_wls.rsquared, res_gls.rsquared)
print(res_wls.llf, res_gls.llf)
print(res_wls.aic, res_gls.aic)
print(res_wls.bic, res_gls.bic)

attributes = ['llf', 'aic', 'bic', 'rsquared', 'mse_resid']

print()
for attr in attributes:
    print(getattr(res_wls, attr), getattr(res_gls, attr))

params, bse, pvalues are identical
'llf', 'aic', 'bic', 'rsquared' differ

@josef-pkt
Member

in Stata 12.1


-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |     72   -487.5725   -476.9793      5     963.9586    975.3419
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

GLS llf matches ll(null) in Stata, but Stata has a different ll(model) that is used in the aic, bic calculations

correction to the script above: using the same scaling of the weights as in the unit tests
weights = ((weights * nobs)/weights.sum())
WLS matches what Stata is reporting.

>>> import statsmodels.regression.tests.test_regression as tr
>>> tt = tr.TestWLSExogWeights()
>>> tt.res1.llf
-476.97929465628067
>>> tt.res2.llf
-476.9792946562806
>>> res_wls.llf
-476.97929465628067
>>> res_wls.aic
963.95858931256134
>>> res_wls.bic
975.34191990764157

Which is what the unit tests are testing.


another related issue:
loglike/llf in the linear models (at least in one of the three that I checked in detail) is based on the concentrated log likelihood, with the scale/variance estimated by the unbiased least squares estimator rather than the biased MLE (a small difference in the denominator)
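For reference, the concentration mentioned here can be illustrated with plain numpy (a sketch, not the statsmodels implementation): plugging the MLE variance sigma2 = SSR/n back into the normal log-likelihood collapses it to a function of SSR alone.

```python
import numpy as np

# Sketch (not the statsmodels code): concentrating the normal log-likelihood.
# Substituting the MLE variance sigma2 = SSR/n into the normal density
# gives llf = -n/2 * (log(2*pi) + log(SSR/n) + 1).
rng = np.random.default_rng(0)
n = 30
resid = rng.standard_normal(n)      # stand-in for regression residuals
ssr = resid @ resid

sigma2_mle = ssr / n                # biased MLE, denominator n not n - k
llf_full = -0.5 * (n * np.log(2 * np.pi * sigma2_mle) + ssr / sigma2_mle)
llf_conc = -n / 2 * (np.log(2 * np.pi) + np.log(ssr / n) + 1)
```

Using the unbiased estimator SSR/(n - k) instead changes the value slightly, which is the small denominator difference described above.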

@josef-pkt
Member

Looks like compared to Stata we are missing a normalization, which doesn't affect the parameter and bse estimates, but does affect other statistics like llf.

I haven't found yet the Stata documentation for ll(model) versus ll(null), and we still don't have identical results for WLS and GLS even with the normalized weights.

@josef-pkt
Member

@bashtage I saw your commit for the WLS GLS tests.

Interpretation of weights:
I'm still not clear about all the different possible kinds of weights that are used in this and similar models. (I have had Stata for only one year and I'm still unclear about several details.)

numpy has an enhancement issue for correlation with weights numpy/numpy#3864
I tried a short time ago to see what statsmodels WLS is doing in this case:
Essentially, the weights in WLS are currently not scale invariant for all results: parameter and their covariance are scale invariant, but scale and other results that depend on the total number of observations are not.

I think we are implicitly using a definition where effective nobs = weights.sum() and not effective nobs = len(endog).

The following gives the same results for all 3 estimators for llf, aic, bic and rsquared

weights = np.ones(nobs, float) / nobs
scaled_weights = nobs * weights / weights.sum()

print('\nnobs', nobs, scaled_weights.sum())  # assert

res_ols = sm.OLS(dta.endog, dta.exog).fit()
res_wls = sm.WLS(dta.endog, dta.exog, weights=scaled_weights).fit()
res_gls = sm.GLS(dta.endog, dta.exog, sigma=np.diag(1./weights)).fit()
@josef-pkt
Member

To spell it out: I'm not sure if we should be doing this, as long as we don't have full support for different kinds of weights, especially frequency/case weights.
(Stata's behavior with weights is also not undisputed from what I have seen in several FAQs or mailing list threads.)

@bashtage
Contributor
bashtage commented Nov 3, 2013

As you said, the deep issue is whether the weights in WLS are scale or frequency weights. If they are scale, then WLS should be identical to GLS using a simple transformation of the weights to get sigma. If they are frequency weights, then the log-likelihood is actually different.

I have implemented the log-likelihood as defined in the loglike method of WLS. The current version was not right and was missing the term which corresponds to the math:

\frac{1}{2}\log\left(\left|W\right|\right)

Including this term makes GLS the same as WLS using transformed weights.
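A minimal numpy sketch of this equivalence (my notation, not the actual patch): with diagonal weights W, adding 0.5 * log|W| = 0.5 * sum(log w_i) to the concentrated WLS log-likelihood reproduces the GLS value computed with Sigma = diag(1/w).

```python
import numpy as np

# Sketch of the WLS/GLS llf equivalence, assuming diagonal weights.
rng = np.random.default_rng(0)
n = 50
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
w = rng.uniform(0.5, 2.0, n)                     # WLS weights
y = x @ np.array([1.0, 2.0]) + rng.standard_normal(n) / np.sqrt(w)

sw = np.sqrt(w)
beta = np.linalg.lstsq(x * sw[:, None], y * sw, rcond=None)[0]
resid = y - x @ beta
ssr_w = np.sum(w * resid**2)                     # weighted SSR

# WLS concentrated llf including the 0.5 * log|W| term
llf_wls = (-n / 2 * (np.log(2 * np.pi) + np.log(ssr_w / n) + 1)
           + 0.5 * np.sum(np.log(w)))

# GLS llf with Sigma = diag(1/w): -0.5 * log|Sigma| = +0.5 * sum(log w)
llf_gls = (-n / 2 * (np.log(2 * np.pi) + np.log(ssr_w / n) + 1)
           - 0.5 * np.sum(np.log(1.0 / w)))
```

The two expressions agree term by term, which is why including the |W| term makes the reported llf values match.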

I added some tests for OLS/GLS/WLS equivalence including some edge cases.

I also fixed a bug in WLS/GLS which occurs when the weights are very large which resulted in an inf likelihood.

https://github.com/bashtage/statsmodels/tree/linear-model-robust-tests

I'm still pretty bad at git, so I probably have far too many changes in this branch.

@bashtage
Contributor
bashtage commented Nov 3, 2013

There are further advantages to normalization; in particular, using a normalized weight vector helps avoid numerical issues when the weights are poorly scaled. For example, using

w = 100 * np.ones_like(endog)

which produces identical parameter estimates, will break the LL, AIC and BIC.
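One way to implement that normalization is a small helper along these lines (the name and placement are mine, not statsmodels API):

```python
import numpy as np

def normalize_weights(w):
    """Rescale weights so they sum to nobs. Hypothetical helper, not
    statsmodels API: leaves parameter estimates unchanged but makes
    llf/aic/bic invariant to the scale of the weights."""
    w = np.asarray(w, dtype=float)
    return w * len(w) / w.sum()

w = 100 * np.ones(72)        # the problematic constant-weights case above
w_norm = normalize_weights(w)
```

After normalization the constant weights collapse to 1.0, so WLS reproduces OLS exactly, including the likelihood-based statistics.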

I think these are all reasonable choices: leave GLS alone, use normalized weights in WLS, and correct the (mathematical) definition of loglike in WLS so that the code does what the docs state.

Most importantly the docstrings should simply explain these choices so that users know what assumptions/choices are being made.

@bashtage
Contributor
bashtage commented Nov 3, 2013

loglike/llf in linear models (at least in one of the three that I checked in detail) is based on the concentrated loglikelihood, with scale/variance estimated by the unbiased least squares estimator not the biased MLE (small difference in denominator)

This is a little surprising since the unbiased scale estimators are not ML estimators. I suppose this is the risk of trying to emulate the behavior of other software which may be broken.

@josef-pkt
Member

about unbiased scale not being MLE: I think the point here is that OLS, WLS and GLS are not MLE estimates (if we count the scale as one of the parameters); they are standard least squares estimates.
'llf' is the loglike evaluated at the least squares estimate, as far as I understand.

I started at some point an MLE normal distributed model that directly estimates all parameters by MLE. That's when I discovered that the concentrated llf doesn't match the llf from the full normal MLE.
I think in most cases the concentrated MLE in the models are straight from textbooks like Greene.

The GLS subclasses with autoregressive GLSAR or GLSHet are also iterative Least Squares estimates, not directly MLE. (We still need to add full MLE estimators for those.)

Although these estimates are asymptotically the same as MLE.

Statisticians very often use REML in similar cases, which penalizes the second moment estimates to get an unbiased version instead of the biased MLE version.
But even after a few years, I still didn't understand well enough what their REML is, to be able to include it in code or documentation.

@josef-pkt
Member

about normalizing WLS

w = 100 * np.ones_like(endog)

Before we automatically normalize in these cases, I think we should check if the interpretation of weights as the product of frequency weights times GLS-like scale_weights works. If it does, then we could add an option to the model and include both kinds of weights.
I didn't expect that this might already (almost) work. see #743
Including also frequency weights would be very useful for large data sets like survey data that don't have continuous explanatory variables.

For descriptive statistics and hypothesis testing with weights, I used nobs_effective = sum(weights)
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/weightstats.py#L42
The unit tests compare frequency weights with the repeated data as_repeats in a case when weights are integers:
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/tests/test_weightstats.py#L71
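The frequency-weight comparison those tests make can be sketched in a few lines (illustration only, plain numpy rather than weightstats):

```python
import numpy as np

# Integer frequency weights w should behave like repeating each data point
# w[i] times; the effective nobs is then weights.sum(), not len(x).
x = np.array([1.0, 2.0, 5.0])
w = np.array([2, 3, 1])
x_rep = np.repeat(x, w)               # the "as_repeats" data

nobs_eff = w.sum()
mean_w = np.sum(w * x) / nobs_eff
var_w = np.sum(w * (x - mean_w) ** 2) / nobs_eff
```

The weighted mean and variance computed with nobs_effective = sum(weights) coincide with the unweighted statistics of the repeated data, which is the property the unit tests check.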

@bashtage
Contributor
bashtage commented Nov 3, 2013

The loglike is actually correct for OLS and GLS since it is based on SSR and nobs and doesn't make any reference to the degree of freedom correction. The WLS loglike in master is a bit incongruous. The implementation is one which is correct for normalized weights, but no weight normalization is taking place. OTOH, it doesn't match the GLS likelihood which is what the docstring says.

@bashtage
Contributor
bashtage commented Nov 3, 2013

The GLS subclasses with autoregressive GLSAR or GLSHet are also iterative Least Squares estimates, not directly MLE. (We still need to add full MLE estimators for those.)

Although these estimates are asymptotically the same as MLE.

In some cases these are also the MLE estimates using the EM-algorithm - that is, maximize the log-likelihood for a fixed covariance, and then use expectations to compute the covariance using the residuals.

I think always computing the log-likelihood conditional on sigma (in the GLS case) is reasonable and defendable.

@josef-pkt
Member

The loglike is actually correct for OLS and GLS since it is based on SSR and nobs and doesn't make any reference to the degree of freedom correction.

my mistake: I mixed up where the difference is: llf is the loglike of the MLE estimate and not of the least squares estimate #323

@josef-pkt
Member

I'm giving up on trying to match Stata's llf for WLS

I also think Kevin's change that makes WLS equivalent to GLS makes more sense than matching Stata.
In that case WLS has the straightforward heteroscedasticity interpretation, and llf is invariant to the scale of the weights.

From reverse engineering the results, it looks like the correction to Stata's llf is the following, which seems to correspond to the rescaling Stata does for aweights and pweights.

>>> tt.res1.llf - 0.5 * np.sum(np.log(weights / weights.sum() * nobs))
-476.97929465628067
>>> res_wls.llf - 0.5 * np.sum(np.log(weights / weights.sum() * nobs))
-476.97929465628062
>>> tt.res2.llf    # Stata from test case
-476.9792946562806

The only test failures that I can see, are the direct comparison with Stata in
statsmodels.regression.tests.test_regression.TestWLSExogWeights in loglike, aic and bic.

aside: checking llf for GLSAR, for which the tests were not enabled.
(GLSHET still doesn't have unit tests because I didn't find an exact equivalent for the FGLS estimator.
Stata can estimate heteroscedasticity only through arch with MLE, as far as I could see.)

glsar against GRETL

======================================================================
FAIL: statsmodels.regression.tests.test_glsar_gretl.TestGLSARGretl.test_all
----------------------------------------------------------------------
Arrays are not almost equal to 7 decimals
 ACTUAL: -763.97521816022379
 DESIRED: -763.9752

a GLSAR case against Stata prais ..., corc rhotype(tscorr)

Arrays are not almost equal to 7 decimals
 ACTUAL: -759.50012510798592
 DESIRED: -759.5001027340874
@bashtage
Contributor
bashtage commented Nov 4, 2013

I have thought a bit more about it, and really it all comes down to how weights should be treated.

Frequency weights in Stata terms should definitely affect the LL, since if y_i has a weight w_i then y_i should be repeated w_i times in the model; so, for example, if w_i were always 2, the log-likelihood would change by a factor of 2 relative to OLS.

On the other hand, if they are average weights, which essentially turn WLS into GLS, then the log-likelihoods should match.

I don't know a lot about probability weights, and these will probably affect the log-likelihood since they are probabilities. Would need to find a reference for this or derive it from principles.

Importance weights have no theoretical justification and are essentially prior information about which observations the researcher thinks are more important. In this case, I think they should be treated more or less the same as average weights, so that if they are all the same (up to scale) the log-likelihood is unaffected.

  • Frequency weights: affect the LL even when all identical and !=1.0
  • Average weights: No change in LL when all identical and !=1.0, should exactly match GLS by reinterpreting the weights as sigma_i = 1/sqrt(w_i)
  • Probability weights: Probably affect the LL, although not completely sure when all probability weights are the same
  • Importance weights: I would always normalize these so that sum(w)=nobs, and then treat these as average weights
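The average-weights interpretation in the list above can be checked directly (a numpy sketch of the row scaling, not the statsmodels code): reweighting a row by sqrt(w_i) is exactly dividing it by sigma_i = 1/sqrt(w_i).

```python
import numpy as np

# Sketch: WLS with weights w equals GLS with sigma_i = 1/sqrt(w_i),
# because scaling a row by sqrt(w_i) is dividing it by sigma_i.
rng = np.random.default_rng(1)
n, k = 40, 3
x = rng.standard_normal((n, k))
y = rng.standard_normal(n)
w = rng.uniform(0.2, 5.0, n)

# WLS: scale rows by sqrt(w), then run OLS
beta_wls = np.linalg.lstsq(x * np.sqrt(w)[:, None], y * np.sqrt(w),
                           rcond=None)[0]

# GLS: divide rows by the per-observation standard deviation sigma_i
sigma = 1.0 / np.sqrt(w)
beta_gls = np.linalg.lstsq(x / sigma[:, None], y / sigma, rcond=None)[0]
```

Both transformed regressions solve the same least squares problem, so the parameter estimates coincide for any positive weights.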

Eventually general weighting of cross-sectional models may be useful, and I think it would make sense to reconsider this then. Probably further off in the future.

Current
I think I would always treat them as average weights and state the relationship between GLS and WLS in the docstring for WLS

@josef-pkt josef-pkt added a commit to josef-pkt/statsmodels that referenced this issue Nov 17, 2013
@josef-pkt josef-pkt TST: enable tests for llf after change to WLS.loglike see #1170 d955aac
@josef-pkt
Member

@bashtage Do you know what happened to the commit that made loglike/llf of WLS the same as the corresponding loglike/llf of GLS?

I thought I merged this as part of the PRs, but master still seems to be missing the adjustment in WLS.loglike

@bashtage
Contributor

Maybe gone forever. I reset my master after you rebased so I could work on new branches, so it may be lost.

Is a pretty simple fix.

@josef-pkt
Member

Can you make a PR?
I can adjust the unit tests that fail, given that we won't match Stata anymore.

There is a unit test class TestOLS_GLS_WLS_equivalence that uses weight=ones, I don't remember if you had the same for WLS/GLS with unequal weights.

@josef-pkt josef-pkt referenced this issue Dec 18, 2013
Merged

Wls llf fix #1253

@josef-pkt josef-pkt closed this in #1253 Dec 19, 2013
@josef-pkt
Member

We merged the change in PR #1253 and adjusted the test.
The only remaining difference between WLS and GLS that I checked is in rsquared and centered_tss #1252

Thanks Kevin for looking into these things and improving them.

@josef-pkt josef-pkt added a commit to josef-pkt/statsmodels that referenced this issue Dec 19, 2013
@josef-pkt josef-pkt TST: enable tests for llf after change to WLS.loglike see #1170 1ac21e4
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@josef-pkt josef-pkt TST: enable tests for llf after change to WLS.loglike see #1170 6e59f67