I'm not sure that they're correct. It doesn't look like they're tested. The estimation is fine, but the sum of squares results, etc. do not match up with Stata (not sure if it's just a scaling issue yet).
Hmm, it looks like we are closer to Stata's frequency weights in our assumptions rather than analytic weights but we don't scale nobs as using frequency weights would warrant. This needs a closer look.
related: I was looking at case weights and how they should be defined, while fixing this
for some of the tests it will be important how the weights are interpreted. But I didn't go back to Stata yet to figure out which is which.
This may take a large fix. Something similar to how PySAL is handling weights with objects for different interpretations.
related: I had opened #481
Do you have more details on our differences to Stata, or maybe some tests that are knownfail?
My guess would be that there might be just a few places where our WLS differs from Stata's interpretation.
We use WLS heavily in RLM and GLM, and my impression is that it's just a reparameterized version of GLS with heteroscedasticity.
We weren't calculating the weighted sum of squares correctly. We're fine on SSR, but the way we define TSS, and subsequently ESS, etc., is wrong. For the centered case it should be
tss = np.sum(self.model.weights*(self.model.endog - np.average(self.model.endog, weights=self.model.weights))**2)
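A minimal, self-contained sketch of that centered weighted TSS with toy data standing in for the model attributes (illustrative only, not the statsmodels implementation):

```python
import numpy as np

# toy stand-ins for self.model.endog and self.model.weights
endog = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([1.0, 2.0, 1.0, 2.0])

# weighted mean, then the centered weighted total sum of squares
wmean = np.average(endog, weights=weights)
tss = np.sum(weights * (endog - wmean) ** 2)
print(tss)  # 22/3 ~= 7.333
```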
We're also not calculating the scale "correctly" according to Stata, though we agree with R. In Stata, the number of observations is always adjusted to the sum of the weights (except for analytic weights, where the weights are adjusted to nobs instead). For importance weights and frequency weights they use, for the residual standard deviation (we report the variance),
scale = (np.dot(self.wresid, self.wresid) / (self.model.weights.sum() - self.model.df_model))**.5
For what they call analytic weights, it's the same but the weights are scaled to sum to N before estimation.
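A sketch of the two conventions side by side, with toy stand-ins for wresid, weights, and df_model (the denominator mirrors the formula above; whether a -1 for the constant belongs there is a separate question):

```python
import numpy as np

# toy stand-ins for self.wresid / self.model.weights / df_model
wresid = np.array([0.5, -0.3, 0.2, -0.4])
weights = np.array([1.0, 2.0, 1.0, 2.0])
df_model = 1
nobs = len(wresid)

# frequency/importance-weight convention: denominator uses the sum of weights
scale_freq = (np.dot(wresid, wresid) / (weights.sum() - df_model)) ** 0.5

# analytic-weight convention: rescale the weights to sum to nobs before
# estimation, so the same formula's denominator becomes nobs - df_model
aweights = weights * nobs / weights.sum()
print(scale_freq, aweights.sum())
```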
what does R do for tss and Rsquared?
What WLS is doing right now is the same as GLS, using wresid. So the interpretation is that weights_i = 1/sigma_i^2 in a heteroscedastic model, I think.
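That equivalence is easy to check numerically: solving least squares on the sqrt(w)-transformed data gives the same coefficients as the GLS normal equations with Sigma = diag(1/w). A pure-numpy sketch (simulated data, no statsmodels):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma2 = rng.uniform(0.5, 2.0, size=n)          # heteroscedastic variances
y = x @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sigma2)

w = 1.0 / sigma2                                 # weights_i = 1/sigma_i^2

# WLS via the sqrt(w) transformation of endog and exog
sw = np.sqrt(w)
beta_wls = np.linalg.lstsq(x * sw[:, None], y * sw, rcond=None)[0]

# GLS normal equations with Sigma = diag(sigma2), i.e. X'WX b = X'Wy
xtwx = x.T @ (x * w[:, None])
xtwy = x.T @ (y * w)
beta_gls = np.linalg.solve(xtwx, xtwy)

print(np.allclose(beta_wls, beta_gls))  # True
```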
Maybe we should add a keyword for how the weights should be interpreted, similar to the different weights in Stata and SAS.
For the weighted stats I had the same problem (and bug) as in your definition of tss and scale, with the weights interpreted as case weights, where I assumed the sum of weights is nobs.
Judging by R2, R does the same for the TSS as Stata without the scaling of weights. I'm okay with not doing any magic to the weights under the hood right now. If you want scaling, you just do
scaled_weights = ((weights * nobs)/weights.sum())
I think we can leave more complicated stuff for weights objects down the road.
What I'm struggling with now is scalar weights. I don't want to carry around another nobs-shaped array, but if we want things to be general, we should convert a scalar to a 1d array of length nobs. Otherwise we always have to check. Speed vs. memory trade-off. Thoughts?
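One option, sketched under the assumption that we normalize on input rather than special-casing everywhere downstream (the helper name is hypothetical):

```python
import numpy as np

def normalize_weights(weights, nobs):
    """Expand a scalar weight to a 1d float array of length nobs.

    Costs one nobs-length array up front, but removes scalar-vs-array
    checks from every downstream computation.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.ndim == 0:
        weights = np.full(nobs, float(weights))
    return weights

print(normalize_weights(2.0, 3))        # [2. 2. 2.]
print(normalize_weights([1, 2, 3], 3))  # [1. 2. 3.]
```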
you mean the weights array itself?
I'm on a branch without the missing changes. I think large parts of the code directly worked with weights = np.array() either through special casing or broadcasting.
Keeping a full weights array around for WLS is not a big loss, because for the scalar case it is equivalent to OLS.
The parts that "worked" before weren't doing the right thing, though. I think I can sort this out.
I'm not sure what to do with the nobs x nobs sigma though for TSS, etc. I need to go back and look at the assumptions of GLS.
I would like to get rid of all nobs,nobs calculations/arrays in GLS, and make it completely overwritable by subclasses.
The only area I know of where we use rsquared internally is the Lagrange Multiplier tests in diagnostics. I'm not sure I remember this correctly, but much of the test statistics and inference just works on wresid.
Sometimes I wasn't sure whether we need resid or wresid.
Also, for GLS I care more about that the defaults are correct for the common subclasses, GLSAR and GLSHet, since I doubt that GLS itself will see much use.
Should I separate out GLS results from OLS/WLS then, so that we don't have to worry about incorrect R-squared, etc.? It looks like GLS gets the most use in the context of panel estimation or some kind of structured data. No one seems to worry about the efficiency gains and just uses OLS + robust standard errors anyway, so there aren't many packages that do a generic GLS with full linear model results.
If you don't have an explicit case for GLS that doesn't work, then I wouldn't separate out a GLSResults until we find that it's necessary. I'd rather avoid having too many different result classes.
rsquared is pretty limited, as in the xtgls reference, so I didn't worry too much about whether it's correct in another corner case. But I was reading up on Lagrange Multiplier tests again recently, and many of them have a representation in terms of rsquared.
GLS doesn't work now that we're using weights correctly in TSS, because there are no weights in GLS and you can't use wendog to compute it. So we have several result statistics that are wrong. xtgls also doesn't return any of the sum-of-squares or Rsquared statistics AFAICT. The only correct workaround I see is to have a GLSResults that is a subset of the (weighted) linear model results, and then add the rest of the sum-of-squares results on top of it to get the OLSResults class. There may be a way to compute sum-of-squares statistics for the GLS case, but I don't know it. I'd prefer to have things correct in the other classes rather than lumped together. And if in the future we can extend ANOVA (or whichever variant is appropriate) to the GLS case, we'll do it then.
For example, the way I have it now is to check for a weights attribute and, if it doesn't exist (i.e., GLS), fall back to the old, incorrect statistic.