
Unexpected post-convergence loss value #1

Open
benthestatistician opened this issue Oct 5, 2016 · 2 comments
@benthestatistician

Dear @msalibian, @kollerma and @mmaechler,

I’ve got a question about the lmrob.S function in robustbase. Perhaps it relates to the goals of this repo, so I’ll take a chance on raising it here.

S-estimates of scale should give a “loss.S” value (to use terms of fasts.R in this repo) equal to b:

$$\frac{1}{n} \sum_1^n \tilde{\rho}(r_i/s) = b,$$

where $\{ r_i \}_{i=1}^n$ are the residuals of the S-fit. Indeed, the S-estimate of scale is defined to be the smallest value of $s$ for which the mean on the left is no larger than $b$. However, this appears not to be the case for the S-estimates generated in the examples of robustbase's lmrob.
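For concreteness, here is a minimal sketch of that definition, not robustbase's implementation: the S-scale found as the root of $\frac{1}{n}\sum \tilde{\rho}(r_i/s) - b = 0$. The functions chi_bisquare and s_scale are my own illustrative names, using Tukey's bisquare chi normalized so that $\tilde{\rho}(u) = 1$ for $|u| \geq c$, with the usual 50%-breakdown tuning constant $c \approx 1.5476$ and $b = 0.5$:

```r
## Illustrative sketch only -- not robustbase's lmrob.S algorithm.
## Tukey's bisquare chi, normalized so chi(u) = 1 for |u| >= cc.
chi_bisquare <- function(u, cc = 1.547645) {
  v <- pmin(abs(u) / cc, 1)
  v^2 * (3 - 3 * v^2 + v^4)      # rises from 0 to 1, constant beyond cc
}

## S-scale as the root of mean(chi(r/s)) - b = 0; mean(chi(r/s)) is
## decreasing in s, so a wide bracket around mad(r) contains the root.
s_scale <- function(r, b = 0.5, cc = 1.547645) {
  f <- function(s) mean(chi_bisquare(r / s, cc)) - b
  uniroot(f, lower = 1e-6 * mad(r), upper = 1e6 * mad(r), tol = 1e-9)$root
}

set.seed(0)
r <- rnorm(100)
s <- s_scale(r)
mean(chi_bisquare(r / s))        # equals b = 0.5 up to root-finding tolerance
```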

I take robustbase's bb to correspond to the $b$ in my equation above, and its Mchi to my $\tilde{\rho}$. So in the examples below, the loss.S value looks too small:

> set.seed(0)
> m1 <- lmrob(Y ~ ., data=coleman)
> class(m1$init.S)
[1] "lmrob.S"
> m1$init.S$control$bb
[1] 0.5
> mean(with(m1$init.S, Mchi(residuals/scale, cc= control$tuning.chi, psi = control$psi) ))
[1] 0.3499999

Correspondingly, there are smaller values of $s$ satisfying $\frac{1}{n} \sum_1^n \tilde{\rho}(r_i/s) \leq b$.

> mean(with(m1$init.S, Mchi(residuals/(scale *.7), cc= control$tuning.chi, psi = control$psi) ))
[1] 0.452219

Looking at detailed feedback from the fit by increasing control$trace.lev to 2 (not shown), the difference between scale * .7 and scale does seem large by lmrob.S’s standards for this problem.

I see something similar with lmrob's other data example, if a bit less extreme.

> m2 <- lmrob(Y ~ ., data=coleman, setting = "KS2011") #to evolve the seed as in the examples
> RlmST <- lmrob(log.light ~ log.Te, data = starsCYG)
> RlmST$init.S$control$bb
[1] 0.5
> mean(with(RlmST$init.S, Mchi(residuals/scale, cc= control$tuning.chi, psi = control$psi) ))
[1] 0.4787234
> mean(with(RlmST$init.S, Mchi(residuals/(scale *.97), cc= control$tuning.chi, psi = control$psi) ))
[1] 0.4917663

These calculations were done with robustbase version 0.92.5, R version 3.2.3.

Was I wrong to think $s$ should solve loss.S = b? Perhaps the solutions are only approximate, with the quality of the solution increasing with sample size? -Ben

@msalibian (Owner)

@benthestatistician @mmaechler @kollerma I will check this more carefully later today, but this could be related to the fact that the S-estimator in lmrob doesn't solve mean(loss) = b, but rather sum(loss)/(n-p) = b, where n is the sample size and p is the number of explanatory variables (including the intercept, if present).

@benthestatistician (Author)

Thanks @msalibian. This certainly explains what I'm seeing:

> sum(with(m1$init.S, Mchi(residuals/scale, cc= control$tuning.chi, psi = control$psi) )) / 
+ m1$df.residual
[1] 0.4999999
> sum(with(RlmST$init.S, Mchi(residuals/scale, cc= control$tuning.chi, psi = control$psi) )) / 
+ RlmST$df.residual
[1] 0.5
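The means seen earlier follow directly from this: if sum(loss)/(n - p) = b, then mean(loss) = b(n - p)/n. A quick arithmetic check (assuming coleman has n = 20 rows with p = 6 fitted coefficients, and starsCYG has n = 47 with p = 2):

```r
## If sum(loss)/(n - p) = b, then mean(loss) = b * (n - p) / n.
## Check against the mean(Mchi(...)) values seen above.
0.5 * (20 - 6) / 20    # 0.35       -- matches 0.3499999 for m1 (coleman)
0.5 * (47 - 2) / 47    # 0.4787234  -- matches the starsCYG value
```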

I think the robustbase function .vcov.avar1 has the same expectation I did, that one should have mean(loss) == b, at this line:

u4 <- mean(w0^2 - bb^2) * tcrossprod(a)

Here the vector w0 encodes the loss. The mean part of this calculation aims to recover an empirical variance of that vector, but only does so if mean(w0) == bb, which isn't the case (as seen on this thread).
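A toy illustration of the gap (my own example values, not the .vcov.avar1 code itself): since mean(w0^2) - bb^2 = mean((w0 - mean(w0))^2) + (mean(w0)^2 - bb^2), the expression above equals the empirical variance of w0 only when mean(w0) == bb:

```r
## Toy example: mean(w0^2 - bb^2) vs the empirical variance of w0.
bb <- 0.5
w0 <- c(0.2, 0.4, 0.6, 0.9)          # mean(w0) = 0.525, not bb

u4_term <- mean(w0^2 - bb^2)         # what the code computes
emp_var <- mean((w0 - mean(w0))^2)   # an empirical variance of w0

## Exact identity: the two differ by mean(w0)^2 - bb^2.
all.equal(u4_term, emp_var + mean(w0)^2 - bb^2)   # TRUE
```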

Perhaps you or @mmaechler could refer this further as appropriate? I'd also be happy to reach out to whoever's in the best position to act on this, but in that case I'd appreciate your advice on who that should be.
