The loss should exploit the triangular structure, and thus the additive nature, of the log-likelihood.
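The additivity can be sanity-checked numerically. A minimal sketch (all names hypothetical, not the repository's code): for a Gaussian, the joint log-likelihood equals the sum of one conditional term per coordinate, which is exactly the decomposition a triangular factor of the precision encodes.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)           # generic SPD covariance
u = rng.multivariate_normal(np.zeros(p), Sigma)

# Joint Gaussian log-likelihood via the precision matrix
Prec = np.linalg.inv(Sigma)
_, logdet_prec = np.linalg.slogdet(Prec)
joint = 0.5 * (logdet_prec - u @ Prec @ u - p * np.log(2 * np.pi))

# Additive decomposition: l(u) = sum_j l(u_j | u_{<j}),
# one conditional Gaussian term per coordinate (chain rule)
total = 0.0
for j in range(p):
    S11 = Sigma[:j, :j]
    s12 = Sigma[:j, j]
    cond_mean = s12 @ np.linalg.solve(S11, u[:j]) if j else 0.0
    cond_var = Sigma[j, j] - (s12 @ np.linalg.solve(S11, s12) if j else 0.0)
    total += -0.5 * (np.log(2 * np.pi * cond_var)
                     + (u[j] - cond_mean) ** 2 / cond_var)

assert np.isclose(joint, total)
```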
For each sub log-likelihood, we may add the information criterion component.
I.e., we seek to evaluate $$l(u;\hat{\Lambda})=\sum_j l(u_j;\hat{C}_j)$$
and to evaluate $$E[l(u_{\text{test}};\hat{\Lambda})]$$
as $$E[l(u_{\text{test}};\hat{\Lambda})]\approx \sum_j l(u_{j,\text{train}};\hat{C}_j) + IC(\hat{C}_j)$$
where we may try for $IC(\hat{C}_j)$ the following:
- AIC: $IC(\hat{C}_j)=ne(j)+1$. Easy, standard, and may be computed locally for $\hat{C}_j$ or globally for $\hat{\Lambda}$. Downside: relies on asymptotics and assumes the population precision lies in the family of precisions being estimated over.
- AICc: $IC(\hat{C}_j)=k + \frac{k^2+k}{n-k-1}$ for $k=ne(j)+1$. Easy and relatively standard, but may only be computed locally on $\hat{C}_j$. It adjusts for small sample sizes, though it still requires $k < n$, and it makes the same assumption on the population precision as the AIC.
- TIC: $IC(\hat{C}_j)=\mathrm{tr}\bigl[\hat{J}_j\,\hat{H}_j^{-1}\bigr]$, where $\hat{H}_j=-\nabla^2 l(u_j;\hat{C}_j)$ is the observed information and $\hat{J}_j$ is the empirical second moment of the score $\nabla l(u_j;\hat{C}_j)$. Can be computed locally or globally; locally makes sense, as we have access to derivatives from the optimization. Avoids the assumptions on the population precision.
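As a small worked example (a sketch on a hypothetical univariate Gaussian model with $k=2$ parameters, not this repository's estimator), all three penalties can be computed from per-observation scores and the observed information, with the TIC taken as $\mathrm{tr}[\hat{J}\hat{H}^{-1}]$ (score second moment times inverse observed information):

```python
import numpy as np

def ic_penalties(x):
    """AIC, AICc, and TIC penalties for a fitted Gaussian N(mu, s2), k = 2."""
    n, k = len(x), 2
    mu, s2 = x.mean(), x.var()            # MLE of mean and variance
    r = x - mu
    # Per-observation score vectors d l_i / d(mu, s2)
    scores = np.stack([r / s2, -0.5 / s2 + r**2 / (2 * s2**2)], axis=1)
    J = scores.T @ scores                 # summed outer products of scores
    # Observed information: negative Hessian of the total log-likelihood
    H = np.array([[n / s2,           r.sum() / s2**2],
                  [r.sum() / s2**2,  (r**2).sum() / s2**3 - n / (2 * s2**2)]])
    aic = k
    aicc = k + (k**2 + k) / (n - k - 1)
    tic = np.trace(J @ np.linalg.inv(H))  # ~k when the model is well-specified
    return aic, aicc, tic

x = np.random.default_rng(1).standard_normal(500)
aic, aicc, tic = ic_penalties(x)
```

On well-specified Gaussian data, as here, the TIC penalty concentrates near $k=2$, illustrating that it generalizes the AIC rather than contradicting it.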
All of the above employs asymptotic results. Is it possible to use, e.g., the bootstrap to alleviate these assumptions when $n$ is small?
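One possibility, sketched on a hypothetical univariate Gaussian fit (not the estimator discussed here): estimate the penalty directly as Efron-style bootstrap optimism, refitting on each resample and averaging the gap between in-resample and out-of-resample log-likelihood.

```python
import numpy as np

def gauss_ll(x, mu, s2):
    # Total Gaussian log-likelihood of sample x under N(mu, s2)
    return -0.5 * np.sum(np.log(2 * np.pi * s2) + (x - mu) ** 2 / s2)

def bootstrap_penalty(x, B=1000, seed=0):
    """Bootstrap estimate of the optimism E[l(train) - l(test)] at the
    fitted parameters, for a Gaussian fit (illustrative model choice)."""
    rng = np.random.default_rng(seed)
    n, opt = len(x), []
    for _ in range(B):
        xb = rng.choice(x, size=n, replace=True)
        mu, s2 = xb.mean(), xb.var()      # refit on the resample
        # in-resample minus out-of-resample fit, same fitted parameters
        opt.append(gauss_ll(xb, mu, s2) - gauss_ll(x, mu, s2))
    return float(np.mean(opt))

x = np.random.default_rng(3).standard_normal(500)
pen = bootstrap_penalty(x)   # roughly the AIC penalty k = 2 here
```

This avoids the asymptotic correction entirely, at the cost of $B$ refits, which is cheap for this toy model but may not be for every $\hat{C}_j$.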
To second order, quite generally we have $$IC(\theta) = \mathrm{tr}\left(E\left[\nabla_\theta^2 l(u;\hat{\theta})\right]\,\mathrm{Cov}(\hat{\theta})\right)$$
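A hedged sketch of how this recovers the TIC: writing $H=-E[\nabla_\theta^2 l]$ and $J=\mathrm{Cov}(\nabla_\theta l)$, the usual sandwich approximation for the estimator's covariance gives, up to the sign convention for $l$,

$$\mathrm{Cov}(\hat{\theta}) \approx H^{-1} J H^{-1} \quad\Rightarrow\quad IC(\theta) \approx \mathrm{tr}\left(H\,H^{-1} J H^{-1}\right) = \mathrm{tr}\left(J H^{-1}\right),$$

which collapses to the AIC parameter count when the model is well-specified ($J = H$).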
Using the sample average to replace $\mathrm{Cov}(\hat{\theta})$ is not the best estimator. In fact, the trace inner product induces the Frobenius norm as the relevant measure, and there exist results on adaptive inflation that improve the estimator under this norm.
It is not exactly the sample estimator that is employed for $\mathrm{Cov}(\hat{\theta})$, but rather the delta method applied to the sample estimate of $\mathrm{Cov}(\nabla_\theta l(u;\hat{\theta}))$. The argument on the "best estimator" above still applies. This is particularly relevant when $p \gg n$ but a global maximum exists (e.g., under $L_2$ regularization of the objective).
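On the adaptive-inflation point: one published instance of a covariance estimator that is adapted to the Frobenius norm is Ledoit–Wolf-style shrinkage toward a scaled identity (a sketch; the comment may have a different result in mind, and all names below are hypothetical):

```python
import numpy as np

def lw_shrink(X):
    """Ledoit-Wolf-style shrinkage of the sample covariance toward m*I,
    with the weight chosen to minimize expected Frobenius-norm error."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # sample covariance (MLE)
    m = np.trace(S) / p                     # mean eigenvalue of S
    d2 = np.linalg.norm(S - m * np.eye(p), "fro") ** 2
    # Average squared Frobenius distance of per-sample outer products to S
    b2 = sum(np.linalg.norm(np.outer(x, x) - S, "fro") ** 2 for x in Xc) / n**2
    b2 = min(b2, d2)                        # cap so the weight stays in [0, 1]
    rho = b2 / d2
    return rho * m * np.eye(p) + (1 - rho) * S, rho

rng = np.random.default_rng(2)
# n = 30 samples, p = 10, coordinates with unequal scales
X = rng.standard_normal((30, 10)) * np.linspace(1.0, 3.0, 10)
Sigma_hat, rho = lw_shrink(X)
```

The shrinkage weight `rho` grows as the sampling noise in $S$ dominates its signal, which is exactly the small-$n$ regime raised above.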