FAQ: Which models provide robust mean function estimation? #2074

Open
josef-pkt opened this issue Nov 1, 2014 · 0 comments

I would like to estimate the conditional expectation E(y | x) in a way that is robust to other misspecifications. Which models in statsmodels can I use? What are the properties and advantages of each?

Suppose we have a mean function E(y | x) = m(x, b), where m is a known function, and we want to estimate the parameters b. m is the inverse link in GLM or GEE terminology.

Estimation based on any distribution in the linear exponential family estimates the parameters b consistently as long as the mean function is correctly specified; no additional assumptions are required. However, the efficiency of the estimate depends on the specific distribution or, more precisely, on its implied weighting function.
Also, if we don't want to make variance or distributional assumptions, then we need to use robust covariance matrices for inference.

Caveat: Since we only require that the mean function is estimated consistently, we cannot rely on other properties of the distribution for prediction. For example, the implied distribution of future observations might be incorrect, even though the expectation is consistently estimated.

This robustness applies to all currently implemented families and links in genmod, GLM and GEE, and to the corresponding distributions in discrete and regression.

  • normal/OLS: for data on the real line R, implied variance function is constant, independent of mean
    (GLM family Gaussian, OLS)
  • Bernoulli : for data bounded in a compact interval, specifically the set {0,1} or the real interval [0,1]. An example of the first is a binary response, of the second a proportion. Variance function p * (1 - p), where p = m = E(y | x) ? check
  • Binomial : similar to Bernoulli for integer or real data between zero and known upper bounds. check
    Note: robust cov most likely does not work
  • Poisson : for nonnegative data, either nonnegative integers or the nonnegative real line R_+ (including zero)
    implied variance function, equal or proportional to m
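  • NegativeBinomial : nonnegative data as in Poisson. The quadratic variance function m + c m**2 is the Negbin 2 form; Negbin 1 has the linear form m + c m (which of the two applies here still needs checking)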
  • geometric : special case of NegativeBinomial
  • ...
  • gamma
  • inverse...