
ENH: weights for GAM and nonparametrics, especially var_weights #5379

Open
josef-pkt opened this issue Nov 13, 2018 · 0 comments

@josef-pkt
Member

The new GAM implementation does not support var_weights or freq_weights.
Lowess and similar nonparametric estimators could perhaps also use data weights.

I saw it mentioned somewhere (?) that we can handle repeated observations by combining them into an average and using var_weights.
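A minimal numpy-only sketch of why this works for the Gaussian/least-squares case: averaging replicates at each unique x and weighting each average by its replicate count gives the same normal equations, and hence the same point estimates, as fitting the raw data. The `wls` helper and the simulated data are illustrative assumptions, not statsmodels code.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve (X' W X) b = X' W y."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(0)
# Raw data with many repeated x values (replicates).
x = rng.integers(0, 10, size=200).astype(float)
y = 1.0 + 0.5 * x + rng.normal(size=x.size)
X = np.column_stack([np.ones_like(x), x])

# Condense: one row per unique x, mean response, replicate count as weight.
xu, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
ybar = np.bincount(inv, weights=y) / counts
Xu = np.column_stack([np.ones_like(xu), xu])

b_full = wls(X, y, np.ones_like(y))
b_cond = wls(Xu, ybar, counts.astype(float))
print(np.allclose(b_full, b_cond))  # True
```

Only the parameter estimates agree: the within-group variation is discarded by averaging, so the scale estimate and residual degrees of freedom of the condensed fit differ from the full-data fit, which is where var_weights and freq_weights behave differently.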

The advantage for non- or semiparametric estimation would be that we have unique exog values and don't have to worry about replicates. For example, lowess currently cannot handle a large number of replicates or categorical explanatory variables because of the way neighborhoods are computed.
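A small illustration of the neighborhood problem, assuming the classic (Cleveland) lowess rule where the local bandwidth is the distance to the k-th nearest neighbor with k = ceil(frac * n). This is a sketch of that textbook rule, not statsmodels' lowess code; `knn_bandwidth` is a hypothetical helper.

```python
import numpy as np

def knn_bandwidth(x, x0, frac):
    """Distance from x0 to its k-th nearest neighbour, k = ceil(frac * n).

    In classic lowess, tricube weights are (1 - (|x - x0| / h)**3)**3 with
    h = this distance, so h == 0 makes the local fit degenerate.
    """
    k = int(np.ceil(frac * len(x)))
    dist = np.sort(np.abs(x - x0))
    return dist[k - 1]

# A categorical-like regressor: 3 levels with 100 replicates each.
x = np.repeat([0.0, 1.0, 2.0], 100)
h = knn_bandwidth(x, x0=1.0, frac=0.25)  # k = 75 < 100 ties at x0
print(h)  # 0.0, the neighbourhood collapses onto the ties at x0

# On the unique values the neighbourhood is usable again, but the local
# fits would then need the replicate counts as data weights.
print(knn_bandwidth(np.unique(x), x0=1.0, frac=0.9))  # 1.0
```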

However, the problem is that var_weights or freq_weights would have to be taken into account when we choose knot locations based on quantiles of the observed exog. The current algorithm for quantile knots, based on patsy/mgcv, does not handle weighted quantiles. On the other hand, selecting knots based on quantiles of the unique data also has an advantage: we avoid placing knots too close to each other, but we also lose the high knot density in regions of high exog density.
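A sketch of a weighted quantile for knot placement, using the simple inverse-CDF definition (smallest value whose cumulative weight reaches q times the total). This definition is an assumption chosen because, with integer counts, it reproduces the quantiles of the expanded data exactly; it is not the interpolating rule that patsy/mgcv use. The example shows the trade-off above: count-weighted knots pile up at the high-density values.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Inverse-CDF weighted quantile (hypothetical helper).

    Returns the smallest v with cumulative weight >= q * total weight.
    With integer frequency weights this equals the inverse-CDF quantile of
    the data expanded by repeating each value `weight` times.
    """
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    cw = np.cumsum(np.asarray(weights, dtype=float)[order])
    idx = np.searchsorted(cw, np.asarray(q) * cw[-1], side="left")
    return v[np.minimum(idx, len(v) - 1)]

xu = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # unique exog values
counts = np.array([50, 5, 5, 5, 35])        # most mass at the ends
q = np.linspace(0, 1, 5)                    # 0, .25, .5, .75, 1

knots_unique = weighted_quantile(xu, np.ones_like(xu), q)  # knots at 0, 1, 2, 3, 4
knots_weighted = weighted_quantile(xu, counts, q)           # knots at 0, 0, 0, 4, 4
print(knots_unique, knots_weighted)
```

Coinciding knots like these would have to be thinned or jittered before building a spline basis, which is one reason quantiles of the unique data are attractive despite ignoring density.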

Related model: an old Silverman article on splines uses var_weights for the motorcycle dataset, which has clear heteroscedasticity increasing in the explanatory variable (time).
(I have not looked at heteroscedasticity in splines yet, except that we have automatic, inherited support for cov_types like HC0.)

One detail: we need helper functions that aggregate raw data into condensed data, either by collapsing identical observations into unique rows and using freq_weights, or by combining observations with the same exog into an average endog and using var_weights.
AFAIR, there are some examples in the GLM weights notebooks or unit tests.
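A possible shape for the two helpers, as a numpy-only sketch. The names `condense_freq_weights` and `condense_var_weights` are hypothetical and not part of statsmodels. The var_weights version uses the group size as weight because the mean of n_g observations with common variance sigma**2 has variance sigma**2 / n_g.

```python
import numpy as np

def condense_freq_weights(exog, endog):
    """Collapse identical (exog, endog) rows; counts serve as freq_weights."""
    data = np.column_stack([exog, endog])
    rows, counts = np.unique(data, axis=0, return_counts=True)
    return rows[:, :-1], rows[:, -1], counts

def condense_var_weights(exog, endog):
    """Collapse rows with identical exog and average endog within groups.

    The group sizes serve as var_weights.
    """
    exog = np.asarray(exog, dtype=float)
    if exog.ndim == 1:
        exog = exog[:, None]
    uexog, inv, counts = np.unique(exog, axis=0, return_inverse=True,
                                   return_counts=True)
    inv = inv.ravel()  # guard: some numpy versions return a non-1-D inverse
    endog_mean = np.bincount(inv, weights=endog) / counts
    return uexog, endog_mean, counts

x = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 3.0, 2.0, 4.0, 5.0])
ex, ey, fw = condense_freq_weights(x, y)  # rows (0,1) x2, (0,3), (1,2), (1,4), (2,5)
ux, ybar, vw = condense_var_weights(x, y)  # ybar = 5/3, 3, 5 with vw = 3, 2, 1
```

The outputs could then be passed to GLM's existing `freq_weights` or `var_weights` keyword, e.g. `sm.GLM(ybar, sm.add_constant(ux), var_weights=vw)`; the GAM would need the same keywords for this to work there.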
