The new GAM implementation does not support var_weights or freq_weights.
Maybe lowess or similar could also have data weights.
Somewhere (?) I saw it mentioned that we can handle repeated observations by combining them into an average and using var_weights.
The advantage for non- or semiparametric estimation would be that we have unique exog and don't have to worry about replicates. For example, lowess currently cannot handle a large number of replicates or categorical explanatory variables because of the way neighborhoods are computed.
However, the problem is that var_weights or freq_weights would have to be taken into account when we choose knot locations based on quantiles of the observed exog. The current algorithm for quantile knots based on patsy/mgcv does not handle weighted quantiles. On the other hand, there is also an advantage to selecting knots based on quantiles of the unique data, i.e. we avoid setting knots too close to each other, but we also lose large knot density in high exog density regions.
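A minimal sketch of how weighted quantile knot placement could work for a single smooth term. This is my own illustration, not the patsy/mgcv algorithm; the function name `weighted_quantile_knots` and the midpoint cumulative-weight convention are assumptions:

```python
import numpy as np

def weighted_quantile_knots(x, weights, n_knots):
    """Place interior knots at weighted quantiles of x.

    Hypothetical helper: the current quantile-knot code uses
    unweighted quantiles; this shows one way freq_weights or
    var_weights could be folded into knot selection.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x_sorted, w_sorted = x[order], w[order]
    # cumulative weight with midpoint convention, normalized to (0, 1)
    cw = np.cumsum(w_sorted) - 0.5 * w_sorted
    cw /= w_sorted.sum()
    # equally spaced interior quantile levels
    probs = np.linspace(0, 1, n_knots + 2)[1:-1]
    return np.interp(probs, cw, x_sorted)
```

With unit weights this reduces to ordinary sample quantiles; upweighting replicated exog values pulls the knots toward the high-density regions, which is exactly the behavior lost when knots are chosen from the unique values alone.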
Related model: an old Silverman article on splines uses var_weights for the motorcycle dataset, which has clear heteroscedasticity increasing in the explanatory variable (time).
(I have not looked at heteroscedasticity in splines yet, except that we have automatic, inherited support for cov_types like HC0.)
One detail: helper functions to aggregate raw data to condensed data, either by combining unique observations and using freq_weights, or by combining unique exog with average endog and var_weights.
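A rough sketch of such a helper for a single explanatory variable (the name `condense` and its interface are my own, not an existing statsmodels API):

```python
import numpy as np

def condense(exog, endog):
    """Collapse raw 1-d data to unique exog values.

    Returns the unique exog values, the average endog per unique
    value, and the replication counts.  The counts can serve as
    freq_weights on the original endog values, or as var_weights
    on the averaged endog, since the variance of a mean of n_i
    observations is sigma**2 / n_i.
    """
    exog = np.asarray(exog)
    endog = np.asarray(endog, dtype=float)
    uniq, inverse, counts = np.unique(
        exog, return_inverse=True, return_counts=True
    )
    # sum endog within each unique exog group, then average
    sums = np.bincount(inverse, weights=endog)
    return uniq, sums / counts, counts
```

The multivariate case would need grouping on rows of exog (e.g. via `np.unique` with `axis=0` or a pandas groupby), but the weights logic is the same.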
AFAIR, there are some examples in the GLM weights notebooks or unit tests.