When using the default bandwidth selection with unordered variables in KDEMultivariate, the result depends on the numeric values used to encode the categories in the training data.
Compare:
```python
import numpy as np
import statsmodels.api as sm

x = np.array([0, 0, 0, 0, 1])
kde_x = sm.nonparametric.KDEMultivariate(data=x, var_type="u")
print(kde_x.pdf([0]))
# output: 0.61561605356
```

with

```python
import numpy as np
import statsmodels.api as sm

x = np.array([0, 0, 0, 0, 1]) * 1000
kde_x = sm.nonparametric.KDEMultivariate(data=x, var_type="u")
print(kde_x.pdf([0]))
# output: -183.58394644
```
This happens because the default bandwidth is estimated as if the values were continuous, so it scales with the numeric codes. I suggest changing this behavior, or at the very least documenting it.
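The mechanism can be reproduced without statsmodels. The sketch below is a standalone NumPy re-implementation, not statsmodels' own code. It assumes that the default normal-reference rule is 1.06 * std * n**(-1/(4 + q)), computed with the population standard deviation of the raw codes, and that the unordered kernel is Aitchison-Aitken, with weight 1 - h on a matching category and h / (c - 1) otherwise. Under those assumptions, it reproduces both outputs reported above.

```python
import numpy as np

def normal_reference_bw(data):
    """Rule of thumb 1.06 * sigma * n**(-1/(4 + q)), computed from the raw
    numeric values of each column (population std, ddof=0)."""
    data = np.asarray(data, dtype=float).reshape(len(data), -1)
    n, q = data.shape
    return 1.06 * data.std(axis=0) * n ** (-1.0 / (4 + q))

def aitchison_aitken(h, xi, x):
    """Weight 1 - h on a matching category, h / (c - 1) on every other one,
    where c is the number of observed levels."""
    xi = np.asarray(xi)
    c = np.unique(xi).size
    return np.where(xi == x, 1.0 - h, h / (c - 1.0))

def pdf_unordered(data, x):
    """Estimate at x for a single unordered variable: the mean kernel weight.
    Assumption: no division by h, since only continuous bandwidths rescale."""
    h = normal_reference_bw(data)[0]
    return aitchison_aitken(h, data, x).mean()

x = np.array([0, 0, 0, 0, 1])
print(normal_reference_bw(x))       # ~[0.3073]
print(pdf_unordered(x, 0))          # ~0.61561605356
print(pdf_unordered(x * 1000, 0))   # ~-183.58394644
```

Multiplying the codes by 1000 multiplies the standard deviation, and therefore the bandwidth, by 1000. This pushes h far outside the range in which the kernel weights are valid probabilities.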
One suggestion for a default might be to set the bandwidth to 1, so that the probability estimate coincides with the MLE.
I'm happy to submit a pull request for this if people are ok with the default value.
The default is normal reference, and it is independent of the variable type.
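As a worked derivation, assume the rule uses the population standard deviation $\hat\sigma$ of the codes, $n$ observations, $q$ variables and $c$ observed categories. Under this assumption the bandwidth is scale-equivariant, and the Aitchison-Aitken estimate at 0 has a closed form that matches both reported numbers:

```latex
h = 1.06\,\hat\sigma\,n^{-1/(4+q)}, \qquad
\hat\sigma(a\,x) = |a|\,\hat\sigma(x) \;\Longrightarrow\; h(a\,x) = |a|\,h(x)

\hat p(0) = \frac{1}{n}\sum_{i=1}^{n} L(0, X_i; h), \qquad
L(x, X_i; h) = \begin{cases} 1 - h, & X_i = x \\ h/(c-1), & X_i \neq x \end{cases}

x = (0,0,0,0,1):\quad \hat\sigma = 0.4,\; n = 5,\; q = 1,\; c = 2
\;\Rightarrow\; h \approx 0.3073,\quad
\hat p(0) = \frac{4(1-h) + h}{5} = \frac{4 - 3h}{5} \approx 0.6156

1000\,x:\quad h \approx 307.31 \;\Rightarrow\; \hat p(0) \approx -183.58
```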
AFAICS, bandwidth h=0 would put weight only on observations with the same category in aitchison_aitken.
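The h=0 case can be checked with a minimal sketch. It assumes the kernel weights are 1 - h for a matching category and h / (c - 1) otherwise, taken to mirror statsmodels' `aitchison_aitken`. `pmf_estimate` is a hypothetical helper. With h=0 the estimate reduces to the relative frequencies (the MLE), and for any fixed h it does not depend on which numbers label the categories.

```python
import numpy as np

def aitchison_aitken(h, xi, x):
    # Weight 1 - h on a matching category, h / (c - 1) on each other one.
    xi = np.asarray(xi)
    c = np.unique(xi).size
    return np.where(xi == x, 1.0 - h, h / (c - 1.0))

def pmf_estimate(h, data):
    # Hypothetical helper: kernel estimate of P(X = v) for each observed v.
    levels = np.unique(data)
    return levels, np.array([aitchison_aitken(h, data, v).mean() for v in levels])

data = np.array([0, 0, 0, 0, 1, 2, 2])
levels, p = pmf_estimate(0.0, data)
freq = np.array([(data == v).mean() for v in levels])
print(np.allclose(p, freq))           # True: h = 0 gives relative frequencies

# With h held fixed, relabelling the categories leaves the estimate unchanged;
# only the default bandwidth, computed from the codes, is affected.
_, p_orig = pmf_estimate(0.2, data)
_, p_scaled = pmf_estimate(0.2, data * 1000)
print(np.allclose(p_orig, p_scaled))  # True
```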
I don't see an option to use normal reference for some variables (e.g. the continuous ones) and a fixed bw for the others.
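One possible workaround is to build the bandwidth vector yourself and pass it as a fixed array. The sketch below rests on assumptions. `mixed_bandwidth` is a hypothetical helper, not part of statsmodels. It applies the rule-of-thumb formula to continuous ('c') columns, assuming the population standard deviation and q equal to the total number of columns, and uses a user-chosen constant for 'u' and 'o' columns.

```python
import numpy as np

def mixed_bandwidth(data, var_type, fixed_bw=0.0):
    """Hypothetical helper: normal-reference bandwidth for 'c' columns,
    a fixed bandwidth for unordered ('u') and ordered ('o') columns."""
    data = np.asarray(data, dtype=float).reshape(len(data), -1)
    n, q = data.shape
    if len(var_type) != q:
        raise ValueError("var_type must have one letter per column")
    # Rule of thumb 1.06 * sigma * n**(-1/(4 + q)), applied to every column...
    ref = 1.06 * data.std(axis=0) * n ** (-1.0 / (4 + q))
    # ...but kept only for continuous columns.
    is_cont = np.array([t == "c" for t in var_type])
    return np.where(is_cont, ref, fixed_bw)

rng = np.random.default_rng(0)
cont = rng.normal(size=50)
cat = rng.integers(0, 3, size=50) * 1000   # arbitrary category codes
bw = mixed_bandwidth(np.column_stack([cont, cat]), "cu", fixed_bw=0.1)
print(bw)   # first entry depends on the data; second is exactly 0.1
```

The resulting array could then be passed as the `bw` argument, e.g. `sm.nonparametric.KDEMultivariate(data=np.column_stack([cont, cat]), var_type="cu", bw=bw)`, because `bw` also accepts an array of fixed bandwidths. The category codes then no longer affect the bandwidth.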
I'm puzzled by the previous result with bw=1. What's a density when we only have a categorical variable?
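For a purely categorical variable, the estimate behaves like a probability mass function rather than a density. The minimal sketch below assumes Aitchison-Aitken weights of 1 - h for a match and h / (c - 1) otherwise, with `pmf_estimate` as a hypothetical helper. Under that assumption the masses sum to 1 for every h, but they are valid probabilities only for 0 <= h <= (c - 1)/c. At h=1 with two categories the weights are reversed, so the estimate at 0 becomes the frequency of the other category.

```python
import numpy as np

def aitchison_aitken(h, xi, x):
    # Weight 1 - h on a matching category, h / (c - 1) on each other one.
    xi = np.asarray(xi)
    c = np.unique(xi).size
    return np.where(xi == x, 1.0 - h, h / (c - 1.0))

def pmf_estimate(h, data):
    # Hypothetical helper: estimated P(X = v) for each observed category v.
    levels = np.unique(data)
    return np.array([aitchison_aitken(h, data, v).mean() for v in levels])

x = np.array([0, 0, 0, 0, 1])   # c = 2 categories, valid range 0 <= h <= 0.5
for h in [0.3073, 0.5, 1.0, 307.3]:
    p = pmf_estimate(h, x)
    # The masses always sum to 1; outside the valid range they leave [0, 1].
    print(h, p, p.sum())
```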
Version: '0.8.0'