fixes for weighted kernel fits #1103
Conversation
Candidate for backporting to 0.5.1. This still needs review: I don't see whether this handles the weights in all required places, or whether weights should be normalized to sum to 1. I haven't looked yet at how Stata does this.
```python
n = len(xs)
#xs = self.inDomain( xs, xs, x )[0]

if len(xs) > 0:  # Need to do product of marginal distributions
    #w = np.sum([self(self._Hrootinv * (xx-x).T) for xx in xs])/n
    #vectorized doesn't work:
    w = np.mean(self((xs-x) * self._Hrootinv))  # transposed
    if self.weights is not None:
        w = np.sum(self((xs-x)/h).T * self._Hrootinv * self.weights, axis=1)
```
Why does this branch have `/h` but the `else` branch doesn't?
Also, given that this branch uses `sum` while the `else` branch uses `mean`, does that mean the weights need to be normalized in a specific way, i.e. to sum to 1?
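For reference, the `sum` vs. `mean` question can be sketched in a few lines. This is a standalone Gaussian-kernel toy (the helper name and signature are made up for illustration, not the statsmodels code): with weights that sum to 1, a plain weighted sum reduces to the unweighted mean when the weights are uniform.

```python
import numpy as np

def kde_at_point(xs, x, h, weights=None):
    """Toy 1-d Gaussian KDE evaluated at a single point x.

    Hypothetical helper for illustration only. If weights is None,
    use the unweighted mean; otherwise assume sum(weights) == 1 and
    use a weighted sum, which generalizes the mean (mean == sum with
    w_i = 1/n).
    """
    u = (xs - x) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
    if weights is None:
        return np.mean(k) / h
    return np.sum(weights * k) / h

xs = np.array([0.0, 0.5, 1.0, 1.5])
h = 0.8
# Uniform weights summing to 1 must reproduce the unweighted result.
w = np.full(len(xs), 1.0 / len(xs))
assert np.allclose(kde_at_point(xs, 0.7, h), kde_at_point(xs, 0.7, h, w))
```

So the two branches are consistent only under the sum-to-1 convention, which is exactly the normalization question raised above.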
Hmm, this looks like a typo on my behalf. I'll have a look at it again soon; I may have had a good reason at the time that I've forgotten now.
Yes, I am assuming the weights sum to 1 - we could easily just normalise the weights to force this constraint?
To clarify: this looks good to me and fixes bug #973 for evaluate/density, but I'm not sure this is all we should do to support weights.
Yes, I did wonder whether there were more places weights should be supported. I had a quick look through the rest of the kernel implementation and couldn't see any gaps - most other functions use the evaluated density, I think, so weights are included. I'll have another look through this when I get a chance over the next day or so.
Yes, I'd like to review this when I have time to sit down and look. Weights will still be unsupported for the FFT version, but that's not so trivial, IIRC. I'm hoping someone else can make sense of the Silverman book there or find a better reference.
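One way weights could enter the FFT path is through the binning step: instead of counting observations per grid cell before convolving with the kernel, accumulate their weights. A minimal sketch of weighted linear binning - the function name, grid handling, and signature are hypothetical, not statsmodels code:

```python
import numpy as np

def weighted_linear_binning(x, weights, grid_min, grid_max, m):
    """Split each observation's weight between its two nearest grid
    points, proportional to distance (linear binning).

    Hypothetical sketch: the binned array that the FFT-based KDE
    convolves with the kernel would hold summed weights rather than
    raw counts. Assumes grid_min <= x <= grid_max elementwise.
    """
    delta = (grid_max - grid_min) / (m - 1)   # grid spacing
    pos = (x - grid_min) / delta              # fractional grid position
    lo = np.floor(pos).astype(int)
    frac = pos - lo
    binned = np.zeros(m)
    # np.add.at does unbuffered accumulation, so repeated indices add up.
    np.add.at(binned, lo, weights * (1 - frac))
    np.add.at(binned, np.minimum(lo + 1, m - 1), weights * frac)
    return binned

x = np.array([0.2, 0.5, 0.9])
w = np.array([0.5, 0.25, 0.25])   # weights already sum to 1
c = weighted_linear_binning(x, w, 0.0, 1.0, 11)
assert np.isclose(c.sum(), 1.0)   # total weight mass is preserved
```

With uniform weights `1/n` this reduces to ordinary (normalized) count binning, so the unweighted FFT path would be the special case.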
It took a little more than the few days I hoped, but I patched up that error you pointed out.
```diff
@@ -334,6 +335,8 @@ def kdensity(X, kernel="gau", bw="scott", weights=None, gridsize=None,
         weights = np.ones(nobs)
         q = nobs
     else:
+        # ensure weights is a numpy array
```
Not sure if this is the best solution, but the error thrown when weights was passed as, for example, a list was very uninformative.
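The coercion being discussed can be sketched as a small validation helper (the name `_check_weights` is made up here; it is not the actual statsmodels code): accept any sequence, convert with `np.asarray`, and fail early with a clear message instead of an opaque downstream error.

```python
import numpy as np

def _check_weights(weights, nobs):
    """Hypothetical validator: coerce weights to a 1-d float array
    of length nobs, raising a readable error otherwise."""
    weights = np.asarray(weights, dtype=float)  # lists/tuples/Series all work
    if weights.ndim != 1 or len(weights) != nobs:
        raise ValueError("weights must be 1-d with one entry per observation")
    return weights

w = _check_weights([1, 2, 1], nobs=3)
assert isinstance(w, np.ndarray) and w.dtype == np.float64
```

Converting once at the entry point means every later weighted expression can rely on array arithmetic without re-checking.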
I see that the Travis CI build for this has failed, but the log doesn't seem to indicate it is a problem with the patch. Can you ask Travis to retry?
I restarted the Travis CI 2.7 job; python 3 was green, so there shouldn't be an error or failure.
Thanks @Padarn, especially also for the test case. I'd like to merge this as is, after a rebase to current master. I also used your test case to compare with Stata, and Stata has the same results. I saw that Stata uses bw='silverman' by default while we use 'scott'.
```python
kde_vals = [kde.density[10*i] for i in range(6)]
```
Bug issue #823 is for evaluate.
??? I'm still confused about which is which.
#1239 is the problem with evaluate for a bounded-support kernel; for gaussian it looks fine, and the fix in this PR looks good.
I started a new branch and PR for this: #1240
Apologies for the confusion around the issue numbers, I got a bit lost too. I'll take a closer look tonight at the new PR, but I'll assume that you no longer want me to rebase this.
Yes, all changes from this PR are now in mine, so I will merge #1240. I'd like to work a bit more on it, but I won't have the time to get weights fully incorporated into the kernels.
I'll have some free time later this week. If you put together a small 'wishlist' before you stop working on it, I'll see what I can do.
That would be very good. Right now only the Gaussian kernel seems to work. Gaussian passes with and without weights, so the change in this PR with normalized weights looks fine.
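A simple sanity check for the normalized-weights behaviour, independent of any particular kernel implementation, is that giving one observation double weight should match duplicating it in the sample. A pure-numpy Gaussian sketch (the function here is a toy, not the statsmodels code):

```python
import numpy as np

def gauss_kde(xs, grid, h, weights):
    """Toy weighted Gaussian KDE on a grid; weights assumed to sum to 1."""
    u = (grid[:, None] - xs[None, :]) / h                # (grid, obs)
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return (k * weights).sum(axis=1) / h                 # weighted sum per grid point

xs = np.array([0.0, 1.0, 2.0])
grid = np.linspace(-1, 3, 5)
h = 0.5
# Weighting the first point twice as heavily...
w = np.array([2.0, 1.0, 1.0]); w /= w.sum()
# ...should equal duplicating it with uniform weights.
xs_dup = np.array([0.0, 0.0, 1.0, 2.0])
w_dup = np.full(4, 0.25)
assert np.allclose(gauss_kde(xs, grid, h, w), gauss_kde(xs_dup, grid, h, w_dup))
```

The same duplication trick could be used to build regression tests for the other kernels once they support weights.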
Okay great. I'll follow the new PR and see what I can do. GitHub question: should I wait until you accept your current PR before attempting to work on it?
Yes, wait before working on it. (I'm pretty sure I'll merge tonight.) What would be useful is to try out more examples of parts that don't have tests, and to add more example scripts.
Kde weights closes statsmodels#1103 closes statsmodels#823 closes statsmodels#1245
Small fixes so that 'evaluate' uses weights if they were used to fit the kernel.
Note: I have assumed weights are given that sum to 1 - perhaps a check should be added for this?
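The suggested check could be as small as this (a hypothetical helper, not current statsmodels code): either reject weights that don't sum to 1, or, as discussed above, silently renormalize them so the constraint always holds.

```python
import numpy as np

def normalize_weights(weights):
    """Hypothetical sketch: coerce to float array and rescale so the
    weights sum to 1, rejecting degenerate (non-positive-sum) input."""
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return weights / total

w = normalize_weights([2, 3, 5])
assert np.isclose(w.sum(), 1.0)
```

Renormalizing is the more forgiving choice; raising instead would surface silent mistakes where a user passes raw counts by accident.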
Ref #973
for weights bug see issue #823