
fixes for weighted kernel fits #1103

Closed · wants to merge 4 commits into statsmodels:master from Padarn:weights_patch

Conversation

@Padarn Padarn (Contributor) commented Oct 6, 2013

Small fixes so that 'evaluate' uses weights if they were used to fit the kernel.

Note: I have assumed weights are given that sum to 1 - perhaps a check should be added for this?

Ref #973

for weights bug see issue #823
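The idea behind the fix can be shown with a minimal pure-NumPy sketch (the function name gaussian_kde_evaluate and its signature are illustrative, not the statsmodels API): without weights, evaluation averages the kernel values over the observations; with weights that sum to 1, it takes the weighted sum instead, so the two agree for uniform weights.

```python
import numpy as np

def gaussian_kde_evaluate(x, data, bw, weights=None):
    """Evaluate a 1-D Gaussian KDE at a point x (illustrative sketch).

    If weights is None, every observation gets weight 1/n (the np.mean
    case); otherwise the kernel values are combined with a weighted sum,
    which is why the weights are assumed to sum to 1.
    """
    u = (x - data) / bw  # standardized distances from the evaluation point
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
    if weights is None:
        return np.mean(k) / bw        # unweighted: plain average
    return np.sum(weights * k) / bw   # weighted: sum with weights summing to 1
```

With weights = np.ones(n) / n both branches return the same value, which is the consistency this PR is after.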

@coveralls

Coverage Status

Coverage remained the same when pulling 8668fa8 on Padarn:weights_patch into 3b7082c on statsmodels:master.

@josef-pkt (Member)

candidate for backporting to 0.5.1

still needs review because I don't see whether this handles the weights in all required places, and whether weights should be normalized to sum to 1. I haven't looked yet at how Stata is doing this.

n = len(xs)
#xs = self.inDomain( xs, xs, x )[0]

if len(xs) > 0:  ## Need to do product of marginal distributions
    #w = np.sum([self(self._Hrootinv * (xx-x).T) for xx in xs]) / n
    #vectorized doesn't work:
    w = np.mean(self((xs - x) * self._Hrootinv))  # transposed
    if self.weights is not None:
        w = np.sum(self((xs - x) / h).T * self_Hrootinv * self.weights, axis=1)
A Member commented on this diff:

Why does this have /h but the else branch doesn't?
Given that this branch uses sum while the else branch uses mean, does that mean weights need to be normalized in a specific way, i.e. to sum to 1?

Padarn (Contributor Author) replied:

Hmm, this looks like a typo on my behalf. I'll have a look at it again soon; I may have had a good reason at the time that I have since forgotten.

Yes, I am assuming the weights sum to 1. We could easily just normalise the weights to enforce this constraint?
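Normalising inside the fit would enforce the constraint instead of assuming it. A minimal sketch of what that could look like (the helper name normalize_weights is hypothetical, not existing statsmodels code):

```python
import numpy as np

def normalize_weights(weights):
    # Coerce to a float array and rescale so the weights sum to 1,
    # removing the sum-to-1 burden from the caller.
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return w / total
```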

@josef-pkt (Member)

To clarify: This looks good to me and fixes BUG #973 for evaluate/density, but I'm not sure this is all we should do to support weights.

@Padarn (Contributor Author) commented Oct 23, 2013

Yes, I did wonder whether there were more places where weights should be supported. I had a quick look through the rest of the kernel implementation and couldn't see any gaps; most other functions use the evaluated density, I think, so weights are included.

Will have another look through this again when I get a chance over the next day or so.

@jseabold (Member)

Yes, I'd like to review this when I have time to sit down and look. Weights will still be unsupported for the FFT version, but that's not so trivial IIRC. I'm hoping someone else can make sense of the Silverman book there or find a better reference.

@Padarn (Contributor Author) commented Nov 9, 2013

It took a little more than the few days I hoped, but I patched up that error you pointed out.

@@ -334,6 +335,8 @@ def kdensity(X, kernel="gau", bw="scott", weights=None, gridsize=None,
        weights = np.ones(nobs)
        q = nobs
    else:
        # ensure weights is a numpy array
Padarn (Contributor Author) commented on this change:

Not sure if this is the best solution, but the error thrown when weights was passed as, for example, a list was very uninformative.
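A sketch of the kind of early coercion and validation meant here (the helper check_weights is illustrative, not the actual statsmodels code): converting list-like input up front lets the later vectorized operations fail with a clear message instead of an obscure broadcasting error.

```python
import numpy as np

def check_weights(weights, nobs):
    # Coerce list-like input to a float ndarray before any arithmetic.
    weights = np.asarray(weights, dtype=float)
    # Validate the shape explicitly so the caller gets an informative error.
    if weights.shape != (nobs,):
        raise ValueError("weights must have the same length as the data, "
                         "got %d weights for %d observations"
                         % (weights.size, nobs))
    return weights
```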

@Padarn (Contributor Author) commented Nov 16, 2013

I see that the Travis CI build for this has failed, but the log doesn't seem to indicate a problem with the patch. Can you ask Travis to retry?

@josef-pkt (Member)

I restarted the Travis CI Python 2.7 job; Python 3 was green, so there shouldn't be an error or failure.

@coveralls

Coverage Status

Coverage remained the same when pulling 900e9e1 on Padarn:weights_patch into 9d4b1f8 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 97820b4 on Padarn:weights_patch into 9d4b1f8 on statsmodels:master.

@josef-pkt (Member)

Thanks @Padarn, especially also for the test case.

I'd like to merge this as is, after a rebase to current master.
Can you do a rebase and force push? Otherwise, it's easy for me to do it.

I also took your test case to compare with Stata, and Stata gives the same results. I saw that Stata uses bw='silverman' by default, while we use 'scott'.
Stata doesn't report results other than the density, so I cannot check the other methods that we have.

@josef-pkt (Member)

I find it difficult to find my way around here.
There is already a csv file with results for weighted kde in the results folder.
Weights were added in 851a278, but only to kdensity; KDEUnivariate didn't handle them in the places where it doesn't call kdensity.

open issue #823




kde_vals = [kde.density[10*i] for i in range(6)]
A Member commented on this line:

Bug issue #823 is for evaluate.
??? I'm still confused about which is which.

@josef-pkt (Member)

#1239: a problem with evaluate with a bounded-support kernel.

For the Gaussian kernel it looks fine, and the fix in this PR looks good:

>>> kde2.fit(kernel='gau', fft=False)
>>> kde2.density[:5]
array([ 0.00016808,  0.00035405,  0.00070727,  0.0013416 ,  0.0024194 ])
>>> kde2.evaluate(kde2.support[:5])
array([ 0.00016808,  0.00035405,  0.00070727,  0.0013416 ,  0.0024194 ])

@josef-pkt josef-pkt mentioned this pull request Dec 16, 2013
@josef-pkt (Member)

I started a new branch and PR for this: #1240.
I needed to normalize the weights to get the test to pass with evaluate.
I'm not sure why I got the same result with density and evaluate in the previous comment, but I had too many kde instances floating around in my interactive Python session.

@Padarn (Contributor Author) commented Dec 16, 2013

Apologies for the confusion around the issue numbers; I got a bit lost too.

I'll take a closer look tonight at the new PR, but I'll assume that you no longer want me to rebase this.

@josef-pkt (Member)

Yes, all changes from this PR are now in mine, so I will merge #1240

I'd like to work a bit more on it, but I won't have the time to get weights fully incorporated into the kernels.
Looks like there is still some cleanup work to do to get this less "confusing".

@Padarn (Contributor Author) commented Dec 16, 2013

I'll have some free time later this week. If you put together a small 'wishlist' before you stop working on it, I'll see what I can do.

@josef-pkt (Member)

That would be very good. Right now only the Gaussian kernel seems to work.
After fixing the missing asarray in kernels.CustomKernel.density, the three other kernels in test_kde.py all fail for KDEUnivariate.evaluate, and the numbers don't make sense.

Gaussian passes with and without weights, so the change in this PR with normalized weights looks fine.

@Padarn (Contributor Author) commented Dec 17, 2013

Okay great. I'll follow the new PR and see what I can do.

Github question: Should I wait until you accept your current PR before attempting to work on it?

@josef-pkt (Member)

Yes, hold off on working on it (I'm pretty sure I'll merge tonight).
If you want, you could review my pull request to see whether I should change anything before the merge.
After that you can work on it from master.

What would be useful is to try out more examples of parts that don't have tests, and add more example scripts.

@josef-pkt josef-pkt closed this in c0a62a0 Dec 18, 2013
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014