
fixes for weighted kernel fits #1103

Closed · wants to merge 4 commits into statsmodels:master from Padarn:weights_patch

Conversation

@Padarn Padarn (Contributor) commented Oct 6, 2013

Small fixes so that 'evaluate' uses weights if they were used to fit the kernel.

Note: I have assumed weights are given that sum to 1 - perhaps a check should be added for this?

Ref #973

for weights bug see issue #823
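The idea behind the fix can be shown with a minimal pure-NumPy sketch (the function name gaussian_kde_evaluate and its signature are illustrative, not the statsmodels API): without weights, evaluation averages the kernel values over the observations; with weights that sum to 1, it takes the weighted sum instead, so the two agree for uniform weights.

```python
import numpy as np

def gaussian_kde_evaluate(x, data, bw, weights=None):
    """Evaluate a 1-D Gaussian KDE at a point x (illustrative sketch).

    If weights is None, every observation gets weight 1/n (the np.mean
    case); otherwise the kernel values are combined with a weighted sum,
    which is why the weights are assumed to sum to 1.
    """
    u = (x - data) / bw  # standardized distances from the evaluation point
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
    if weights is None:
        return np.mean(k) / bw        # unweighted: plain average
    return np.sum(weights * k) / bw   # weighted: sum with weights summing to 1
```

With weights = np.ones(n) / n both branches return the same value, which is the consistency this PR is after.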

@coveralls

Coverage Status

Coverage remained the same when pulling 8668fa8 on Padarn:weights_patch into 3b7082c on statsmodels:master.

@josef-pkt (Member)

candidate for backporting to 0.5.1

still needs review because I don't see whether this handles the weights in all required places, and whether weights should be normalized to sum to 1. I haven't looked yet at how Stata is doing this.

n = len(xs)
#xs = self.inDomain( xs, xs, x )[0]

if len(xs) > 0:  ## Need to do product of marginal distributions
    #w = np.sum([self(self._Hrootinv * (xx-x).T) for xx in xs]) / n
    #vectorized doesn't work:
    w = np.mean(self((xs - x) * self._Hrootinv))  # transposed
    if self.weights is not None:
        w = np.sum(self((xs - x) / h).T * self_Hrootinv * self.weights, axis=1)
A Member commented on this diff:

Why does this have /h but the else branch doesn't?
Given that this branch uses sum while the else branch uses mean, does that mean weights need to be normalized in a specific way, i.e. to sum to 1?

Padarn (Contributor Author) replied:

Hmm, this looks like a typo on my behalf. I'll have a look at it again soon; I may have had a good reason at the time that I have since forgotten.

Yes, I am assuming the weights sum to 1. We could easily just normalise the weights to enforce this constraint?
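Normalising inside the fit would enforce the constraint instead of assuming it. A minimal sketch of what that could look like (the helper name normalize_weights is hypothetical, not existing statsmodels code):

```python
import numpy as np

def normalize_weights(weights):
    # Coerce to a float array and rescale so the weights sum to 1,
    # removing the sum-to-1 burden from the caller.
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return w / total
```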

@josef-pkt (Member)

To clarify: This looks good to me and fixes BUG #973 for evaluate/density, but I'm not sure this is all we should do to support weights.

@Padarn (Contributor Author) commented Oct 23, 2013

Yes, I did wonder whether there were more places where weights should be supported. I had a quick look through the rest of the kernel implementation and couldn't see any gaps; most other functions use the evaluated density, I think, so weights are included.

Will have another look through this again when I get a chance over the next day or so.

@jseabold (Member)

Yes, I'd like to review this when I have time to sit down and look. Weights will still be unsupported for the FFT version, but that's not so trivial IIRC. I'm hoping someone else can make sense of the Silverman book there or find a better reference.

@Padarn (Contributor Author) commented Nov 9, 2013

It took a little more than the few days I hoped, but I patched up that error you pointed out.

@@ -334,6 +335,8 @@ def kdensity(X, kernel="gau", bw="scott", weights=None, gridsize=None,
        weights = np.ones(nobs)
        q = nobs
    else:
        # ensure weights is a numpy array
Padarn (Contributor Author) commented on this change:

Not sure if this is the best solution, but the error thrown when weights was passed as, for example, a list was very uninformative.
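A sketch of the kind of early coercion and validation meant here (the helper check_weights is illustrative, not the actual statsmodels code): converting list-like input up front lets the later vectorized operations fail with a clear message instead of an obscure broadcasting error.

```python
import numpy as np

def check_weights(weights, nobs):
    # Coerce list-like input to a float ndarray before any arithmetic.
    weights = np.asarray(weights, dtype=float)
    # Validate the shape explicitly so the caller gets an informative error.
    if weights.shape != (nobs,):
        raise ValueError("weights must have the same length as the data, "
                         "got %d weights for %d observations"
                         % (weights.size, nobs))
    return weights
```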

@Padarn (Contributor Author) commented Nov 16, 2013

I see that the Travis CI build for this has failed, but the log doesn't seem to indicate a problem with the patch. Can you ask Travis to retry?

@josef-pkt (Member)

I restarted the Travis CI Python 2.7 job; Python 3 was green, so there shouldn't be an error or failure.

@coveralls

Coverage Status

Coverage remained the same when pulling 900e9e1 on Padarn:weights_patch into 9d4b1f8 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 97820b4 on Padarn:weights_patch into 9d4b1f8 on statsmodels:master.

@josef-pkt (Member)

Thanks @Padarn, especially also for the test case.

I'd like to merge this as is, after a rebase to current master.
Can you do a rebase and force push? Otherwise, it's easy for me to do it.

I also took your test case to compare with Stata, and Stata gives the same results. I saw that Stata uses bw='silverman' by default, while we use 'scott'.
Stata doesn't report results other than the density, so I cannot check the other methods that we have.

@josef-pkt (Member)

I find it difficult to find my way around here.
There is already a csv file with results for weighted kde in the results folder.
Weights were added in 851a278, but only to kdensity; KDEUnivariate didn't handle them in the places where it doesn't call kdensity.

open issue #823




kde_vals = [kde.density[10*i] for i in range(6)]
A Member commented on this line:

Bug issue #823 is for evaluate.
??? I'm still confused about which is which.

@josef-pkt (Member)

#1239: a problem with evaluate with a bounded-support kernel.

For the Gaussian kernel it looks fine, and the fix in this PR looks good:

>>> kde2.fit(kernel='gau', fft=False)
>>> kde2.density[:5]
array([ 0.00016808,  0.00035405,  0.00070727,  0.0013416 ,  0.0024194 ])
>>> kde2.evaluate(kde2.support[:5])
array([ 0.00016808,  0.00035405,  0.00070727,  0.0013416 ,  0.0024194 ])

@josef-pkt josef-pkt mentioned this pull request Dec 16, 2013
@josef-pkt (Member)

I started a new branch and PR for this: #1240.
I needed to normalize the weights to get the test to pass with evaluate.
I'm not sure why I got the same result with density and evaluate in the previous comment, but I had too many kde instances floating around in my interactive Python session.

@Padarn (Contributor Author) commented Dec 16, 2013

Apologies for the confusion around the issue numbers; I got a bit lost too.

I'll take a closer look tonight at the new PR, but I'll assume that you no longer want me to rebase this.

@josef-pkt (Member)

Yes, all changes from this PR are now in mine, so I will merge #1240

I'd like to work a bit more on it, but I won't have the time to get weights fully incorporated into the kernels.
Looks like there is still some cleanup work to do to get this less "confusing".

@Padarn (Contributor Author) commented Dec 16, 2013

I'll have some free time later this week. If you put together a small 'wishlist' before you stop working on it, I'll see what I can do.

@josef-pkt (Member)

That would be very good. Right now only the Gaussian kernel seems to work.
After fixing the missing asarray in kernels.CustomKernel.density, the three other kernels in test_kde.py all fail for KDEUnivariate.evaluate, and the numbers don't make sense.

Gaussian passes with and without weights, so the change in this PR with normalized weights looks fine.

@Padarn (Contributor Author) commented Dec 17, 2013

Okay great. I'll follow the new PR and see what I can do.

Github question: Should I wait until you accept your current PR before attempting to work on it?

@josef-pkt (Member)

Yes, hold off on working on it (I'm pretty sure I'll merge tonight).
If you want, you could review my pull request to see whether I should change anything before the merge.
After that you can work on it from master.

What would be useful is to try out more examples of parts that don't have tests, and add more example scripts.

@josef-pkt josef-pkt closed this in c0a62a0 Dec 18, 2013
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014