
Kde weights 1103 823 #1240

Merged: 14 commits, merged into statsmodels:master on Dec 18, 2013

@josef-pkt (Member) commented Dec 16, 2013

Rebased and corrected version of @Padarn's PR #1103.

- needed to normalize the weights to sum to 1 for the test to pass with evaluate
- fixed the unit tests to test evaluate

I'm still looking into #1239 to see whether it should be added to this PR.
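The normalization mentioned above can be illustrated outside statsmodels (the names below are illustrative, not the statsmodels API): a weighted KDE only integrates to one when the weights themselves sum to one.

```python
import numpy as np

# Illustrative sketch (not statsmodels code) of the normalization above:
# a weighted KDE integrates to one only when the weights sum to one.
rng = np.random.RandomState(12345)
x = rng.standard_normal(200)          # sample
raw_w = rng.random_sample(200)        # raw random weights, as in the test
w = raw_w / raw_w.sum()               # normalize: sum(w) == 1
h = 0.4                               # fixed bandwidth for the sketch

def kde_weighted(grid, data, weights, h):
    # f_hat(g) = (1/h) * sum_i w_i * K((data_i - g) / h), Gaussian K
    u = (data[:, None] - grid[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return (k * weights[:, None]).sum(axis=0) / h

grid = np.linspace(-5, 5, 1001)
dens = kde_weighted(grid, x, w, h)
area = dens.sum() * (grid[1] - grid[0])  # Riemann sum over the grid, ~1.0
```

With unnormalized weights the estimate would instead integrate to sum(raw_w), which is why evaluate disagreed with the fitted density before the fix.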

@coveralls commented Dec 16, 2013

Coverage remained the same when pulling 57deb65 on josef-pkt:kde_weights_1103_823 into 5cf4cad on statsmodels:master.

self.weights = np.random.random(nobs)


class TestKDEUnivariate(MyTest):

@josef-pkt (Author) Dec 16, 2013

test_kde has the other KDEUnivariate tests; I think we should move this there. (A small problem: we don't have the random seed for the weights if we don't generate all the other random variables.) I have saved the random values to a csv file for use with Stata.



grid = kde.support
testx = [grid[10*i] for i in range(6)]

@josef-pkt (Author) Dec 16, 2013

Does testx need to be a list? grid[10 * np.arange(6)] will create an array.
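The suggestion above is plain fancy indexing: an integer index array replaces the list comprehension and returns an ndarray directly, with the same values.

```python
import numpy as np

# Stand-in for kde.support; the real grid comes from the fitted KDE.
grid = np.linspace(0.0, 1.0, 100)

testx_list = [grid[10 * i] for i in range(6)]   # the version in the test
testx_arr = grid[10 * np.arange(6)]             # same values, as an ndarray
```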

@@ -185,7 +191,10 @@ def density(self, xs, x):
        xs = xs[:,None]
        if len(xs)>0:
            h = self.h
            w = 1/h * np.mean(self((xs-x)/h), axis=0)
            if self.weights is not None:
                w = 1/h * np.sum(self((xs-x)/h).T * self.weights, axis=1)

@josef-pkt (Author) Dec 16, 2013

What are the possible shapes for xs? .T should work for 0-d, 1-d and 2-d.
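A quick check of the shape question above (a sketch with made-up arrays, not the statsmodels code path): transposing before multiplying by a 1-d weights vector broadcasts the weights over the observation axis for both 1-d and 2-d kernel-value arrays, and .T is a no-op on 0-d and 1-d arrays.

```python
import numpy as np

weights = np.array([0.2, 0.3, 0.5])  # nobs = 3, sums to 1

k1 = np.ones(3)        # kernel values for 1-d xs, shape (nobs,)
k2 = np.ones((3, 4))   # kernel values for 2-d xs, shape (nobs, npoints)

# 1-d: k1.T is k1 itself, elementwise product with weights, scalar sum
r1 = np.sum(k1.T * weights)
# 2-d: k2.T has shape (npoints, nobs); weights broadcast over the last
# axis, so summing over axis=1 sums over observations, giving (npoints,)
r2 = np.sum(k2.T * weights, axis=1)
```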

@josef-pkt (Author) Dec 17, 2013

See comment above: weights needs to be filtered to correspond to the xs kept by inDomain.

@josef-pkt (Member, Author) commented Dec 16, 2013

Needs the fix from #1233 before adding tests for uniform.

# fails for Epan, Triangular and Biweight, only Gaussian is correct
# added it as test method to TestKDEGauss below
# inDomain is not vectorized
#kde_vals = self.res1.evaluate(self.res1.support)

@josef-pkt (Author) Dec 17, 2013

Looks like there is a bug somewhere in the non-Gaussian kernels, or I'm not using them correctly.

@coveralls commented Dec 17, 2013

Coverage remained the same when pulling 3179b37 on josef-pkt:kde_weights_1103_823 into 5cf4cad on statsmodels:master.

        n = len(xs)
        #xs = self.inDomain( xs, xs, x )[0]

        if len(xs)>0: ## Need to do product of marginal distributions
            #w = np.sum([self(self._Hrootinv * (xx-x).T ) for xx in xs])/n
            #vectorized doesn't work:
            w = np.mean(self((xs-x) * self._Hrootinv )) #transposed
            if self.weights is not None:
                w = np.mean(self((xs-x) * self._Hrootinv).T * self.weights)/sum(self.weights)

@josef-pkt (Author) Dec 17, 2013

I think this is wrong for bounded-support kernels: xs will only contain the points in the neighborhood (inDomain), so weights needs to be filtered/selected in the same way.

In the kde case, I think we can just pass weights instead of ys (the second xs) in self.inDomain(xs, weights, x).
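The filtering idea above can be sketched with a hypothetical stand-in for CustomKernel.inDomain (the function below is illustrative, not the statsmodels implementation): whatever second array is passed is filtered with the same mask as xs, so passing the weights as the "ys" argument keeps them aligned with the surviving points.

```python
import numpy as np

def in_domain(xs, ys, x, h=1.0, domain=1.0):
    # Hypothetical sketch of inDomain for a bounded-support kernel:
    # keep only points within +/- domain*h of x, and filter the second
    # array (here: the weights) with the same mask.
    mask = np.abs(xs - x) <= domain * h
    return xs[mask], ys[mask]

xs = np.array([-2.0, -0.5, 0.1, 0.4, 3.0])
weights = np.array([0.1, 0.2, 0.3, 0.2, 0.2])

xs_in, w_in = in_domain(xs, weights, x=0.0)
# xs_in keeps the three neighborhood points; w_in stays aligned with them
```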

@josef-pkt (Author) Dec 17, 2013

The comment should apply below in CustomKernel; this here is NdKernel. This version of using weights has a normalization by sum(weights).

@josef-pkt (Member, Author) commented Dec 17, 2013

AFAICS, I fixed the bugs with bounded-support kernels in the last commit, 0d574e2. We need to divide by the total number of observations, not take the mean over the kernel values of the neighborhood points.

Also, inDomain is now used to filter the weights. Looks correct in an example, but we still need unit tests for weights and non-Gaussian kernels.
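The bug described above can be reproduced in a few lines (a self-contained sketch under assumed shapes, not the statsmodels code): with a bounded-support kernel, only the neighborhood points survive inDomain, so np.mean over those k points divides by k, while the density estimate must divide by the total number of observations n.

```python
import numpy as np

def epanechnikov(u):
    # bounded-support kernel: zero outside |u| <= 1
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

rng = np.random.RandomState(0)
data = rng.standard_normal(500)
n, h, x = len(data), 0.5, 0.0

u = (data - x) / h
keep = np.abs(u) <= 1                  # inDomain for the bounded kernel

wrong = np.mean(epanechnikov(u[keep])) / h       # divides by k = keep.sum()
right = np.sum(epanechnikov(u[keep])) / (n * h)  # divides by total n
```

Since k < n, the mean-based version overstates the density by a factor n/k; the sum divided by n*h recovers the usual KDE value (about 0.4 near zero for standard normal data).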

@coveralls commented Dec 17, 2013 (seven similar comments)

Coverage remained the same when pulling 0d574e2, 758ccc9, 1aa494f, 24e4f91, a6fcae0, 2e8d4d2 and 134b590 on josef-pkt:kde_weights_1103_823 into 5cf4cad on statsmodels:master.

@Padarn commented on 0d574e2, Dec 18, 2013

I wonder if it's worth just having the bounded-support kernels behave the same as the Gaussian kernel for inDomain, with zero weights outside the support. That would simplify things a little bit.

Not sure how the performance would be affected. Currently there is no FFT path for weights, so it might be important.

@josef-pkt (Owner, Author) replied Dec 18, 2013

I thought about it; it would make several things easier. However, I think this way can be made faster than always calculating with the full array (see statsmodels#1241). We would also have to create a new kernel/shape array for each point, because the nonzero positions will move.

Replied Dec 18, 2013

I might not understand how this is working, but will the idea in statsmodels#1241 work with the NdKernel? Not sure the searchsorted approach can be used there.
Hmm, yes, good point about the new kernel/shape array for each point.

@josef-pkt (Owner, Author) replied Dec 18, 2013

No, np.searchsorted only works for 1-D. The equivalent for NdKernel would be what scikit-learn uses: a kd-tree or another method for fast neighbor search.
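Both neighbor-search strategies mentioned above can be sketched briefly (illustrative code, not part of this PR): np.searchsorted gives the neighborhood slice on sorted 1-d data, while scipy's cKDTree handles the n-d case.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.RandomState(42)
x, h = 0.0, 0.3

# 1-d: points within [x - h, x + h] via searchsorted on sorted data
data1 = np.sort(rng.standard_normal(1000))
lo, hi = np.searchsorted(data1, [x - h, x + h])
neighbors_1d = data1[lo:hi]

# n-d: points within radius h of the query point via a kd-tree
data2 = rng.standard_normal((1000, 2))
tree = cKDTree(data2)
idx = tree.query_ball_point(np.zeros(2), r=h)
neighbors_nd = data2[idx]
```

A kd-tree query costs roughly O(log n) per point instead of scanning the full array, which is the appeal for bounded-support kernels in higher dimensions.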

@Padarn

Does self.L2Norm work for all kernels? I seem to remember it being specifically implemented in some (the Gaussian, for example).

Never mind; I just checked and it looks like it does.

@josef-pkt (Owner, Author) replied Dec 18, 2013

Yes, it has a generic implementation, but all current kernels define it as a hardcoded number.

@Padarn (Contributor) commented Dec 18, 2013

All looks pretty good to me.

Might be nice to have the new variance estimate across the support grid as an attribute of the kernel class?

@josef-pkt (Member, Author) commented Dec 18, 2013

I think we should add it as a method on KDEUnivariate directly. But since you have the open PR for refactoring KDEUnivariate, I didn't want to create additional merge conflicts, and the method should match your refactored version.
One advantage of density_var is that it is cheap: it doesn't require any loops, given that we already have the density estimate. The same holds for confint.
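The "no loops" point above follows from the standard asymptotic result for kernel density estimates, Var f_hat(x) ≈ f(x) · R(K) / (n·h), where R(K) is the kernel's squared L2 norm. A hedged sketch (function name and signature are illustrative, not the statsmodels API): given the density on the grid, the variance is one vectorized expression.

```python
import numpy as np

# R(K) = integral of K(u)^2 du; 1/(2*sqrt(pi)) for the Gaussian kernel
R_GAUSS = 1.0 / (2.0 * np.sqrt(np.pi))

def density_var(density, nobs, h, kernel_l2norm_sq=R_GAUSS):
    # Asymptotic pointwise variance of the KDE, evaluated on the whole
    # grid at once -- no loops needed once the density is available.
    return density * kernel_l2norm_sq / (nobs * h)

grid_density = np.array([0.1, 0.3, 0.4])   # illustrative density values
var = density_var(grid_density, nobs=200, h=0.5)
```

A pointwise confidence interval then follows from density ± z · sqrt(var), which is similarly loop-free.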

@Padarn (Contributor) commented Dec 18, 2013

That makes sense to me. The KDEUnivariate class could take advantage of density_var when it works out the gridded values of the density anyway.

josef-pkt added a commit that referenced this pull request Dec 18, 2013.

@josef-pkt merged commit c0a62a0 into statsmodels:master on Dec 18, 2013. One check passed (the Travis CI build).

@josef-pkt deleted the josef-pkt:kde_weights_1103_823 branch Dec 18, 2013.

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014
