sandbox kernels, problems with inDomain #1239

Closed
josef-pkt opened this Issue Dec 16, 2013 · 2 comments

@josef-pkt
Member

For the Gaussian kernel this works because it has no bounds.

For the triangular kernel I get an exception, see below. There are two problems:
The first is that inDomain is not vectorized, which is possibly by design.
The second is that inDomain returns a list, which causes a failure in density.

Note: there wasn't a problem for the "smoothconf" in PR #1233


update

OK, fixed KDEUnivariate.evaluate/CustomKernel.density in #1240.
However, I don't think we can vectorize evaluate for bounded-support kernels, because inDomain will not return a rectangular array: each point might have a different number of neighbors.
So we would need some other structure to vectorize this (0 weights, sparse, ...). I doubt it's worth vectorizing just inDomain.
For convenience we could add a loop inside KDEUnivariate.evaluate for bounded kernels.
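
To illustrate the ragged-neighbors point, here is a minimal sketch. It is not the statsmodels implementation: `tri_kernel`, `density_loop` and `density_zero_weight` are hypothetical helpers, and the triangular kernel is assumed to be 1 - |u| on the support [-1, 1]. The first function loops over evaluation points, as proposed above; the second uses the "0 weights" idea, keeping a rectangular (n_points, n_obs) array by giving out-of-domain observations weight zero.

```python
import numpy as np

def tri_kernel(u):
    """Triangular kernel, assumed support [-1, 1]; zero outside it."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 1.0 - np.abs(u), 0.0)

def density_loop(endog, points, h):
    """Evaluate the KDE one point at a time (hypothetical helper)."""
    endog = np.asarray(endog, dtype=float)
    out = []
    for x in np.atleast_1d(points):
        u = (endog - x) / h
        # The number of in-domain neighbors differs from point to point,
        # so these slices cannot be stacked into a rectangular array.
        neighbors = u[np.abs(u) <= 1.0]
        out.append(np.sum(1.0 - np.abs(neighbors)) / (len(endog) * h))
    return np.array(out)

def density_zero_weight(endog, points, h):
    """Vectorized variant: out-of-domain observations get weight 0."""
    endog = np.asarray(endog, dtype=float)
    points = np.atleast_1d(np.asarray(points, dtype=float))
    u = (endog[None, :] - points[:, None]) / h   # shape (n_points, n_obs)
    return tri_kernel(u).sum(axis=1) / (len(endog) * h)

endog = [0.0, 0.5, 2.0]
points = [0.0, 1.0, 5.0]
loop_result = density_loop(endog, points, h=1.0)
zero_weight_result = density_zero_weight(endog, points, h=1.0)
```

Both variants give the same values. The zero-weight version needs memory proportional to n_points * n_obs, which is the trade-off against the simple loop.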


>>> kde2.fit(kernel='tri', fft=False)
>>> kde2.density[:5]
array([ 0.,  0.,  0.,  0.,  0.])
>>> kde2.evaluate(kde2.support[:5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "e:\josef\eclipsegworkspace\statsmodels-git\statsmodels-all-new2_py27\statsmodels\statsmodels\nonparametric\kde.py", line 259, in evaluate
    return self.kernel.density(self.endog, point)
  File "e:\josef\eclipsegworkspace\statsmodels-git\statsmodels-all-new2_py27\statsmodels\statsmodels\sandbox\nonparametric\kernels.py", line 189, in density
    xs = self.inDomain( xs, xs, x )[0]
  File "e:\josef\eclipsegworkspace\statsmodels-git\statsmodels-all-new2_py27\statsmodels\statsmodels\sandbox\nonparametric\kernels.py", line 176, in inDomain
    filtered = filter(isInDomain, zip(xs, ys))
  File "e:\josef\eclipsegworkspace\statsmodels-git\statsmodels-all-new2_py27\statsmodels\statsmodels\sandbox\nonparametric\kernels.py", line 171, in isInDomain
    return u >= self.domain[0] and u <= self.domain[1]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

>>> kde2.support[0]
-4.3733738207323043
>>> kde2.evaluate(kde2.support[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "e:\josef\eclipsegworkspace\statsmodels-git\statsmodels-all-new2_py27\statsmodels\statsmodels\nonparametric\kde.py", line 259, in evaluate
    return self.kernel.density(self.endog, point)
  File "e:\josef\eclipsegworkspace\statsmodels-git\statsmodels-all-new2_py27\statsmodels\statsmodels\sandbox\nonparametric\kernels.py", line 190, in density
    if xs.ndim == 1:
AttributeError: 'list' object has no attribute 'ndim'
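
Both failures can be reproduced outside statsmodels. The sketch below uses a simplified, hypothetical stand-in `in_domain` that only mirrors the filter/zip pattern visible in the traceback; the `DOMAIN` constant and the bandwidth argument `h` are assumptions. Converting the result with `np.asarray` is one possible way to avoid the second error, not necessarily what #1240 does.

```python
import numpy as np

DOMAIN = (-1.0, 1.0)  # assumed support of a bounded kernel

def in_domain(xs, ys, x, h=1.0):
    """Simplified stand-in for the filter/zip pattern in the traceback."""
    def is_in_domain(pair):
        u = (pair[0] - x) / h
        # If x is an array with several elements, u is an array too and
        # `and` needs a single truth value, which raises ValueError.
        return u >= DOMAIN[0] and u <= DOMAIN[1]
    filtered = list(filter(is_in_domain, zip(xs, ys)))
    if filtered:
        xs_f, ys_f = zip(*filtered)
        return list(xs_f), list(ys_f)
    return [], []

data = np.array([-0.5, 0.2, 3.0])

# Failure 1: an array of evaluation points makes the comparison ambiguous.
try:
    in_domain(data, data, np.array([0.0, 1.0]))
    vectorized_ok = True
except ValueError:
    vectorized_ok = False

# Failure 2: a scalar point works, but the result is a plain list.
xs = in_domain(data, data, 0.0)[0]
has_ndim_before = hasattr(xs, "ndim")
xs = np.asarray(xs)  # convert before using array attributes such as ndim
has_ndim_after = hasattr(xs, "ndim")
```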

@josef-pkt
Member

Possibly introduced by a4d722c, which would mean evaluate hasn't worked for bounded-support kernels for some time.
I cannot check right now because I have a loose HEAD in my checkout.

My guess is that there is no test coverage for the other kernels.

I have a CSV file with weighted-density results from Stata that I could check this against.

@josef-pkt
Member

Fixed and unit tested, but not vectorized, in PR #1240, merged in c0a62a0.

@josef-pkt josef-pkt closed this Dec 18, 2013