Additions to Univariate KDEs #973

Status: Open. Padarn wants to merge 19 commits into statsmodels:main from Padarn:master.
Changes from 10 commits.

Commits:
- 02d2730 Modifications to cdf and icdf calculation. Addition of integrated ker… (Padarn)
- bee9203 Changes to kde.py (Padarn)
- 5c9b50d Added approximate variance calculation to kernels (Padarn)
- b3b5be8 Merge branch 'master' of git://github.com/statsmodels/statsmodels (Padarn)
- 006a0ca Fixes to incorrect calculation of variance, and removal of alternativ… (Padarn)
- 1249044 Renaming functions to fix careless mistake (Padarn)
- 5a38bb5 updated comment on density_variance (Padarn)
- 2732623 update 2 of comment (Padarn)
- 6df077e update 3 of comment (Padarn)
- 3a1ea35 removed bisect_left and erfinv (Padarn)
- 1122950 work on kde.py (Padarn)
- bd9e9f6 Cleanup and fixes (Padarn)
- cb17a09 typo fix (Padarn)
- 2c06109 Some updates to comments and formatting (Padarn)
- d448ce2 replace icdf and update cdf, cdf_values (Padarn)
- 81c3eeb Merge branch 'master' of https://github.com/statsmodels/statsmodels (Padarn)
- 4c1ae6e fix for cdf_values (Padarn)
- e0a5f31 bug fixes + test_update (Padarn)
- b2c35b4 Merge branch 'master' of https://github.com/statsmodels/statsmodels i… (Padarn)
```diff
@@ -16,7 +16,7 @@
 import warnings
 
 import numpy as np
-from scipy import integrate, stats
+from scipy import integrate, stats, interpolate
 from statsmodels.sandbox.nonparametric import kernels
 from statsmodels.tools.decorators import (cache_readonly,
                                           resettable_cache)
```
```diff
@@ -115,7 +115,7 @@ def fit(self, kernel="gau", bw="scott", fft=True, weights=None,
             is implemented. If FFT is False, then a 'nobs' x 'gridsize'
             intermediate array is created.
         gridsize : int
-            If gridsize is None, max(len(X), 50) is used.
+            If gridsize is None, max(len(X), 512) is used.
         cut : float
             Defines the length of the grid past the lowest and highest values
             of X so that the kernel goes to zero. The end points are
```
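For reference, a minimal sketch of how an evaluation grid with the new default size could be built. It assumes the grid extends `cut * bw` beyond the smallest and largest observations (the `cut` description above is truncated, so this is an assumption) and that `cut` defaults to 3; `make_support` is a hypothetical helper, not part of kde.py.

```python
import numpy as np


def make_support(endog, bw, cut=3.0, gridsize=None):
    """Hypothetical helper: evenly spaced evaluation grid for a univariate KDE.

    Assumes the grid extends ``cut * bw`` past the smallest and largest
    observations, so the kernel has (nearly) decayed to zero at the ends.
    """
    endog = np.asarray(endog, dtype=float)
    if gridsize is None:
        # Default proposed in this PR: at least 512 grid points.
        gridsize = max(len(endog), 512)
    a = endog.min() - cut * bw
    b = endog.max() + cut * bw
    return np.linspace(a, b, gridsize)


support = make_support([0.0, 1.0, 2.0], bw=0.5)
print(len(support), support[0], support[-1])  # 512 -1.5 3.5
```

With the old default of 50, a small sample would get a coarse grid; the larger floor mainly matters for the interpolation-based `icdf` and `icdf_eval` added later in this PR, which read values off this grid.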
```diff
@@ -149,6 +149,8 @@ def fit(self, kernel="gau", bw="scott", fft=True, weights=None,
         self.bw = bw
         self.kernel = kernel_switch[kernel](h=bw) # we instantiate twice,
                                                   # should this passed to funcs?
+
+
         # put here to ensure empty cache after re-fit with new options
         self._cache = resettable_cache()
 
```
```diff
@@ -160,23 +162,55 @@ def cdf(self):
         Notes
         -----
         Will not work if fit has not been called.
+
+        If there is an analytic integrated kernel avaliable for the kernel then
+        this is used to find the cdf on self.support. Otherwise the cdf is evaluated
+        numerically.
         """
         _checkisfit(self)
-        density = self.density
-        kern = self.kernel
-        if kern.domain is None: # TODO: test for grid point at domain bound
-            a,b = -np.inf,np.inf
+
+        if getattr(self.kernel, 'cdf', None):
+            return sum(self.kernel.cdf(np.tile(self.endog,[len(self.support),1]).T, self.support, self.bw))/len(self.endog)
+
         else:
-            a,b = kern.domain
-        func = lambda x,s: kern.density(s,x)
+            density = self.density
+            kern = self.kernel
+            if kern.domain is None: # TODO: test for grid point at domain bound
+                a,b = -np.inf,np.inf
+            else:
+                a,b = kern.domain
+            func = lambda x,s: kern.density(s,x)
 
+            support = self.support
+            support = np.r_[a,support]
+            gridsize = len(support)
+            endog = self.endog
+            probs = [integrate.quad(func, support[i-1], support[i],
+                        args=endog)[0] for i in xrange(1,gridsize)]
+            return np.cumsum(probs)
+
+    def cdf_eval(self, x):
+        """
+        Returns the cumulative distribution function evaluated at x.
+
-        support = self.support
-        support = np.r_[a,support]
-        gridsize = len(support)
-        endog = self.endog
-        probs = [integrate.quad(func, support[i-1], support[i],
-                    args=endog)[0] for i in xrange(1,gridsize)]
-        return np.cumsum(probs)
+        Notes
+        -----
+        Will not work if fit has not been called.
+
+        If there is an analytic integrated kernel avaliable for the kernel then
+        this is used to find the cdf on self.support. Otherwise the cdf is evaluated
+        numerically.
+        """
+        _checkisfit(self)
+
+        if getattr(self.kernel, 'cdf', None):
+            return sum(self.kernel.cdf(self.endog, x, self.bw))/len(self.endog)
+
+        else:
+            kern = self.kernel
+            func = lambda y: kern.density(self.endog, y)
+
+            return integrate.quad(func, self.support[0], x)[0]
 
     @cache_readonly
     def cumhazard(self):
```
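The analytic branch of `cdf` and `cdf_eval` rests on the identity F(x) = (1/n) * sum_i Kint((x - X_i) / h), where Kint is the integrated kernel. A minimal sketch for a Gaussian kernel only (so Kint is `scipy.stats.norm.cdf`), checked against a numerical `integrate.quad` fallback like the one in `cdf_eval`. The function names are hypothetical, and the lower integration limit `min(X) - 10*h` is an assumed stand-in for `self.support[0]`.

```python
import numpy as np
from scipy import integrate, stats


def kde_pdf(x, endog, bw):
    """Gaussian KDE density: (1/(n*h)) * sum_i phi((x - X_i) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    endog = np.asarray(endog, dtype=float)
    u = (x[None, :] - endog[:, None]) / bw        # shape (nobs, len(x))
    return stats.norm.pdf(u).sum(axis=0) / (len(endog) * bw)


def kde_cdf_analytic(x, endog, bw):
    """Integrated-kernel route: (1/n) * sum_i Phi((x - X_i) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    endog = np.asarray(endog, dtype=float)
    u = (x[None, :] - endog[:, None]) / bw
    return stats.norm.cdf(u).mean(axis=0)


def kde_cdf_numeric(x, endog, bw):
    """Fallback route: integrate the density from a finite lower limit."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    lower = np.min(endog) - 10 * bw   # assumed stand-in for support[0]

    def density(t):
        return kde_pdf(t, endog, bw)[0]

    return np.array([integrate.quad(density, lower, xi)[0] for xi in x])


endog = np.array([0.0, 1.0, 2.0])
grid = np.linspace(-1.0, 3.0, 9)
print(np.max(np.abs(kde_cdf_analytic(grid, endog, 0.5)
                    - kde_cdf_numeric(grid, endog, 0.5))))
```

Broadcasting `endog[:, None]` against `x[None, :]` gives the same (nobs, gridsize) layout as the `np.tile(...).T` call in the PR without materialising the tiled copy, and the analytic route avoids one `quad` call per grid point.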
```diff
@@ -231,19 +265,59 @@ def entr(x,s):
         return -integrate.quad(entr, a,b, args=(endog,))[0]
 
     @cache_readonly
-    def icdf(self):
+    def icdf(self, sample_quantile = False):
         """
-        Inverse Cumulative Distribution (Quantile) Function
+        Inverse Cumulative Distribution (Quantile) Function over the
+        range of the cdf stored.
 
         Notes
         -----
         Will not work if fit has not been called. Uses
         `scipy.stats.mstats.mquantiles`.
+
+        Uses linear interpolation to get values on a uniform
+        grid.
         """
         _checkisfit(self)
-        gridsize = len(self.density)
-        return stats.mstats.mquantiles(self.endog, np.linspace(0,1,
-                    gridsize))
+
+        if sample_quantile:
+            gridsize = len(self.density)
+            return stats.mstats.mquantiles(self.endog, np.linspace(0,1,
+                        gridsize))
+
+        else:
+            icdf_interp = interpolate.interp1d(self.cdf, self.support, kind='linear')
+            return icdf_interp(np.linspace(self.cdf[0],self.cdf[-1],len(self.support)))
+
+    def icdf_eval(self, x):
+
+        _checkisfit(self)
+
+        if getattr(self.kernel, 'icdf', None):
+            print sum(self.kernel.icdf(self.endog, x, self.bw))/len(self.endog)
+
+        if x >= self.support[-1]:
+            return np.infty
+        if x <= self.support[0]:
+            return -1*np.infty
+
+        index = np.searchsorted(self.cdf, x)
+        support = self.support
+        cdf = self.cdf
+
+        return (x-cdf[index-1])*1.0/(cdf[index]-cdf[index-1])*(support[index]-support[index-1])+ support[index-1]
```

Review comment: split into several lines (line length) and make separate return

```diff
+
+    def variance(self, point):
+        """
+        Evaluates the variance of the kernel estimator according
+        to the approximate formula
+        v = 1/n * (1/h^2 sum(K(x-X_i/h)^2) - fhat(x,h)^2)
+
+        NOTE : NOT WORKING
+        """
+
+        _checkisfit(self)
+        return self.kernel.density_variance(self.endog, point)
+
     def evaluate(self, point):
         """
```
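The approximate variance in the `variance` docstring can be made concrete. Writing fhat(x) = (1/n) * sum_i Y_i with Y_i = K((x - X_i) / h) / h, the variance of a sample mean gives the plug-in estimate v(x) = (1/n) * ((1/(n*h^2)) * sum_i K((x - X_i) / h)^2 - fhat(x)^2). Compared with the docstring, this has a second 1/n inside the bracket and parenthesises the kernel argument as (x - X_i) / h. A sketch for a Gaussian kernel only, assuming i.i.d. data; `kde_variance` is hypothetical and is not the `kernel.density_variance` method the PR calls.

```python
import numpy as np
from scipy import stats


def kde_variance(x, endog, bw):
    """Plug-in estimate of Var[fhat(x)] for a Gaussian-kernel KDE (sketch).

    fhat(x) is the sample mean of Y_i = K((x - X_i) / h) / h, so
    Var[fhat(x)] = Var[Y] / n, estimated here by
    (mean(Y_i**2) - fhat(x)**2) / n.
    """
    endog = np.asarray(endog, dtype=float)
    n = len(endog)
    y = stats.norm.pdf((x - endog) / bw) / bw   # one Y_i per observation
    fhat = y.mean()                             # the usual KDE value at x
    return (np.mean(y ** 2) - fhat ** 2) / n


rng = np.random.default_rng(0)
data = rng.normal(size=200)
print(kde_variance(0.0, data, 0.4))
```

The bracket is just the population variance of the Y_i, so the estimate equals `np.var(y) / n` and is never negative up to rounding.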
Review comment: we might need a third option for the exact calculation; then, instead of a boolean `sample_quantile`, we might need a string, something like `method='interpolate'`.
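A minimal sketch of the reviewers' suggestions: a string `method` in place of the boolean flag, and the `icdf_eval` interpolation split into named steps. Assumptions: only the two existing methods are implemented (the third, exact option is left out); `np.interp` is used in place of `interpolate.interp1d`, since both interpolate linearly inside the grid; the analytic `kernel.icdf` branch is omitted; and the boundary checks compare the probability `q` with the ends of the cdf, whereas the PR's `icdf_eval` compares it with `self.support`. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import mstats


def kde_icdf(endog, support, cdf, method="interpolate"):
    """Quantile function on a uniform probability grid (sketch).

    method="sample":      empirical quantiles of the data (old behaviour).
    method="interpolate": invert the gridded KDE cdf by linear interpolation.
    """
    gridsize = len(support)
    if method == "sample":
        probs = np.linspace(0, 1, gridsize)
        return np.asarray(mstats.mquantiles(endog, probs))
    if method == "interpolate":
        probs = np.linspace(cdf[0], cdf[-1], gridsize)
        # np.interp(x, xp, fp) with xp = cdf (increasing) and fp = support
        return np.interp(probs, cdf, support)
    raise ValueError("unknown method: %r" % (method,))


def kde_icdf_eval(q, support, cdf):
    """Quantile at probability q by linear interpolation on the cdf grid."""
    if q >= cdf[-1]:
        return np.inf
    if q <= cdf[0]:
        return -np.inf
    # searchsorted guarantees cdf[index - 1] < q <= cdf[index]
    index = np.searchsorted(cdf, q)
    p0, p1 = cdf[index - 1], cdf[index]
    x0, x1 = support[index - 1], support[index]
    weight = (q - p0) / (p1 - p0)    # fractional position between grid points
    return x0 + weight * (x1 - x0)


support = np.linspace(-1.0, 1.0, 5)
cdf = np.linspace(0.0, 1.0, 5)
print(kde_icdf_eval(0.3, support, cdf))   # approximately -0.4
```

A string argument also leaves room for the exact option later without another signature change; an unrecognised name fails loudly instead of silently falling through to one branch.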