ENH: add predict_prob to poisson #1088

Merged
jseabold merged 2 commits into statsmodels:master from jseabold:poisson-count-prob on Oct 23, 2013


3 participants

@jseabold
Member

Adds a method to the Poisson results that predicts the probability of a given count. Compare to Stata's predict varname, n(#) and predprob from R's pscl. Needs tests but works.
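As a rough illustration of what such a method computes (a sketch only, not the code in this PR): given fitted Poisson means, the probability of each count is the Poisson pmf evaluated at that count. The arrays mu and endog below are made-up values, and the count grid 0 to max(endog) mirrors the default discussed in this thread.

```python
import numpy as np
from scipy import stats

# Hypothetical fitted means, one per observation (not taken from the PR).
mu = np.array([0.5, 1.0, 2.5])
# Hypothetical observed counts, used only to pick the count grid.
endog = np.array([0, 1, 3])

# Counts 0, 1, ..., max(endog).
counts = np.arange(endog.max() + 1)

# Broadcast means (as a column) against counts (as a row):
# probs[i, k] is P(Y_i = k) under a Poisson with mean mu[i].
probs = stats.poisson.pmf(counts, mu[:, None])
```

Each row holds one observation's predicted probabilities over the count grid; rows sum to less than 1 because the grid is truncated at max(endog).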

@josef-pkt
Member

my thought was to return the distribution instance itself, which makes all stats.distribution methods directly available
(broadcasting would require users to use x[:,None] or x[..., None] if mean mu/lambda is 1d.)

see #1064

predict_prob might be a bit misleading; the name would refer to an estimate for the sample (as in binomial, and sklearn), though maybe not too misleading.
the default 0 to max(endog) could be a huge array (dangerous as a default).
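A minimal sketch of the alternative described here, assuming scipy's frozen distributions and made-up means (nothing below is statsmodels API): returning a frozen scipy.stats.poisson instance exposes all of its methods, and evaluating it at several counts needs the x[:, None] reshape when the means are 1-D.

```python
import numpy as np
from scipy import stats

# Hypothetical 1-D array of fitted means (one per observation).
mu = np.array([0.5, 1.0, 2.5])

# Frozen distribution: carries the means, exposes pmf, cdf, rvs, etc.
dist = stats.poisson(mu)

# Counts to evaluate; the trailing axis lets them broadcast against mu.
x = np.arange(4)
pmf = dist.pmf(x[:, None])  # shape (4, 3): counts along rows, observations along columns

# Size of a dense 0..max(endog) table: n_obs * (max(endog) + 1) cells.
# The numbers here are illustrative only.
n_obs, max_endog = 100_000, 5_000
n_cells = n_obs * (max_endog + 1)
```

With the illustrative sizes above, the dense table would hold about half a billion floats, which is the concern with using 0 to max(endog) as a default.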

@jseabold
Member

Mainly just did it because it's the default in R too.

What's 1064 supposed to be?

@josef-pkt
Member

1064 is autocompletion in GitHub.
#106 was written for GLM but applies to other models.

one method I'd like to have available is rvs, to simulate the process (besides examples, also for parametric bootstrap and Monte Carlo).
other nice ones: cdf and interval

some things like var might be more interesting for other distributions, but might not work vectorized until the next scipy release.
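For illustration, the methods mentioned here (rvs, cdf, interval) are all available on a frozen scipy.stats.poisson instance. This is a sketch with made-up means, not anything a statsmodels results object returned at the time of this thread.

```python
import numpy as np
from scipy import stats

# Hypothetical fitted means, one per observation.
mu = np.array([0.5, 1.0, 2.5])
dist = stats.poisson(mu)

# Simulate 1000 replications of the whole sample,
# e.g. as input to a parametric bootstrap or Monte Carlo study.
sims = dist.rvs(size=(1000, mu.size), random_state=0)

# P(Y_i <= 2) for each observation.
cdf_at_2 = dist.cdf(2)

# Central 95% interval of the count distribution for each observation.
lower, upper = dist.interval(0.95)
```

Each row of sims is one simulated dataset with the same means as the original sample, so a statistic can be recomputed row by row to approximate its sampling distribution.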

@josef-pkt
Member

I don't see much reason not to merge this. But I also don't see any use case for it (especially compared to predict_dist).

@jseabold
Member

It's used in the Vuong test for zero-inflated vs. Poisson.
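To show where per-observation predicted probabilities enter, here is a hedged sketch of the basic (uncorrected) Vuong statistic comparing a zero-inflated Poisson to a plain Poisson. The data, the parameter values pi and mu_zip, and the helper name vuong_statistic are all made up for illustration; in practice the probabilities would come from two fitted models.

```python
import numpy as np
from scipy import stats

def vuong_statistic(prob_model1, prob_model2):
    """Uncorrected Vuong statistic from per-observation probabilities of the observed counts."""
    m = np.log(prob_model1) - np.log(prob_model2)
    return np.sqrt(m.size) * m.mean() / m.std(ddof=1)

# Hypothetical observed counts with many zeros.
y = np.array([0, 0, 0, 1, 0, 3, 2, 0, 4, 0])

# Plain Poisson: probability of each observed count at the sample mean.
p_pois = stats.poisson.pmf(y, y.mean())

# Zero-inflated Poisson with assumed (not estimated) parameters.
pi, mu_zip = 0.4, 1.7
p_zip = (1 - pi) * stats.poisson.pmf(y, mu_zip)
p_zip = np.where(y == 0, pi + p_zip, p_zip)  # extra mass at zero

v = vuong_statistic(p_zip, p_pois)
```

Positive values of v favor the first model passed in, negative values the second; swapping the arguments flips the sign.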

@jseabold
Member

Rebased. Added a test. We can get rid of it / change it before 0.6.0 release if there's a better alternative forthcoming.

@jseabold
Member

Hmm, this installs correctly and passes for me locally. Not sure why Travis fails.

@josef-pkt
Member

This PR doesn't include an .npy file. And I think it needs to be added to MANIFEST.in

What's in the npy file? I don't really like to use "proprietary" formats in case there are format changes.

@jseabold
Member

Ha, right. Need to add it. It's just some test data. It's smaller as a binary file than a csv.

@coveralls

Coverage remained the same when pulling ee90385 on jseabold:poisson-count-prob into 84e7607 on statsmodels:master.

@josef-pkt
Member

The only time we had a .npz file that I can find now (in vector_ar), I needed to convert it to a python module, because we ran into some problems across numpy or python versions, AFAIR.

@jseabold
Member

I'll switch it, but my understanding is that the .npy is architecture and python independent. That's the point of it. I don't see any issues on the numpy tracker about this.
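A small sketch of the trade-off being discussed, using made-up data and in-memory buffers rather than the PR's actual test file: the .npy format starts with a fixed magic string and a header that records dtype (including byte order) and shape, while a CSV written with np.savetxt is plain text and typically larger.

```python
import io
import numpy as np

# Made-up test data standing in for the file discussed in the thread.
data = np.arange(100, dtype=np.float64).reshape(25, 4) / 7.0

# Binary .npy round trip through an in-memory buffer.
npy_buf = io.BytesIO()
np.save(npy_buf, data)
raw = npy_buf.getvalue()
npy_buf.seek(0)
from_npy = np.load(npy_buf)

# Text CSV round trip with numpy's default float format.
csv_buf = io.StringIO()
np.savetxt(csv_buf, data, delimiter=",")
csv_text = csv_buf.getvalue()
csv_buf.seek(0)
from_csv = np.loadtxt(csv_buf, delimiter=",")

# Byte counts of the two encodings, for comparison.
npy_size, csv_size = len(raw), len(csv_text)
```

Both routes recover the array; the choice is between a compact binary file tied to numpy's format and a larger generic text file readable by any tool.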

@josef-pkt
Member

The only explanation for the .npz conversion (*) I can find is https://groups.google.com/d/msg/pystatsmodels/JIp54_XZ66w/OxUf8tCQAJUJ
According to this, the problem was with npz, not npy, files in the Python 3.2 conversion.

(*) b88d9a3

Using only generic formats removes some possible headaches later on.
(But because I didn't think about it, my first test data file in GMM is a .dta not a csv file.)

@coveralls

Coverage remained the same when pulling 9f6e2a0 on jseabold:poisson-count-prob into 84e7607 on statsmodels:master.

@jseabold jseabold merged commit da26462 into statsmodels:master Oct 23, 2013

1 check passed

default The Travis CI build passed
@jseabold jseabold deleted the jseabold:poisson-count-prob branch Oct 23, 2013