Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite functions for sampling from discrete truncated distributions #77

Closed
3 tasks done
rho62 opened this issue Feb 21, 2022 · 5 comments
Closed
3 tasks done

Rewrite functions for sampling from discrete truncated distributions #77

rho62 opened this issue Feb 21, 2022 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@rho62
Copy link
Contributor

rho62 commented Feb 21, 2022

Approach pursued so far

Sample from original (not truncated) distribution, followed by a truncation. In-efficient approach: Samples a surplus of unnecessary elements and difficult to predict the sample size required to achieve the target sample size.

Solution

Sample directly from the truncated distribution:

$$ f_T(x; \theta) = \frac{f(x; \theta)}{[F(b) - F(a)]} $$

Use sample() to sample from $a, ..., b$ with weights $f_T(a, ..., b; \theta)$

Implemented for binomial. See code there. Needs to be implemented for other discrete distributions: Poisson, Neg. bin. others?

  • Binomial
  • Negative binomial
  • Poisson

OBS: weights $f_T(x; \theta)$ are already implemented as dtrunc.XXXX() functions

@wleoncio
Copy link
Member

This is the same as #72, isn't it? Also, is there any impediments for implementing this for continuous distributions as well?

@wleoncio wleoncio added the duplicate This issue or pull request already exists label Feb 21, 2022
@rho62
Copy link
Contributor Author

rho62 commented Feb 21, 2022 via email

@wleoncio wleoncio removed the duplicate This issue or pull request already exists label Feb 21, 2022
@wleoncio
Copy link
Member

On second thought, I think you're right. Seems wise to separate things and leave #72 for the duplicated coding issue and #77 and #78 for the slow-sampling issue.

@wleoncio
Copy link
Member

wleoncio commented Jan 3, 2023

If I understood correctly, the Binomial implementation is here:

TruncExpFam/R/binomial.R

Lines 15 to 31 in 087ad1b

dtrunc.trunc_binomial <- function(
y, size, prob, eta, a = 0, b = attr(y, "parameters")$size, ...
) {
if (missing(eta)) {
eta <- parameters2natural.trunc_binomial(c("size" = size, "prob" = prob))
}
nsize <- attr(y, "parameters")$size
my.dbinom <- function(nsize) dbinom(y, size = nsize, prob = proba)
my.pbinom <- function(z, nsize) pbinom(z, size = nsize, prob = proba)
proba <- 1 / (1 + exp(-eta))
dens <- ifelse((y < a) | (y > b), 0, my.dbinom(nsize))
F.a <- my.pbinom(a - 1, nsize)
F.b <- my.pbinom(b, nsize)
dens <- dens / (F.b - F.a)
attributes(dens) <- attributes(y)
return(dens)
}

The f(x) / [F(b) - F(a)] part is clearly defined on L28. f(x) (i.e., dens) is transformed on L25 using my.dbinom(). If that is correct, then this idea could be replicated for rtrunc(), but unless I'm missing something there's no sampling involved on the function above, only rescaling of the densities (as expected, since resampling is only part of the r* fucntions).

So a DRY solution might involve the following steps:

  • Extract the calculation of f_T(x) from the dtrunc methods into its own function. Could be a generic, since the x argument would have different rtrunc_ classes
  • Use the extracted function from the previous step on a new version of rtrunc(), temporarily coexistent with the current implementation
  • Phase out the old function in favor of the new implementation
  • Adjust test unit expectations

One thing that worries me about this approach is that this will probably make the output of rtrunc() not match their stats counterparts anymore, since the untruncated distribution will no longer be the base for the sampling. Is this acceptable?

@wleoncio
Copy link
Member

wleoncio commented Feb 6, 2023

An alternative to phasing out the old algotirhm is to have rtrunc() contain an argument (like a boolean legacy) that will run the old stats-compatible algo. This gives the user control over comparability with stats results vs speed of the new implementation.

wleoncio added a commit that referenced this issue Feb 6, 2023
Still defaults to the old algo, at least until #77 and #78 are
implemented and the old algo is superseded.
@wleoncio wleoncio self-assigned this Feb 6, 2023
wleoncio added a commit that referenced this issue Feb 22, 2023
Function is similar to `rescaledDensities()`, though the latter is
supposed to serve #77 (i.e., discrete distributions). Some eventual
merging may be in order.
@wleoncio wleoncio added this to the MVP 1.2.0 milestone Apr 13, 2023
wleoncio added a commit that referenced this issue Sep 25, 2023
wleoncio added a commit that referenced this issue Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants