Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Out of support behavior in distribution functions is not consistent with base R functions #5

Open
hennerw opened this issue Jan 31, 2019 · 8 comments

Comments

@hennerw
Copy link

hennerw commented Jan 31, 2019

With base R distribution functions if you ask the d function for a value it will return 0 rather than an error. Similarly for the p functions, if you ask for a q value greater or less than the support it will return 1 or 0 respectively.

Examples:

> pbinom(2,1,.5)
[1] 1
> pbinom(-1,1,.5)
[1] 0
> dbinom(2,1,.5)
[1] 0

With the rmutil functions out of support values give an error.

> pbetabinom(10,9,.2,6)
Error in pbetabinom(10, 9, 0.2, 6) : q must be <= size
> dbetabinom(10,9,.2,6)
Error in dbetabinom(10, 9, 0.2, 6) : y must be <= size
> dbetabinom(10,9,0.2,6)
Error in dbetabinom(10, 9, 0.2, 6) : y must be <= size
> dbetabinom(-1,9,0.2,6)
Error in dbetabinom(-1, 9, 0.2, 6) : y must contain non-negative values

The base R method of dealing with this is the mathematically correct one: P(2 heads, one coin flip) =0. P(5 or fewer heads on 3 coin flips)=1. These are defined, if typically uninteresting, values and should not generate an error.

I have not checked all of the functions, but it looks like the fix should be fairly simple. For pbetabinom at least, removing the check will generate the expected results with no additional code.

@swihart
Copy link
Owner

swihart commented Feb 1, 2019

Thanks for opening an issue. I replaced the fewest lines possible to get the functionality requested. Basically I replaced stop() with message() and updated the message. I had to keep in mind the vector case of inputs.

Try out the following and let me know if it fixes the issue for pbetabinom and dbetabinom...

## We want behavior like pbinom():

## this should handle the below support case
## i.e. negative values...
pbinom(c(-2,-1,0), 5, 0.5)
pbetabinom(c(-2,-1,0), 9, 0.2, 6)

## this should handle the above support case
## i.e. the q>size cases
pbinom(c(5,6,7), 5, 0.5)
pbetabinom(c(9,10,11), 9, 0.2, 6)

which should give:

> ## We want behavior like pbinom():
> 
> ## this should handle the below support case
> ## i.e. negative values...
> pbinom(c(-2,-1,0), 5, 0.5)
[1] 0.00000 0.00000 0.03125
> pbetabinom(c(-2,-1,0), 9, 0.2, 6)
Negative values of q replaced with 0
[1] 0.2859368 0.2859368 0.2859368
> 
> ## this should handle the above support case
> ## i.e. the q>size cases
> pbinom(c(5,6,7), 5, 0.5)
[1] 1 1 1
> pbetabinom(c(9,10,11), 9, 0.2, 6)
Elements of q that were greater than size were set equal to size
[1] 1 1 1

For the density functions, similar approach.

## We want behavior like dbinom():

## this should handle the below support case
## i.e. negative values...
dbinom(c(-2,-1,0), 5, 0.5)
dbetabinom(c(-2,-1,0), 9, 0.2, 6)

## this should handle the above support case
## i.e. the q>size cases
dbinom(c(5,6,7), 5, 0.5)
dbetabinom(c(9,10,11), 9, 0.2, 6)

should give

> ## We want behavior like dbinom():
> 
> ## this should handle the below support case
> ## i.e. negative values...
> dbinom(c(-2,-1,0), 5, 0.5)
[1] 0.00000 0.00000 0.03125
> dbetabinom(c(-2,-1,0), 9, 0.2, 6)
Negative values of y replaced with -1 to yield 0 value density
[1] 0.0000000 0.0000000 0.2859368
> 
> ## this should handle the above support case
> ## i.e. the q>size cases
> dbinom(c(5,6,7), 5, 0.5)
[1] 0.03125 0.00000 0.00000
> dbetabinom(c(9,10,11), 9, 0.2, 6)
Elements of y exist that are greater than size
[1] 0.0008552741 0.0000000000 0.0000000000

I'll just submit these changes to CRAN for pbetabinom and dbetabinom. Were there any other functions you really wanted to work this way?

swihart added a commit that referenced this issue Feb 1, 2019
@hennerw
Copy link
Author

hennerw commented Feb 1, 2019

I think your solution is better than mine.

No correction needs to be made for the q > size cases as the result looks like it will be 1 anyway.

For q <0 I had been doing it inductively:

# after corrects for vector inputs
  if (any(q < 0)){
      # stop("q must contain non-negative values")
     # browser()
     if(all(q < 0)) return(rep(0,len))
     Val <- q>=0
     out <- rep(0,len)
     out[Val] <- pbetabinom_c(q[Val],size[Val],m[Val],s[Val])
     return(out)
   }

However you're right. ifelse(q<0,res,0) at the end would be the simplest solution.

The beta Binomial functions are currently the only functions I am making use of right now. I would suggest updating the rest of them for consistency with other base functions, but all of it's up to you.

Thank you for the quick response (And the great package)!

@swihart
Copy link
Owner

swihart commented Feb 1, 2019

I'll leave this issue open and will work on the rest of the functions as I can. Thanks for writing! I've already submitted v1.1.2 to CRAN.

@hennerw
Copy link
Author

hennerw commented Feb 13, 2019

Hi Swihart,

I took a look at the updated version today and I think that it is incorrect. I'm sorry I did not notice earlier.

Setting q to 0 in the case of q < 0 is not correct. 0 is a possible outcome, -1 is not. This results in incorrect results:

> pbetabinom(0, 5, 0.2, 6)
[1] 0.4297082
> pbetabinom(-1, 5, 0.2, 6)
Negative values of q replaced with 0
[1] 0.4297082

The result in the second case should be 0. It is possible to get 0 or fewer success out of 5, but it is not possible to get -1 or fewer out of 5, so the probability should be 0. (The q > size case does work out)

Here is an easy inductive fix: (Added after vectorization)

  if (any(q > size)){
    # Updating to correctly deal with this siduation
    # stop("q must be <= size")
    if(all(q>size)) return(rep(1,len))
    Val <- q<= size
    out <- rep(1,len)
    out[Val] <- pbetabinom_c(q[Val],size[Val],m[Val],s[Val])
    return(out)
  }
  if (any(q < 0)){
    # stop("q must contain non-negative values")
    if(all(q < 0)) return(rep(0,len))
    Val <- q>=0
    out <- rep(0,len)
    out[Val] <- pbetabinom_c(q[Val],size[Val],m[Val],s[Val])
    return(out)
  }

Here is my use case for out of range results (Besides consistency with base R functions):

I am flipping 20 coins with an unknown probability of coming up heads. After each flip I stop if the probability that I'll have fewer than 9 heads at the end of the experiment is less than 10% and declare the experiment a failure.

I would like to generate a heat map showing the probability of ending with fewer than 9 successes given any potential combination of observations/successes during the experiment:

rplot

I could add my own handling to the cases where the observed successes are already greater than 9 or the number of successes required to reach 9 is fewer than the remaining number of trials. However the coding is cleaner if the pbetabinom function can correctly handle those cases. (and again inline with the handling of the base R functions)

@swihart
Copy link
Owner

swihart commented Mar 4, 2019

Is dbetabinom() okay? Is it just pbetabinom()?

@swihart
Copy link
Owner

swihart commented Mar 4, 2019

I think I have a fix for pbetabinom that changes the original code as little as possible and give the desired results. Thank you for your example code and continued interest. I'll get it out in v1.1.3 soon.

@swihart
Copy link
Owner

swihart commented Mar 4, 2019

## We want behavior like pbinom():

## this should handle the below support case
## i.e. negative values...
pbinom(c(-2,-1,0), 5, 0.5)
pbetabinom(c(-2,-1,0), 9, 0.2, 6)



## this should handle the above support case
## i.e. the q>size cases
pbinom(c(5,6,7), 5, 0.5)
pbetabinom(c(9,10,11), 9, 0.2, 6)

Now gives

> ## We want behavior like pbinom():
> 
> ## this should handle the below support case
> ## i.e. negative values...
> pbinom(c(-2,-1,0), 5, 0.5)
[1] 0.00000 0.00000 0.03125
> pbetabinom(c(-2,-1,0), 9, 0.2, 6)
Negative values of q detected.  `pbetabinom` returns 0 for such values.
[1] 0.0000000 0.0000000 0.2859368
> 
> 
> 
> ## this should handle the above support case
> ## i.e. the q>size cases
> pbinom(c(5,6,7), 5, 0.5)
[1] 1 1 1
> pbetabinom(c(9,10,11), 9, 0.2, 6)
Elements of q that were greater than size detected. `pbetabinom` returns 1 for such values.
[1] 1 1 1

@swihart
Copy link
Owner

swihart commented Mar 4, 2019

check out v1.1.3. off to cran.

I'll look into patching up other [pd] functions in near future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants