Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Family bernoulli() turns 1 into 0 when all observations are 1 #1298

Closed
famuvie opened this issue Feb 17, 2022 · 1 comment
Closed

Family bernoulli() turns 1 into 0 when all observations are 1 #1298

famuvie opened this issue Feb 17, 2022 · 1 comment
Labels
Milestone

Comments

@famuvie
Copy link
Contributor

famuvie commented Feb 17, 2022

Dear Paul,

Here is a MRE of a likely bug in brms handling of the bernoulli family. I removed compilation messages for brevity.

library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.16.7). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar

## Fit a logistic intercept-only model, where all observations are 1
(fm <- brm(
  y ~ 1,
  family = bernoulli(link = "logit"),
  data = data.frame(y = rep(1, 40)),
  refresh = 0
))
#> Compiling Stan program...
#> ...
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#tail-ess
#>  Family: bernoulli 
#>   Links: mu = logit 
#> Formula: y ~ 1 
#>    Data: data.frame(y = rep(1, 40)) (Number of observations: 40) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept    -7.17      4.62   -20.88    -2.77 1.00      354      337
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

## Estimated intercept diverges to -Inf ??
## In probability scale, p -> 0

## Examining the data passed to stan reveals that all 1's have been turned
## into 0's
str(standata(fm))
#> List of 5
#>  $ N         : int 40
#>  $ Y         : num [1:40(1d)] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ K         : int 1
#>  $ X         : num [1:40, 1] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:40] "1" "2" "3" "4" ...
#>   .. ..$ : chr "Intercept"
#>   ..- attr(*, "assign")= int 0
#>  $ prior_only: int 0
#>  - attr(*, "class")= chr [1:2] "standata" "list"

## Adding a single 0 among the observations fixes it
fm1 <- brm(
  y ~ 1,
  family = bernoulli(link = "logit"),
  data = data.frame(y = c(0, rep(1, 40))),
  refresh = 0
)
#> Compiling Stan program...
#> ...
#> Start sampling
fixef(fm1)
#>           Estimate Est.Error     Q2.5    Q97.5
#> Intercept 3.670862 0.9790843 2.128601 5.919072
table(standata(fm1)$Y)
#> 
#>  0  1 
#>  1 40


## Using the equivalent binomial() family with trials = 1
## yields a extreme result in the expected direction.
fm2 <- brm(
  y | trials(1) ~ 1,
  family = binomial(link = "logit"),
  data = data.frame(y = rep(1, 40)),
  refresh = 0
)
#> Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
#> Compiling Stan program...
#> ...
#> Start sampling
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#tail-ess
fixef(fm2)
#>           Estimate Est.Error    Q2.5    Q97.5
#> Intercept 6.555139  4.180668 2.85166 16.03014
table(standata(fm2)$Y)
#> Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
#> 
#>  1 
#> 40

Created on 2022-02-17 by the reprex package (v2.0.1)

paul-buerkner added a commit that referenced this issue Feb 17, 2022
@paul-buerkner paul-buerkner added this to the brms 2.16.0++ milestone Feb 17, 2022
@paul-buerkner
Copy link
Owner

What an interesting bug :-D Thank you for spotting it! Just fixed it on github.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants