Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ash_pois log link problems #114

Open
aksarkar opened this issue Dec 27, 2019 · 0 comments
Open

ash_pois log link problems #114

aksarkar opened this issue Dec 27, 2019 · 0 comments

Comments

@aksarkar
Copy link
Contributor

On simulated NB data
(https://users.rcc.uchicago.edu/~aksarkar/pois-mode-est.Rds), fitting with log
link gets a much worse log likelihood than with identity link.

> dat <- readRDS("/scratch/midway2/aksarkar/modes/test-data/pois-mode-est.Rds")
> with(dat, summary(MASS::glm.nb(x ~ offset(log(s))))$twologlik / 2)
[1] -2008.504
> fit0 <- ashr::ash_pois(dat$x, dat$s, link="identity", mixcompdist="halfuniform")
> fit0$loglik
[1] -2008.271
> lam <- dat$x / dat$s
> eps = 1 / mean(dat$s)
> log_lam <- log(lam + eps)
> se_log_lam <- sqrt(var(lam) / (lam + eps)^2)
> mixsd <- seq(.1 * min(se_log_lam), max(2 * sqrt(log_lam^2 - se_log_lam^2)), by=.5 * log(2))
> fit1 <- ashr::ash_pois(dat$x, dat$s, link="log", mixcompdist="halfuniform", mixsd=mixsd)
> fit1$loglik
[1] -3199.636

autoselect.mixsd needs to be modified to handle this case. For se_log_lam,
I am using Taylor expansion of V[ln(\lambda + \epsilon)]. This calculation also
needs to be verified.

On real data (https://users.rcc.uchicago.edu/~aksarkar/b-cell-data.Rds), the
marginal PMF calculation assuming log link returns probabilities > 1, which
should not happen.

> dat <- readRDS("/scratch/midway2/aksarkar/modes/test-data/b-cell-data.Rds")
> query <- dat[dat["gene"] == "ENSG00000019582",]
> with(query, summary(MASS::glm.nb(count ~ offset(log(size))))$twologlik / 2)
[1] -31414.6
> fit0 <- ashr::ash_pois(query$count, query$size, link="identity", mixcompdist="halfuniform")
> fit0$loglik
[1] -31395.05
> lam <- query$count / query$size
> eps = 1 / mean(query$size)
> log_lam <- log(lam + eps)
> se_log_lam <- sqrt(var(lam) / (lam + eps)^2)
> mixsd <- seq(.1 * min(se_log_lam), max(2 * sqrt(log_lam^2 - se_log_lam^2)), by=.5 * log(2))
> fit1 <- ashr::ash_pois(query$count, query$size, link="log", mixcompdist="halfuniform", mixsd=mixsd)
> fit1$loglik
[1] 10408.96
> any(fit1$fitted_g$pi %*% ashr:::comp_dens_conv(fit1$fitted_g, fit1$data) > 1)
[1] TRUE
any(ashr:::comp_dens_conv(fit1$fitted_g, fit1$data) > 1)
[1] TRUE
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /scratch/midway2/aksarkar/miniconda3/envs/scmodes/lib/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] MASS_7.3-50       compiler_3.5.1    Matrix_1.2-14     parallel_3.5.1   
 [5] mixsqp_0.2-2      Rcpp_0.12.18      codetools_0.2-15  SQUAREM_2017.10-1
 [9] pscl_1.5.2        doParallel_1.0.11 grid_3.5.1        iterators_1.0.10 
[13] foreach_1.4.4     truncnorm_1.0-8   ashr_2.2-39       lattice_0.20-35  
aksarkar added a commit to aksarkar/ashr that referenced this issue Dec 27, 2019
Add test cases for Poisson-uniform mixture likelihood computations

Compute marginal PMF over a point mass using dpois directly

This fixes issue stephens999#114
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant