poisson likelihood with different priors #70

Open

jhsiao999 opened this issue May 24, 2017 · 23 comments

@jhsiao999

@mengyin I got errors with these two cases:

Case 1:
set.seed(100)
l = rexp(1)
x = rpois(100, lambda = l)
ashr::ash(rep(0,100), 1, lik = ashr::lik_pois(x), mode = 0, prior = "+uniform")

Case 2:
set.seed(100)
l = rexp(1)
x = rpois(100, lambda = l)
ashr::ash(rep(0,100), 1, lik = ashr::lik_pois(x), mode = "estimate", prior = "halfuniform")

Both returned the error message "Error in match.arg(prior) : 'arg' should be one of “nullbiased”, “uniform”, “unit”". With prior = "uniform", the code runs with no errors. Thanks so much!

@mengyin
Contributor

mengyin commented May 24, 2017

I think the argument corresponding to "+uniform" or "halfuniform" should be mixcompdist=... instead of prior=...

This is indeed confusing. The argument prior actually specifies whether we put extra weight on the mode's point-mass component ("nullbiased") or weight all components equally ("uniform"). Sorry for the confusion!
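
For example, Case 1 should then look something like this (a sketch of the corrected call, which I haven't rerun here):

set.seed(100)
l = rexp(1)
x = rpois(100, lambda = l)
ashr::ash(rep(0,100), 1, lik = ashr::lik_pois(x), mode = 0, mixcompdist = "+uniform")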

@jhsiao999
Author

Ah I see. It works now!

I think this also confused me in the past. On top of that, I was looking at the ash help file, where the prior input is not described. However, ash.workhorse describes both prior and mixcompdist. Thanks so much @mengyin!

@pcarbo
Collaborator

pcarbo commented May 24, 2017

@jhsiao999 Would it help to revise the documentation for ash and ash.workhorse to make this more clear for others? Or was it just that you didn't look in the right place?

@jhsiao999
Author

@pcarbo I think the documentation for ash needs more details. Right now it only describes four of the many arguments available in ash. In addition, what is the main difference between ash and ash.workhorse? Maybe add that at the beginning of each of their descriptions, so it would be immediately obvious to users (or maybe I wasn't looking in the right place?)

@jhsiao999
Author

@mengyin I got this error message while running with mode = 0 and mixcompdist = "+uniform". Do I need to tune the parameters accordingly?

"estimated mixing distribution has some negative values: consider reducing rtol"

@pcarbo
Collaborator

pcarbo commented May 24, 2017

@jhsiao999 I'm not entirely sure of the difference---I suspect there is a historical reason for having two functions, but I agree it is confusing.

At the very least, we can combine the two help pages for ash and ash.workhorse.

@stephens999 What do you think, is this a good idea?

@mengyin
Contributor

mengyin commented May 24, 2017

@jhsiao999 Could you send me your data so I can check it and debug? Thanks!

@pcarbo
Collaborator

pcarbo commented May 24, 2017

@jhsiao999 If the data aren't large, you can also attach them to a comment here.

@jhsiao999
Author

@mengyin Here's the data.

countsPoissonDebug.txt

The code I used:

ash(rep(0, length(y)), 1, lik = lik_pois(y),
    mixcompdist = "+uniform", mode = 0)

@stephens999
Owner

stephens999 commented May 24, 2017 via email

Maybe we should introduce ash_pois, ash_binom, etc. for ash with other (non-normal and non-t) likelihoods?
I think this could have multiple advantages.
For example, the defaults for some parameters may differ depending on the likelihood, so it might be clearer to do things this way, rather than forcing users into ash(..., lik = lik_pois()), which is a bit weird...?

@mengyin
Contributor

mengyin commented May 25, 2017

Introducing ash_XXX makes more sense to me. Now we have to put betahat=0 and sebetahat=1 for non-normal/t likelihoods, which seems really confusing...

@pcarbo
Collaborator

pcarbo commented May 25, 2017

Happy to discuss, but I do think it is nice if the function has limited options for a novice user...

@stephens999 I was only suggesting we combine the help pages for ash and ash.workhorse. For example read.table and read.csv share the same help page.

But now that you have brought it up, I don't think that having a large number of arguments is an obstacle to using ash---so long as most input arguments have sensible defaults. Consider that functions glm, read.table and bwplot all have long lists of arguments and very long, detailed help pages, and I don't think any of this has stopped people from using these functions. (For example, I've probably used read.table >1000 times yet I don't know what the argument numerals does, and I don't have to.)

So I don't think it is necessary to have separate functions ash and ash.workhorse, and I think it can potentially be more confusing.

Maybe we should introduce ash_pois, ash_binom, etc. for ash with other (non-normal and non-t) likelihoods?

I don't think it is necessary, but I can see some benefits in doing so. We could consider ash_pois, ash_binom as "convenience functions" that are wrappers to ash/ash.workhorse with specific settings of the inputs, similar to the way that read.csv is basically just a wrapper for read.table with sep=",", etc.
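
To make that concrete, such a wrapper might look roughly like this (a hypothetical sketch---ash_pois does not exist in the package, and the defaults shown are illustrative, not decided):

# hypothetical convenience wrapper around the general interface
ash_pois <- function(y, mixcompdist = "+uniform", mode = 0, ...) {
  ashr::ash.workhorse(rep(0, length(y)), 1, lik = ashr::lik_pois(y),
                      mixcompdist = mixcompdist, mode = mode, ...)
}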

That being said, I don't think we should abandon having a general interface ash/ash.workhorse. There is a lot of power in having a unified interface that allows for solving many different types of problems---in addition to the implementation benefits (reducing the amount of redundant code), it is also more naturally extensible by us and others, and it also has important benefits for users such as learning that all these different models have the same underlying behaviour and properties.

I agree with both of you that the default settings sometimes don't make sense, but here I think the solution is to revise the interface, and the argument defaults, rather than to abandon the general interface entirely.

Matthew touched on the general solution:

It seems maybe the defaults for some parameters may differ depending on the likelihood…

Yes, and the solution is to set the defaults based on the choice of likelihood and/or mixture prior, which I see is something that you are already doing, e.g.,

if (lik$name == "pois") {
  if(missing(nullweight))
    nullweight = 1
}

Another option, if there are a lot of rarely used or specialized arguments, is to have a catch-all argument called "options". This strategy is used a lot; e.g., see argument control in function optim, and the functions in the standard graphics packages.
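
For instance, optim's control argument bundles the rarely used optimizer settings into one list (a quick toy illustration):

# minimize a simple quadratic; optimizer knobs go in the control list
fit <- optim(par = c(1, 1), fn = function(p) sum(p^2),
             control = list(reltol = 1e-10, maxit = 500))
fit$par  # close to c(0, 0)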

@stephens999
Owner

Thanks @pcarbo. I agree with much of what you say. I'm happy to combine the help for ash and ash.workhorse.

I think one benefit of having ash.pois etc. would be that the defaults would be immediately more obvious and documented under ash.pois. And I agree it would call ash.workhorse. But we would remove from ash.workhorse all that "if(lik$name...)" code, which I find really difficult to maintain and follow.

@mengyin can you draft a simple function ash.pois that calls ash.workhorse with a Poisson likelihood and sensible defaults? At the same time, can you remove the $data from the $lik object? I think you don't need it... if that seems not straightforward, maybe we can discuss in person?

@pcarbo
Collaborator

pcarbo commented May 25, 2017

@stephens999 But do we want ash.pois or ash_pois?

I did some research on this (here, here, here & here) and the consensus is that there is no consensus, and it appears that there is even debate within the "R Core Team".

It seems that underscores are safer, but dots are mostly safe, too, and a lot of people don't like underscores (Hadley seems to be the lone advocate). So either one is fine with me.

@stephens999
Owner

stephens999 commented May 25, 2017

I chose .pois to be consistent with ash.workhorse and smash.pois.

I generally try to use _ but here, when you have "variants" of a procedure, the . notation makes some sense...

Thus get_lfsr but ash.workhorse.

@mengyin
Contributor

mengyin commented May 26, 2017

@jhsiao999 I also got the warning "REBayes::KWDual(A, rep(1, k), normalize(w), control = list(rtol = 0.1)) : estimated mixing distribution has some negative values: consider reducing rtol"

I checked the data but I'm not sure why the optimization procedure gave negative mixture proportions (sorry!). But in mix_opt.R, lines 45-48 force the estimated mixture proportions to be non-negative, so I guess @stephens999 found this problem and decided to manually fix it this way?
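
That fix is presumably along these lines (a paraphrased sketch, not the exact mix_opt.R code):

pihat = pmax(pihat, 0)       # zero out any negative estimated proportions
pihat = pihat / sum(pihat)   # renormalize so the proportions sum to 1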

@pcarbo
Collaborator

pcarbo commented May 26, 2017

@mengyin @jhsiao999 Wei encountered this problem before. It is indeed strange that it occasionally generates negative mixing proportions. It should not do this. Right now we have a temporary solution (set negative proportions to 0) but perhaps in the future we will come up with a better solution.

@stephens999
Owner

@mengyin @jhsiao999 You could try reducing rtol as it suggests (in control = list(rtol = ...)).
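
For example, something like this (a sketch, reusing the y from the attached data above):

ash(rep(0, length(y)), 1, lik = lik_pois(y),
    mixcompdist = "+uniform", mode = 0,
    control = list(rtol = 1e-8))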

@mengyin
Contributor

mengyin commented May 26, 2017

@pcarbo Thanks for the explanation! Maybe in this case we can use optmethod="mixEM" instead? For this example the fitted proportions of "mixIP" (after setting negative proportions to 0) and "mixEM" can be quite different, so I'm a bit worried...

@stephens999 I tried reducing rtol (even to 0.1), but it still gives the warning...
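
For reference, the two fits compared below were presumably produced along these lines (a sketch; the exact calls aren't shown in the thread):

test.mixip = ash(rep(0, length(y)), 1, lik = lik_pois(y),
                 mixcompdist = "+uniform", mode = 0, optmethod = "mixIP")
test.mixem = ash(rep(0, length(y)), 1, lik = lik_pois(y),
                 mixcompdist = "+uniform", mode = 0, optmethod = "mixEM")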

@stephens999
Owner

@mengyin I'm not sure 0.1 is actually small (or what the default is); it was just an example. I'd try 1e-8.

Which method gives the higher log-likelihood, and how different are they?

@mengyin
Contributor

mengyin commented May 26, 2017

Sorry, I should have made rtol smaller... I've now tried rtol=1e-8 and even smaller values (1e-9, 1e-10, ...), but the results don't change. Anyway, mixIP's log-likelihood is still lower than mixEM's.

test.mixem$loglik
[1] -369.4327
test.mixip$loglik
[1] -372.6865

The fitted proportions are different: mixEM's fitted g is concentrated almost entirely on the third-to-last component (pi = 9.999999e-01), but mixIP's fitted g is more spread out.

get_fitted_g(test.mixem)
$pi
[1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[7] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[13] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.128845e-11
[19] 3.789486e-12 5.825406e-15 1.338107e-07 9.999999e-01 0.000000e+00 0.000000e+00

$a
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

$b
[1] 0.00000000 0.07323568 0.10357089 0.14647135 0.20714177 0.29294271
[7] 0.41428355 0.58588541 0.82856710 1.17177083 1.65713420 2.34354166
[13] 3.31426840 4.68708331 6.62853679 9.37416663 13.25707358 18.74833326
[19] 26.51414717 37.49666652 53.02829433 74.99333304 106.05658867 149.98666607

attr(,"row.names")
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
attr(,"class")
[1] "unimix"

get_fitted_g(test.mixip)
$pi
[1] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
[8] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
[15] 0.000000000 0.000000000 0.000000000 0.010121267 0.009872140 0.008541232 0.108509013
[22] 0.831018262 0.017296417 0.014641670
