Alpha parameter >50 #28

matthieu-haudiquet · 2019-11-29T14:59:00Z

Hello,

Thank you for this great package ! I have read in several places ( e.g. : https://cran.r-project.org/web/packages/rr2/rr2.pdf ) that an alpha > 50 generally means there is no phylogenetic signal and thus you can (should?) use glm instead of phyloglm to perform the logistic regression.

Is this always correct and could you explain a little bit why ?

Best regards

cecileane · 2019-11-29T15:17:49Z

There is no such hard threshold, because it all depends on branch lengths. If you rescaled your tree to a total height of 1, then yes, you could use α>50 to mean "independent" residuals across species. But if your tree has a total height of 0.01, say, then α=50 would mean high phylogenetic correlation.
Cooper et al. 2016 have a nice section "Recommendations for interpreting α". They recommend to compare the phylogenetic half-life ln(2)/α to the total height of your phylogeny.

matthieu-haudiquet · 2019-11-29T15:48:04Z

Thank you, that makes sense. It is much clearer now

arives · 2019-12-02T16:39:27Z

Matthieu,

This is a good point that you bring up, and a good point in Cecile's answer. The basic problem is that all optimizers have a difficult time finding maxima along boundaries. The form of the OU process in phylolm is parameterized so that alpha -> infinite as phylogenetic signal -> 0, so it is necessary to impose a boundary (alpha = 54, if I remember correctly). (Other formulations of the OU, like the one we used in Blomberg et al. 2003, use d = 1/alpha, but this formulation has exactly the same problem because it has a boundary at zero.) When hitting a boundary of no phylogenetic signal, it is best to refit the data with a model that does not include phylogenetic signal: glm() is the appropriate downgrade for phyloglm().

But Cecile is right to point out that alpha depends on branch lengths. The correct solution is always to transform branch lengths before doing a phylogenetic analysis. I typically standardize to make the determinant of the Brownian motion (starter tree) covariance matrix = 1, but it is also reasonable (and easier for phylo objects) to make the base to tip distance one. Also, it is best to make the phylogeny contemporaneous (ultrametric) for glms.

I'm not sure what the appropriate R etiquette is in a situation like this. Since rr2 is a "downstream" package from phylolm and other packages, I think Daijiang and I have to assume that users of our package are fitting models properly using the "upstream" functions. Maybe by that logic we shouldn't refit models for people when alpha > 50. On the other hand, that might stop people from making mistaken fitting, since I don't think you want to report the results from phylolm when alpha is at its maximum boundary.

Thanks again for point this out, and let me know what you think of my reasoning.

Cheers, Tony

cecileane · 2019-12-04T03:26:13Z

The form of the OU process in phylolm is parameterized so that alpha -> infinite as phylogenetic signal -> 0, so it is necessary to impose a boundary (alpha = 54, if I remember correctly)

The default upper bound on α, in phylolm, does depend on the tree height T: it's 50/T (from here). So if the tree was rescaled to a tree height of 1, the default upper bound would be 50 in phylolm. For phyloglm, the upper bound on is exp(4)/T = 54.6/T, like Tony said.
The user does not need to rescale the tree: the software adjusts both the lower & upper bound on α accordingly. If either bound is hit, I believe that the user is given a warning, for the sake for full disclosure (but these warnings can reasonably be ignored since the defaults are already very wide).

The correct solution is always to transform branch lengths before doing a phylogenetic analysis.

I would not say "always". For example, one might be interested to know the effect of taxon sampling, or to repeat an analysis on various subclades. To be able to compare the analyses, it's best not to rescale the tree between resampling / subsetting strategies.

arives · 2019-12-04T13:05:50Z

Cecile,

Thanks for correcting me! I've always standardized branch lengths and then culled for alpha > 50; when doing lots of simulations, this is how I'd automate the identification of "lack of convergence" cases. I'll fix rr2.

Matthieu,

In case you are interested, here is code that illustrates why you should heed the warning in phyloglm:

n <- 500
phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1)
phy2 <- phy
phy2$edge.length <- 1000*phy2$edge.length

par(mfrow=c(1,2))
plot(phy, show.tip.label=F)
add.scale.bar()
plot(phy2, show.tip.label=F)
add.scale.bar()

X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1))/var(X1)
X = cbind(rep(1,n),X1)

sim.dat <- data.frame(Y=array(0, dim=n), X1=X1, row.names=phy$tip.label)
sim.dat$Y <- rbinTrait(n=1, phy=phy, beta=c(0,0.5), alpha=100, X=X)

Compare with different branch lengths

summary(phyloglm(Y ~ X1, phy=phy, data=sim.dat))
summary(phyloglm(Y ~ X1, phy=phy2, data=sim.dat))

Cheers, Tony

arives · 2019-12-04T13:20:32Z

Cecile,

I just looked at this a little more, and I get cases when phyloglm gives a warning but the converge code is 0. Specifically, from the code in my last comment, you can get warnings and pretty different fits, but still have convergence codes of 0.

Would it be worth changing adding a "warning" code to phyloglm objects, or giving a non-zero convergence code under the same conditions that triggers the warning?

Cheers, Tony

lamho86 · 2019-12-05T04:15:30Z

Hello Tony, I think this is a good idea! Thank you for your suggestion. On the other hand, it is not strange that you can get convergence code of 0 and the warning at the same time. In this situation, the optimizer believes that the estimated alpha is the maximum of the likelihood function in the interval (0,54). On the other hand, the warning appears when the estimated alpha is close to 54. So, they are not mutually exclusive. When this happens, the estimated alpha will be something different from 54 but very close. You can think of this scenario as the likelihood function has a little bump near 54. However, the bump will be tiny since we know that the likelihood surface is quite flat near the boundary. Best, Lam Lam Ho | Assistant Professor FACULTY OF SCIENCE Department of Mathematics and Statistics 902.494.1069 Dalhousie University dal.ca

…

On Dec 4, 2019, at 9:20 AM, Anthony R. Ives ***@***.***> wrote: Cecile, I just looked at this a little more, and I get cases when phyloglm gives a warning but the converge code is 0. Specifically, from the code in my last comment, you can get warnings and pretty different fits, but still have convergence codes of 0. Would it be worth changing adding a "warning" code to phyloglm objects, or giving a non-zero convergence code under the same conditions that triggers the warning? Cheers, Tony — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#28?email_source=notifications&email_token=ABW3HP6KDCI7B3JMDC5J4ZLQW6VCBA5CNFSM4JTARZZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF472QQ#issuecomment-561642818>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABW3HP3BXSE73PUG7SQEA5DQW6VCBANCNFSM4JTARZZA>.

arives · 2019-12-05T12:21:19Z

Lam, I totally agree that it is not surprising. I was just surprised that the optimizer would give a convergence code of 0 when the estimates were still quite poor as flagged by the warning. In the example code I used, changing the scaling of the branch lengths could give pretty different estimates of the fixed effects (coefficients) while still giving convergence = 0 under both phylogenetic trees (original and rescaled). It would be nice to have a simple way to flag these cases that I could integrate into rr2. I didn't want to condition refitting phyloglm as an glm on a warning that is not an attribute of the phyloglm object. Of course, there are two other solutions: (i) ignore the issue and leave it to the user to make sure the fit is good, or (ii) automatically refit phyloglm after rescaling the phylogenetic tree outside of phylglm. I'm slowly (too slowly) learning how to make code/packages that work with different styles of analyses that people use. I do automatically rescale trees before analyses, even if the functions I use also rescale trees internally (like phylolm and functions that I've written). But obviously the packages have to be written for people that don't do this, and I need to be more aware of this. Cheers, Tony From: Lam Ho <notifications@github.com> Reply-To: lamho86/phylolm <reply@reply.github.com> Date: Wednesday, December 4, 2019 at 10:15 PM To: lamho86/phylolm <phylolm@noreply.github.com> Cc: "Anthony R. Ives" <arives@wisc.edu>, Comment <comment@noreply.github.com> Subject: Re: [lamho86/phylolm] Alpha parameter >50 (#28) Hello Tony, I think this is a good idea! Thank you for your suggestion. On the other hand, it is not strange that you can get convergence code of 0 and the warning at the same time. In this situation, the optimizer believes that the estimated alpha is the maximum of the likelihood function in the interval (0,54). On the other hand, the warning appears when the estimated alpha is close to 54. So, they are not mutually exclusive. When this happens, the estimated alpha will be something different from 54 but very close. You can think of this scenario as the likelihood function has a little bump near 54. However, the bump will be tiny since we know that the likelihood surface is quite flat near the boundary. Best, Lam Lam Ho | Assistant Professor FACULTY OF SCIENCE Department of Mathematics and Statistics 902.494.1069 Dalhousie University dal.ca

On Dec 4, 2019, at 9:20 AM, Anthony R. Ives ***@***.***> wrote: Cecile, I just looked at this a little more, and I get cases when phyloglm gives a warning but the converge code is 0. Specifically, from the code in my last comment, you can get warnings and pretty different fits, but still have convergence codes of 0. Would it be worth changing adding a "warning" code to phyloglm objects, or giving a non-zero convergence code under the same conditions that triggers the warning? Cheers, Tony — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#28?email_source=notifications&email_token=ABW3HP6KDCI7B3JMDC5J4ZLQW6VCBA5CNFSM4JTARZZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF472QQ#issuecomment-561642818>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABW3HP3BXSE73PUG7SQEA5DQW6VCBANCNFSM4JTARZZA>.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub<#28?email_source=notifications&email_token=ACYX6LDFSHVMB2DRJ7PGP53QXB56JA5CNFSM4JTARZZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF7NPPY#issuecomment-561960895>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ACYX6LHJWD3HMFNSUIXPTITQXB56JANCNFSM4JTARZZA>.

matthieu-haudiquet · 2019-12-05T14:26:03Z

Everyone,

Thank you Cecile, Anthony and Lam for the fruitful discussion. Especially I wasn't aware about the need to rescale the trees before such analysis.

Anthony thank you for the exemple, you are right

I agree that it would be very practical to have a flag when there is a warning about alpha being too close to its boundaries. Since I run a lot of phyloglm(), for now I just captured the warnings in R to flag it afterwards, but it would be nice if there was a flag so that I could just include a simple test in my functions. Would it be a good approach to increase the log.alpha.bound parameter until this warning (eventually) disappears ?

Best,
Matthieu

arives · 2019-12-05T14:41:11Z

Matthieu, I don't think you want to change the log.alpha.bound parameter boundary, since the way the function is coded (I know this now!) a value close to 54 on the rescaled tree is going to give essentially no signal, and hence optimization problems. When I get the warning, I always just revert to glm. Cheers, Tony

matthieu-haudiquet · 2019-12-05T14:45:20Z

Ok, thanks !

lamho86 · 2019-12-08T19:41:45Z

Hello everyone, I have added a flag (alphaWarn) in phyloglm for this warning. ‘0’ means good, ‘1’ means the estimate of alpha is near the lower bound, and ‘2’ means the estimate of alpha is near the upper bound. Best, Lam Lam Ho | Assistant Professor FACULTY OF SCIENCE Department of Mathematics and Statistics 902.494.1069 Dalhousie University dal.ca

…

On Dec 5, 2019, at 10:45 AM, Matthieu Haudiquet ***@***.***> wrote: Ok, thanks ! — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#28?email_source=notifications&email_token=ABW3HP2NP54CQIHPKTJUSZTQXEHYDA5CNFSM4JTARZZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGA6AFI#issuecomment-562159637>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABW3HP6DK7YD6BBHD57BIR3QXEHYDANCNFSM4JTARZZA>.

arives · 2019-12-08T21:03:03Z

Lam, That is really helpful. Thanks! Cheers, Tony From: Lam Ho <notifications@github.com> Reply-To: lamho86/phylolm <reply@reply.github.com> Date: Sunday, December 8, 2019 at 1:41 PM To: lamho86/phylolm <phylolm@noreply.github.com> Cc: "Anthony R. Ives" <arives@wisc.edu>, Comment <comment@noreply.github.com> Subject: Re: [lamho86/phylolm] Alpha parameter >50 (#28) Hello everyone, I have added a flag (alphaWarn) in phyloglm for this warning. ‘0’ means good, ‘1’ means the estimate of alpha is near the lower bound, and ‘2’ means the estimate of alpha is near the upper bound. Best, Lam Lam Ho | Assistant Professor FACULTY OF SCIENCE Department of Mathematics and Statistics 902.494.1069 Dalhousie University dal.ca

On Dec 5, 2019, at 10:45 AM, Matthieu Haudiquet ***@***.***> wrote: Ok, thanks ! — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#28?email_source=notifications&email_token=ABW3HP2NP54CQIHPKTJUSZTQXEHYDA5CNFSM4JTARZZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGA6AFI#issuecomment-562159637>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABW3HP6DK7YD6BBHD57BIR3QXEHYDANCNFSM4JTARZZA>.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub<#28?email_source=notifications&email_token=ACYX6LFK6ZIVS6QEMZT6PEDQXVEXVA5CNFSM4JTARZZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGHHR5I#issuecomment-562985205>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ACYX6LHFC54SCVI6WQWTLSDQXVEXVANCNFSM4JTARZZA>.

matthieu-haudiquet · 2019-12-09T10:35:05Z

thanks a lot !
Matt

arives · 2020-06-17T16:42:32Z

Lam,

Just a quick question. I'm about to update rr2 on CRAN, and it would be nice to use your updates with alphaWarn handling. Do you plan to update your CRAN version in the near future?

Thanks, Tony

lamho86 · 2020-06-17T16:49:58Z

Hello Tony,

It has been a while since we last updated the CRAN version. I will push the current version to CRAN today.

arives · 2020-06-17T16:51:15Z

Thanks so much! From: Lam Ho <notifications@github.com> Reply-To: lamho86/phylolm <reply@reply.github.com> Date: Wednesday, June 17, 2020 at 11:50 AM To: lamho86/phylolm <phylolm@noreply.github.com> Cc: "Anthony R. Ives" <arives@wisc.edu>, Comment <comment@noreply.github.com> Subject: Re: [lamho86/phylolm] Alpha parameter >50 (#28) Hello Tony, It has been a while since we last updated the CRAN version. I will push the current version to CRAN today. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#28 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ACYX6LASETE7MSZJYMYWDNDRXDX4LANCNFSM4JTARZZA>.

matthieu-haudiquet mentioned this issue Dec 2, 2019

phyloglm() alpha > 50 arives/rr2#14

Closed

lamho86 closed this as completed Aug 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alpha parameter >50 #28

Alpha parameter >50 #28

matthieu-haudiquet commented Nov 29, 2019

cecileane commented Nov 29, 2019

matthieu-haudiquet commented Nov 29, 2019

arives commented Dec 2, 2019

cecileane commented Dec 4, 2019

arives commented Dec 4, 2019

arives commented Dec 4, 2019

lamho86 commented Dec 5, 2019 via email

arives commented Dec 5, 2019 via email

matthieu-haudiquet commented Dec 5, 2019

arives commented Dec 5, 2019 via email

matthieu-haudiquet commented Dec 5, 2019

lamho86 commented Dec 8, 2019 via email

arives commented Dec 8, 2019 via email

matthieu-haudiquet commented Dec 9, 2019

arives commented Jun 17, 2020

lamho86 commented Jun 17, 2020

arives commented Jun 17, 2020 via email

Alpha parameter >50 #28

Alpha parameter >50 #28

Comments

matthieu-haudiquet commented Nov 29, 2019

cecileane commented Nov 29, 2019

matthieu-haudiquet commented Nov 29, 2019

arives commented Dec 2, 2019

cecileane commented Dec 4, 2019

arives commented Dec 4, 2019

Compare with different branch lengths

arives commented Dec 4, 2019

lamho86 commented Dec 5, 2019 via email

arives commented Dec 5, 2019 via email

matthieu-haudiquet commented Dec 5, 2019

arives commented Dec 5, 2019 via email

matthieu-haudiquet commented Dec 5, 2019

lamho86 commented Dec 8, 2019 via email

arives commented Dec 8, 2019 via email

matthieu-haudiquet commented Dec 9, 2019

arives commented Jun 17, 2020

lamho86 commented Jun 17, 2020

arives commented Jun 17, 2020 via email