Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alpha parameter >50 #28

Closed
matthieu-haudiquet opened this issue Nov 29, 2019 · 17 comments
Closed

Alpha parameter >50 #28

matthieu-haudiquet opened this issue Nov 29, 2019 · 17 comments

Comments

@matthieu-haudiquet
Copy link

Hello,

Thank you for this great package ! I have read in several places ( e.g. : https://cran.r-project.org/web/packages/rr2/rr2.pdf ) that an alpha > 50 generally means there is no phylogenetic signal and thus you can (should?) use glm instead of phyloglm to perform the logistic regression.

Is this always correct and could you explain a little bit why ?

Best regards

@cecileane
Copy link
Contributor

There is no such hard threshold, because it all depends on branch lengths. If you rescaled your tree to a total height of 1, then yes, you could use α>50 to mean "independent" residuals across species. But if your tree has a total height of 0.01, say, then α=50 would mean high phylogenetic correlation.
Cooper et al. 2016 have a nice section "Recommendations for interpreting α". They recommend to compare the phylogenetic half-life ln(2)/α to the total height of your phylogeny.

@matthieu-haudiquet
Copy link
Author

Thank you, that makes sense. It is much clearer now

@arives
Copy link

arives commented Dec 2, 2019

Matthieu,

This is a good point that you bring up, and a good point in Cecile's answer. The basic problem is that all optimizers have a difficult time finding maxima along boundaries. The form of the OU process in phylolm is parameterized so that alpha -> infinite as phylogenetic signal -> 0, so it is necessary to impose a boundary (alpha = 54, if I remember correctly). (Other formulations of the OU, like the one we used in Blomberg et al. 2003, use d = 1/alpha, but this formulation has exactly the same problem because it has a boundary at zero.) When hitting a boundary of no phylogenetic signal, it is best to refit the data with a model that does not include phylogenetic signal: glm() is the appropriate downgrade for phyloglm().

But Cecile is right to point out that alpha depends on branch lengths. The correct solution is always to transform branch lengths before doing a phylogenetic analysis. I typically standardize to make the determinant of the Brownian motion (starter tree) covariance matrix = 1, but it is also reasonable (and easier for phylo objects) to make the base to tip distance one. Also, it is best to make the phylogeny contemporaneous (ultrametric) for glms.

I'm not sure what the appropriate R etiquette is in a situation like this. Since rr2 is a "downstream" package from phylolm and other packages, I think Daijiang and I have to assume that users of our package are fitting models properly using the "upstream" functions. Maybe by that logic we shouldn't refit models for people when alpha > 50. On the other hand, that might stop people from making mistaken fitting, since I don't think you want to report the results from phylolm when alpha is at its maximum boundary.

Thanks again for point this out, and let me know what you think of my reasoning.

Cheers, Tony

@cecileane
Copy link
Contributor

The form of the OU process in phylolm is parameterized so that alpha -> infinite as phylogenetic signal -> 0, so it is necessary to impose a boundary (alpha = 54, if I remember correctly)

The default upper bound on α, in phylolm, does depend on the tree height T: it's 50/T (from here). So if the tree was rescaled to a tree height of 1, the default upper bound would be 50 in phylolm. For phyloglm, the upper bound on is exp(4)/T = 54.6/T, like Tony said.
The user does not need to rescale the tree: the software adjusts both the lower & upper bound on α accordingly. If either bound is hit, I believe that the user is given a warning, for the sake for full disclosure (but these warnings can reasonably be ignored since the defaults are already very wide).

The correct solution is always to transform branch lengths before doing a phylogenetic analysis.

I would not say "always". For example, one might be interested to know the effect of taxon sampling, or to repeat an analysis on various subclades. To be able to compare the analyses, it's best not to rescale the tree between resampling / subsetting strategies.

@arives
Copy link

arives commented Dec 4, 2019

Cecile,

Thanks for correcting me! I've always standardized branch lengths and then culled for alpha > 50; when doing lots of simulations, this is how I'd automate the identification of "lack of convergence" cases. I'll fix rr2.

Matthieu,

In case you are interested, here is code that illustrates why you should heed the warning in phyloglm:

n <- 500
phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1)
phy2 <- phy
phy2$edge.length <- 1000*phy2$edge.length

par(mfrow=c(1,2))
plot(phy, show.tip.label=F)
add.scale.bar()
plot(phy2, show.tip.label=F)
add.scale.bar()

X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1))/var(X1)
X = cbind(rep(1,n),X1)

sim.dat <- data.frame(Y=array(0, dim=n), X1=X1, row.names=phy$tip.label)
sim.dat$Y <- rbinTrait(n=1, phy=phy, beta=c(0,0.5), alpha=100, X=X)

Compare with different branch lengths

summary(phyloglm(Y ~ X1, phy=phy, data=sim.dat))
summary(phyloglm(Y ~ X1, phy=phy2, data=sim.dat))

Cheers, Tony

@arives
Copy link

arives commented Dec 4, 2019

Cecile,

I just looked at this a little more, and I get cases when phyloglm gives a warning but the converge code is 0. Specifically, from the code in my last comment, you can get warnings and pretty different fits, but still have convergence codes of 0.

Would it be worth changing adding a "warning" code to phyloglm objects, or giving a non-zero convergence code under the same conditions that triggers the warning?

Cheers, Tony

@lamho86
Copy link
Owner

lamho86 commented Dec 5, 2019 via email

@arives
Copy link

arives commented Dec 5, 2019 via email

@matthieu-haudiquet
Copy link
Author

Everyone,

Thank you Cecile, Anthony and Lam for the fruitful discussion. Especially I wasn't aware about the need to rescale the trees before such analysis.

Anthony thank you for the exemple, you are right

I agree that it would be very practical to have a flag when there is a warning about alpha being too close to its boundaries. Since I run a lot of phyloglm(), for now I just captured the warnings in R to flag it afterwards, but it would be nice if there was a flag so that I could just include a simple test in my functions. Would it be a good approach to increase the log.alpha.bound parameter until this warning (eventually) disappears ?

Best,
Matthieu

@arives
Copy link

arives commented Dec 5, 2019 via email

@matthieu-haudiquet
Copy link
Author

Ok, thanks !

@lamho86
Copy link
Owner

lamho86 commented Dec 8, 2019 via email

@arives
Copy link

arives commented Dec 8, 2019 via email

@matthieu-haudiquet
Copy link
Author

thanks a lot !
Matt

@arives
Copy link

arives commented Jun 17, 2020

Lam,

Just a quick question. I'm about to update rr2 on CRAN, and it would be nice to use your updates with alphaWarn handling. Do you plan to update your CRAN version in the near future?

Thanks, Tony

@lamho86
Copy link
Owner

lamho86 commented Jun 17, 2020

Hello Tony,

It has been a while since we last updated the CRAN version. I will push the current version to CRAN today.

@arives
Copy link

arives commented Jun 17, 2020 via email

@lamho86 lamho86 closed this as completed Aug 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants