0 branch lengths in Gray et al 2009 MCCT-tree #1

Closed
HedvigS opened this issue May 17, 2021 · 5 comments

@HedvigS
Owner

HedvigS commented May 17, 2021

I have a problem when running `phytools::make.simmap()` on the Gray et al. (2009) MCCT tree. `make.simmap()` throws this error:

```
Error in 1:nrow(L) : argument of length 0
```

For full clarity, I'm running:

```r
phytools::make.simmap(
  model = "ARD",
  pi = "estimated",
  method = "optim")
```

I am fairly confident I've diagnosed the issue: the tree contains branches of length zero. These zero-length branches are already present in the MCCT tree itself; they're not the result of pruning or any other manipulation of the tree.
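A minimal base-R sketch of that diagnosis, assuming the MCCT tree has been read into an ape `phylo` object (e.g. with `ape::read.nexus()`). The helper `zero_branch_edges()` and the toy tree are hypothetical, for illustration only; the helper lists the parent/child node numbers of every edge whose length is exactly zero.

```r
# Hypothetical helper: return the rows of tree$edge (parent, child node
# numbers) whose branch length is exactly zero. It only touches the
# $edge and $edge.length fields of an ape "phylo" object.
zero_branch_edges <- function(tree) {
  if (is.null(tree$edge.length)) stop("tree has no branch lengths")
  tree$edge[tree$edge.length == 0, , drop = FALSE]
}

# Toy tree ((A:1,B:0):2,C:3); built by hand in ape's node numbering
# (tips 1-3, root 4, internal node 5) so the example needs no packages.
toy <- list(
  edge = matrix(c(4L, 5L,
                  5L, 1L,
                  5L, 2L,
                  4L, 3L), ncol = 2, byrow = TRUE),
  edge.length = c(2, 1, 0, 3),
  tip.label = c("A", "B", "C"),
  Nnode = 2L
)
class(toy) <- "phylo"

zero_branch_edges(toy)        # one row: parent 5, child 2 (the branch to tip B)
nrow(zero_branch_edges(toy))  # number of zero-length branches
```

On the real tree, `nrow(zero_branch_edges(mcct))` gives the count, where `mcct` is a placeholder name for the loaded MCCT tree.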

In this project, I'm doing ASR with parsimony (`castor::asr_max_parsimony()`, script here), ML (`corHMM::corHMM()`, script here) and SCM (`phytools::make.simmap()`, script here). The zero-length branches only stop the analysis with `make.simmap()`; the other two functions accept the tree without complaint. Still, I should probably be concerned, since those zero-length branches are present in every analysis that uses this tree.

Now, I know of three ways of solving this:

  1. `compute.brlen()`
  2. replacing the zero branch lengths with something tiny but non-zero (for example: `tree$edge.length[tree$edge.length == 0] <- max(nodeHeights(tree)) * 1e-6`)
  3. sampling trees from the posterior
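Option 2 can be sketched without phytools, assuming the tree is an ape `phylo` object. `node_depths()` and `replace_zero_branches()` are hypothetical helpers; `max(node_depths(tree))` plays the role of `max(nodeHeights(tree))` in option 2, i.e. the height of the tree.

```r
# Root-to-node distances for an ape "phylo" object, computed from
# $edge and $edge.length only (a base-R stand-in for phytools::nodeHeights()).
node_depths <- function(tree) {
  n_nodes <- length(tree$tip.label) + tree$Nnode
  depth <- rep(NA_real_, n_nodes)
  root <- setdiff(tree$edge[, 1], tree$edge[, 2])  # the only parent that is never a child
  depth[root] <- 0
  # Propagate depths from parents to children; works whatever order the edges are in.
  while (anyNA(depth)) {
    ready <- !is.na(depth[tree$edge[, 1]]) & is.na(depth[tree$edge[, 2]])
    if (!any(ready)) stop("edge matrix does not describe a single rooted tree")
    depth[tree$edge[ready, 2]] <- depth[tree$edge[ready, 1]] + tree$edge.length[ready]
  }
  depth
}

# Option (2): replace exact zeros with a fraction `rel` of the tree height.
replace_zero_branches <- function(tree, rel = 1e-6) {
  eps <- max(node_depths(tree)) * rel
  tree$edge.length[tree$edge.length == 0] <- eps
  tree
}

# Toy tree ((A:1,B:0):2,C:3) in ape's node numbering (tips 1-3, root 4, node 5).
toy <- list(edge = matrix(c(4L, 5L, 5L, 1L, 5L, 2L, 4L, 3L), ncol = 2, byrow = TRUE),
            edge.length = c(2, 1, 0, 3), tip.label = c("A", "B", "C"), Nnode = 2L)
class(toy) <- "phylo"
fixed <- replace_zero_branches(toy)
fixed$edge.length  # 2, 1, 3e-06, 3 (the tree height is 3)
```

Scaling by tree height keeps the replacement small relative to the whole tree; it does not remove the concern about very short branches in SCM.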

I don't want to do (1) because I don't want Grafen branch lengths when I have actual branch lengths to work with. That option is scrapped right away.

The choice is between (2) and (3). I checked all the posterior trees, and as far as I can tell none of them have zero-length branches. But I'm not 100% confident in the way I checked them, so if someone else has a way of checking them and finds any zero-length branches, holler.
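One way to do the check, sketched in base R under the assumption that the posterior has been read into an ape `multiPhylo` object (for example with `ape::read.nexus()`) or a plain list of `phylo` objects. `zero_branch_counts()` and its `tol` argument are hypothetical; a small positive tolerance also catches branches that are not exactly zero but effectively are.

```r
# Count branches with length <= tol in every tree of a posterior sample.
# Returns one integer per tree (NA if a tree has no branch lengths).
zero_branch_counts <- function(trees, tol = 0) {
  vapply(seq_along(trees), function(i) {
    el <- trees[[i]]$edge.length
    if (is.null(el)) NA_integer_ else sum(el <= tol)
  }, integer(1))
}

# Toy "posterior" of two trees with the topology ((A,B),C).
make_toy <- function(lens) {
  tr <- list(edge = matrix(c(4L, 5L, 5L, 1L, 5L, 2L, 4L, 3L), ncol = 2, byrow = TRUE),
             edge.length = lens, tip.label = c("A", "B", "C"), Nnode = 2L)
  class(tr) <- "phylo"
  tr
}
posterior <- list(make_toy(c(2, 1, 0.5, 3)), make_toy(c(2, 1, 0, 3)))
counts <- zero_branch_counts(posterior)
counts           # 0 1
sum(counts > 0)  # number of trees with at least one zero-length branch
```

If `sum(counts > 0)` is 0 on the real posterior, no tree has branches at or below that tolerance.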

The argument against (3) is mainly that this project is aimed at being understood by traditional linguists, and I think they would struggle a bit with the idea that I'm randomly drawing a tree from a posterior each time I run the ASR and compare it to their findings.

The argument against (2) is that even teeny-tiny branch lengths do weird things in SCM: modelling change along such extremely short branches gives odd results.
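One way to see the SCM concern, assuming the standard continuous-time Markov (Mk) model used for stochastic mapping: along a branch of length $t$ with rate matrix $Q$, the transition probabilities are a matrix exponential, and for small $t$ they are close to the identity.

```math
P(t) = e^{Qt} = I + Qt + O(t^2), \qquad P_{ij}(t) \approx q_{ij}\,t \quad (i \neq j)
```

So on a branch of, say, $10^{-6}$ times the tree height, the probability of any state change is on the order of $q_{ij}$ times that tiny length. Stochastic maps will then almost never place a change on such a branch, which effectively forces parent and child to share a state and tends to push changes onto neighbouring branches.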

I'm leaning towards doing (3). The question then is when to randomly sample:

a) every time the ASR is run on each feature (once per feature + method combination, i.e. 201 × 3 = 603 times)
b) once for all ASR on all features per method (once for parsimony, once for ML and once for SCM, i.e. 3 times)
c) once for all ASR on all features with all methods (once in total)
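The three sampling schedules can be written down explicitly. This sketch assumes trees are referred to by their index in the posterior sample; `sample_tree_schedule()`, the feature names and the posterior size of 1000 are hypothetical placeholders. Fixing the seed makes whichever draw is used reproducible and reportable.

```r
# Assign a posterior tree index to every feature x method combination.
# scheme "a": independent draw per combination
# scheme "b": one draw per method, shared by all features
# scheme "c": a single draw shared by everything
sample_tree_schedule <- function(n_trees, features, methods,
                                 scheme = c("a", "b", "c"), seed = 1) {
  scheme <- match.arg(scheme)
  set.seed(seed)  # fixed seed so the draws can be reproduced
  combos <- expand.grid(feature = features, method = methods,
                        stringsAsFactors = FALSE)
  combos$tree_index <- switch(scheme,
    a = sample.int(n_trees, nrow(combos), replace = TRUE),
    b = sample.int(n_trees, length(methods), replace = TRUE)[match(combos$method, methods)],
    c = rep(sample.int(n_trees, 1), nrow(combos))
  )
  combos
}

sched <- sample_tree_schedule(1000, paste0("F", 1:201),
                              c("parsimony", "ML", "SCM"), scheme = "a")
nrow(sched)  # 603
```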

(a) seems the methodologically soundest to me.

Advice @king-ben @SimonGreenhill?

@HedvigS HedvigS changed the title from "0 branch lengths in Gray et al 2009-tree" to "0 branch lengths in Gray et al 2009 MCCT-tree" on May 17, 2021
@king-ben
Collaborator

Could you try recalculating the MCCT tree? I'm confused why there would be zero-length branches if there are none in the posterior samples.

@SimonGreenhill
Collaborator

The 0 branches are an artefact of the dating method used (r8s) which rounded small branches below a threshold to zero. I only found this a few years ago. I think that I made the MCC tree from the undated posterior distribution (where branches were proportional to change) and then ran r8s on it. The dated posterior should not have the same issue. So, I'd go with option 3.

@HedvigS
Owner Author

HedvigS commented May 18, 2021

Thanks @SimonGreenhill . (3) and then the (a) option?

@SimonGreenhill
Collaborator

(d) do it on all trees to account for phylogenetic uncertainty.
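Option (d) amounts to running each ASR once per posterior tree and then summarising across trees. A minimal sketch, where `run_asr` is a hypothetical wrapper (not part of castor, corHMM or phytools) that takes one tree and returns a named vector of state probabilities at the node of interest; averaging those vectors is one simple summary, not the only reasonable one.

```r
# Apply a user-supplied ASR function to every tree in a posterior sample.
# `run_asr` is a hypothetical wrapper around e.g. castor, corHMM or phytools
# that returns a named numeric vector of state probabilities for one node.
asr_over_posterior <- function(trees, run_asr, ...) {
  lapply(seq_along(trees), function(i) run_asr(trees[[i]], ...))
}

# Average per-tree probability vectors (all with the same state names).
average_state_probs <- function(per_tree) {
  Reduce(`+`, per_tree) / length(per_tree)
}

# Toy stand-ins: the "trees" are just numbers and toy_asr returns fixed vectors.
toy_trees <- list(1, 2)
toy_asr <- function(tree) if (tree == 1) c(A = 0.2, B = 0.8) else c(A = 0.6, B = 0.4)
res <- asr_over_posterior(toy_trees, toy_asr)
average_state_probs(res)  # A = 0.4, B = 0.6
```

On a Unix-like machine or cluster node, `parallel::mclapply()` can stand in for `lapply()` to spread the trees over several cores.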

@HedvigS
Owner Author

HedvigS commented May 18, 2021

Makes sense.

Adding "get access to computer cluster" to my to-do list.

@HedvigS HedvigS closed this as completed May 18, 2021

3 participants