Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MEDUSA parameter estimation (?) error #13

Closed
BenjaminBlanchard opened this issue Jul 29, 2015 · 27 comments
Closed

MEDUSA parameter estimation (?) error #13

BenjaminBlanchard opened this issue Jul 29, 2015 · 27 comments

Comments

@BenjaminBlanchard
Copy link

I'm attempting to run a MEDUSA analysis in geiger - the analysis initially runs along fine, and estimates the number of rate shifts. But then when it comes to the end (following no more significant increases in the aicc score), it produces this error:

Error in uniroot(thresholdR, lower = low.bound, upper = par1) :
f.lower = f(lower) is NA

After looking at the package code, it seems that there's something strange going on with arriving at a confidence interval for parameter "r", but I have no idea what. Also, the issue goes away when I reduce the species number in the richness file, although there are still several NA's in the parameter estimation output. Any idea what might be causing this issue?

@josephwb
Copy link
Collaborator

I'll look into this. The MEDUSA code is in between versions; updating should (hopefully) fix things.

@BenjaminBlanchard
Copy link
Author

Great - thanks! Any estimate on when the new version will be available?

@josephwb
Copy link
Collaborator

@BenjaminBlanchard I apologize for the difficulty; entirely my fault. Unfortunately I am travelling at the moment. I definitely need to get the updates implemented. I'll update as I am able.

@BenjaminBlanchard
Copy link
Author

@josephwb Just wondering, any rough guess on when the update will be completed? I have the most recent version, so if the update has already been implemented, then I'm still having the same problem!

@josephwb
Copy link
Collaborator

@BenjaminBlanchard: apologies for the delay. I am currently overhauling the code.

Note that the NAs that you report are because a yule model was fit to that partition (yule does not have extinction, so epsilon is not estimated). You can force the model to be birth-death, if you like.

In terms of the crashing, it has something to do with the configuration of branch lengths and the diversities such that the optimizer gets stuck. It seems that the estimates of extinction (epsilon) are so small that trying to calculate confidence intervals runs into numerical precision problems. This may be solved by putting a reasonable lower bound on epsilon.

In the meantime, if it is of use I can skip the confidence interval calculations (if they are not of interest to you) and get you just the raw results. Is this something you'd want?

@BenjaminBlanchard
Copy link
Author

@josephwb I very much appreciate the response, and the description of what's going on - very helpful!

Actually, yes, having the raw results without confidence intervals would still be useful if it's not too much trouble! Thank you very much!

@josephwb
Copy link
Collaborator

@BenjaminBlanchard I've been pulling my hair out trying to figure out what is going on with your data. This morning I by accident opened your richness file, and find that 38 tips have richnesses of zero! MEDUSA does not have a check for this, so these zero-diversities go through to analysis and break everything.

To fix things, do:

richness <- richness[-which(richness[,2] == 0),];

and then run your analysis as normal. I've confirmed that this works for your data with bd, yule, and mixed models.

Apologies that the richnesses instructions are not clear. I will update the doc, and implement a check so that no one has to go through what you've gone through.

@BenjaminBlanchard
Copy link
Author

@josephwb Wow, first, I very much appreciate you spending time on this, and second, I apologize for such a simple mistake! I had thought that the richness file was supposed to have the number of additional taxa not included, but I assume, then, that it's actually supposed to be the total diversity, including the tip on the tree?

Again, many, many thanks for working through this, and sorry again for the mistake!

@BenjaminBlanchard
Copy link
Author

@josephwb Actually I see now that it's really clear that "if a tip does not represent a higher level taxon, its richness should be 1". I don't know why I missed that...

@BenjaminBlanchard
Copy link
Author

Actually...... I had sent you the wrong richness file before, and have emailed you the correct one (with no 0's in the file), because I am still getting the same problem even with no 0's, unfortunately.

@josephwb
Copy link
Collaborator

@BenjaminBlanchard Yes, the richnesses in the richness file should represent total diversity.

I've run the richness file you sent me, and everything seems to work fine. Maybe there is junk in your workspace? Maybe try quitting and redoing things.

@josephwb
Copy link
Collaborator

Nevermind, I can recreate the error.

@josephwb
Copy link
Collaborator

@BenjaminBlanchard Please try this:

install.packages("devtools");
library(devtools);
install_github("josephwb/turboMEDUSA/MEDUSA", dependencies=TRUE);

This will install a different version of MEDUSA (the version that will replace the code in the geiger repo). Then run as:

require(MEDUSA);
richness <- read.csv("Richness_Table.csv", header=TRUE);
tree <- read.nexus("Tree_WorkersOnly.nex");
medusa_ants.default <- MEDUSA(tree, richness=richness);

Does that work?

@josephwb
Copy link
Collaborator

@BenjaminBlanchard This is what I get:

> medusa_ants.default <- MEDUSA(tree, richness=richness);
Using AIC-threshold as analysis-terminating criterion.
Appropriate threshold for tree of 270 tips is: 7.002076.

Preparing data for analysis:
  Gathering descendant node information... done.
  Gathering tip richness information... done.
done.

Optimizing parameters for pendant edges... done.
Pre-calculating parameters for internal nodes... done.

Step 1: lnLik=-2290.987; AICc=4585.997; model=bd
Step 2: lnLik=-2263.917; AICc=4537.947; shift at node 328; model=bd; cut=stem; # shifts=1
Step 3: lnLik=-2246.463; AICc=4509.197; shift at node 379; model=bd; cut=stem; # shifts=2
Step 4: lnLik=-2224.974; AICc=4472.45; shift at node 478; model=bd; cut=stem; # shifts=3
Step 5: lnLik=-2211.969; AICc=4452.74; shift at node 441; model=bd; cut=node; # shifts=4
Step 6: lnLik=-2201.492; AICc=4436.027; shift at node 425; model=yule; cut=node; # shifts=5
Step 7: lnLik=-2191.156; AICc=4419.627; shift at node 321; model=yule; cut=node; # shifts=6
Step 8: lnLik=-2181.267; AICc=4404.157; shift at node 401; model=yule; cut=node; # shifts=7
Step 9: lnLik=-2171.455; AICc=4388.872; shift at node 49; model=yule; cut=stem; # shifts=8
Step 10: lnLik=-2162.385; AICc=4375.105; shift at node 485; model=yule; cut=node; # shifts=9
Step 11: lnLik=-2153.526; AICc=4361.795; shift at node 324; model=yule; cut=node; # shifts=10
Step 12: lnLik=-2144.933; AICc=4349.05; shift at node 291; model=yule; cut=stem; # shifts=11
Step 13: lnLik=-2137.135; AICc=4337.931; shift at node 66; model=yule; cut=stem; # shifts=12
Step 14: lnLik=-2129.239; AICc=4326.651; shift at node 155; model=yule; cut=stem; # shifts=13
Step 15: lnLik=-2121.715; AICc=4316.152; shift at node 384; model=yule; cut=node; # shifts=14
Step 16: lnLik=-2114.24; AICc=4305.787; shift at node 427; model=yule; cut=stem; # shifts=15
Step 17: lnLik=-2106.773; AICc=4295.474; shift at node 3; model=yule; cut=stem; # shifts=16
Step 18: lnLik=-2099.498; AICc=4285.582; shift at node 353; model=yule; cut=node; # shifts=17
Step 19: lnLik=-2092.787; AICc=4276.857; shift at node 512; model=yule; cut=stem; # shifts=18
Step 20: lnLik=-2086.579; AICc=4269.174; shift at node 373; model=yule; cut=stem; # shifts=19

No significant increase in aicc score. Disregarding subsequent piecewise models.

Summary of optimal MEDUSA model:
   Model.ID Shift.Node Cut.At Model Ln.Lik.part          r  epsilon
1         1        271   node    bd    -501.183 0.02828970 0.937018
2         2        328   stem    bd   -936.8519 0.05992600 0.746654
3         3        379   stem    bd   -92.70311 0.10316900 0.890933
4         4        478   stem    bd   -48.76239 0.06117870 0.978703
5         5        441   node    bd   -22.60155 0.00174674 0.999623
6         6        425   node  yule   -13.31896 0.16729600       NA
7         7        321   node  yule   -10.99961 0.22944800       NA
8         8        401   node  yule   -5.960211 0.00701145       NA
9         9         49   stem  yule   -6.725217 0.16092200       NA
10       10        485   node  yule   -10.06846 0.27070100       NA
11       11        324   node  yule   -6.137248 0.00587383       NA
12       12        291   stem  yule   -228.4913 0.07673940       NA
13       13         66   stem  yule    -6.91215 0.16631500       NA
14       14        155   stem  yule   -7.327043 0.14682600       NA
15       15        384   node  yule           0 0.00000000       NA
16       16        427   stem  yule   -99.52476 0.06852910       NA
17       17          3   stem  yule           0 0.00000000       NA
18       18        353   node  yule   -8.198065 0.01724200       NA
19       19        512   stem  yule   -62.88513 0.07235690       NA
20       20        373   stem  yule   -17.92887 0.21354500       NA

@BenjaminBlanchard
Copy link
Author

Huzzah!!! It works! Thank you so much, I really appreciate it a lot!

@BenjaminBlanchard
Copy link
Author

@josephwb Oh no.... now I'm having issues with plotting...

plot(medusa_ants.default)

Yields:

Error in plot.window(...) : need finite 'xlim' values In addition: Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf

@josephwb
Copy link
Collaborator

@BenjaminBlanchard Unfortunately all of the commands are off just enough to be thoroughly annoying. 😱

Try this:

summ <- medusaSummary(medusa_ants.default);

While medusaSummary plots the results, you may want to pass the results stored in "summ" above to the function plotPrettyTree for more flexible plotting. To see the options of this function, do:

?plotPrettyTree

For more commands, you can do:

?MEDUSA

@josephwb
Copy link
Collaborator

Here is what the straight-up medusaSummary gives:
summary

@BenjaminBlanchard
Copy link
Author

@josephwb Yet again, thank you very much! I'm glad it was such a quick fix for this one, ha.

@ghost
Copy link

ghost commented Mar 31, 2017

Hi,

Hoping to get an answer from either of you, so I pasted my question here too. :P

I'm preparing a table for species richness, but I have a little difficulty in understanding "All taxa missing from the tree have to be assigned to one of the tips in the richness matrix." in the manual.

For example, one clade on my tree is A family, which including 10 genera, but I only sampled species from 8 genera, but how could I prepare a table reflecting two missing genera?

I know I need to record:
Family A genera1 5 species
Family A ... ...
Family A genra8 6 species

So I want to know how to record richness of genera9, genera 10 in the richness list, and I dont have any of samples from genera9 and genera 10 in the tree. I dont know the relationship of genera 9 or genera 10 between the tips on the tree, how could I "assign genera9 or genera 10 to one of the tips in the richness matrix"?

Thanks!

@josephwb
Copy link
Collaborator

One simple way is to assign richness of the entire family to a single tip. Unfortunately you lose a lot of information that way (i.e. edges), but it is an honest presentation of the data.

Have these genera never been studied before? If they have, and you can make assumptions of monophyly, you can record their richness in with the closest sampled genus.

Failing that, you are kinda stuck, I think. Ideally if one wants to run this type of analysis then sampling should be performed accordingly. If you want to really make assumptions you can add the missing genera as tips in a polytomy with the other genera (MEDUSA will randomly resolve the polytomy).

Finally, if you are willing to assume that the missing taxa are randomly missing, you can run the analysis with only the sampled taxa you have. I'm not sure if I would believe the results that come out of this, but some people do it (because they have little other choice).

I hope this helps, and am sorry I cannot be more constructive! :bowtie:

@ghost
Copy link

ghost commented Mar 31, 2017

@josephwb Thanks for your quick responding! I have tried so hard that expanding my sampling to include more species, so it's hard choice to shrink the whole phylogeny into a family tree in order to save the richness information for those missing samples genera. Those genera not sampled for several reasons: 1, no molecular data (no data for my sampled genes); 2, maybe never studied; 3, only exist in literature or plant list.
I don't think I have a solid reference to add any of those miss genera to the phylogeny tree resolved from actually molecular data, even in a polytomy matter (the branch length for those missing genera could be biased, or misleading, right?).

Thanks for those options, I think I may run it with all the sampled I have (pretend it's complete sampling on genus level), see how the results look like.

@ghost
Copy link

ghost commented Mar 31, 2017

@josephwb I think the MEDUSA used in Tank et al., 2105 (Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications) is the same version here, right? I think the genera under each family in the richness list are not complete all of genera recorded? What do you think?

@josephwb
Copy link
Collaborator

Yup, same version.

@ghost
Copy link

ghost commented Apr 3, 2017

@josephwb One more question :)
So if I compress my phylogeny (species level) into genus level, each genus has to be monophyletic? What's your suggestion if some genera are not monophyletic? How to prepare the species richness table?

Thank you very much!

@josephwb
Copy link
Collaborator

josephwb commented Apr 3, 2017

You will need top focus on monophyletic lineages rather than genera (if the latter are not monophyletic). This will mean some of your tips will represent genera, some parts of genera, and some multiple genera. This can be quite tedious if your tree/taxonomy is large.

If you send me your tree & taxonomy I can try to lump things myself to see how differently we do things. I might even be able to come up with a general function that can do this for you (no promises!).

@ghost
Copy link

ghost commented Apr 3, 2017

@josephwb Ok, great! Thanks! I'm trying to finalize the topology, however, I have a phylogeny with around 20,000 species, so as you mentioned above, the perfect monophyly is hard to reach for all the lineages when you had a little bit denser sampling. I'll bother you again when I have my tree and richness table ready. Thank you so much Joseph!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants