Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resulting xml cannot estimate clock rate in a uniform distribution prior #135

Closed
pedrotaucce opened this issue Feb 9, 2023 · 10 comments
Closed

Comments

@pedrotaucce
Copy link

Describe the bug
I am using the package to run beast in several mitochondrial alignments, it is being really helpful, thanks!

I don't know if I am doing something wrong, but I cannot estimate clock rate using an uniform prior. The xml file produced has 'estimate="false"' on it.

To Reproduce
Script to reproduce the behavior:

fasta_filename <- get_babette_path("anthus_aco_sub.fas")
mrca.taxa <- get_taxa_names(fasta_filename)
mrca.taxa <- mrca.taxa[2:length(mrca.taxa)]
mrca.prior <- create_mrca_prior(taxa_names=mrca.taxa,is_monophyletic = T)
clock.rate <- beautier::create_clock_rate_param(value = "0.0035",estimate=TRUE)
clock.uniform<-beautier::create_uniform_distr(value = 0.0035,lower = 0.00277, upper = 0.00542)

inference_model <- create_inference_model(
  site_model = beautier::create_hky_site_model(),
  clock_model = beautier::create_strict_clock_model(id = NA,clock.rate,clock.uniform),
  tree_prior = create_yule_tree_prior(),
  mrca_prior = mrca.prior,
  beauti_options = beautier::create_beauti_options_v2_6()
)

create_beast2_input_file_from_model(
  input_filename = fasta_filename,
  output_filename = "beast2_test.xml",
  inference_model= inference_model
)

Expected behavior
I expected the resulting xml file to make possible to estimate the clock rate based on the uniform prior interval

Screenshots
If applicable, add screenshots to help explain your problem.

Environment:
Show the results of running the following script:

utils::sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ape_5.6-2       babette_2.3.2   tracerer_2.2.2  mauricer_2.5.2  beastier_2.4.11
[6] beautier_2.6.5  shiny_1.7.4     lumier_1.0     

loaded via a namespace (and not attached):
 [1] sass_0.4.5                 pkgload_1.3.2              jsonlite_1.8.4            
 [4] assertive.sets_0.0-3       bslib_0.4.2                brio_1.1.3                
 [7] assertive.data_0.0-3       remotes_2.4.2              sessioninfo_1.2.2         
[10] pillar_1.8.1               lattice_0.20-45            glue_1.6.2                
[13] assertive.data.uk_0.0-2    assertive.matrices_0.0-2   digest_0.6.31             
[16] assertive.types_0.0-3      promises_1.2.0.1           testit_0.13               
[19] htmltools_0.5.4            httpuv_1.6.8               devtools_2.4.5            
[22] assertive.data.us_0.0-2    assertive.properties_0.0-5 assertive.reflection_0.0-5
[25] purrr_1.0.1                xtable_1.8-4               processx_3.8.0            
[28] later_1.3.0                assertive_0.3-6            beastierinstall_1.0       
[31] usethis_2.1.6              ellipsis_0.3.2             cachem_1.0.6              
[34] assertive.code_0.0-3       cli_3.6.0                  magrittr_2.0.3            
[37] crayon_1.5.2               assertive.strings_0.0-3    mime_0.12                 
[40] memoise_2.0.1              ps_1.7.2                   fs_1.6.1                  
[43] fansi_1.0.4                assertive.numbers_0.0-2    nlme_3.1-160              
[46] MASS_7.3-58.1              pkgbuild_1.4.0             profvis_0.3.7             
[49] tools_4.2.2                prettyunits_1.1.1          formatR_1.14              
[52] lifecycle_1.0.3            assertive.files_0.0-2      stringr_1.5.0             
[55] callr_3.7.3                ade4_1.7-22                compiler_4.2.2            
[58] jquerylib_0.1.4            rlang_1.0.6                grid_4.2.2                
[61] rstudioapi_0.14            rappdirs_0.3.3             htmlwidgets_1.6.1         
[64] assertive.models_0.0-2     assertive.base_0.0-9       miniUI_0.1.1.1            
[67] testthat_3.1.6             waldo_0.4.0                codetools_0.2-18          
[70] curl_5.0.0                 assertive.datetimes_0.0-3  R6_2.5.1                  
[73] knitr_1.42                 fastmap_1.1.0              seqinr_4.2-23             
[76] utf8_1.2.3                 rprojroot_2.0.3            desc_1.4.2                
[79] stringi_1.7.12             parallel_4.2.2             Rcpp_1.0.10               
[82] vctrs_0.5.2                xfun_0.37                  urlchecker_1.0.1          
@richelbilderbeek
Copy link
Member

Hi @pedrotaucce, thanks for this excellent bug report (and sorry for replying so late). I will take a look at it now :-)

@richelbilderbeek
Copy link
Member

@pedrotaucce will work on this tomorrow :-)

@pedrotaucce
Copy link
Author

Thanks! I really appreciate that. I have more than 40 alignments and will have to different beast runs. Your package is being a lifesaver!

@richelbilderbeek
Copy link
Member

Plan:

  • Recreate XML in BEAUti: maybe it cannot be done at all
  • If it can be done: compare XML to babette XML

@richelbilderbeek
Copy link
Member

richelbilderbeek commented Feb 15, 2023

Recreate XML:

  • Install BEAST2 by doing beastierinstall::install_beast2()
  • Loaded the alignments

Screenshot from 2023-02-15 11-57-29

  • Change the clock rate

Screenshot from 2023-02-15 12-04-45

Aha, silly me: a strict clock is always normalized to 1.0 🤦

richelbilderbeek pushed a commit to ropensci/babette that referenced this issue Feb 15, 2023
@richelbilderbeek
Copy link
Member

richelbilderbeek commented Feb 15, 2023

For completeness, this is the XML generated by BEAUti:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><beast beautitemplate='Standard' beautistatus='' namespace="beast.core:beast.evolution.alignment:beast.evolution.tree.coalescent:beast.core.util:beast.evolution.nuc:beast.evolution.operators:beast.evolution.sitemodel:beast.evolution.substitutionmodel:beast.evolution.likelihood" required="" version="2.6">
    <data
id="anthus_aco_sub"
spec="Alignment"
name="alignment">
        <sequence id="seq_61430_aco" spec="Sequence" taxon="61430_aco" totalcount="4" value="ACAGGTTAGAAACTACTCTGTTTTCTGGCTCCTTGTTTAATGCCCTGTCCTATTTTATTGCGAAAATTGTCTGTTTTT"/>
        <sequence id="seq_626029_aco" spec="Sequence" taxon="626029_aco" totalcount="4" value="ACAGGTTAGAAACTACTCTGTTTTCTGGCTGCTTGTTTAATGCCCTCTCCTATTTTATTGTGACGATTGTCTGTTTTT"/>
        <sequence id="seq_630116_aco" spec="Sequence" taxon="630116_aco" totalcount="4" value="ACAGGTTAGAAACTACTCTGTTTTCTGGCTCCTTGTTTAATGCCCTGTCCTATTTTATTGCGAAAATTGTCTGTTTTT"/>
        <sequence id="seq_630210_aco" spec="Sequence" taxon="630210_aco" totalcount="4" value="ACAGGTTAGAAACTACTCTGTTTTCTGGCTCCTTGTTTAATGCCCTGTCCTATTTTATTGTGACAATTGTCTGTTTTT"/>
        <sequence id="seq_B25702_aco" spec="Sequence" taxon="B25702_aco" totalcount="4" value="ACAGGTTAGAAACTACTCTGTTTTCTGGCTCCTTGTTTAATGCCCTGTCCTATTTTATTGTGACAATTGTCTGTTTTT"/>
    </data>

    


    <map name="Uniform" >beast.math.distributions.Uniform</map>
    <map name="Exponential" >beast.math.distributions.Exponential</map>
    <map name="LogNormal" >beast.math.distributions.LogNormalDistributionModel</map>
    <map name="Normal" >beast.math.distributions.Normal</map>
    <map name="Beta" >beast.math.distributions.Beta</map>
    <map name="Gamma" >beast.math.distributions.Gamma</map>
    <map name="LaplaceDistribution" >beast.math.distributions.LaplaceDistribution</map>
    <map name="prior" >beast.math.distributions.Prior</map>
    <map name="InverseGamma" >beast.math.distributions.InverseGamma</map>
    <map name="OneOnX" >beast.math.distributions.OneOnX</map>
    <run id="mcmc" spec="MCMC" chainLength="10000000">
        <state id="state" spec="State" storeEvery="5000">
            <tree id="Tree.t:anthus_aco_sub" spec="beast.evolution.tree.Tree" name="stateNode">
                <taxonset id="TaxonSet.anthus_aco_sub" spec="TaxonSet">
                    <alignment idref="anthus_aco_sub"/>
                </taxonset>
            </tree>
            <parameter id="birthRate.t:anthus_aco_sub" spec="parameter.RealParameter" name="stateNode">1.0</parameter>
        </state>
        <init id="RandomTree.t:anthus_aco_sub" spec="beast.evolution.tree.RandomTree" estimate="false" initial="@Tree.t:anthus_aco_sub" taxa="@anthus_aco_sub">
            <populationModel id="ConstantPopulation0.t:anthus_aco_sub" spec="ConstantPopulation">
                <parameter id="randomPopSize.t:anthus_aco_sub" spec="parameter.RealParameter" name="popSize">1.0</parameter>
            </populationModel>
        </init>
        <distribution id="posterior" spec="util.CompoundDistribution">
            <distribution id="prior" spec="util.CompoundDistribution">
                <distribution id="YuleModel.t:anthus_aco_sub" spec="beast.evolution.speciation.YuleModel" birthDiffRate="@birthRate.t:anthus_aco_sub" tree="@Tree.t:anthus_aco_sub"/>
                <prior id="YuleBirthRatePrior.t:anthus_aco_sub" name="distribution" x="@birthRate.t:anthus_aco_sub">
                    <Uniform id="Uniform.1" name="distr" upper="Infinity"/>
                </prior>
            </distribution>
            <distribution id="likelihood" spec="util.CompoundDistribution" useThreads="true">
                <distribution id="treeLikelihood.anthus_aco_sub" spec="ThreadedTreeLikelihood" data="@anthus_aco_sub" tree="@Tree.t:anthus_aco_sub">
                    <siteModel id="SiteModel.s:anthus_aco_sub" spec="SiteModel">
                        <parameter id="mutationRate.s:anthus_aco_sub" spec="parameter.RealParameter" estimate="false" name="mutationRate">1.0</parameter>
                        <parameter id="gammaShape.s:anthus_aco_sub" spec="parameter.RealParameter" estimate="false" name="shape">1.0</parameter>
                        <parameter id="proportionInvariant.s:anthus_aco_sub" spec="parameter.RealParameter" estimate="false" lower="0.0" name="proportionInvariant" upper="1.0">0.0</parameter>
                        <substModel id="JC69.s:anthus_aco_sub" spec="JukesCantor"/>
                    </siteModel>
                    <branchRateModel id="StrictClock.c:anthus_aco_sub" spec="beast.evolution.branchratemodel.StrictClockModel">
                        <parameter id="clockRate.c:anthus_aco_sub" spec="parameter.RealParameter" estimate="false" lower="0.1" name="clock.rate" upper="0.3">0.2</parameter>
                    </branchRateModel>
                </distribution>
            </distribution>
        </distribution>
        <operator id="YuleBirthRateScaler.t:anthus_aco_sub" spec="ScaleOperator" parameter="@birthRate.t:anthus_aco_sub" weight="3.0"/>
        <operator id="YuleModelTreeScaler.t:anthus_aco_sub" spec="ScaleOperator" scaleFactor="0.5" tree="@Tree.t:anthus_aco_sub" weight="3.0"/>
        <operator id="YuleModelTreeRootScaler.t:anthus_aco_sub" spec="ScaleOperator" rootOnly="true" scaleFactor="0.5" tree="@Tree.t:anthus_aco_sub" weight="3.0"/>
        <operator id="YuleModelUniformOperator.t:anthus_aco_sub" spec="Uniform" tree="@Tree.t:anthus_aco_sub" weight="30.0"/>
        <operator id="YuleModelSubtreeSlide.t:anthus_aco_sub" spec="SubtreeSlide" tree="@Tree.t:anthus_aco_sub" weight="15.0"/>
        <operator id="YuleModelNarrow.t:anthus_aco_sub" spec="Exchange" tree="@Tree.t:anthus_aco_sub" weight="15.0"/>
        <operator id="YuleModelWide.t:anthus_aco_sub" spec="Exchange" isNarrow="false" tree="@Tree.t:anthus_aco_sub" weight="3.0"/>
        <operator id="YuleModelWilsonBalding.t:anthus_aco_sub" spec="WilsonBalding" tree="@Tree.t:anthus_aco_sub" weight="3.0"/>
        <logger id="tracelog" spec="Logger" fileName="anthus_aco_sub.log" logEvery="1000" model="@posterior" sanitiseHeaders="true" sort="smart">
            <log idref="posterior"/>
            <log idref="likelihood"/>
            <log idref="prior"/>
            <log idref="treeLikelihood.anthus_aco_sub"/>
            <log id="TreeHeight.t:anthus_aco_sub" spec="beast.evolution.tree.TreeHeightLogger" tree="@Tree.t:anthus_aco_sub"/>
            <log idref="YuleModel.t:anthus_aco_sub"/>
            <log idref="birthRate.t:anthus_aco_sub"/>
        </logger>
        <logger id="screenlog" spec="Logger" logEvery="1000">
            <log idref="posterior"/>
            <log idref="likelihood"/>
            <log idref="prior"/>
        </logger>
        <logger id="treelog.t:anthus_aco_sub" spec="Logger" fileName="$(tree).trees" logEvery="1000" mode="tree">
            <log id="TreeWithMetaDataLogger.t:anthus_aco_sub" spec="beast.evolution.tree.TreeWithMetaDataLogger" tree="@Tree.t:anthus_aco_sub"/>
        </logger>
        <operatorschedule id="OperatorSchedule" spec="OperatorSchedule"/>
    </run>
</beast>

Take special note at ...

                    <branchRateModel id="StrictClock.c:anthus_aco_sub" spec="beast.evolution.branchratemodel.StrictClockModel">
                        <parameter id="clockRate.c:anthus_aco_sub" spec="parameter.RealParameter" estimate="false" lower="0.1" name="clock.rate" upper="0.3">0.2</parameter>
                    </branchRateModel>

... especially the estimate="false".

This proves -in a way- that time is normalized to 1.0

@richelbilderbeek
Copy link
Member

Here I write down the solution to this Issue:

Dear user,

Thanks for this excellent bug report. It is clear the problem is (1) you want to estimate a clock rate, (2) the output of BEAST2 does not show an estimated clock rate.

This is not a bug in nor babette/beautier, nor BEAST2: it is a feature: BEAST2 normalizes time from time 0 (at the root) to time 1 (the present). This causes -I forgot the details!- that estimating a clock rate is nonsense; it simple is. To find out a more detailed answer, read the BEAST2 book or ask the BEAST2 people; you can use the pictures and XML file above.

I predict you want to, however, get an actual clock rate. The way to do this is simple: if you know the phylogeny is -say- 15 million, then one can multiply/divide things by 15.

Another way is to take a known timespan within the phylogeny and rescale based on that:

       +--- A
   +---+
---+   +--- B
   +------- C

Total phylogeny time: 1.0 (as normalized by BEAST2)
Known time between A and B: 10 million years
Concluded time: Total phylogeny is 20 million years

I hope this answer helps you and future users!

I'll close this Issue if you agree or after a month :-)

@richelbilderbeek
Copy link
Member

richelbilderbeek commented Mar 6, 2023

Some emails later we've zoomed in on the Issue, with again an awesome bug by @pedrotaucce:

[...] I wrote to you at first because I believe beautier and BEAUTi are resulting in different .xml files, if I am not mistaken.

Earlier I misunderstood this. Thanks Pedro for correcting me

I have built 4 xml files using BEAUTi 2.6.7 and beautier (in R 4.2.2) with different priors:

  1. No mrca prior and a single value at clock rate (0.00277)
  2. No mrca prior, estimating clock rate from a uniform prior (starting value: 0.0035, lower: 0.0027, upper: 0.00542)
  3. Mrca prior with monophyletic constrain and a single value at clock rate (0.00277)
  4. Mrca prior with monophyletic constrain and a uniform prior (starting value: 0.0035, lower: 0.0027, upper: 0.00542)

BEAUTi and beautier yielded the same xml file in number 1, but differed in the other three scenarios.

In 2, beautier xml file estimated the clock rate with a starting value of 0.0035, but without the specified uniform prior.

In 3 and 4, the clock rate was 1.0 and estimate = false in beautier xml file.

Here the code and BEAUTi printscreens for each scenario:

  1. beautier:
#without mrca prior, single value at clock rate
fasta_filename <- get_babette_path("anthus_aco_sub.fas")
clock.rate <- beautier::create_clock_rate_param(value = 0.00277,estimate=FALSE)

inference_model <- create_inference_model(
  site_model = beautier::create_hky_site_model(),
  clock_model = beautier::create_strict_clock_model(id = NA,clock.rate),
  tree_prior = create_yule_tree_prior(),
  beauti_options = beautier::create_beauti_options_v2_6()
)

create_beast2_input_file_from_model(
  input_filename = fasta_filename,
  output_filename = "no_mrca_no_estimate_beautier.xml",
  inference_model= inference_model
)

BEAUTi:

image

  1. beautier:
#134 without mrca prior, estimating clock rate from a uniform prior
fasta_filename <- get_babette_path("anthus_aco_sub.fas")
clock.rate <- beautier::create_clock_rate_param(value = "0.0035",estimate=TRUE)
clock.uniform<-beautier::create_uniform_distr(value = 0.0035,lower = 0.00277, upper = 0.00542)

inference_model <- create_inference_model(
  site_model = beautier::create_hky_site_model(),
  clock_model = beautier::create_strict_clock_model(id = NA,clock.rate, clock.uniform),
  tree_prior = create_yule_tree_prior(),
  beauti_options = beautier::create_beauti_options_v2_6()
)

create_beast2_input_file_from_model(
  input_filename = fasta_filename,
  output_filename = "no_mrca_estimate_beautier.xml",
  inference_model= inference_model
)

BEAUTi:

image

  1. beautier:
#With mrca prior, single value at clock rate
fasta_filename <- get_babette_path("anthus_aco_sub.fas")
mrca.taxa <- get_taxa_names(fasta_filename)
mrca.taxa <- mrca.taxa[2:length(mrca.taxa)]
mrca.prior <- create_mrca_prior(taxa_names=mrca.taxa,is_monophyletic = T)
clock.rate <- beautier::create_clock_rate_param(value = 0.00277,estimate=FALSE)

inference_model <- create_inference_model(
  site_model = beautier::create_hky_site_model(),
  clock_model = beautier::create_strict_clock_model(id = NA,clock.rate),
  tree_prior = create_yule_tree_prior(),
  mrca_prior = mrca.prior,
  beauti_options = beautier::create_beauti_options_v2_6()
)

create_beast2_input_file_from_model(
  input_filename = fasta_filename,
  output_filename = "mrca_no_estimate_beautier.xml",
  inference_model= inference_model
)

BEAUTi: same as 1, but with MRCA prior

image

image

  1. beautier:
#With mrca prior, estimating clock rate from a uniform prior

fasta_filename <- get_babette_path("anthus_aco_sub.fas")
mrca.taxa <- get_taxa_names(fasta_filename)
mrca.taxa <- mrca.taxa[2:length(mrca.taxa)]
mrca.prior <- create_mrca_prior(taxa_names=mrca.taxa,is_monophyletic = T)
clock.rate <- beautier::create_clock_rate_param(value = "0.0035",estimate=TRUE)
clock.uniform<-beautier::create_uniform_distr(value = 0.0035,lower = 0.00277, upper = 0.00542)

inference_model <- create_inference_model(
  site_model = beautier::create_hky_site_model(),
  clock_model = beautier::create_strict_clock_model(id = NA,clock.rate,clock.uniform),
  tree_prior = create_yule_tree_prior(),
  mrca_prior = mrca.prior,
  beauti_options = beautier::create_beauti_options_v2_6()
)

create_beast2_input_file_from_model(
  input_filename = fasta_filename,
  output_filename = "mrca_estimate_beautier.xml",
  inference_model= inference_model
)

BEAUTi: the same as number 2, but with MRCA prior just like number 3.

I ran all the xml files, and numbers 2 and 4 did estimate the clock rate according to the prior:

image

issue_135.zip
image

Also, on page 107 of the BEAST2 book, the authors talk about clock rate priors and apparently there is nothing wrong about it and BEAST2 can deal with that, although I am not using the best prior and a lognormal distribution would be probably better.

I am attaching all the xml files and their names are what I expected them to be. I hope I am not being too annoying and I am sorry if I am wrong or making mistakes regarding BEAUTi and/or beautier. [...]

Nope, please keep sending these awesome bug reports!

@richelbilderbeek
Copy link
Member

richelbilderbeek commented Mar 6, 2023

I'll first concentrate on 2.

The error is quite clear: this is what beautier gives:

<branchRateModel id="StrictClock.c:anthus_aco_sub" spec="beast.evolution.branchratemodel.StrictClockModel">
    <parameter id="clockRate.c:anthus_aco_sub" spec="parameter.RealParameter" estimate="true" name="clock.rate">0.0035</parameter>
</branchRateModel>

This is what BEAUti gives:

<branchRateModel id="StrictClock.c:anthus_aco_sub" spec="beast.evolution.branchratemodel.StrictClockModel" clock.rate="@clockRate.c:anthus_aco_sub"/>

Note the lack of estimate="true" in the latter, which is the BEAUti (hence: the truth) text :-/ ?

I will reproduce the BEAUti file with the beautier-friendly BEAUti, i.e. the one installed by beastierinstall::install_beast2.

Problem is that setting 1 already fails, so fix that one first :-)

@richelbilderbeek
Copy link
Member

Here I see a first problem:

Screenshot from 2023-03-06 13-03-49

richelbilderbeek pushed a commit that referenced this issue Mar 6, 2023
richelbilderbeek pushed a commit that referenced this issue Mar 6, 2023
richelbilderbeek pushed a commit that referenced this issue Mar 9, 2023
richelbilderbeek pushed a commit that referenced this issue Mar 9, 2023
richelbilderbeek pushed a commit that referenced this issue Mar 9, 2023
richelbilderbeek pushed a commit that referenced this issue Mar 9, 2023
@richelbilderbeek richelbilderbeek changed the title Resulting xml cannot estimate clock rate Resulting xml cannot estimate clock rate in a uniform distribution prior Apr 5, 2023
richelbilderbeek pushed a commit that referenced this issue Apr 5, 2023
richelbilderbeek pushed a commit that referenced this issue Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants