# Phylogenetic Inference using PNUTS

This notebook shows how to perform phylogenetic inference with the PNUTS algorithm described in Wahle (2021) ([bioRxiv paper](https://doi.org/10.1101/2021.03.16.435623)).

First, load the `MCPhylo` and `Random` packages. The random seed is fixed so that the run is reproducible.

In [10]:
using MCPhylo;
using Random;
Random.seed!(1234);

The next step is to load the data.

In [11]:
tree, data = make_tree_with_data("Example.nex");



The `tree` object contains a random tree over the leaves specified in the NEXUS file. You can view a Newick string representing the tree by calling the `newick` function on the object.

In [12]:
newick(tree)

"((Lang3:0.9646697763820897,((Lang5:0.2986142783434118,Lang4:0.24683718661000897)6:0.646690981531646,Lang1:0.11248587118714015)7:0.9457754052519123)8:0.8211604203482923,Lang2:0.03416010848943718)9:1.0;"
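To make the structure of the Newick string easier to see, the following minimal sketch pulls the leaf names and branch lengths out of it with regular expressions from the Julia standard library. It is only an illustration of the format, not part of MCPhylo; the variable names `nwk`, `leaves` and `branch_lengths` are made up for this example.

```julia
# Newick string copied from the output above.
nwk = "((Lang3:0.9646697763820897,((Lang5:0.2986142783434118,Lang4:0.24683718661000897)6:0.646690981531646,Lang1:0.11248587118714015)7:0.9457754052519123)8:0.8211604203482923,Lang2:0.03416010848943718)9:1.0;"

# A leaf label directly follows "(" or ",", whereas an inner-node label
# (here the numbers 6 to 9) follows ")". Only the former are captured.
leaves = [String(m.captures[1]) for m in eachmatch(r"[(,]([^(),:;]+):", nwk)]

# Every node label is followed by ":" and the length of the branch above it.
branch_lengths = [parse(Float64, m.captures[1]) for m in eachmatch(r":([0-9.eE+-]+)", nwk)]

println(leaves)
println(length(branch_lengths), " branch lengths")
```

The five leaf names should match the taxa in the NEXUS file, and there is one branch length per node, including the root.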

The input data needs to be stored in a dictionary to make it accessible to the sampler.

In [13]:
data_dictionary = Dict{Symbol, Any}(
  :data => data
);

Define a model by specifying a prior distribution on the equilibrium frequencies (a Dirichlet prior in this case) and a prior on the phylogenetic tree. In this example, the compound Dirichlet distribution of Zhang, Rannala and Yang (2012) ([paper](https://doi.org/10.1093/sysbio/sys030)) is chosen.
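For reference, and assuming that `Dirichlet(2,1)` in the model below follows the Distributions.jl convention `Dirichlet(k, α)` for a symmetric Dirichlet with $k$ components and concentration $\alpha$, the prior density on the equilibrium frequencies $\pi = (\pi_1, \dots, \pi_k)$ is

$$
p(\pi) = \frac{\Gamma(k\alpha)}{\Gamma(\alpha)^{k}} \prod_{i=1}^{k} \pi_i^{\alpha - 1},
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{k} \pi_i = 1 .
$$

With $k = 2$ and $\alpha = 1$ every exponent is zero and the normalising constant is $\Gamma(2)/\Gamma(1)^2 = 1$, so this prior is uniform over all pairs of frequencies that sum to one.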

The distribution associated with the data is `PhyloDist`, whose likelihood function is calculated with Felsenstein's pruning algorithm ([paper](https://doi.org/10.1007/BF01734359)).
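The idea behind the pruning algorithm can be sketched for a single site and a single pair of leaves. The code below is an illustration under stated assumptions, not MCPhylo's implementation: it assumes a generic two-state substitution model with equilibrium frequencies `pi_`, scaled to one expected change per unit of branch length, and the helper names `transition_matrix` and `prune` are hypothetical.

```julia
# Transition probabilities P[i, j] = P(state j after time t | state i now)
# for a two-state model with equilibrium frequencies pi_ (assumed model).
function transition_matrix(pi_::AbstractVector, t::Real)
    mu = 1 / (2 * pi_[1] * pi_[2])          # scales the rate to 1 change per unit time
    e = exp(-mu * t)
    return [pi_[j] + ((i == j) - pi_[j]) * e for i in 1:2, j in 1:2]
end

# One pruning step: combine the partial likelihoods of two children into
# the partial likelihoods of their parent.
function prune(left, t_left, right, t_right, pi_)
    return (transition_matrix(pi_, t_left) * left) .*
           (transition_matrix(pi_, t_right) * right)
end

pi_ = [0.5, 0.5]
leaf0 = [1.0, 0.0]    # leaf observed in state 0
leaf1 = [0.0, 1.0]    # leaf observed in state 1

# Partial likelihoods at the root of a two-leaf tree with branch lengths 0.1 and 0.2.
root = prune(leaf0, 0.1, leaf1, 0.2, pi_)

# The site likelihood weights the root partials by the equilibrium frequencies.
site_likelihood = sum(pi_ .* root)
println(site_likelihood)
```

On a larger tree the same `prune` step would be applied from the leaves towards the root, and the per-site log-likelihoods would be summed over all sites.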

The *Restriction Site Model* of character evolution is chosen, with no rate variation across sites.

In [15]:
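model = Model(
    # Likelihood node: a 3-dimensional data array that depends on `tree` and `eq_freq`.
    data = Stochastic(3, (tree, eq_freq) -> PhyloDist(tree, eq_freq, [1.0], [1.0], Restriction), false),
    # Prior on the equilibrium frequencies (a vector, hence dimension 1).
    eq_freq = Stochastic(1, () -> Dirichlet(2, 1), true),
    # Prior on the tree.
    tree = Stochastic(Node(), () -> TreeDistribution(CompoundDirichlet(1.0, 1.0, 0.100, 1.0)), true)
)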

Object of type "Model"
-------------------------------------------------------------------------------
eq_freq:
Object of type "Stochastic{Vector{Float64}}"
Float64[]
-------------------------------------------------------------------------------
tree:
Object of type "Stochastic{GeneralNode{Float64, Int64}}"
Tree with root:
"no_name"
Length:
0.0
Height:
0.0
-------------------------------------------------------------------------------
data:
Object of type "Stochastic{Array{Float64, 3}}"
Array{Float64, 3}(undef, 0, 0, 0)




Select the PNUTS sampler for the phylogenetic tree and the slice sampler (`SliceSimplex`) for the equilibrium frequencies.

In [17]:
scheme = [PNUTS(:tree, target=0.7, targetNNI=0.5),
          SliceSimplex(:eq_freq),
          ]
setsamplers!(model, scheme);

Set initial values.

In [18]:
inits = [
    Dict{Symbol, Any}(
        :tree => tree,
        :eq_freq => rand(Dirichlet(2, 1)),
        :data => data_dictionary[:data]
    ),
];

Run the MCMC. The status bar is suppressed via the `verbose` argument to avoid cluttering the output.

In [19]:
sim = mcmc(model, data_dictionary, inits, 5000, burnin=2500, thin=5, chains=1, trees=true, verbose=false)
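
The `burnin` and `thin` arguments determine how many of the 5000 iterations end up in `sim`. The short sketch below works this out, assuming that the sampler discards the burn-in and then stores every `thin`-th iteration; that storage rule is an assumption about the sampler, not something taken from the MCPhylo documentation.

```julia
iters, burnin, thin = 5000, 2500, 5    # values from the mcmc call above

# Assumption: after the burn-in, every thin-th iteration is stored.
kept = (burnin + thin):thin:iters
n_kept = length(kept)

println("first stored iteration: ", first(kept))
println("stored samples per chain: ", n_kept)
```

Under this assumption the chain holds 500 samples of `eq_freq` and, because `trees=true`, the same number of trees.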

MCMC Simulation of 5000 Iterations x 1 Chain...