# Coalescent teaching demo

**The coalescent is a statistical model** for describing the evolutionary history of a population based on its observed genetic variation. *Phylogenetic models* aim to infer a single tree-like history that describes the variation among a set of samples. The coalescent instead embraces the fact that many possible tree-like histories could describe their evolutionary relationships. Indeed, a set of N samples can have a different genealogical history in different parts of their genomes, with histories differing in both topology and edge lengths.
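The variability of genealogies can be seen even in the simplest case: one randomly mating population of constant size. The pure-Python sketch below is not ipcoal; it assumes the standard Kingman coalescent for a diploid population of size `ne`. In that model, while k lineages remain, the waiting time to the next coalescence is exponential with rate C(k, 2) / (2 * ne) per generation. Repeating the draw many times shows how widely the depth of a genealogy varies around its expectation.

```python
import random


def sample_tmrca(n, ne, rng):
    """Draw one time to the most recent common ancestor (in generations)
    for n gene copies sampled from a diploid population of size ne.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate C(k, 2) / (2 * ne) per generation.
    """
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / (2 * ne)
        t += rng.expovariate(rate)
    return t


def expected_tmrca(n, ne):
    # E[T_MRCA] = 4 * ne * (1 - 1/n) generations under this diploid scaling
    return 4 * ne * (1 - 1 / n)


rng = random.Random(123)
draws = [sample_tmrca(8, 10_000, rng) for _ in range(20_000)]
mean = sum(draws) / len(draws)
print(f"simulated mean TMRCA: {mean:,.0f} generations")
print(f"expected TMRCA:       {expected_tmrca(8, 10_000):,.0f} generations")
print(f"range across replicates: {min(draws):,.0f} to {max(draws):,.0f}")
```

The spread between the minimum and maximum replicate illustrates the point above. No single tree is "the" history; each genealogy is one draw from a distribution.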

**A single coalescent history represents ancestor-descendant relationships** for a single gene copy...

...by treating the genealogical history of those samples as a latent (unobserved) random variable *within populations* that are randomly mating. We are often further interested in **structured coalescent** models. In these, coalescent waiting times are drawn randomly *within* populations. If two samples have not coalesced by a given time, they enter a new epoch in which their population merges with another, representing an ancestral population or species. This is the framework of the **multispecies coalescent**, a hierarchical model in which genealogies are contained within a species tree.
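The classic illustration of this hierarchy uses three species with species tree ((A,B),C) and one gene copy sampled per species. Lineages from A and B can only coalesce in their shared ancestral population, which persists for an internal branch of T coalescent units (generations divided by 2Ne). If they fail to coalesce there, all three lineages reach the root population, and any pair is equally likely to coalesce first. The resulting probability that a gene tree matches the species tree is 1 - (2/3)e^(-T).

The sketch below computes these probabilities. It assumes a constant diploid Ne along the internal branch. The helper names are our own.

```python
import math


def to_coalescent_units(generations, ne):
    # One coalescent unit = 2 * ne generations for a diploid population
    return generations / (2 * ne)


def three_taxon_topology_probs(t_internal):
    """Gene tree topology probabilities for species tree ((A,B),C) whose
    internal branch is t_internal coalescent units long."""
    p_discordant_each = math.exp(-t_internal) / 3
    return {
        "((A,B),C)": 1 - 2 * p_discordant_each,
        "((A,C),B)": p_discordant_each,
        "((B,C),A)": p_discordant_each,
    }


for gens in (10_000, 100_000, 1_000_000):
    t = to_coalescent_units(gens, ne=100_000)
    probs = three_taxon_topology_probs(t)
    print(f"{gens:>9,} generations = {t:.2f} CU -> "
          f"P(concordant) = {probs['((A,B),C)']:.3f}")
```

Short internal branches (few coalescent units) make discordant gene trees common. This is incomplete lineage sorting, and it is what the multispecies coalescent explicitly models.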

Here we will use the Python libraries *ipcoal* and *toytree* to simulate and draw species tree and gene tree histories. This will help us better understand how the parameters of the coalescent and multispecies coalescent influence the distribution of genealogies.

In [1]:
import ipcoal
import toytree

### Species tree or network
A species tree represents the history of lineage splitting. It is a *container* tree depicting the times during which individuals within each branch of the tree could only reproduce with each other. 

In the plot below you can see that the species tree has 8 tips (labeled r0-r7) and 2\*ntips-1 = 15 nodes (labeled 0-14). We will mostly pay attention to the node labels. The tree was generated using the `unittree` function from toytree's `rtree` module, which returns a random ultrametric topology with 8 tips and a total height set by `treeheight`.
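The 2\*ntips-1 node count follows from how a rooted binary tree is built. Each join of two lineages removes two and adds one, so n tips need n-1 joins, creating n-1 internal nodes.

The pure-Python sketch below is a hypothetical helper, not toytree's implementation. It builds a random topology this way and numbers tips 0..n-1 before internal nodes, so the root gets the largest index. toytree also numbers tips first and the root last, although its internal-node order may differ from this sketch.

```python
import random


def random_coalescent_topology(ntips, rng):
    """Build a random rooted binary tree by repeatedly joining two random
    lineages. Returns a dict mapping each internal node idx to its two
    children. Tips get idx 0..ntips-1, internal nodes ntips..2*ntips-2,
    so the root always has the largest idx."""
    active = list(range(ntips))
    children = {}
    next_idx = ntips
    while len(active) > 1:
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        children[next_idx] = (a, b)
        active.append(next_idx)
        next_idx += 1
    return children


children = random_coalescent_topology(8, random.Random(0))
nnodes = 8 + len(children)
print(nnodes)          # 15, i.e. 2 * 8 - 1
print(max(children))   # 14, the root idx
```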

In [2]:
# generate a random species tree topology
tree = toytree.rtree.unittree(ntips=8, treeheight=1e6)

In [6]:
# draw the species tree
canvas, axes, mark = tree.draw(ts='p', admixture_edges=[(3, 5)]);

# add a title
canvas.text(
    x=canvas.width / 2., 
    y=20,
    text="Species tree", 
    style={"font-size": "14px"},
);

### Network plotting

In [7]:
nwk = """
(spinosus,palmeri,(((hypochondriacus,((cruentus,hybridus):2.0262976185585826)
#H13:0.0::0.9349776397963552):10.0,((powellii,retroflexus):0.9769002951270008,
#H13:3.7060099990658673::0.06502236020364488):1.2236074021546304):3.7961254743762742,
(((viridis,blitum):3.140217349597509,(albus,blitoides):3.023595873591815):2.499013271100076,
tuberculatus):3.3712751161824563):4.789682868539485);
"""
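This string uses extended Newick notation for a phylogenetic network. The hybrid node `#H13` appears twice, once under each of its two parents. Our reading, which is an assumption rather than something stated in this notebook, is that each occurrence follows the field order `#label:length:support:gamma`. Under that reading, gamma is the inheritance probability of that parent edge, so the two gammas for one hybrid should sum to 1.

The regex-based sketch below extracts those fields with the standard library only.

```python
import re

nwk = """
(spinosus,palmeri,(((hypochondriacus,((cruentus,hybridus):2.0262976185585826)
#H13:0.0::0.9349776397963552):10.0,((powellii,retroflexus):0.9769002951270008,
#H13:3.7060099990658673::0.06502236020364488):1.2236074021546304):3.7961254743762742,
(((viridis,blitum):3.140217349597509,(albus,blitoides):3.023595873591815):2.499013271100076,
tuberculatus):3.3712751161824563):4.789682868539485);
""".replace("\n", "")

# Assumed hybrid syntax: #Hn:length:support:gamma (support is empty here)
HYBRID = re.compile(r"#(H\d+):([^:,()]*):([^:,()]*):([^:,()]*)")

hybrids = {}
for label, length, _support, gamma in HYBRID.findall(nwk):
    hybrids.setdefault(label, []).append((float(length), float(gamma)))

for label, edges in hybrids.items():
    total = sum(g for _, g in edges)
    print(label, edges, f"sum of gammas = {total:.3f}")
```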

In [8]:
nwk = """
(spinosus,palmeri,(((hypochondriacus,(cruentus,hybridus):2.0262976185585826):10,
(powellii,retroflexus):4.7):3.7961254743762742,
(((viridis,blitum):3.140217349597509,(albus,blitoides):3.023595873591815):2.499013271100076,
tuberculatus):3.3712751161824563):4.789682868539485);
""".replace("\n", "")

In [9]:
tre = toytree.tree(nwk)
outgroup = tre.get_tip_labels(idx=18)
rtre = tre.root(outgroup)

In [10]:
rtre.draw(
    width=500,
    height=500,
    node_labels=rtre.get_node_values("idx", 1, 1), 
    #use_edge_lengths=False,
    tip_labels_align=True,
    #admixture_edges=[(13, 14), (18, 11)],
);


In [63]:
nwk = """
(spinosus,palmeri,(((hypochondriacus,((cruentus,hybridus):2.0262976185585826)):10.0,
((powellii,retroflexus):0.9769002951270008):1.2236074021546304):3.7961254743762742,
(((viridis,blitum):3.140217349597509,(albus,blitoides):3.023595873591815):2.499013271100076,
tuberculatus):3.3712751161824563):4.789682868539485);
""".replace("\n", "")

In [64]:
toytree.tree(nwk).draw();
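A `NewickError` about mismatched parentheses is easiest to track down by scanning the string and counting nesting depth. The small helper below is hypothetical and not part of toytree. It reports either the first `)` that has no partner or the number of `(` left open at the end.

```python
def check_parentheses(newick):
    """Return None if parentheses are balanced, otherwise a message
    pointing at the first problem."""
    depth = 0
    for pos, char in enumerate(newick):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return f"unmatched ')' at position {pos}"
    if depth > 0:
        return f"{depth} unclosed '(' at end of string"
    return None


print(check_parentheses("((a,b),c);"))    # None
print(check_parentheses("((a,b),c;"))     # 1 unclosed '(' at end of string
print(check_parentheses("(a,b)),c);"))    # unmatched ')' at position 5
```

Running it on a hand-edited network string before passing it to `toytree.tree` gives a more specific hint than the parser's error message.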

In [34]:
tre = toytree.rtree.coaltree(20, seed=123)
tre.draw(node_labels='idx', admixture_edges=[(29, 33)])


(<toyplot.canvas.Canvas at 0x7fcd7e6366a0>,
 <toyplot.coordinates.Cartesian at 0x7fcd7e70a1d0>)


### Setting variable Ne on a tree

In [7]:
# set effective population sizes (Ne) on selected nodes; all others get the default
ntree = tree.set_node_values(
    attr="Ne",
    default=5e5,
    values={
        8: 2e5,
        10: 5e5,
        11: 1e5,
        12: 1.5e6,
        13: 1e6,
        14: 1e6,
    },
)

# draw the tree showing Ne as edge width
ntree.draw(
    ts='p', 
    admixture_edges=[(2, 1, None, 3, None, 0.1)],
);
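Edge width encodes Ne because Ne sets how quickly lineages coalesce on that edge. The sketch below assumes a constant diploid Ne per edge and a hypothetical edge length of 250,000 generations. It reuses Ne values from the cell above and converts the edge length into coalescent units (generations / 2Ne). It then computes the probability that two lineages entering the edge coalesce before leaving it, 1 - e^(-t/2Ne).

```python
import math


def coalescent_units(edge_generations, ne):
    # Branch length rescaled by 2 * Ne (diploid convention)
    return edge_generations / (2 * ne)


def p_pair_coalesces(edge_generations, ne):
    """Probability that two lineages entering an edge coalesce before
    leaving it, assuming constant Ne along the edge."""
    return 1 - math.exp(-coalescent_units(edge_generations, ne))


edge = 250_000  # hypothetical edge length in generations
for ne in (1e5, 5e5, 1.5e6):
    print(f"Ne={ne:>9,.0f}: {coalescent_units(edge, ne):.3f} CU, "
          f"P(coalesce) = {p_pair_coalesces(edge, ne):.3f}")
```

Wide edges (large Ne) are short in coalescent units. Lineages are therefore more likely to pass through them uncoalesced, which increases discordance among genealogies.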

In [87]:
# simulate data with demographic parameters
model = ipcoal.Model(
    tree=tree,
    admixture_edges=[(3, 4, 0.5, 0.5)],
    Ne=1e5, 
    samples=3,
    mut=1e-8, 
    recomb=1e-9,
    seed=999,
)

# simulate n loci of a given length
model.sim_loci(nloci=10, nsites=500);
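A useful baseline for what `mut` and `recomb` imply is the neutral expectation for a single panmictic population. The expected number of segregating sites is theta times a_n, and the expected number of recombination events in the sample's history is rho times a_n. Here theta = 4 Ne mu L, rho = 4 Ne r L, and a_n = 1 + 1/2 + ... + 1/(n-1).

The sketch below assumes `mut` and `recomb` are per-site, per-generation rates, and it treats all 24 sampled gene copies (8 tips x 3 samples) as one population. Both are simplifying assumptions. In a structured species tree, the expected values are larger, because lineages in different species cannot coalesce until their ancestral populations merge.

```python
def harmonic(n):
    # a_n = sum_{i=1}^{n-1} 1/i
    return sum(1 / i for i in range(1, n))


def expected_segregating_sites(n, ne, mu, nsites):
    theta = 4 * ne * mu * nsites      # population-scaled mutation rate for the locus
    return theta * harmonic(n)


def expected_recombinations(n, ne, r, nsites):
    rho = 4 * ne * r * nsites         # population-scaled recombination rate for the locus
    return rho * harmonic(n)


# Parameter values from the Model call above; n = 8 tips x 3 samples,
# treated (as an assumption) as a single panmictic population.
n = 8 * 3
print(expected_segregating_sites(n, ne=1e5, mu=1e-8, nsites=500))
print(expected_recombinations(n, ne=1e5, r=1e-9, nsites=500))
```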

In [88]:
model.df.head()

|   | locus | start | end | nbps | nsnps | genealogy |
|---|-------|-------|-----|------|-------|-----------|
| 0 | 0 | 0 | 338 | 338 | 32 | ((((r6-0:76848.4,(r6-1:2... |
| 1 | 0 | 338 | 500 | 162 | 14 | ((((r6-0:76848.4,(r6-1:2... |
| 2 | 1 | 0 | 27 | 27 | 4 | ((r1-1:29857.3,(r1-0:932... |
| 3 | 1 | 27 | 30 | 3 | 0 | ((r1-1:29857.3,(r1-0:932... |
| 4 | 1 | 30 | 178 | 148 | 11 | ((r1-1:29857.3,(r1-0:932... |

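Each row of `model.df` describes one genomic interval of a locus over which all samples share a single genealogy. Recombination breakpoints separate consecutive rows of the same locus. The sketch below copies the first five rows shown above into plain tuples. It checks that `nbps` equals `end - start` and that each locus's intervals are contiguous. Because `.head()` truncates the table, locus 1 is only partially shown.

```python
# (locus, start, end, nbps, nsnps) copied from the first five rows shown above
rows = [
    (0, 0, 338, 338, 32),
    (0, 338, 500, 162, 14),
    (1, 0, 27, 27, 4),
    (1, 27, 30, 3, 0),
    (1, 30, 178, 148, 11),
]

# nbps is the length of the interval that shares one genealogy
assert all(end - start == nbps for _, start, end, nbps, _ in rows)

# group intervals by locus; each should start where the previous one ended
by_locus = {}
for locus, start, end, nbps, nsnps in rows:
    by_locus.setdefault(locus, []).append((start, end, nbps, nsnps))

for locus, intervals in by_locus.items():
    contiguous = all(a[1] == b[0] for a, b in zip(intervals, intervals[1:]))
    covered = intervals[-1][1] - intervals[0][0]
    snps = sum(iv[3] for iv in intervals)
    print(f"locus {locus}: {len(intervals)} intervals, {covered} bp shown, "
          f"{snps} SNPs, contiguous={contiguous}")
```

For locus 0, the two intervals cover the full 500 bp requested with `nsites=500`.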

In [89]:
mtre = toytree.mtree(model.df.genealogy)

In [90]:
import toyplot
canvas = toyplot.Canvas(width=800, height=600);
ax0 = canvas.cartesian(bounds=(250, 550, 100, 300))
ax1 = canvas.cartesian(bounds=(50, 750, 350, 550))

ntree.draw(ts='p', axes=ax0, admixture_edges=[(3, 4, None, 5, None, 0.1)]);
mtre.draw_tree_grid(ts='c', axes=ax1, shared_axis=True);

In [22]:
toytree.mtree(model.df.genealogy).draw_tree_grid(shared_axis=True, ts='c', tip_labels=True);

In [10]:
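# shared drawing style (hypothetical choice, matching the grid drawn above)
tstyle = {"ts": "c", "tip_labels": True}

# draw the species tree first, followed by each simulated genealogy
mm = [tree.write()] + model.df.genealogy.tolist()
toytree.mtree(mm).draw_tree_grid(shared_axis=True, **tstyle);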

### Sample genealogies from this species tree

In [12]:
model = ipcoal.Model(ntree)

# simulate 1000 single-site loci; each row of model.df then holds one genealogy
model.sim_loci(nloci=1000, nsites=1)
#model.seqs
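What "sampling genealogies from a species tree" means can be made concrete in the simplest case. The pure-Python Monte Carlo sketch below is not how ipcoal works internally (ipcoal builds on msprime). It assumes species tree ((A,B),C), one gene copy per species, and an internal branch of `t` coalescent units. It draws many gene tree topologies and compares the fraction matching the species tree with the analytical value 1 - (2/3)e^(-t).

```python
import math
import random
from collections import Counter


def sample_gene_tree(t_internal, rng):
    """Sample one gene tree topology for species tree ((A,B),C), with one
    gene copy per species and an internal branch t_internal coalescent
    units long."""
    # Lineages from A and B enter their ancestral population and coalesce
    # there at rate 1 per coalescent unit until it merges with C.
    if rng.expovariate(1.0) < t_internal:
        return "((A,B),C)"
    # Otherwise all three lineages reach the root population, where the
    # first pair to coalesce is equally likely to be any of the three pairs.
    return rng.choice(["((A,B),C)", "((A,C),B)", "((B,C),A)"])


rng = random.Random(42)
t = 0.5
nreps = 50_000
counts = Counter(sample_gene_tree(t, rng) for _ in range(nreps))
observed = counts["((A,B),C)"] / nreps
expected = 1 - 2 / 3 * math.exp(-t)
print(f"observed P(concordant) = {observed:.3f}, expected = {expected:.3f}")
```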

### Sample SNPs on genealogies
...

In [44]:
model = ipcoal.Model(ntree)
model.sim_snps(10)
model.df

|   | locus | start | end | nbps | nsnps | genealogy |
|---|-------|-------|-----|------|-------|-----------|
| 0 | 0 | 0 | 1 | 1 | 1 | (r1:4.95544e+06,((r3:645... |
| 1 | 1 | 0 | 1 | 1 | 1 | ((r7:525986,r2:525986):6... |
| 2 | 2 | 0 | 1 | 1 | 1 | (((r7:501828,r2:501828):... |
| 3 | 3 | 0 | 1 | 1 | 1 | (r1:2.79006e+06,(r3:1.57... |
| 4 | 4 | 0 | 1 | 1 | 1 | ((r3:1.69749e+06,(r5:655... |
| 5 | 5 | 0 | 1 | 1 | 1 | ((r1:875593,(r5:259116,r... |
| 6 | 6 | 0 | 1 | 1 | 1 | (r1:2.34957e+06,((r3:763... |
| 7 | 7 | 0 | 1 | 1 | 1 | ((r4:809011,(r6:755066,(... |
| 8 | 8 | 0 | 1 | 1 | 1 | ((r5:690630,(r0:527017,r... |
| 9 | 9 | 0 | 1 | 1 | 1 | ((r3:695109,(r5:506310,r... |
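In the `sim_snps` output, every row has `nbps=1` and `nsnps=1`. Each locus is a single site that carries exactly one SNP on its own genealogy. One common way to place a single mutation on a given genealogy is to choose a branch with probability proportional to its length (infinite-sites model). The tips below that branch then carry the derived allele.

The sketch below illustrates that idea on a hypothetical three-tip genealogy with illustrative branch lengths. We have not verified that ipcoal's `sim_snps` uses exactly this procedure.

```python
import random
from collections import Counter

# Hypothetical rooted genealogy ((A,B),C) with total height 1.0 (arbitrary units):
# each branch maps to (length, tips below it). Lengths are illustrative only.
BRANCHES = {
    "A": (0.4, {"A"}),
    "B": (0.4, {"B"}),
    "AB": (0.6, {"A", "B"}),
    "C": (1.0, {"C"}),
}


def sample_snp(branches, rng):
    """Place one mutation on a branch chosen with probability proportional
    to its length; return the sorted tips carrying the derived allele."""
    names = list(branches)
    weights = [branches[name][0] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    return tuple(sorted(branches[chosen][1]))


rng = random.Random(1)
nsnps = 20_000
patterns = Counter(sample_snp(BRANCHES, rng) for _ in range(nsnps))
print(patterns.most_common())
```

Long branches, such as the one leading to C here, carry SNPs most often. That is why the SNP patterns across loci reflect the distribution of genealogies and not just their topologies.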
