You can install the dependencies for this session (including Jupyter) using mamba/conda and the provided `env.yaml` file:
```{bash}
mamba env create -f env.yaml
```

In `WrightFisherTutorial.ipynb`, we simulated the frequency of an allele forward in time. In this tutorial, we will instead simulate the genealogy of several individuals backward in time. This process is known as the coalescent and is widely used in population genetics. One of the main advantages of coalescent simulations is that they are computationally efficient, as we only keep track of the lineages ancestral to a few sampled individuals when going back in time. However, one drawback is that it is very difficult to incorporate selection. We will be using `msprime`, a widely used coalescent simulator, to simulate genealogies.
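To make the idea of "only keeping track of a few lineages" concrete, here is a minimal sketch of the coalescent in plain Python. It is not how `msprime` is implemented. It assumes a constant haploid population of size `population_size` in which each pair of lineages coalesces at rate `1 / population_size` per generation, and the function name `coalescence_times` is made up for this illustration.

```python
import random
from math import comb


def coalescence_times(n, population_size, rng):
    """Return the times (in generations) of the n - 1 coalescent events.

    Assumption: constant haploid population in which each pair of
    lineages coalesces at rate 1 / population_size per generation.
    """
    times = []
    t = 0.0
    # going back in time, the number of lineages drops from n to 1
    for k in range(n, 1, -1):
        # with k lineages, any of the comb(k, 2) pairs may coalesce
        rate = comb(k, 2) / population_size
        # exponentially distributed waiting time until the next event
        t += rng.expovariate(rate)
        times.append(t)
    return times


rng = random.Random(1)
times = coalescence_times(n=5, population_size=10_000, rng=rng)
print(times)
```

Note that the state of the simulation is just the number of remaining lineages, regardless of how large the population is. This is why coalescent simulations are cheap compared to forward simulations.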


Let's begin by drawing some trees for a sample of 5 individuals using an effective population size of 10,000. Note that each leaf and internal node of the tree is labeled by an integer. For more information on the parameters of `sim_ancestry`, see the [documentation](https://tskit.dev/msprime/docs/stable/api.html#msprime.sim_ancestry) of `msprime.sim_ancestry`.

In [47]:
import msprime as mp

# simulate 3 independent replicates, each a tree sequence of length 1
# (with num_replicates set, sim_ancestry returns an iterator over tree sequences)
replicates = mp.sim_ancestry(samples=5, population_size=10000, num_replicates=3, ploidy=1)
for ts in replicates:

    # obtain the first (and only) tree of the tree sequence
    tree = ts.first()

    # draw the tree
    print(tree.draw_text())

   8     
┏━━┻━━┓  
┃     7  
┃   ┏━┻━┓
┃   6   ┃
┃  ┏┻━┓ ┃
┃  5  ┃ ┃
┃ ┏┻┓ ┃ ┃
0 1 2 4 3

    8    
  ┏━┻━━┓ 
  7    ┃ 
 ┏┻━┓  ┃ 
 ┃  ┃  6 
 ┃  ┃ ┏┻┓
 5  ┃ ┃ ┃
┏┻┓ ┃ ┃ ┃
0 3 4 1 2

      8  
    ┏━┻━┓
    7   ┃
  ┏━┻━┓ ┃
  6   ┃ ┃
 ┏┻━┓ ┃ ┃
 5  ┃ ┃ ┃
┏┻┓ ┃ ┃ ┃
0 2 3 4 1



Can you determine the average time to the first coalescent event for such trees? You will need to generate many more replicates to obtain a good estimate. Since the samples are labeled 0 to 4, node 5 corresponds to the first coalescent event, so you can use `tree.time(5)` to obtain its time (see [documentation](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.Tree.time) for more information). Does the average time agree with the theoretical expectation?
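As a reminder of the theory, consider the standard coalescent with $n$ haploid lineages in a population of constant size $N$. We assume here that each pair of lineages coalesces at rate $1/N$ per generation, which is our understanding of how `msprime` scales time when `ploidy=1`. The time $T_n$ to the first coalescent event is then exponentially distributed with rate $\binom{n}{2}/N$:

$$
\mathbb{E}[T_n] = \frac{N}{\binom{n}{2}}, \qquad
\mathrm{Var}[T_n] = \left(\frac{N}{\binom{n}{2}}\right)^{2}.
$$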

In [25]:
# YOUR CODE HERE

(5429.117853878306, 15340734.75991619)

Bonus: Plot the distribution of the time to the first coalescent event. What does it look like?

In [None]:
# YOUR CODE HERE

Now determine the mean and variance of the tree height for such trees. Note that you can use `tree.time(tree.root)` to obtain the height of the tree. Does the result agree with the theoretical expectations?
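For comparison, the theoretical values can be sketched as follows, under the same assumptions as the standard coalescent (constant haploid population size $N$, pairwise coalescence rate $1/N$ per generation). The tree height $H_n$ is the sum of the independent waiting times $T_k$ spent with $k = n, \dots, 2$ lineages, each exponentially distributed with rate $\binom{k}{2}/N$:

$$
\mathbb{E}[H_n] = \sum_{k=2}^{n} \frac{N}{\binom{k}{2}} = 2N\left(1 - \frac{1}{n}\right), \qquad
\mathrm{Var}[H_n] = \sum_{k=2}^{n} \left(\frac{N}{\binom{k}{2}}\right)^{2} = 4N^{2} \sum_{k=2}^{n} \frac{1}{k^{2}(k-1)^{2}}.
$$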

In [24]:
# YOUR CODE HERE

32767.86277914525

We will now add some demography to our simulations, namely two subpopulations with migration between them. We start with two lineages in each subpopulation. Coalescent events can only take place within subpopulations.

In [46]:
demography = mp.Demography()
demography.add_population(name="pop_0", initial_size=10000)
demography.add_population(name="pop_1", initial_size=10000)
demography.set_symmetric_migration_rate(populations=["pop_0", "pop_1"], rate=1e-6)

# with num_replicates set, sim_ancestry returns an iterator over tree sequences
ts = mp.sim_ancestry(
    samples={'pop_0': 2, 'pop_1': 2},
    num_replicates=3,
    ploidy=1,
    demography=demography
)

Here the migration rate is set to be much smaller than the coalescence rate. What do you expect the genealogies to look like? Draw some trees to verify your expectations.
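A rough back-of-the-envelope calculation may help to form an expectation. The sketch below compares the two competing rates for the pair of lineages that start in the same subpopulation. It assumes that the pair coalesces at rate `1 / N` per generation (haploid population of size `N`) and that each lineage migrates independently at rate `m` per generation. The variable names are chosen for this illustration only.

```python
# assumed values, taken from the simulation set up above
N = 10_000  # haploid size of each subpopulation
m = 1e-6    # per-lineage migration rate per generation

# rate at which the two lineages within a subpopulation coalesce
coalescence_rate = 1 / N

# rate at which either of the two lineages migrates away
migration_rate = 2 * m

# probability that coalescence happens before any migration event
p_coalesce_first = coalescence_rate / (coalescence_rate + migration_rate)
print(f"P(coalescence before migration) = {p_coalesce_first:.3f}")
```

Under these assumptions, the probability is roughly 0.98, so the two lineages within each subpopulation will almost always coalesce with each other before either of them migrates.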

In [None]:
# YOUR CODE HERE

Bonus: Plot the tree height distribution for a single population going through a bottleneck. What does it look like?

In [None]:
# YOUR CODE HERE