# Parallel Simulation Demo
This demonstration sets up two parallel simulations in `shadie`, then merges them into a single population and recapitates, simulating two populations with shared ancestry that split at the beginning of the simulation time. 
This is a bare bones demo for quick reference - please consult other tutorials for more details.

In [1]:
import numpy as np
import shadie
import toytree

print("shadie", shadie.__version__)
print("toytree", toytree.__version__)

shadie 0.3.0
toytree 3.0.dev10


## Set Up the Model
In this example we will be using the dioicous bryophyte model and simulating a population split from an ancestral population. In this example both populations have exactly the same parameters (default setting for the model with sporophyte population size 500 and gametophyte population size 1000; same mutation and recombination rates), but you can imagine extending this kind of approach to compare populations with different parameter settings. 

### Run the neutral burn-In
It turns out to be much simpler downstream, and perhaps avoids other bad assumptions, to load a burn-in file as the starting point for parallel simulations. Having SLiM run burn-in simulation using your intended model also ensures that the ancestry and mating dynamics are correct for your simulation. A burn-in should run for 2N-10N generations (where N=diploid population size) and should be compeltely neutral. 

The purpose of the burn-in is to create neutral variation. In our case, because we are using recapitation and mutation with msprime, we are actually just laying down the skeleton of ancestry for the burn-in period without actually simulating any mutations. 

#### Create the burn-in chromosome (fully neutral)

In [5]:
burnin_chrom = shadie.chromosome.explicit({
    (0, 5_000_0000): shadie.NONCDS})
burnin_chrom.draw();

In [6]:
with shadie.Model() as burnin_model:
    burnin_model.initialize(
        chromosome=burnin_chrom, 
        sim_time=2000,
        mutation_rate=1e-8, 
        recomb_rate=1e-8,
        file_out="burnin.trees",
    )
    burnin_model.reproduction.bryophyte_dioicous(
        spo_pop_size=500,
        gam_pop_size=1000)

In [7]:
burnin_model.run()

🌿 ERROR | [31m[1mmodel.py[0m | [30m// Initial random seed:
0

// RunInitializeCallbacks():
initializeSLiMModelType(modelType = 'nonWF');
initializeRecombinationRate(1e-08, 50000000);
initializeMutationRate(1e-08);
initializeTreeSeq();
ERROR (EidosInterpreter::_AssignRValueToLValue): operand type NULL is not supported by the '.' operator.

Error on script line 14, character 28:

  c().haploidDominanceCoeff = 1.0;
                            ^
[0m


SyntaxError: SLiM4 error, see script at /tmp/slim.slim (<string>)

## Setup the models

### Create a simple chromosome

In [2]:
chrom = shadie.chromosome.explicit({
    (0, 1_200_000): shadie.NONCDS,
    (1_200_001, 1_400_000): shadie.EXON,
    (1_400_001, 1_600_000): shadie.INTRON,
    (1_600_001, 1_800_000): shadie.EXON,
    (1_800_001, 5_000_000): shadie.NONCDS,
})
chrom.draw();

In [3]:
with shadie.Model() as model_1:
    model_1.initialize(
        chromosome=chrom, 
        sim_time=1000,
        mutation_rate=1e-8, 
        recomb_rate=1e-8,
        file_out="test-1.trees",
    )
    model_1.reproduction.bryophyte_dioicous(
        spo_pop_size=500,
        gam_pop_size=1000)

In [4]:
with shadie.Model() as model_2:
    model_2.initialize(
        chromosome=chrom, 
        sim_time=1000,
        mutation_rate=1e-8, 
        recomb_rate=1e-8,
        file_out="test-2.trees",
    )
    model_2.reproduction.bryophyte_dioicous(
        spo_pop_size=500, 
        gam_pop_size=1000)

In [5]:
#uncomment to see the model script

#print(model_1.script)

## Set up serial runs
Each simulation is run with a new random seed. When using simulations for data analysis, we suggest saving a list of random seeds and calling values from that list so that simulations can be re-run. 

In [6]:
#save models and seeds to lists
models = [model_1, model_2]
seeds = []

for i in range(0, len(models)):
    seeds.append(np.random.randint(2**31))

In [7]:
#run simulations serially
for idx, sim in enumerate(models):
    sim.run(seed = seeds[idx])

### Run two populations in parallel

You can also use `ProcessPoolExecutor` to run the simulations in parallel. Each simulation model is run from a different random seed, and writes to a different .trees file path. I also provide the path to my updated `slim` binary as an argument.

In [8]:
from concurrent.futures import ProcessPoolExecutor

In [9]:
with ProcessPoolExecutor(2) as pool:
    for model in [model_1, model_2]:
        kwargs = {"seed": np.random.randint(2**31)}
        pool.submit(model.run, **kwargs)

### Get post-processor

In [4]:
import tskit
ts = tskit.load("test-1.trees")
ts.individuals()

<tskit.trees.SimpleContainerSequence at 0x17d2b2420>

In [14]:
import pyslim
tables = ts.dump_tables()
tables.populations[1].metadata['name']='p2'

In [15]:
tables.populations[1].metadata['name']

'p1'

In [22]:
list(range(1,max(list(tables.edges.parent))))

[1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 151,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,
 185

In [3]:
post = shadie.postsim.TwoSims(
    tree_files=["test-1.trees", "test-2.trees"],
    mut=1e-8,
    recomb=1e-8,
    popsize=500,
    chromosome=chrom, #necessary for some drawing functions, but not for analysis tools
    gens_per_lifecycle=2,
)

ValueError: Duplicate population name: 'p0'

In [None]:
post.tree_sequence.population(2)

### Plot simulation summary

In [None]:
post.draw_tree_sequence(sample=6, seed=333);

### Plot individual trees

In [None]:
post.draw_tree(idx=0, sample=[10, 10], seed=123);

### Calculate statistics

In [None]:
post.stats(sample=10, reps=20)

## Access Metadata
You can access the parameters settings from SLiM in the tree sequence metadata

In [None]:
post.tree_sequence.metadata["SLiM"]["user_metadata"]