## Setting up the simulation

All the data needed to run this notebook is in [test-data](test-data/).

In [None]:
import pandas as pd
import gzip
import time
import numpy as np
import fwdpy11
import msprime
fwdpy11.__version__

## Loading the data

I have generated 200 random samples of 1 Mb each from the genome. Here I take one of those samples as an example to set up the simulation.

**NOTE:**

- I subtract the start position of the region from the intervals and from the recombination map, so that the region starts at position zero. (I am not sure whether this is needed.)
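
As a rough illustration of the coordinate shift described in the note, the sketch below uses a made-up region and made-up intervals (not the notebook's data). `shift_to_zero` is a hypothetical helper name introduced only for this example.

```python
import pandas as pd


def shift_to_zero(intervals: pd.DataFrame, region_start: int) -> pd.DataFrame:
    """Return a copy with 'start' and 'end' expressed relative to region_start."""
    shifted = intervals.copy()
    shifted[["start", "end"]] = shifted[["start", "end"]] - region_start
    return shifted


# made-up region and intervals, for illustration only
region_start, region_end = 1_000_000, 2_000_000
toy = pd.DataFrame(
    {
        "chro": [22, 22],
        "start": [1_000_500, 1_200_000],
        "end": [1_000_900, 1_250_000],
    }
)

shifted = shift_to_zero(toy, region_start)
length = region_end - region_start

# every shifted interval should fall inside [0, length]
inside = bool(((shifted["start"] >= 0) & (shifted["end"] <= length)).all())
print(shifted)
print("all intervals inside region:", inside)
```

A check like `inside` is a cheap way to catch intervals that were left in genome coordinates by mistake.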

In [None]:
# The 1Mb sampled region

chromosome, start, end = np.loadtxt('test-data/region_region_4.bed', dtype=np.int64)
print(f'position chr{chromosome}, {start}, {end}')

In [None]:
# exonic intervals in the sampled region

exons = pd.read_csv('test-data/region_exons_4.bed', sep='\t', names=['chro', 'start', 'end'])
exons.head()

In [None]:
# non-coding intervals (intronic and intergenic) in the sampled region
nonexonic = pd.read_csv('test-data/region_intronANDinterg_4.bed', sep='\t', names=['chro', 'start', 'end'])
nonexonic.head()

In [None]:
# mutation rates within the region
ml_coding = pd.read_csv('test-data/region_mlcoding_4.csv')
# np.object0 is no longer available in recent NumPy releases; read the values as strings instead
ml_non_coding = np.loadtxt('test-data/region_mlnoncoding_4.txt', dtype=str)[1]
ml_non_coding = float(ml_non_coding)

# Create a dict mapping each mutation class name to its mutation rate

ml = {}
for x, y in ml_coding.iterrows():
    ml[y.Q] = y.mL

ml['noncoding'] = ml_non_coding
print(ml)

In [None]:
# subtract the start position so the initial position is zero
exons['start'] = exons['start'] - start
exons['end'] = exons['end'] - start

nonexonic['start'] = nonexonic['start'] - start
nonexonic['end'] = nonexonic['end'] - start

nonexonic.head()

In [None]:
# Recombination map
# Here I use msprime.RateMap.read_hapmap to load the recombination map
rmap = msprime.RateMap.read_hapmap('test-data/chr22.b38.gmap', position_col=0, map_col=2)
print(rmap)

In [None]:
# Take a slice of the map to get the coordinates of the sampled region.
# With trim=True the flanking regions are dropped, so the slice starts at zero.
rmap = rmap.slice(left=start, right=end, trim=True)
print(rmap)

# Set up the simulation

## Neutral regions

In [None]:
## we will label the mutations according to the functional category

mut_labels = {
    'neutral': 0,
    'missense': 1,
    'synonymous': 2,
    'LOF': 3,
}
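
To decode the integer labels back into category names later on, one option is an inverse dictionary. The sketch repeats `mut_labels` so that it runs on its own; `label_names` is a name introduced here, not part of the notebook.

```python
mut_labels = {
    "neutral": 0,
    "missense": 1,
    "synonymous": 2,
    "LOF": 3,
}

# invert the mapping: integer label -> category name
label_names = {label: name for name, label in mut_labels.items()}

# the inversion only works if every label is unique
assert len(label_names) == len(mut_labels)

print(label_names[1])
```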

In [None]:
nonexonic.head()

In [None]:
# Construct the neutral regions from the non-exonic intervals
nregions = []
for _, noexon in nonexonic.iterrows():
    nregions.append(
        fwdpy11.Region(beg=noexon.start, end=noexon.end, weight=1, label=mut_labels['neutral'])
    )


In [None]:
nregions[:10]

## Distributions of effect sizes | Selected regions

- For now I use Aaron's inferred DFEs [see here](https://moments.readthedocs.io/en/main/modules/dfe.html#all-data).
- The weights establish the relative probability that a mutation comes from a given region.

**NOTE:**

- When multiple “sregion” objects are used, the default behavior is to multiply the input weight by end-beg, so longer regions receive proportionally more mutations.
- The weights should depend on the mutation type (i.e. synonymous, missense). We could make the weight proportional to ml.
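
To make the first point of the note concrete, here is a small numpy sketch with made-up exon coordinates and a made-up weight. It assumes the behavior described above (effective weight = input weight × (end − beg)) and does not call fwdpy11 itself.

```python
import numpy as np

# made-up exon intervals and a single input weight, for illustration only
beg = np.array([100, 1_000, 5_000])
end = np.array([300, 1_100, 5_700])
input_weight = 0.7

# assumed default behavior: the input weight is multiplied by the interval length
effective = input_weight * (end - beg)

# relative probability that a new mutation falls in each interval
prob = effective / effective.sum()
print(prob)
```

Because all three intervals share the same input weight, the probabilities depend only on the interval lengths (200, 100 and 700 bp).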

**Comments:**

- The selection and dominance should also depend on the mutation class. We'll need to pick an appropriate DFE for each case.


### DFE for missense variants

The parameters that were fit are alpha and beta (or shape and scale) of the gamma distribution.

- Ne = 11372.91
- shape: 0.1596
- scale: 2332.3

The mean of the gamma distribution is $\alpha\beta$ (shape times scale). To get the mean selection coefficient I need to divide this by $2N_e$.
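
As a quick sanity check of this arithmetic, the sketch below compares the analytical mean (shape × scale) with the mean of random draws from numpy's gamma sampler, and then rescales by 2Ne. The seed and the number of draws are arbitrary choices made for this example.

```python
import numpy as np

Ne = 11372.91
shape = 0.1596
scale = 2332.3

# analytical mean of the gamma distribution, before rescaling
mean_gamma = shape * scale

# mean selection coefficient after dividing by 2*Ne
mean_s = mean_gamma / (2 * Ne)

# empirical check: the sample mean should be close to shape * scale
rng = np.random.default_rng(1)
draws = rng.gamma(shape, scale, size=1_000_000)

print("analytical mean:", mean_gamma)
print("sample mean:    ", draws.mean())
print("mean s:         ", mean_s)
```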


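In [None]:
Ne = 11372.91
shape = 0.1596
scale = 2332.3
# mean of the gamma distribution, before rescaling
(shape * scale)

In [None]:
# divide by 2Ne to get the mean selection coefficient
mean_s = (shape * scale) / (2 * Ne)
mean_s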

In [None]:
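# This will be the DFE for missense variants
# NOTE: mean_s is positive here. In fwdpy11 a positive s increases fitness,
# so the mean should be negative if these variants are meant to be deleterious.
fwdpy11.GammaS(beg=0, end=1, weight=1, mean=mean_s, shape_parameter=shape, h=1)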

In [None]:
# Define the weights
total_weight = ml['synonymous'] + ml['missense'] + ml['LOF']


w_syn = ml['synonymous'] / total_weight
w_mis = ml['missense'] / total_weight
w_lof = ml['LOF'] / total_weight

print(f'total weight: {total_weight}\n\n\nsynonymous={w_syn}\nmissense={w_mis}\nlof={w_lof}')

In [None]:
# Construct the selected regions from the exonic intervals
sregions = []
for _, exon in exons.iterrows():
    # missense
    sregions.append(
        fwdpy11.GammaS(
            beg=exon.start, end=exon.end, weight=w_mis,
            mean=mean_s, shape_parameter=shape,
            h=1,
            label=mut_labels['missense'],
        )
    )
    # synonymous
    #sregions.append(
    #    fwdpy11.ConstantS(beg=exon.start, end=exon.end, weight=w_syn, s=s, label=mut_labels['synonymous'])
    #
    #)
    # loss of function
    #sregions.append(
    #    fwdpy11.ConstantS(beg=exon.start, end=exon.end, weight=w_lof, s=s, label=mut_labels['LOF'])
    #
    #)

In [None]:
sregions[:5]

## Recombination

In [None]:
rmap

In [None]:
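# number of intervals in the sliced map
nrec = rmap.num_intervals

In [None]:
# PoissonInterval takes the mean number of crossovers in the interval,
# so the per-bp rate is multiplied by the interval length.
# Intervals with a missing (NaN) rate are skipped.
recregions = []
for i in range(nrec):
    if rmap.missing[i]:
        continue
    recregions.append(
        fwdpy11.PoissonInterval(
            beg=rmap.left[i],
            end=rmap.right[i],
            mean=rmap.rate[i] * rmap.span[i],
        )
    )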

In [None]:
recregions[:10]

## Rates

We need to specify the total rates for the region.
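
For recombination, the analogous total is the expected number of crossovers per generation, which is the sum over map intervals of the per-bp rate times the interval length. A small numpy sketch with a made-up three-interval map (the values are illustrative and are not taken from `chr22.b38.gmap`):

```python
import numpy as np

# made-up map: interval edges in bp and one per-bp rate per interval
position = np.array([0, 10_000, 50_000, 100_000])
rate = np.array([1e-8, 2e-8, 5e-9])

# interval lengths in bp
span = np.diff(position)

# expected number of crossovers per generation in each interval
expected = rate * span

# total expected number of crossovers per generation in the region
total = expected.sum()
print(expected, total)
```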

In [None]:
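# The neutral mutation rate, selected mutation rate, and total recombination rate, respectively.
neutral_ml = ml['noncoding']
# With all coding classes the selected rate would be:
# selected_ml = ml['missense'] + ml['synonymous'] + ml['LOF']
# For now only missense regions are in sregions, so I use the missense rate only
selected_ml = ml['missense']

# recombination_rate is left as None: as far as I understand the fwdpy11 API,
# each PoissonInterval in recregions already carries its own mean
rates = fwdpy11.MutationAndRecombinationRates(
    neutral_mutation_rate=neutral_ml,
    selected_mutation_rate=selected_ml,
    recombination_rate=None)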


## Demography

To test the DFE I will use a constant-size population model, because it runs faster.

In [None]:
pop = fwdpy11.DiploidPopulation(N=5000, length=int(1e6))
pop.N
pop.tables.genome_length


## Setting up the parameters for a simulation


In [None]:
# the parameters that fwdpy11 needs to run the simulation
p = {
    "nregions": nregions,  # neutral regions built from the non-exonic intervals
    "gvalue": fwdpy11.Additive(2.0),  # fitness model
    "sregions": sregions, 
    "recregions": recregions,
    "rates": rates,
    "prune_selected": False,
    "demography": fwdpy11.DiscreteDemography(),  # no demographic events, i.e. constant size
    "simlen": 1000
}

In [None]:
params = fwdpy11.ModelParams(**p)

In [None]:
# set up the random number generator
rng = fwdpy11.GSLrng(54321)

In [None]:
# run the simulation
print('running simulation ...')
time1 = time.time()
fwdpy11.evolvets(
    rng, pop, params, simplification_interval=100, suppress_table_indexing=True
)
print("Simulation took", int(time.time() - time1), "seconds")

# simulation finished
print("Final population sizes =", pop.deme_sizes())

In [1]:
mkdir -p results

In [None]:
# save the simulation results
with gzip.open('results/sim-pop.gz', 'wb') as f:
    pop.pickle_to_file(f)